Hi Jamie & all,

> I will modify the libunistring and libicu based algorithms tomorrow so
> that if ASCII-7 only, normalization and casefolding is not done, just a
> tolower() of each character. That would make the values more approximate
> to the glib/custom parser.

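For anyone who wants to see the shape of the special case described above, here is a minimal sketch in C. It is not the actual patch: the function names (word_is_ascii, fold_word_unicode_stub, process_word) are made up for illustration, and the non-ASCII branch is only a placeholder where the real normalization and casefolding calls (libunistring or libicu) would go. It also assumes the default "C" locale, so tolower() only touches A-Z.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if every byte in word[0..len) is 7-bit ASCII (high bit clear). */
static bool word_is_ascii(const char *word, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) word[i] & 0x80)
            return false;
    }
    return true;
}

/* Hypothetical placeholder for the slow path. A real parser would run
 * Unicode normalization and casefolding here (through libunistring or
 * libicu); this stub only copies the bytes unchanged. */
static void fold_word_unicode_stub(const char *word, size_t len, char *out)
{
    memcpy(out, word, len);
    out[len] = '\0';
}

/* Writes the processed, NUL-terminated word into out and returns true
 * if the ASCII fast path was taken. out must hold at least len + 1 bytes. */
bool process_word(const char *word, size_t len, char *out, size_t out_size)
{
    assert(word != NULL && out != NULL);
    assert(out_size > len);

    if (!word_is_ascii(word, len)) {
        /* Non-ASCII word: fall back to the full Unicode path. */
        fold_word_unicode_stub(word, len, out);
        return false;
    }

    /* ASCII-only word: skip normalization and casefolding entirely and
     * just lowercase each byte ("C" locale assumed). */
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) word[i]);
    out[len] = '\0';
    return true;
}
```

Calling process_word("Tracker", 7, out, sizeof out) takes the fast path and leaves "tracker" in out, while a word containing any byte with the high bit set goes to the stub. The extra pass over the word to check for ASCII is cheap compared with normalization, which is presumably why the fast path pays off on mostly-ASCII input.
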
I just finished the ASCII-only improvement in both libunistring and libicu, and here are the new results. This time, instead of the mean of several test runs, I took the minimum.

For the 50k ASCII-only file:
 * glib/pango: 0.062
 * libicu: 0.060
 * libunistring: 0.057

For the 200k ASCII-only file:
 * glib/pango: 0.189
 * libicu: 0.200
 * libunistring: 0.119

And for the 182k mixed English/Chinese/Japanese file:
 * glib/pango: 21.4
 * libicu: 0.220
 * libunistring: 0.175

So, with this improvement, which treats ASCII-only words as a special case, libunistring really beats them all. libicu and glib/pango remain pretty similar: libicu seems faster for the smaller file, while glib/pango seems faster for the bigger one.

For reference, I also added the test with the mixed English/Chinese/Japanese file, whose results also change with the new ASCII-only parsing improvement. libunistring now seems about 20% faster than libicu on that file (it was around 10% yesterday).

Cheers!

-- 
Aleksander
