Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

Aleksander Morgado Wed, 05 May 2010 03:54:21 -0700

Hi Jamie & all,

> 
> I will modify the libunistring and libicu based algorithms tomorrow so
> that if ASCII-7 only, normalization and casefolding is not done, just a
> tolower() of each character. That would make the values more approximate
> to the glib/custom parser.
>


Just finished the ASCII-only improvement in both the libunistring and
libicu parsers, and here are the new results. This time I took the
minimum of several runs instead of the mean value.

For the 50k ASCII-only file:
 * glib/pango:   0.062
 * libicu:       0.060
 * libunistring: 0.057

For the 200k ASCII-only file:
 * glib/pango:   0.189
 * libicu:       0.200
 * libunistring: 0.119

And for the 182k mixed English/Chinese/Japanese file:
 * glib/pango:   21.4
 * libicu:        0.220
 * libunistring:  0.175

So, with this improvement, which treats ASCII-only words as a special
case, libunistring really beats them all.
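
For anyone who wants to picture the special case, here is a minimal
sketch of the idea, not the actual Tracker code. The function names
word_is_ascii() and lowercase_if_ascii() are hypothetical, and the sketch
assumes words arrive as byte buffers with a known length:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte in word[0..len) is 7-bit ASCII. */
bool word_is_ascii(const char *word, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)word[i] > 0x7F)
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: lowercases an ASCII-only word in place with
 * tolower() and returns true. For any other word the buffer is left
 * untouched and false is returned, so the caller can fall back to the
 * full Unicode normalization and casefolding path.
 * tolower() is locale-dependent; this sketch assumes the default "C"
 * locale.
 */
bool lowercase_if_ascii(char *word, size_t len)
{
    assert(word != NULL);
    if (!word_is_ascii(word, len))
        return false;
    for (size_t i = 0; i < len; i++)
        word[i] = (char)tolower((unsigned char)word[i]);
    return true;
}
```

A caller would try lowercase_if_ascii() first and only hand the word to
the library's normalization and casefolding routines when it returns
false.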

libicu and glib/pango remain pretty similar on the ASCII-only files:
libicu seems faster for the smaller file, while glib/pango seems faster
for the bigger one.

For reference, I also added the test with the mixed
English/Chinese/Japanese file, whose results also change with the new
ASCII-only parsing improvement. Now libunistring seems about 20% faster
than libicu (it was around 10% yesterday).
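
One way to arrive at that 20% figure, assuming it means the share of
libicu's time that libunistring saves on the mixed file, is the small
calculation below. The function name relative_saving() is made up for
illustration:

```c
#include <assert.h>

/* Fraction of the slower time saved by the faster run: (slow - fast) / slow. */
double relative_saving(double slow, double fast)
{
    assert(slow > 0.0);
    return (slow - fast) / slow;
}
```

With the mixed-file numbers above, relative_saving(0.220, 0.175) is
0.045 / 0.220, or roughly 0.20.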

Cheers!

-- 
Aleksander

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
