Package: grep
Version: 2.5.1.ds1-5
Followup-For: Bug #181378

Hello,

I tried the gofast patch, and did not see any real improvement.

However, Fedora is now using a different patch, which dramatically
improves grep's performance in a UTF-8 environment.

Please find attached the following patches:
  * I put the original Fedora patches in the orig directory. The other
    patches are updated for the Debian package.
  * 64-egf-speedup.patch
    It does most of the work. Here is the explanation, according to:
    http://savannah.gnu.org/patch/?func=detailitem&item_id=3803
>    The full story behind this patch is that grep-2.5.1a does not handle
>    UTF-8 gracefully at all. The basic plan with handling UTF-8 in 2.5.1a
>    is:
>    * whenever a buffer is parsed, go through the entire buffer deciding
>      how many bytes make up each character
>    * use this information when necessary
>
>    This patch changes that to:
>    * when information about how many bytes make up a character is needed,
>      work it out on demand
>
>    On the face of it, this is a small obvious improvement. In fact it is
>    much better than that, because the original scheme would calculate
>    character lengths several times for each buffer: in fact, one full
>    pass for every single potential match!

  * 65-dfa-optional.patch
    I'm not sure this one is really needed.
    I've read that the DFA algorithm is slow for UTF-8, and this patch
    disables it in that case (it can be forced back on by setting an
    environment variable).
  * grep-2.5.1-tests.patch
    Fedora also added a test for UTF-8.
  * 66-match_icase.patch
  * 67-w.patch
    After running the new UTF-8 tests, these two seem to be needed.
    (They are not really related to grep's speed, but these patches may
    be interesting.)
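
To illustrate the explanation quoted above for 64-egf-speedup.patch, here is
a small Python model of the two schemes. It is only a sketch, not the actual
grep code: it assumes valid UTF-8 input, and the helper names (utf8_char_len,
length_table, find_matches) are made up for this example. It counts how many
bytes each scheme has to examine to decide whether a candidate match starts
on a character boundary.

```python
def utf8_char_len(lead):
    """Length in bytes of the UTF-8 character introduced by lead byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte: %#x" % lead)


def length_table(buf):
    """Eager scheme: one full pass over the buffer.

    table[i] is the character length if a character starts at offset i,
    and 0 if offset i is in the middle of a character.
    """
    table = [0] * len(buf)
    i = 0
    while i < len(buf):
        n = utf8_char_len(buf[i])
        table[i] = n
        i += n
    return table


def find_matches(buf, needle, on_demand):
    """Return (offsets of matches on a character boundary, bytes examined)."""
    hits, examined = [], 0
    pos = buf.find(needle)
    while pos != -1:
        if on_demand:
            # On demand: in UTF-8, a byte starts a character unless it is a
            # continuation byte (bit pattern 10xxxxxx).
            on_boundary = (buf[pos] & 0xC0) != 0x80
            examined += 1
        else:
            # Modelled worst case: the whole table is rebuilt for every
            # candidate match.
            on_boundary = length_table(buf)[pos] != 0
            examined += len(buf)
        if on_boundary:
            hits.append(pos)
        pos = buf.find(needle, pos + 1)
    return hits, examined
```

Both variants return the same matches; only the amount of work differs. With
the eager variant the cost grows with (number of candidates) x (buffer size),
which is the "one full pass for every single potential match" behaviour
described in the quoted text.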

I built a grep package with all these patches, and for the following
command:
    grep '^' /var/lib/dpkg/available > /dev/null
grep is more than 1500 times faster in a UTF-8 environment.
(On my machine, it takes less than 3/4 s instead of more than 10 minutes!)
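
For anyone who wants to reproduce this kind of comparison, here is a minimal
Python sketch that times a command under a given LC_ALL value. The function
names and the default UTF-8 locale name (en_US.UTF-8) are assumptions for the
example; use whatever UTF-8 locale is actually generated on your system.

```python
import os
import subprocess
import time


def time_command(argv, lc_all):
    """Run argv with LC_ALL set to lc_all, discard stdout, return seconds."""
    env = dict(os.environ, LC_ALL=lc_all)  # LC_ALL overrides LANG and LC_*
    start = time.perf_counter()
    subprocess.run(argv, env=env, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare_locales(path, utf8_locale="en_US.UTF-8"):
    """Time grep '^' on path in the C locale and in a UTF-8 locale."""
    argv = ["grep", "^", path]
    return {loc: time_command(argv, loc) for loc in ("C", utf8_locale)}
```

Calling compare_locales("/var/lib/dpkg/available") returns a dictionary that
maps each locale name to the elapsed time in seconds. A single run is noisy
(disk cache, system load), so it is worth repeating the measurement a few
times.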

Also, I did not notice any regression, and grep is not dramatically
slower in the C locale.

These patches may be important for Etch, since the transition to UTF-8 is
mentioned on the (unofficial) Etch TODO list:
http://wiki.debian.net/?EtchTODOList

(And the French team is considering using UTF-8 for the default French
locale)

Thanks in advance,
-- 
Nekral

Attachment: patches.tar.bz2
Description: Binary data
