Package: grep
Version: 2.5.1.ds1-5
Followup-For: Bug #181378

Hello,
I tried the gofast patch and did not see a real improvement. However, Fedora now uses a different patch, which dramatically improves grep performance in a UTF-8 environment.

Please find the following patches attached:

 * The original Fedora patches are in the orig directory. The other patches have been updated for the Debian package.

 * 64-egf-speedup.patch
   This one does most of the work. Here is the explanation, according to
   http://savannah.gnu.org/patch/?func=detailitem&item_id=3803:

   > The full story behind this patch is that grep-2.5.1a does not handle
   > UTF-8 gracefully at all. The basic plan with handling UTF-8 in 2.5.1a
   > is:
   > * whenever a buffer is parsed, go through the entire buffer deciding
   >   how many bytes make up each character
   > * use this information when necessary
   >
   > This patch changes that to:
   > * when information about how many bytes make up a character is needed,
   >   work it out on demand
   >
   > On the face of it, this is a small obvious improvement. In fact it is
   > much better than that, because the original scheme would calculate
   > character lengths several times for each buffer: in fact, one full
   > pass for every single potential match!

 * 65-dfa-optional.patch
   I'm not sure this one is really needed. I have read that the DFA algorithm is slow for UTF-8. This patch disables it in that case, and it can be forced back on by setting an environment variable.

 * grep-2.5.1-tests.patch
   Fedora also added a test for UTF-8.

 * 66-match_icase.patch
 * 67-w.patch
   After running the new UTF-8 tests, these two also seem to be needed. They are not really related to grep's speed, but they may be interesting.

I built a grep package with all these patches applied and ran the following command:

   grep '^' /var/lib/dpkg/available > /dev/null

With the patches, grep is more than 1500 times faster in a UTF-8 environment. On my machine, it takes less than 3/4 s instead of more than 10 minutes! I did not notice any regression, and grep is not dramatically slower in the C locale.
These patches may be important for Etch, since the transition to UTF-8 is mentioned on the (unofficial) Etch TODO list: http://wiki.debian.net/?EtchTODOList. The French team is also considering using UTF-8 for the default French locale.

Thanks in advance,
-- 
Nekral
Attachment: patches.tar.bz2 (binary data)