Public bug reported:
On a source tree with 28MB of .c and .h files (Mesa), grep is slow with -i and
fast without it under the default Ubuntu locale settings (LANG=en_US.UTF-8, no
LC_ variables set). In fact, even some [Vv]-style patterns are much faster
with LANG=C, so this looks even more like
https://bugs.launchpad.net/distros/ubuntu/+source/grep/+bug/47634
My box is a core 2 duo (2.4GHz), which makes a beast like gnome feel
almost as snappy as fluxbox :) Everything is in the disk cache, so I/O
isn't a factor. Neither is memory bandwidth. The machine was otherwise
idle. I'm running AMD64 Edgy.
/usr/local/src/g965/mesa$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
... (all the same)
(Times are from the second of two back-to-back runs, so the CPU core grep runs
on is at full clock speed the whole time.)
time find -name '*.[ch]' | xargs grep -i 'volatile_s3tc'
real 0m3.498s; user 0m3.483s; sys 0m0.023s
time find -name '*.[ch]' | xargs grep 'volatile.*s3tc'
real 0m0.076s; user 0m0.050s; sys 0m0.023s
With -i, non-UTF-8 locales are just as fast as the UTF-8 locale is without -i:
time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
real 0m0.083s; user 0m0.067s; sys 0m0.020s
time find -name '*.[ch]' | LANG=en_CA xargs grep -i 'volatile.*s3tc'
real 0m0.079s; user 0m0.050s; sys 0m0.027s
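Until this is fixed, a workaround for trees that are known to be plain ASCII is
to force the C locale for the search, as in the LANG=C timing above. A minimal
sketch, not a tested tool: cgrep_i is a made-up helper name, LC_ALL is used
instead of LANG so it still wins when LC_ variables are set, and -print0 / -0
are GNU/BSD extensions added here to cope with odd file names. In the C locale
-i only folds ASCII letters, so on non-ASCII input this is not equivalent to a
search in a UTF-8 locale.

```shell
# cgrep_i PATTERN [DIR]: hypothetical helper, not part of grep.
# Case-insensitive search of .c/.h files under DIR (default: .), with
# the C locale forced for xargs and the grep processes it spawns.
cgrep_i() {
    find "${2:-.}" -name '*.[ch]' -print0 |
        LC_ALL=C xargs -0 grep -i -- "$1"
}

# Example: cgrep_i 'volatile.*s3tc' /usr/local/src/g965/mesa
```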
Spelling out a case-insensitive pattern by hand takes more time, but is not
really slow. However, it probably doesn't match everything that grep -i would
on input that isn't all 7-bit ASCII:
time find -name '*.[ch]' | xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
real 0m0.340s; user 0m0.313s; sys 0m0.027s
The hand-written pattern is affected by locale settings, too:
time find -name '*.[ch]' | LANG=C xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
real 0m0.096s; user 0m0.080s; sys 0m0.027s
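For anyone who wants to repeat the bracketed-pattern timings with other search
terms, the expansion can be generated mechanically. A rough sketch, assuming
awk is available; ci_pattern is a made-up name. It only expands ASCII letters,
which is exactly why the result can miss matches that grep -i would find on
non-ASCII input, and it does not understand bracket expressions or backslash
escapes already present in the pattern.

```shell
# ci_pattern PATTERN: hypothetical helper that rewrites every ASCII
# letter c in PATTERN as the bracket expression [Cc]; all other
# characters are copied through unchanged.
ci_pattern() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /^[A-Za-z]$/)
                out = out "[" toupper(c) tolower(c) "]"
            else
                out = out c
        }
        print out
    }'
}

# Example: find -name '*.[ch]' | xargs grep "$(ci_pattern 'volatile.*s3tc')"
```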
** Affects: grep (Ubuntu)
Importance: Undecided
Status: Unconfirmed
--
huge performance hit for -i with UTF-8 locales
https://launchpad.net/bugs/75695
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs