Public bug reported:

On a source tree with 28MB of .c and .h files (Mesa), grep is slow with -i and 
fast without it under the default Ubuntu locale settings (LANG=en_US.UTF-8, no 
LC_ variables set).  Even some [Vv]-style patterns are much faster with 
LANG=C, so this looks closely related to 
https://bugs.launchpad.net/distros/ubuntu/+source/grep/+bug/47634

My box is a Core 2 Duo (2.4GHz), which makes a beast like GNOME feel
almost as snappy as fluxbox :)  Everything is in the disk cache, so I/O
isn't a factor.  Neither is memory bandwidth.  The machine was otherwise
idle.  I'm running AMD64 Edgy.

[EMAIL PROTECTED]:/usr/local/src/g965/mesa$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
... (all the same)

(Times are from the second of two consecutive runs, so the CPU core grep runs 
on is at full clock speed the whole time.)
time find -name '*.[ch]' | xargs grep -i 'volatile_s3tc'
 real    0m3.498s; user    0m3.483s; sys     0m0.023s

time find -name '*.[ch]' | xargs grep  'volatile.*s3tc'
 real    0m0.076s; user    0m0.050s; sys     0m0.023s


In non-UTF-8 locales, -i is just as fast as the search without -i:
time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
 real    0m0.083s; user    0m0.067s; sys     0m0.020s

time find -name '*.[ch]' | LANG=en_CA xargs grep -i 'volatile.*s3tc'
 real    0m0.079s; user    0m0.050s; sys     0m0.027s
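
A rough sketch for repeating the locale comparison above in one go. The
function name time_grep_i is hypothetical, and the sketch assumes GNU date
(for the %N format). It sets LC_ALL rather than LANG, which selects the same
locale when no other LC_ variables are set, and it replaces xargs with
find -exec, so absolute numbers will not match the ones in this report.

```shell
# Hypothetical helper: time "grep -i" over the *.c and *.h files under a
# directory, once per locale name given on the command line.
# Assumes GNU date (%N gives nanoseconds); locale names are up to the caller.
time_grep_i() {
    pattern=$1; dir=$2; shift 2
    for loc in "$@"; do
        # Two passes per locale: the first warms the cache and CPU clock,
        # only the second one's start/end times are reported.
        for pass in warmup timed; do
            start=$(date +%s.%N)
            env LC_ALL="$loc" find "$dir" -name '*.[ch]' \
                -exec grep -i -e "$pattern" {} + >/dev/null 2>&1 || :
            end=$(date +%s.%N)
        done
        awk -v l="$loc" -v s="$start" -v e="$end" \
            'BEGIN { printf "%-14s %.3f s\n", l, e - s }'
    done
}
```

Called as: time_grep_i 'volatile.*s3tc' . en_US.UTF-8 C en_CA
it prints one line per locale with the elapsed wall-clock time of the second
pass.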


Spelling out case insensitivity by hand with bracket expressions takes more 
time than the plain pattern, but is not really slow.  However, it probably 
wouldn't match everything that grep -i would on input that isn't all 7-bit 
ASCII:
time find -name '*.[ch]' | xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
 real    0m0.340s; user    0m0.313s; sys     0m0.027s

The bracketed pattern is affected by locale settings, too:
time find -name '*.[ch]' | LANG=C xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
 real    0m0.096s; user    0m0.080s; sys     0m0.027s
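
The bracketed pattern above can be generated mechanically instead of typed
by hand. A minimal sketch, with bracket_case as a hypothetical helper name:
it expands only ASCII letters, so like the hand-written pattern it is not a
full substitute for -i on non-ASCII input, and it does not try to handle
letters that already sit inside a bracket expression or after a backslash.

```shell
# Hypothetical helper: turn each ASCII letter of a pattern into a two-letter
# bracket expression, e.g. "v" becomes "[Vv]". Everything else is copied as is.
bracket_case() {
    # LC_ALL=C keeps awk working on single bytes, so only A-Z and a-z match.
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/)
                out = out "[" toupper(c) tolower(c) "]"
            else
                out = out c
        }
        print out
    }'
}
```

For example, bracket_case 'volatile.*s3tc' prints
[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]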

** Affects: grep (Ubuntu)
     Importance: Undecided
         Status: Unconfirmed

-- 
huge performance hit for -i with UTF-8 locales
https://launchpad.net/bugs/75695

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
