RE: grep is horriby slow in UTF-8 locales

2004-06-09 Thread Markus Kuhn
are interested in getting this situation fixed. Brad From: Chen, Brad [EMAIL PROTECTED] Sent: Saturday, June 05, 2004 4:04 PM To: '[EMAIL PROTECTED]' Subject: RE: grep is horriby slow in UTF-8 locales From the proposed patch: - if (MB_CUR_MAX > 1 && mb_properties[beg - buf] == 0

Re: grep is horriby slow in UTF-8 locales

2003-11-16 Thread Bruno Haible
Markus Kuhn wrote: b) relying entirely on ISO C's generic multi-byte functions, to make sure that even stateful monsters like the ISO 2022 encodings are supported equally. Use of mbrlen is not done because of ISO 2022 encodings (which are not usable as locale encodings!), but

Re: grep is horriby slow in UTF-8 locales

2003-11-10 Thread jmaiorana
I recall that we had about two years ago heated discussions here on whether UTF-8 support should be implemented by a) hardwired mechanisms fully optimized to make good use of UTF-8's neat properties b) relying entirely on ISO C's generic multi-byte functions, to make sure that even
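As a rough illustration of what option a) can rely on (this is not code from grep or from any patch in this thread): in UTF-8, lead bytes and continuation bytes occupy disjoint ranges, so a plain byte-wise search for a fixed string cannot match in the middle of a character, provided both the pattern and the text are valid UTF-8. The names find_fixed_utf8 and utf8_is_continuation are made up for this sketch, and the explicit boundary check only guards against a pattern that itself starts with a stray continuation byte.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: byte-wise search for pat (m bytes) in buf (n bytes).
   Returns the byte offset of the first match that starts on a character
   boundary, or (size_t)-1 if there is none.  Assumes valid UTF-8 input. */
size_t find_fixed_utf8(const char *buf, size_t n, const char *pat, size_t m)
{
    size_t i;

    if (m == 0 || m > n)
        return (size_t)-1;

    for (i = 0; i + m <= n; i++) {
        if (buf[i] == pat[0] && memcmp(buf + i, pat, m) == 0) {
            /* Reject a hit that begins inside a multi-byte sequence. */
            if (!utf8_is_continuation((unsigned char)buf[i]))
                return i;
        }
    }
    return (size_t)-1;
}
```

No per-character decoding is needed here, which is the kind of shortcut the generic ISO C multi-byte interface in option b) cannot offer.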

Re: grep is horriby slow in UTF-8 locales

2003-11-09 Thread Markus Kuhn
Mika Fischer wrote on 2003-11-08 21:47 UTC: So it seems the slowdown occurs in the function mbrlen from libc. The real problem is of course that this function is called once for every character of the input because grep makes a map of the input file containing the number of bytes of each
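To make the cost described above concrete, here is a minimal sketch, not grep's actual code, of building such a per-byte map in two ways: once by calling mbrlen for every character, and once by classifying the UTF-8 lead byte directly. The names build_map_mbrlen, build_map_utf8 and utf8_seq_len are invented for this example. The second variant assumes the input is already valid UTF-8, and the first depends on the current locale having been set with setlocale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: sequence length judged from the lead byte alone.
   Returns 1 for ASCII and for bytes that cannot start a sequence. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 1;
}

/* map[i] = byte length of the character starting at offset i,
   0 for the remaining bytes of a multi-byte character. */
void build_map_utf8(const char *buf, size_t n, unsigned char *map)
{
    size_t i = 0;

    memset(map, 0, n);
    while (i < n) {
        size_t len = utf8_seq_len((unsigned char)buf[i]);
        if (len > n - i)
            len = 1;            /* sequence cut off by end of buffer */
        map[i] = (unsigned char)len;
        i += len;
    }
}

/* Same map, but one mbrlen call per character (locale dependent). */
void build_map_mbrlen(const char *buf, size_t n, unsigned char *map)
{
    mbstate_t st;
    size_t i = 0;

    memset(&st, 0, sizeof st);
    memset(map, 0, n);
    while (i < n) {
        size_t len = mbrlen(buf + i, n - i, &st);
        if (len == (size_t)-1 || len == (size_t)-2 || len == 0) {
            len = 1;            /* invalid, incomplete, or NUL: one byte */
            memset(&st, 0, sizeof st);
        }
        map[i] = (unsigned char)len;
        i += len;
    }
}
```

Both functions walk the buffer once; the difference is that the mbrlen variant pays for a library call and state handling on every character, including plain ASCII.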

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Glenn Maynard
On Fri, Nov 07, 2003 at 12:52:44PM +, Markus Kuhn wrote: $ grep --version grep (GNU grep) 2.5.1 $ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt Command exited with non-zero status 1 6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Danilo Segan
Markus Kuhn [EMAIL PROTECTED] writes: $ grep --version grep (GNU grep) 2.5.1 This doesn't happen with: $ grep --version grep (GNU grep) 2.4.2 $ LC_ALL=POSIX time grep XYZ test.txt Command exited with non-zero status 1 0.03user 0.07system 0:00.36elapsed 27%CPU (0avgtext+0avgdata

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Markus Kuhn
Rob Park wrote on 2003-11-08 00:49 UTC: grep is slower on my system, but it doesn't appear to be as bad as on your system. Your results show that grep in UTF-8 mode is equally 100x slower than in single-byte mode, just like on my system (300 MHz P3). You just have used a faster CPU. Markus

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Rob Park
Markus Kuhn wrote: Your results show that grep in UTF-8 mode is equally 100x slower than in single-byte mode, just like on my system (300 MHz P3). You just have used a faster CPU. D'oh :)

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Danilo Segan
Hi Markus, Markus Kuhn [EMAIL PROTECTED] writes: Rob Park wrote on 2003-11-08 00:49 UTC: grep is slower on my system, but it doesn't appear to be as bad as on your system. Your results show that grep in UTF-8 mode is equally 100x slower than in single-byte mode, just like on my system (300

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Glenn Maynard
On Fri, Nov 07, 2003 at 04:49:58PM +0100, Danilo Segan wrote: This doesn't happen with: $ grep --version grep (GNU grep) 2.4.2 This was probably before full multibyte support was added to grep; the issue here specifically only happens in multibyte encodings. (My grep is slow in en_US.UTF-8,

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Mika Fischer
Hi! * Markus Kuhn [EMAIL PROTECTED] [2003-11-07 16:33]: It seems grep performs about 100x worse in a UTF-8 locale than in an ASCII locale, even where the search string contains no regex metacharacters. Same here on Debian with grep 2.5.1 and libc 2.3.2. There is technically no reason why

Re: grep is horriby slow in UTF-8 locales

2003-11-08 Thread Danilo Segan
[EMAIL PROTECTED] (Danilo Segan) writes: $ LC_ALL=en_GB.UTF-8 time grep2.5 XYZ test.txt Command exited with non-zero status 1 0.05user 0.07system 0:00.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (140major+45minor)pagefaults 0swaps Whoops, this above is total crap. I

grep is horriby slow in UTF-8 locales

2003-11-07 Thread Markus Kuhn
On Red Hat 9: $ grep --version grep (GNU grep) 2.5.1 $ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt Command exited with non-zero status 1 6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (157major+34minor)pagefaults 0swaps $ LC_ALL=POSIX time grep XYZ

Re: grep is horriby slow in UTF-8 locales

2003-11-07 Thread Rob Park
Markus Kuhn wrote: On Red Hat 9: $ grep --version grep (GNU grep) 2.5.1 $ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt Command exited with non-zero status 1 6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (157major+34minor)pagefaults 0swaps $ LC_ALL=POSIX