are interested in getting
this situation fixed.
Brad
From: Chen, Brad [EMAIL PROTECTED]
Sent: Saturday, June 05, 2004 4:04 PM
To: '[EMAIL PROTECTED]'
Subject: RE: grep is horribly slow in UTF-8 locales
From the proposed patch:
- if (MB_CUR_MAX > 1 && mb_properties[beg - buf] == 0
Markus Kuhn wrote:
b) relying entirely on ISO C's generic multi-byte functions, to make
sure that even stateful monsters like the ISO 2022 encodings
are supported equally.
Use of mbrlen is not done because of ISO 2022 encodings (which are not
usable as locale encodings!), but
I recall that we had about two years ago heated discussions here on
whether UTF-8 support should be implemented by
a) hardwired mechanisms fully optimized to make good use of UTF-8's
neat properties
b) relying entirely on ISO C's generic multi-byte functions, to make
sure that even
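To make option a) concrete, here is a minimal sketch of what "hardwired" UTF-8 handling can look like. The helper names `utf8_seq_len` and `utf8_count_chars` are hypothetical and not taken from grep or from any proposed patch; the sketch only relies on the fact that a UTF-8 lead byte determines the sequence length and that continuation bytes always have the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence judged from its
   lead byte alone. Returns 0 for a byte that cannot start a character. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                       /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;                       /* 110xxxxx */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;                       /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;                       /* 11110xxx */
    return 0;                           /* continuation or invalid lead byte */
}

/* Hypothetical helper: count characters without any libc call. Every byte
   that is not a continuation byte (10xxxxxx) is counted as a character start. */
static size_t utf8_count_chars(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

Neither function keeps conversion state or calls into the locale machinery, which is the property option a) wants to exploit. The sketch does not validate that the continuation bytes following a lead byte are well formed.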
Mika Fischer wrote on 2003-11-08 21:47 UTC:
So it seems the slowdown occurs in the function mbrlen from libc.
The real problem is of course that this function is called once for
every character of the input because grep makes a map of the input
file containing the number of bytes of each
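The per-character cost described above can be sketched roughly as follows. This is an illustration only, not grep's actual code: `build_char_map` is a hypothetical name, and the layout of the map (character length at the first byte, 0 at continuation bytes) is an assumption. Only the use of ISO C's `mbrlen` is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: fill map[i] with the byte length of the character
   that starts at buf[i], and 0 for bytes inside a character. Invalid or
   incomplete sequences are treated as single bytes (an assumption). */
static void build_char_map(const char *buf, size_t len, unsigned char *map)
{
    mbstate_t st;
    size_t i = 0;

    assert(buf != NULL && map != NULL);
    memset(&st, 0, sizeof st);
    memset(map, 0, len);

    while (i < len) {
        /* One libc call per input character: this is the hot spot. */
        size_t n = mbrlen(buf + i, len - i, &st);

        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            memset(&st, 0, sizeof st);  /* reset state after an error or NUL */
            n = 1;
        }
        map[i] = (unsigned char)n;
        i += n;
    }
}
```

The loop runs over the whole buffer whether or not it contains a match, so the `mbrlen` cost is paid for every character of input in a multibyte locale.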
On Fri, Nov 07, 2003 at 12:52:44PM +, Markus Kuhn wrote:
$ grep --version
grep (GNU grep) 2.5.1
$ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt
Command exited with non-zero status 1
6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs
Markus Kuhn [EMAIL PROTECTED] writes:
$ grep --version
grep (GNU grep) 2.5.1
This doesn't happen with:
$ grep --version
grep (GNU grep) 2.4.2
$ LC_ALL=POSIX time grep XYZ test.txt
Command exited with non-zero status 1
0.03user 0.07system 0:00.36elapsed 27%CPU (0avgtext+0avgdata
Rob Park wrote on 2003-11-08 00:49 UTC:
grep is slower on my system, but it doesn't appear to be as bad as on
your system.
Your results show that grep in UTF-8 mode is equally 100x slower than in
single-byte mode, just like on my system (300 MHz P3). You just have
used a faster CPU.
Markus
Markus Kuhn wrote:
Your results show that grep in UTF-8 mode is equally 100x slower than in
single-byte mode, just like on my system (300 MHz P3). You just have
used a faster CPU.
D'oh :)
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
Hi Markus,
Markus Kuhn [EMAIL PROTECTED] writes:
Rob Park wrote on 2003-11-08 00:49 UTC:
grep is slower on my system, but it doesn't appear to be as bad as on
your system.
Your results show that grep in UTF-8 mode is equally 100x slower than in
single-byte mode, just like on my system (300
On Fri, Nov 07, 2003 at 04:49:58PM +0100, Danilo Segan wrote:
This doesn't happen with:
$ grep --version
grep (GNU grep) 2.4.2
This was probably before full multibyte support was added to grep; the
issue here specifically only happens in multibyte encodings. (My grep
is slow in en_US.UTF-8,
Hi!
* Markus Kuhn [EMAIL PROTECTED] [2003-11-07 16:33]:
It seems grep performs about 100x worse in a UTF-8 locale than in an
ASCII locale, even where the search string contains no regex
metacharacters.
Same here on Debian with grep 2.5.1 and libc 2.3.2.
There is technically no reason, why
[EMAIL PROTECTED] (Danilo Segan) writes:
$ LC_ALL=en_GB.UTF-8 time grep2.5 XYZ test.txt
Command exited with non-zero status 1
0.05user 0.07system 0:00.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (140major+45minor)pagefaults 0swaps
Whoops, this above is total crap. I
On Red Hat 9:
$ grep --version
grep (GNU grep) 2.5.1
$ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt
Command exited with non-zero status 1
6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (157major+34minor)pagefaults 0swaps
$ LC_ALL=POSIX time grep XYZ
Markus Kuhn wrote:
On Red Hat 9:
$ grep --version
grep (GNU grep) 2.5.1
$ LC_ALL=en_GB.UTF-8 time grep XYZ test.txt
Command exited with non-zero status 1
6.83user 0.07system 0:06.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (157major+34minor)pagefaults 0swaps
$ LC_ALL=POSIX