Re: horrible utf-8 performace in wc

2008-05-08 Thread Jim Meyering
Bruno Haible <[EMAIL PROTECTED]> wrote: > 2008-05-08 Bruno Haible <[EMAIL PROTECTED]> > > Speed up "wc -m" and "wc -w" in multibyte case. > * src/wc.c: Include mbchar.h. > (wc): New variable in_shift. Use it to avoid calling mbrtowc for most > ASCII characters. Thanks! I'
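
The idea in the ChangeLog entry above (track whether the conversion state is "in shift" and skip mbrtowc for plain ASCII bytes when it is not) can be sketched roughly as follows. This is an illustration only, not the code from src/wc.c: the function name count_chars is hypothetical, the `c < 0x80` test assumes an ASCII-compatible encoding such as UTF-8 (the patch itself refers to mbchar.h instead), and skipping invalid bytes without counting them is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in buf[0..len) under the
   current LC_CTYPE locale.  Assumes an ASCII-compatible encoding.  */
static size_t
count_chars (const char *buf, size_t len)
{
  mbstate_t state;
  size_t chars = 0;
  bool in_shift = false;   /* true while STATE may be non-initial */

  assert (buf != NULL);
  memset (&state, 0, sizeof state);

  while (len > 0)
    {
      unsigned char c = (unsigned char) *buf;

      if (!in_shift && c < 0x80)
        {
          /* Fast path: a plain ASCII byte in the initial state is
             one character; no call to mbrtowc is needed.  */
          buf++;
          len--;
          chars++;
          continue;
        }

      /* Slow path: let mbrtowc decode one multibyte character.  */
      in_shift = true;
      size_t n = mbrtowc (NULL, buf, len, &state);

      if (n == (size_t) -2)
        break;             /* incomplete sequence at end of buffer */

      if (n == (size_t) -1)
        {
          /* Invalid byte: reset the state, skip it, do not count it
             (an assumption made for this sketch).  */
          memset (&state, 0, sizeof state);
          in_shift = false;
          buf++;
          len--;
          continue;
        }

      if (n == 0)
        n = 1;             /* an embedded NUL occupies one byte */

      buf += n;
      len -= n;
      chars++;

      if (mbsinit (&state))
        in_shift = false;  /* back in the initial state */
    }

  return chars;
}
```

A caller would normally run setlocale (LC_ALL, "") first so that mbrtowc decodes according to the user's locale; for input that is mostly ASCII, nearly every byte then takes the fast path.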

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bruno Haible
> $ time ./wc -m long_lines.txt > 13357046 long_lines.txt > real 0m1.860s It processes at the speed of 7 million characters per second. I would not call this a "horrible performance". > However wc calls mbrtowc() for each multibyte character. Yes. One could use mbstowcs (or mbsnrtowcs, but th
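
As a rough illustration of the bulk-conversion alternative mentioned above, the sketch below counts the characters of a NUL-terminated string with a single library call instead of one mbrtowc call per character. It uses mbsrtowcs, the restartable standard C relative of mbstowcs, with a null destination so that only the length is computed. The function name count_chars_bulk is hypothetical, and whether this is actually faster than the per-character loop is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the number of characters in the
   NUL-terminated multibyte string S under the current LC_CTYPE
   locale, or (size_t) -1 if S contains an invalid sequence.  */
static size_t
count_chars_bulk (const char *s)
{
  mbstate_t state;
  const char *p = s;

  assert (s != NULL);
  memset (&state, 0, sizeof state);

  /* With a null destination, mbsrtowcs stores nothing and returns
     the number of wide characters the whole string converts to.  */
  return mbsrtowcs (NULL, &p, 0, &state);
}
```

Unlike a per-character loop, this form stops at the first NUL byte and reports an error for the whole string on the first invalid sequence, so a tool that has to process arbitrary file contents would need extra handling around it.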

Re: horrible utf-8 performace in wc

2008-05-08 Thread Pádraig Brady
Bruno Haible wrote: > As a consequence: > - The number of characters is the same as the number of wide characters. > - "wc -m" must output the number of characters. > - In a Unicode locale, is one character, and is > two characters, Fair enough. > If you want wc to count characters af

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bo Borgerson
Bruno Haible wrote: > If you want wc to count characters after canonicalization, then you can > invent a new wc command-line option for it. But I would find it more useful > to have a filter program that reads from standard input and writes the > canonicalized output to standard output; that would

Re: BugReport about "ln" command worked in NTFS

2008-05-08 Thread Philip Rowlands
[ re-adding bug-coreutils@gnu.org ] On Thu, 8 May 2008, [EMAIL PROTECTED] wrote: The complete log about running "ln" is in the attachment. The strace -c output you posted shows 1 successful call to link(2), as I'd expect. It then shows further expected output from stat(1) that the link coun

Re: locales for testing

2008-05-08 Thread Bruno Haible
Jim Meyering wrote: > you'll need to include the new test only if there is > sufficient multi-byte support and if you can find a suitable locale to > test with. gnulib has a few autoconf macros to determine suitable locales: gt_LOCALE_FR_UTF8 - French locale with UTF-8 encoding

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bruno Haible
> @@ -368,6 +370,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus) > linepos += width; > if (iswspace (wide_char)) > goto mb_word_separator; > + else if (uc_combining_class (wide_ch

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bruno Haible
> Is there a good library for combining-character canonicalization > available? That seems like something that would be useful to have in a > lot of text-processing tools. Also, for Unicode, something to shuffle > between the normalization forms might be helpful for comparisons. Such functionali

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bruno Haible
Pádraig Brady wrote: > mbstowcs doesn't canonicalize equivalent multibyte sequences, > and so therefore functions the same in this regard as our > processing of each wide character separately. > This could be considered a bug actually- i.e. should -m give > the number of wide chars, or the number o

Problem under Linux

2008-05-08 Thread nel natou
Hello. I installed Ubuntu on my PC in dual boot with Windows and I have problems with the screen or the refresh rate. What is strange is that before it only affected Linux; now it happens even under Windows, and sometimes it lasts a very long time. I don't know what to do. I

Re: feature request: error codes for 'rm'

2008-05-08 Thread Danny Rawlins
Danny Rawlins wrote: Hi, I'm quite surprised 'rm' does not return an error code for no such file. I would like to see at least error code 1 so I can use it in a shell script; additional error codes might also be nice. Regards, Danny Rawlins http://crux.nu/Public/DannyRawlins Damn it sorry I ma

feature request: error codes for 'rm'

2008-05-08 Thread Danny Rawlins
Hi, I'm quite surprised 'rm' does not return an error code for no such file. I would like to see at least error code 1 so I can use it in a shell script; additional error codes might also be nice. Regards, Danny Rawlins http://crux.nu/Public/DannyRawlins

Re: horrible utf-8 performace in wc

2008-05-08 Thread Bo Borgerson
Pádraig Brady wrote: > Bo Borgerson wrote: >> I poked around a little in gnulib and found a function for determining >> the combining class of a Unicode character. >> >> I think the attached patch does what you were intending to do, and it >> also counts all of the stand-alone zero-width characters