Re: bug in join: case comparisons don't work in multibyte locales

2009-03-16 Thread Simon Josefsson
Bruno Haible writes: > James Youngman wrote: >> My first reaction was, why isn't libunistring===glibc > > glibc means to implement POSIX and be the interface to the system calls. > The general guideline nowadays among glibc maintainers is "no new API" > (unless it's a new system call). IIRC, when

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-12 Thread Paolo Bonzini
> OK, I'll work on the creation of a GNU project called 'libunistring', that > will export the functions from gnulib as a shared library. That's simply great to hear. Paolo

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-12 Thread Pádraig Brady
Bruno Haible wrote:
> Pádraig Brady wrote:
>> Note as well as folding case I think it might
>> be useful to fold other forms like:
>> Enclosed: \u24b6 -> A
>> Stylistic: \uff21-> A
>
> These two transformations are already executed when you use ulc_casecmp
> with the UNINORM_NFKD argument.
A

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-12 Thread Bruno Haible
Pádraig Brady wrote:
> Note as well as folding case I think it might
> be useful to fold other forms like:
> Enclosed: \u24b6 -> A
> Stylistic: \uff21-> A
These two transformations are already executed when you use ulc_casecmp with the UNINORM_NFKD argument.
> Diacritics: À -> A
Very goo
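The compatibility foldings mentioned above can be checked without libunistring. The following is only a rough sketch using Python's standard `unicodedata` module; it is not the `ulc_casecmp` API, and the helper names `fold_for_comparison` and `strip_marks` are made up for illustration. It shows that NFKD plus case folding maps the enclosed and fullwidth forms onto a plain letter, while a letter with a diacritic keeps its combining mark unless that mark is removed in a separate step.

```python
import unicodedata


def fold_for_comparison(text: str) -> str:
    """Hypothetical helper: NFKD-normalize, case-fold, then NFKD again."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Case folding can produce new decomposable characters, so normalize again.
    return unicodedata.normalize("NFKD", decomposed.casefold())


def strip_marks(text: str) -> str:
    """Hypothetical extra step: drop combining marks left by decomposition."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


enclosed = fold_for_comparison("\u24b6")   # CIRCLED LATIN CAPITAL LETTER A
fullwidth = fold_for_comparison("\uff21")  # FULLWIDTH LATIN CAPITAL LETTER A
accented = fold_for_comparison("\u00c0")   # LATIN CAPITAL LETTER A WITH GRAVE

print(repr(enclosed), repr(fullwidth), repr(accented), repr(strip_marks(accented)))
```

Under these assumptions the first two inputs compare equal to a folded plain "A", whereas the accented letter only does so after the extra mark-stripping step.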

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-12 Thread Bruno Haible
James Youngman wrote: > My first reaction was, why isn't libunistring===glibc glibc means to implement POSIX and be the interface to the system calls. The general guideline nowadays among glibc maintainers is "no new API" (unless it's a new system call). IIRC, when libidn was added to glibc as an

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-12 Thread James Youngman
On Wed, Mar 11, 2009 at 11:57 AM, Bruno Haible wrote: > OK, I'll work on the creation of a GNU project called 'libunistring', that > will export the functions from gnulib as a shared library. My first reaction was, why isn't libunistring===glibc, but then we'd end up in a situation where gnulib w

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-11 Thread Ben Pfaff
Bruno Haible writes:
>                   | on NUL terminated | on memory areas or
>                   | strings           | strings with embedded NULs
> ------------------+-------------------+---------------------------
> For ASCII strings | c_strcasecmp,     |
> only

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-11 Thread Bruno Haible
Hi Jim and Pádraig,
> > 1) Which functions to use for case comparison in coreutils?
pb> I think if we're going to do it we should do it right.
pb> I.E. use ulc_casecmp
jm> I prefer the "correct" approach, especially since I believe that will
jm> eventually align with POSIX, even if it doesn't ma

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-11 Thread Jim Meyering
Bruno Haible wrote: > In coreutils/src/join.c, there is a FIXME mentioning that the -i option for > case insensitive comparison of the input lines does not work in multibyte > locales. And indeed, in an UTF-8 locale, I see this: ... > Find attached a draft patch for the 'join' program, that fixes t

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-10 Thread Pádraig Brady
Pádraig Brady wrote: > Bruno Haible wrote: >> Hi Jim, > > Thanks for looking at this Bruno. > >> In coreutils/src/join.c, there is a FIXME mentioning that the -i option for >> case insensitive comparison of the input lines does not work in multibyte >> locales. > > Utils that have this issue are

Re: bug in join: case comparisons don't work in multibyte locales

2009-03-10 Thread Pádraig Brady
Bruno Haible wrote: > Hi Jim, Thanks for looking at this Bruno. > In coreutils/src/join.c, there is a FIXME mentioning that the -i option for > case insensitive comparison of the input lines does not work in multibyte > locales. Utils that have this issue are: join -i, uniq -i, sort -f, ptx -f

bug in join: case comparisons don't work in multibyte locales

2009-03-10 Thread Bruno Haible
Hi Jim, In coreutils/src/join.c, there is a FIXME mentioning that the -i option for case insensitive comparison of the input lines does not work in multibyte locales. And indeed, in an UTF-8 locale, I see this: $ cat > in1 < in2 < in1 < in2 <--- coreutils-7.1/src/join.c.bak 2008-11-10 14:17:52.
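To illustrate the kind of failure reported here, the sketch below assumes a comparison that folds case one byte at a time and only knows about ASCII letters. That is an assumption made for illustration, not a quotation of what join.c does, and all function names are hypothetical. In UTF-8 a non-ASCII letter occupies several bytes, so such a fold never maps "É" onto "é", whereas a fold applied to decoded characters does.

```python
def ascii_fold_bytes(data: bytes) -> bytes:
    """Hypothetical byte-wise fold: lowercase ASCII A-Z, leave other bytes alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def bytewise_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings after the ASCII-only, per-byte fold."""
    return ascii_fold_bytes(a.encode("utf-8")) == ascii_fold_bytes(b.encode("utf-8"))


def characterwise_equal(a: str, b: str) -> bool:
    """Compare after Unicode case folding of the decoded characters."""
    return a.casefold() == b.casefold()


# ASCII input: both approaches agree.
print(bytewise_equal("ABC", "abc"), characterwise_equal("ABC", "abc"))

# U+00C9 is the bytes C3 89 in UTF-8 and U+00E9 is C3 A9; neither byte is an
# ASCII letter, so the byte-wise fold leaves the two encodings different.
print(bytewise_equal("\u00c9", "\u00e9"), characterwise_equal("\u00c9", "\u00e9"))
```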