Re: Pattern replacement fails if string contains multibyte characters

2007-09-28 Thread Andreas Schwab
Bernd Eggink [EMAIL PROTECTED] writes:

> This happens on a utf-8 based system (CRUX 2.3), LANG=de_DE.UTF-8:
>
> t=123abc456äöüABCD
> echo ${t//[a-c]/}
> # output: 123456öüCD

Which is correct.  [a-c] matches every character between a and c
(inclusive) in the collating sequence defined by the locale.  For your
locale that includes characters like ä and A.  You should avoid the use
of ranges when not using the C locale.
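
A minimal sketch of two ways to sidestep the range, reusing the string
from the report.  The variable names explicit and c_locale are made up
for this illustration; nothing here is taken from the original report
beyond the value of t.

```bash
#!/usr/bin/env bash
t='123abc456äöüABCD'

# Option 1: list the characters explicitly, so no collating-sequence
# range is involved at all.
explicit=${t//[abc]/}

# Option 2: switch to the C locale for the substitution only.  The
# assignment happens inside a command substitution (a subshell), so the
# caller's locale is left untouched.
c_locale=$(LC_ALL=C; printf '%s' "${t//[a-c]/}")

printf '%s\n' "$explicit" "$c_locale"
```

Both lines should print 123456äöüABCD: only a, b and c are removed.
Newer bash releases (4.3 and later) also have a globasciiranges shell
option that makes ranges behave as in the C locale; check the manual of
the installed version before relying on it.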

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.




Re: Pattern replacement fails if string contains multibyte characters

2007-09-28 Thread Chet Ramey
I wrote:

> The difference is in the gnu libc implementation of strcoll(), which bash
> uses to compare characters for range matching.  The glibc implementation
> ignores the locale; the other systems incorporate the current locale's
> collating sequence into their strcoll implementation.

Sorry, that's backwards.  On systems where strcoll() honors the current
locale's collating sequence, you'll get the output you see on Linux.
Systems that either don't have locale support or don't reflect the
locale's collating sequence in strcoll() will produce the output you
expect.
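
One rough way to see which collating sequence a system applies is to
sort a few letters under different locales.  This is only a sketch: it
uses sort(1), which orders lines according to LC_COLLATE, as a stand-in
for the strcoll() comparison bash performs, and it assumes a de_DE.UTF-8
locale may or may not be installed.  The variable names are invented for
the example.

```bash
#!/usr/bin/env bash
# C locale: plain byte order, so all uppercase letters sort before
# all lowercase ones.
c_order=$(printf '%s\n' a B c A b C | LC_ALL=C sort | tr -d '\n')
printf 'C locale: %s\n' "$c_order"

# Locale-aware order, only if de_DE.UTF-8 is available (assumption).
# The result depends on the system's locale data, so none is claimed here.
if locale -a 2>/dev/null | grep -qiE '^de_DE\.utf-?8$'; then
    de_order=$(printf '%s\n' a B c A b C | LC_ALL=de_DE.UTF-8 sort | tr -d '\n')
    printf 'de_DE:    %s\n' "$de_order"
fi
```

In the C locale this prints ABCabc.  If the second line interleaves
upper and lower case, the system's collation is locale-aware, and a
range such as [a-c] can pick up uppercase or accented letters as
described above.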

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
   Live Strong.  No day but today.
Chet Ramey, ITS, CWRU    [EMAIL PROTECTED]    http://cnswww.cns.cwru.edu/~chet/