Re: filename pattern case-insensitive, but why?

2009-09-23 Thread Mike Stroyan
On Tue, Sep 22, 2009 at 02:36:30AM -0700, thahn01 wrote:
 
 Hello, If I try something like:
 
 $ touch a.c b.c A.c
 $ ls [a-z]*.c
 a.c  A.c  b.c
 
 then I get A.c in the output, even though the pattern contains no capital letters.

  The [a-z] range expression matches characters between 'a' and 'z' in the
current locale's collation order.  The collation order for en_US.UTF-8 and
other locales interleaves uppercase and lowercase alphabetic characters.
So in those locales your range includes 'a' through 'z' and 'A' through
'Y'.  You can change the locale to C or POSIX to get plain ASCII
collation order.  You can see the collation order using the sort command.

for c in {32..126}; do eval printf '"%c - %d\n"' $(printf "$'\\%o'" $c) $c; done |
  sort -k 1.1,1.1

for c in {32..126}; do eval printf '"%c - %d\n"' $(printf "$'\\%o'" $c) $c; done |
  LANG=C sort -k 1.1,1.1
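
Note that it is the shell, not ls, that expands [a-z]*.c, so the locale
has to change in the shell doing the globbing.  A minimal sketch, assuming
bash and a throwaway directory from mktemp (the directory and the variable
name c_matches are only for illustration):

```shell
# Scratch directory holding the three files from the original question.
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.c" "$dir/A.c"

# The command substitution runs in a subshell, so the locale change does
# not leak into the session.  LC_ALL is assigned before the glob expands.
c_matches=$(cd "$dir" && LC_ALL=C && echo [a-z]*.c)
echo "$c_matches"    # a.c b.c

rm -r "$dir"
```

Writing "LC_ALL=C ls [a-z]*.c" would probably not help, since the
assignment only reaches the environment of ls, after the pattern has
already been expanded.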

The collation order lists 'a' before 'A', but case only breaks a tie
between strings that are otherwise equal; a later character that differs
takes precedence.  Sort will arrange 'a1', 'A1', 'a2', and 'A2' in that
order, because the '1' vs. '2' difference is decided before the case
difference.
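
A small sketch of that comparison, assuming a POSIX sort and paste; the
variable name c_order is only for illustration.  Only the C locale result
is fixed; what the second command prints depends on the locale in effect.

```shell
# In the C locale sort compares bytes, so uppercase sorts before lowercase.
c_order=$(printf '%s\n' a1 A1 a2 A2 | LC_ALL=C sort | paste -s -d ' ' -)
echo "$c_order"    # A1 A2 a1 a2

# Same input under the ambient locale.  With en_US.UTF-8 the order described
# above (a1, A1, a2, A2) is what to look for; other locales may differ.
printf '%s\n' a1 A1 a2 A2 | sort
```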

-- 
Mike Stroyan m...@stroyan.net




Re: filename pattern case-insensitive, but why?

2009-09-23 Thread Richard Leeden

Mike Stroyan wrote:

On Tue, Sep 22, 2009 at 02:36:30AM -0700, thahn01 wrote:

Hello, If I try something like:

$ touch a.c b.c A.c
$ ls [a-z]*.c
a.c  A.c  b.c

then I get A.c in the output, even though the pattern contains no capital letters.


  The [a-z] range expression matches characters between 'a' and 'z' in the
current locale's collation order.  The collation order for en_US.UTF-8 and
other locales interleaves uppercase and lowercase alphabetic characters.
So in those locales your range includes 'a' through 'z' and 'A' through
'Y'.  You can change the locale to C or POSIX to get plain ASCII
collation order.  You can see the collation order using the sort command.

for c in {32..126}; do eval printf '"%c - %d\n"' $(printf "$'\\%o'" $c) $c; done |
  sort -k 1.1,1.1

for c in {32..126}; do eval printf '"%c - %d\n"' $(printf "$'\\%o'" $c) $c; done |
  LANG=C sort -k 1.1,1.1

The collation order lists 'a' before 'A', but case only breaks a tie
between strings that are otherwise equal; a later character that differs
takes precedence.  Sort will arrange 'a1', 'A1', 'a2', and 'A2' in that
order, because the '1' vs. '2' difference is decided before the case
difference.



...and that is why, instead of using

 $ ls [a-z]*.c

you should use

 $ ls [[:lower:]]*.c
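
A quick check of the character-class form, as a sketch that assumes bash
and a scratch directory from mktemp (the directory and the variable name
lower_matches are only for illustration):

```shell
# Recreate the three files from the original question.
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.c" "$dir/A.c"

# [[:lower:]] tests character class membership rather than a collation
# range, so A.c is left out without touching the locale.
lower_matches=$(cd "$dir" && echo [[:lower:]]*.c)
echo "$lower_matches"    # a.c b.c

rm -r "$dir"
```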