Re: can [[:digit:]] match something other than 0123456789?

2018-05-25 Thread Steffen Nurpmeso
Garrett Wollman wrote: |< said: |> Also, my feeling is that [[:digit:]] should match just the digits |> that are actually relevant for that locale, e.g., just "western" |> digits for en_GB. And fractions and superscripts are not digits. | |Implementations often use the same character defini

RE: can [[:digit:]] match something other than 0123456789?

2018-05-24 Thread Garrett Wollman
< said: > Also, my feeling is that [[:digit:]] should match just the digits > that are actually relevant for that locale, e.g., just "western" > digits for en_GB. And fractions and superscripts are not digits. Implementations often use the same character definitions for all locales using the sam

Re: can [[:digit:]] match something other than 0123456789?

2018-05-24 Thread Joerg Schilling
Stephane CHAZELAS wrote: > Is that a POSIX invention (the [a-z] based on collation) by the > way, or does it come from implementations that already existed > at the time? Around 1993, all major UNIX platforms used the same code that was derived from IBM. Maybe this is the background... Jörg

RE: can [[:digit:]] match something other than 0123456789?

2018-05-24 Thread Schwarz, Konrad
> -Original Message- > From: Stephane Chazelas [mailto:stephane.chaze...@gmail.com] > Sent: Sunday, May 20, 2018 10:43 PM > To: Geoff Clare > Cc: austin-group-l@opengroup.org > Subject: Re: can [[:digit:]] match something other than 0123456789? > > Note that

Re: can [[:digit:]] match something other than 0123456789?

2018-05-23 Thread Stephane CHAZELAS
2018-05-23 22:44:46 +0100, Stephane CHAZELAS: [...] > [a-z] is not guaranteed to match on lower case letters only let > alone abcdefghijklmnopqrstuvwxyz only, it may even match on > characters outside the latin script. [...] Actually, I suspect that POSIX requires ranges in the POSIX locale to be

Re: can [[:digit:]] match something other than 0123456789?

2018-05-23 Thread Stephane CHAZELAS
2018-05-22 13:49:20 +0100, Stephane CHAZELAS: [...] > In the case of the fnmatch and regexp of most systems, I don't > know how they make it so that [0-9] only matches on 0123456789 or > [a-z] not on uppercase letters. Possibly, that's with special > cases as well. [...] Sorry, my bad. It looks like

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread Chet Ramey
On 5/22/18 6:32 AM, Joerg Schilling wrote: >> bash's [a-z] still matches on A..Y or B..Z though (source of >> much confusion, many bugs and lots of ranting), and that >> makes me realise that bash is actually one of those utilities > > This strange and unexpected behavior did cause once that bash

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread Stephane CHAZELAS
2018-05-22 12:32:20 +0200, Joerg Schilling: [...] > > bash's [a-z] still matches on A..Y or B..Z though (source of > > much confusion, many bugs and lots of ranting), and that > > makes me realise that bash is actually one of those utilities > > This strange and unexpected behavior did cause once

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread keld
I listed digits that were consecutive. I did not list Japanese or Chinese digits. But it would be easy to also include Japanese and Chinese digits. You could just include character classes like zero, one, two etc. Best regards Keld On Tue, May 22, 2018 at 02:15:16PM +0200, Joerg Schilling wro

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread Joerg Schilling
"k...@keldix.com" wrote: > I already cited text from 14652 and 30112. That would be fine. I mentioned already that japanese/chinese numbers are not consecutive. > On Tue, May 22, 2018 at 11:45:26AM +0200, Joerg Schilling wrote: > > "k...@keldix.com" wrote: > > > > > Well, if ctype.h does not

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread keld
I already cited text from 14652 and 30112. That would be fine. best regards keld On Tue, May 22, 2018 at 11:45:26AM +0200, Joerg Schilling wrote: > "k...@keldix.com" wrote: > > > Well, if ctype.h does not cover the functionality that we want, then we > > need to > > specify new functionality.

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread Joerg Schilling
Stephane Chazelas wrote: > Note that having [x-y] be based on collation order would mean > that things like [a-z] would also match on uppercase letters in > the latin script in locales where case is not considered in the > first weight for sorting (as is typical for English locales for > instance
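The effect described here can be illustrated with a small simulation. This is only a sketch: the collation order `a A b B ... z Z` is an assumed, simplified stand-in for a locale where case is not part of the first sorting weight, and `in_collation_range` is a hypothetical helper, not how any real regex or fnmatch implementation works.

```python
import string

# Assumed toy collation order: a A b B c C ... z Z.
# Real locale collation data is far more involved than this.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

def in_collation_range(ch, start, end, order=COLLATION):
    """Return True if ch sorts between start and end (inclusive) in order."""
    pos = order.find(ch)
    if pos < 0:
        return False
    return order.index(start) <= pos <= order.index(end)

# Which uppercase letters would a collation-based [a-z] pick up?
matched_upper = [c for c in string.ascii_uppercase if in_collation_range(c, "a", "z")]
print("".join(matched_upper))
```

Under this assumed order the range from `a` to `z` covers every uppercase letter except `Z`, which sorts after `z`. That is consistent with the "A..Y" behaviour mentioned elsewhere in the thread.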

Re: can [[:digit:]] match something other than 0123456789?

2018-05-22 Thread Joerg Schilling
"k...@keldix.com" wrote: > Well, if ctype.h does not cover the functionality that we want, then we need > to > specify new functionality. WG14 is looking into some reentrant functionality > in this area, in something that could be a TS. Could you please explain what functionallity you like to

Re: can [[:digit:]] match something other than 0123456789?

2018-05-20 Thread Stephane Chazelas
2018-05-16 09:42:56 +0100, Geoff Clare: > Stephane Chazelas wrote, on 15 May 2018: > > > > OK, so to rephrase and make sure I understand correctly. In > > locales other than C, [[:digit:]] will be guaranteed to match on > > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be > > in that

Re: can [[:digit:]] match something other than 0123456789?

2018-05-18 Thread k...@keldix.com
On Fri, May 18, 2018 at 01:35:03PM -0500, Eric Blake wrote: > On 05/18/2018 12:24 PM, Wheeler, David A wrote: > >This conversation seems strange; many locales use digits other than 0-9 to > >represent numbers. > > > >The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have > >digits, t

Re: can [[:digit:]] match something other than 0123456789?

2018-05-18 Thread Eric Blake
On 05/18/2018 12:24 PM, Wheeler, David A wrote: This conversation seems strange; many locales use digits other than 0-9 to represent numbers. The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have digits, they just aren't 0-9. In Unicode/ISO 10646 in particular there are the digits

RE: can [[:digit:]] match something other than 0123456789?

2018-05-18 Thread Wheeler, David A
This conversation seems strange; many locales use digits other than 0-9 to represent numbers. The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have digits, they just aren't 0-9. In Unicode/ISO 10646 in particular there are the digits U+0660 through U+0669 and U+06F0 through U+06F9.
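As a rough illustration of the two code point ranges named above, the following sketch uses Python's `unicodedata` and `re` modules. It is an analogy only: Python's `\d` is not POSIX `[[:digit:]]`, and nothing here reflects what any particular POSIX locale does.

```python
import re
import unicodedata

arabic_indic = [chr(cp) for cp in range(0x0660, 0x066A)]  # U+0660..U+0669
extended = [chr(cp) for cp in range(0x06F0, 0x06FA)]      # U+06F0..U+06F9

for block in (arabic_indic, extended):
    # The Unicode database assigns each of these a decimal digit value.
    print([unicodedata.decimal(ch) for ch in block])

sample = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three

ascii_range = re.fullmatch(r"[0-9]+", sample)           # explicit ASCII range
unicode_digits = re.fullmatch(r"\d+", sample)           # Unicode decimal digits
ascii_only = re.fullmatch(r"\d+", sample, re.ASCII)     # \d limited to ASCII

print(ascii_range is not None, unicode_digits is not None, ascii_only is not None)
```

In Python the explicit range `[0-9]` and the class `\d` differ for text patterns unless `re.ASCII` is given, which mirrors the distinction the thread draws between a range expression and a character class.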

Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread keld
On Thu, May 17, 2018 at 12:36:35PM +0200, Hans Åberg wrote: > > > On 17 May 2018, at 11:02, Joerg Schilling > > wrote: > > > > Hans Åberg wrote: > > > |I asked a person who speaks japanese and he told me that > | > | "\u4e00\u4e8c\u4e09" > | > |is similar to > |

Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread Hans Åberg
> On 17 May 2018, at 11:02, Joerg Schilling > wrote: > > Hans Åberg wrote: > |I asked a person who speaks japanese and he told me that | | "\u4e00\u4e8c\u4e09" | |is similar to | | "one two three" | |and this is not used for computing. I

Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread keld
On Thu, May 17, 2018 at 11:02:48AM +0200, Joerg Schilling wrote: > Hans Åberg wrote: > > > >> |I asked a person who speaks japanese and he told me that > > >> | > > >> | "\u4e00\u4e8c\u4e09" > > >> | > > >> |is similar to > > >> | > > >> | "one two three" > > >> | > > >> |and this is not used for

Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread Joerg Schilling
Hans Åberg wrote: > >> |I asked a person who speaks japanese and he told me that > >> | > >> | "\u4e00\u4e8c\u4e09" > >> | > >> |is similar to > >> | > >> | "one two three" > >> | > >> |and this is not used for computing. > >> > >> If i recall correctly this has been discussed already; if not he

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Hans Åberg
> On 16 May 2018, at 18:13, Hans Åberg wrote: > > >> On 16 May 2018, at 17:14, Steffen Nurpmeso wrote: >> >> Joerg Schilling wrote: >> |Steffen Nurpmeso wrote: >> |>|> have some Unicode support. >> |>| >> |>|What do you expect: >> |>| >> |>| strtol("\u4e00\u4e8c\u4e09", &endp, 0); >> |> >

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Hans Åberg
> On 16 May 2018, at 17:14, Steffen Nurpmeso wrote: > > Joerg Schilling wrote: > |Steffen Nurpmeso wrote: > |>|> have some Unicode support. > |>| > |>|What do you expect: > |>| > |>| strtol("\u4e00\u4e8c\u4e09", &endp, 0); > |> > |> The entire is*() family cannot work with multibyte or statef

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Steffen Nurpmeso
Joerg Schilling wrote: |Steffen Nurpmeso wrote: |>|> have some Unicode support. |>| |>|What do you expect: |>| |>| strtol("\u4e00\u4e8c\u4e09", &endp, 0); |> |> The entire is*() family cannot work with multibyte or stateful |> encodings, right. | |I asked a person who speaks japanese

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Joerg Schilling
Steffen Nurpmeso wrote: > |> have some Unicode support. > | > |What do you expect: > | > | strtol("\u4e00\u4e8c\u4e09", &endp, 0); > > The entire is*() family cannot work with multibyte or stateful > encodings, right. I asked a person who speaks Japanese and he told me that "\u4e0
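The question of what a number parser should do with this string can be explored outside C. The sketch below uses Python's `unicodedata` and `int()` as a stand-in; it says nothing about what `strtol()` does or should do, and Python's rules are an assumption chosen for illustration.

```python
import unicodedata

kanji = "\u4e00\u4e8c\u4e09"  # the string passed to strtol() in the thread

for ch in kanji:
    # A numeric value is recorded for each character, yet none of them
    # is classified as a decimal digit.
    print(ch.isdigit(), ch.isdecimal(), unicodedata.numeric(ch))

try:
    int(kanji)
    parsed = True
except ValueError:
    # int() accepts only characters with a decimal digit value.
    parsed = False

print(parsed)
```

The characters carry the values one, two and three, but they are number words rather than positional digits, which matches the "one two three" reading reported in the message.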

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Steffen Nurpmeso
Joerg Schilling wrote: |Hans Åberg wrote: |>> On 16 May 2018, at 10:29, Joerg Schilling > er.de> wrote: |>> |>> Robert Elz wrote: |>> |>>> How does one specify a locale for some area using Latin as its |>>> language, where I V X L C D M are the digits ? |>> |>> how do you like to spe

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Hans Åberg
> On 16 May 2018, at 10:53, Joerg Schilling > wrote: > > Hans Åberg wrote: > >> >>> On 16 May 2018, at 10:29, Joerg Schilling >>> wrote: >>> >>> Robert Elz wrote: >>> How does one specify a locale for some area using Latin as its language, where I V X L C D M are the digits ?

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread keld
On Wed, May 16, 2018 at 10:41:15AM +0200, Joerg Schilling wrote: > Robert Elz wrote: > > > would be easy, but you say it also has to look for > > > > (c) [[:latindigs:]]+ > > (c) [[:vdigits:]]+ > > > > (and how many more)? This is actually kind of important, as > > > > (c) MMXVI > >

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Shware Systems
For conforming charsets XBD 6 requires the range <0>-<9> to be contiguous. By XBD 9.3.5, Rule 6, [:digit:] may include MBS elements aside from the <0> to <9> in LC_CTYPE, but the range [0-9] depends on whether additional characters have the same collation weight as digits. If this is the case th

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Joerg Schilling
Hans Åberg wrote: > > > On 16 May 2018, at 10:29, Joerg Schilling > > wrote: > > > > Robert Elz wrote: > > > >> How does one specify a locale for some area using Latin as its > >> language, where I V X L C D M are the digits ? > > > > how do you like to specify a hexadecimal number in this

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Hans Åberg
> On 16 May 2018, at 10:29, Joerg Schilling > wrote: > > Robert Elz wrote: > >> How does one specify a locale for some area using Latin as its >> language, where I V X L C D M are the digits ? > > how do you like to specify a hexadecimal number in this locale? They have no need for that in

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Joerg Schilling
Geoff Clare wrote: > Stephane Chazelas wrote, on 15 May 2018: > > > > OK, so to rephrase and make sure I understand correctly. In > > locales other than C, [[:digit:]] will be guaranteed to match on > > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be > > in that order but [0-9] is

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Geoff Clare
Stephane Chazelas wrote, on 15 May 2018: > > OK, so to rephrase and make sure I understand correctly. In > locales other than C, [[:digit:]] will be guaranteed to match on > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be > in that order but [0-9] is unspecified anyway outside of th

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Joerg Schilling
Robert Elz wrote: > would be easy, but you say it also has to look for > > (c) [[:latindigs:]]+ > (c) [[:vdigits:]]+ > > (and how many more)? This is actually kind of important, as > > (c) MMXVI > > type strings are not uncommon in certain environments (can't recall > ever seei

Re: can [[:digit:]] match something other than 0123456789?

2018-05-16 Thread Joerg Schilling
Robert Elz wrote: > How does one specify a locale for some area using Latin as its > language, where I V X L C D M are the digits ? How would you specify a hexadecimal number in this locale? Jörg

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Shware Systems
No, in the C locale and locale definitions where the charmap includes definitions of <0>-<9> [:digit:] will match on [0-9]. In locales other than C it may not match what another locale uses for [0-9], if their charmap assignment is different, and may match more character assignments. This is one

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Shware Systems
Yes, it nominally is unworkable as static rosters so isn't considered portable enough to standardize, that I see. K&R originally just wanted to support decimal and octal in C, iirc, and octal only because DEC did PDP core dumps that way. While Unicode provides some support for rosters of arbitra

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Robert Elz
Date:Tue, 15 May 2018 18:42:29 -0400 From:Shware Systems Message-ID: <16365f81e7e-179a-29...@webjas-vab019.srv.aolmail.net> | That locale would define a latindigs charclass, same as Venusians are | required to define a vdigits for theirs, and it's up to the appl

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Shware Systems
That locale would define a latindigs charclass, same as Venusians are required to define a vdigits for theirs, and it's up to the application to do the equivalences to 1, 5, 10, 50, etc. in a latinstr2ull() routine. In a message dated 5/15/2018 6:31:31 PM Eastern Standard Time, k...@munnari.oz
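A conversion routine of the kind suggested here might look roughly like the sketch below. The name `latinstr_to_int` is hypothetical (the message only names a `latinstr2ull()` idea), and the usual subtractive notation for Roman numerals is assumed; the sketch does not check that the input is a well-formed numeral.

```python
# Values for the seven "digits" I V X L C D M named in the thread.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def latinstr_to_int(text):
    """Convert a Roman numeral string to an int (subtractive notation).

    Hypothetical helper for illustration; raises KeyError on characters
    that are not Roman numeral letters and does no further validation.
    """
    total = 0
    prev = 0
    # Scan right to left: a symbol smaller than the one to its right
    # is subtracted, otherwise it is added.
    for ch in reversed(text.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total

print(latinstr_to_int("MMXVI"))
```

The point of the sketch is that the mapping from class members to values lives in the application, not in the locale's character class, which only says which characters belong to the set.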

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Steffen Nurpmeso
Stephane Chazelas wrote: |2018-05-15 16:55:45 -0500, Eric Blake: |> On 05/15/2018 03:43 PM, Stephane Chazelas wrote: |>>Does that mean that [0-9] is also guaranteed to match on |>>0123456789 only? And that then [[:digit:]] in regexp/fnmatch is |>>close to useless as it's longer than [0-9] |>

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Robert Elz
Date:Tue, 15 May 2018 13:38:15 -0500 From:Eric Blake Message-ID: <08af8b99-dcf0-5775-3aed-533611cec...@redhat.com> | Please read http://austingroupbugs.net/view.php?id=1078 where this | wording has been tightened to cover ALL locales, not just the POSIX | loca

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Stephane Chazelas
2018-05-15 16:55:45 -0500, Eric Blake: > On 05/15/2018 03:43 PM, Stephane Chazelas wrote: > > > >Does that mean that [0-9] is also guaranteed to match on > >0123456789 only? And that then [[:digit:]] in regexp/fnmatch is > >close to useless as it's longer than [0-9] > > Yes, I think that's a fair

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Eric Blake
On 05/15/2018 03:43 PM, Stephane Chazelas wrote: Does that mean that [0-9] is also guaranteed to match on 0123456789 only? And that then [[:digit:]] in regexp/fnmatch is close to useless as it's longer than [0-9] Yes, I think that's a fair conclusion for the C locale, by virtue of the fact th

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Shware Systems
For that hypothetical Venusian locale, as discussed for 1078, it would be expected to define a VDIGIT (sic) custom LC_CTYPE charclass for specifying other character names representing digits, and then using [[:digit:][:VDIGIT:]] to test for both. Code like this couldn't be considered strictly co

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Stephane Chazelas
2018-05-15 13:38:15 -0500, Eric Blake: > On 05/15/2018 12:50 PM, Stephane Chazelas wrote: [...] > >> digit > >> Define the characters to be classified as numeric digits. > >> > >> In the POSIX locale, only: > >> > >>0 1 2 3 4 5 6 7 8 9 > > Please read http://austingroupbugs.n

Re: can [[:digit:]] match something other than 0123456789?

2018-05-15 Thread Eric Blake
On 05/15/2018 12:50 PM, Stephane Chazelas wrote: You're a bit late to the party on this question :) digit Define the characters to be classified as numeric digits. In the POSIX locale, only: 0 1 2 3 4 5 6 7 8 9 Please read http://austingroupbugs.net/view.php?id=1078