Garrett Wollman wrote:
|< said:
|> Also, my feeling is that [[:digit:]] should match just the digits
|> that are actually relevant for that locale, e.g., just "western"
|> digits for en_GB. And fractions and superscripts are not digits.
|
|Implementations often use the same character defini
< said:
> Also, my feeling is that [[:digit:]] should match just the digits
> that are actually relevant for that locale, e.g., just "western"
> digits for en_GB. And fractions and superscripts are not digits.
Implementations often use the same character definitions for all
locales using the sam
Stephane CHAZELAS wrote:
> Is that a POSIX invention (the [a-z] based on collation) by the
> way, or does it come from implementations that already existed
> at the time?
Around 1993, all major UNIX platforms used the same code that was derived from
IBM.
Maybe this is the background...
Jörg
> -Original Message-
> From: Stephane Chazelas [mailto:stephane.chaze...@gmail.com]
> Sent: Sunday, May 20, 2018 10:43 PM
> To: Geoff Clare
> Cc: austin-group-l@opengroup.org
> Subject: Re: can [[:digit:]] match something other than 0123456789?
>
> Note that
2018-05-23 22:44:46 +0100, Stephane CHAZELAS:
[...]
> [a-z] is not guaranteed to match on lower case letters only let
> alone abcdefghijklmnopqrstuvwxyz only, it may even match on
> characters outside the latin script.
[...]
Actually, I suspect that POSIX requires ranges in the POSIX
locale to be
2018-05-22 13:49:20 +0100, Stephane CHAZELAS:
[...]
> In the case of the fnmatch and regexp of most systems, I don't
> know how they make so that [0-9] only matches on 0123456789 or
> [a-z] not on uppercase letters. Possibly, that's with special
> cases as well.
[...]
Sorry, my bad. It looks like
On 5/22/18 6:32 AM, Joerg Schilling wrote:
>> bash's [a-z] still matches on A..Y or B..Z though (source of
>> much confusion, many bugs and lots of ranting), and that
>> makes me realise that bash is actually one of those utilities
>
> This strange and unexpected behavior did cause once that bash
2018-05-22 12:32:20 +0200, Joerg Schilling:
[...]
> > bash's [a-z] still matches on A..Y or B..Z though (source of
> > much confusion, many bugs and lots of ranting), and that
> > makes me realise that bash is actually one of those utilities
>
> This strange and unexpected behavior did cause once
I listed digits that were consecutive. I did not list Japanese or Chinese
digits.
But it would be easy to also include Japanese and Chinese digits.
You could just include character classes like zero, one, two, etc.
Best regards
Keld
On Tue, May 22, 2018 at 02:15:16PM +0200, Joerg Schilling wro
"k...@keldix.com" wrote:
> I already cited text from 14652 and 30112. That would be fine.
I mentioned already that Japanese/Chinese numbers are not consecutive.
> On Tue, May 22, 2018 at 11:45:26AM +0200, Joerg Schilling wrote:
> > "k...@keldix.com" wrote:
> >
> > > Well, if ctype.h does not
I already cited text from 14652 and 30112. That would be fine.
best regards
keld
On Tue, May 22, 2018 at 11:45:26AM +0200, Joerg Schilling wrote:
> "k...@keldix.com" wrote:
>
> > Well, if ctype.h does not cover the functionality that we want, then we
> > need to
> > specify new functionality.
Stephane Chazelas wrote:
> Note that having [x-y] be based on collation order would mean
> that things like [a-z] would also match on uppercase letters in
> the latin script in locales where case is not considered in the
> first weight for sorting (as is typical for English locales for
> instance
"k...@keldix.com" wrote:
> Well, if ctype.h does not cover the functionality that we want, then we need
> to
> specify new functionality. WG14 is looking into some reentrant functionality
> in this area, in something that could be a TS.
Could you please explain what functionality you like to
2018-05-16 09:42:56 +0100, Geoff Clare:
> Stephane Chazelas wrote, on 15 May 2018:
> >
> > OK, so to rephrase and make sure I understand correctly. In
> > locales other than C, [[:digit:]] will be guaranteed to match on
> > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be
> > in that
On Fri, May 18, 2018 at 01:35:03PM -0500, Eric Blake wrote:
> On 05/18/2018 12:24 PM, Wheeler, David A wrote:
> >This conversation seems strange; many locales use digits other than 0-9 to
> >represent numbers.
> >
> >The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have
> >digits, t
On 05/18/2018 12:24 PM, Wheeler, David A wrote:
This conversation seems strange; many locales use digits other than 0-9 to
represent numbers.
The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have digits,
they just aren't 0-9. In Unicode/ISO 10646 in particular there are the digits
This conversation seems strange; many locales use digits other than 0-9 to
represent numbers.
The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have digits,
they just aren't 0-9. In Unicode/ISO 10646 in particular there are the digits
U+0660 through U+0669 and U+06F0 through U+06F9.
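For illustration, the two code point ranges named above can be mapped to
decimal values with a small helper. This is only a sketch: the function name
decimal_digit_value() is hypothetical, it works on already decoded code
points rather than on bytes, and it deliberately ignores every other digit
block that Unicode defines.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of any standard API): returns the decimal
 * value 0..9 of a code point if it lies in one of three digit runs, or -1.
 *   U+0030..U+0039  ASCII digits
 *   U+0660..U+0669  first range mentioned in the message
 *   U+06F0..U+06F9  second range mentioned in the message
 */
int decimal_digit_value(uint32_t cp)
{
    if (cp >= 0x0030 && cp <= 0x0039)
        return (int)(cp - 0x0030);
    if (cp >= 0x0660 && cp <= 0x0669)
        return (int)(cp - 0x0660);
    if (cp >= 0x06F0 && cp <= 0x06F9)
        return (int)(cp - 0x06F0);
    return -1;
}
```

Because each run is contiguous, the value falls out of a single subtraction,
which is the same property XBD 6 requires of <0> through <9>.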
On Thu, May 17, 2018 at 12:36:35PM +0200, Hans Åberg wrote:
>
> > On 17 May 2018, at 11:02, Joerg Schilling
> > wrote:
> >
> > Hans Åberg wrote:
> >
> |I asked a person who speaks japanese and he told me that
> |
> | "\u4e00\u4e8c\u4e09"
> |
> |is similar to
> |
> On 17 May 2018, at 11:02, Joerg Schilling
> wrote:
>
> Hans Åberg wrote:
>
|I asked a person who speaks japanese and he told me that
|
| "\u4e00\u4e8c\u4e09"
|
|is similar to
|
| "one two three"
|
|and this is not used for computing.
I
On Thu, May 17, 2018 at 11:02:48AM +0200, Joerg Schilling wrote:
> Hans Åberg wrote:
>
> > >> |I asked a person who speaks japanese and he told me that
> > >> |
> > >> | "\u4e00\u4e8c\u4e09"
> > >> |
> > >> |is similar to
> > >> |
> > >> | "one two three"
> > >> |
> > >> |and this is not used for
Hans Åberg wrote:
> >> |I asked a person who speaks japanese and he told me that
> >> |
> >> | "\u4e00\u4e8c\u4e09"
> >> |
> >> |is similar to
> >> |
> >> | "one two three"
> >> |
> >> |and this is not used for computing.
> >>
> >> If i recall correctly this has been discussed already; if not he
> On 16 May 2018, at 18:13, Hans Åberg wrote:
>
>
>> On 16 May 2018, at 17:14, Steffen Nurpmeso wrote:
>>
>> Joerg Schilling wrote:
>> |Steffen Nurpmeso wrote:
>> |>|> have some Unicode support.
>> |>|
>> |>|What do you expect:
>> |>|
>> |>| strtol("\u4e00\u4e8c\u4e09", &endp, 0);
>> |>
>
> On 16 May 2018, at 17:14, Steffen Nurpmeso wrote:
>
> Joerg Schilling wrote:
> |Steffen Nurpmeso wrote:
> |>|> have some Unicode support.
> |>|
> |>|What do you expect:
> |>|
> |>| strtol("\u4e00\u4e8c\u4e09", &endp, 0);
> |>
> |> The entire is*() family cannot work with multibyte or statef
Joerg Schilling wrote:
|Steffen Nurpmeso wrote:
|>|> have some Unicode support.
|>|
|>|What do you expect:
|>|
|>| strtol("\u4e00\u4e8c\u4e09", &endp, 0);
|>
|> The entire is*() family cannot work with multibyte or stateful
|> encodings, right.
|
|I asked a person who speaks japanese
Steffen Nurpmeso wrote:
> |> have some Unicode support.
> |
> |What do you expect:
> |
> | strtol("\u4e00\u4e8c\u4e09", &endp, 0);
>
> The entire is*() family cannot work with multibyte or stateful
> encodings, right.
I asked a person who speaks Japanese and he told me that
"\u4e0
Joerg Schilling wrote:
|Hans Åberg wrote:
|>> On 16 May 2018, at 10:29, Joerg Schilling wrote:
|>>
|>> Robert Elz wrote:
|>>
|>>> How does one specify a locale for some area using Latin as its
|>>> language, where I V X L C D M are the digits ?
|>>
|>> how do you like to spe
> On 16 May 2018, at 10:53, Joerg Schilling
> wrote:
>
> Hans Åberg wrote:
>
>>
>>> On 16 May 2018, at 10:29, Joerg Schilling
>>> wrote:
>>>
>>> Robert Elz wrote:
>>>
How does one specify a locale for some area using Latin as its
language, where I V X L C D M are the digits ?
On Wed, May 16, 2018 at 10:41:15AM +0200, Joerg Schilling wrote:
> Robert Elz wrote:
>
> > would be easy, but you say it also has to look for
> >
> > (c) [[:latindigs:]]+
> > (c) [[:vdigits:]]+
> >
> > (and how many more)? This is actually kind of important, as
> >
> > (c) MMXVI
> >
For conforming charsets XBD 6 requires the range <0>-<9> to be contiguous. By
XBD 9.3.5, Rule 6, [:digit:] may include MBS elements aside from the <0> to <9>
in LC_CTYPE, but the range [0-9] depends on whether additional characters have
the same collation weight as digits. If this is the case th
Hans Åberg wrote:
>
> > On 16 May 2018, at 10:29, Joerg Schilling
> > wrote:
> >
> > Robert Elz wrote:
> >
> >> How does one specify a locale for some area using Latin as its
> >> language, where I V X L C D M are the digits ?
> >
> > how do you like to specify a hexadecimal number in this
> On 16 May 2018, at 10:29, Joerg Schilling
> wrote:
>
> Robert Elz wrote:
>
>> How does one specify a locale for some area using Latin as its
>> language, where I V X L C D M are the digits ?
>
> how do you like to specify a hexadecimal number in this locale?
They have no need for that in
Geoff Clare wrote:
> Stephane Chazelas wrote, on 15 May 2018:
> >
> > OK, so to rephrase and make sure I understand correctly. In
> > locales other than C, [[:digit:]] will be guaranteed to match on
> > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be
> > in that order but [0-9] is
Stephane Chazelas wrote, on 15 May 2018:
>
> OK, so to rephrase and make sure I understand correctly. In
> locales other than C, [[:digit:]] will be guaranteed to match on
> 0123456789 only but not [0-9]. 0123456789 are guaranteed to be
> in that order but [0-9] is unspecified anyway outside of th
Robert Elz wrote:
> would be easy, but you say it also has to look for
>
> (c) [[:latindigs:]]+
> (c) [[:vdigits:]]+
>
> (and how many more)? This is actually kind of important, as
>
> (c) MMXVI
>
> type strings are not uncommon in certain environments (can't recall
> ever seei
Robert Elz wrote:
> How does one specify a locale for some area using Latin as its
> language, where I V X L C D M are the digits ?
how do you like to specify a hexadecimal number in this locale?
Jörg
--
EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg
No, in the C locale and locale definitions where the charmap includes
definitions of <0>-<9> [:digit:] will match on [0-9]. In locales other than C
it may not match what another locale uses for [0-9], if their charmap
assignment is different, and may match more character assignments. This is one
Yes, it nominally is unworkable as static rosters so isn't considered portable
enough to standardize, that I see. K&R originally just wanted to support
decimal and octal in C, iirc, and octal only because DEC did PDP core dumps
that way. While Unicode provides some support for rosters of arbitra
Date: Tue, 15 May 2018 18:42:29 -0400
From: Shware Systems
Message-ID: <16365f81e7e-179a-29...@webjas-vab019.srv.aolmail.net>
| That locale would define a latindigs charclass, same as Venusians are required
| to define a vdigits for theirs, and it's up to the appl
That locale would define a latindigs charclass, same as Venusians are required
to define a vdigits for theirs, and it's up to the application to do the
equivalences to 1, 5, 10, 50, etc. in a latinstr2ull() routine.
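A rough sketch of what such a latinstr2ull() routine might look like follows.
Only the name comes from the message above; the signature (modelled loosely on
strtoull()), the helper latin_digit_value(), and the handling of subtractive
pairs such as IV and CM are assumptions made for illustration. It does no
validation of well-formedness and no overflow checking.

```c
#include <stddef.h>

/* Hypothetical helper: value of one Latin "digit", or 0 if c is not one. */
static unsigned long long latin_digit_value(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;   /* also covers the terminating '\0' */
    }
}

/*
 * Assumed signature: converts the leading run of I V X L C D M in s and,
 * if endp is not NULL, stores a pointer to the first unconsumed character.
 */
unsigned long long latinstr2ull(const char *s, char **endp)
{
    unsigned long long total = 0;
    const char *p = s;

    while (latin_digit_value(*p) != 0) {
        unsigned long long cur = latin_digit_value(*p);
        unsigned long long next = latin_digit_value(p[1]);

        if (next > cur) {          /* subtractive pair, e.g. IV or CM */
            total += next - cur;
            p += 2;
        } else {
            total += cur;
            p += 1;
        }
    }
    if (endp != NULL)
        *endp = (char *)p;
    return total;
}
```

The point of the sketch is only that the equivalences live in application
code: the locale's charclass says which characters count, and the routine
decides what they are worth.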
In a message dated 5/15/2018 6:31:31 PM Eastern Standard Time,
k...@munnari.oz
Stephane Chazelas wrote:
|2018-05-15 16:55:45 -0500, Eric Blake:
|> On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
|>>Does that mean that [0-9] is also guaranteed to match on
|>>0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
|>>close to useless as it's longer than [0-9]
|>
Date: Tue, 15 May 2018 13:38:15 -0500
From: Eric Blake
Message-ID: <08af8b99-dcf0-5775-3aed-533611cec...@redhat.com>
| Please read http://austingroupbugs.net/view.php?id=1078 where this
| wording has been tightened to cover ALL locales, not just the POSIX
| loca
2018-05-15 16:55:45 -0500, Eric Blake:
> On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
> >
> >Does that mean that [0-9] is also guaranteed to match on
> >0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
> >close to useless as it's longer than [0-9]
>
> Yes, I think that's a fair
On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
Does that mean that [0-9] is also guaranteed to match on
0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
close to useless as it's longer than [0-9]
Yes, I think that's a fair conclusion for the C locale, by virtue of the
fact th
For that hypothetical Venusian locale, as discussed for 1078, it would be
expected to define a VDIGIT (sic) custom LC_CTYPE charclass for specifying
other character names representing digits, and then using [[:digit:][:VDIGIT:]]
to test for both. Code like this couldn't be considered strictly co
2018-05-15 13:38:15 -0500, Eric Blake:
> On 05/15/2018 12:50 PM, Stephane Chazelas wrote:
[...]
> >> digit
> >> Define the characters to be classified as numeric digits.
> >>
> >> In the POSIX locale, only:
> >>
> >>0 1 2 3 4 5 6 7 8 9
>
> Please read http://austingroupbugs.n
On 05/15/2018 12:50 PM, Stephane Chazelas wrote:
You're a bit late to the party on this question :)
digit
Define the characters to be classified as numeric digits.
In the POSIX locale, only:
0 1 2 3 4 5 6 7 8 9
Please read http://austingroupbugs.net/view.php?id=1078
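As a small illustration of the quoted definition, here is a sketch that tests
a string against [[:digit:]] with the POSIX regcomp()/regexec() interface.
The helper name all_digits() is hypothetical, and the sketch assumes the
program never calls setlocale(), so matching happens in the default "C"
locale.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if s is a non-empty string made up only of
 * characters in the [:digit:] class, 0 otherwise (including when regcomp()
 * fails).
 */
int all_digits(const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^[[:digit:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, s, 0, NULL, 0);   /* 0 means the pattern matched */
    regfree(&re);
    return rc == 0;
}
```

Whether the same pattern accepts anything beyond 0 1 2 3 4 5 6 7 8 9 in other
locales is exactly what bug 1078 addresses, so the sketch makes no claim about
that.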