Austin Group teleconference +1-888-426-6840 PIN: 2115756

2018-05-17 Thread Single UNIX Specification
Web/Project: Single UNIX Specification
Title: Austin Group teleconference +1-888-426-6840 PIN: 2115756
Date/Time: 24-May-2018 at 11:00 America/New_York
Duration: 1.50 hours
URL: https://collaboration.opengroup.org/platform/single_unix_specification/events.php
Status: CONFIRMED
Chair: a.jo...@opengroup.org

** All calls are anchored on US time **

Topic: Austin Group teleconference
---
Audio conference information
---
Call-in toll free number (US/Canada): +1-888-426-6840
Participant PIN: 2115756.

All Austin Group participants are most welcome to join the call.
The call will last for 1.5 hours.
This call is handling defect report processing.

An etherpad is usually up for a meeting, with a URL using the date format as below:

http://posix.rhansen.org/p/201x-mm-dd
username=posix password=2115756#

Additional Call-in numbers:
Germany          Caller Paid   0-69-2443-2290
Germany          Toll-Free     0800-000-1018
United Kingdom   Caller Paid   0-20-30596451
United Kingdom   Toll-Free     0800-368-0638
USA              Caller Paid   215-861-6239
USA              Toll-Free     888-426-6840
Denmark          Caller Paid   32711870
Denmark          Toll-Free     80-717000
Czech Republic   Caller Paid   2-39016353
Czech Republic   Toll-Free     800-143-484
Call-in numbers for other countries are available on request

Bug reports are available at:
http://www.austingroupbugs.net


[1003.1(2016)/Issue7+TC2 0001100]: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

2018-05-17 Thread Austin Group Bug Tracker

The following issue has been CLOSED. 
== 
http://austingroupbugs.net/view.php?id=1100 
== 
Reported By:Mark_Galeck
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1100
Category:   Shell and Utilities
Type:   Clarification Requested
Severity:   Editorial
Priority:   normal
Status: Closed
Name:   Mark Galeck 
Organization:
User Reference:  
Section:2.10 Shell Grammar 
Page Number:2375-2381 
Line Number:75873-76150 
Interp Status:  --- 
Final Accepted Text: 
Resolution: Rejected
Fixed in Version:   
== 
Date Submitted: 2016-10-27 12:40 UTC
Last Modified:  2018-05-17 16:03 UTC
== 
Summary:Rewrite of Section 2.10 Shell Grammar, of the Shell
Standard, to fix previous reports, fix new issues, and improve presentation.
==
Relationships   ID  Summary
--
has duplicate   0001098 do_group symbol cannot be accepted as w...
has duplicate   0001088 When more than one rule applies, ...
has duplicate   0001091 Some WORD tokens do not hav...
has duplicate   0001093 or applies globally is poin...
related to  0001082 delimited is incorrect
related to  0001083 next character is misleading
related to  0001084 rule 3, 4, 5 do not say that a token is...
related to  0001085 token shall be from the current p...
related to  0001086 Token Recognition is mislea...
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2016-10-27 12:40 Mark_Galeck    New Issue
2016-10-27 12:40 Mark_Galeck    Name                     => Mark Galeck
2016-10-27 12:40 Mark_Galeck    Section                  => 2.10 Shell Grammar
2016-10-27 12:40 Mark_Galeck    Page Number              => 2375-2381
2016-10-27 12:40 Mark_Galeck    Line Number              => 75873-76150
2016-10-27 12:57 Mark_Galeck    Note Added: 0003470
2016-10-28 08:19 geoffclare     Relationship added       related to 0001082
2016-10-28 08:20 geoffclare     Relationship added       related to 0001083
2016-10-28 08:20 geoffclare     Relationship added       related to 0001084
2016-10-28 08:21 geoffclare     Relationship added       related to 0001085
2016-10-28 08:21 geoffclare     Relationship added       related to 0001086
2016-10-28 08:22 geoffclare     Relationship added       related to 0001098
2018-03-28 03:59 kre            Note Added: 0003944
2018-04-12 15:38 eblake         Relationship added       has duplicate 0001088
2018-04-12 15:39 eblake         Relationship added       has duplicate 0001091
2018-04-12 15:39 eblake         Relationship added       has duplicate 0001093
2018-04-12 15:40 eblake         Relationship replaced    has duplicate 0001098
2018-05-11 20:10 shware_systems Note Added: 0004030
2018-05-11 21:39 kre            Note Added: 0004031
2018-05-12 06:59 shware_systems Note Added: 0004032
2018-05-17 15:33 eblake         Note Added: 0004037
2018-05-17 15:58 Don Cragun     Note Added: 0004038
2018-05-17 16:03 Don Cragun     Interp Status            => ---
2018-05-17 16:03 Don Cragun     Status                   New => Closed
2018-05-17 16:03 Don Cragun     Resolution               Open => Rejected
==




[1003.1(2016)/Issue7+TC2 0001100]: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

2018-05-17 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1100 
== 
Reported By:Mark_Galeck
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1100
Category:   Shell and Utilities
Type:   Clarification Requested
Severity:   Editorial
Priority:   normal
Status: New
Name:   Mark Galeck 
Organization:
User Reference:  
Section:2.10 Shell Grammar 
Page Number:2375-2381 
Line Number:75873-76150 
Interp Status:  --- 
Final Accepted Text: 
== 
Date Submitted: 2016-10-27 12:40 UTC
Last Modified:  2018-05-17 15:58 UTC
== 
Summary:Rewrite of Section 2.10 Shell Grammar, of the Shell
Standard, to fix previous reports, fix new issues, and improve presentation.
==
Relationships   ID  Summary
--
has duplicate   0001098 do_group symbol cannot be accepted as w...
has duplicate   0001088 When more than one rule applies, ...
has duplicate   0001091 Some WORD tokens do not hav...
has duplicate   0001093 or applies globally is poin...
related to  0001082 delimited is incorrect
related to  0001083 next character is misleading
related to  0001084 rule 3, 4, 5 do not say that a token is...
related to  0001085 token shall be from the current p...
related to  0001086 Token Recognition is mislea...
== 

-- 
 (0004038) Don Cragun (manager) - 2018-05-17 15:58
 http://austingroupbugs.net/view.php?id=1100#c4038 
-- 
We believe that some of the changes suggested in this bug report reflect a
misunderstanding of the grammar as it is presented in the standard rather
than problems in the grammar itself.  With no rationale for the changes
that are being made, no indication of what is intended to be fixed by the
changes that have been made, and no definitions for new terms that have
been added to the grammar and the description of the grammar, we are unable
to determine which, if any, of the suggested changes should be made.

We believe that there may be discrepancies between the grammar as it
currently appears in the standard and the shell language described by the
standard, but are unable to determine which, if any, of the changes
suggested in this bug report address those problems.  We are going to
reject this bug report, but would be happy to have the submitter provide
another bug report with a list of defects that need to be addressed and a
set of changes to meet those defects (with each change identifying the
defect it addresses).  We would also like to see additions to the
definitions section for newly defined terms (e.g., "important" 
characters) and changes to the rationale in XRAT C.2.10 explaining how the
grammar is being changed to reflect differences between what the standard
has intended to require and what the grammar currently does require.

When describing problems in the grammar, giving an example of a shell
construct that is not accepted by the grammar when it should be or that is
accepted by the grammar when it should not be would be a big help in
understanding the issues that are being addressed by proposed changes.

Note that existing shells are allowed to support extensions to constructs
required by the POSIX shell grammar.  Therefore, there is no requirement
that all existing shell constructs need to be recognized by the grammar. 


[1003.1(2016)/Issue7+TC2 0001100]: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

2018-05-17 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1100 
== 
Reported By:Mark_Galeck
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1100
Category:   Shell and Utilities
Type:   Clarification Requested
Severity:   Editorial
Priority:   normal
Status: New
Name:   Mark Galeck 
Organization:
User Reference:  
Section:2.10 Shell Grammar 
Page Number:2375-2381 
Line Number:75873-76150 
Interp Status:  --- 
Final Accepted Text: 
== 
Date Submitted: 2016-10-27 12:40 UTC
Last Modified:  2018-05-17 15:33 UTC
== 
Summary:Rewrite of Section 2.10 Shell Grammar, of the Shell
Standard, to fix previous reports, fix new issues, and improve presentation.
==
Relationships   ID  Summary
--
has duplicate   0001098 do_group symbol cannot be accepted as w...
has duplicate   0001088 When more than one rule applies, ...
has duplicate   0001091 Some WORD tokens do not hav...
has duplicate   0001093 or applies globally is poin...
related to  0001082 delimited is incorrect
related to  0001083 next character is misleading
related to  0001084 rule 3, 4, 5 do not say that a token is...
related to  0001085 token shall be from the current p...
related to  0001086 Token Recognition is mislea...
== 

-- 
 (0004037) eblake (manager) - 2018-05-17 15:33
 http://austingroupbugs.net/view.php?id=1100#c4037 
-- 
Here is a diff between the original formal grammar and the proposed new
one:

--- /tmp/grammar.1  2018-05-10 09:11:35.894306140 -0700
+++ /tmp/grammar.2  2018-05-10 09:12:23.347012514 -0700
@@ -96,21 +96,18 @@
 term : term separator and_or
  |and_or
  ;
-for_clause   : For name  do_group
- | For name   sequential_sep do_group
- | For name linebreak in  sequential_sep do_group
- | For name linebreak in wordlist sequential_sep do_group
- ;
-name : NAME /* Apply rule 5 */
- ;
-in   : In   /* Apply rule 6 */
+/* Apply rule 7:*/
+for_clause : For NAME do_group
+ | For NAME sequential_sep do_group
+ | For NAME linebreak In sequential_sep do_group
+ | For NAME linebreak In wordlist sequential_sep do_group
  ;
 wordlist : wordlist WORD
  |  WORD
  ;
-case_clause  : Case WORD linebreak in linebreak case_list Esac
- | Case WORD linebreak in linebreak case_list_ns Esac
- | Case WORD linebreak in linebreak  Esac
+case_clause : Case WORD linebreak In linebreak case_list Esac
+ | Case WORD linebreak In linebreak case_list_ns Esac
+ | Case WORD linebreak In linebreak Esac
  ;
 case_list_ns : case_list case_item_ns
  |   case_item_ns
@@ -118,18 +115,22 @@
 case_list: case_list case_item
  |   case_item
  ;
-case_item_ns : pattern ')' linebreak
- | pattern ')' compound_list
+case_item_ns : pattern_not_esac ')' linebreak
+ | pattern_not_esac ')' compound_list
  | '(' pattern ')' linebreak
  | '(' pattern ')' compound_list
  ;
-case_item: pattern ')' linebreak DSEMI linebreak
- | pattern ')' compound_list DSEMI linebreak
+case_item : pattern_not_esac ')' linebreak DSEMI linebreak
+ | pattern_not_esac ')' compound_list DSEMI linebreak
  | '(' pattern ')' linebreak DSEMI linebreak
  | '(' pattern ')' compound_list DSEMI linebreak
  ;
-pattern  : WORD /* Apply rule 4 */
- | 

[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-05-17 Thread Austin Group Bug Tracker

The following issue NEEDS AN INTERPRETATION. 
== 
http://austingroupbugs.net/view.php?id=1105 
== 
Reported By:stephane
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1105
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Interpretation Required
Name:   Stéphane Chazelas 
Organization:
User Reference:  
Section:awk 
Page Number: 
Line Number: 
Interp Status:  Pending 
Final Accepted Text:http://austingroupbugs.net/view.php?id=1105#c4019 
== 
Date Submitted: 2016-12-05 21:52 UTC
Last Modified:  2018-05-17 15:17 UTC
== 
Summary:problems with backslashes in awk strings and EREs
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2016-12-05 21:52 stephane       New Issue
2016-12-05 21:52 stephane       Name                     => Stéphane Chazelas
2016-12-05 21:52 stephane       Section                  => awk
2018-04-25 22:27 McDutchie      Note Added: 0003999
2018-04-26 08:53 joerg          Note Added: 0004000
2018-04-30 09:59 geoffclare     Note Added: 0004014
2018-04-30 10:39 stephane       Note Added: 0004015
2018-04-30 11:06 McDutchie      Note Added: 0004016
2018-04-30 11:26 geoffclare     Note Added: 0004017
2018-04-30 12:00 stephane       Note Added: 0004018
2018-04-30 12:05 stephane       Note Edited: 0004018
2018-04-30 12:05 stephane       Note Edited: 0004015
2018-04-30 12:28 McDutchie      Note Edited: 0004016
2018-05-03 15:54 geoffclare     Note Added: 0004019
2018-05-03 15:57 geoffclare     Note Edited: 0004019
2018-05-03 15:59 geoffclare     Interp Status            => Pending
2018-05-03 15:59 geoffclare     Final Accepted Text      => http://austingroupbugs.net/view.php?id=1105#c4019
2018-05-03 15:59 geoffclare     Status                   New => Interpretation Required
2018-05-03 15:59 geoffclare     Resolution               Open => Accepted As Marked
2018-05-03 15:59 geoffclare     Tag Attached: tc3-2008
2018-05-03 16:00 geoffclare     Note Added: 0004020
2018-05-03 16:00 geoffclare     Note Edited: 0004020
2018-05-04 08:43 geoffclare     Note Edited: 0004019
2018-05-04 08:47 geoffclare     Note Edited: 0004019
2018-05-04 08:47 geoffclare     Note Deleted: 0004020
2018-05-04 15:25 geoffclare     Note Added: 0004022
2018-05-04 15:25 geoffclare     Status                   Interpretation Required => Under Review
2018-05-04 15:25 geoffclare     Resolution               Accepted As Marked => Reopened
2018-05-17 15:11 geoffclare     Note Edited: 0004019
2018-05-17 15:12 geoffclare     Note Edited: 0004019
2018-05-17 15:14 geoffclare     Note Edited: 0004019
2018-05-17 15:16 geoffclare     Note Edited: 0004019
2018-05-17 15:17 geoffclare     Note Added: 0004036
2018-05-17 15:17 geoffclare     Status                   Under Review => Interpretation Required
2018-05-17 15:17 geoffclare     Resolution               Reopened => Accepted As Marked
==




[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-05-17 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1105 
== 
Reported By:stephane
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1105
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Under Review
Name:   Stéphane Chazelas 
Organization:
User Reference:  
Section:awk 
Page Number: 
Line Number: 
Interp Status:  Pending 
Final Accepted Text:http://austingroupbugs.net/view.php?id=1105#c4019 
== 
Date Submitted: 2016-12-05 21:52 UTC
Last Modified:  2018-05-17 15:17 UTC
== 
Summary:problems with backslashes in awk strings and EREs
== 

-- 
 (0004036) geoffclare (manager) - 2018-05-17 15:17
 http://austingroupbugs.net/view.php?id=1105#c4036 
-- 
http://austingroupbugs.net/view.php?id=1105#c4019 has been updated to add
"when not inside a bracket expression" to the line 80144 change, and to add
the line 80161 change.





Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread keld
On Thu, May 17, 2018 at 12:36:35PM +0200, Hans Åberg wrote:
> 
> > On 17 May 2018, at 11:02, Joerg Schilling 
> >  wrote:
> > 
> > Hans Åberg  wrote:
> > 
> >>>> |I asked a person who speaks japanese and he told me that
> >>>> |
> >>>> | "\u4e00\u4e8c\u4e09"
> >>>> |
> >>>> |is similar to
> >>>> |
> >>>> | "one two three"
> >>>> |
> >>>> |and this is not used for computing.
> >>>> 
> >>>> If i recall correctly this has been discussed already; if not here
> >>>> then on the Unicode list.  Unicode brings quite a lot of
> >>>> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
> >>>> ONE FULL STOP etc.  All these are marked "No", and i think the
> >>>> discussion concluded that they should not be taken into account
> >>>> when converting strings to numbers.
> >> 
> >> The intent may be that the value of the digit character c can be computed 
> >> by the expression c - '0' when >= 0 and <= 9, and is otherwise a 
> >> non-digit. Then 'isdigit' and [[:digit:]] are tied to that, so it is 
> >> impossible to use any other decimal digits.
> > 
> > This seems to be an important idea, as this japanese one two three
> > is not in a contiguous order.
> 
> It provides an efficient implementation, important on earlier computers. The 
> UTF-8 article [1], "History", mentions that they struggled around 1992 to 
> find proposals for that providing efficient implementations.
> 
> 1. https://en.wikipedia.org/wiki/UTF-8

Oh, well. You should be able to implement efficient code for the specs from
14652 and 30112. One way would be that, after testing for isdigit, you index
into a 4-bit table with the binary value corresponding to the digit character.
This is probably on par speedwise with subtracting the value for zero.
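
The table lookup described above could look roughly like the following C
sketch. It is only an illustration under stated assumptions: single-byte
characters, the standard isdigit(), and a table that is filled in for the ten
ASCII digits only. The names digit_table and digit_value_table are made up
here and are not taken from ISO/IEC 14652 or 30112.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical lookup table: entry i holds the numeric value (0-9, which
 * fits in 4 bits) of the character whose code is i. Only the ten ASCII
 * digits are filled in; a locale with more digit characters would need
 * its own entries. */
static const unsigned char digit_table[UCHAR_MAX + 1] = {
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4,
    ['5'] = 5, ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9,
};

/* Returns the digit value of c, or -1 if c is not a digit.
 * c must be representable as unsigned char, as isdigit() requires. */
static int digit_value_table(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    if (!isdigit(c))
        return -1;          /* test first, as suggested in the message */
    return digit_table[c];  /* then a single indexed load */
}
```

For the ASCII digits this does the same job as c - '0'; the table only pays
off if isdigit() is allowed to accept characters that are not contiguous.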

Best regards
keld



Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread Hans Åberg

> On 17 May 2018, at 11:02, Joerg Schilling 
>  wrote:
> 
> Hans Åberg  wrote:
> 
>>>> |I asked a person who speaks japanese and he told me that
>>>> |
>>>> | "\u4e00\u4e8c\u4e09"
>>>> |
>>>> |is similar to
>>>> |
>>>> | "one two three"
>>>> |
>>>> |and this is not used for computing.
>>>> 
>>>> If i recall correctly this has been discussed already; if not here
>>>> then on the Unicode list.  Unicode brings quite a lot of
>>>> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
>>>> ONE FULL STOP etc.  All these are marked "No", and i think the
>>>> discussion concluded that they should not be taken into account
>>>> when converting strings to numbers.
>> 
>> The intent may be that the value of the digit character c can be computed by 
>> the expression c - '0' when >= 0 and <= 9, and is otherwise a non-digit. 
>> Then 'isdigit' and [[:digit:]] are tied to that, so it is impossible to use 
>> any other decimal digits.
> 
> This seems to be an important idea, as this japanese one two three
> is not in a contiguous order.

It provides an efficient implementation, important on earlier computers. The 
UTF-8 article [1], "History", mentions that they struggled around 1992 to find 
proposals for that providing efficient implementations.

1. https://en.wikipedia.org/wiki/UTF-8





Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread keld
On Thu, May 17, 2018 at 11:02:48AM +0200, Joerg Schilling wrote:
> Hans Åberg  wrote:
> 
> > >> |I asked a person who speaks japanese and he told me that
> > >> |
> > >> | "\u4e00\u4e8c\u4e09"
> > >> |
> > >> |is similar to
> > >> |
> > >> | "one two three"
> > >> |
> > >> |and this is not used for computing.
> > >> 
> > >> If i recall correctly this has been discussed already; if not here
> > >> then on the Unicode list.  Unicode brings quite a lot of
> > >> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
> > >> ONE FULL STOP etc.  All these are marked "No", and i think the
> > >> discussion concluded that they should not be taken into account
> > >> when converting strings to numbers.
> >
> > The intent may be that the value of the digit character c can be computed 
> > by the expression c - '0' when >= 0 and <= 9, and is otherwise a non-digit. 
> > Then 'isdigit' and [[:digit:]] are tied to that, so it is impossible to use 
> > any other decimal digits.
> 
> This seems to be an important idea, as this japanese one two three
> is not in a contiguous order.

Well, the digits in other scripts are ordered consecutively, so the calculation
could easily be done, for the scripts I previously documented, as prescribed
in ISO 14652.
This is not rocket science.
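
As a rough illustration of that calculation, here is a C sketch that works on
Unicode code points. The list of scripts is a small sample chosen for this
example, not the set documented for ISO 14652, and the names digit_zero and
script_digit_value are hypothetical. It assumes each listed run is ten
consecutive code points starting at the script's zero digit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First code point (the zero digit) of a few Unicode decimal-digit runs.
 * Each run is assumed to hold ten consecutive code points, zero to nine. */
static const uint32_t digit_zero[] = {
    0x0030, /* DIGIT ZERO (ASCII) */
    0x0660, /* ARABIC-INDIC DIGIT ZERO */
    0x0966, /* DEVANAGARI DIGIT ZERO */
    0x09E6, /* BENGALI DIGIT ZERO */
};

/* Returns the value 0-9 of code point cp if it falls in one of the runs
 * above, or -1 otherwise. */
static int script_digit_value(uint32_t cp)
{
    size_t n = sizeof digit_zero / sizeof digit_zero[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= digit_zero[i] && cp <= digit_zero[i] + 9)
            return (int)(cp - digit_zero[i]);  /* subtract the script's zero */
    }
    return -1;
}

/* Small usage check: U+0663 is ARABIC-INDIC DIGIT THREE. */
static void script_digit_self_check(void)
{
    assert(script_digit_value(0x0663) == 3);
}
```

The subtraction is the same as c - '0'; only the choice of which zero to
subtract depends on the script.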

Best regards
keld



Re: can [[:digit:]] match something other than 0123456789?

2018-05-17 Thread Joerg Schilling
Hans Åberg  wrote:

> >> |I asked a person who speaks japanese and he told me that
> >> |
> >> | "\u4e00\u4e8c\u4e09"
> >> |
> >> |is similar to
> >> |
> >> | "one two three"
> >> |
> >> |and this is not used for computing.
> >> 
> >> If i recall correctly this has been discussed already; if not here
> >> then on the Unicode list.  Unicode brings quite a lot of
> >> codepoints, like CIRCLED DIGIT ONE, PARENTHESIZED DIGIT ONE, DIGIT
> >> ONE FULL STOP etc.  All these are marked "No", and i think the
> >> discussion concluded that they should not be taken into account
> >> when converting strings to numbers.
>
> The intent may be that the value of the digit character c can be computed by 
> the expression c - '0' when >= 0 and <= 9, and is otherwise a non-digit. Then 
> 'isdigit' and [[:digit:]] are tied to that, so it is impossible to use any 
> other decimal digits.

This seems to be an important idea, as this japanese one two three
is not in a contiguous order.
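
A small C sketch of both points, offered as an illustration only: the c - '0'
rule for the ASCII digits, and a check showing that the three code points
quoted above (U+4E00, U+4E8C, U+4E09) do not follow one another. The function
and array names are invented for this example.

```c
#include <assert.h>

/* Value of an ASCII decimal digit, or -1 for anything else.
 * The C standard guarantees that '0'..'9' are consecutive, so c - '0'
 * yields 0..9 exactly for the ten digits. */
static int ascii_digit_value(int c)
{
    int v = c - '0';
    return (v >= 0 && v <= 9) ? v : -1;
}

/* Code points quoted in the thread for "one two three". */
static const unsigned long kanji_123[3] = { 0x4E00UL, 0x4E8CUL, 0x4E09UL };

/* Returns 1 if the n code points increase by exactly one at each step,
 * 0 otherwise. */
static int is_consecutive(const unsigned long *cp, int n)
{
    for (int i = 1; i < n; i++)
        if (cp[i] != cp[i - 1] + 1)
            return 0;
    return 1;
}

/* Usage check: the quoted code points are not a contiguous run. */
static void kanji_self_check(void)
{
    assert(ascii_digit_value('5') == 5);
    assert(!is_consecutive(kanji_123, 3));
}
```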

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'