[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2023-04-11 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been set as RELATED TO issue 0001662. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Applied
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text:https://austingroupbugs.net/view.php?id=1550#c5816 
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-05-26 10:34 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
related to  0001578 sed y-command: error in description abo...
related to  0001662 Delimiter issues in ed and ex
== 

Issue History 
Date Modified     Username    Field                  Change
== 
2022-01-14 05:32  calestyo    New Issue
2022-01-14 05:32  calestyo    Name                   => Christoph Anton Mitterer
2022-01-14 05:32  calestyo    Section                => Utilities, sed
2022-01-14 05:32  calestyo    Page Number            => 3132, ff. (in the draft)
2022-01-14 05:32  calestyo    Line Number            => see below
2022-01-14 05:40  calestyo    Note Added: 0005601
2022-01-14 06:34  Don Cragun  Relationship added     related to 0001551
2022-01-14 06:52  Don Cragun  Project                1003.1(2016/18)/Issue7+TC2 => Issue 8 drafts
2022-01-14 06:54  Don Cragun  Note Added: 0005603
2022-01-14 06:54  Don Cragun  version                => Draft 2.1
2022-03-18 11:15  geoffclare  Note Added: 0005756
2022-03-18 11:15  geoffclare  Note Edited: 0005756
2022-03-25 16:18  geoffclare  Note Added: 0005761
2022-03-25 16:22  geoffclare  Note Edited: 0005761
2022-03-26 00:08  calestyo    Note Added: 0005767
2022-03-26 00:34  calestyo    Note Edited: 0005767
2022-03-31 16:00  nick        Relationship added     related to 0001556
2022-04-02 01:53  calestyo    Note Added: 0005771
2022-04-02 02:30  calestyo    Note Added: 0005772
2022-04-02 09:37  kre         Note Added: 0005775
2022-04-02 19:47  calestyo    Note Added: 0005777
2022-04-08 09:20  geoffclare  Note Added: 0005790
2022-04-17 23:51  calestyo    Note Added: 0005809
2022-04-22 08:29  geoffclare  Note Added: 0005816
2022-04-28 15:12  geoffclare  Final Accepted Text    => https://austingroupbugs.net/view.php?id=1550#c5816
2022-04-28 15:12  geoffclare  Status                 New => Resolved
2022-04-28 15:12  geoffclare  Resolution             Open => Accepted As Marked
2022-04-28 15:13  geoffclare  Tag Attached: issue8
2022-05-05 15:00  nick        Relationship added     related to 0001578
2022-05-26 10:34  geoffclare  Status                 Resolved => Applied
2023-04-11 14:30  geoffclare  Relationship added     related to 0001662
==




[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-05-26 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has a resolution that has been APPLIED. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Applied
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text:https://austingroupbugs.net/view.php?id=1550#c5816 
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-05-26 10:34 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
related to  0001578 sed y-command: error in description abo...
== 





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-05-05 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been set as RELATED TO issue 0001578. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Resolved
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text:https://austingroupbugs.net/view.php?id=1550#c5816 
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-28 15:12 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
related to  0001578 sed y-command: error in description abo...
== 





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-28 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been RESOLVED. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: Resolved
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text:https://austingroupbugs.net/view.php?id=1550#c5816 
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-28 15:12 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 





Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-26 Thread Geoff Clare via austin-group-l at The Open Group
Christoph Anton Mitterer wrote, on 26 Apr 2022:
>
> There's the thing left with e.g. GNU sed per default allowing escape
> sequences in bracket expressions,... but you already have the changes
> from #1233, and I guess there's not much more one can do about this
> situation.
> Perhaps finding out, whether there are even any (current)
> implementations that behave the way POSIX would specify (i.e. not
> considering it an escape sequence) - if there were none, one could
> perhaps implement the "future directions".

It is tested by one of the UNIX conformance test suites, so we know
that at least every certified UNIX system behaves as stated in the
standard (and also some, such as Solaris, that were once certified
but no longer are).

> From my side, these three issues (#1550, #1551 and #1556) may be
> considered fixed with the current proposal in #1550’s note #5816.

Excellent news.  Thank you.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-25 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Thu, 2022-04-21 at 15:10 +0100, Geoff Clare via austin-group-l at
The Open Group wrote:
> > > 
> > > <backslash> does not work, because if it appears unescaped later
> > > in
> > > the RE, it either escapes the following character, which can then
> > > never be the ending delimiter
> > 
> > It feels like a bit is missing here... namely that it couldn't be
> > decided,
> > whether an unescaped '\' would escape the following character OR be
> > the
> > delimiter.
> > The above sentence seems to assume already that such unescaped '\'
> > wouldn't be the delimiter but rather the RE escape character... but
> > that is already impossible to decide in the first place (which is
> > why
> > it cannot be used)?!
> > 
> > And even if it was decided... and '\' was the escape character (and
> > not
> > the delimiter)... then:
> > In: "it either escapes the following character, which can then
> > never be
> > the ending delimiter" ... isn't that anyway never the case?!
> > 
> > If '\' is the escape character (and not the delimiter), then e.g.
> > in
> > '\.' the '.' would of course never be the delimiter - and if it was
> > the
> > delimiter then the '.' wouldn't be either!?
> > 
> > So that "which can then never be the ending delimiter" seems a bit
> > strange or I just don't understand it.
> 
> That part of the wording is pretty much exactly as per kre's
> suggestion,
> so perhaps he could explain it better, but my reading of it is that
> it
> is explaining why, *given the requirements stated in the normative
> text*,
> '\' cannot be the delimiter. Whereas you seem to be thinking in terms
> of
> what was there to prevent sed being designed differently so that '\'
> could
> be a delimiter.

Yes that's probably the reason.



> > > 
> > > The proposed normative text clearly forbids it, and the rationale
> > > points out that it forbids it.  I see no reason to do anything
> > > more
> > > here.
> > 
> > Personally, I still think that this should be said more directly
> > (and
> > not "hidden" behind the fact that such characters are not "special
> > characters" but merely "characters that may get 'special
> > meaning'").
> > It seems all too easy to consider e.g. in BRE '(' as "somewhat"
> > special.
> 
> I think you're missing the point that XBD 9.3.3 and 9.4.3 *list*
> exactly
> which characters are BRE/ERE special characters.

I know,... my main motivation was/is that it may be too easy for people
to not realise that there is a difference between "special character"
and "character that may get a special meaning when escaped".

Maybe I'm just overcautious because of the bug with '\+' in BusyBox sed
... and perhaps also because of GNU sed's escape sequences in bracket
expressions.


> but I will tweak the wording
> around the references to 9.3.3 and 9.4.3 to use the word "listed".
> Hopefully that would be clearer to casual readers that they really
> need
> to refer to those lists of characters to understand the requirement.

It's still only "indirect"... but better, yes. I guess it's enough.



> > * (g): The first part, i.e what '\n' in snAAA\nnXXXn is?
> >    => newline or the "undelimitered" delimiter character?
> 
> Once I change the text to say "listed", this will be perfectly clear,
> since 'n' is not listed as a BRE special character in XBD 9.3.3 or an
> ERE special character in XBD 9.4.3.

Hmm, well if you think so. My idea was that since both these rules (the
one that would specify it to be the literal 'n' ... and the one that
says \n is newline) were made on the same "level" it could be unclear
which of them wins over the other.

Probably BusyBox sed, which also handles it as <newline>, does so
because it follows GNU sed, and not because of ambiguity.
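
The open question about '\n' in 'snAAA\nnXXXn' can be observed on a local
system with a small probe. This is only a sketch: the function name
probe_escaped_n and the sample input "AAAn" are hypothetical, and nothing here
asserts what any particular implementation prints. If the escaped delimiter is
taken as a literal 'n', the RE is "AAAn" and the input line is replaced; if it
is taken as a newline, the RE cannot match a single input line and the line
passes through unchanged.

```shell
# Hypothetical probe: how does the local sed read '\n' when 'n' is the
# delimiter of the s command?
probe_escaped_n() {
    # Input is the single line "AAAn"; errors from sed are discarded.
    out=$(printf 'AAAn\n' | sed 'snAAA\nnXXXn' 2>/dev/null)
    case $out in
        XXX)  echo 'literal n' ;;   # RE was "AAAn", so the line was replaced
        AAAn) echo 'newline' ;;     # RE contained a newline, so no match
        *)    echo 'other' ;;       # sed rejected the script or did something else
    esac
}

probe_escaped_n
```

The probe reports one of three labels and makes no claim about which one is
correct for the sed under test.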




I've looked through all other changes in the new proposal and like them
quite a lot.


The only minor thing that may perhaps be considered is:

> escaping occurrences of the <slash> or c delimiter

Perhaps overkill, but in order to make it clear that <slash> here means
only when <slash> is the delimiter:
> "... the delimiter <slash> respectively the delimiter c

But I guess from the context it should also become clear in the current
form, so leave it as you wish.



There's the thing left with e.g. GNU sed per default allowing escape
sequences in bracket expressions,... but you already have the changes
from #1233, and I guess there's not much more one can do about this
situation.
Perhaps finding out, whether there are even any (current)
implementations that behave the way POSIX would specify (i.e. not
considering it an escape sequence) - if there were none, one could
perhaps implement the "future directions".



From my side, these three issues (#1550, #1551 and #1556) may be
considered fixed with the current proposal in #1550’s note #5816.


Thanks for the considerable efforts everyone participating put into
this. :-)


Cheers,
Chris.



[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-22 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-22 08:29 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 

-- 
 (0005816) geoffclare (manager) - 2022-04-22 08:29
 https://austingroupbugs.net/view.php?id=1550#c5816 
-- 
Updated proposal following further discussion on the mailing list ...

These changes are to fix this bug and also bugs
https://austingroupbugs.net/view.php?id=1551 and
https://austingroupbugs.net/view.php?id=1556.

On page 3134 line 106070 section sed, change:
    or a context address (which consists of an RE, as described in [xref
    to Regular Expressions in sed], preceded and followed by a delimiter,
    usually a <slash>).
to:
    or a context address. A context address has either the form "/RE/" or
    "\cREc", where RE is a regular expression as described in [xref to
    Regular Expressions in sed], and c is any character other than
    <backslash> or <newline>. In a sed context address, the BRE and ERE
    syntax shall be extended to support escaping occurrences of the
    <slash> or c delimiter within the RE by means of an escape sequence
    (see [xref to XBD 9.1]). For the "\cREc" form, if the character
    designated by c is not listed as a special BRE character (if the -E
    option is not specified) or a special ERE character (if -E is
    specified) in [xref to XBD 9.3.3] or [xref to XBD 9.4.3],
    respectively, the escape sequence <backslash>c shall be treated as
    that literal character; otherwise, it is unspecified whether the
    escape sequence <backslash>c is treated as the literal character or
    the special character. In either case, the escape sequence
    <backslash>c shall not terminate the RE. For example, in the context
    address "/abc\/def/", the second <slash> stands for itself, so that
    the RE is "abc/def", and in "\xabc\xdefx", the second 'x' stands for
    itself, so that the RE is "abcxdef".
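
The two context-address forms can be tried out as follows. This is a minimal
sketch, assuming a POSIX-style sed on PATH; the sample input and the choice of
',' as the delimiter c are illustrative and not taken from the note. ',' is
used because it is not listed as a special character in either BRE or ERE, so
the escaped delimiter is required to be the literal character.

```shell
# Illustrative input (an assumption): one candidate line per form.
input='abc/def
abc,def
abcdef'

# "/RE/" form: the escaped <slash> stands for itself, so the RE is "abc/def".
slash_match=$(printf '%s\n' "$input" | sed -n '/abc\/def/p')

# "\cREc" form with c = ',': the escaped ',' stands for itself, so the RE
# is "abc,def" and the escape sequence does not terminate the RE.
comma_match=$(printf '%s\n' "$input" | sed -n '\,abc\,def,p')

printf '%s\n%s\n' "$slash_match" "$comma_match"
```

Each command selects only the line that contains its delimiter character
literally; the line "abcdef" is matched by neither.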

On page 3134 line 106085 section sed, change:
    Both BREs and EREs shall also support the following additions
to:
    In sed, the BRE and ERE syntax shall be extended as follows

On page 3134 line 106087 section sed, replace the first bullet item
(beginning "In a context address") with:
    The delimiter character that precedes and follows the RE shall not
    terminate the RE when it appears within a bracket expression, and
    shall have its normal meaning in the bracket expression. For example,
    the context address "\%[%]%" is equivalent to "/[%]/", and the
    command "s-[0-9]--g" is equivalent to "s/[0-9]//g".
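
Whether a given sed already treats a delimiter inside a bracket expression
this way can be checked with a quick probe. This is a hedged sketch: the
function name probe_bracket_delim and the sample input are hypothetical, and
the probe only reports whether the local sed accepts the example context
address from the text above. It assumes nothing about the result on any
particular implementation.

```shell
# Hypothetical probe: does the local sed accept the context address
# "\%[%]%", where the delimiter '%' also appears inside a bracket expression?
probe_bracket_delim() {
    # If the '%' inside the brackets does not terminate the RE, the address
    # matches the line "a%b" and sed prints it. Errors are discarded.
    if printf 'a%%b\n' | sed -n '\%[%]%p' 2>/dev/null | grep -q 'a%b'; then
        echo 'bracketed delimiter accepted'
    else
        echo 'bracketed delimiter not accepted'
    fi
}

probe_bracket_delim
```
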
On page 3137 line 106204 section sed (s command), change:
    Within the RE and the replacement, the RE delimiter itself can be
    used as a literal character if it is preceded by a <backslash>.
to:
    Within the RE (as a sed extension to the BRE and ERE syntax) and the
    replacement, the delimiter shall not terminate the RE or replacement
    if it is the second character of an escape sequence (see [xref to XBD
    9.1]). If the delimiter character is not listed as a special BRE
    character (if the -E option is not specified) or a special ERE
    character (if -E is specified) in [xref to XBD 9.3.3] or [xref to XBD
    9.4.3], respectively, the escaped delimiter shall be treated as that
    literal character in the RE; otherwise, it is unspecified whether the
    escaped delimiter is treated as the literal character or the special
    character. Likewise, if the delimiter character is not <ampersand>
    ('&'), the escaped delimiter shall be treated as that literal
    character in the replacement; if it is <ampersand>, it is
    unspecified 

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-21 Thread Geoff Clare via austin-group-l at The Open Group
Christoph Anton Mitterer wrote, on 18 Apr 2022:
>
> On Tue, 2022-04-05 at 15:54 +0100, Geoff Clare via austin-group-l at
> The Open Group wrote:
> > > ---
> > > --- 
> > >  (0005771) calestyo (reporter) - 2022-04-02 01:53
> > >  https://austingroupbugs.net/view.php?id=1550#c5771 
> > > ---
> > > --- 
> > 
> > > So maybe, in "Addresses in sed" we should better *only* describe
> > > the \cREc
> > > form of these,... and link to "Regular Expressions in sed" for how
> > > delimiters are escaped?
> > 
> > I think it works better the way I have it (which you said you could
> > live
> > with).
> 
[...]
> 
> Anyway... your decision.

I still prefer it the way I have it.

> issue #1550, note #5771, point (c): I assume this *really* is on
> purpose?!

I don't know if it happened on purpose or by accident, but it's *right*
either way, as it's what implementations do.

> > > (Ic), (Id) as well as my original (2b) would be fixed, if we'd
> > > write
> > > something like:
> > > "When the delimiter character c is <slash>, a context address \/RE/
> > > can
> > > also be written as /RE/." (or something similar but better).
> > > That would make it clear that \/RE/ is allowed and identical to
> > > /RE/ and at
> > > the same time define /RE/.
> > 
> > I'd suggest just changing the "example" part of my proposal.
> > I.e. instead of:
> > 
> >     For example, the context address "\xabc\xdefx" is equivalent to
> >     "/abcxdef/".
> > 
> > it could say:
> > 
> >     The construction "\cREc" does not need to be used when the
> > delimiter
> >     is a <slash>; for example, the context address "\xabc\xdefx" is
> >     equivalent to "/abcxdef/".
> 
> I guess I could live with that... if I had to ;-)
> 
[...]
> 
> Maybe a completely different way to go would be to define a context
> address as:
> dREd
> with d being the delimiter, RE the regular expression and with d being
> any character other than <backslash> and <newline>, and any character
> other than <slash> needing to have the first 'd' (only!) escaped with
> <backslash> as in \dREd ... and <slash> *may* have its first
> occurrence escaped.

I don't think this dREd idea works, but I'm beginning to think that
something more up-front about the two different context address forms
would be better than the current text, so I'll try to rework that
paragraph appropriately.

> > > ---
> > > --- 
> > >  (0005777) calestyo (reporter) - 2022-04-02 19:47
> > >  https://austingroupbugs.net/view.php?id=1550#c5777 
> > > ---
> > > --- 
> > > That indented paragraph of yours (in Note 0005775) should (if at
> > > all) only
> > > go to the Rationale, IMO. At least the part which describes *why*
> > > <backslash> and <newline> cannot be used.
> > 
> > I'll put something similar in rationale.
> 
> Regarding that:
> 
> > (even in a context address using "\cREc")
> 
> Is there anything special about context address vs. the other "place"
> where this is relevant (i.e. s-command) which I miss, so that you
> specifically mention the context addresses here?
> 
> I just ask because *not* mentioning the "obvious" other "place" where
> the same thing should apply, could lead people to question whether/why
> something may be different there.

I'll remove the text in parentheses.

> > <backslash> does not work, because if it appears unescaped later in
> > the RE, it either escapes the following character, which can then
> > never be the ending delimiter
> 
> It feels like a bit is missing here... namely that it couldn't be decided,
> whether an unescaped '\' would escape the following character OR be the
> delimiter.
> The above sentence seems to assume already that such unescaped '\'
> wouldn't be the delimiter but rather the RE escape character... but
> that is already impossible to decide in the first place (which is why
> it cannot be used)?!
> 
> And even if it was decided... and '\' was the escape character (and not
> the delimiter)... then:
> In: "it either escapes the following character, which can then never be
> the ending delimiter" ... isn't that anyway never the case?!
> 
> If '\' is the escape character (and not the delimiter), then e.g. in
> '\.' the '.' would of course never be the delimiter - and if it was the
> delimiter then the '.' wouldn't be either!?
> 
> So that "which can then never be the ending delimiter" seems a bit
> strange or I just don't understand it.

That part of the wording is pretty much exactly as per kre's suggestion,
so perhaps he could explain it better, but my reading of it is that it
is explaining why, *given the requirements stated in the normative text*,
'\' cannot be the delimiter. Whereas you seem to be thinking in terms of
what was there to prevent sed being designed differently so that '\' could
be a delimiter.

> > or it forms part of a bracket expression
> 
> "or it *is* part..."?
> Just 

[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-17 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://www.austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-17 23:51 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 

-- 
 (0005809) calestyo (reporter) - 2022-04-17 23:51
 https://www.austingroupbugs.net/view.php?id=1550#c5809 
-- 
With respect to the proposal in #5790 and any open points in this issue as
well as issues #1551 and #1556, I've replied to Geoff's longer mail on the
austin-group-l@opengroup.org list.

It contains some ideas for improvement (including some concrete ones like
for better examples)... and also a list of those points that haven't been
dealt with, yet, (the most important one IMO being the case of what '\n' is
in 'snAAA\nnXXXn' - newline or escaped delimiter character).

So may I kindly ask any participant in this ticket, to look there? 





Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-17 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Tue, 2022-04-05 at 15:54 +0100, Geoff Clare via austin-group-l at
The Open Group wrote:
> > ---
> > --- 
> >  (0005771) calestyo (reporter) - 2022-04-02 01:53
> >  https://austingroupbugs.net/view.php?id=1550#c5771 
> > ---
> > --- 
> 
> > So maybe, in "Addresses in sed" we should better *only* describe
> > the \cREc
> > form of these,... and link to "Regular Expressions in sed" for how
> > delimiters are escaped?
> 
> I think it works better the way I have it (which you said you could
> live
> with).

There are IMO two aspects here:
- I think it makes it a bit less unorganised, because from the 2nd
  sentence on, it's less about the context address, but more about
  something resulting from these (the delimiter) as part of the RE.

- A key part in (definitely) understanding how the delimiters (and
  '\n' as newline) in the RE part of context addresses and s-commands
  work semantically is, that the RE language itself is extended by
  these.

In the "Addresses in sed" section this is kinda done via "The BRE and
ERE syntax shall additionally support"

For the s-command it's however not that directly emphasised.
(See my proposal further below.)


That's why I had thought that putting it all into "Regular Expressions in
sed", and emphasising there that for sed the RE languages are extended
by the following, which is considered to be part of them, could make
things better.

Of course however, this wouldn't work for the y-command.


Anyway... your decision.



issue #1550, note #5771, point (c): I assume this *really* is on
purpose?!


> > (Ic), (Id) as well as my original (2b) would be fixed, if we'd
> > write
> > something like:
> > "When the delimiter character c is <slash>, a context address \/RE/
> > can
> > also be written as /RE/." (or something similar but better).
> > That would make it clear that \/RE/ is allowed and identical to
> > /RE/ and at
> > the same time define /RE/.
> 
> I'd suggest just changing the "example" part of my proposal.
> I.e. instead of:
> 
>     For example, the context address "\xabc\xdefx" is equivalent to
>     "/abcxdef/".
> 
> it could say:
> 
>     The construction "\cREc" does not need to be used when the
> delimiter
>     is a <slash>; for example, the context address "\xabc\xdefx" is
>     equivalent to "/abcxdef/".

I guess I could live with that... if I had to ;-)


It's still a bit non-straightforward, I think,... we have now:
- > context address (which consists of an RE, as described in Regular
  > Expressions in sed, preceded and followed by a delimiter, usually a
  > <slash>).

  This basically says that a context address is:
  <delimiter>RE<delimiter>
  and that the delimiter is usually a <slash>, which alone doesn't
  strictly say anything about whether or not it needs to be escaped.

- > In a context address, any character other than <backslash> or
  > <newline> can be specified for use as the delimiter by means of
  > the construction "\cREc", where c is the chosen delimiter 
  > character.

  This basically says: Anything other than <backslash> or <newline>
  (thus including <slash>) can be used via "\cREc".

- > The construction "\cREc" does not need to be used when the
  > delimiter is a <slash>
  (from the current proposal in:
  https://austingroupbugs.net/view.php?id=1550#c5790 )

  This kinda works... but what I dislike about it is, that it says the
  "[whole] construct" doesn't need to be used...


All these together seem like some circular dependency to me.

But the actual point would be: If <slash> is used as delimiter, the
first one in cREc doesn't need to be escaped with <backslash>.

That's what I'd thought my proposed:
> When the delimiter character c is <slash>, a context address \/RE/
> can also be written as /RE/."

would nicely resolve.
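
To illustrate the three spellings being discussed, here is a minimal
sketch, assuming a sed that behaves as the proposed text describes (the
sample input is made up for this example). All three commands select
the same line:

```shell
# Sample input (made up for this illustration).
input='/usr/bin/sed
/bin/sh'

# Default <slash> delimiter: every <slash> inside the RE must be escaped.
printf '%s\n' "$input" | sed -n '/\/usr\/bin/p'

# "\cREc" with c = ',': the <slash> characters need no escaping.
printf '%s\n' "$input" | sed -n '\,/usr/bin,p'

# "\cREc" with c = <slash>: the same address as the first command,
# only with the leading delimiter escaped.
printf '%s\n' "$input" | sed -n '\/\/usr\/bin/p'
```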


Maybe a completely different way to go would be to define a context
address as:
dREd
with d being the delimiter, RE the regular expression and with d being
any character other than <backslash> and <newline>, and any character
other than <slash> needing to have the first 'd' (only!) escaped with
<backslash> as in \dREd ... and <slash> *may* have its first
occurrence escaped.


But,... I'd guess that any reader, who's a bit familiar with sed, would
still be able to realise from your current proposal what's meant.

So, if you wish, keep it as is... from my side.


> > Oh, and if you should change your proposed text,... could you
> > please always
> > make a new post
> 
> I expect that too much will change to make it reasonable to edit in
> place anyway.

*cough* git-based-workflow *cough* O;-)




> > ---
> > --- 
> >  (0005777) calestyo (reporter) - 2022-04-02 19:47
> >  https://austingroupbugs.net/view.php?id=1550#c5777 
> > ---
> > --- 
> > That indented paragraph of yours (in Note 0005775) should (if at
> > all) only
> > go to the Rationale, IMO. At least the part which describes *why*
> > <backslash> and <newline> cannot be used.
> 
> I'll put something 

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-16 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Tue, 2022-04-05 at 23:33 +0700, Robert Elz via austin-group-l at The
Open Group wrote:
> Not just portable, but sane.   Only a moron would actually use . ? *
> [ ( ...
> as a delimiter, there are plenty of perfectly good alternatives
> available
> when good old / isn't the best choice (which it often isn't when
> manipulating path names).   Personally I'm quite fond of ascii BEL
> (^G)
> as the delimiter in the cases when neither / nor ; (my 2 favourites)
> are really available (while BEL probably isn't technically portable,
> it always works in my experience).
> 
> It still needs to be clear that it is possible to be a moron if one
> wants, but in such cases, some things just might not be possible.

I should perhaps add that I don't actually want to use special
characters as delimiters myself. ;-)

My use case is rather writing a function which escapes arbitrary
strings as literals for use in BREs or EREs, respectively, and also for
use in sed commands (thus the delimiter to be used by the user of my
function needs to be considered).

I didn't just want to forbid using any special characters as delimiter,
if it would technically work.
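
A minimal sketch of such a function, for the simplest case only: a BRE
delimited by <slash>. The name bre_escape is hypothetical and not taken
from this thread. It assumes the input contains no <newline>, and that
an escaped '^' or '$' away from an anchor position is treated as a
literal, which sed implementations commonly do. ']' is left alone
because it is ordinary outside a bracket expression.

```shell
# Hypothetical helper (not from the thread): print "$1" escaped for
# literal use in a BRE that is delimited by <slash>.
# Assumes "$1" contains no <newline>.
bre_escape() {
    # Prefix each of \ . * [ ^ $ / with a <backslash>.
    printf '%s\n' "$1" | sed 's/[\\.*[^$/]/\\&/g'
}

# Usage: only the line that contains the literal string is printed.
pattern=$(bre_escape '/usr/lib.d')
printf '%s\n' '/usr/lib.d' '/usr/libxd' | sed -n "/$pattern/p"
```

A special character chosen as the delimiter is deliberately not
handled here, since the proposed text leaves the meaning of its escaped
form unspecified.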


>   | It might be worth altering this somehow, but "literal" is wrong
>   | (specifically if the delimiter is '^' or '-', or things like ':'
> in
>   | [[:alpha:]]).
> 
> That depends upon the context of the word "literal" there - I just
> took
> it to mean that the character would mean the same thing as it would
> if it
> were not also the delimiter, not that it would be deprived of any
> other
> magic properties it might gain by such use.

I like Geoff's choice of "normal" plus the example.

"Literal" and assuming some context (even when explaining it) would
have just made the text again ambiguous or at least more complex to
read.


>   | > => And perhaps something like "should put it inside a bracket
>   | > expression __with no other characters__" to make clear, that
> one
>   | > cannot re-use one e.g. 'sX\X[0-9]XfooX' can NOT be written as
>   | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'.
>   |
>   | Incorrect, sX[X0-9]XfooX is required to "work"
> 
> I think the point there was that it doesn't mean the same thing, in
> that
> one a single char is being substituted, in the others, it is a 2 char
> sequence,
> the delimiter, followed by a digit, not either the delimiter or a
> digit.

I've added my thoughts about the solution Geoff and you came up with in
my upcoming (probably tomorrow) reply to Geoff's longer mail.


Thanks,
Chris.



[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-08 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 

-- 
 (0005790) geoffclare (manager) - 2022-04-08 09:20
 https://austingroupbugs.net/view.php?id=1550#c5790 
-- 
New proposed changes to fix this bug and bug
https://austingroupbugs.net/view.php?id=1551 ...

On page 3134 line 106070 section sed, after:

    ... preceded and followed by a delimiter, usually a <slash>).

add a new paragraph:

    In a context address, any character other than <backslash> or
    <newline> can be specified for use as the delimiter by means of
    the construction "\cREc", where c is the chosen delimiter
    character. The BRE and ERE syntax shall additionally support
    escaping occurrences of the delimiter within the RE by means of an
    escape sequence (see [xref to XBD 9.1]). If the character
    designated by c is not special in a BRE (if the -E option is not
    specified) or ERE (if -E is specified) according to [xref to XBD
    9.3.3] or [xref to XBD 9.4.3], respectively, the escape sequence
    <backslash>c shall be treated as that literal character;
    otherwise, it is unspecified whether the escape sequence
    <backslash>c is treated as the literal character or the special
    character. In either case, the escape sequence <backslash>c shall
    not terminate the RE. The construction "\cREc" does not need to be
    used when the delimiter is a <slash>; for example, the context
    address "\xabc\xdefx" is equivalent to "/abcxdef/".

On page 3134 line 106087 section sed, replace the first bullet item
(beginning "In a context address") with:

    The delimiter character that precedes and follows the RE shall not
    terminate the RE when it appears within a bracket expression, and
    shall have its normal meaning in the bracket expression. For
    example, the context address "/[/]/" is equivalent to "/\//", and
    the command "s-[0-9]--g" is equivalent to "s/[0-9]//g".

On page 3137 line 106204 section sed (s command), change:

    Within the RE and the replacement, the RE delimiter itself can be
    used as a literal character if it is preceded by a <backslash>.

to:

    Within the RE and the replacement, the delimiter shall not
    terminate the RE or replacement if it is the second character of
    an escape sequence (see [xref to XBD 9.1]). If the delimiter
    character is not special in a BRE (if the -E option is not
    specified) or ERE (if -E is specified) according to [xref to XBD
    9.3.3] or [xref to XBD 9.4.3], respectively, the escaped delimiter
    shall be treated as that literal character in the RE; otherwise,
    it is unspecified whether the escaped delimiter is treated as the
    literal character or the special character. Likewise, if the
    delimiter character is not <ampersand> ('&'), the escaped
    delimiter shall be treated as that literal character in the
    replacement; if it is <ampersand>, it is unspecified whether the
    escaped delimiter is treated as the literal character or the
    special character (see below).

On page 3138 line 106253 section sed (y command), change:

    ... the delimiter itself can be used as a literal character if it
    is preceded by a <backslash>. If a <backslash> character is
    immediately followed by a <backslash> character in string1 or
    string2, the two <backslash> characters shall be counted as a
    single literal <backslash> character.

to:

    ... the delimiter itself can be used as a literal character if it
    is preceded by an unescaped <backslash>. If a <backslash>
    character is escaped by an immediately preceding unescaped
    <backslash> character in string1 or string2, the two
 

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-07 Thread Robert Elz via austin-group-l at The Open Group
Date:Thu, 07 Apr 2022 18:15:55 +0700
From:"Robert Elz via austin-group-l at The Open Group" 

Message-ID:  <5473.1649330...@jinx.noi.kre.to>

  |   | e.g. adding:
  |   |
  |   | For example, the context address "\.[.][0-9]." is equivalent
  |   | to "/\.[0-9]/".
  |
  | Looks good to me.

Actually, to make things even clearer, you might want to add to that:

, however with "\.\.[0-9]." it is unspecified which of
"/\.[0-9]/" or "/.[0-9]/" is its equivalent form.

kre
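
The two cases can be tried out as follows; a minimal sketch, assuming a
sed that follows the proposed text (the input lines are made up for
this example). Only the first command has a portable result.

```shell
# Portable: '.' is the delimiter and "[.]" is a literal <period>, so
# this address is "/\.[0-9]/": a <period> followed by a digit.
printf '%s\n' 'v1.2' 'v12' | sed -n '\.[.][0-9].p'

# Not portable: with "\." inside the RE it is unspecified whether the
# address is "/\.[0-9]/" or "/.[0-9]/", so the line "v12" may or may
# not be printed as well.
# printf '%s\n' 'v1.2' 'v12' | sed -n '\.\.[0-9].p'
```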



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-07 Thread Robert Elz via austin-group-l at The Open Group
Date:Thu, 7 Apr 2022 10:37:06 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220407093706.GA7005@localhost>

  | The new definition in bug 1546 is specific to regular expressions
  | (since it talks about the backslash not being in a bracket expression),

Yes, of course.

  | e.g. adding:
  |
  | For example, the context address "\.[.][0-9]." is equivalent
  | to "/\.[0-9]/".

Looks good to me.

kre



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-07 Thread Geoff Clare via austin-group-l at The Open Group
Robert Elz wrote, on 05 Apr 2022:
>
>   | Okay, I'll see what I can do.  It may make sense to use the new
>   | definition of "escape sequence" from bug 1546.
> 
>   | It won't be possible in the y command, as that doesn't use an RE (so
>   | would need its own definition of "escape character").
> 
> I wasn't paying attention to just where any of this was to be placed in
> the final doc, but couldn't the definition of "escape sequence" (and those
> related to it) be somewhere generic?

The new definition in bug 1546 is specific to regular expressions
(since it talks about the backslash not being in a bracket expression),
so it's going in XBD 9.1 with the other "Regular Expression Definitions".

>   | What matters is that the delimiter can only be escaped with an
>   | _unescaped_ backslash, and that it doesn't end the RE when it is in a
>   | bracket expression. I believe my proposal makes both of those things
>   | clear.
> 
> I suspect that the point was more related to when 2-pass parsing is used,
> and an escaped delimiter is seen, does the second pass still see the escaped
> delimiter, or is it now unescaped.   I'm no sed expert (I use it a lot,
> but have never really looked into an implementation, and don't push the
> wacky boundary cases in my uses) - but I believe this is to be explicitly
> unspecified (that is, implementations can do either, and applications must
> not depend upon which is done).

Yes that issue, too, is covered by the proposed text.  I believe it
says everything that needs to be said about behaviour that could be
affected by the number of passes, and nothing needs to be said about
the actual number of passes.

>   | > => And perhaps something like "should put it inside a bracket
>   | > expression __with no other characters__" to make clear, that one
>   | > cannot re-use one e.g. 'sX\X[0-9]XfooX' can NOT be written as
>   | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'.
>   |
>   | Incorrect, sX[X0-9]XfooX is required to "work"
> 
> I think the point there was that it doesn't mean the same thing, in that
> one a single char is being substituted, in the others, it is a 2 char 
> sequence,
> the delimiter, followed by a digit, not either the delimiter or a digit.

Okay, it seems there's a mismatch between the stated problem and the
proposed solution.  The proposed solution implies that the delimiter
character can't be put in a bracket expression with other characters,
even if that's the behaviour the user wants.  Rather than try to come
up with new descriptive words, I think it could be illustrated with
an example, e.g. adding:

For example, the context address "\.[.][0-9]." is equivalent
to "/\.[0-9]/".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-05 Thread Robert Elz via austin-group-l at The Open Group
Date:Tue, 5 Apr 2022 15:54:40 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220405145440.GB6489@localhost>

  | Okay, I'll see what I can do.  It may make sense to use the new
  | definition of "escape sequence" from bug 1546.

  | It won't be possible in the y command, as that doesn't use an RE (so
  | would need its own definition of "escape character").

I wasn't paying attention to just where any of this was to be placed in
the final doc, but couldn't the definition of "escape sequence" (and those
related to it) be somewhere generic?   It might even be worth (since it
is so common) defining a "backslash escape sequence" in XBD - but for
that allow it to have an application specified following sequence (one or
more following characters, as defined by the application), and then for REs
just define only the case for a single following char.

Or just define "escape sequence" and leave it for the application also to
define what the escape character is (are there many that don't use \ though?)

  | What matters is that the delimiter can only be escaped with an
  | _unescaped_ backslash, and that it doesn't end the RE when it is in a
  | bracket expression. I believe my proposal makes both of those things
  | clear.

I suspect that the point was more related to when 2-pass parsing is used,
and an escaped delimiter is seen, does the second pass still see the escaped
delimiter, or is it now unescaped.   I'm no sed expert (I use it a lot,
but have never really looked into an implementation, and don't push the
wacky boundary cases in my uses) - but I believe this is to be explicitly
unspecified (that is, implementations can do either, and applications must
not depend upon which is done).


  | It really is hardly any limitation on applications if they need to
  | avoid using special RE characters as delimiters in order to be portable

I agree.

Not just portable, but sane.   Only a moron would actually use . ? * [ ( ...
as a delimiter, there are plenty of perfectly good alternatives available
when good old / isn't the best choice (which it often isn't when
manipulating path names).   Personally I'm quite fond of ascii BEL (^G)
as the delimiter in the cases when neither / nor ; (my 2 favourites)
are really available (while BEL probably isn't technically portable,
it always works in my experience).

It still needs to be clear that it is possible to be a moron if one
wants, but in such cases, some things just might not be possible.
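
As a small illustration of choosing another delimiter when handling
path names (the paths are made up for this example), both commands
below perform the same substitution:

```shell
# With the default <slash> delimiter every <slash> in the path is escaped.
printf '%s\n' '/usr/local/bin' | sed 's/\/usr\/local/\/opt/'

# With ';' as the delimiter the same substitution needs no escaping.
printf '%s\n' '/usr/local/bin' | sed 's;/usr/local;/opt;'
```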

  | It might be worth altering this somehow, but "literal" is wrong
  | (specifically if the delimiter is '^' or '-', or things like ':' in
  | [[:alpha:]]).

That depends upon the context of the word "literal" there - I just took
it to mean that the character would mean the same thing as it would if it
were not also the delimiter, not that it would be deprived of any other
magic properties it might gain by such use.

  | > => And perhaps something like "should put it inside a bracket
  | > expression __with no other characters__" to make clear, that one
  | > cannot re-use one e.g. 'sX\X[0-9]XfooX' can NOT be written as
  | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'.
  |
  | Incorrect, sX[X0-9]XfooX is required to "work"

I think the point there was that it doesn't mean the same thing, in that
one a single char is being substituted, in the others, it is a 2 char sequence,
the delimiter, followed by a digit, not either the delimiter or a digit.
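
The difference can be seen directly; a minimal sketch with a made-up
input line, assuming a sed in which the delimiter does not terminate
the RE inside a bracket expression:

```shell
# "[X0-9]" matches ONE character: either an 'X' or a digit.
printf '%s\n' 'aX1b' | sed 'sX[X0-9]XfooX'      # afoo1b

# "[X][0-9]" matches TWO characters: an 'X' followed by a digit.
printf '%s\n' 'aX1b' | sed 'sX[X][0-9]XfooX'    # afoob
```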

kre



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-05 Thread Geoff Clare via austin-group-l at The Open Group
Replying to a whole bunch of bugnotes here, including two from 1551.
Together they are very long, so I've only quoted the minimum necessary
to give context to my replies.

> -- 
>  (0005771) calestyo (reporter) - 2022-04-02 01:53
>  https://austingroupbugs.net/view.php?id=1550#c5771 
> -- 

> So maybe, in "Addresses in sed" we should better *only* describe the \cREc
> form of these,... and link to "Regular Expressions in sed" for how
> delimiters are escaped?

I think it works better the way I have it (which you said you could live
with).  The s command and y command need to describe delimiter escaping in
the replacement as well, so I think it makes sense to keep all delimiter
escaping together for those.

> (Ic), (Id) as well as my original (2b) would be fixed, if we'd write
> something like:
> "When the delimiter character c is <slash>, a context address \/RE/ can
> also be written as /RE/." (or something similar but better).
> That would make it clear that \/RE/ is allowed and identical to /RE/ and at
> the same time define /RE/.

I'd suggest just changing the "example" part of my proposal.
I.e. instead of:

For example, the context address "\xabc\xdefx" is equivalent to
"/abcxdef/".

it could say:

The construction "\cREc" does not need to be used when the delimiter
is a <slash>; for example, the context address "\xabc\xdefx" is
equivalent to "/abcxdef/".

> Oh, and if you should change your proposed text,... could you please always
> make a new post

I expect that too much will change to make it reasonable to edit in
place anyway.

> -- 
>  (0005775) kre (reporter) - 2022-04-02 09:37
>  https://austingroupbugs.net/view.php?id=1550#c5775 
> -- 

> Then, in a subsequent sentence, or perhaps even paragraph, say something
> like
> Note: even if escaped, the characters <backslash> and <newline> cannot
> be used as delimiter characters. <backslash> does not work, [...]
> <newline> does not work either, as if not escaped, it [...]

(see below)

> -- 
>  (0005777) calestyo (reporter) - 2022-04-02 19:47
>  https://austingroupbugs.net/view.php?id=1550#c5777 
> -- 
> That indented paragraph of yours (in Note 0005775) should (if at all) only
> go to the Rationale, IMO. At least the part which describes *why*
> <backslash> and <newline> cannot be used.

I'll put something similar in rationale.

> -- 
>  (0005774) kre (reporter) - 2022-04-02 09:15
>  https://austingroupbugs.net/view.php?id=1551#c5774 
> -- 

> About being an editorial change, I agree, but I think it would be better
> if
> it were changed to be "escape character" rather than "unescaped
> <backslash>".

Okay, I'll see what I can do.  It may make sense to use the new
definition of "escape sequence" from bug 1546.

It won't be possible in the y command, as that doesn't use an RE (so
would need its own definition of "escape character").

> -- 
>  (0005780) calestyo (reporter) - 2022-04-05 00:59
>  https://austingroupbugs.net/view.php?id=1551#c5780 
> -- 
[This one has been edited since it was sent to the mailing list, so
the quotes below are copied from Mantis instead of the email.]

> Still, as I propose in https://austingroupbugs.net/view.php?id=1556#c5778
> point (c) I'd make this more clear by directly saying, that sed's
> additions '\n' (for newlines) and '\c' (for escaped delimiter) are -
> with respect to sed, considered part of the RE respectively replacement
> language... and that the whole command string (context address
> respectively s-command) is parsed in one go from left to right.

We can't specify parsing in one go - that's an internal implementation
detail.  In fact, any sed implementation that wants to use the standard
regcomp() and regexec() functions to do the RE matching will need to do
a separate pass to produce the RE to give to regcomp().

What matters is that the delimiter can only be escaped with an
_unescaped_ backslash, and that it doesn't end the RE when it is in a
bracket expression. I believe my proposal makes both of those things
clear.
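
Both points can be illustrated as follows; a minimal sketch with
made-up input, assuming a sed that follows the proposed text:

```shell
# "\\" is an escaped <backslash>, so the <slash> after it is NOT
# escaped and terminates the RE: the RE is "a\\", the replacement "b".
printf '%s\n' 'a\' | sed 's/a\\/b/'                 # b

# Inside a bracket expression the delimiter does not end the RE.
printf '%s\n' 'a/b' 'ab' | sed -n '/[/]/p'          # a/b
printf '%s\n' 'a1b2' | sed 's-[0-9]--g'             # ab
```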

> Apart from GNU's vs. busybox' sed ... is it known whether any current
> (= not older than 5 years and still maintained) sed implementations
> differ in that behaviour?
> 
> BusyBox sed may simply change its behaviour (if persuaded ;-) )...
> I think they usually try to follow GNU... and so the difference might
> be simply some implementation coincidence.
> 
> In note 

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Lawrence Velázquez via austin-group-l at The Open Group
On Sat, Apr 2, 2022, at 11:34 PM, Christoph Anton Mitterer via austin-group-l 
at The Open Group wrote:
> Apparently some sed implementations use \+ (and friends) in BREs like +
> EREs, while some use it as the literal + .

The behavior of things like \+ is explicitly undefined by the BRE
specification; sed scripts that use such things are nonportable by
definition.  There is nothing wrong with implementations disagreeing
on behavior not addressed by the standard.
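
For a script that needs "one or more" in a BRE, an interval expression
avoids the undefined \+; a minimal sketch with made-up input lines:

```shell
# "\{1,\}" is the portable BRE spelling of "one or more".
printf '%s\n' 'b' 'ab' 'aab' | sed -n '/^a\{1,\}b$/p'

# Not portable: "\+" is undefined in a BRE, so implementations may
# treat it as "one or more" or as a literal '+'.
# printf '%s\n' 'b' 'ab' 'aab' | sed -n '/^a\+b$/p'
```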

> Some uudecode implementations use -o - as stdout, some as the file '-'.
> (which would also be one example, where one might get hit, even if one
> fully adhered to POSIX, in which previously -o - would have been the
> file '-').

As I understand it, these implementations are already not compliant
with a plain reading of the standard.  Perhaps this could have been
avoided if the standard had been clearer, but ultimately this is
an implementation issue.  POSIX cannot *force* implementations to
be compliant, no matter how comprehensive the standard gets.

> People adapt typically to what's been extended by "their"
> implementation... and things quickly get non-portable.

It is up to developers to avoid using nonportable extensions if
they wish to be portable.

The logical conclusion to your line of thought is that POSIX should
enforce portability at the implementation level by leaving nothing
undefined and demanding that compliant implementations not draw
outside the lines, even a little bit.  This is obviously untenable.

-- 
vq



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Sun, 2022-04-03 at 05:51 +0300, Oğuz wrote:
> On Sunday, 3 April 2022, Christoph Anton Mitterer via austin-
> group-l at The Open Group wrote:
> > But many of those extensions made by implementations in areas where
> > POSIX doesn't define things, cause IMO quite some trouble in
> > practise.
> > 
> 
> Such as? 


Apparently some sed implementations use \+ (and friends) in BREs like +
EREs, while some use it as the literal + .


Some uudecode implementations use -o - as stdout, some as the file '-'.
(which would also be one example, where one might get hit, even if one
fully adhered to POSIX, in which previously -o - would have been the
file '-').

I guess one could easily go on with that list for quite a while.

People adapt typically to what's been extended by "their"
implementation... and things quickly get non-portable.


Cheers,
Chris.



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Oğuz via austin-group-l at The Open Group
On Sunday, 3 April 2022, Christoph Anton Mitterer via austin-group-l at
The Open Group wrote:
>
> But many of those extensions made by implementations in areas where
> POSIX doesn't define things, cause IMO quite some trouble in practise.
>

Such as?


-- 
Oğuz


Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Sun, 2022-04-03 at 08:41 +0700, Robert Elz via austin-group-l at The
Open Group wrote:
> Actually there's no reason to forbid them, they simply do
> not work.   Applications cannot expect them to work.
> That's all that needs to be said.

Well then let's not call it forbid, but - as already the case (also in
Geoff's proposal) - use a wording like "other than <backslash> or
<newline>"

That doesn't forbid it, but leaves the "complex" details on *why* this
is so, to the interested reader in the rationale.


> That's how innovation happens.

Well it's not that I'd have anything against innovation (I mean I do
love PCRE).

But many of those extensions made by implementations in areas where
POSIX doesn't define things, cause IMO quite some trouble in practise.
Not rarely even when the user just uses what POSIX actually does
define.

Just take the mess with `locale` in the shell command language. Many
subtle differences... basically no chance to ever reconcile them.



btw: For those interested: I've made a PDF for just sed, based on the
draft, but with colourisations for the additions/removals of Geoff's
proposal from:
https://austingroupbugs.net/view.php?id=1550#c5761
(my mind got twisted, when I tried to do it via the "replace that on
page x line y with..." ^^)

I'd share it if anyone who's interested in those issues has an
interest... but I have no idea whether I'd break some copyright or
so... :-/


Cheers,
Chris.



Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Robert Elz via austin-group-l at The Open Group
Date:Sat, 2 Apr 2022 19:47:44 +
From:Austin Group Bug Tracker 
Message-ID:  

  | That indented paragraph of yours (in Note 0005775) should (if at all)
  | only go to the Rationale, IMO.

That's fine.

  | Cause for the purpose of the standard itself it's not really relevant
  | *why* they mustn't be used, but only *that* this is the case.

Actually there's no reason to forbid them, they simply do
not work.   Applications cannot expect them to work.
That's all that needs to be said.

  | Also, I'd rather really forbid their use,

That doesn't work for your intended purpose, as implementations
are permitted to extend the standard -- to give interpretations
to inputs that have either unspecified results, or which applications
shall not use.

To prevent that the standard would need to give an interpretation
to what these sequences must do, but I kind of doubt that in this
area there's an actual established standard to do that (do
remember that the purpose here is to document the existing
standard, not to define (invent) what that should be).

  | Simply because otherwise a (crazy) implementation might try to
  | "workaround" that limitation in some odd way, which again makes it
  | non-portable. 

That's fine, just an extension.  Those exist all over the place.
We neither really can, nor should, attempt to prevent that.  That's
how innovation happens.  Without that we get revolution instead.
Innovation allows applications that restrict themselves to the
standard idioms to continue to function.  Revolution does not.

kre



[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 

-- 
 (0005777) calestyo (reporter) - 2022-04-02 19:47
 https://austingroupbugs.net/view.php?id=1550#c5777 
-- 
That indented paragraph of yours (in Note 0005775) should (if at all) only
go to the Rationale, IMO. At least the part which describes *why*
<backslash> and <newline> cannot be used.

Cause for the purpose of the standard itself it's not really relevant *why*
they mustn't be used, but only *that* this is the case.

Also, I'd rather really forbid their use, instead of just explaining why it
wouldn't work anyway.

Simply because otherwise a (crazy) implementation might try to
"workaround" that limitation in some odd way, which again makes it
non-portable. 

Issue History 
Date ModifiedUsername   FieldChange   
== 
2022-01-14 05:32 calestyo   New Issue
2022-01-14 05:32 calestyo   Name  => Christoph Anton
Mitterer
2022-01-14 05:32 calestyo   Section   => Utilities, sed  
2022-01-14 05:32 calestyo   Page Number   => 3132, ff. (in the
draft)
2022-01-14 05:32 calestyo   Line Number   => see below   
2022-01-14 05:40 calestyo   Note Added: 0005601  
2022-01-14 06:34 Don Cragun Relationship added   related to 0001551  
2022-01-14 06:52 Don Cragun Project 
1003.1(2016/18)/Issue7+TC2 => Issue 8 drafts
2022-01-14 06:54 Don Cragun Note Added: 0005603  
2022-01-14 06:54 Don Cragun version   => Draft 2.1   
2022-03-18 11:15 geoffclare Note Added: 0005756  
2022-03-18 11:15 geoffclare Note Edited: 0005756 
2022-03-25 16:18 geoffclare Note Added: 0005761  
2022-03-25 16:22 geoffclare Note Edited: 0005761 
2022-03-26 00:08 calestyo   Note Added: 0005767  
2022-03-26 00:34 calestyo   Note Edited: 0005767 
2022-03-31 16:00 nick   Relationship added   related to 0001556  
2022-04-02 01:53 calestyo   Note Added: 0005771  
2022-04-02 02:30 calestyo   Note Added: 0005772  
2022-04-02 09:37 kre        Note Added: 0005775  
2022-04-02 19:47 calestyo   Note Added: 0005777  
==




[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-02 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-02 09:37 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 

-- 
 (0005775) kre (reporter) - 2022-04-02 09:37
 https://austingroupbugs.net/view.php?id=1550#c5775 
-- 
Rather than making the spec for delimiters ban the use of backslash
or newline, I'd prefer to simply say "Any character ..." and leave the
basic definition at that.

Then, in a subsequent sentence, or perhaps even paragraph, say something
like

    Note: even if escaped, the characters <backslash> and <newline> cannot
    be used as delimiter characters. <backslash> does not work, as if
    unescaped later in the RE, it either becomes the escape character,
    in which case its purpose is to escape the following character, which
    can then never be the ending delimiter, or it forms part of a bracket
    expression, inside which the ending delimiter for the RE cannot be
    located. <newline> does not work either, as if not escaped, it
    is removed, and terminates the command, meaning it cannot be the
    ending delimiter, and if escaped, cannot be the ending delimiter either.
    Hence use of either of these characters as a delimiter makes it
    impossible to supply the required ending delimiter.

That both removes the \ and \n from being odd special cases, and explains
just why they don't (can't) work as delimiter characters (without requiring
any extra text to explain what the implementation should do should the user
attempt to do such a thing).


[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-01 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-02 02:30 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 

-- 
 (0005772) calestyo (reporter) - 2022-04-02 02:30
 https://austingroupbugs.net/view.php?id=1550#c5772 
-- 
What I write above in (1b) kinda also applies to the newly added paragraph
in "Regular Expressions in sed":
   "The delimiter character that precedes..."
and its counterpart in the s-command:
   "Within the RE and the replacement, the delimiter shall not terminate"

Conceptually they match the new sentence:
   "In either case, the escape sequence <backslash>c shall
not terminate the RE. For example"
in "Addresses in sed"...

... all describe, AFAIU, two things:

- the delimiter character within an RE bracket expression is never used as a
delimiter (and thus escaping it in the bracket expression would actually
not be escaping, but the literal \)

and:

- otherwise, an escaped delimiter (where the escaping \ is itself not
escaped)... is not a delimiter (but something else... which depends on the
character, BRE vs ERE,... and (what I hate:) the implementation)

and both of these for the s-command AND context addresses.
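
The second case (an escaped delimiter does not terminate anything) can be
tried out with a minimal shell sketch. The sample inputs and variable names
are made up for illustration. The delimiter used is a comma, which is not
special in a BRE, so the sketch stays clear of the implementation-dependent
part; the bracket-expression case is left out here.

```shell
# Hypothetical sample data; ',' is the delimiter and is not special in a BRE.

# In the RE: '\,' does not terminate the RE and matches a literal comma.
re_result=$(printf 'a,b\n' | sed 's,a\,b,X,')
echo "$re_result"     # X

# In the replacement: '\,' does not terminate it and yields a literal comma.
repl_result=$(printf 'a\n' | sed 's,a,b\,c,')
echo "$repl_result"   # b,c

# In a context address using the same delimiter via the \cREc form.
addr_result=$(printf 'a,b\nab\n' | sed -n '\,a\,b,p')
echo "$addr_result"   # a,b
```

With a delimiter that is special in an RE (for example '.' or '*'), the
meaning of the escaped delimiter is exactly the part that depends on the
implementation, so no output is shown for that.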





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-01 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-04-02 01:53 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 

-- 
 (0005771) calestyo (reporter) - 2022-04-02 01:53
 https://austingroupbugs.net/view.php?id=1550#c5771 
-- 
Okay I finally had a chance to look into
https://austingroupbugs.net/view.php?id=1550#c5761:

I'll try to add to this ticket only those review parts of your proposal,
that really affect this ticket... and everything else to the other
tickets.


I) Regarding your new paragraph "In a context address, any character…"

a) I think this (its first sentence) basically fixes my original point (1)
of this ticket.


b) The whole paragraph does however add some mess to the subchapters:
I know that it was me who suggested that originally (shame on me)... but
now that I see it... it seems I was wrong (though I could live with it).

Everything from its 2nd sentence ("The BRE and ERE...") describes rather
the behaviour of context address delimiters in REs, yet it's found in
"Addresses in sed" rather than "Regular Expressions in sed".

Content similar to that in this paragraph is found in the one you add to the
s-command ("Within the RE and the replacement...").

So maybe, in "Addresses in sed" we should better *only* describe the \cREc
form of these,... and link to "Regular Expressions in sed" for how
delimiters are escaped? One could then also try to remove the respective
parts from the s-command and unify both in "Regular Expressions in sed".
The only difficulty is perhaps that for the s-command there's also the
ambiguity with & in the replacement; that part should perhaps stay in the
s-command.


c) Also, right now (which may be on purpose, however)... \/RE/ would be
allowed, and identical to /RE/.
This is basically my original point (2b)


d) Right now, it does not really seem to be specified that / alone as
delimiter need not be escaped in a context address; there's merely the
"preceded and followed by a delimiter, usually a <slash>" left, but that
alone could also mean that the \ needs to be added on the first / .

(Ic), (Id) as well as my original (2b) would be fixed, if we'd write
something like:
"When the delimiter character c is <slash>, a context address \/RE/ can
also be written as /RE/." (or something similar but better).
That would make it clear that \/RE/ is allowed and identical to /RE/ and at
the same time define /RE/.



II) Regarding your removal/replacement of the first bullet item (beginning
"In a context address") on page 3134 line 106087.
- fixes my original points (2a) and (2c).
- fixes the problem you found with "shall be identical"



My previous notes in https://austingroupbugs.net/view.php?id=1550#c5767
should now be all obsolete or I've mentioned them again in this post.


Further comments to your proposed text in the other issues.

Oh, and if you should change your proposed text,... could you please always
make a new post, and perhaps add a version number or so? That way one can
diff a new version with an already "reviewed" one more easily, and also
refer to it more easily. Thanks!
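
For points (Ic) and (Id) above, a minimal shell sketch of the two spellings
in question. The input lines and variable names are hypothetical. The sketch
only shows what a sed that accepts \/RE/ does with it; whether the standard
text actually guarantees this is the open question raised in those points.

```shell
# Hypothetical sample input: one line that matches 'foo', one that does not.

# Usual form, with <slash> as the delimiter and no leading backslash.
slash_form=$(printf 'foo\nbar\n' | sed -n '/foo/p')
echo "$slash_form"

# \cREc form with c chosen to be <slash>.
explicit_form=$(printf 'foo\nbar\n' | sed -n '\/foo/p')
echo "$explicit_form"
```

If both commands select the same line, the implementation treats \/RE/ as
identical to /RE/, which is what the proposed sentence would require.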


[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-03-31 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been set as RELATED TO issue 0001556. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-03-26 00:08 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
related to  0001556 clarify meaning of \n used in a bracket...
== 





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-03-25 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-03-26 00:08 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
== 

-- 
 (0005767) calestyo (reporter) - 2022-03-26 00:08
 https://austingroupbugs.net/view.php?id=1550#c5767 
-- 
Re: https://austingroupbugs.net/view.php?id=1550#c5756

your (1): agreed

your (2a): agreed, and you're right about the issue with "shall be
identical"


your (2b):
I wouldn't agree here because...

In "where c is any character other than <backslash> or <newline>", the idea
behind exclusion of <backslash> and <newline> is, AFAIU, rather a *general*
exclusion, that is in the sense of "neither of these two characters can
ever be delimiters".

I wouldn't have understood that sentence in the sense of "any c except
<backslash> and <newline> have to use the \cREc form" but rather as "any c
have to use the \cREc form and <backslash> or <newline> cannot be used at
all".

I agree, that the wording of the sentence actually describes the former,...
but I think the spirit that it means is rather the latter (in which case it
would be open whether c is any character or any character except (forward)
<slash>).

If not, the question would be open as to how <backslash> or <newline> would
be encoded as delimiters.


your (2c):
Well I agree it's "semi-clear" (especially because of the given example)...
still adding a short clarification as in:
"If any but the first occurrence of the character designated by c appears
following a <backslash>"
shouldn't cause much harm and make it really definite.

I personally would recommend against dealing with that in
https://austingroupbugs.net/view.php?id=1551 .

While they touch the same section, this one here is merely a subtle
cosmetic change, which doesn't really affect the deeper semantics as most
of #1551 does. 





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-03-25 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-03-25 16:18 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
== 

-- 
 (0005761) geoffclare (manager) - 2022-03-25 16:18
 https://austingroupbugs.net/view.php?id=1550#c5761 
-- 
Proposed changes to fix this bug and bug
https://austingroupbugs.net/view.php?id=1551 (I was unable to
disentangle the changes into two clean separate edits):

On page 3134 line 106070 section sed, after:

    ... preceded and followed by a delimiter, usually a <slash>).

add a new paragraph:

    In a context address, any character other than <backslash> or
    <newline> can be specified for use as the delimiter by means of the
    construction "\cREc", where c is the chosen delimiter character. The
    BRE and ERE syntax shall additionally support escaping occurrences of
    the delimiter within the RE with an unescaped <backslash> (except
    inside a bracket expression). If the character designated by c is not
    special in a BRE or ERE according to [xref to XBD 9.3] or [xref to
    XBD 9.4], respectively, the escape sequence <backslash>c shall be
    treated as that literal character; otherwise, it is unspecified
    whether the escape sequence <backslash>c is treated as the literal
    character or the special character. In either case, the escape
    sequence <backslash>c shall not terminate the RE. For example, the
    context address "\xabc\xdefx" is equivalent to "/abcxdef/".

On page 3134 line 106087 section sed, replace the first bullet item
(beginning "In a context address") with:

    The delimiter character that precedes and follows the RE shall not
    terminate the RE when it appears within a bracket expression. For
    example, the context address "/[/]/" is equivalent to "/\//".

On page 3137 line 106204 section sed (s command), change:

    Within the RE and the replacement, the RE delimiter itself can be
    used as a literal character if it is preceded by a <backslash>.

to:

    Within the RE and the replacement, the delimiter shall not terminate
    the RE or replacement if it is preceded by an unescaped <backslash>
    (that is not inside a bracket expression in the RE, where the
    delimiter does not terminate the RE anyway - see [xref to Regular
    Expressions in sed]). If the delimiter character is not special in a
    BRE or ERE according to [xref to XBD 9.3] or [xref to XBD 9.4],
    respectively, the escape sequence <backslash>delimiter shall be
    treated as that literal character in the RE; otherwise, it is
    unspecified whether the escape sequence <backslash>delimiter is
    treated as the literal character or the special character. Likewise,
    if the delimiter character is not <ampersand> ('&'), the escape
    sequence <backslash>delimiter shall be treated as that literal
    character in the replacement; if it is <ampersand>, it is unspecified
    whether the escape sequence <backslash>delimiter is treated as the
    literal character or the special character (see below).

On page 3138 line 106253 section sed (y command), change:

    ... the delimiter itself can be used as a literal character if it is
    preceded by a <backslash>. If a <backslash> character is immediately
    followed by a <backslash> character in string1 or string2, the two
    <backslash> characters shall be counted as a single literal
    <backslash> character.

to:

    ... the delimiter itself can be used as a literal character if it is
    preceded by an unescaped <backslash>. If a <backslash> character is
    escaped by an immediately preceding unescaped <backslash> character
    in string1 or string2, the two <backslash> characters shall be
    treated as a single literal <backslash> character.

On page 3138 line 106278 section sed, add a new paragraph to APPLICATION
USAGE:

    Applications that use a special RE character as 
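
The example from the proposed context-address paragraph, and the escaped
delimiter in the y command, can be checked with a minimal shell sketch. The
input lines and variable names are hypothetical and only serve the
illustration; the delimiters used ('x' and '/') are not special RE
characters, so none of the unspecified cases are touched.

```shell
# Context address with 'x' as delimiter: '\x' inside the RE is a literal 'x'
# and does not terminate the RE.
custom=$(printf 'abcxdef\nabcdef\n' | sed -n '\xabc\xdefxp')
echo "$custom"     # abcxdef

# The same address written with the usual <slash> delimiter.
default=$(printf 'abcxdef\nabcdef\n' | sed -n '/abcxdef/p')
echo "$default"    # abcxdef

# y command: an escaped delimiter in string1 is a literal <slash>.
y_result=$(printf 'a/b\n' | sed 'y/\//|/')
echo "$y_result"   # a|b
```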

[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-03-18 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-03-18 11:15 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
== 

-- 
 (0005756) geoffclare (manager) - 2022-03-18 11:15
 https://austingroupbugs.net/view.php?id=1550#c5756 
-- 
Taking the points raised in turn:

1) Since there is already a cross-reference to "Regular Expressions in
sed", rather than adding a second in the same sentence, I would rearrange
the sentence, e.g.:

    ... (which consists of an RE, preceded and followed by a
    delimiter--usually a <slash>--as described in Regular
    Expressions in sed).

2a) I would put the reason for the construction up front instead of
later:

    In a context address, a delimiter other than <slash> can
    be used by means of the construction ...

I also think "shall be identical" is wrong because that implies that
<slash> still needs to be escaped in the RE even if "c" is a different
character.

2b) It is already perfectly clear that "c" can be any character other than
<backslash> or <newline>. Which means it can be <slash>. No change needed.

2c) By a strict reading, you are right, although I think the intention is
clear. Any fix for this would overlap with bug
https://austingroupbugs.net/view.php?id=1551 so is probably
best addressed there.

I agree with the final comment about moving that bullet item. The need to
escape the delimiter is described in the s and y commands, not here, so it
seems odd that it is here for context addresses instead of in the
description of addresses. 





[Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-01-13 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been UPDATED. 
== 
https://austingroupbugs.net/view.php?id=1550 
== 
Reported By:calestyo
Assigned To:
== 
Project:Issue 8 drafts
Issue ID:   1550
Category:   Shell and Utilities
Type:   Enhancement Request
Severity:   Editorial
Priority:   normal
Status: New
Name:   Christoph Anton Mitterer 
Organization:
User Reference:  
Section:Utilities, sed 
Page Number:3132, ff. (in the draft) 
Line Number:see below 
Final Accepted Text: 
== 
Date Submitted: 2022-01-14 05:32 UTC
Last Modified:  2022-01-14 06:54 UTC
== 
Summary:clarifications/ambiguities in the description of
context addresses and their delimiters for sed
==
Relationships   ID  Summary
--
related to  0001551 sed: ambiguities in the how BREs/EREs a...
== 

-- 
 (0005603) Don Cragun (manager) - 2022-01-14 06:54
 https://austingroupbugs.net/view.php?id=1550#c5603 
-- 
This was originally filed against the Issue 7 + TC2 project, but the page
and line numbers are from Issue 8 draft 2.1.  It has been moved to the
Issue 8 project. 
