Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

It is not "some sensible \u sequences" alone. First off, there's little
agreement on what constitutes 'sensible'. As one example, just the headache
of the U+0300 diacritics adds to XBD6 significantly, if they're to be
supported. The 'sensible' present solution is to not support them at all;
others will argue the 'sensible' thing is to support them because Unicode does
include these code points. The headache stems from the fact that it is not
simply a matter of arbitrarily saying "let's have the utility support these in
$''"; it's ensuring there are interfaces for the utilities to be written in
that understand left-associative combining sequences, and that these
interfaces are portable because requirements in XBD add that support.

On Thursday, July 30, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1127836834.9524758.1596121054...@mail.yahoo.com>:
 |Yes, the additions necessary still for even limited Unicode support \
 |above the broken bandaids C11+ provide are one of those issues. Where \
 |Unicode is incompatible with POSIX, and is therefore (by design) broken \
 |too needs addressing also. The white papers detailing most of these \
 |changes have yet to be written, or published if some have been.

Hmm, the ISO C reference is of course true.  But then this is
about Unix/POSIX shells, and then adding some sensible \u
sequences and defining their conversion to locale charset can only
be an improvement, i think.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Chet Ramey
On 7/30/20 7:29 PM, Robert Elz wrote:

>   | And for that it would be tremendous if $'' would be defined so
>   | that it can be used as the sole quoting mechanism,
> 
> No thanks.   Partly because $'' is already implemented (widely)
> and used (perhaps slightly less yet) - so that ship has sailed.
> 
> I believe I've seen $" ... " used that way somewhere though (don't
> recall where) and I believe it is a mistake.

None of the existing implementations of $"..." use it in that way.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Robert Elz
Date: Thu, 30 Jul 2020 15:53:53 +0200
From: Steffen Nurpmeso
Message-ID:  <20200730135353.qwslp%stef...@sdaoden.eu>


  | The problem being that what is in the wild does not work out for
  | many languages.

I admit to not knowing a lot of the internationalisation issues,
or of unicode, but I don't understand this at all.

The quoting mechanisms in the shell provide a means to create
specific bit patterns to assign to variables, pass as parameters
to programs, etc.   I don't see that the mechanism by which they're
encoded in the sh language should matter all that much, the same
thing could be read from a file instead ( var=$(cat file) ) in which
case the shell spec has no control over the bit patterns at all.

Of course the quoting mechanisms make a difference to the ease of
use for the sh programmer, but that's an entirely different issue.

  | The in-use shell quote pattern consisting of small, isolated parts
  | which depend on which kind of escaping and expanding is necessary
  | just does not work out for many languages.

Can you give an example of something which cannot be done (assuming
$'' as currently intended to be specified)?   Note: not an example of
someone using the mechanisms to do the wrong thing - there are zillions
of ways to write bad code, but an example of something which cannot be
done correctly as specified.   Then we'll see if that really matters.

  |   ? echo Don"'"t you worry$'\x21' The sun shines on us. $'\u263A'
  |
  | The latter is what i mean.  There are many languages in this world
  | where these \u expansions do not work out that way, but where the
  | "entire sentence must be interpreted as a unity" in order to get
  | the iconv(3) conversion to nl_langinfo(CODESET) correct, aka
  | the way it is _desired_.

Surely this depends upon how the shell works - if the shell is attempting
to convert just the \u escape into some other codeset, I can see your point,
but it doesn't need to work like that - it can work internally in 10646
code points (whether encoded in 16 or 32 bit values, or as UTF-8), and
only convert to the desired charset when actually used (that is, when
about to run "echo", at which point the entire string is available).

In any case, if the user has specified a specific unicode code point,
shouldn't that always be what is generated, regardless of whether it
makes sense or not?

  | And for that it would be tremendous if $'' would be defined so
  | that it can be used as the sole quoting mechanism,

No thanks.   Partly because $'' is already implemented (widely)
and used (perhaps slightly less yet) - so that ship has sailed.

I believe I've seen $" ... " used that way somewhere though (don't
recall where) and I believe it is a mistake.

As soon as you have multiple different types of expansions that
can occur, there are problems with which one gets priority, which
is performed first.   So, assuming there is a $"..." which works
as you desire, what happens with

$"${VAR+foo\x7Dbar}"

Do we get foo}bar or foobar} ?   (assuming VAR was set of course).
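
The two outcomes can be reproduced today by emulating each ordering by hand.
This is only a sketch: the $"..." form under discussion is hypothetical, the
variable names are made up for illustration, and printf %b is used as a
stand-in decoder (it takes octal escapes, so \0175 plays the role of \x7D).

```shell
VAR=set

# Ordering A: decode the escape to "}" first, then do parameter expansion.
# The expansion then sees ${VAR+foo} followed by the literal text bar}.
decode_first="${VAR+foo}bar}"

# Ordering B: do parameter expansion first (the word is foo\0175bar), then
# decode the escape in the result.
expand_first=$(printf '%b' "${VAR+foo\0175bar}")

printf '%s\n' "$decode_first" "$expand_first"
```

Neither result is wrong in itself, which is the point being made: a combined
quoting form would have to pick one ordering and document it.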

Whichever way you pick, there will be arguments for doing it
the other way, in some other case.   This stuff simply becomes
a mess.   Please, don't go there.   If we wanted to add C type
encodings along with the others, we'd need to do it in a way that
is consistent with the other expansions, perhaps using something
like $[x7D] or $[u263A] or $[n] (but no, this is not a serious
suggestion).

And I cannot fathom how this in any way overcomes your earlier
objection, quoted strings in sh are not units, they're simply
pieces of some longer word (or can be) - your Don"'"t example
above (and the worry$'\x21') are both examples of that.
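
A small sketch of that last point, using only quoting forms that are already
standard (the variable names are illustrative): adjacent pieces with
different quoting are joined into one word before the command ever sees them.

```shell
# Unquoted, double-quoted, unquoted, single-quoted and unquoted pieces with
# no unquoted blanks between them: the shell delivers a single word.
set -- Don"'"t' you 'worry
pieces=$#
word=$1

printf '%s word: %s\n' "$pieces" "$word"
```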

kre




Re: More issues with pattern matching

2020-07-30 Thread Harald van Dijk

On 26/09/2019 10:20, Geoff Clare wrote:
> Geoff Clare  wrote, on 26 Sep 2019:
>>> Are shells required to support this, and are shells therefore implicitly
>>> required to translate patterns to regular expressions, or should it be okay
>>> to implement this with single character support only?
>>
>> Shells are required to support it.  They don't need to translate
>> entire patterns to regular expressions - they can use either
>> regcomp()+regexec() or fnmatch() to see if the bracket expression
>> matches the next character.
>
> Sorry, I should have written "matches *at* the next character" here;
> I didn't mean to imply checking against a single character.
>
> For example, if using regcomp()+regexec() the shell could try to
> match the bracket expression against the remainder of the string and
> see how much of it regexec() reported as matching.  To use fnmatch()
> I suppose you would have to use it in a loop, passing it first one
> character, then two, etc. (stopping at the number of characters
> between the '.'s).


As I had replied at the time, it is fundamentally impossible in the 
general case as POSIX does not provide any mechanism to escape 
characters and there is nothing in POSIX that rules out the possibility 
of a collating element containing "=]" or ".]".


However, ignoring that aspect of it, looking at implementing this once 
again, implementing it the way you specified is incorrect, fixing it to 
make it correct cannot possibly be done efficiently with standard 
library support, and shells in general don't bother to implement what 
POSIX specifies here.


Take the previous example, glibc's cy_GB.UTF-8 locale, but with a 
different collating element: in this locale, "dd" is a single collating 
element too. Therefore, this must be matchable by bracket expressions. 
However, "d" individually must *also* be matched by pattern expressions. 
"dd" can be matched by both [!x] and [!x][!x]. A shell cannot use 
regcomp()+regexec() to find the longest match for [!x] and assume that 
that is matched: a shell where


  case dd in [!x]d) echo match ;; esac

does not print "match" does not implement what POSIX requires. A shell where

  case dd in [!x]) echo match ;; esac

does not print "match" does not implement what POSIX requires either. 
Using regcomp()+regexec() to bind [!x] to either "d" or "dd" without 
taking the rest of the pattern into account will fail to match in one of 
these cases. And it needn't be the same way for all bracket expressions 
in a single pattern:


  case ddd in [!x][!x]) echo match ;; esac

Shells are required by POSIX to consider both the possibility that [!x] 
picks up "d" and that it picks up "dd" for each bracket expression 
individually. This means that in the worst case, if every bracket 
expression in a pattern has X ways to match, and a pattern has Y bracket 
expressions, the shell is required to consider X^Y possibilities. This 
is completely unreasonable and it's obvious why no shell actually does 
this. The complexity can be reduced in theory, but POSIX does not expose 
enough information to allow that to be implemented in a shell. The only 
way around this mess is by translating the whole pattern to a regular 
expression, as only the C library has enough detailed knowledge about 
the locale that it can implement it efficiently.[*] Doing that has its 
own new set of problems though: translating the whole pattern to a 
regular expression means the shell no longer has the option to decide 
how to handle invalid byte sequences (byte sequences that lead to 
EILSEQ) that shells in general try to tolerate, and the shell no longer 
has the option to decide how to handle invalid patterns (patterns 
containing non-existent character classes or collating elements) which 
shells in general also aim to tolerate.
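
The search the shell would have to perform can be sketched as follows. This is
an illustration only, under loud assumptions: brackets_match is a made-up
helper, and it hard-codes "d" and "dd" as the two collating elements a
bracket expression such as [!x] may consume, because POSIX gives a shell no
portable way to ask the locale for that list.

```shell
# brackets_match STRING COUNT
# Succeeds if STRING can be split into exactly COUNT pieces, each being
# either "dd" or "d", i.e. if COUNT consecutive [!x]-style bracket
# expressions could match all of STRING.
brackets_match() {
    if [ "$2" -eq 0 ]; then
        # No bracket expressions left: match only if nothing remains.
        if [ -z "$1" ]; then return 0; else return 1; fi
    fi
    # Possibility 1: this bracket expression takes the element "dd".
    case $1 in
        dd*) brackets_match "${1#dd}" "$(($2 - 1))" && return 0 ;;
    esac
    # Possibility 2: it takes the single character "d".
    case $1 in
        d*) brackets_match "${1#d}" "$(($2 - 1))" && return 0 ;;
    esac
    return 1
}

brackets_match dd 1   && one=match   || one=nomatch     # like: dd   in [!x]
brackets_match ddd 2  && two=match   || two=nomatch     # like: ddd  in [!x][!x]
brackets_match dddd 1 && three=match || three=nomatch   # like: dddd in [!x]

printf '%s %s %s\n' "$one" "$two" "$three"
```

With two choices per bracket expression, the worst case explores 2^COUNT
paths, which is the X^Y growth described above.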


Cheers,
Harald van Dijk

[*] I have not investigated whether implementations actually do do this 
efficiently.




Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Steffen Nurpmeso
shwaresyst wrote in
 <1127836834.9524758.1596121054...@mail.yahoo.com>:
 |Yes, the additions necessary still for even limited Unicode support \
 |above the broken bandaids C11+ provide are one of those issues. Where \
 |Unicode is incompatible with POSIX, and is therefore (by design) broken \
 |too needs addressing also. The white papers detailing most of these \
 |changes have yet to be written, or published if some have been.

Hmm, the ISO C reference is of course true.  But then this is
about Unix/POSIX shells, and then adding some sensible \u
sequences and defining their conversion to locale charset can only
be an improvement, i think.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Steffen Nurpmeso
David A. Wheeler wrote:
 |Steffen Nurpmeso  wrote:
 |>> And for that it would be tremendous if $'' would be defined so
 |>> that it can be used as the sole quoting mechanism, and that would
 |>> then also include expansion of $VAR (i use \$VAR or \${VAR} in my
 |>> mailer).  But to know exactly how problematic splitting of quotes
 |>> is for many languages of the world, including right-to-left
 |>> direction and shift state changes etc., and changing of meaning as
 |>> such if the sentence cannot be interpreted as a unity, a real
 |>> expert had to be asked.  Anyhow, the Unicode effort mandates
 |>> processing of entire strings and denotes isolated treatment as
 |>> a complete error.
 |
 |I think eliminating old quoting mechanisms would be a mistake.

That is an unfortunate misunderstanding, sorry.  I do not want to
obsolete them from the standard side.  All i would like to see is
that $'' gets the few tweaks it needs to include the possibilities
of the other quoting mechanisms, and in effect this is only ""
($VAR and `` thereof).  This is because (a) like that the entire
string expansion can be fed into iconv(3), and (b) because i think
for users, and for program/script source audit, it is much easier
to grasp than having the need to sequence it, for example to embed
a $VAR expansion into a string.

 |On Thu, 30 Jul 2020 16:09:56 +0200, Joerg Schilling  wrote:
 |> Even if it would become part of the standard today, you still would need
 |> to wait some years until all implementations take it up.
 |
 |That's true for almost all standards changes.
 |However, many shells *already* implement $'...'.
 |It's also relatively trivial to implement, and it provides
 |very useful capabilities (such as the ability to easily assign terminating \
 |newlines).
 |
 |I'd still like to see the addition of $'...'.

Me too, i am all in favour of $'', and i hope it is not because of
me that issue 249 is still open.  It is anyway implemented the way
it is as of today!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Steffen Nurpmeso
Joerg Schilling wrote in
 <5f22d4b4.8vf9+w1hbegjrn1d%joerg.schill...@fokus.fraunhofer.de>:
 |Steffen Nurpmeso  wrote:
 |
 |> And for that it would be tremendous if $'' would be defined so
 |> that it can be used as the sole quoting mechanism, and that would
 |> then also include expansion of $VAR (i use \$VAR or \${VAR} in my
 |> mailer).  But to know exactly how problematic splitting of quotes
 |> is for many languages of the world, including right-to-left
 |> direction and shift state changes etc., and changing of meaning as
 |> such if the sentence cannot be interpreted as a unity, a real
 |> expert had to be asked.  Anyhow, the Unicode effort mandates
 |> processing of entire strings and denotes isolated treatment as
 |> a complete error.
 |
 |Even if it would become part of the standard today, you still would need
 |to wait some years until all implementations take it up.

I must admit the last time i looked in an iconv(3) implementation
(GNU) it was not like that either, it was plain "1:1" conversion.
(I hope i am not lying now, ..it is what i remember.)

But even if it is for the future: if you write u$'\u0308' nothing
can happen, whereas if you write $'u\u0308' then an iconv(3) which
does its job really well could recognize the COMBINING DIAERESIS
and create the ü you want in your LATIN1 environment.  (This is
a simple example, but \u is meant to embed Unicode, and then
graphemes come into play; i mean, even ncurses has been capable of
properly dealing with this stuff for many years, and this is
something yet to be standardized.)
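
To make the byte-level picture concrete, here is a sketch that writes both
forms as UTF-8 octal escapes, so it depends neither on $'...' nor on the
current locale. The helper name hex is made up. The sketch only shows that
the decomposed and precomposed spellings are different byte strings; whether
a given iconv(3) composes the first into the single Latin-1 byte for ü is
implementation-dependent and is not tested here.

```shell
# Hex-dump helper: POSIX od, with all whitespace removed from its output.
hex() { od -An -tx1 | tr -d '[:space:]'; }

# "u" followed by U+0308 COMBINING DIAERESIS (UTF-8: 0xCC 0x88).
decomposed=$(printf 'u\314\210' | hex)

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS (UTF-8: 0xC3 0xBC).
precomposed=$(printf '\303\274' | hex)

printf 'decomposed:  %s\nprecomposed: %s\n' "$decomposed" "$precomposed"
```

A converter that is handed only the combining mark, as in the u$'\u0308'
spelling when each piece is converted separately, never sees the base letter
it would need in order to produce the precomposed character.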

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: A question about interpretation

2020-07-30 Thread Robert Elz
Date: Thu, 30 Jul 2020 09:53:00 -0700
From: Nick Stoughton
Message-ID:  


  | Cross-references are informative.

That is what I would have expected.

  | However, even without a cross-reference if the normative text in
  | two places disagrees or allows for different things,

Not quite that, the fake example I gave was somewhat more extreme.
It is more a question of what should be done, specifics below.

  | If a specific case refers to a
  | generalized one, then there is nothing wrong.

Sure.

  | In your example, I would file an interpretation request, since the standard
  | does not appear to define the concept of Orange as a fruit except in this
  | one narrow case.

That would be the same as the actual case.  Further, there are real
similarities, in both the general expectation would be that the general
case doesn't really need defining, everyone "simply knows" what it is
(ie: the concept of an orange is common knowledge) but a definition is
required for the specific case to make it clear what it applies to
(just which oranges are Valencia oranges, must they actually be grown in
Spain?).


Now the actual case:

XCU 2.6.1

(this is from the Issue 8 draft, but the relevant parts are mostly
unchanged from Issue 7 I believe)

In an assignment (see XBD Section 4.23),

["assignment" is the orange]

multiple tilde-prefixes can be used: one at the beginning of the
word (that is, following the <equals-sign> of the assignment), or
one following any unquoted <colon>, or both. A tilde-prefix in an
assignment is terminated by the first unquoted <colon> or <slash>,
or the end of the assignment word.

(earlier text says that in the default case, ie: when not an assignment, the
tilde-prefix is everything up to an unquoted '/' or the whole word if there
is none.)

XBD 4.23 is

4.23 Variable Assignment

In the shell command language, a word consisting of the
following parts:
[...]

There is no other definition of "assignment" (in XBD 3 or XBD 4, or
anywhere I can see in XCU), just this one, which is the definition of
one specific form of assignment.

Where this becomes an issue is in relation to XCU 2.6.2

In addition, a parameter expansion can be modified by using one
of the following formats. In each case that a value of word is
needed (based on the state of parameter, as described below),
word shall be subjected to tilde expansion, parameter expansion, [...]

and still from 2.6.2

${parameter:=[word]} Assign Default Values. If parameter is unset
 or null, quote removal shall be performed on the expansion
 of word and the result (or an empty string if word is omitted)
 shall be assigned to parameter. [...]

Now the question is in an expression like

${unset_var=~:~user}

what should happen?   That last quote from 2.6.2 says an assignment takes
place (given unset_var is in fact unset, if not, the "word" is irrelevant),
the earlier quote from 2.6.2 says that tilde expansion happens on the word,
and the quote from 2.6.1 says that in an assignment, ':' terminates the
tilde prefix, and the ~ after the ':' starts a new tilde prefix, so provided
that "user" is a known user name, this should set unset_var to
 ${HOME}:$(homedir_of user)
(assuming there was a function/command "homedir_of" which does the obvious
thing).

That's how the NetBSD shell works.

But it is the only one that I'm aware of.   The various ksh's (and bash)
seem to treat the ':' as a terminator for the tilde prefix, but don't
treat it as being the starting point of a new one (ie: kind of half of
an assignment).  As best I can tell that's behaviour that makes no
sense (with one caveat below).

Other shells treat the entire word (there being no '/') as the tilde-prefix.

This must be being justified by the xref to XBD 4.23 as the definition
of an "assignment" even though what it defines is a "variable assignment"
which is not the phrase that 2.6.1 uses.  (ie: orange vs valencia orange).
The word in question does not meet the definition of a variable assignment
(while the var= part exists in the parameter expansion, only the word
part of it is being processed here) - and even if we step backwards, the
parameter expansion, while superficially similar to a variable assignment,
doesn't meet the definition (it contains "${" at the start for example, and
even that might be embedded in some longer word).

Since they do not treat this expansion as an assignment, the ':' does
not terminate the tilde-prefix, and they fail to find a user with the
resulting name (":~user") which isn't a portable user name in any case.
They then simply say that tilde expansion does nothing, and leave the
word unaltered (assign '~:~user' to unset_var).
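
The three behaviours can be observed with a short script. This is a sketch
with assumptions: HOME is overridden with a made-up path so the output is
predictable, and a bare ~ replaces ~user so that no particular account has to
exist. Which of the three results the second line shows depends on the shell
running it.

```shell
HOME=/home/me            # fixed, hypothetical value for a predictable result

# Ordinary variable assignment: a tilde-prefix after "=" and after each
# unquoted ":" is expanded (XCU 2.6.1).
plain=~:~

# The disputed case: the same word inside ${parameter=word}.
unset disputed
: ${disputed=~:~}

printf 'assignment:          %s\n' "$plain"
printf 'parameter expansion: %s\n' "$disputed"
```

Going by the descriptions above, /home/me:/home/me corresponds to the NetBSD
sh reading, /home/me:~ to the ksh and bash behaviour, and an unchanged ~:~ to
shells that take the whole word as the tilde-prefix.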

The analysis gets complicated here, (again from XCU 2.6.1):

If these characters do not form a portable login name (see
the 

Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread David A. Wheeler
Steffen Nurpmeso  wrote:
> > And for that it would be tremendous if $'' would be defined so
> > that it can be used as the sole quoting mechanism, and that would
> > then also include expansion of $VAR (i use \$VAR or \${VAR} in my
> > mailer).  But to know exactly how problematic splitting of quotes
> > is for many languages of the world, including right-to-left
> > direction and shift state changes etc., and changing of meaning as
> > such if the sentence cannot be interpreted as a unity, a real
> > expert had to be asked.  Anyhow, the Unicode effort mandates
> > processing of entire strings and denotes isolated treatment as
> > a complete error.

I think eliminating old quoting mechanisms would be a mistake.

On Thu, 30 Jul 2020 16:09:56 +0200, Joerg Schilling 
 wrote:
> Even if it would become part of the standard today, you still would need
> to wait some years until all implementations take it up.

That's true for almost all standards changes.
However, many shells *already* implement $'...'.
It's also relatively trivial to implement, and it provides
very useful capabilities (such as the ability to easily assign terminating 
newlines).

I'd still like to see the addition of $'...'.
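
For context on the trailing-newline remark, a sketch of what scripts have to
do without $'...' (variable names are illustrative): command substitution
strips trailing newlines, so a guard character is added and then removed.

```shell
# Command substitution strips every trailing newline, so this is empty.
stripped=$(printf '\n')

# The usual portable workaround: emit a guard character, then remove it.
guarded=$(printf '\nx')
newline=${guarded%x}

printf 'stripped length: %s, workaround length: %s\n' \
    "${#stripped}" "${#newline}"
```

In a shell that implements $'...', the last two assignments collapse to
newline=$'\n'.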

--- David A. Wheeler



Re: A question about interpretation

2020-07-30 Thread Nick Stoughton
Cross-references are informative. However, even without a cross-reference
if the normative text in two places disagrees or allows for different
things, then an interpretation is required. If a specific case refers to a
generalized one, then there is nothing wrong.

In your example, I would file an interpretation request, since the standard
does not appear to define the concept of Orange as a fruit except in this
one narrow case. If, on the other hand, the definition was for an Orange,
and the squeezing requirement was only for Valencia Oranges, then the
standard is consistent, and the standard is silent about Mandarin
Oranges (permitting but not requiring squeezing).

Hope that helps!
-- 
Nick

On Thu, Jul 30, 2020 at 7:54 AM Robert Elz  wrote:

> In the standard, if the words say something, and are
> followed by a cross reference (xref) to a definition
> which defines something subtly different, is the
> correct reading to limit the specification to what
> is defined in the xref, or is the xref to be treated
> as more informative - as additional information which
> might help explain a term used ?
>
> To give an example (purposely, and obviously, not
> related to posix for right now), suppose the standard
> said
>
> if the fruit is an orange [xref definitions 17]
> then squeeze it, otherwise take care not to squeeze it.
>
> and definitions 17 is:
>
> 17. Valencia Orange: a roundish orange coloured citrus ...
> (the rest of what it might say is irrelevant).
>
> In this case, if we pick up a piece of fruit, and it is an orange,
> but not a valencia orange, are we to squeeze it or not?
>
> That is, does the definition that was xref'd limit the interpretation
> of the preceding word (or phrase) to only apply to what is defined,
> or is it to be taken as simply providing information, should the
> reader happen not to know what an orange might be?
>
> kre
>
>


Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

Yes, the additions necessary still for even limited Unicode support above the 
broken bandaids C11+ provide are one of those issues. Where Unicode is 
incompatible with POSIX, and is therefore (by design) broken too needs 
addressing also. The white papers detailing most of these changes have yet to 
be written, or published if some have been.
On Thursday, July 30, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <311169368.9432836.1596108598...@mail.yahoo.com>:
 |On Thursday, July 30, 2020 Geoff Clare  wrote:
 |Robert Elz  wrote, on 29 Jul 2020:
 |>
 |> Speaking of which, what is the current holdup with resolving
 |> whichever bug it is (I hate searching in mantis, so I won't
 |> try here) which specifies $'...' ?  Perhaps whatever the
 |> problem was (before my time) with the specification of that
 |> is no longer a problem?
 |
 |It's bug 249. It was reopened in Oct 2015 and several notes were
 |added to the bug after that, starting with 
 |
 |https://austingroupbugs.net/view.php?id=249#c2893
 |
 |My guess is the conference calls postponed returning to it because
 |there was ongoing discussion, but by the time the discussion ended
 |it had "gone off the radar".
 ...
 |Also, as something new, its inclusion is part of a later draft of Issue \
 |8. Additional issues it depends on need to be addressed first, specified \
 |fully, and incorporated. This is more why it went on the back burner, \
 |that I recall. Various other bugs are in similar state; the prerequisites \
 |to finish specifying them so they can be considered portable aren't \
 |done yet either.

The problem being that what is in the wild does not work out for
many languages.  The in-use shell quote pattern consisting of
small, isolated parts which depend on which kind of escaping and
expanding is necessary just does not work out for many languages.
Period.

I (the mailer i maintain, using POSIX-incompatible sh(1)ell-style
command line input) for example claim

  ? echo 'Quotes '${HOME}' and 'tokens" differ!"# no comment
  ? echo Quotes ${HOME} and tokens differ! # comment
  ? echo Don"'"t you worry$'\x21' The sun shines on us. $'\u263A'

The latter is what i mean.  There are many languages in this world
where these \u expansions do not work out that way, but where the
"entire sentence must be interpreted as a unity" in order to get
the iconv(3) conversion to nl_langinfo(CODESET) correct, aka
the way it is _desired_.  Of course you can move it all to the
twilight zone of "undefined behaviour", but if you do not, then
quoting must extend to the largest possible extent, and be
interpreted as a unity.

And for that it would be tremendous if $'' would be defined so
that it can be used as the sole quoting mechanism, and that would
then also include expansion of $VAR (i use \$VAR or \${VAR} in my
mailer).  But to know exactly how problematic splitting of quotes
is for many languages of the world, including right-to-left
direction and shift state changes etc., and changing of meaning as
such if the sentence cannot be interpreted as a unity, a real
expert had to be asked.  Anyhow, the Unicode effort mandates
processing of entire strings and denotes isolated treatment as
a complete error.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

A question about interpretation

2020-07-30 Thread Robert Elz
In the standard, if the words say something, and are
followed by a cross reference (xref) to a definition
which defines something subtly different, is the
correct reading to limit the specification to what
is defined in the xref, or is the xref to be treated
as more informative - as additional information which
might help explain a term used ?

To give an example (purposely, and obviously, not
related to posix for right now), suppose the standard
said

if the fruit is an orange [xref definitions 17]
then squeeze it, otherwise take care not to squeeze it.

and definitions 17 is:

17. Valencia Orange: a roundish orange coloured citrus ...
(the rest of what it might say is irrelevant).

In this case, if we pick up a piece of fruit, and it is an orange,
but not a valencia orange, are we to squeeze it or not?

That is, does the definition that was xref'd limit the interpretation
of the preceding word (or phrase) to only apply to what is defined,
or is it to be taken as simply providing information, should the
reader happen not to know what an orange might be?

kre



Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Joerg Schilling
Steffen Nurpmeso  wrote:

> And for that it would be tremendous if $'' would be defined so
> that it can be used as the sole quoting mechanism, and that would
> then also include expansion of $VAR (i use \$VAR or \${VAR} in my
> mailer).  But to know exactly how problematic splitting of quotes
> is for many languages of the world, including right-to-left
> direction and shift state changes etc., and changing of meaning as
> such if the sentence cannot be interpreted as a unity, a real
> expert had to be asked.  Anyhow, the Unicode effort mandates
> processing of entire strings and denotes isolated treatment as
> a complete error.

Even if it would become part of the standard today, you still would need
to wait some years until all implementations take it up.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Steffen Nurpmeso
shwaresyst wrote in
 <311169368.9432836.1596108598...@mail.yahoo.com>:
 |On Thursday, July 30, 2020 Geoff Clare  wrote:
 |Robert Elz  wrote, on 29 Jul 2020:
 |>
 |> Speaking of which, what is the current holdup with resolving
 |> whichever bug it is (I hate searching in mantis, so I won't
 |> try here) which specifies $'...' ?  Perhaps whatever the
 |> problem was (before my time) with the specification of that
 |> is no longer a problem?
 |
 |It's bug 249. It was reopened in Oct 2015 and several notes were
 |added to the bug after that, starting with 
 |
 |https://austingroupbugs.net/view.php?id=249#c2893
 |
 |My guess is the conference calls postponed returning to it because
 |there was ongoing discussion, but by the time the discussion ended
 |it had "gone off the radar".
 ...
 |Also, as something new, its inclusion is part of a later draft of Issue \
 |8. Additional issues it depends on need to be addressed first, specified \
 |fully, and incorporated. This is more why it went on the back burner, \
 |that I recall. Various other bugs are in similar state; the prerequisites \
 |to finish specifying them so they can be considered portable aren't \
 |done yet either.

The problem being that what is in the wild does not work out for
many languages.  The in-use shell quote pattern consisting of
small, isolated parts which depend on which kind of escaping and
expanding is necessary just does not work out for many languages.
Period.

I (the mailer i maintain, using POSIX-incompatible sh(1)ell-style
command line input) for example claim

  ? echo 'Quotes '${HOME}' and 'tokens" differ!"# no comment
  ? echo Quotes ${HOME} and tokens differ! # comment
  ? echo Don"'"t you worry$'\x21' The sun shines on us. $'\u263A'

The latter is what i mean.  There are many languages in this world
where these \u expansions do not work out that way, but where the
"entire sentence must be interpreted as a unity" in order to get
the iconv(3) conversion to nl_langinfo(CODESET) correct, aka
the way it is _desired_.  Of course you can move it all to the
twilight zone of "undefined behaviour", but if you do not, then
quoting must extend to the largest possible extent, and be
interpreted as a unity.

And for that it would be tremendous if $'' would be defined so
that it can be used as the sole quoting mechanism, and that would
then also include expansion of $VAR (i use \$VAR or \${VAR} in my
mailer).  But to know exactly how problematic splitting of quotes
is for many languages of the world, including right-to-left
direction and shift state changes etc., and changing of meaning as
such if the sentence cannot be interpreted as a unity, a real
expert had to be asked.  Anyhow, the Unicode effort mandates
processing of entire strings and denotes isolated treatment as
a complete error.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: ksh93 job control behaviour [was: Draft suggestion: Job control and subshells]

2020-07-30 Thread Joerg Schilling
Geoff Clare  wrote:

> It's only easy because (most/all?) shells take the easy option and do
> a lexical analysis of the command to be substituted. Applications can't
> expect the following to work, but if the feature was implemented
> "properly", it would:
>
> showalltraps() { trap -p; }
> alltraps=$(showalltraps)

Could you explain what you understand by "most shells take the easy option and
do a lexical analysis of the command to be substituted"?

What I however remember is that nobody checked the output of the command
when used in a command substitution...while we discussed the new feature.
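
For reference, a sketch of the case that is generally expected to work: trap
as the only command inside the command substitution. Plain trap is used here
rather than trap -p, on the assumption that not every shell accepts -p, and
the USR1 action is a harmless placeholder. Whether the function-wrapped form
quoted above gives the same listing is exactly what is in question, so it is
not exercised here.

```shell
# A harmless trap so there is something to report.
trap ': cleanup' USR1

# "trap" alone in a command substitution: shells treat this specially so
# that the caller's traps are listed rather than the subshell's.
saved=$(trap)

printf '%s\n' "$saved"
```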

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



RE: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

Also, as something new, its inclusion is part of a later draft of Issue 8. 
Additional issues it depends on need to be addressed first, specified fully, 
and incorporated. This is more why it went on the back burner, that I recall. 
Various other bugs are in similar state; the prerequisites to finish 
specifying them so they can be considered portable aren't done yet either.
On Thursday, July 30, 2020 Geoff Clare  wrote:
Robert Elz  wrote, on 29 Jul 2020:
>
> Speaking of which, what is the current holdup with resolving
> whichever bug it is (I hate searching in mantis, so I won't
> try here) which specifies $'...' ?  Perhaps whatever the
> problem was (before my time) with the specification of that
> is no longer a problem?

It's bug 249. It was reopened in Oct 2015 and several notes were
added to the bug after that, starting with 

https://austingroupbugs.net/view.php?id=249#c2893

My guess is the conference calls postponed returning to it because
there was ongoing discussion, but by the time the discussion ended
it had "gone off the radar".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread Geoff Clare
Robert Elz  wrote, on 29 Jul 2020:
>
> Speaking of which, what is the current holdup with resolving
> whichever bug it is (I hate searching in mantis, so I won't
> try here) which specifies $'...' ?   Perhaps whatever the
> problem was (before my time) with the specification of that
> is no longer a problem?

It's bug 249. It was reopened in Oct 2015 and several notes were
added to the bug after that, starting with 

https://austingroupbugs.net/view.php?id=249#c2893

My guess is the conference calls postponed returning to it because
there was ongoing discussion, but by the time the discussion ended
it had "gone off the radar".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England