Re: quote removal issues within character class

2019-11-13 Thread Oğuz
Thanks for the explanation. I didn't know any of these


Re: quote removal issues within character class

2019-11-13 Thread Chet Ramey

On 11/8/19 4:50 PM, Oğuz wrote:

> v=foo
> echo ${v#[[:"lower":]]}
>
> should print oo, but it prints foo instead. This is reproducible on
> bash >4.4
>
> Plus
>
> case foo in (*[![:"lower":]]*) echo bar; esac
>
> prints bar, while

The idea is that at this point in command processing, quote removal hasn't
been performed. According to the abstract model the shell uses for word
expansions, that means the double quotes are still present in the word, and
`"lower"' is not the same as `lower'.

There was a recent extensive discussion of this and other points on the
posix mailing list, and, as kre said, the committee has decided to make
this a special case. I changed this about a month ago, and the change is
in the devel branch.


> case foo in (*[![":lower":]]*) echo bar; esac
>
> doesn't print anything. And this is only reproducible on bash >5.0


This is an invalid character class, since a class has to begin with the
two-character sequence `[:'. The intervening double quote causes that test
to fail (this case is not so special, it seems). The first `]' then
terminates the bracket expression, so the string has to contain at least
a `]' to have a possibility of matching.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: quote removal issues within character class

2019-11-09 Thread L A Walsh
FWIW, Andreas's description really was sufficient...






Re: quote removal issues within character class

2019-11-09 Thread Robert Elz
Date: Sat, 09 Nov 2019 06:46:05 -0800
From: L A Walsh
Message-ID:  <5dc6d12d.6040...@tlinx.org>

  | Is this really what the standard says,

Yes, I used cut and paste (and then some line length/wrapping reformatting)

  | because '\\' is not a character, but 2 characters.

In sh, yes, in C, no.   So it all depends upon context.  The text comes
from the XBD (basic definitions) so isn't either sh or C specific.

But since that is followed by (and you quoted)
(<period>, <asterisk>, <left-square-bracket>, and <backslash>,
respectively)
I don't think there is any doubt what it means.

  | They could use "\\" but if a backslash is between single
  | quotes, it loses its special meaning.

In sh yes, but this text is from the section on Regular Expressions,
in the basic definitions (XBD), not the (many) sections about the shell
in XCU (Commands and Utilities).

  | The only way to get a backslash

get in what sense?   In sh '\' is just fine, as is \\ if what you
want is a literal backslash.

  | when using single quotes that I've found is to end the single-quote
  | then use the backslash.

If you want to \ escape a character when you're in single quotes, then
yes (in sh) that is correct.   But the only time that makes sense is if
the character that is to follow the \ is a ' (which cannot appear in a
single quoted string), or if you want to elide a newline \<newline> (where
the <newline> means a literal newline character).

  | So if you wanted to insert single quotes in
  | a string that is single-quoted, you would have to do this:
  |
  | 'this is a single-quote(SQ) quoted string using a SQ ('\'') within
  | the single quote.' 

That is the usual way, yes.

  | Alternatively using double-quotes or another quoting
  | mechanism might be preferable.

Yes, there are several other ways to do it, but the \' form is
certainly the most common.   When $'...' quoting becomes more popular
(after it actually makes it into the standard) then that form allows
$' ...\' ... '
($' is "C" quoting, it works just like single quotes, except that C style
backslash escape sequences, plus a few new ones, are expanded).
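
As a small sketch of the quoting forms discussed above (the variable names s1, s2 and s3 are made up for illustration), each assignment below should produce the same string in a POSIX-style sh. The $'...' form is left out because, as noted, it is not yet in the standard.

```shell
# Three spellings of the same string: it's

# 1. Close the single quotes, add a backslash-escaped quote, reopen them.
s1='it'\''s'

# 2. Same idea, but the lone single quote is wrapped in double quotes.
s2='it'"'"'s'

# 3. Double quotes around the whole string.
s3="it's"

printf '%s\n' "$s1" "$s2" "$s3"
```

All three lines of output should be identical, since quote removal leaves only the characters i, t, ', s in each case.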

  | I find it odd that the standard would try to use a backslash within
  | a SQ'd string as a literalizer.

It all relates to the context, and the expectations of the reader.  Since
the parenthesised note that follows is explicit about what it means, I
don't think it really matters.   In the section about the shell, \ tends
to be written exactly like that (no quoting, just the character) when it
is needed (it might occasionally be written as <backslash>).

kre




Re: quote removal issues within character class

2019-11-09 Thread Andreas Schwab
On Nov 09 2019, L A Walsh wrote:

> On 2019/11/09 04:49, Robert Elz wrote:
>> There's also
>>
>>  The special characters '.', '*', '[', and '\\'
>>  (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>>  respectively) shall lose their special meaning within a bracket
>>  expression.
>>   
> 
> Is this really what the standard says, because '\\' is not a character, but
> 2 characters.  They could use "\\" but if a backslash is between single
> quotes, it loses its special meaning.

This is C notation, not shell notation.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: quote removal issues within character class

2019-11-09 Thread Oğuz
You've already answered it, thank you. I didn't know that [:, [., [= were
special *sequences*, I guess I overlooked that part. Thanks again for
taking time to explain it in detail, I'm grateful



Re: quote removal issues within character class

2019-11-09 Thread L A Walsh
On 2019/11/09 04:49, Robert Elz wrote:
> There's also
>
>   The special characters '.', '*', '[', and '\\'
>   (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>   respectively) shall lose their special meaning within a bracket
>   expression.
>   

Is this really what the standard says, because '\\' is not a character, but
2 characters.  They could use "\\" but if a backslash is between single
quotes, it loses its special meaning.  The only way to get a backslash
when using single quotes that I've found is to end the single-quote
then use the backslash.  So if you wanted to insert single quotes in
a string that is single-quoted, you would have to do this:

'this is a single-quote(SQ) quoted string using a SQ ('\'') within
the single quote.' 

Alternatively using double-quotes or another quoting
mechanism might be preferable.

I find it odd that the standard would try to use a backslash within
a SQ'd string as a literalizer.





Re: quote removal issues within character class

2019-11-09 Thread Robert Elz
Date: Sat, 9 Nov 2019 07:35:16 +0300
From: Oğuz
Message-ID:  


  | > is correct, as "foo" does not contain a ']' which would be required
  | > to match there (quoting the ':' means there is no character class,
  | > hence we have instead (the negation of) a char class containing '[' ':'
  | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
  | > followed by ']' and anything.   foo does not match. f]oo would.
  | >
  |
  | where exactly is this documented in the standard?

I'm not sure which part exactly you're looking for, but char sets in sh
are specified to be the same as in REs, except that ! replaces ^ as the
negation character (that's in XCU 2.13.1).  Char sets (bracket expressions)
in RE's are documented in XBD 9.3.5 wherein it states

A bracket expression is either a matching list expression or a
non-matching list expression. It consists of one or more expressions:
ordinary characters, collating elements, collating symbols,
equivalence classes, character classes, or range expressions.
The <right-square-bracket> (']') shall lose its special meaning and
represent itself in a bracket expression if it occurs first in the list
(after an initial <circumflex> ('^'), if any).

Otherwise, it shall terminate the bracket expression,

That is, a ']' that occurs anywhere else terminates the bracket expression
except:

unless it appears in a collating symbol (such as "[.].]")

(not relevant in the given example)

or is the ending <right-square-bracket> for a collating symbol,
equivalence class, or character class.

So the ']' that immediately follows the second ':' would not terminate the
bracket expression if it is the ending ']' for a character class
(collating symbols and equiv classes not being relevant to the example).
Of course, that can only happen if there is a character class to end.
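
The two rules quoted above can be tried in a shell pattern. This is only a sketch, assuming a POSIX-style sh; the function names is_member and in_class are made up for illustration. The first pattern puts ']' first in the list, the second has a ']' that ends a character class rather than the bracket expression.

```shell
# is_member CHAR: test CHAR against []a], where the leading ']' is a
# literal member of the list and the second ']' ends the expression.
is_member() {
    case $1 in
        []a]) echo yes ;;
        *)    echo no ;;
    esac
}

# in_class CHAR: test CHAR against [[:digit:]x]. The ']' right after
# ':digit:' ends the character class, so the list goes on to include 'x'.
in_class() {
    case $1 in
        [[:digit:]x]) echo yes ;;
        *)            echo no ;;
    esac
}

is_member ']'   # yes
is_member a     # yes
is_member b     # no
in_class 7      # yes
in_class x      # yes
in_class ']'    # no
```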

There's also

The special characters '.', '*', '[', and '\\'
(<period>, <asterisk>, <left-square-bracket>, and <backslash>,
respectively) shall lose their special meaning within a bracket
expression.

whereupon if the [": sequence does not start a char class, the '[' there
is simply a literal char inside the bracket expression.

Similarly if the bracket expression ends at the first ']' (the one immediately
after the second ':') the following ']' is simply a literal character, as
']' chars are special only when following a '['.

So, all that's left to determine is whether the [": sequence can be
considered as beginning a char class.

In a RE it certainly cannot - quote chars (' and ") are not special in
REs at all, and [": is no different syntactically than [x: which no-one
would treat as being the introduction to a char class.
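
The [x: comparison can be tried directly as a shell pattern, which avoids the quoting question altogether. A minimal sketch, assuming a POSIX-style sh; the function name probe is made up for illustration. Here [![x:lower:] is a non-matching list containing '[' 'x' ':' 'l' 'o' 'w' 'e' 'r', and the ']' that follows it is a literal character.

```shell
# probe STRING: report whether STRING matches *[![x:lower:]]*
# The pattern needs some character outside the list, immediately
# followed by a literal ']'.
probe() {
    case $1 in
        *[![x:lower:]]*) echo match ;;
        *)               echo "no match" ;;
    esac
}

probe foo      # no match: the string contains no ']'
probe 'f]oo'   # match: 'f' is outside the list and is followed by ']'
probe 'x]oo'   # no match: 'x' is in the list, and ']' is not followed by ']'
```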

This is also, I believe (Chet can confirm, or refute, if he desires) where
bash gets the interpretation that "lower" (including the quotes) is the
name of the char class in [:"lower":] except that it cannot be, as char
class names cannot contain quote characters (which should lead to the
whole sub-expression not being treated as a char class at all, instead
bash treats it, I think, as if it were an unknown but valid class name).

But when it comes from sh, quote chars are "different" and instead of
just being characters, they instead affect the interpretation of the
characters that are quoted.  See XCU 2.2:

Quoting is used to remove the special meaning of certain characters
or words to the shell.

Quoting can be used to preserve the literal meaning of the special
characters in the next paragraph [...]

and the following may need to be quoted under certain circumstances.
That is, these characters may be special depending on conditions
described elsewhere in this volume of POSIX.1-2017:

* ? [ # ~ = %

to which more chars have been added (as I recall) recently by some
Austin Group correction (which I think includes ! : - and ]), that is
to make it clear, that in sh

[a'-'z]

is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
of quoting is used to remove the specialness of the '-' is irrelevant),
and that "[a-z]" isn't a bracket expression at all (neither of which
is true in an RE - though the role of \ in RE's is being altered slightly
so if it had been [a\-z] in a RE things are less clear.)
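
Both claims about sh can be checked with case. A minimal sketch, assuming a POSIX-style sh; the function names classify and literal_only are made up for illustration.

```shell
# classify CHAR: test CHAR against [a'-'z]. The quoted '-' is an
# ordinary member of the list, not a range operator.
classify() {
    case $1 in
        [a'-'z]) echo "in set" ;;
        *)       echo "not in set" ;;
    esac
}

# literal_only STRING: test STRING against "[a-z]". Fully quoted, the
# pattern is no bracket expression; it matches only the five-character
# string [a-z] itself.
literal_only() {
    case $1 in
        "[a-z]") echo literal ;;
        *)       echo other ;;
    esac
}

classify a            # in set
classify -            # in set
classify z            # in set
classify m            # not in set (there is no a-z range)
literal_only m        # other
literal_only '[a-z]'  # literal
```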

The effect of this is that in sh, in an expression like

[![":lower":]]

the first ':' is not "special" and hence cannot form part of the
magic opening '[:' sequence for a character class.   Hence this
expression contains no character class, and consequently the
':]' chars are simply a ':' in the bracket expression, and then
the terminating ']' - which leaves the second ']' being just a
literal character.


While here (these following parts are not relevant to your question I believe)
when used in sh

[[:"lower":]]

should be treated just the same as

[[:lower:]]

for the same reason that

["abc"]


Re: quote removal issues within character class

2019-11-08 Thread Oğuz
> is correct, as "foo" does not contain a ']' which would be required
> to match there (quoting the ':' means there is no character class,
> hence we have instead (the negation of) a char class containing '[' ':'
> 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
> followed by ']' and anything.   foo does not match. f]oo would.
>

where exactly is this documented in the standard?


Re: quote removal issues within character class

2019-11-08 Thread Robert Elz
Date: Sat, 9 Nov 2019 00:50:52 +0300
From: Oğuz
Message-ID:  


These two

  | v=foo
  | echo ${v#[[:"lower":]]}

  | case foo in (*[![:"lower":]]*) echo bar; esac

are because bash believes that the character class name must not
be quoted (which is likely to be clarified to be incorrect in the
next revision of posix).

This one

  | case foo in (*[![":lower":]]*) echo bar; esac

is correct, as "foo" does not contain a ']' which would be required
to match there (quoting the ':' means there is no character class,
hence we have instead (the negation of) a char class containing '[' ':'
'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
followed by ']' and anything.   foo does not match. f]oo would.

kre




quote removal issues within character class

2019-11-08 Thread Oğuz
v=foo
echo ${v#[[:"lower":]]}

should print oo, but it prints foo instead. This is reproducible on
bash >4.4

Plus

case foo in (*[![:"lower":]]*) echo bar; esac

prints bar, while

case foo in (*[![":lower":]]*) echo bar; esac

doesn't print anything. And this is only reproducible on bash >5.0
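
For anyone wanting to compare shells or versions, the report can be turned into a small probe. This is a sketch only; the variable names plain and quoted are made up for illustration. The unquoted form is the baseline and should print oo. The result of the quoted form is the point under discussion in this thread, so nothing is assumed about it here.

```shell
v=foo

# Baseline: with an unquoted class name, the leading lowercase letter
# is removed from the value.
plain=${v#[[:lower:]]}

# The form from the report. Whether the quoted name is still recognised
# as a character class depends on the shell and its version.
quoted=${v#[[:"lower":]]}

printf 'unquoted class name: %s\n' "$plain"
printf 'quoted class name:   %s\n' "$quoted"
```

If the second line shows oo, the shell treats the quoted name like the unquoted one; if it shows foo, the pattern did not match the first character.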