bug in arithmetic expansion

2019-11-09 Thread Joern Knoll

Hello,

While playing around with digital keys (integers) that have a simple
arithmetic check property, I encountered problems using bash's arithmetic
expansion whenever the digit substrings used have leading zeros. The
problem already shows up in the simplest operations, namely converting
a string argument to its numerical value, as shown below.


With thanks for your attention and best regards, Jörn Knoll

[tplx99]:/the/knoll > echo $((0123))
83
[tplx99]:/the/knoll > echo $((123))
123
[tplx99]:/the/knoll > echo $((01234))
668
[tplx99]:/the/knoll > echo $((1234))
1234


--
Jörn Knoll            phone: +49 6159 71 2753
GSI                   fax:   +49 6159 71 2990
Planckstr. 1          email: j.kn...@gsi.de
D-64291 Darmstadt     https://theory.gsi.de

GSI Student Program   https://theory.gsi.de/stud-pro
Schnelle Ionen e.V.   https://www.SchnelleIonen.de





Re: quote removal issues within character class

2019-11-09 Thread Oğuz
You've already answered it, thank you. I didn't know that [:, [., [= were
special *sequences*; I guess I overlooked that part. Thanks again for
taking the time to explain it in detail, I'm grateful.



Re: bug in arithmetic expansion

2019-11-09 Thread pepa65
In the arithmetic context, leading zeroes signify an octal base. Had you
used an 8 or 9, you would have gotten a message like:

bash: 08: value too great for base (error token is "08")

when trying: echo $((08))

So it's not a bug, it's a feature; make sure your base-10 numbers don't
have leading zeroes!
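
The octal reading can be checked directly. The following is a minimal sketch, assuming bash; the values are the ones reported earlier in this thread, and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# A leading zero makes an arithmetic constant octal.
dec=$((123))       # plain decimal
oct=$((0123))      # octal: 1*64 + 2*8 + 3
oct2=$((01234))    # octal: 1*512 + 2*64 + 3*8 + 4
echo "$dec $oct $oct2"    # prints: 123 83 668

# 8 and 9 are not octal digits, so $((08)) is an error. Running it in a
# subshell keeps the failure from affecting the rest of the script.
if ( : $((08)) ) 2>/dev/null; then
    bad=accepted
else
    bad=rejected
fi
echo "08 is $bad"         # prints: 08 is rejected
```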

Peter


On 11/9/19 5:52 PM, Joern Knoll wrote:
> Hello,
> 
> While playing around with digital keys (integers) that have a simple
> arithmetic check property, I encountered problems using bash's arithmetic
> expansion whenever the digit substrings used have leading zeros. The
> problem already shows up in the simplest operations, namely converting
> a string argument to its numerical value, as shown below.
> 
> With thanks for your attention and best regards, Jörn Knoll
> 
> [tplx99]:/the/knoll > echo $((0123))
> 83
> [tplx99]:/the/knoll > echo $((123))
> 123
> [tplx99]:/the/knoll > echo $((01234))
> 668
> [tplx99]:/the/knoll > echo $((1234))
> 1234
> 
> 



Re: quote removal issues within character class

2019-11-09 Thread L A Walsh
FWIW, Andreas's description really was sufficient...






Re: bug in arithmetic expansion

2019-11-09 Thread Robert Elz
    Date:        Sat, 9 Nov 2019 16:39:52 +0100
    From:        Davide Brini
    Message-ID:  <1mi5ud-1ifip305pl-00f...@mail.gmx.com>

  | If you want to force base 10 interpretation (remember that a leading 0
  | means octal in arithmetic context), you need to explicitly tell bash:
  |
  | $ echo $(( 10#0123 ))
  | 123

But do remember that that form is not portable, and is difficult to
use correctly in the cases that matter (when the actual number comes
from a variable). When it is a literal, as in all the examples in this
thread, simply omitting the leading 0 is much simpler, and fully portable.
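
For a number held in a variable, one portable approach is to strip the leading zeros before the value reaches arithmetic expansion. The following is a minimal sketch using only POSIX parameter expansion and `case`; `strip_zeros` is a hypothetical helper name, and it assumes the input is a non-empty, unsigned string of decimal digits.

```bash
# strip_zeros DIGITS: print DIGITS without leading zeros (hypothetical helper).
# Assumes an unsigned, non-empty string of decimal digits.
strip_zeros() {
    n=$1
    # Drop one leading 0 at a time, but keep the last digit so "000" -> "0".
    while :; do
        case $n in
            0?*) n=${n#0} ;;
            *)   break ;;
        esac
    done
    printf '%s\n' "$n"
}

key=0089
echo $(( $(strip_zeros "$key") + 1 ))    # prints: 90
```

Input that may carry a sign, whitespace, or non-digits would need to be validated first; that is left out of the sketch.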

kre





Re: quote removal issues within character class

2019-11-09 Thread Robert Elz
    Date:        Sat, 9 Nov 2019 07:35:16 +0300
    From:        Oğuz
    Message-ID:  <cah7i3lr68civxlr9_hoogqa7vd-zyvz+fck-0k3uqptnsir...@mail.gmail.com>


  | > is correct, as "foo" does not contain a ']' which would be required
  | > to match there (quoting the ':' means there is no character class,
  | > hence we have instead (the negation of) a char class containing '[' ':'
  | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
  | > followed by ']' and anything.   foo does not match. f]oo would.
  | >
  |
  | where exactly is this documented in the standard?

I'm not sure which part exactly you're looking for, but char sets in sh
are specified to be the same as in REs, except that ! replaces ^ as the
negation character (that's in XCU 2.13.1).  Char sets (bracket expressions)
in RE's are documented in XBD 9.3.5 wherein it states

A bracket expression is either a matching list expression or a
non-matching list expression. It consists of one or more expressions:
ordinary characters, collating elements, collating symbols,
equivalence classes, character classes, or range expressions.
The <right-square-bracket> (']') shall lose its special meaning and
represent itself in a bracket expression if it occurs first in the list
(after an initial <circumflex> ('^'), if any).

Otherwise, it shall terminate the bracket expression,

That is, a ']' that occurs anywhere else terminates the bracket expression
except:

unless it appears in a collating symbol (such as "[.].]")

(not relevant in the given example)

or is the ending <right-square-bracket> for a collating symbol,
equivalence class, or character class.

So the ']' that immediately follows the second ':' would not terminate the
bracket expression if it is the ending ']' for a character class
(collating symbols and equiv classes not being relevant to the example).
Of course, that can only happen if there is a character class to end.

There's also

The special characters '.', '*', '[', and '\\'
(<period>, <asterisk>, <left-square-bracket>, and <backslash>,
respectively) shall lose their special meaning within a bracket
expression.

whereupon if the [": sequence does not start a char class, the '[' there
is simply a literal char inside the bracket expression.

Similarly if the bracket expression ends at the first ']' (the one immediately
after the second ':') the following ']' is simply a literal character, as
']' chars are special only when following a '['.

So, all that's left to determine is whether the [": sequence can be
considered as beginning a char class.

In a RE it certainly cannot - quote chars (' and ") are not special in
REs at all, and [": is no different syntactically than [x: which no-one
would treat as being the introduction to a char class.

This is also, I believe (Chet can confirm, or refute, if he desires) where
bash gets the interpretation that "lower" (including the quotes) is the
name of the char class in [:"lower":] except that it cannot be, as char
class names cannot contain quote characters (which should lead to the
whole sub-expression not being treated as a char class at all, instead
bash treats it, I think, as if it were an unknown but valid class name).

But when it comes from sh, quote chars are "different" and instead of
just being characters, they instead affect the interpretation of the
characters that are quoted.  See XCU 2.2:

Quoting is used to remove the special meaning of certain characters
or words to the shell.

Quoting can be used to preserve the literal meaning of the special
characters in the next paragraph [...]

and the following may need to be quoted under certain circumstances.
That is, these characters may be special depending on conditions
described elsewhere in this volume of POSIX.1-2017:

* ? [ # ~ = %

to which more chars have been added (as I recall) recently by some
Austin Group correction (which I think includes ! : - and ]), that is
to make it clear, that in sh

[a'-'z]

is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
of quoting is used to remove the specialness of the '-' is irrelevant),
and that "[a-z]" isn't a bracket expression at all (neither of which
is true in an RE - though the role of \ in REs is being altered slightly
so if it had been [a\-z] in an RE things are less clear).
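
The two shell cases can be checked with a `case` statement. The following is a small sketch; `classify` is a made-up function name, and the labels it prints are only for illustration.

```bash
# classify STRING: report which of the two patterns STRING matches.
classify() {
    case $1 in
        [a'-'z]) echo "bracket" ;;   # quoted '-': the set is a, -, z
        "[a-z]") echo "literal" ;;   # fully quoted: only the 5-char string [a-z]
        *)       echo "none" ;;
    esac
}

classify a          # prints: bracket
classify -          # prints: bracket
classify b          # prints: none
classify '[a-z]'    # prints: literal
```

Because the '-' is quoted, `b` is not matched by the first pattern, which it would be if `[a-z]` were read as a range.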

The effect of this is that in sh, in an expression like

[![":lower":]]

the first ':' is not "special" and hence cannot form part of the
magic opening '[:' sequence for a character class.   Hence this
expression contains no character class, and consequently the
':]' chars are simply a ':' in the bracket expression, and then
the terminating ']' - which leaves the second ']' being just a
literal character.
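
How a particular shell reads this expression can be probed with a `case` statement. The following is only a sketch: the `*` on either side is an assumption taken from the "preceded by anything ... followed by ']' and anything" description quoted earlier, and `probe` is a made-up name. Under the reading described above, `foo` should not match and `f]oo` should; since this thread is about shells disagreeing on that point, no output is asserted here.

```bash
# probe STRING: report whether STRING matches the pattern under discussion.
probe() {
    case $1 in
        *[![":lower":]]*) echo "$1: match" ;;
        *)                echo "$1: no match" ;;
    esac
}

probe 'foo'
probe 'f]oo'
```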


While here (these following parts are not relevant to your question I believe)
when used in sh

[[:"lower":]]

should be treated just the same as

[[:lower:]]

for the same reason that

["abc"]


Re: quote removal issues within character class

2019-11-09 Thread Robert Elz
    Date:        Sat, 09 Nov 2019 06:46:05 -0800
    From:        L A Walsh
    Message-ID:  <5dc6d12d.6040...@tlinx.org>

  | Is this really what the standard says,

Yes, I used cut and paste (and then some line length/wrapping reformatting)

  | because '\\' is not a character, but 2 characters.

In sh, yes, in C, no.   So it all depends upon context.  The text comes
from the XBD (basic definitions) so isn't either sh or C specific.

But since that is followed by (and you quoted)
(<period>, <asterisk>, <left-square-bracket>, and <backslash>,
respectively)
I don't think there is any doubt what it means.

  | They could use "\\" but if a backslash is between single
  | quotes, it loses its special meaning.

In sh yes, but this text is from the section on Regular Expressions,
in the basic definitions (XBD), not the (many) sections about the shell
in XCU (Commands and Utilities).

  | The only way to get a backslash

get in what sense?   In sh '\' is just fine, as is \\ if what you
want is a literal backslash.
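
A quick check of these forms, as a sketch (the variable names are arbitrary):

```bash
b1='\'      # single quotes: the backslash is literal
b2=\\       # unquoted: the first backslash escapes the second
b3="\\"     # double quotes: \\ also yields one backslash
printf '%s\n' "$b1" "$b2" "$b3"    # each line is a single backslash
```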

  | when using single quotes that I've found is to end the single-quote
  | then use the backslash.

If you want to \ escape a character when you're in single quotes, then
yes (in sh) that is correct.   But the only time that makes sense is if
the character that is to follow the \ is a ' (which cannot appear in a
single quoted string), or if you want to elide a newline \<newline> (where
the <newline> means a literal newline character).

  | So if you wanted to insert single quotes in
  | a string that is single-quoted, you would have to do this:
  |
  | 'this is a single-quote(SQ) quoted string using a SQ ('\'') within
  | the single quote.' 

That is the usual way, yes.

  | Alternatively using double-quotes or another quoting
  | mechanism might be preferable.

Yes, there are several other ways to do it, but the \' form is
certainly the most common.   When $'...' quoting becomes more popular
(after it actually makes it into the standard) then that form allows
$' ...\' ... '
($' is "C" quoting, it works just like single quotes, except that C style
backslash escape sequences, plus a few new ones, are expanded).
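
The three ways of getting a single quote into a string can be compared side by side. The following is a minimal sketch, assuming bash (the `$'...'` form is not available in every sh); the variable names are arbitrary.

```bash
s1='it'\''s'     # close the quotes, add an escaped ', reopen the quotes
s2="it's"        # double quotes
s3=$'it\'s'      # $'...' quoting: \' is an escape sequence inside it
printf '%s\n' "$s1" "$s2" "$s3"    # prints it's three times
```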

  | I find it odd that the standard would try to use a backslash within
  | a SQ'd string as a literalizer.

It all relates to the context, and the expectations of the reader.  Since
the parenthesised note that follows is explicit about what it means, I
don't think it really matters.   In the section about the shell, \ tends
to be written exactly like that (no quoting, just the character) when it
is needed (it might occasionally be written as <backslash>).

kre




Re: bug in arithmetic expansion

2019-11-09 Thread Davide Brini
On Sat, 9 Nov 2019 11:52:56 +0100, Joern Knoll wrote:

> [tplx99]:/the/knoll > echo $((0123))
> 83
> [tplx99]:/the/knoll > echo $((123))
> 123
> [tplx99]:/the/knoll > echo $((01234))
> 668
> [tplx99]:/the/knoll > echo $((1234))
> 1234

If you want to force base 10 interpretation (remember that a leading 0
means octal in arithmetic context), you need to explicitly tell bash:

$ echo $(( 10#0123 ))
123
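
The same prefix can be applied to a value held in a variable. The following is a minimal sketch, assuming bash and a variable that contains only decimal digits (no sign, no whitespace); `key` is an arbitrary name.

```bash
key=0123
echo $(( 10#$key ))        # prints: 123
echo $(( 10#$key + 1 ))    # prints: 124
```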

--
D.



Re: quote removal issues within character class

2019-11-09 Thread L A Walsh
On 2019/11/09 04:49, Robert Elz wrote:
> There's also
>
>   The special characters '.', '*', '[', and '\\'
>   (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>   respectively) shall lose their special meaning within a bracket
>   expression.
>   

Is this really what the standard says, because '\\' is not a character, but
2 characters.  They could use "\\" but if a backslash is between single
quotes, it loses its special meaning.  The only way to get a backslash
when using single quotes that I've found is to end the single-quote
then use the backslash.  So if you wanted to insert single quotes in
a string that is single-quoted, you would have to do this:

'this is a single-quote(SQ) quoted string using a SQ ('\'') within
the single quote.' 

Alternatively using double-quotes or another quoting
mechanism might be preferable.

I find it odd that the standard would try to use a backslash within
a SQ'd string as a literalizer.





Re: quote removal issues within character class

2019-11-09 Thread Andreas Schwab
On Nov 09 2019, L A Walsh wrote:

> On 2019/11/09 04:49, Robert Elz wrote:
>> There's also
>>
>>  The special characters '.', '*', '[', and '\\'
>>  (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>>  respectively) shall lose their special meaning within a bracket
>>  expression.
>>   
> 
> Is this really what the standard says, because '\\' is not a character, but
> 2 characters.  They could use "\\" but if a backslash is between single
> quotes, it loses its special meaning.

This is C notation, not shell notation.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."