Re: out-of-bounds numbers in shell utility arguments

2023-06-27 Thread Thorsten Glaser via austin-group-l at The Open Group
Robert Elz via austin-group-l at The Open Group dixit:

>And of course, that means that even though the >> operator is in Table 1-2
>as one that must be supported, it cannot actually work, as >> is unspecified
>(or even undefined, I forget) on signed values, and POSIX sh arithmetic only

No, only for negative signed values and in some other cases,
as Gabriel Ravier mentioned.

>of signed.   I suspect other shells might do the same.

As for mksh, I got so fed up with UB that I calculate all
operations in unsigned. The “strict POSIX” one (usually
shipped as lksh (built with the Build.sh -L flag), usually
built with -DMKSH_BINSHPOSIX and symlinkable to /bin/sh)
uses the long data type for that, the “proper mksh” one
(usually shipped as mksh) has guaranteed 32-bit arithmetics,
guaranteed 2s complement (though POSIX guarantees that for
the C signed long as well, thankfully), and the shell has
some extra operations (e.g. rotate) there. I plan on adding
a bigint mechanism eventually, to make up for the fact that
it’s limited to 32 bits normally (relying on “long” which
has diverging sizes, making $((1<<31+1)) UB on ILP32, is too
unsafe in my eyes, but POSIX demands it so a (currently)
separate binary does it).

In most cases, I do the operations as unsigned; this works
well for addition, subtraction, even multiplication if 2s
complement and wraparound can be assumed, for division and
modulo I do them by hand on the magnitudes then deal with
the signs later so it’s actually defined for negative values,
etc.
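A minimal C sketch of the approach described above: add, subtract and multiply in `unsigned long` (where wraparound is defined), and divide on the magnitudes before reapplying the signs. It assumes a two's complement `long` with no padding bits and uses `long` like the POSIX-mode build. The helper names (`from_unsigned`, `wrap_add`, `wrap_divmod`, ...) are hypothetical; this is not mksh's actual macro set.

```c
#include <limits.h>
#include <stdbool.h>

/* Convert back to long without relying on the implementation-defined
 * out-of-range unsigned-to-signed conversion.
 * Assumes two's complement, i.e. LONG_MIN == -LONG_MAX - 1. */
long from_unsigned(unsigned long u)
{
    if (u <= (unsigned long)LONG_MAX)
        return (long)u;
    /* u lies in [2^(N-1), 2^N); return u - 2^N without any signed overflow */
    return -(long)(ULONG_MAX - u) - 1;
}

/* Unsigned add/sub/mul wrap modulo 2^N; the low N bits are the same
 * as the two's complement signed result would be. */
long wrap_add(long a, long b) { return from_unsigned((unsigned long)a + (unsigned long)b); }
long wrap_sub(long a, long b) { return from_unsigned((unsigned long)a - (unsigned long)b); }
long wrap_mul(long a, long b) { return from_unsigned((unsigned long)a * (unsigned long)b); }

/* Magnitude as unsigned; well defined even for LONG_MIN. */
unsigned long magnitude(long v)
{
    return v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
}

/* Truncating division and remainder computed on the magnitudes, signs
 * applied afterwards. Returns false on division by zero.
 * LONG_MIN / -1 wraps to LONG_MIN instead of being undefined. */
bool wrap_divmod(long a, long b, long *quot, long *rem)
{
    if (b == 0)
        return false;
    unsigned long ua = magnitude(a), ub = magnitude(b);
    unsigned long uq = ua / ub, ur = ua % ub;
    if ((a < 0) != (b < 0))
        uq = 0UL - uq;   /* quotient is negative iff the signs differ */
    if (a < 0)
        ur = 0UL - ur;   /* remainder takes the sign of the dividend, as in C99 */
    *quot = from_unsigned(uq);
    *rem = from_unsigned(ur);
    return true;
}
```

For example, `wrap_add(LONG_MAX, 1)` gives `LONG_MIN`, and `wrap_divmod(-7, 2, &q, &r)` sets `q` to -3 and `r` to -1, matching C's truncating division.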

It’s still a work in progress, not yet perfect, but I’ve
extracted the workings into macros, with a testsuite. If
things work out, the use of long can be made a runtime,
not compile-time, decision eventually, too.

mksh also has an “unsigned arithmetics” extension: if the
$(( or ksh-style (( is immediately followed by #, the expression
is evaluated as unsigned (using the 2s complement representation
of the variables used). This is very useful for hashes etc.
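As a rough illustration of why unsigned evaluation matters for hashes, here is a C sketch (not mksh code, and not the shell `#` syntax itself) of the 32-bit FNV-1a hash, whose multiply step is meant to wrap modulo 2^32. In `uint32_t` that wraparound is guaranteed by C; the same multiply in a signed 32-bit type could overflow, which is undefined behaviour. The constants are the published FNV-1a 32-bit offset basis and prime; the function name `fnv1a_32` is hypothetical.

```c
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string.
 * Every multiply is intended to wrap modulo 2^32, which is only
 * well defined because the arithmetic is done in an unsigned type. */
uint32_t fnv1a_32(const char *s)
{
    uint32_t h = UINT32_C(2166136261);      /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;             /* mix in one byte */
        h *= UINT32_C(16777619);            /* FNV prime; wraps mod 2^32 */
    }
    return h;
}
```

For instance, `fnv1a_32("a")` returns `0xe40c292c`.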

bye,
//mirabilos
-- 
08:05⎜ mika: Does grml have an tool to read Apple
     ⎜ System Log (asl) files? :)
08:08⎜ yeah. /bin/rm. ;)
08:09⎜ hexdump -C
08:31⎜ ft, mrud: *g*



Re: out-of-bounds numbers in shell utility arguments

2023-06-27 Thread Gabriel Ravier via austin-group-l at The Open Group

On 6/27/23 13:42, Robert Elz via austin-group-l at The Open Group wrote:

> Date: Tue, 27 Jun 2023 09:41:02 +0100
> From: "Geoff Clare via austin-group-l at The Open Group"
>
> Message-ID:
>
>   | Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
>   | allowed for anything that 1.1.2 requires to be "equivalent to the
>   | ISO C standard signed long data type".
>
> And of course, that means that even though the >> operator is in Table 1-2
> as one that must be supported, it cannot actually work, as >> is unspecified
> (or even undefined, I forget) on signed values, and POSIX sh arithmetic only
> allows for signed values.   << may have similar issues (at least some compilers
> are starting to complain about the use of << with a signed left operand, which
> I am guessing means at least some version of the C standard has made that be
> unspecified/undefined as well).
>
> The implementation I work with ignores that, and when an operation works
> better with unsigned operands, it simply treats them as unsigned instead
> of signed.   I suspect other shells might do the same.
>
> kre

> that means that even though the >> operator is in Table 1-2 as one
> that must be supported, it cannot actually work, as >> is unspecified
> (or even undefined, I forget) on signed values, and POSIX sh arithmetic
> only allows for signed values


This certainly doesn't apply to *all* shift operations involving signed
types, only specific ones, in particular:
- when the right operand is negative or greater than or equal to the width
of the promoted left operand, the behavior is undefined
- when the left operand of << is signed and either it is negative or the
result of the operation (without any wrap-around or anything like that)
is not representable in the result type, the behavior is undefined
- when the left operand of >> is a negative (and thus also signed)
number, the result is implementation-defined
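The three cases above can be turned into a small C checker. This is a sketch under stated assumptions: the left operand is modelled as a `width`-bit two's complement integer with no padding bits (and assumed to already lie in that range), `long long` is assumed to be 64 bits, and the rules are those of C99/C11 6.5.7. The function and enum names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Is `a << n` defined for a signed left operand of `width` bits (2..64)? */
bool signed_shl_defined(long long a, long long n, int width)
{
    if (n < 0 || n >= width)
        return false;                  /* shift count out of range: undefined */
    if (a < 0)
        return false;                  /* negative signed left operand: undefined */
    long long max = (width == 64) ? LLONG_MAX : (1LL << (width - 1)) - 1;
    return a <= (max >> n);            /* a * 2^n must be representable */
}

enum shr_status { SHR_DEFINED, SHR_IMPL_DEFINED, SHR_UNDEFINED };

/* Classify `a >> n`: a negative left operand is implementation-defined,
 * not undefined; only a bad shift count is undefined. */
enum shr_status signed_shr_status(long long a, long long n, int width)
{
    if (n < 0 || n >= width)
        return SHR_UNDEFINED;
    return a < 0 ? SHR_IMPL_DEFINED : SHR_DEFINED;
}
```

With `width` set to 32 this also shows why `$((1<<31+1))` is a problem on ILP32: `+` binds tighter than `<<`, so the shift count is 32, which is out of range, and even `1<<31` on its own is not representable.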




Re: out-of-bounds numbers in shell utility arguments

2023-06-27 Thread Robert Elz via austin-group-l at The Open Group
Date: Tue, 27 Jun 2023 09:41:02 +0100
From: "Geoff Clare via austin-group-l at The Open Group"

Message-ID:

  | Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
  | allowed for anything that 1.1.2 requires to be "equivalent to the
  | ISO C standard signed long data type".

And of course, that means that even though the >> operator is in Table 1-2
as one that must be supported, it cannot actually work, as >> is unspecified
(or even undefined, I forget) on signed values, and POSIX sh arithmetic only
allows for signed values.   << may have similar issues (at least some compilers
are starting to complain about the use of << with a signed left operand, which
I am guessing means at least some version of the C standard has made that be
unspecified/undefined as well).

The implementation I work with ignores that, and when an operation works
better with unsigned operands, it simply treats them as unsigned instead
of signed.   I suspect other shells might do the same.

kre
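One way to read "treats them as unsigned" for `>>`, sketched in C with hypothetical function names: a logical right shift done by converting to `unsigned long` first, and, for comparison, a sign-extending shift that avoids the implementation-defined case by shifting `~a`, which is non-negative whenever `a` is negative. Both assume two's complement and a shift count below the width of `long`; neither is taken from the shell kre refers to.

```c
/* Logical right shift: reinterpret the bits as unsigned, so vacated
 * high bits are always zero. Caller must ensure n < width of long. */
unsigned long shr_logical(long a, unsigned n)
{
    return (unsigned long)a >> n;
}

/* Arithmetic (sign-extending) right shift with no implementation-defined
 * step: for negative a, ~a is non-negative, so ~a >> n is fully specified,
 * and complementing again restores the sign bits. Assumes two's complement
 * and n < width of long. */
long shr_arith(long a, unsigned n)
{
    return a < 0 ? ~(~a >> n) : a >> n;
}
```

For example, `shr_arith(-7, 1)` is -4 (rounding toward negative infinity), while `shr_logical(-1, 1)` is `ULONG_MAX / 2`.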



Re: out-of-bounds numbers in shell utility arguments

2023-06-27 Thread Geoff Clare via austin-group-l at The Open Group
Thorsten Glaser wrote, on 26 Jun 2023:
>
> Geoff Clare via austin-group-l at The Open Group dixit:
> 
> >XCU 1.1.2 relates to utilities that "perform complex data manipulation
> >using their own procedure and arithmetic languages".  So it applies to
> >shell arithmetic expansion, but isn't really relevant to simple
> >argument parsing by a utility.
> 
[...]
> 
> Can you confirm (for the sake of completeness) that wraparound
> is ok for shell arithmetics in POSIX mode?

Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
allowed for anything that 1.1.2 requires to be "equivalent to the
ISO C standard signed long data type".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: out-of-bounds numbers in shell utility arguments

2023-06-26 Thread Thorsten Glaser via austin-group-l at The Open Group
Geoff Clare via austin-group-l at The Open Group dixit:

>XCU 1.1.2 relates to utilities that "perform complex data manipulation
>using their own procedure and arithmetic languages".  So it applies to
>shell arithmetic expansion, but isn't really relevant to simple
>argument parsing by a utility.  For that, the relevant text is in XBD
>12.1 Utility Argument Syntax, item 6.

Ah okay.

>It's somewhere between the second and third. It's unspecified whether
>the utility will report an error, but if it doesn't then it has to
>handle the value correctly, i.e. test 2 -lt  must
>exit with status 0 or status >1; it must not exit with status 1.

OK, so wraparound behaviour is not allowed for utilities like test(1),
and, unless I do bignums, it must return an error.

Thank you.

Can you confirm (for the sake of completeness) that wraparound
is ok for shell arithmetics in POSIX mode?

bye,
//mirabilos
-- 
[16:04:33] bkix: "veni vidi violini"
[16:04:45] bkix: "ich kam, sah und vergeigte" (German: "I came, saw and botched it")...



Re: out-of-bounds numbers in shell utility arguments

2023-06-26 Thread Geoff Clare via austin-group-l at The Open Group
Thorsten Glaser wrote, on 24 Jun 2023:
> 
> what’s the POSIX mode behaviour expected when scripts attempt to
> use overlong numbers in arguments e.g. to utilities (but possibly
> anywhere in XSH)?
> 
> Say a script has on a 64-bit system:
> 
> test 2 -lt 
> 
> I found “1.1.2 Concepts Derived from the ISO C Standard” in XSH
> Introduction, but that just says it should be signed long.

XCU 1.1.2 relates to utilities that "perform complex data manipulation
using their own procedure and arithmetic languages".  So it applies to
shell arithmetic expansion, but isn't really relevant to simple
argument parsing by a utility.  For that, the relevant text is in XBD
12.1 Utility Argument Syntax, item 6.  This specifies ranges that must
be "syntactically recognized as numeric values" and then says "Ranges
greater than those listed here are allowed."

So the allowed behaviours are that either the utility syntactically
recognises the argument as a numeric value or it doesn't.

If it doesn't, then it must report this as a syntax error.
If it does, then its behaviour must be as described by the standard
for the value that was recognised.

> So, is it:
> 
> • application error (the script writer is at fault, and the shell
>   can do what it wants but should be consistent)
> 
> • unspecified behaviour (the shell can do as it wants but should
>   be consistent); I really hope not C-level UB
> 
> • the utility or shell must detect this, while parsing the argument
>   as number, erroring out
> 
> I’d hope for one of the first two because having wraparound semantics
> is one of the guarantees for script writers I have in mksh for shell
> arithmetics (not yet explicitly in the test(1) builtin).

It's somewhere between the second and third. It's unspecified whether
the utility will report an error, but if it doesn't then it has to
handle the value correctly, i.e. test 2 -lt  must
exit with status 0 or status >1; it must not exit with status 1.
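A minimal C sketch of the error branch of that rule, using a hypothetical helper `parse_int_operand` built on `strtol`: an operand that `strtol` flags with `ERANGE` is rejected, so the comparison yields exit status 2 instead of wrapping or clamping into a status of 1. The other permitted branch (comparing the full value, e.g. with bignums, and exiting 0) is not shown, and the argument handling is simplified compared with a real test(1).

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer operand. Returns 0 on success, -1 if the
 * string is not a number or its value does not fit in a long. */
int parse_int_operand(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;          /* not syntactically a number */
    if (errno == ERANGE)
        return -1;          /* out of range: report an error, never wrap */
    *out = v;
    return 0;
}

/* Exit status for "test a -lt b": 0 = true, 1 = false, 2 = error.
 * An out-of-range operand yields 2, never 1. */
int test_lt(const char *a, const char *b)
{
    long x, y;
    if (parse_int_operand(a, &x) != 0 || parse_int_operand(b, &y) != 0)
        return 2;
    return x < y ? 0 : 1;
}
```

With a 26-digit operand such as `99999999999999999999999999`, `test_lt("2", ...)` returns 2 on a system with 64-bit `long`.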

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England