Re: out-of-bounds numbers in shell utility arguments
Robert Elz via austin-group-l at The Open Group dixit:

>And of course, that means that even though the >> operator is in Table 1-2
>as one that must be supported, it cannot actually work, as >> is unspecified
>(or even undefined, I forget) on signed values, and POSIX sh arithmetic only

No, only for negative signed values and in some other cases, as Gabriel Ravier mentioned.

>of signed. I suspect other shells might do the same.

As for mksh, I got so fed up with UB that I calculate all operations in unsigned. The “strict POSIX” one (usually shipped as lksh (built with the Build.sh -L flag), usually built with -DMKSH_BINSHPOSIX and symlinkable to /bin/sh) uses the long data type for that, the “proper mksh” one (usually shipped as mksh) has guaranteed 32-bit arithmetics, guaranteed 2s complement (though POSIX guarantees that for the C signed long as well, thankfully), and the shell has some extra operations (e.g. rotate) there.

I plan on adding a bigint mechanism eventually, to make up for the fact that it’s limited to 32 bits normally (relying on “long” which has diverging sizes, making $((1<<31+1)) UB on ILP32, is too unsafe in my eyes, but POSIX demands it so a (currently) separate binary does it).

In most cases, I do the operations as unsigned; this works well for addition, subtraction, even multiplication if 2s complement and wraparound can be assumed, for division and modulo I do them by hand on the magnitudes then deal with the signs later so it’s actually defined for negative values, etc. It’s still a work in progress, not yet perfect, but I’ve extracted the workings into macros, with a testsuite. If things work out, the use of long can be made a runtime, not compile-time, decision eventually, too.

mksh also has an “unsigned arithmetics” extension: if the $(( or ksh-style (( is immediately followed by # the expression is evaluated in unsigned (using the 2s complement representation of the variables used). This is major useful for hashes etc.

bye, //mirabilos -- 08:05⎜ mika: Does grml have an tool to read Apple ⎜System Log (asl) files? :) 08:08⎜ yeah. /bin/rm. ;) 08:09⎜ hexdump -C 08:31⎜ ft, mrud: *g*
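
A minimal C sketch of the approach described in the message above (do the work in unsigned, handle division and modulo on the magnitudes, fix up the sign afterwards). This is not mksh's actual code; the names `to_signed32`, `magnitude32`, `wrap_add`, `wrap_sub`, `wrap_mul`, `wrap_div` and `wrap_rem` are hypothetical, and the sketch assumes a platform that provides `int32_t` (and thus 2s complement without padding bits).

```c
#include <stdint.h>

/* Map a 32-bit pattern to int32_t without relying on the
 * implementation-defined conversion of out-of-range values. */
static int32_t to_signed32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Absolute value as unsigned; well defined even for INT32_MIN. */
static uint32_t magnitude32(int32_t v)
{
    uint32_t u = (uint32_t)v;
    return v < 0 ? (uint32_t)(0u - u) : u;
}

/* Addition, subtraction and multiplication wrap modulo 2^32. */
int32_t wrap_add(int32_t a, int32_t b)
{
    return to_signed32((uint32_t)((uint32_t)a + (uint32_t)b));
}

int32_t wrap_sub(int32_t a, int32_t b)
{
    return to_signed32((uint32_t)((uint32_t)a - (uint32_t)b));
}

int32_t wrap_mul(int32_t a, int32_t b)
{
    /* Widen to 64 bits so the product itself cannot overflow. */
    return to_signed32((uint32_t)((uint64_t)(uint32_t)a * (uint32_t)b));
}

/* Division on the magnitudes, sign applied afterwards.
 * The caller must reject b == 0 before calling. */
int32_t wrap_div(int32_t a, int32_t b)
{
    uint32_t q = magnitude32(a) / magnitude32(b);
    if ((a < 0) != (b < 0))
        q = (uint32_t)(0u - q);
    return to_signed32(q);
}

/* Remainder takes the sign of the dividend, as in C99 and later.
 * The caller must reject b == 0 before calling. */
int32_t wrap_rem(int32_t a, int32_t b)
{
    uint32_t r = magnitude32(a) % magnitude32(b);
    if (a < 0)
        r = (uint32_t)(0u - r);
    return to_signed32(r);
}
```

With these helpers, `wrap_add(INT32_MAX, 1)` wraps to `INT32_MIN`, and `wrap_div(INT32_MIN, -1)` yields `INT32_MIN` where a plain C `/` on signed operands would be undefined.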
Re: out-of-bounds numbers in shell utility arguments
On 6/27/23 13:42, Robert Elz via austin-group-l at The Open Group wrote:

> Date: Tue, 27 Jun 2023 09:41:02 +0100
> From: "Geoff Clare via austin-group-l at The Open Group"
> Message-ID:
>
> | Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
> | allowed for anything that 1.1.2 requires to be "equivalent to the
> | ISO C standard signed long data type".
>
> And of course, that means that even though the >> operator is in Table 1-2 as one that must be supported, it cannot actually work, as >> is unspecified (or even undefined, I forget) on signed values, and POSIX sh arithmetic only allows for signed values.
>
> << may have similar issues (at least some compilers are starting to complain about the use of << with a signed left operand, which I am guessing means at least some version of the C standard has made that be unspecified/undefined as well).
>
> The implementation I work with ignores that, and when an operation works better with unsigned operands, it simply treats them as unsigned instead of signed. I suspect other shells might do the same.
>
> kre

> that means that even though the >> operator is in Table 1-2 as one that must be supported, it cannot actually work, as >> is unspecified (or even undefined, I forget) on signed values, and POSIX sh arithmetic only allows for signed values

This certainly doesn't apply to *all* shift operations involving signed types, only specific ones, in particular:

- when the right operand is negative, or greater than or equal to the width of the promoted left operand, the behavior is undefined
- when the left operand of << is signed and either it is negative or the result of the operation (without any wrap-around or anything like that) is not representable in the result type, the behavior is undefined
- when the left operand of >> is a negative (and thus also signed) number, the result is implementation-defined
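
The ISO C shift rules for `long` can be sketched as predicates: the shift count must be non-negative and smaller than the width, `<<` additionally needs a non-negative left operand whose shifted value is representable, and `>>` with a negative left operand is implementation-defined rather than undefined. The names `LONG_BITS`, `shift_count_ok`, `shl_defined` and `shr_portable` are hypothetical, and the sketch assumes `long` has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Width of long in bits; assumes no padding bits. */
#define LONG_BITS ((int)(sizeof(long) * CHAR_BIT))

/* A shift count is valid only in the range [0, width). */
bool shift_count_ok(long count)
{
    return count >= 0 && count < LONG_BITS;
}

/* value << count is defined only if the count is valid, value is
 * non-negative, and value * 2^count fits in long. */
bool shl_defined(long value, long count)
{
    if (!shift_count_ok(count) || value < 0)
        return false;
    return value <= (LONG_MAX >> count);
}

/* value >> count gives the same result everywhere only if the count
 * is valid and value is non-negative; a negative value is allowed
 * but the result is implementation-defined. */
bool shr_portable(long value, long count)
{
    return shift_count_ok(count) && value >= 0;
}
```

A shell could call such checks before evaluating a shift directly on `long`, and fall back to an unsigned computation or an error when they fail.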
Re: out-of-bounds numbers in shell utility arguments
Date: Tue, 27 Jun 2023 09:41:02 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
Message-ID:

| Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
| allowed for anything that 1.1.2 requires to be "equivalent to the
| ISO C standard signed long data type".

And of course, that means that even though the >> operator is in Table 1-2 as one that must be supported, it cannot actually work, as >> is unspecified (or even undefined, I forget) on signed values, and POSIX sh arithmetic only allows for signed values.

<< may have similar issues (at least some compilers are starting to complain about the use of << with a signed left operand, which I am guessing means at least some version of the C standard has made that be unspecified/undefined as well).

The implementation I work with ignores that, and when an operation works better with unsigned operands, it simply treats them as unsigned instead of signed. I suspect other shells might do the same.

kre
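
One way to read "treats them as unsigned" for the shift operators, as a hedged C sketch: shift the bit pattern as `unsigned long` and map the result back to `long`. This is an illustration, not the code of any particular shell; `to_long`, `shl_via_unsigned` and `shr_logical` are hypothetical names, the caller is assumed to have checked that the count is below the width of `long`, and `long` is assumed to be 2s complement without padding bits (as POSIX requires).

```c
#include <limits.h>

/* Map an unsigned long bit pattern back to long without relying on
 * the implementation-defined conversion of out-of-range values. */
static long to_long(unsigned long u)
{
    if (u <= (unsigned long)LONG_MAX)
        return (long)u;
    return (long)(u - (unsigned long)LONG_MAX - 1UL) + LONG_MIN;
}

/* Left shift on the bit pattern; bits shifted out are discarded.
 * Precondition: count < width of long. */
long shl_via_unsigned(long value, unsigned count)
{
    return to_long((unsigned long)value << count);
}

/* Logical (zero-filling) right shift on the bit pattern.
 * Precondition: count < width of long. */
long shr_logical(long value, unsigned count)
{
    return to_long((unsigned long)value >> count);
}
```

Note that a logical right shift turns a negative value into a large positive one (`shr_logical(-1, 1)` is `LONG_MAX`), which differs from the arithmetic shift many compilers produce for `>>` on negative signed operands.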
Re: out-of-bounds numbers in shell utility arguments
Thorsten Glaser wrote, on 26 Jun 2023:
>
> Geoff Clare via austin-group-l at The Open Group dixit:
>
> >XCU 1.1.2 relates to utilities that "perform complex data manipulation
> >using their own procedure and arithmetic languages". So it applies to
> >shell arithmetic expansion, but isn't really relevant to simple
> >argument parsing by a utility.
> [...]
>
> Can you confirm (for the sake of completeness) that wraparound
> is ok for shell arithmetics in POSIX mode?

Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's allowed for anything that 1.1.2 requires to be "equivalent to the ISO C standard signed long data type".

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: out-of-bounds numbers in shell utility arguments
Geoff Clare via austin-group-l at The Open Group dixit:

>XCU 1.1.2 relates to utilities that "perform complex data manipulation
>using their own procedure and arithmetic languages". So it applies to
>shell arithmetic expansion, but isn't really relevant to simple
>argument parsing by a utility. For that, the relevant text is in XBD
>12.1 Utility Argument Syntax, item 6.

Ah okay.

>It's somewhere between the second and third. It's unspecified whether
>the utility will report an error, but if it doesn't then it has to
>handle the value correctly, i.e. test 2 -lt must
>exit with status 0 or status >1; it must not exit with status 1.

OK, so wraparound behaviour is not allowed for utilities like test(1), and, unless I do bignums, it must return an error. Thank you.

Can you confirm (for the sake of completeness) that wraparound is ok for shell arithmetics in POSIX mode?

bye,
//mirabilos
--
[16:04:33] bkix: "veni vidi violini"
[16:04:45] bkix: "ich kam, sah und vergeigte"...
Re: out-of-bounds numbers in shell utility arguments
Thorsten Glaser wrote, on 24 Jun 2023:
>
> what’s the POSIX mode behaviour expected when scripts attempt to
> use overlong numbers in arguments e.g. to utilities (but possibly
> anywhere in XSH)?
>
> Say a script has on a 64-bit system:
>
> test 2 -lt
>
> I found “1.1.2 Concepts Derived from the ISO C Standard” in XSH
> Introduction, but that just says it should be signed long.

XCU 1.1.2 relates to utilities that "perform complex data manipulation using their own procedure and arithmetic languages". So it applies to shell arithmetic expansion, but isn't really relevant to simple argument parsing by a utility. For that, the relevant text is in XBD 12.1 Utility Argument Syntax, item 6. This specifies ranges that must be "syntactically recognized as numeric values" and then says "Ranges greater than those listed here are allowed."

So the allowed behaviours are that either the utility syntactically recognises the argument as a numeric value or it doesn't. If it doesn't, then it must report this as a syntax error. If it does, then its behaviour must be as described by the standard for the value that was recognised.

> So, is it:
>
> • application error (the script writer is at fault, and the shell
>   can do what it wants but should be consistent)
>
> • unspecified behaviour (the shell can do as it wants but should
>   be consistent); I really hope not C-level UB
>
> • the utility or shell must detect this, while parsing the argument
>   as number, erroring out
>
> I’d hope for one of the first two because having wraparound semantics
> is one of the guarantees for script writers I have in mksh for shell
> arithmetics (not yet explicitly in the test(1) builtin).

It's somewhere between the second and third. It's unspecified whether the utility will report an error, but if it doesn't then it has to handle the value correctly, i.e. test 2 -lt must exit with status 0 or status >1; it must not exit with status 1.

-- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
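
The "report an error rather than wrap" option for a utility without bignums can be sketched in C with `strtol`, which signals out-of-range input through `ERANGE`. This is an illustrative sketch, not the code of any real test(1) implementation; `parse_long_arg` and `test_lt` are hypothetical names, and exit status 2 is just one choice of a status greater than 1. `strtol` also accepts leading whitespace and a sign, which this sketch does not restrict.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse arg as a decimal integer that fits in long.
 * Returns 0 and stores the value on success, -1 if arg is not
 * recognised as a number or does not fit. */
int parse_long_arg(const char *arg, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')
        return -1;              /* empty string or trailing garbage */
    if (errno == ERANGE)
        return -1;              /* too large or too small for long */
    *out = value;
    return 0;
}

/* Exit status of a hypothetical "test lhs -lt rhs":
 * 0 if true, 1 if false, 2 if an operand was not recognised. */
int test_lt(const char *lhs, const char *rhs)
{
    long a, b;

    if (parse_long_arg(lhs, &a) != 0 || parse_long_arg(rhs, &b) != 0)
        return 2;
    return a < b ? 0 : 1;
}
```

With an operand too large for `long`, `test_lt` returns 2 instead of silently wrapping, so it never gives the wrong answer of status 1 for a comparison that is mathematically true.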