Re: [PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-18 Thread Jan Beulich
On 18.07.2022 14:07, Andrew Cooper wrote:
> On 18/07/2022 10:49, Jan Beulich wrote:
>> On 18.07.2022 11:31, Andrew Cooper wrote:
>>> On 18/07/2022 10:11, Jan Beulich wrote:
 On 15.07.2022 15:26, Andrew Cooper wrote:
> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep ' endbr64 *$' | 
> cut -f 1 -d ':' > $VALID &
>  #the lower bits, rounding integers to the nearest 4k.
>  #
>  #Instead, use the fact that Xen's .text is within a 1G aligned 
> region, and
> -#split the VMA in half so AWK's numeric addition is only working on 
> 32 bit
> -#numbers, which don't lose precision.
> +#split the VMA so AWK's numeric addition is only working on <32 bit
> +#numbers, which don't lose precision.  (See point 5)
>  #
>  # 4) MAWK doesn't support plain hex constants (an optional part of the 
> POSIX
>  #spec), and GAWK and MAWK can't agree on how to work with hex 
> constants in
>  #a string.  Use the shell to convert $vma_lo to decimal before 
> passing to
>  #AWK.
>  #
> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
> +#evaluated as long, which in 32bit shells turns negative if bit 31 
> of the
> +#VMA is set.  AWK then interprets this negative number as a double 
> before
> +#adding the offsets from the binary grep.
> +#
> +#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
> +#
> +#The consequence of this is that for all offsets, $vma_lo + offset 
> needs
> +#to be less that 256M (i.e. 7 nibbles) so as to be successfully 
> recombined
> +#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start 
> of a
> +#1G aligned region, and Xen is far far smaller than 256M, but leave 
> safety
> +#check nevertheless.
> +#
>  eval $(${OBJDUMP} -j .text $1 -h |
> -$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
> 8), substr($4, 9, 16)}')
> +$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
> 9), substr($4, 10, 16)}')
>  
>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>  
> +bin_sz=$(stat -c '%s' $TEXT_BIN)
> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
> +{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
 ... s/can/cannot/ ?
>>> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
>>> everything is good.
>> Hmm, the wording then indeed is ambiguous.
> 
> I see your point.  In this case it's meant as "are able to", but this is
> still clearer than using "can't" because at least the text matches the
> check which triggered it.
> 
>>  I read "can" as "are allowed
>> to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
>> size"? If you mention "offsets", then I think the check should be based
>> on actually observing an offset which is too large (which .text size
>> alone doesn't guarantee will happen).
> 
> It's not just .text on its own because the VMA of offset by 2M, hence
> the subtraction of $vma_lo in the main calculation.
> 
> There's no point searching for offsets.  There will be one near the end,
> so all searching for an offset would do is complicate the critical loop.
> 
> How about ".text offsets must not exceed 256M" ?
> 
> That should be unambiguous.

Yes, that reads fine. Thanks.

Jan



Re: [PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-18 Thread Andrew Cooper
On 18/07/2022 10:49, Jan Beulich wrote:
> On 18.07.2022 11:31, Andrew Cooper wrote:
>> On 18/07/2022 10:11, Jan Beulich wrote:
>>> On 15.07.2022 15:26, Andrew Cooper wrote:
 --- a/xen/tools/check-endbr.sh
 +++ b/xen/tools/check-endbr.sh
 @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '  endbr64 *$' | 
 cut -f 1 -d ':' > $VALID &
  #the lower bits, rounding integers to the nearest 4k.
  #
  #Instead, use the fact that Xen's .text is within a 1G aligned 
 region, and
 -#split the VMA in half so AWK's numeric addition is only working on 
 32 bit
 -#numbers, which don't lose precision.
 +#split the VMA so AWK's numeric addition is only working on <32 bit
 +#numbers, which don't lose precision.  (See point 5)
  #
  # 4) MAWK doesn't support plain hex constants (an optional part of the 
 POSIX
  #spec), and GAWK and MAWK can't agree on how to work with hex 
 constants in
  #a string.  Use the shell to convert $vma_lo to decimal before 
 passing to
  #AWK.
  #
 +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
 +#evaluated as long, which in 32bit shells turns negative if bit 31 of 
 the
 +#VMA is set.  AWK then interprets this negative number as a double 
 before
 +#adding the offsets from the binary grep.
 +#
 +#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
 +#
 +#The consequence of this is that for all offsets, $vma_lo + offset 
 needs
 +#to be less that 256M (i.e. 7 nibbles) so as to be successfully 
 recombined
 +#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start 
 of a
 +#1G aligned region, and Xen is far far smaller than 256M, but leave 
 safety
 +#check nevertheless.
 +#
  eval $(${OBJDUMP} -j .text $1 -h |
 -$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
 8), substr($4, 9, 16)}')
 +$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
 9), substr($4, 10, 16)}')
  
  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
  
 +bin_sz=$(stat -c '%s' $TEXT_BIN)
 +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
 +{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
>>> ... s/can/cannot/ ?
>> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
>> everything is good.
> Hmm, the wording then indeed is ambiguous.

I see your point.  In this case it's meant as "are able to", but this is
still clearer than using "can't" because at least the text matches the
check which triggered it.

>  I read "can" as "are allowed
> to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
> size"? If you mention "offsets", then I think the check should be based
> on actually observing an offset which is too large (which .text size
> alone doesn't guarantee will happen).

It's not just .text on its own because the VMA of offset by 2M, hence
the subtraction of $vma_lo in the main calculation.

There's no point searching for offsets.  There will be one near the end,
so all searching for an offset would do is complicate the critical loop.

How about ".text offsets must not exceed 256M" ?

That should be unambiguous.

~Andrew


Re: [PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-18 Thread Jan Beulich
On 18.07.2022 11:31, Andrew Cooper wrote:
> On 18/07/2022 10:11, Jan Beulich wrote:
>> On 15.07.2022 15:26, Andrew Cooper wrote:
>>> --- a/xen/tools/check-endbr.sh
>>> +++ b/xen/tools/check-endbr.sh
>>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '   endbr64 *$' | 
>>> cut -f 1 -d ':' > $VALID &
>>>  #the lower bits, rounding integers to the nearest 4k.
>>>  #
>>>  #Instead, use the fact that Xen's .text is within a 1G aligned region, 
>>> and
>>> -#split the VMA in half so AWK's numeric addition is only working on 32 
>>> bit
>>> -#numbers, which don't lose precision.
>>> +#split the VMA so AWK's numeric addition is only working on <32 bit
>>> +#numbers, which don't lose precision.  (See point 5)
>>>  #
>>>  # 4) MAWK doesn't support plain hex constants (an optional part of the 
>>> POSIX
>>>  #spec), and GAWK and MAWK can't agree on how to work with hex 
>>> constants in
>>>  #a string.  Use the shell to convert $vma_lo to decimal before passing 
>>> to
>>>  #AWK.
>>>  #
>>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>>> +#evaluated as long, which in 32bit shells turns negative if bit 31 of 
>>> the
>>> +#VMA is set.  AWK then interprets this negative number as a double 
>>> before
>>> +#adding the offsets from the binary grep.
>>> +#
>>> +#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>>> +#
>>> +#The consequence of this is that for all offsets, $vma_lo + offset 
>>> needs
>>> +#to be less that 256M (i.e. 7 nibbles) so as to be successfully 
>>> recombined
>>> +#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start 
>>> of a
>>> +#1G aligned region, and Xen is far far smaller than 256M, but leave 
>>> safety
>>> +#check nevertheless.
>>> +#
>>>  eval $(${OBJDUMP} -j .text $1 -h |
>>> -$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
>>> 8), substr($4, 9, 16)}')
>>> +$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 
>>> 9), substr($4, 10, 16)}')
>>>  
>>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>>  
>>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>>> +{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
>> ... s/can/cannot/ ?
> 
> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
> everything is good.

Hmm, the wording then indeed is ambiguous. I read "can" as "are allowed
to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
size"? If you mention "offsets", then I think the check should be based
on actually observing an offset which is too large (which .text size
alone doesn't guarantee will happen).

Jan



Re: [PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-18 Thread Andrew Cooper
On 18/07/2022 10:11, Jan Beulich wrote:
> On 15.07.2022 15:26, Andrew Cooper wrote:
>> While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
>> the shell) isn't portable in 32bit shells.  See the code comment for the fix.
>>
>> The fix found a second latent bug.  Recombining $vma_hi/lo should have used
>> printf "%s%08x" and only worked previously because $vma_lo had bits set in
>> it's top nibble.  Combining with the main fix, %08x becomes %07x.
>>
>> Fixes: $XXX patch 1
>> Reported-by: Jan Beulich 
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks, but...

> with, I guess, ...
>
>> --- a/xen/tools/check-endbr.sh
>> +++ b/xen/tools/check-endbr.sh
>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep 'endbr64 *$' | 
>> cut -f 1 -d ':' > $VALID &
>>  #the lower bits, rounding integers to the nearest 4k.
>>  #
>>  #Instead, use the fact that Xen's .text is within a 1G aligned region, 
>> and
>> -#split the VMA in half so AWK's numeric addition is only working on 32 
>> bit
>> -#numbers, which don't lose precision.
>> +#split the VMA so AWK's numeric addition is only working on <32 bit
>> +#numbers, which don't lose precision.  (See point 5)
>>  #
>>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>>  #spec), and GAWK and MAWK can't agree on how to work with hex constants 
>> in
>>  #a string.  Use the shell to convert $vma_lo to decimal before passing 
>> to
>>  #AWK.
>>  #
>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>> +#evaluated as long, which in 32bit shells turns negative if bit 31 of 
>> the
>> +#VMA is set.  AWK then interprets this negative number as a double 
>> before
>> +#adding the offsets from the binary grep.
>> +#
>> +#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>> +#
>> +#The consequence of this is that for all offsets, $vma_lo + offset needs
>> +#to be less that 256M (i.e. 7 nibbles) so as to be successfully 
>> recombined
>> +#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of 
>> a
>> +#1G aligned region, and Xen is far far smaller than 256M, but leave 
>> safety
>> +#check nevertheless.
>> +#
>>  eval $(${OBJDUMP} -j .text $1 -h |
>> -$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), 
>> substr($4, 9, 16)}')
>> +$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), 
>> substr($4, 10, 16)}')
>>  
>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>  
>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>> +{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
> ... s/can/cannot/ ?

Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
everything is good.

~Andrew


Re: [PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-18 Thread Jan Beulich
On 15.07.2022 15:26, Andrew Cooper wrote:
> While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
> the shell) isn't portable in 32bit shells.  See the code comment for the fix.
> 
> The fix found a second latent bug.  Recombining $vma_hi/lo should have used
> printf "%s%08x" and only worked previously because $vma_lo had bits set in
> it's top nibble.  Combining with the main fix, %08x becomes %07x.
> 
> Fixes: $XXX patch 1
> Reported-by: Jan Beulich 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
with, I guess, ...

> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep ' endbr64 *$' | 
> cut -f 1 -d ':' > $VALID &
>  #the lower bits, rounding integers to the nearest 4k.
>  #
>  #Instead, use the fact that Xen's .text is within a 1G aligned region, 
> and
> -#split the VMA in half so AWK's numeric addition is only working on 32 
> bit
> -#numbers, which don't lose precision.
> +#split the VMA so AWK's numeric addition is only working on <32 bit
> +#numbers, which don't lose precision.  (See point 5)
>  #
>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>  #spec), and GAWK and MAWK can't agree on how to work with hex constants 
> in
>  #a string.  Use the shell to convert $vma_lo to decimal before passing to
>  #AWK.
>  #
> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
> +#evaluated as long, which in 32bit shells turns negative if bit 31 of the
> +#VMA is set.  AWK then interprets this negative number as a double before
> +#adding the offsets from the binary grep.
> +#
> +#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
> +#
> +#The consequence of this is that for all offsets, $vma_lo + offset needs
> +#to be less that 256M (i.e. 7 nibbles) so as to be successfully 
> recombined
> +#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
> +#1G aligned region, and Xen is far far smaller than 256M, but leave 
> safety
> +#check nevertheless.
> +#
>  eval $(${OBJDUMP} -j .text $1 -h |
> -$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), 
> substr($4, 9, 16)}')
> +$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), 
> substr($4, 10, 16)}')
>  
>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>  
> +bin_sz=$(stat -c '%s' $TEXT_BIN)
> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
> +{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }

... s/can/cannot/ ?

Jan



[PATCH v2 2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

2022-07-15 Thread Andrew Cooper
While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
the shell) isn't portable in 32bit shells.  See the code comment for the fix.

The fix found a second latent bug.  Recombining $vma_hi/lo should have used
printf "%s%08x" and only worked previously because $vma_lo had bits set in
it's top nibble.  Combining with the main fix, %08x becomes %07x.

Fixes: $XXX patch 1
Reported-by: Jan Beulich 
Signed-off-by: Andrew Cooper 
---
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Wei Liu 
CC: Julien Grall 
CC: Anthony PERARD 
CC: Luca Fancellu 
CC: Mathieu Tarral 
CC: Bertrand Marquis 

v2:
 * New
---
 xen/tools/check-endbr.sh | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index b3febd6a4ccc..d6aa117de13b 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '   endbr64 *$' | 
cut -f 1 -d ':' > $VALID &
 #the lower bits, rounding integers to the nearest 4k.
 #
 #Instead, use the fact that Xen's .text is within a 1G aligned region, and
-#split the VMA in half so AWK's numeric addition is only working on 32 bit
-#numbers, which don't lose precision.
+#split the VMA so AWK's numeric addition is only working on <32 bit
+#numbers, which don't lose precision.  (See point 5)
 #
 # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
 #spec), and GAWK and MAWK can't agree on how to work with hex constants in
 #a string.  Use the shell to convert $vma_lo to decimal before passing to
 #AWK.
 #
+# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
+#evaluated as long, which in 32bit shells turns negative if bit 31 of the
+#VMA is set.  AWK then interprets this negative number as a double before
+#adding the offsets from the binary grep.
+#
+#Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
+#
+#The consequence of this is that for all offsets, $vma_lo + offset needs
+#to be less that 256M (i.e. 7 nibbles) so as to be successfully recombined
+#with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
+#1G aligned region, and Xen is far far smaller than 256M, but leave safety
+#check nevertheless.
+#
 eval $(${OBJDUMP} -j .text $1 -h |
-$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), 
substr($4, 9, 16)}')
+$AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), 
substr($4, 10, 16)}')
 
 ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
 
+bin_sz=$(stat -c '%s' $TEXT_BIN)
+[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
+{ echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
+
 # instruction:hex:   oct:
 # endbr64 f3 0f 1e fa363 017 036 372
 # endbr32 f3 0f 1e fb363 017 036 373
@@ -84,7 +101,7 @@ then
 else
 grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
  -e "$(printf '\146\17\37\1')" $TEXT_BIN
-fi | $AWK -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > 
$ALL
+fi | $AWK -F':' '{printf "%s%07x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' 
> $ALL
 
 # Wait for $VALID to become complete
 wait
-- 
2.11.0