Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-20 Thread Richard Earnshaw




On 02/10/2023 18:12, Wilco Dijkstra wrote:

Hi Ramana,


I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
default armhf settings are incorrect. I shouldn't need the --with-float=hard 
since
that is obviously implied by armhf, and they should also imply armv7-a with 
vfpv3
according to documentation. It seems to get confused and skip some tests. I 
tried
using --with-fpu=auto, but that doesn't work at all, so in the end I forced it 
like:
--with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more 
tests.


Yeah that's a wart that I don't like.

armhf just implies the hard float ABI and came into being to help
distinguish from the Base PCS for some of the distros at the time
(2010s). However we didn't want to set a baseline arch at that time
given the imminent arrival of v8-a and thus the specification of
--with-arch , --with-fpu and --with-float became second nature to many
of us working on it at that time.


Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'
That's not incorrect.  It's the first version of the architecture that 
can support the hard-float ABI.




That's like 25 years out of date!


It's not a matter of being out of date (and it's only 22 years since 
arm1020e was announced ;) it's a matter of being as compatible as we can 
be with existing hardware out-of-the-box.  Distros are free, of course, 
to set a higher bar and do so.




However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'


Wrong.  Rawhide uses Arm state (or it did last I checked).  As I 
mentioned above, distros are free to set a higher bar.




So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float 
configured,
it turns off FP and then complains because hard-float implies you must have 
FP...


OK, I think I see the problem there, it's in the data for
proc add_options_for_arm_arch_FUNC

in lib/target-supports.exp.  In order to work correctly with -mfpu=auto, 
the -march flags in the table need "+fp" adding in most cases (pretty 
much everything from armv5e onwards) - that's harmless whenever the 
float-abi is soft, but should do the right thing when softfp or hard are 
used.




So in most configurations Iincluding the one used by distro compilers) we 
basically
skip lots of tests for no apparent reason...


Ok, thanks for promising to do so - I trust you to get it done. Please
try out various combinations of -march v7ve, v7-a , v8-a with the tool
as each of them have slightly different rules. For instance v7ve
allows LDREXD and STREXD to be single copy atomic for 64 bit loads
whereas v7-a did not .


You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.


Ok if no regressions but as you might get nagged by the post commit CI ...


Thanks, I've committed it. Those links don't show anything concrete, however I 
do note
the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco


R.


Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-03 Thread Maxim Kuvyrkov
> On Oct 1, 2023, at 00:36, Ramana Radhakrishnan  
> wrote:
> 
> + linaro-toolchain as I don't understand the CI issues on patchwork.
> 
> 
...
> Ok if no regressions but as you might get nagged by the post commit CI ...

I don't see any pre-commit failures for this patch, but regardless of what 
results are for pre-commit CI, there's always a chance to identify problems in 
post-commit CI -- simply because we test wa-a-ay more configurations in 
post-commit CI than in pre-commit CI.

> 
> While it is not policy yet to look at these bots but given the
> enthusiasm at Cauldron for patchwork and pre-commit CI and because all
> my armhf boxes are boxed up, I decided to do something a bit novel !
> 
> I tried reviewing this via patchwork
> 
> https://patchwork.sourceware.org/project/gcc/patch/pawpr08mb8982a6aa40749b74cad14c5783...@pawpr08mb8982.eurprd08.prod.outlook.com/
> 
> and notice that
> 
> https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt
> says nothing could be built.

Um, no.  This says ...
===
Results changed to
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1

From
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1
===
... i.e., build succeeded both before and after patch.  We'll change the 
boilerplate intro for successful builds from ...
"Dear contributor, our automatic CI has detected problems related to your 
patch(es)."
... to ...
"Dear contributor, you are awesome, no CI failures related to your patch(es)".

One things that is strange -- testsuite builds were not triggered, we have only 
2 reports from build tests, but are missing another 2 reports from testsuite 
tests.

> 
> Possibly worth double checking the status for it being a false
> negative as to why the build failed.

Pre-commit CI is happy with the patch, albeit testsuite checks didn't run for 
some reason.  Regardless, we'll quickly catch and report any fallout in the 
post-commit CI once the patch is merged.

> 
> It was green on patchwork but remembering that Green is not Green for
> CI in patchwork I clicked on the afore mentioned ci.linaro.org link
> and see that it's actually broken.

Unfortunately, I seem to have confused developers about green and red at my 
Cauldron presentation.  "Green/Red" in patchwork mean the usual PASS/FAIL.  
It's only in post-commit CI in jenkins interface green and red mean something 
different.

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-02 Thread Wilco Dijkstra
Hi Ramana,

>> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
>> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
>> default armhf settings are incorrect. I shouldn't need the --with-float=hard 
>> since
>> that is obviously implied by armhf, and they should also imply armv7-a with 
>> vfpv3
>> according to documentation. It seems to get confused and skip some tests. I 
>> tried
>> using --with-fpu=auto, but that doesn't work at all, so in the end I forced 
>> it like:
>> --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more 
>> tests.
> 
> Yeah that's a wart that I don't like.
> 
> armhf just implies the hard float ABI and came into being to help
> distinguish from the Base PCS for some of the distros at the time
> (2010s). However we didn't want to set a baseline arch at that time
> given the imminent arrival of v8-a and thus the specification of
> --with-arch , --with-fpu and --with-float became second nature to many
> of us working on it at that time.

Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'

That's like 25 years out of date!

However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'

So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float 
configured,
it turns off FP and then complains because hard-float implies you must have 
FP...

So in most configurations Iincluding the one used by distro compilers) we 
basically
skip lots of tests for no apparent reason...

> Ok, thanks for promising to do so - I trust you to get it done. Please
> try out various combinations of -march v7ve, v7-a , v8-a with the tool
> as each of them have slightly different rules. For instance v7ve
> allows LDREXD and STREXD to be single copy atomic for 64 bit loads
> whereas v7-a did not .

You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.

> Ok if no regressions but as you might get nagged by the post commit CI ...

Thanks, I've committed it. Those links don't show anything concrete, however I 
do note
the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-09-30 Thread Ramana Radhakrishnan
d regressions we have to use 'm' so that an immediate
> offset can be merged into the memory access.

This is because the instructions used here are ldr and str and the use
of the 'm' constraint is considered safe.


>
> >> -  VUNSPEC_LDA  ; Represent a store-register-acquire.
> >> +  VUNSPEC_LDR  ; Represent a load-register-relaxed.
> >> +  VUNSPEC_LDA  ; Represent a load-register-acquire.
> >
> > Nit: LDA before LDR ? Though I suspect this list can be alphabetically
> > ordered at another point of time.
>
> Swapped.

Thanks

>
> > There are new tests added for v7-a , what happens with the output for
> > v8-a and the changes for ldacq and other such instructions ?
>
> v7-a and v8-a generate the same instructions for relaxed load/store.
> The acquire/release versions are identical except they are no longer
> predicated. Basically the new patterns are not only significantly simpler,
> they are now the same between the many ARM/Thumb-2/v7-a/v8-m/v8-a
> combinations, so test coverage is much higher now. This is how these
> patterns should have been designed all along.


>
> v2 follows below.
>
> Cheers,
> Wilco
>
>
> [PATCH v2] ARM: Block predication on atomics [PR111235]
>
> The v7 memory ordering model allows reordering of conditional atomic
> instructions.  To avoid this, make all atomic patterns unconditional.
> Expand atomic loads and stores for all architectures so the memory access
> can be wrapped into an UNSPEC.
>
> gcc/ChangeLog/
> PR target/111235
> * config/arm/constraints.md: Remove Pf constraint.
> * config/arm/sync.md (arm_atomic_load): Add new pattern.
> (arm_atomic_load_acquire): Likewise.
> (arm_atomic_store): Likewise.
> (arm_atomic_store_release): Likewise.
> (atomic_load): Switch patterns to define_expand.
> (atomic_store): Likewise.
> (arm_atomic_loaddi2_ldrd): Remove predication.
> (arm_load_exclusive): Likewise.
> (arm_load_acquire_exclusive): Likewise.
> (arm_load_exclusivesi): Likewise.
> (arm_load_acquire_exclusivesi: Likewise.
> (arm_load_exclusivedi): Likewise.
> (arm_load_acquire_exclusivedi): Likewise.
> (arm_store_exclusive): Likewise.
> (arm_store_release_exclusivedi): Likewise.
> (arm_store_release_exclusive): Likewise.
> * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.
>
> gcc/testsuite/ChangeLog/
> PR target/111235
> * gcc.dg/rtl/arm/stl-cond.c: Remove test.
> * gcc.target/arm/atomic_loaddi_7.c: Fix dmb count.
> * gcc.target/arm/atomic_loaddi_8.c: Likewise.
> * gcc.target/arm/pr111235.c: Add new test.
>
> ---
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 
> 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d
>  100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -36,7 +36,7 @@
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
>  ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, 
> Ra,
>  ;;  Rg, Ri
> -;; in all states: Pf, Pg
> +;; in all states: Pg
>
>  ;; The following memory constraints have been used:
>  ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
> @@ -239,13 +239,6 @@ (define_constraint "Pe"
>(and (match_code "const_int")
> (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
>
> -(define_constraint "Pf"
> -  "Memory models except relaxed, consume or release ones."
> -  (and (match_code "const_int")
> -   (match_test "!is_mm_relaxed (memmodel_from_int (ival))
> -   && !is_mm_consume (memmodel_from_int (ival))
> -   && !is_mm_release (memmodel_from_int (ival))")))
> -
>  (define_constraint "Pg"
>"@internal In Thumb-2 state a constant in range 1 to 32"
>(and (match_code "const_int")
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index 
> 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912
>  100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -62,68 +62,110 @@ (define_insn "*memory_barrier"
> (set_attr "conds" "unconditional")
> (set_attr "predicable" "no")])
>
> -(define_insn "atomic_load"
> -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> +(define_insn "arm_a

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-09-27 Thread Wilco Dijkstra
Hi Ramana,

> Hope this helps.

Yes definitely!

>> Passes regress/bootstrap, OK for commit?
>
> Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ?
> Please be specific.

I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
default armhf settings are incorrect. I shouldn't need the --with-float=hard 
since
that is obviously implied by armhf, and they should also imply armv7-a with 
vfpv3
according to documentation. It seems to get confused and skip some tests. I 
tried
using --with-fpu=auto, but that doesn't work at all, so in the end I forced it 
like:
--with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more 
tests.

> Since these patterns touch armv8m.baseline can you find all the
> testcases in the testsuite and ensure no change in code for
> armv8m.baseline as that's unpredicated already and this patch brings
> this in line with the same ? Does the testsuite already cover these
> arch variants and are you satisfied that the tests in the testsuite
> can catch / don't make any additional code changes to the other
> architectures affected by this ?

There are various v8-m(.base/.main) tests and they all pass. The generated
code is generally unchanged if there was no conditional execution. I made
the new UNSPEC_LDR/STR patterns support offsets so there is no difference
in generated code for relaxed loads/stores (since they used to use a plain
load/store which has an immediate offset).

>> * onfig/arm/sync.md (arm_atomic_load): Add new pattern.
>
> Nit: s/onfig/config

Fixed.

>> (atomic_load): Always expand atomic loads explicitly.
>> (atomic_store): Always expand atomic stores explicitly.
>
> Nit: Change message to :
> Switch patterns to define_expand.

Fixed.

> Largely looks ok though I cannot work out tonight if we need more v8-a
> or v8m-baseline specific tests for scan-assembler patterns.
>
> Clearly our testsuite doesn't catch it , so perhaps the OP could help
> validate this patch with their formal models to see if this fixes
> these set of issues and creates no new regressions ? Is that feasible
> to do ?

Disabling conditional execution avoids the issue. It's trivial to verify that
atomics can no longer be conditionally executed (no "%?"). When this is
committed, we can run the random testing again to confirm the issue
is no longer present.

> -(define_insn "atomic_load"
> -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> +(define_insn "arm_atomic_load"
> +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
>  (unspec_volatile:QHSI
> -  [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> -   (match_operand:SI 2 "const_int_operand" "n,Pf,n")]  ;; model
> +  [(match_operand:QHSI 1 "memory_operand" "m,m")]
>
> Remind me again why is it safe to go from the Q constraint to the m
> constraint here and everywhere else you've done this ?

That's because the relaxed loads/stores use LDR/STR wrapped in an
UNSPEC. To avoid regressions we have to use 'm' so that an immediate
offset can be merged into the memory access.

>> -  VUNSPEC_LDA  ; Represent a store-register-acquire.
>> +  VUNSPEC_LDR  ; Represent a load-register-relaxed.
>> +  VUNSPEC_LDA  ; Represent a load-register-acquire.
>
> Nit: LDA before LDR ? Though I suspect this list can be alphabetically
> ordered at another point of time.

Swapped.

> There are new tests added for v7-a , what happens with the output for
> v8-a and the changes for ldacq and other such instructions ?

v7-a and v8-a generate the same instructions for relaxed load/store.
The acquire/release versions are identical except they are no longer
predicated. Basically the new patterns are not only significantly simpler,
they are now the same between the many ARM/Thumb-2/v7-a/v8-m/v8-a
combinations, so test coverage is much higher now. This is how these
patterns should have been designed all along.

v2 follows below.

Cheers,
Wilco


[PATCH v2] ARM: Block predication on atomics [PR111235]

The v7 memory ordering model allows reordering of conditional atomic
instructions.  To avoid this, make all atomic patterns unconditional.
Expand atomic loads and stores for all architectures so the memory access
can be wrapped into an UNSPEC.

gcc/ChangeLog/
PR target/111235
* config/arm/constraints.md: Remove Pf constraint.
* config/arm/sync.md (arm_atomic_load): Add new pattern.
(arm_atomic_load_acquire): Likewise.
(arm_atomic_store): Likewise.
(arm_atomic_store_release): Likewise.
(atomic_load): Switch patt