Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-08-25 Thread Michael Ellerman
Joel Stanley  writes:
> On Thu, 24 Aug 2023 at 12:12, Michael Ellerman  wrote:
>>
>> Michael Ellerman  writes:
>> > Michael Ellerman  writes:
>> >> "Nicholas Piggin"  writes:
>> >>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>> >>>> Michael Ellerman  writes:
>> >>>> > Nicholas Piggin  writes:
>> >>>> >> The most expensive ordering for hwsync to provide is the store-load
>> >>>> >> barrier, because all prior stores have to be drained to the caches
>> >>>> >> before subsequent instructions can complete.
>> >>>> >>
>> >>>> >> stsync just orders stores which means it can just be a barrier that
>> >>>> >> goes down the store queue and orders draining, and does not prevent
>> >>>> >> completion of subsequent instructions. So it should be faster than
>> >>>> >> hwsync.
>> >>>> >>
>> >>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>> >>>> >> field should treat this as hwsync.
>> >>>> >
>> >>>> > qemu (7.1) emulating ppc64e does not :/
>> >>>> >
>> >>>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>> >>>> >   mpic: ISU size: 256, shift: 8, mask: ff
>> >>>> >   mpic: Initializing for 256 sources
>> >>>> >   Oops: Exception in kernel mode, sig: 4 [#1]
>> >>>> ..
>> >>>> >
>> >>>> > I guess just put it behind an #ifdef 64S.
>> >>>>
>> >>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>> >>>>
>> >>>> So either we need to get qemu updated and wait a while for that to
>> >>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>> >>>
>> >>> Gah, sorry. QEMU really should be ignoring reserved fields in
>> >>> instructions :(
>> >>
>> >> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>> >>
>> >>> I guess leave it out for now. Should fix QEMU but we probably also need
>> >>> to do patching so as not to break older QEMUs.
>> >>
>> >> I'll plan to take the first 3 patches, they seem OK as-is.
>> >
>> > I didn't do that in the end, because patch 2 suffers from the same
>>                                              ^
>>                                              3
>> > problem of not working on QEMU.
>
> Did we get a patch to fix this in to Qemu?

No. Nick might have looked at it but he hasn't posted anything AFAIK.

cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-08-24 Thread Joel Stanley
On Thu, 24 Aug 2023 at 12:12, Michael Ellerman  wrote:
>
> Michael Ellerman  writes:
> > Michael Ellerman  writes:
> >> "Nicholas Piggin"  writes:
> >>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> >>>> Michael Ellerman  writes:
> >>>> > Nicholas Piggin  writes:
> >>>> >> The most expensive ordering for hwsync to provide is the store-load
> >>>> >> barrier, because all prior stores have to be drained to the caches
> >>>> >> before subsequent instructions can complete.
> >>>> >>
> >>>> >> stsync just orders stores which means it can just be a barrier that
> >>>> >> goes down the store queue and orders draining, and does not prevent
> >>>> >> completion of subsequent instructions. So it should be faster than
> >>>> >> hwsync.
> >>>> >>
> >>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >>>> >> field should treat this as hwsync.
> >>>> >
> >>>> > qemu (7.1) emulating ppc64e does not :/
> >>>> >
> >>>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
> >>>> >   mpic: ISU size: 256, shift: 8, mask: ff
> >>>> >   mpic: Initializing for 256 sources
> >>>> >   Oops: Exception in kernel mode, sig: 4 [#1]
> >>>> ..
> >>>> >
> >>>> > I guess just put it behind an #ifdef 64S.
> >>>>
> >>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
> >>>>
> >>>> So either we need to get qemu updated and wait a while for that to
> >>>> percolate, or do some runtime patching of wmbs in the kernel >_<
> >>>
> >>> Gah, sorry. QEMU really should be ignoring reserved fields in
> >>> instructions :(
> >>
> >> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
> >>
> >>> I guess leave it out for now. Should fix QEMU but we probably also need
> >>> to do patching so as not to break older QEMUs.
> >>
> >> I'll plan to take the first 3 patches, they seem OK as-is.
> >
> > I didn't do that in the end, because patch 2 suffers from the same
>                                              ^
>                                              3
> > problem of not working on QEMU.

Did we get a patch to fix this in to Qemu?

Qemu has recently developed a stable tree process, so if we had a
backportable fix we could get it in there too.

Cheers,

Joel


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-08-24 Thread Michael Ellerman
Michael Ellerman  writes:
> Michael Ellerman  writes:
>> "Nicholas Piggin"  writes:
>>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>>>> Michael Ellerman  writes:
>>>> > Nicholas Piggin  writes:
>>>> >> The most expensive ordering for hwsync to provide is the store-load
>>>> >> barrier, because all prior stores have to be drained to the caches
>>>> >> before subsequent instructions can complete.
>>>> >>
>>>> >> stsync just orders stores which means it can just be a barrier that
>>>> >> goes down the store queue and orders draining, and does not prevent
>>>> >> completion of subsequent instructions. So it should be faster than
>>>> >> hwsync.
>>>> >>
>>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>>>> >> field should treat this as hwsync.
>>>> >
>>>> > qemu (7.1) emulating ppc64e does not :/
>>>> >
>>>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>>>> >   mpic: ISU size: 256, shift: 8, mask: ff
>>>> >   mpic: Initializing for 256 sources
>>>> >   Oops: Exception in kernel mode, sig: 4 [#1]
>>>> ..
>>>> >
>>>> > I guess just put it behind an #ifdef 64S.
>>>>
>>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>>>
>>>> So either we need to get qemu updated and wait a while for that to
>>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>>>
>>> Gah, sorry. QEMU really should be ignoring reserved fields in
>>> instructions :(
>>
>> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>>
>>> I guess leave it out for now. Should fix QEMU but we probably also need
>>> to do patching so as not to break older QEMUs.
>>
>> I'll plan to take the first 3 patches, they seem OK as-is.
>
> I didn't do that in the end, because patch 2 suffers from the same
                                             ^
                                             3
> problem of not working on QEMU.
>
> cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-08-24 Thread Michael Ellerman
Michael Ellerman  writes:
> "Nicholas Piggin"  writes:
>> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>>> Michael Ellerman  writes:
>>> > Nicholas Piggin  writes:
>>> >> The most expensive ordering for hwsync to provide is the store-load
>>> >> barrier, because all prior stores have to be drained to the caches
>>> >> before subsequent instructions can complete.
>>> >>
>>> >> stsync just orders stores which means it can just be a barrier that
>>> >> goes down the store queue and orders draining, and does not prevent
>>> >> completion of subsequent instructions. So it should be faster than
>>> >> hwsync.
>>> >>
>>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>>> >> field should treat this as hwsync.
>>> >
>>> > qemu (7.1) emulating ppc64e does not :/
>>> >
>>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>>> >   mpic: ISU size: 256, shift: 8, mask: ff
>>> >   mpic: Initializing for 256 sources
>>> >   Oops: Exception in kernel mode, sig: 4 [#1]
>>> ..
>>> >
>>> > I guess just put it behind an #ifdef 64S.
>>>
>>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>>
>>> So either we need to get qemu updated and wait a while for that to
>>> percolate, or do some runtime patching of wmbs in the kernel >_<
>>
>> Gah, sorry. QEMU really should be ignoring reserved fields in
>> instructions :(
>
> Yeah, it's an annoying discrepancy vs real hardware and the ISA.
>
>> I guess leave it out for now. Should fix QEMU but we probably also need
>> to do patching so as not to break older QEMUs.
>
> I'll plan to take the first 3 patches, they seem OK as-is.

I didn't do that in the end, because patch 2 suffers from the same
problem of not working on QEMU.

cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-14 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>> Michael Ellerman  writes:
>> > Nicholas Piggin  writes:
>> >> The most expensive ordering for hwsync to provide is the store-load
>> >> barrier, because all prior stores have to be drained to the caches
>> >> before subsequent instructions can complete.
>> >>
>> >> stsync just orders stores which means it can just be a barrier that
>> >> goes down the store queue and orders draining, and does not prevent
>> >> completion of subsequent instructions. So it should be faster than
>> >> hwsync.
>> >>
>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>> >> field should treat this as hwsync.
>> >
>> > qemu (7.1) emulating ppc64e does not :/
>> >
>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>> >   mpic: ISU size: 256, shift: 8, mask: ff
>> >   mpic: Initializing for 256 sources
>> >   Oops: Exception in kernel mode, sig: 4 [#1]
>> ..
>> >
>> > I guess just put it behind an #ifdef 64S.
>>
>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>
>> So either we need to get qemu updated and wait a while for that to
>> percolate, or do some runtime patching of wmbs in the kernel >_<
>
> Gah, sorry. QEMU really should be ignoring reserved fields in
> instructions :(

Yeah, it's an annoying discrepancy vs real hardware and the ISA.

> I guess leave it out for now. Should fix QEMU but we probably also need
> to do patching so as not to break older QEMUs.

I'll plan to take the first 3 patches, they seem OK as-is.

cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-14 Thread Nicholas Piggin
On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > Nicholas Piggin  writes:
> >> The most expensive ordering for hwsync to provide is the store-load
> >> barrier, because all prior stores have to be drained to the caches
> >> before subsequent instructions can complete.
> >>
> >> stsync just orders stores which means it can just be a barrier that
> >> goes down the store queue and orders draining, and does not prevent
> >> completion of subsequent instructions. So it should be faster than
> >> hwsync.
> >>
> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >> field should treat this as hwsync.
> >
> > qemu (7.1) emulating ppc64e does not :/
> >
> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
> >   mpic: ISU size: 256, shift: 8, mask: ff
> >   mpic: Initializing for 256 sources
> >   Oops: Exception in kernel mode, sig: 4 [#1]
> ..
> >
> > I guess just put it behind an #ifdef 64S.
>
> That doesn't work because qemu emulating a G5 also doesn't accept it.
>
> So either we need to get qemu updated and wait a while for that to
> percolate, or do some runtime patching of wmbs in the kernel >_<

Gah, sorry. QEMU really should be ignoring reserved fields in
instructions :(
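
The difference between rejecting and ignoring reserved bits can be sketched as a mask-and-compare decode. This is only an illustration: MASK_STRICT, MASK_LENIENT and decode_matches() are hypothetical names, and this is not QEMU's actual decoder. The stsync word 0x7c0304ac is taken from the objdump output earlier in the thread; "sig: 4" in the oops is SIGILL, which is what a failed decode would lead to.

```c
#include <stdbool.h>
#include <stdint.h>

/* sync (hwsync): primary opcode 31, extended opcode 598 */
#define SYNC_PATTERN  0x7c0004acu

/* Primary opcode (bits 0:5) plus extended opcode (bits 21:30) */
#define MASK_OPC_XO   0xfc0007feu

/* Assumed position of the SC field: bits 14:15 (IBM numbering) */
#define MASK_SC       0x00030000u

/* Hypothetical strict decode: SC bits must be zero to match */
#define MASK_STRICT   (MASK_OPC_XO | MASK_SC)

/* Hypothetical lenient decode: SC bits are ignored */
#define MASK_LENIENT  MASK_OPC_XO

/* Returns true if insn decodes as a sync under the given mask. */
static inline bool decode_matches(uint32_t insn, uint32_t mask)
{
	return (insn & mask) == SYNC_PATTERN;
}
```

With these definitions, 0x7c0304ac fails the strict match (and so would be treated as illegal) but passes the lenient one, where it decodes as an ordinary sync.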

I guess leave it out for now. Should fix QEMU but we probably also need
to do patching so as not to break older QEMUs.

Thanks,
Nick


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-13 Thread Michael Ellerman
Michael Ellerman  writes:
> Nicholas Piggin  writes:
>> The most expensive ordering for hwsync to provide is the store-load
>> barrier, because all prior stores have to be drained to the caches
>> before subsequent instructions can complete.
>>
>> stsync just orders stores which means it can just be a barrier that
>> goes down the store queue and orders draining, and does not prevent
>> completion of subsequent instructions. So it should be faster than
>> hwsync.
>>
>> Use stsync for wmb(). Older processors that don't recognise the SC
>> field should treat this as hwsync.
>
> qemu (7.1) emulating ppc64e does not :/
>
>   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>   mpic: ISU size: 256, shift: 8, mask: ff
>   mpic: Initializing for 256 sources
>   Oops: Exception in kernel mode, sig: 4 [#1]
..
>
> I guess just put it behind an #ifdef 64S.

That doesn't work because qemu emulating a G5 also doesn't accept it.

So either we need to get qemu updated and wait a while for that to
percolate, or do some runtime patching of wmbs in the kernel >_<
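
As a rough user-space sketch of what runtime patching of wmb() sites could mean: keep a table of the locations that hold the barrier and rewrite each one depending on whether the CPU is known to accept stsync. Everything here is hypothetical (patch_wmb_sites(), the site table, the cpu_has_stsync flag); it is not the kernel's feature-fixup code, only the idea behind it. The two instruction words are hwsync and the stsync encoding shown by objdump in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_HWSYNC 0x7c0004acu /* sync */
#define INSN_STSYNC 0x7c0304acu /* stsync, as seen in the objdump output */

/*
 * Hypothetical helper: text is an array of instruction words, sites
 * holds the indexes of the wmb() barriers within it. Each site is set
 * to stsync if the CPU supports it, otherwise to hwsync.
 * Returns the number of words that were actually changed.
 */
static inline size_t patch_wmb_sites(uint32_t *text, const size_t *sites,
				     size_t nr_sites, bool cpu_has_stsync)
{
	uint32_t insn = cpu_has_stsync ? INSN_STSYNC : INSN_HWSYNC;
	size_t patched = 0;

	for (size_t i = 0; i < nr_sites; i++) {
		if (text[sites[i]] != insn) {
			text[sites[i]] = insn;
			patched++;
		}
	}
	return patched;
}
```

Building the image with hwsync at every site and upgrading to stsync only at boot would keep older QEMUs working, at the cost of carrying the site table.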

cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-13 Thread Michael Ellerman
Nicholas Piggin  writes:
> The most expensive ordering for hwsync to provide is the store-load
> barrier, because all prior stores have to be drained to the caches
> before subsequent instructions can complete.
>
> stsync just orders stores which means it can just be a barrier that
> goes down the store queue and orders draining, and does not prevent
> completion of subsequent instructions. So it should be faster than
> hwsync.
>
> Use stsync for wmb(). Older processors that don't recognise the SC
> field should treat this as hwsync.

qemu (7.1) emulating ppc64e does not :/

  mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
  mpic: ISU size: 256, shift: 8, mask: ff
  mpic: Initializing for 256 sources
  Oops: Exception in kernel mode, sig: 4 [#1]

No more output.

(qemu) info registers
NIP c0df4264   LR c00ce49c CTR  XER 2000 CPU#0
MSR 80001000 HID0   HF 24020006 iidx 1 didx 1
...
 SRR0 c00ce7c4  SRR1 80081000 PVR 80240020 VRSAVE


$ objdump -d vmlinux | grep c00ce7c4
c00ce7c4:   7c 03 04 ac stsync
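
To relate the bytes 7c 03 04 ac to plain sync, here is a minimal sketch of the encoding. It assumes the ISA v3.1 field layout (L in bits 8:10 and SC in bits 14:15, IBM bit numbering), under which this word corresponds to L=0, SC=3. encode_sync() and ignore_sc() are hypothetical helper names, not kernel or QEMU functions.

```c
#include <stdint.h>

/* sync (hwsync): primary opcode 31, extended opcode 598 */
#define PPC_INST_SYNC  0x7c0004acu

/* Assumed field positions, converted from IBM bit numbers to shifts */
#define SYNC_L_SHIFT   21                      /* L:  bits 8:10  */
#define SYNC_SC_SHIFT  16                      /* SC: bits 14:15 */
#define SYNC_SC_MASK   (0x3u << SYNC_SC_SHIFT)

/* Hypothetical helper: build a sync-family instruction word. */
static inline uint32_t encode_sync(uint32_t l, uint32_t sc)
{
	return PPC_INST_SYNC |
	       ((l & 0x7u) << SYNC_L_SHIFT) |
	       ((sc & 0x3u) << SYNC_SC_SHIFT);
}

/* The word as seen by a core that ignores the SC bits. */
static inline uint32_t ignore_sc(uint32_t insn)
{
	return insn & ~SYNC_SC_MASK;
}
```

encode_sync(0, 3) gives 0x7c0304ac, matching the objdump line, and ignore_sc() of that word gives 0x7c0004ac, i.e. hwsync, which is the behaviour the patch description expects from older processors.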


That's qemu -M ppce500 -cpu e5500 or e6500.

I guess just put it behind an #ifdef 64S.

cheers


[PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-09 Thread Nicholas Piggin
The most expensive ordering for hwsync to provide is the store-load
barrier, because all prior stores have to be drained to the caches
before subsequent instructions can complete.

stsync just orders stores which means it can just be a barrier that
goes down the store queue and orders draining, and does not prevent
completion of subsequent instructions. So it should be faster than
hwsync.

Use stsync for wmb(). Older processors that don't recognise the SC
field should treat this as hwsync.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f0ff5737b0d8..95e637c1a3b6 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -39,7 +39,7 @@
  */
 #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define __rmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define __wmb()  __asm__ __volatile__ ("sync" : : : "memory")
+#define __wmb()  __asm__ __volatile__ (PPC_STSYNC : : : "memory")
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-- 
2.40.1