Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
Joel Stanley writes:
> On Thu, 24 Aug 2023 at 12:12, Michael Ellerman wrote:
>>
>> Michael Ellerman writes:
>> > I didn't do that in the end, because patch 2 suffers from the same
>>                                             ^
>>                                             3
>> > problem of not working on QEMU.
>
> Did we get a patch to fix this into Qemu?

No.

Nick might have looked at it, but he hasn't posted anything AFAIK.

cheers
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
On Thu, 24 Aug 2023 at 12:12, Michael Ellerman wrote:
>
> Michael Ellerman writes:
> > I didn't do that in the end, because patch 2 suffers from the same
>                                             ^
>                                             3
> > problem of not working on QEMU.

Did we get a patch to fix this into Qemu?

Qemu has recently developed a stable tree process, so if we had a
backportable fix we could get it in there too.

Cheers,

Joel
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
Michael Ellerman writes:
> Michael Ellerman writes:
>> I'll plan to take the first 3 patches, they seem OK as-is.
>
> I didn't do that in the end, because patch 2 suffers from the same
                                            ^
                                            3
> problem of not working on QEMU.
>
> cheers
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
Michael Ellerman writes:
> "Nicholas Piggin" writes:
>> I guess leave it out for now. Should fix QEMU but we probably also need
>> to do patching so as not to break older QEMUs.
>
> I'll plan to take the first 3 patches, they seem OK as-is.

I didn't do that in the end, because patch 2 suffers from the same
problem of not working on QEMU.

cheers
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
"Nicholas Piggin" writes:
> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>
>> So either we need to get qemu updated and wait a while for that to
>> percolate, or do some runtime patching of wmbs in the kernel >_<
>
> Gah, sorry. QEMU really should be ignoring reserved fields in
> instructions :(

Yeah, it's an annoying discrepancy vs real hardware and the ISA.

> I guess leave it out for now. Should fix QEMU but we probably also need
> to do patching so as not to break older QEMUs.

I'll plan to take the first 3 patches, they seem OK as-is.

cheers
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> Michael Ellerman writes:
> > Nicholas Piggin writes:
> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >> field should treat this as hwsync.
> >
> > qemu (7.1) emulating ppc64e does not :/
> ..
> >
> > I guess just put it behind an #ifdef 64S.
>
> That doesn't work because qemu emulating a G5 also doesn't accept it.
>
> So either we need to get qemu updated and wait a while for that to
> percolate, or do some runtime patching of wmbs in the kernel >_<

Gah, sorry. QEMU really should be ignoring reserved fields in
instructions :(

I guess leave it out for now. Should fix QEMU but we probably also need
to do patching so as not to break older QEMUs.

Thanks,
Nick
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
Michael Ellerman writes:
> Nicholas Piggin writes:
>> Use stsync for wmb(). Older processors that don't recognise the SC
>> field should treat this as hwsync.
>
> qemu (7.1) emulating ppc64e does not :/
>
>   mpic: Setting up MPIC " OpenPIC " version 1.2 at fe004, max 1 CPUs
>   mpic: ISU size: 256, shift: 8, mask: ff
>   mpic: Initializing for 256 sources
>   Oops: Exception in kernel mode, sig: 4 [#1]
..
>
> I guess just put it behind an #ifdef 64S.

That doesn't work because qemu emulating a G5 also doesn't accept it.

So either we need to get qemu updated and wait a while for that to
percolate, or do some runtime patching of wmbs in the kernel >_<

cheers
Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
Nicholas Piggin writes:
> The most expensive ordering for hwsync to provide is the store-load
> barrier, because all prior stores have to be drained to the caches
> before subsequent instructions can complete.
>
> stsync just orders stores, which means it can just be a barrier that
> goes down the store queue and orders draining, and does not prevent
> completion of subsequent instructions. So it should be faster than
> hwsync.
>
> Use stsync for wmb(). Older processors that don't recognise the SC
> field should treat this as hwsync.

qemu (7.1) emulating ppc64e does not :/

  mpic: Setting up MPIC " OpenPIC " version 1.2 at fe004, max 1 CPUs
  mpic: ISU size: 256, shift: 8, mask: ff
  mpic: Initializing for 256 sources
  Oops: Exception in kernel mode, sig: 4 [#1]

No more output.

  (qemu) info registers
  NIP c0df4264 LR c00ce49c CTR XER 2000 CPU#0
  MSR 80001000 HID0 HF 24020006 iidx 1 didx 1
  ...
  SRR0 c00ce7c4 SRR1 80081000 PVR 80240020 VRSAVE

  $ objdump -d vmlinux | grep c00ce7c4
  c00ce7c4:	7c 03 04 ac	stsync

That's qemu -M ppce500 -cpu e5500 or e6500.

I guess just put it behind an #ifdef 64S.

cheers
[PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()
The most expensive ordering for hwsync to provide is the store-load
barrier, because all prior stores have to be drained to the caches
before subsequent instructions can complete.

stsync just orders stores, which means it can just be a barrier that
goes down the store queue and orders draining, and does not prevent
completion of subsequent instructions. So it should be faster than
hwsync.

Use stsync for wmb(). Older processors that don't recognise the SC
field should treat this as hwsync.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f0ff5737b0d8..95e637c1a3b6 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -39,7 +39,7 @@
  */
 #define __mb() __asm__ __volatile__ ("sync" : : : "memory")
 #define __rmb() __asm__ __volatile__ ("sync" : : : "memory")
-#define __wmb() __asm__ __volatile__ ("sync" : : : "memory")
+#define __wmb() __asm__ __volatile__ (PPC_STSYNC : : : "memory")
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-- 
2.40.1