Re: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
On 11/14/2014 02:45 AM, David Laight wrote:
> From: Alexander Duyck
>> It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers.
>>
>> As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe:
>>
>>     if (!rx_desc->wb.upper.status_error)
>>         break;
>>
>>     /* This memory barrier is needed to keep us from reading
>>      * any other fields out of the rx_desc until we know the
>>      * descriptor has been written back
>>      */
>>     rmb();
>>
>> With load_acquire() this can be changed to:
>>
>>     if (!load_acquire(&rx_desc->wb.upper.status_error))
>>         break;
>
> If I'm quickly reading the 'new' code I need to look up yet another function, whereas with the 'old' code I can easily see the logic.
>
> You've also added a memory barrier to the 'break' path, which isn't needed.
>
> The driver might also have additional code that can be placed before the barrier, reducing the cost of the barrier. The driver may also be able to perform multiple actions before a barrier is needed.
>
> Hiding barriers isn't necessarily a good idea anyway. If you are writing a driver you need to understand when and where they are needed.
>
> Maybe you need a new (weaker) barrier to replace rmb() on some architectures.
>
> ...
>
> David

Yeah, I think I might explore creating some lightweight barriers. The load/acquire stuff is a bit overkill for what is needed.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
On 11/14/2014 02:19 AM, Will Deacon wrote:
> Hi Alex,
>
> On Thu, Nov 13, 2014 at 07:27:23PM +, Alexander Duyck wrote:
>> It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers.
>>
>> As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe:
>>
>>     if (!rx_desc->wb.upper.status_error)
>>         break;
>>
>>     /* This memory barrier is needed to keep us from reading
>>      * any other fields out of the rx_desc until we know the
>>      * descriptor has been written back
>>      */
>>     rmb();
>>
>> With load_acquire() this can be changed to:
>>
>>     if (!load_acquire(&rx_desc->wb.upper.status_error))
>>         break;
>
> I still don't think this is a good idea for the specific use-case you're highlighting.
>
> On ARM, an mb() can be *significantly* more expensive than an rmb() (since we may have to drain store buffers on an outer L2 cache), and on arm64 it's not at all clear that an LDAR is more efficient than an LDR; DMB LD sequence. I can certainly imagine implementations where the latter would be preferred.

Yeah, I am pretty sure I overdid it in using mb() for arm. I think what I should probably be using instead is something like dmb(ish), which is what smp_mb() uses. The general idea is to enforce ordering of memory-memory accesses. Memory-MMIO accesses should still use a full rmb()/wmb() barrier.

The alternative I am mulling over is creating a lightweight set of memory barriers named lw_mb(), lw_rmb(), and lw_wmb() that could be used instead. The general idea is that on many architectures a full mb/rmb/wmb is far too much for just guaranteeing the ordering of reads or writes to system memory only. I could probably use the smp_ varieties as a template for them, since I think that in most cases this should be correct.

Also, just to be clear, I am not advocating replacing the wmb() in most I/O setups where we have to sync system memory before doing the MMIO write. This is for the case where the device descriptor ring has some bit indicating ownership by either the device or the CPU. For example, on the r8169 they have to do a wmb() before writing the DescOwn bit in the first descriptor of a given set of Tx descriptors to guarantee the rest are written; then they set the DescOwn bit; then they call wmb() again to flush that last bit before notifying the device it can start fetching the descriptors. My goal is to deal with that first wmb() and leave the second as is, since it is correct.

> So, whilst I'm perfectly fine to go along with mandatory acquire/release macros (we should probably add a check to barf on __iomem pointers), I don't agree with using them in preference to finer-grained read/write barriers. Doing so will have a real impact on I/O performance.

Couldn't that type of check be added to compiletime_assert_atomic_type()? That seems like it would be the best place for something like that.

> Finally, do you know of any architectures where load_acquire/store_release aren't implemented the same way as the smp_* variants on SMP kernels?
>
> Will

I should probably go back through and sort out the cases where mb() and smp_mb() are not the same thing. I think I went with too harsh a barrier in a couple of other cases as well.

Thanks,

Alex
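The r8169 Tx sequence Alex describes (fill descriptors, barrier, set DescOwn on the first one, barrier, notify the device) can be sketched in userspace to show which of the two barriers he wants to weaken. This is a minimal model under loud assumptions: C11 fences stand in for the kernel barriers, `lw_wmb_model()` plays the role of the proposed (not yet existing) lw_wmb(), a volatile variable stands in for the MMIO doorbell, and `DESC_OWN`, `struct tx_desc`, `ring`, and `tx_publish()` are hypothetical names, not r8169 code. A C11 fence does not order real MMIO; only the CPU-side ordering is modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN (1u << 31)   /* hypothetical "device owns this descriptor" bit */
#define NUM_DESC 4

struct tx_desc {
    _Atomic uint32_t opts1;   /* DESC_OWN set when handed to the device */
    uint32_t len;
};

static struct tx_desc ring[NUM_DESC];
static volatile uint32_t doorbell;   /* stands in for an MMIO doorbell register */

/* Userspace stand-ins, not kernel definitions:
 * lw_wmb_model(): ordering between system-memory writes only (the role of
 *                 the proposed lw_wmb()).
 * wmb_model():    the full wmb() that stays before the doorbell write. */
#define lw_wmb_model() atomic_thread_fence(memory_order_release)
#define wmb_model()    atomic_thread_fence(memory_order_seq_cst)

static void tx_publish(unsigned int first, unsigned int nfrags,
                       const uint32_t *lens)
{
    /* Fill in all descriptors. Every one but the first is marked
     * device-owned right away; the (hypothetical) device starts at the
     * first descriptor, so handing that one over last publishes the set. */
    for (unsigned int i = 0; i < nfrags; i++) {
        struct tx_desc *d = &ring[(first + i) % NUM_DESC];
        d->len = lens[i];
        if (i != 0)
            atomic_store_explicit(&d->opts1, DESC_OWN, memory_order_relaxed);
    }

    lw_wmb_model();   /* first barrier: memory-to-memory, candidate for weakening */
    atomic_store_explicit(&ring[first].opts1, DESC_OWN, memory_order_relaxed);

    wmb_model();      /* second barrier: kept as a full wmb() */
    doorbell = 1;     /* "notify the device it can start fetching" */
}
```

Only the first fence separates two groups of system-memory writes, which is why it is the one a lighter primitive could replace; the second orders memory against the device notification and is left alone.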
RE: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
From: Alexander Duyck
> It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers.
>
> As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe:
>
>     if (!rx_desc->wb.upper.status_error)
>         break;
>
>     /* This memory barrier is needed to keep us from reading
>      * any other fields out of the rx_desc until we know the
>      * descriptor has been written back
>      */
>     rmb();
>
> With load_acquire() this can be changed to:
>
>     if (!load_acquire(&rx_desc->wb.upper.status_error))
>         break;

If I'm quickly reading the 'new' code I need to look up yet another function, whereas with the 'old' code I can easily see the logic.

You've also added a memory barrier to the 'break' path, which isn't needed.

The driver might also have additional code that can be placed before the barrier, reducing the cost of the barrier. The driver may also be able to perform multiple actions before a barrier is needed.

Hiding barriers isn't necessarily a good idea anyway. If you are writing a driver you need to understand when and where they are needed.

Maybe you need a new (weaker) barrier to replace rmb() on some architectures.

...

David
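David's point about the 'break' path can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's own primitives: a relaxed load of a status word, an early return when the descriptor is not ready, and an acquire fence (playing the role of rmb()) only on the path that goes on to read the other fields. `struct rx_desc` and `rx_poll()` are hypothetical names, and the sketch models CPU-to-CPU ordering only, not a descriptor written back by a device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive descriptor: the writer fills in len first and
 * then publishes the descriptor by setting status with a release store. */
struct rx_desc {
    uint32_t len;
    _Atomic uint32_t status;   /* nonzero once the descriptor is written back */
};

/* Returns true and copies out len only if the descriptor is complete. */
static bool rx_poll(struct rx_desc *d, uint32_t *len)
{
    /* Ownership test: a relaxed load, no ordering imposed yet. */
    if (!atomic_load_explicit(&d->status, memory_order_relaxed))
        return false;           /* the 'break' path pays for no barrier */

    /* Independent work could be placed here, ahead of the barrier. */

    /* Stand-in for rmb(): pairs with the writer's release store, so the
     * read of len below sees the value written before status was set. */
    atomic_thread_fence(memory_order_acquire);
    *len = d->len;
    return true;
}
```

The barrier cost is paid only once the status check has succeeded, which is the placement the hand-written rmb() in the ixgbe snippet already achieves.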
Re: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
Hi Alex,

On Thu, Nov 13, 2014 at 07:27:23PM +, Alexander Duyck wrote:
> It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers.
>
> As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe:
>
>     if (!rx_desc->wb.upper.status_error)
>         break;
>
>     /* This memory barrier is needed to keep us from reading
>      * any other fields out of the rx_desc until we know the
>      * descriptor has been written back
>      */
>     rmb();
>
> With load_acquire() this can be changed to:
>
>     if (!load_acquire(&rx_desc->wb.upper.status_error))
>         break;

I still don't think this is a good idea for the specific use-case you're highlighting.

On ARM, an mb() can be *significantly* more expensive than an rmb() (since we may have to drain store buffers on an outer L2 cache), and on arm64 it's not at all clear that an LDAR is more efficient than an LDR; DMB LD sequence. I can certainly imagine implementations where the latter would be preferred.

So, whilst I'm perfectly fine to go along with mandatory acquire/release macros (we should probably add a check to barf on __iomem pointers), I don't agree with using them in preference to finer-grained read/write barriers. Doing so will have a real impact on I/O performance.

Finally, do you know of any architectures where load_acquire/store_release aren't implemented the same way as the smp_* variants on SMP kernels?

Will
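Will's comparison can be made concrete with two ways of writing the same acquire check. This is a userspace sketch assuming C11 atomics; the instruction sequences named in the comments are what current GCC and Clang typically emit for arm64, which is an assumption about code generation rather than something stated in the thread. `ready_acquire_load()` and `ready_load_then_fence()` are hypothetical names, and neither form says anything about ordering against __iomem (MMIO) regions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 1: a single acquire load (the load_acquire() style).
 * arm64 compilers typically emit LDAR (or LDAPR with RCpc) here. */
static bool ready_acquire_load(_Atomic uint32_t *status)
{
    return atomic_load_explicit(status, memory_order_acquire) != 0;
}

/* Form 2: a plain load followed by a read-side fence (the rmb() style).
 * arm64 compilers typically emit LDR followed by DMB ISHLD here. */
static bool ready_load_then_fence(_Atomic uint32_t *status)
{
    bool ready = atomic_load_explicit(status, memory_order_relaxed) != 0;
    atomic_thread_fence(memory_order_acquire);
    return ready;
}
```

Both forms give later reads acquire ordering; which one is cheaper depends on the CPU implementation, which is Will's point. The second form also leaves the fence as a separate statement that a driver can move off the not-ready path.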
[PATCH 1/3] arch: Introduce load_acquire() and store_release()
It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers.

As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe:

    if (!rx_desc->wb.upper.status_error)
        break;

    /* This memory barrier is needed to keep us from reading
     * any other fields out of the rx_desc until we know the
     * descriptor has been written back
     */
    rmb();

With load_acquire() this can be changed to:

    if (!load_acquire(&rx_desc->wb.upper.status_error))
        break;

A similar change can be made in the release path of many drivers. For example, in the Realtek r8169 driver there are a number of flows that consist of something like the following:

    wmb();

    status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
    txd->opts1 = cpu_to_le32(status);

    tp->cur_tx += frags + 1;

    wmb();

With store_release() this can be changed to the following:

    status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
    store_release(&txd->opts1, cpu_to_le32(status));

    tp->cur_tx += frags + 1;

    wmb();

The resulting assembler code can be significantly less expensive on architectures such as x86 and s390 that support strong ordering. On architectures that are able to use primitives other than their rmb()/wmb(), such as powerpc, ia64, and arm64, we should see gains since we are able to use less expensive barriers. On the remaining architectures we end up using an mb(), which may cost as much as or more than an rmb()/wmb(), since we must ensure Load/Store ordering.
Cc: Benjamin Herrenschmidt
Cc: Frederic Weisbecker
Cc: Mathieu Desnoyers
Cc: Michael Ellerman
Cc: Michael Neuling
Cc: Russell King
Cc: Geert Uytterhoeven
Cc: Heiko Carstens
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Tony Luck
Cc: Oleg Nesterov
Cc: Will Deacon
Cc: "Paul E. McKenney"
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: David Miller
Signed-off-by: Alexander Duyck
---
 arch/arm/include/asm/barrier.h      | 15
 arch/arm64/include/asm/barrier.h    | 59
 arch/ia64/include/asm/barrier.h     |  7
 arch/metag/include/asm/barrier.h    | 15
 arch/mips/include/asm/barrier.h     | 15
 arch/powerpc/include/asm/barrier.h  | 24
 arch/s390/include/asm/barrier.h     |  7
 arch/sparc/include/asm/barrier_64.h |  6
 arch/x86/include/asm/barrier.h      | 22
 include/asm-generic/barrier.h       | 15
 10 files changed, 144 insertions(+), 41 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..bbdcd34 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
 #define smp_wmb()	dmb(ishst)
 #endif
 
+#define store_release(p, v)					\
+do {								\
+	compiletime_assert_atomic_type(*p);			\
+	mb();							\
+	ACCESS_ONCE(*p) = (v);					\
+} while (0)
+
+#define load_acquire(p)						\
+({								\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);			\
+	compiletime_assert_atomic_type(*p);			\
+	mb();							\
+	___p1;							\
+})
+
 #define smp_store_release(p, v)					\
 do {								\
 	compiletime_assert_atomic_type(*p);			\
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..c91571c 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,33 +32,7 @@
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
 
-#ifndef CONFIG_SMP
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-
-#define smp_store_release(p, v)					\
-do {
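The acquire/release pairing this patch relies on can be sketched in userspace with C11 atomics and POSIX threads. This is an assumption-laden model, not kernel code: the kernel uses its own primitives, and a real descriptor is written by the device via DMA rather than by a second CPU thread. One thread fills in a descriptor and hands it over with a release store, in the role of store_release(); the other polls with acquire loads, in the role of load_acquire(), before reading the payload. `struct desc`, `writer()`, `read_when_owned()`, and `run_once()` are hypothetical names. Build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor with an ownership flag. */
struct desc {
    uint32_t payload;
    _Atomic uint32_t owned_by_cpu;   /* set once payload is valid */
};

/* Publisher: fill in the payload, then hand the descriptor over with a
 * release store (the store_release() side). */
static void *writer(void *arg)
{
    struct desc *d = arg;

    d->payload = 0xabcd;
    atomic_store_explicit(&d->owned_by_cpu, 1, memory_order_release);
    return NULL;
}

/* Consumer: test ownership with an acquire load (the load_acquire() side).
 * Once it observes 1, the payload written before the release store is
 * guaranteed to be visible. */
static uint32_t read_when_owned(struct desc *d)
{
    while (!atomic_load_explicit(&d->owned_by_cpu, memory_order_acquire))
        ;   /* not ours yet, keep polling */
    return d->payload;
}

/* One round trip; returns 0 if the thread could not be created. */
static uint32_t run_once(void)
{
    struct desc d = { .payload = 0, .owned_by_cpu = 0 };
    pthread_t t;
    uint32_t v;

    if (pthread_create(&t, NULL, writer, &d) != 0)
        return 0;
    v = read_when_owned(&d);
    pthread_join(t, NULL);
    return v;
}
```

Note that the acquire load here is executed on every poll, including the iterations that find the descriptor not yet owned; that per-poll cost is exactly what the replies in this thread weigh against an explicit barrier placed after a successful check.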