Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-25 Thread Eads, Gage



> -Original Message-
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Wednesday, January 23, 2019 11:22 PM
> To: Eads, Gage ; dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> ; Ananyev, Konstantin
> ; nd ;
> chao...@linux.vnet.ibm.com; jer...@marvell.com; hemant.agra...@nxp.com;
> nd 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> > >
> > > Added other platform owners.
> > >
> > > > > > > > @@ -208,4 +209,25 @@ static inline void
> > > > > > > > rte_atomic64_clear(rte_atomic64_t
> > > > > > > > *v)  }  #endif
> > > > > > > >
> > > > > > > > +static inline int
> > > > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t
> > > > > > > > +*exp, uint64_t
> > > > > > > > +*src) {
> > > > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 
> > > > > > > 'src'
> > > > > > > should be pointers to 128b (__int128)? Or we could define
> > > > > > > our own data
> > > > > > type.
> > > > > >
> > > > > > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > > > > > __int128 originally because it fails to compile with
> > > > > > -pedantic, but on second thought (and with your suggestion of
> > > > > > a separate data type), we can resolve that with this typedef:
> > > > > >
> > > > > > typedef struct {
> > > > > > RTE_STD_C11 __int128 val; } rte_int128_t;
> > > > > ok
> > > > >
> > > > > >
> > > > > > > Since, it is a new API, can we define it with memory
> > > > > > > orderings which will be more conducive to relaxed memory
> > > > > > > ordering based
> > > > architectures?
> > > > > > > You can refer to [1] and [2] for guidance.
> > > > > >
> > > > > > I certainly see the value in controlling the operation's
> > > > > > memory ordering, like in the __atomic intrinsics, but I'm not
> > > > > > sure this patchset is the right place to address that. I see
> > > > > > that work going a couple
> > > > > ways:
> > > > > > 1. Expand the existing rte_atomicN_* interfaces with
> > > > > > additional arguments. In that case, I'd prefer this be done in
> > > > > > a separate patchset that addresses all the atomic operations,
> > > > > > not just cmpset, so the interface changes are chosen according
> > > > > > to the needs of the full set of atomic operations. If this
> > > > > > approach is taken then there's no need to solve this while
> > > > > > rte_atomic128_cmpset is experimental, since all the
> > > > > other functions are non-experimental anyway.
> > > > > >
> > > > > > - Or -
> > > > > >
> > > > > > 2. Don't modify the existing rte_atomicN_* interfaces (or
> > > > > > their strongly ordered behavior), and instead create new
> > > > > > versions of them that take additional arguments. In this case,
> > > > > > we can implement
> > > > > > rte_atomic128_cmpset() as is and create a more flexible
> > > > > > version in a later
> > > > > patchset.
> > > > > >
> > > > > > Either way, I think the current interface (w.r.t. memory
> > > > > > ordering
> > > > > > options) can work and still leaves us in a good position for
> > > > > > future
> > > > > changes/improvements.
> > > > > >
> > > > > I do not see the need to modify/extend the existing
> > > > > rte_atomicN_* APIs as the corresponding __atomic intrinsics serve as
> replacements.
> > > > > I expect that at some point, DPDK code base will not be using
> > > > rte_atomicN_* APIs.
> > > > > However, __atomic intrinsics do not support 128b wide parameters.
> > > > > Hence
> > > >
> > > > I don't think that's correct. From the GCC docs:
> > >

Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-23 Thread Honnappa Nagarahalli
> >
> > Added other platform owners.
> >
> > > > > > > @@ -208,4 +209,25 @@ static inline void
> > > > > > > rte_atomic64_clear(rte_atomic64_t
> > > > > > > *v)  }  #endif
> > > > > > >
> > > > > > > +static inline int
> > > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > > > > > +uint64_t
> > > > > > > +*src) {
> > > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > > > > > should be pointers to 128b (__int128)? Or we could define our
> > > > > > own data
> > > > > type.
> > > > >
> > > > > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > > > > __int128 originally because it fails to compile with -pedantic,
> > > > > but on second thought (and with your suggestion of a separate
> > > > > data type), we can resolve that with this typedef:
> > > > >
> > > > > typedef struct {
> > > > > RTE_STD_C11 __int128 val; } rte_int128_t;
> > > > ok
> > > >
> > > > >
> > > > > > Since, it is a new API, can we define it with memory orderings
> > > > > > which will be more conducive to relaxed memory ordering based
> > > architectures?
> > > > > > You can refer to [1] and [2] for guidance.
> > > > >
> > > > > I certainly see the value in controlling the operation's memory
> > > > > ordering, like in the __atomic intrinsics, but I'm not sure this
> > > > > patchset is the right place to address that. I see that work
> > > > > going a couple
> > > > ways:
> > > > > 1. Expand the existing rte_atomicN_* interfaces with additional
> > > > > arguments. In that case, I'd prefer this be done in a separate
> > > > > patchset that addresses all the atomic operations, not just
> > > > > cmpset, so the interface changes are chosen according to the
> > > > > needs of the full set of atomic operations. If this approach is
> > > > > taken then there's no need to solve this while
> > > > > rte_atomic128_cmpset is experimental, since all the
> > > > other functions are non-experimental anyway.
> > > > >
> > > > > - Or -
> > > > >
> > > > > 2. Don't modify the existing rte_atomicN_* interfaces (or their
> > > > > strongly ordered behavior), and instead create new versions of
> > > > > them that take additional arguments. In this case, we can
> > > > > implement
> > > > > rte_atomic128_cmpset() as is and create a more flexible version
> > > > > in a later
> > > > patchset.
> > > > >
> > > > > Either way, I think the current interface (w.r.t. memory
> > > > > ordering
> > > > > options) can work and still leaves us in a good position for
> > > > > future
> > > > changes/improvements.
> > > > >
> > > > I do not see the need to modify/extend the existing rte_atomicN_*
> > > > APIs as the corresponding __atomic intrinsics serve as replacements.
> > > > I expect that at some point, DPDK code base will not be using
> > > rte_atomicN_* APIs.
> > > > However, __atomic intrinsics do not support 128b wide parameters.
> > > > Hence
> > >
> > > I don't think that's correct. From the GCC docs:
> > >
> > > "16-byte integral types are also allowed if `__int128' (see
> > > __int128) is supported by the architecture."
> > >
> > > This works with x86 -64 -- I assume aarch64 also, but haven't confirmed.
> > >
> > > Source:
> > > https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic-
> > > Builtins.html
> > (following is based on my reading, not based on experiments) My
> > understanding is that the __atomic_load_n/store_n were implemented
> > using a compare-and- swap HW instruction (due to lack of HW 128b
> > atomic load and store instructions). This introduced the side effect
> > or store/load respectively. Where as the user does not expect such
> > side effects. As suggested in [1], it looks like GCC delegated the
> > implementation to libatomic which 'it seems' uses locks to implement
> > 128b __atomic intrinsics (needs to be verified)
> >
> > If __atomic intrinsics, for any of the supported platforms, do not
> > have an optimal implementation, I prefer to add a DPDK API as an
> > abstraction. A given platform can choose to implement such an API
> > using __atomic intrinsics if it wants. The DPDK API can be similar to
> __atomic_compare_exchange_n.
> >
> 
> Certainly. From the linked discussions, I see how this would affect the design
> of (hypothetical functions) rte_atomic128_read() and rte_atomic128_set(),
> but I don't see anything that suggests (for the architectures being discussed)
> that __atomic_compare_exchange_n is suboptimal.
I wrote some code and generated assembly to verify what is happening. On 
aarch64, this call is delegated to libatomic and libatomic needs to be linked. 
In the generated assembly, it is clear that it uses locks (pthread mutex lock) 
to provide atomicity for. For 32b and 64b the compiler generates the expected 
inline assembly. Both, ' __atomic_always_lock_free' and ' 
__atomic_is_lock_free' return 0 to indicate that 128b __atomic intrinsics are 
not lock free. (gcc - 8.2)

Out of curiosity, I also did similar experiments on x86 (CP

Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-22 Thread Eads, Gage



> -Original Message-
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Tuesday, January 22, 2019 2:31 PM
> To: Eads, Gage ; dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> ; Ananyev, Konstantin
> ; nd ;
> chao...@linux.vnet.ibm.com; jer...@marvell.com; hemant.agra...@nxp.com;
> nd 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> Added other platform owners.
> 
> > > > > > @@ -208,4 +209,25 @@ static inline void
> > > > > > rte_atomic64_clear(rte_atomic64_t
> > > > > > *v)  }  #endif
> > > > > >
> > > > > > +static inline int
> > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > > > > +uint64_t
> > > > > > +*src) {
> > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > > > > should be pointers to 128b (__int128)? Or we could define our
> > > > > own data
> > > > type.
> > > >
> > > > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > > > __int128 originally because it fails to compile with -pedantic,
> > > > but on second thought (and with your suggestion of a separate data
> > > > type), we can resolve that with this typedef:
> > > >
> > > > typedef struct {
> > > > RTE_STD_C11 __int128 val;
> > > > } rte_int128_t;
> > > ok
> > >
> > > >
> > > > > Since, it is a new API, can we define it with memory orderings
> > > > > which will be more conducive to relaxed memory ordering based
> > architectures?
> > > > > You can refer to [1] and [2] for guidance.
> > > >
> > > > I certainly see the value in controlling the operation's memory
> > > > ordering, like in the __atomic intrinsics, but I'm not sure this
> > > > patchset is the right place to address that. I see that work going
> > > > a couple
> > > ways:
> > > > 1. Expand the existing rte_atomicN_* interfaces with additional
> > > > arguments. In that case, I'd prefer this be done in a separate
> > > > patchset that addresses all the atomic operations, not just
> > > > cmpset, so the interface changes are chosen according to the needs
> > > > of the full set of atomic operations. If this approach is taken
> > > > then there's no need to solve this while rte_atomic128_cmpset is
> > > > experimental, since all the
> > > other functions are non-experimental anyway.
> > > >
> > > > - Or -
> > > >
> > > > 2. Don't modify the existing rte_atomicN_* interfaces (or their
> > > > strongly ordered behavior), and instead create new versions of
> > > > them that take additional arguments. In this case, we can
> > > > implement
> > > > rte_atomic128_cmpset() as is and create a more flexible version in
> > > > a later
> > > patchset.
> > > >
> > > > Either way, I think the current interface (w.r.t. memory ordering
> > > > options) can work and still leaves us in a good position for
> > > > future
> > > changes/improvements.
> > > >
> > > I do not see the need to modify/extend the existing rte_atomicN_*
> > > APIs as the corresponding __atomic intrinsics serve as replacements.
> > > I expect that at some point, DPDK code base will not be using
> > rte_atomicN_* APIs.
> > > However, __atomic intrinsics do not support 128b wide parameters.
> > > Hence
> >
> > I don't think that's correct. From the GCC docs:
> >
> > "16-byte integral types are also allowed if `__int128' (see __int128)
> > is supported by the architecture."
> >
> > This works with x86 -64 -- I assume aarch64 also, but haven't confirmed.
> >
> > Source: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic-
> > Builtins.html
> (following is based on my reading, not based on experiments) My understanding
> is that the __atomic_load_n/store_n were implemented using a compare-and-
> swap HW instruction (due to lack of HW 128b atomic load and store
> instructions). This introduced the side effect or store/load respectively. 
> Where
> as the user does not expect such side effects. As suggested in [1], it looks 
> like
> GCC delegate

Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-22 Thread Honnappa Nagarahalli
Added other platform owners.

> > > > > @@ -208,4 +209,25 @@ static inline void
> > > > > rte_atomic64_clear(rte_atomic64_t
> > > > > *v)  }  #endif
> > > > >
> > > > > +static inline int
> > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > > > +uint64_t
> > > > > +*src) {
> > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > > > should be pointers to 128b (__int128)? Or we could define our own
> > > > data
> > > type.
> > >
> > > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > > __int128 originally because it fails to compile with -pedantic, but
> > > on second thought (and with your suggestion of a separate data
> > > type), we can resolve that with this typedef:
> > >
> > > typedef struct {
> > > RTE_STD_C11 __int128 val;
> > > } rte_int128_t;
> > ok
> >
> > >
> > > > Since, it is a new API, can we define it with memory orderings
> > > > which will be more conducive to relaxed memory ordering based
> architectures?
> > > > You can refer to [1] and [2] for guidance.
> > >
> > > I certainly see the value in controlling the operation's memory
> > > ordering, like in the __atomic intrinsics, but I'm not sure this
> > > patchset is the right place to address that. I see that work going a
> > > couple
> > ways:
> > > 1. Expand the existing rte_atomicN_* interfaces with additional
> > > arguments. In that case, I'd prefer this be done in a separate
> > > patchset that addresses all the atomic operations, not just cmpset,
> > > so the interface changes are chosen according to the needs of the
> > > full set of atomic operations. If this approach is taken then
> > > there's no need to solve this while rte_atomic128_cmpset is
> > > experimental, since all the
> > other functions are non-experimental anyway.
> > >
> > > - Or -
> > >
> > > 2. Don't modify the existing rte_atomicN_* interfaces (or their
> > > strongly ordered behavior), and instead create new versions of them
> > > that take additional arguments. In this case, we can implement
> > > rte_atomic128_cmpset() as is and create a more flexible version in a
> > > later
> > patchset.
> > >
> > > Either way, I think the current interface (w.r.t. memory ordering
> > > options) can work and still leaves us in a good position for future
> > changes/improvements.
> > >
> > I do not see the need to modify/extend the existing rte_atomicN_* APIs
> > as the corresponding __atomic intrinsics serve as replacements. I
> > expect that at some point, DPDK code base will not be using
> rte_atomicN_* APIs.
> > However, __atomic intrinsics do not support 128b wide parameters.
> > Hence
> 
> I don't think that's correct. From the GCC docs:
> 
> "16-byte integral types are also allowed if `__int128' (see __int128) is
> supported by the architecture."
> 
> This works with x86 -64 -- I assume aarch64 also, but haven't confirmed.
> 
> Source: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic-
> Builtins.html
(following is based on my reading, not based on experiments)
My understanding is that the __atomic_load_n/store_n were implemented using a 
compare-and-swap HW instruction (due to lack of HW 128b atomic load and store 
instructions). This introduced the side effect or store/load respectively. 
Where as the user does not expect such side effects. As suggested in [1], it 
looks like GCC delegated the implementation to libatomic which 'it seems' uses 
locks to implement 128b __atomic intrinsics (needs to be verified)

If __atomic intrinsics, for any of the supported platforms, do not have an 
optimal implementation, I prefer to add a DPDK API as an abstraction. A given 
platform can choose to implement such an API using __atomic intrinsics if it 
wants. The DPDK API can be similar to __atomic_compare_exchange_n.

[1] https://patchwork.ozlabs.org/patch/721686/
[2] https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html

> 
> > DPDK needs to write its own. Since this is the first API in that
> > regard, I prefer that we start with a signature that resembles
> > __atomic intrinsics which have been proven to provide best flexibility for
> all the platforms supported by DPDK.


Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-18 Thread Eads, Gage



> -Original Message-
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Thursday, January 17, 2019 11:28 PM
> To: Eads, Gage ; dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> ; Ananyev, Konstantin
> ; nd ; nd 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> > > >
> > > > This operation can be used for non-blocking algorithms, such as a
> > > > non- blocking stack or ring.
> > > >
> > > > Signed-off-by: Gage Eads 
> > > > ---
> > > >  .../common/include/arch/x86/rte_atomic_64.h| 22
> > > > ++
> > > >  1 file changed, 22 insertions(+)
> > > >
> > > > diff --git
> > > > a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > > b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > > index fd2ec9c53..34c2addf8 100644
> > > > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > Since this is a 128b operation should there be a new file created
> > > with the name rte_atomic_128.h?
> > >
> > > > @@ -34,6 +34,7 @@
> > > >  /*
> > > >   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
> > > >   * Copyright (c) 1998 Doug Rabson
> > > > + * Copyright (c) 2019 Intel Corporation
> > > >   * All rights reserved.
> > > >   */
> > > >
> > > > @@ -208,4 +209,25 @@ static inline void
> > > > rte_atomic64_clear(rte_atomic64_t
> > > > *v)  }  #endif
> > > >
> > > > +static inline int
> > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > > +uint64_t
> > > > +*src) {
> > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > > should be pointers to 128b (__int128)? Or we could define our own
> > > data
> > type.
> >
> > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > __int128 originally because it fails to compile with -pedantic, but on
> > second thought (and with your suggestion of a separate data type), we
> > can resolve that with this typedef:
> >
> > typedef struct {
> > RTE_STD_C11 __int128 val;
> > } rte_int128_t;
> ok
> 
> >
> > > Since, it is a new API, can we define it with memory orderings which
> > > will be more conducive to relaxed memory ordering based architectures?
> > > You can refer to [1] and [2] for guidance.
> >
> > I certainly see the value in controlling the operation's memory
> > ordering, like in the __atomic intrinsics, but I'm not sure this
> > patchset is the right place to address that. I see that work going a couple
> ways:
> > 1. Expand the existing rte_atomicN_* interfaces with additional
> > arguments. In that case, I'd prefer this be done in a separate
> > patchset that addresses all the atomic operations, not just cmpset, so
> > the interface changes are chosen according to the needs of the full
> > set of atomic operations. If this approach is taken then there's no
> > need to solve this while rte_atomic128_cmpset is experimental, since all the
> other functions are non-experimental anyway.
> >
> > - Or -
> >
> > 2. Don't modify the existing rte_atomicN_* interfaces (or their
> > strongly ordered behavior), and instead create new versions of them
> > that take additional arguments. In this case, we can implement
> > rte_atomic128_cmpset() as is and create a more flexible version in a later
> patchset.
> >
> > Either way, I think the current interface (w.r.t. memory ordering
> > options) can work and still leaves us in a good position for future
> changes/improvements.
> >
> I do not see the need to modify/extend the existing rte_atomicN_* APIs as the
> corresponding __atomic intrinsics serve as replacements. I expect that at some
> point, DPDK code base will not be using rte_atomicN_* APIs.
> However, __atomic intrinsics do not support 128b wide parameters. Hence

I don't think that's correct. From the GCC docs:

"16-byte integral types are also allowed if `__int128' (see __int128) is 
supported by the architecture."

This works with x86 -64 -- I assume aarch64 also, but haven't confirmed.

Source: 
https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic-Builtins.html

> DPDK needs to write its own. Since this is the first API in that regard, I 
> prefer that
> we start with a signature that resembles __atomic intrinsics which have been
> proven to provide best flexibility for all the platforms supported by DPDK.


Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-17 Thread Honnappa Nagarahalli
> > >
> > > This operation can be used for non-blocking algorithms, such as a
> > > non- blocking stack or ring.
> > >
> > > Signed-off-by: Gage Eads 
> > > ---
> > >  .../common/include/arch/x86/rte_atomic_64.h| 22
> > > ++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > index fd2ec9c53..34c2addf8 100644
> > > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > Since this is a 128b operation should there be a new file created with
> > the name rte_atomic_128.h?
> >
> > > @@ -34,6 +34,7 @@
> > >  /*
> > >   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
> > >   * Copyright (c) 1998 Doug Rabson
> > > + * Copyright (c) 2019 Intel Corporation
> > >   * All rights reserved.
> > >   */
> > >
> > > @@ -208,4 +209,25 @@ static inline void
> > > rte_atomic64_clear(rte_atomic64_t
> > > *v)  }  #endif
> > >
> > > +static inline int
> > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > +uint64_t
> > > +*src) {
> > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > should be pointers to 128b (__int128)? Or we could define our own data
> type.
> 
> I agree, I'm not a big fan of the 64b pointers here. I avoided __int128
> originally because it fails to compile with -pedantic, but on second thought
> (and with your suggestion of a separate data type), we can resolve that with
> this typedef:
> 
> typedef struct {
> RTE_STD_C11 __int128 val;
> } rte_int128_t;
ok

> 
> > Since, it is a new API, can we define it with memory orderings which
> > will be more conducive to relaxed memory ordering based architectures?
> > You can refer to [1] and [2] for guidance.
> 
> I certainly see the value in controlling the operation's memory ordering, 
> like in
> the __atomic intrinsics, but I'm not sure this patchset is the right place to
> address that. I see that work going a couple ways:
> 1. Expand the existing rte_atomicN_* interfaces with additional arguments. In
> that case, I'd prefer this be done in a separate patchset that addresses all 
> the
> atomic operations, not just cmpset, so the interface changes are chosen
> according to the needs of the full set of atomic operations. If this approach 
> is
> taken then there's no need to solve this while rte_atomic128_cmpset is
> experimental, since all the other functions are non-experimental anyway.
> 
> - Or -
> 
> 2. Don't modify the existing rte_atomicN_* interfaces (or their strongly
> ordered behavior), and instead create new versions of them that take
> additional arguments. In this case, we can implement rte_atomic128_cmpset()
> as is and create a more flexible version in a later patchset.
> 
> Either way, I think the current interface (w.r.t. memory ordering options) can
> work and still leaves us in a good position for future changes/improvements.
> 
I do not see the need to modify/extend the existing rte_atomicN_* APIs as the 
corresponding __atomic intrinsics serve as replacements. I expect that at some 
point, DPDK code base will not be using rte_atomicN_* APIs.
However, __atomic intrinsics do not support 128b wide parameters. Hence DPDK 
needs to write its own. Since this is the first API in that regard, I prefer 
that we start with a signature that resembles __atomic intrinsics which have 
been proven to provide best flexibility for all the platforms supported by DPDK.

> > If this an external API, it requires 'experimental' tag.
> 
> Good catch -- will fix.
> 
> >
> > 1. https://github.com/ARM-
> > software/progress64/blob/master/src/lockfree/aarch64.h#L63
> 
> I didn't know about aarch64's CASP instruction -- very cool!
> 
> > 2. https://github.com/ARM-
> > software/progress64/blob/master/src/lockfree/x86-64.h#L34
> >
> > > + uint8_t res;
> > > +
> > > + asm volatile (
> > > +   MPLOCKED
> > > +   "cmpxchg16b %[dst];"
> > > +   " sete %[res]"
> > > +   : [dst] "=m" (*dst),
> > > + [res] "=r" (res)
> > > +   : "c" (src[1]),
> > > + "b" (src[0]),
> > > + "m" (*dst),
> > > + "d" (exp[1]),
> > > + "a" (exp[0])
> > > +   : "memory");
> > > +
> > > + return res;
> > > +}
> > > +
> > >  #endif /* _RTE_ATOMIC_X86_64_H_ */
> > > --
> > > 2.13.6



Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-17 Thread Eads, Gage



> -Original Message-
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Thursday, January 17, 2019 9:45 AM
> To: Eads, Gage ; dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> ; Ananyev, Konstantin
> ; nd ; Honnappa Nagarahalli
> ; nd 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> > Subject: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64
> > only)
> >
> > This operation can be used for non-blocking algorithms, such as a non-
> > blocking stack or ring.
> >
> > Signed-off-by: Gage Eads 
> > ---
> >  .../common/include/arch/x86/rte_atomic_64.h| 22
> > ++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > index fd2ec9c53..34c2addf8 100644
> > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> Since this is a 128b operation should there be a new file created with the 
> name
> rte_atomic_128.h?
> 
> > @@ -34,6 +34,7 @@
> >  /*
> >   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
> >   * Copyright (c) 1998 Doug Rabson
> > + * Copyright (c) 2019 Intel Corporation
> >   * All rights reserved.
> >   */
> >
> > @@ -208,4 +209,25 @@ static inline void
> > rte_atomic64_clear(rte_atomic64_t
> > *v)  }  #endif
> >
> > +static inline int
> > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp, uint64_t
> > +*src) {
> The API name suggests it is a 128b operation. 'dst', 'exp' and 'src' should be
> pointers to 128b (__int128)? Or we could define our own data type.

I agree, I'm not a big fan of the 64b pointers here. I avoided __int128 
originally because it fails to compile with -pedantic, but on second thought 
(and with your suggestion of a separate data type), we can resolve that with 
this typedef:

typedef struct {
RTE_STD_C11 __int128 val;
} rte_int128_t;

> Since, it is a new API, can we define it with memory orderings which will be 
> more
> conducive to relaxed memory ordering based architectures? You can refer to [1]
> and [2] for guidance.

I certainly see the value in controlling the operation's memory ordering, like 
in the __atomic intrinsics, but I'm not sure this patchset is the right place 
to address that. I see that work going a couple ways:
1. Expand the existing rte_atomicN_* interfaces with additional arguments. In 
that case, I'd prefer this be done in a separate patchset that addresses all 
the atomic operations, not just cmpset, so the interface changes are chosen 
according to the needs of the full set of atomic operations. If this approach 
is taken then there's no need to solve this while rte_atomic128_cmpset is 
experimental, since all the other functions are non-experimental anyway.

- Or -

2. Don't modify the existing rte_atomicN_* interfaces (or their strongly 
ordered behavior), and instead create new versions of them that take additional 
arguments. In this case, we can implement rte_atomic128_cmpset() as is and 
create a more flexible version in a later patchset.

Either way, I think the current interface (w.r.t. memory ordering options) can 
work and still leaves us in a good position for future changes/improvements.

> If this an external API, it requires 'experimental' tag.

Good catch -- will fix.

> 
> 1. https://github.com/ARM-
> software/progress64/blob/master/src/lockfree/aarch64.h#L63

I didn't know about aarch64's CASP instruction -- very cool! 

> 2. https://github.com/ARM-
> software/progress64/blob/master/src/lockfree/x86-64.h#L34
> 
> > +   uint8_t res;
> > +
> > +   asm volatile (
> > + MPLOCKED
> > + "cmpxchg16b %[dst];"
> > + " sete %[res]"
> > + : [dst] "=m" (*dst),
> > +   [res] "=r" (res)
> > + : "c" (src[1]),
> > +   "b" (src[0]),
> > +   "m" (*dst),
> > +   "d" (exp[1]),
> > +   "a" (exp[0])
> > + : "memory");
> > +
> > +   return res;
> > +}
> > +
> >  #endif /* _RTE_ATOMIC_X86_64_H_ */
> > --
> > 2.13.6



Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-17 Thread Honnappa Nagarahalli
> Subject: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> This operation can be used for non-blocking algorithms, such as a non-
> blocking stack or ring.
> 
> Signed-off-by: Gage Eads 
> ---
>  .../common/include/arch/x86/rte_atomic_64.h| 22
> ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> index fd2ec9c53..34c2addf8 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
Since this is a 128b operation should there be a new file created with the name 
rte_atomic_128.h?

> @@ -34,6 +34,7 @@
>  /*
>   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
>   * Copyright (c) 1998 Doug Rabson
> + * Copyright (c) 2019 Intel Corporation
>   * All rights reserved.
>   */
> 
> @@ -208,4 +209,25 @@ static inline void rte_atomic64_clear(rte_atomic64_t
> *v)  }  #endif
> 
> +static inline int
> +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp, uint64_t
> +*src) {
The API name suggests it is a 128b operation. 'dst', 'exp' and 'src' should be 
pointers to 128b (__int128)? Or we could define our own data type.
Since, it is a new API, can we define it with memory orderings which will be 
more conducive to relaxed memory ordering based architectures? You can refer to 
[1] and [2] for guidance.
If this an external API, it requires 'experimental' tag.

1. 
https://github.com/ARM-software/progress64/blob/master/src/lockfree/aarch64.h#L63
2. 
https://github.com/ARM-software/progress64/blob/master/src/lockfree/x86-64.h#L34

> + uint8_t res;
> +
> + asm volatile (
> +   MPLOCKED
> +   "cmpxchg16b %[dst];"
> +   " sete %[res]"
> +   : [dst] "=m" (*dst),
> + [res] "=r" (res)
> +   : "c" (src[1]),
> + "b" (src[0]),
> + "m" (*dst),
> + "d" (exp[1]),
> + "a" (exp[0])
> +   : "memory");
> +
> + return res;
> +}
> +
>  #endif /* _RTE_ATOMIC_X86_64_H_ */
> --
> 2.13.6



[dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

2019-01-16 Thread Gage Eads
This operation can be used for non-blocking algorithms, such as a
non-blocking stack or ring.

Signed-off-by: Gage Eads 
---
 .../common/include/arch/x86/rte_atomic_64.h| 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h 
b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
index fd2ec9c53..34c2addf8 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
@@ -34,6 +34,7 @@
 /*
  * Inspired from FreeBSD src/sys/amd64/include/atomic.h
  * Copyright (c) 1998 Doug Rabson
+ * Copyright (c) 2019 Intel Corporation
  * All rights reserved.
  */
 
@@ -208,4 +209,25 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
 }
 #endif
 
+static inline int
+rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp, uint64_t *src)
+{
+   uint8_t res;
+
+   asm volatile (
+ MPLOCKED
+ "cmpxchg16b %[dst];"
+ " sete %[res]"
+ : [dst] "=m" (*dst),
+   [res] "=r" (res)
+ : "c" (src[1]),
+   "b" (src[0]),
+   "m" (*dst),
+   "d" (exp[1]),
+   "a" (exp[0])
+ : "memory");
+
+   return res;
+}
+
 #endif /* _RTE_ATOMIC_X86_64_H_ */
-- 
2.13.6