Re: [dpdk-dev] [PATCH v3 1/3] rwlock: reimplement with atomic builtins
Hi Konstantin, > -Original Message- > From: Ananyev, Konstantin > Sent: Friday, March 15, 2019 7:41 PM > To: Joyce Kong (Arm Technology China) ; > dev@dpdk.org > Cc: nd ; Gavin Hu (Arm Technology China) > ; jer...@marvell.com; chao...@linux.vnet.ibm.com; > Richardson, Bruce ; tho...@monjalon.net; > hemant.agra...@nxp.com; Honnappa Nagarahalli > > Subject: RE: [PATCH v3 1/3] rwlock: reimplement with atomic builtins > > Hi, > > > > The __sync builtin based implementation generates full memory > > barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins > > to generate one way barriers. > > > > Here is the assembly code of __sync_compare_and_swap builtin. > > __sync_bool_compare_and_swap(dst, exp, src); > >0x0090f1b0 <+16>:e0 07 40 f9 ldr x0, [sp, #8] > >0x0090f1b4 <+20>:e1 0f 40 79 ldrhw1, [sp, #6] > >0x0090f1b8 <+24>:e2 0b 40 79 ldrhw2, [sp, #4] > >0x0090f1bc <+28>:21 3c 00 12 and w1, w1, #0x > >0x0090f1c0 <+32>:03 7c 5f 48 ldxrh w3, [x0] > >0x0090f1c4 <+36>:7f 00 01 6b cmp w3, w1 > >0x0090f1c8 <+40>:61 00 00 54 b.ne0x90f1d4 > > // b.any > >0x0090f1cc <+44>:02 fc 04 48 stlxrh w4, w2, [x0] > >0x0090f1d0 <+48>:84 ff ff 35 cbnzw4, 0x90f1c0 > > > >0x0090f1d4 <+52>:bf 3b 03 d5 dmb ish > >0x0090f1d8 <+56>:e0 17 9f 1a csetw0, eq // eq = none > > > > Signed-off-by: Gavin Hu > > Tested-by: Joyce Kong > > Acked-by: Jerin Jacob > > --- > > Wouldn't it be plausible to change _try_ functions to use __atomic too (for > consistency)? > Apart from that looks good to me. > Konstantin Sure, we will do it in next version.
Re: [dpdk-dev] [PATCH v3 1/3] rwlock: reimplement with atomic builtins
Hi, > The __sync builtin based implementation generates full memory > barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins > to generate one way barriers. > > Here is the assembly code of __sync_compare_and_swap builtin. > __sync_bool_compare_and_swap(dst, exp, src); >0x0090f1b0 <+16>:e0 07 40 f9 ldr x0, [sp, #8] >0x0090f1b4 <+20>:e1 0f 40 79 ldrhw1, [sp, #6] >0x0090f1b8 <+24>:e2 0b 40 79 ldrhw2, [sp, #4] >0x0090f1bc <+28>:21 3c 00 12 and w1, w1, #0x >0x0090f1c0 <+32>:03 7c 5f 48 ldxrh w3, [x0] >0x0090f1c4 <+36>:7f 00 01 6b cmp w3, w1 >0x0090f1c8 <+40>:61 00 00 54 b.ne0x90f1d4 > // b.any >0x0090f1cc <+44>:02 fc 04 48 stlxrh w4, w2, [x0] >0x0090f1d0 <+48>:84 ff ff 35 cbnzw4, 0x90f1c0 > >0x0090f1d4 <+52>:bf 3b 03 d5 dmb ish >0x0090f1d8 <+56>:e0 17 9f 1a csetw0, eq // eq = none > > Signed-off-by: Gavin Hu > Tested-by: Joyce Kong > Acked-by: Jerin Jacob > --- Wouldn't it be plausible to change _try_ functions to use __atomic too (for consistency)? Apart from that looks good to me. Konstantin
Re: [dpdk-dev] [PATCH v3 1/3] rwlock: reimplement with atomic builtins
Hi Stephen, > -Original Message- > From: Stephen Hemminger > Sent: Thursday, March 14, 2019 11:54 PM > To: Joyce Kong (Arm Technology China) > Cc: dev@dpdk.org; nd ; Gavin Hu (Arm Technology China) > ; jer...@marvell.com; konstantin.anan...@intel.com; > chao...@linux.vnet.ibm.com; bruce.richard...@intel.com; > tho...@monjalon.net; hemant.agra...@nxp.com; Honnappa Nagarahalli > > Subject: Re: [dpdk-dev] [PATCH v3 1/3] rwlock: reimplement with atomic > builtins > > On Thu, 14 Mar 2019 21:15:02 +0800 > Joyce Kong wrote: > > > - success = rte_atomic32_cmpset((volatile uint32_t *)&rwl- > >cnt, > > - (uint32_t)x, (uint32_t)(x + 1)); > > + success = __atomic_compare_exchange_n(&rwl->cnt, &x, > x+1, 1, > > + __ATOMIC_ACQUIRE, > __ATOMIC_RELAXED); > > Would it be possible to have rte_atomic32_cmpset be an inline function > that became __atomic_comppare_exchange? Then all usages would be the > same. There is already a patch for this and Honnappa commented on this: https://mails.dpdk.org/archives/dev/2019-January/124297.html
Re: [dpdk-dev] [PATCH v3 1/3] rwlock: reimplement with atomic builtins
On Thu, 14 Mar 2019 21:15:02 +0800 Joyce Kong wrote: > - success = rte_atomic32_cmpset((volatile uint32_t *)&rwl->cnt, > - (uint32_t)x, (uint32_t)(x + 1)); > + success = __atomic_compare_exchange_n(&rwl->cnt, &x, x+1, 1, > + __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); Would it be possible to have rte_atomic32_cmpset be an inline function that became __atomic_comppare_exchange? Then all usages would be the same.