Re: [vpp-dev] Bihash is considered thread-safe but probably shouldn't

2021-11-03 Thread Florin Coras
Agreed, it’s unlikely, so maybe just use the 2 bits left for the epoch counter as 
a middle ground? The new approach should be better either way :-)

Florin
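
As a small illustration of why even a couple of epoch bits help (the field
position and width below are made up, not the real bucket layout): roughly
speaking, a burst of back-to-back replaces is hidden from the b0/b1 comparison
only when its length is an exact multiple of 2^n, because the n-bit counter
wraps back to the value the reader saw first.

  #include <stdint.h>

  /* Bump an n-bit epoch field packed inside the 64-bit bucket word; the
     carry out of the field is deliberately discarded so the counter wraps. */
  static inline uint64_t
  demo_bump_epoch (uint64_t bucket_word, unsigned epoch_shift,
                   unsigned epoch_bits)
  {
    uint64_t mask = ((1ULL << epoch_bits) - 1) << epoch_shift;
    uint64_t epoch = (bucket_word & mask) + (1ULL << epoch_shift);

    return (bucket_word & ~mask) | (epoch & mask);
  }

With epoch_bits = 1 this behaves exactly like the change bit discussed below
(two consecutive replaces bring it back to its starting value); with 2 bits it
takes four, and with 8 bits 256.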


> On Nov 3, 2021, at 11:55 AM, Damjan Marion  wrote:
> 
> What about the following: we shift the offset by 6, as all buckets are aligned 
> to 64 anyway, and that gives us 6 more bits, so we can have an 8-bit epoch 
> counter… ?
> 
> — 
> Damjan
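
A minimal sketch of what such a packed bucket word could look like, assuming a
single 64-bit word per bucket; the field names and widths are illustrative
only, not the actual VPP bihash layout:

  #include <stdint.h>

  typedef uint64_t u64;

  /* Hypothetical 64-bit bucket word.  Buckets are 64-byte aligned, so the
     low 6 bits of the byte offset are always zero; storing offset >> 6
     frees 6 bits, enough for an 8-bit epoch counter.  The widths below are
     illustrative, they just have to sum to 64. */
  typedef union
  {
    struct
    {
      u64 offset : 31;        /* bucket byte offset >> 6 */
      u64 epoch : 8;          /* bumped on every change to the bucket */
      u64 lock : 1;           /* writer lock bit */
      u64 linear_search : 1;
      u64 log2_pages : 7;
      u64 refcnt : 16;
    };
    u64 as_u64;               /* whole bucket read/written as one word */
  } demo_bihash_bucket_t;

A reader can then compare the whole as_u64 snapshot taken before and after the
search; any add, delete, replace or in-progress lock shows up as a difference.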
> 
>> On 03.11.2021., at 19:45, Damjan Marion  wrote:
>> 
>> 
>> 
>> yes, I am aware of that, it is extremely unlikely and the only way I can see 
>> this fixed is introducing an epoch at the bucket level, but we don’t have 
>> enough space there…. 
>> 
>> — 
>> Damjan
>> 
>>> On 03.11.2021., at 19:16, Florin Coras  wrote:
>>> 
>>> Hi Damjan, 
>>> 
>>> Definitely like the scheme, but the change bit might not be enough, unless 
>>> I’m misunderstanding. For instance, two consecutive updates to a bucket 
>>> before the reader grabs b1 will hide the change (the bit flips twice and 
>>> ends up back at its original value). 
>>> 
>>> Florin
>>> 
 On Nov 3, 2021, at 9:36 AM, Damjan Marion via lists.fd.io wrote:
 
 
 Agree with Dave on atomic ops being bad on the reader side.
 
 What about the following scheme:
 
 As the bucket is just a u64 value, on the reader side we grab the bucket 
 before (b0) and after (b1) the search operation.
 
 If the search finds an entry, we simply do 2 checks:
 - that b0 is equal to b1
 - that the lock bit is not set in either of them
 If a check fails, we simply retry.
 
 On the writer side, we have add, remove and replace operations.
 The first 2 alter refcnt, which is part of the bucket.
 To deal with the replace case we introduce another bit (a change bit) which is 
 flipped every time data is changed in the bucket.
 
 Here are the possible scenarios:
 
 - reader grabs b0 before lock and b1 after unlock
    - add, del: refcnt and change bit will differ between b0 and b1, causing a 
      retry
    - replace: the change bit will differ between b0 and b1, causing a retry
 
 - reader grabs b0 after lock and/or b1 before unlock
    - the lock bit will be set, causing a retry
 
 Of course, for this to work properly we need to ensure proper memory ordering 
 (i.e. avoid the bucket change becoming visible to a remote thread before the 
 kvp change).
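
A minimal sketch of that reader-side retry, assuming the bucket is a single
u64 that can be loaded atomically; the function and macro names are
illustrative, not the actual bihash API:

  #include <stdint.h>

  #define DEMO_LOCK_BIT (1ULL << 63)   /* illustrative lock bit position */

  /* Snapshot the bucket word before (b0) and after (b1) the search and only
     accept the result when nothing changed and no writer held the lock in
     the meantime; otherwise retry. */
  static inline int
  demo_search (const uint64_t *bucket_word,
               int (*search) (void *ctx, uint64_t *result),
               void *ctx, uint64_t *result)
  {
    uint64_t b0, b1;
    int rv;

    do
      {
        b0 = __atomic_load_n (bucket_word, __ATOMIC_ACQUIRE);
        rv = search (ctx, result);
        /* order the data reads done by search() before the re-read */
        __atomic_thread_fence (__ATOMIC_ACQUIRE);
        b1 = __atomic_load_n (bucket_word, __ATOMIC_RELAXED);
      }
    /* b0 == b1 past this point, so testing the lock bit in b0 covers both */
    while (b0 != b1 || (b0 & DEMO_LOCK_BIT));

    return rv;
  }

Any add, delete or replace that completes between the two loads changes the
snapshot (refcnt, offset or the change bit), so the comparison fails and the
search is retried.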
 
 I crafted a WIP patch to present my idea:
 
 https://gerrit.fd.io/r/c/vpp/+/34326 
 
 In this patch I got rid of all the store barriers and replaced them with the 
 more lightweight:
 
 __atomic_store_n (ptr, val, __ATOMIC_RELEASE);
 
 On platforms with strong memory ordering (like x86_64) this will result in 
 just normal stores (but the compiler will know that it should not reorder 
 them).
 On platforms with weak memory ordering (like aarch64) this will result in a 
 special store instruction, but that one is still cheaper than a full memory 
 barrier.
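
A minimal sketch of the publish pattern this enables, with illustrative types
and names rather than the actual bihash structures: the kvp update has to
become visible before the bucket word that makes it reachable.

  #include <stdint.h>

  typedef struct { uint64_t key, value; } demo_kvp_t;

  typedef struct
  {
    uint64_t as_u64;      /* packed bucket word (offset/lock/refcnt/...) */
    demo_kvp_t *kvp;      /* data the bucket points at */
  } demo_bucket_t;

  /* Writer side, illustrative: change the kvp first, then publish the new
     bucket word with a release store.  On x86_64 the release store compiles
     to a plain store (only compiler reordering is restricted); on aarch64 it
     becomes a store-release instruction, still cheaper than a full barrier. */
  static inline void
  demo_publish (demo_bucket_t *b, uint64_t new_word, uint64_t new_value)
  {
    b->kvp->value = new_value;                                  /* data    */
    __atomic_store_n (&b->as_u64, new_word, __ATOMIC_RELEASE);  /* publish */
  }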
 
 Thoughts? Comments?
 
 Thanks,
 
 — 
 Damjan
 
 
 
> On 02.11.2021., at 12:14, Dave Barach  wrote:
> 
> Dear Nick,
> 
> As the code comment suggests, we tiptoe right up to the line to extract 
> performance. Have you tried e.g. ISOLCPUS, thread priority, or some other 
> expedients to make the required assumptions true?
> 
> It’s easy enough to change the code in various ways so this use-case 
> cannot backfire. High on the list: always make a working copy of the 
> bucket, vs. update in place. Won’t help write performance, but it’s 
> likely to make the pain go away.
> 
> Bucket-level reader-locks would involve adding Avogadro’s number of 
> atomic ops to the predominant case. I’m pretty sure that’s a non-starter.
> 
> FWIW... Dave
> 
> 
> From: vpp-dev@lists.fd.io  On Behalf Of Nick Zavaritsky
> Sent: Monday, November 1, 2021 12:12 PM
> To: vpp-dev@lists.fd.io 
> Subject: [vpp-dev] Bihash is considered thread-safe but probably shouldn't
> 
> Hello bihash experts!
> 
> There's an old thread claiming that bihash lookup can produce a value=-1 
> under intense add/delete concurrent activity: 
> https://lists.fd.io/g/vpp-dev/message/15606 
> 
> 
> We had a seemingly related crash recently when a lookup in 
> snat_main.flow_hash yielded a value=-1 which was subsequently used as a 
> destination thread index to offload to. This crash prompted me to study 
> bihash more closely.
> 
> The rest of the message is structured as follows:
>  1. Presenting reasons why I believe that bihash is not thread-safe.
>  2. Proposing a fix.
> 
> 1 Bihash is probably not thread-safe
> 
> The number of 

Re: [vpp-dev] Bihash is considered thread-safe but probably shouldn't

2021-11-03 Thread Damjan Marion via lists.fd.io
What about the following: we shift the offset by 6, as all buckets are aligned 
to 64 anyway, and that gives us 6 more bits, so we can have an 8-bit epoch 
counter… ?

— 
Damjan

> On 03.11.2021., at 19:45, Damjan Marion  wrote:
> 
> 
> 
> yes, I am aware of that, it is extremely unlikely and the only way I can see 
> this fixed is introducing an epoch at the bucket level, but we don’t have 
> enough space there…. 
> 
> — 
> Damjan
> 
>>> On 03.11.2021., at 19:16, Florin Coras  wrote:
>>> 
>> Hi Damjan, 
>> 
>> Definitely like the scheme, but the change bit might not be enough, unless 
>> I’m misunderstanding. For instance, two consecutive updates to a bucket 
>> before the reader grabs b1 will hide the change (the bit flips twice and 
>> ends up back at its original value). 
>> 
>> Florin
>> 
>>> On Nov 3, 2021, at 9:36 AM, Damjan Marion via lists.fd.io 
>>>  wrote:
>>> 
>>> 
>>> Agree with Dave on atomic ops being bad on the reader side.
>>> 
>>> What about the following scheme:
>>> 
>>> As the bucket is just a u64 value, on the reader side we grab the bucket 
>>> before (b0) and after (b1) the search operation.
>>> 
>>> If the search finds an entry, we simply do 2 checks:
>>> - that b0 is equal to b1
>>> - that the lock bit is not set in either of them
>>> If a check fails, we simply retry.
>>> 
>>> On the writer side, we have add, remove and replace operations.
>>> The first 2 alter refcnt, which is part of the bucket.
>>> To deal with the replace case we introduce another bit (a change bit) which 
>>> is flipped every time data is changed in the bucket.
>>> 
>>> Here are the possible scenarios:
>>> 
>>> - reader grabs b0 before lock and b1 after unlock
>>>    - add, del: refcnt and change bit will differ between b0 and b1, causing 
>>>      a retry
>>>    - replace: the change bit will differ between b0 and b1, causing a retry
>>> 
>>> - reader grabs b0 after lock and/or b1 before unlock
>>>    - the lock bit will be set, causing a retry
>>> 
>>> Of course, for this to work properly we need to ensure proper memory 
>>> ordering (i.e. avoid the bucket change becoming visible to a remote thread 
>>> before the kvp change).
>>> 
>>> I crafted a WIP patch to present my idea:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/34326
>>> 
>>> In this patch I got rid of all the store barriers and replaced them with 
>>> the more lightweight:
>>> 
>>> __atomic_store_n (ptr, val, __ATOMIC_RELEASE);
>>> 
>>> On platforms with strong memory ordering (like x86_64) this will result in 
>>> just normal stores (but the compiler will know that it should not reorder 
>>> them).
>>> On platforms with weak memory ordering (like aarch64) this will result in a 
>>> special store instruction, but that one is still cheaper than a full memory 
>>> barrier.
>>> 
>>> Thoughts? Comments?
>>> 
>>> Thanks,
>>> 
>>> — 
>>> Damjan
>>> 
>>> 
>>> 
 On 02.11.2021., at 12:14, Dave Barach  wrote:
 
 Dear Nick,
 
 As the code comment suggests, we tiptoe right up to the line to extract 
 performance. Have you tried e.g. ISOLCPUS, thread priority, or some other 
 expedients to make the required assumptions true?
 
 It’s easy enough to change the code in various ways so this use-case 
 cannot backfire. High on the list: always make a working copy of the 
 bucket, vs. update in place. Won’t help write performance, but it’s likely 
 to make the pain go away.
 
 Bucket-level reader-locks would involve adding Avogadro’s number of atomic 
 ops to the predominant case. I’m pretty sure that’s a non-starter.
 
 FWIW... Dave
 
 
 From: vpp-dev@lists.fd.io  On Behalf Of Nick 
 Zavaritsky
 Sent: Monday, November 1, 2021 12:12 PM
 To: vpp-dev@lists.fd.io
 Subject: [vpp-dev] Bihash is considered thread-safe but probably shouldn't
 
 Hello bihash experts!
 
 There's an old thread claiming that bihash lookup can produce a value=-1 
 under intense add/delete concurrent activity: 
 https://lists.fd.io/g/vpp-dev/message/15606
 
 We had a seemingly related crash recently when a lookup in 
 snat_main.flow_hash yielded a value=-1 which was subsequently used as a 
 destination thread index to offload to. This crash prompted me to study 
 bihash more closely.
 
 The rest of the message is structured as follows:
  1. Presenting reasons why I believe that bihash is not thread-safe.
  2. Proposing a fix.
 
 1 Bihash is probably not thread-safe
 
 The number of buckets in a hash table never changes. Every bucket has a 
 lock bit. Updates happen via clib_bihash_add_del_inline_with_hash. The 
 function grabs the bucket lock early on and performs the update while holding 
 the lock. Obviously this is safe; let's focus on readers.
 
 Lookups happen via clib_bihash_search_inline_with_hash / 
 clib_bihash_search_inline_2_with_hash. The function locates the bucket and 
 waits until the lock bit is cleared.
 
  b = BV (clib_bihash_get_bucket) (h, hash);
 
  if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
return -1;
 
  if 

Re: [vpp-dev] Bihash is considered thread-safe but probably shouldn't

2021-11-03 Thread Damjan Marion via lists.fd.io

yes, I am aware of that, it is extremely unlikely and the only way I can see 
this fixed is introducing an epoch at the bucket level, but we don’t have 
enough space there…. 

— 
Damjan

> On 03.11.2021., at 19:16, Florin Coras  wrote:
> 
> Hi Damjan, 
> 
> Definitely like the scheme, but the change bit might not be enough, unless I’m 
> misunderstanding. For instance, two consecutive updates to a bucket before the 
> reader grabs b1 will hide the change (the bit flips twice and ends up back at 
> its original value). 
> 
> Florin
> 
>> On Nov 3, 2021, at 9:36 AM, Damjan Marion via lists.fd.io 
>>  wrote:
>> 
>> 
>> Agree with Dave on atomic ops being bad on the reader side.
>> 
>> What about the following scheme:
>> 
>> As the bucket is just a u64 value, on the reader side we grab the bucket 
>> before (b0) and after (b1) the search operation.
>> 
>> If the search finds an entry, we simply do 2 checks:
>> - that b0 is equal to b1
>> - that the lock bit is not set in either of them
>> If a check fails, we simply retry.
>> 
>> On the writer side, we have add, remove and replace operations.
>> The first 2 alter refcnt, which is part of the bucket.
>> To deal with the replace case we introduce another bit (a change bit) which 
>> is flipped every time data is changed in the bucket.
>> 
>> Here are the possible scenarios:
>> 
>> - reader grabs b0 before lock and b1 after unlock
>>    - add, del: refcnt and change bit will differ between b0 and b1, causing 
>>      a retry
>>    - replace: the change bit will differ between b0 and b1, causing a retry
>> 
>> - reader grabs b0 after lock and/or b1 before unlock
>>    - the lock bit will be set, causing a retry
>> 
>> Of course, for this to work properly we need to ensure proper memory 
>> ordering (i.e. avoid the bucket change becoming visible to a remote thread 
>> before the kvp change).
>> 
>> I crafted a WIP patch to present my idea:
>> 
>> https://gerrit.fd.io/r/c/vpp/+/34326
>> 
>> In this patch I got rid of all the store barriers and replaced them with 
>> the more lightweight:
>> 
>> __atomic_store_n (ptr, val, __ATOMIC_RELEASE);
>> 
>> On platforms with strong memory ordering (like x86_64) this will result in 
>> just normal stores (but the compiler will know that it should not reorder 
>> them).
>> On platforms with weak memory ordering (like aarch64) this will result in a 
>> special store instruction, but that one is still cheaper than a full memory 
>> barrier.
>> 
>> Thoughts? Comments?
>> 
>> Thanks,
>> 
>> — 
>> Damjan
>> 
>> 
>> 
>>> On 02.11.2021., at 12:14, Dave Barach  wrote:
>>> 
>>> Dear Nick,
>>> 
>>> As the code comment suggests, we tiptoe right up to the line to extract 
>>> performance. Have you tried e.g. ISOLCPUS, thread priority, or some other 
>>> expedients to make the required assumptions true?
>>> 
>>> It’s easy enough to change the code in various ways so this use-case cannot 
>>> backfire. High on the list: always make a working copy of the bucket, vs. 
>>> update in place. Won’t help write performance, but it’s likely to make the 
>>> pain go away.
>>> 
>>> Bucket-level reader-locks would involve adding Avogadro’s number of atomic 
>>> ops to the predominant case. I’m pretty sure that’s a non-starter.
>>> 
>>> FWIW... Dave
>>> 
>>> 
>>> From: vpp-dev@lists.fd.io  On Behalf Of Nick Zavaritsky
>>> Sent: Monday, November 1, 2021 12:12 PM
>>> To: vpp-dev@lists.fd.io
>>> Subject: [vpp-dev] Bihash is considered thread-safe but probably shouldn't
>>> 
>>> Hello bihash experts!
>>> 
>>> There's an old thread claiming that bihash lookup can produce a value=-1 
>>> under intense add/delete concurrent activity: 
>>> https://lists.fd.io/g/vpp-dev/message/15606
>>> 
>>> We had a seemingly related crash recently when a lookup in 
>>> snat_main.flow_hash yielded a value=-1 which was subsequently used as a 
>>> destination thread index to offload to. This crash prompted me to study 
>>> bihash more closely.
>>> 
>>> The rest of the message is structured as follows:
>>>  1. Presenting reasons why I believe that bihash is not thread-safe.
>>>  2. Proposing a fix.
>>> 
>>> 1 Bihash is probably not thread-safe
>>> 
>>> The number of buckets in a hash table never changes. Every bucket has a 
>>> lock bit. Updates happen via clib_bihash_add_del_inline_with_hash. The 
>>> function grabs the bucket lock early on and performs the update while 
>>> holding the lock. Obviously this is safe; let's focus on readers.
>>> 
>>> Lookups happen via clib_bihash_search_inline_with_hash / 
>>> clib_bihash_search_inline_2_with_hash. The function locates the bucket and 
>>> waits until the lock bit is cleared.
>>> 
>>>  b = BV (clib_bihash_get_bucket) (h, hash);
>>> 
>>>  if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
>>>return -1;
>>> 
>>>  if (PREDICT_FALSE (b->lock))
>>>{
>>>  volatile BVT (clib_bihash_bucket) * bv = b;
>>>  while (bv->lock)
>>>CLIB_PAUSE ();
>>>}
>>> 
>>> From this point on the function examines the data structure without ever 
>>> bothering to check the lock again. Nothing prevents an updater from 
>>> changing the data the reader is currently looking at, or even deallocating 

Re: [vpp-dev] Bihash is considered thread-safe but probably shouldn't

2021-11-03 Thread Florin Coras
Hi Damjan, 

Definitely like the scheme, but the change bit might not be enough, unless I’m 
misunderstanding. For instance, two consecutive updates to a bucket before the 
reader grabs b1 will hide the change (the bit flips twice and ends up back at 
its original value). 

Florin

> On Nov 3, 2021, at 9:36 AM, Damjan Marion via lists.fd.io 
>  wrote:
> 
> 
> Agree with Dave on atomic ops being bad on the reader side.
> 
> What about the following scheme:
> 
> As the bucket is just a u64 value, on the reader side we grab the bucket 
> before (b0) and after (b1) the search operation.
> 
> If the search finds an entry, we simply do 2 checks:
> - that b0 is equal to b1
> - that the lock bit is not set in either of them
> If a check fails, we simply retry.
> 
> On the writer side, we have add, remove and replace operations.
> The first 2 alter refcnt, which is part of the bucket.
> To deal with the replace case we introduce another bit (a change bit) which 
> is flipped every time data is changed in the bucket.
> 
> Here are the possible scenarios:
> 
> - reader grabs b0 before lock and b1 after unlock
>    - add, del: refcnt and change bit will differ between b0 and b1, causing 
>      a retry
>    - replace: the change bit will differ between b0 and b1, causing a retry
> 
> - reader grabs b0 after lock and/or b1 before unlock
>    - the lock bit will be set, causing a retry
> 
> Of course, for this to work properly we need to ensure proper memory 
> ordering (i.e. avoid the bucket change becoming visible to a remote thread 
> before the kvp change).
> 
> I crafted a WIP patch to present my idea:
> 
> https://gerrit.fd.io/r/c/vpp/+/34326
> 
> In this patch I got rid of all the store barriers and replaced them with 
> the more lightweight:
> 
> __atomic_store_n (ptr, val, __ATOMIC_RELEASE);
> 
> On platforms with strong memory ordering (like x86_64) this will result in 
> just normal stores (but the compiler will know that it should not reorder 
> them).
> On platforms with weak memory ordering (like aarch64) this will result in a 
> special store instruction, but that one is still cheaper than a full memory 
> barrier.
> 
> Thoughts? Comments?
> 
> Thanks,
> 
> — 
> Damjan
> 
> 
> 
>> On 02.11.2021., at 12:14, Dave Barach  wrote:
>> 
>> Dear Nick,
>> 
>> As the code comment suggests, we tiptoe right up to the line to extract 
>> performance. Have you tried e.g. ISOLCPUS, thread priority, or some other 
>> expedients to make the required assumptions true?
>> 
>> It’s easy enough to change the code in various ways so this use-case cannot 
>> backfire. High on the list: always make a working copy of the bucket, vs. 
>> update in place. Won’t help write performance, but it’s likely to make the 
>> pain go away.
>> 
>> Bucket-level reader-locks would involve adding Avogadro’s number of atomic 
>> ops to the predominant case. I’m pretty sure that’s a non-starter.
>> 
>> FWIW... Dave
>> 
>> 
>> From: vpp-dev@lists.fd.io  On Behalf Of Nick Zavaritsky
>> Sent: Monday, November 1, 2021 12:12 PM
>> To: vpp-dev@lists.fd.io
>> Subject: [vpp-dev] Bihash is considered thread-safe but probably shouldn't
>> 
>> Hello bihash experts!
>> 
>> There's an old thread claiming that bihash lookup can produce a value=-1 
>> under intense add/delete concurrent activity: 
>> https://lists.fd.io/g/vpp-dev/message/15606
>> 
>> We had a seemingly related crash recently when a lookup in 
>> snat_main.flow_hash yielded a value=-1 which was subsequently used as a 
>> destination thread index to offload to. This crash prompted me to study 
>> bihash more closely.
>> 
>> The rest of the message is structured as follows:
>>  1. Presenting reasons why I believe that bihash is not thread-safe.
>>  2. Proposing a fix.
>> 
>> 1 Bihash is probably not thread-safe
>> 
>> The number of buckets in a hash table never changes. Every bucket has a lock 
>> bit. Updates happen via clib_bihash_add_del_inline_with_hash. The function 
>> grabs the bucket lock early on and performs the update while holding the 
>> lock. Obviously this is safe; let's focus on readers.
>> 
>> Lookups happen via clib_bihash_search_inline_with_hash / 
>> clib_bihash_search_inline_2_with_hash. The function locates the bucket and 
>> waits until the lock bit is cleared.
>> 
>>  b = BV (clib_bihash_get_bucket) (h, hash);
>> 
>>  if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
>>return -1;
>> 
>>  if (PREDICT_FALSE (b->lock))
>>{
>>  volatile BVT (clib_bihash_bucket) * bv = b;
>>  while (bv->lock)
>>CLIB_PAUSE ();
>>}
>> 
>> From this point on the function examines the data structure without ever 
>> bothering to check the lock again. Nothing prevents an updater from changing 
>> the data the reader is currently looking at, or even deallocating it right 
>> away. The only way it could work is if we could make assumptions about 
>> relative performance of lookup and update operations. Checking the lock 
>> early in lookup ensures that there's no update currently in progress. If 
>> lookup is faster than update, then no future updater will manage to progress 
>> to the point 

[vpp-dev] Headers files missing in VPP version 21.10

2021-11-03 Thread Mauro Sardara (msardara) via lists.fd.io
Hello All,

I am having an issue with the new VPP version 21.10 because some header files 
are missing.

The issue was fixed in master a few days ago [0], but it is still present in 
the tag v21.10, so
I cherry-picked the change and created a patch for the branch stable/2110 [1].

Is it possible to release a new version 21.10.1 containing this fix?

Thanks in advance for your support.

Regards,
Mauro.


[0] 
https://github.com/FDio/vpp/commit/d9fc708ee0c8c42fa09dda5d29e9cb0f985558dc#diff-931430ce72263164a78d72bf46cfb75cf63d5503fd60293a7f4fd9e70aca2383

[1] https://gerrit.fd.io/r/c/vpp/+/34339






Re: [vpp-dev] Query regarding packet validation checks in ip6_local

2021-11-03 Thread Ole Troan
Pankaj,

> In file vnet/ip/ip6_forward.c (VPP 21.01), function ip6_local_inline(), the 
> node's packet processing has the following check:
> 
>   if (PREDICT_FALSE (need_csum))
> {
>   flags = ip6_tcp_udp_icmp_validate_checksum (vm, b[0]);
>   good_l4_csum = flags & VNET_BUFFER_F_L4_CHECKSUM_CORRECT;
>   error = IP6_ERROR_UNKNOWN_PROTOCOL;
> }
>   else
> {
>   if (ip6_tcp_udp_icmp_bad_length (vm, b[0]))
> error = IP6_ERROR_BAD_LENGTH;
> }
> 
> Kindly explain the reason behind the checksum and the length validation being 
> placed in the if and else branches respectively, which makes the length 
> validation not applicable whenever the checksum validation is applicable.
> 
> Also, the function ip6_tcp_udp_icmp_bad_length (refer to the snippet below) 
> seems to assume that the hop-by-hop extension header is always present. Is it 
> mandatory for the IPv6 hop-by-hop extension header to be present?

Indeed. That code looks quite questionable.
A good start would perhaps be to write tests showing where it fails? That will 
then make refactoring and fixing it afterwards easier.

Best regards,
Ole

> 
> static_always_inline u8
> ip6_tcp_udp_icmp_bad_length (vlib_main_t * vm, vlib_buffer_t * p0)
> {
> 
>   u16 payload_length_host_byte_order;
>   u32 n_this_buffer, n_bytes_left;
>   ip6_header_t *ip0 = vlib_buffer_get_current (p0);
>   u32 headers_size = sizeof (ip0[0]);
>   u8 *data_this_buffer;
> 
>   data_this_buffer = (u8 *) (ip0 + 1);
> 
>   ip6_hop_by_hop_ext_t *ext_hdr = (ip6_hop_by_hop_ext_t *) data_this_buffer;
> 
>   /* validate really icmp6 next */
> 
>   if (!(ext_hdr->next_hdr == IP_PROTOCOL_ICMP6)
>   || (ext_hdr->next_hdr == IP_PROTOCOL_UDP))
> return 0;
>   ::
>   ::
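
To illustrate the assumption being questioned: the cast above treats the bytes
right after the fixed IPv6 header as a hop-by-hop header no matter what
ip0->protocol says. A guarded version would look roughly like the sketch below
(illustrative only, not a proposed patch; it reuses the names from the snippet
above and assumes the IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS constant):

  /* Only interpret the bytes after the fixed header as a hop-by-hop
     extension header when the fixed header says one follows; otherwise
     take the next header from the fixed header itself. */
  u8 next_hdr;

  if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS)
    {
      ip6_hop_by_hop_ext_t *ext_hdr = (ip6_hop_by_hop_ext_t *) (ip0 + 1);
      next_hdr = ext_hdr->next_hdr;
    }
  else
    next_hdr = ip0->protocol;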
> 
> 
> Thanks,
> Pankaj Malhotra
> 
> 
> 



