Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-19 Thread Hao Tian
Hi,

Fix confirmed. Git master branch runs the test code without any error. Thanks 
for everyone participating in this thread!

Best regards,
Hao Tian

-Original Message-
From: vpp-dev@lists.fd.io  on behalf of Hao Tian 

Sent: Friday, March 17, 2023 11:17 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Hi Dave,

I tested the change and found some issue. Please check gerrit comments. Thanks!

Best regards,
Hao Tian

-Original Message-
From: vpp-dev@lists.fd.io  on behalf of Dave Barach 

Sent: Friday, March 17, 2023 1:50 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Please see https://gerrit.fd.io/r/c/vpp/+/38507



-Original Message-

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hi Dave,



Thanks for your work. I am ready to test whenever needed.



Best regards,

Hao Tian





From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> on behalf of Dave Barach 
mailto:v...@barachs.net>>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.



Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.



D.



-Original Message-

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Andrew 
Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hao,



I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.



As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.



My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.



--a



> On 15 Mar 2023, at 14:45, Hao Tian 
> mailto:tianhao...@outlook.com>> wrote:

>

> Hi,

>

> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.

>

> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.

>

> Could you please shed some light here? Thanks!

>

> Regards,

> Hao Tian

>




-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22722): https://lists.fd.io/g/vpp-dev/message/22722
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-16 Thread Hao Tian
Hi Dave,

I tested the change and found some issue. Please check gerrit comments. Thanks!

Best regards,
Hao Tian

-Original Message-
From: vpp-dev@lists.fd.io  on behalf of Dave Barach 

Sent: Friday, March 17, 2023 1:50 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Please see https://gerrit.fd.io/r/c/vpp/+/38507



-Original Message-

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hi Dave,



Thanks for your work. I am ready to test whenever needed.



Best regards,

Hao Tian





From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> on behalf of Dave Barach 
mailto:v...@barachs.net>>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.



Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.



D.



-Original Message-

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Andrew 
Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hao,



I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.



As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.



My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.



--a



> On 15 Mar 2023, at 14:45, Hao Tian 
> mailto:tianhao...@outlook.com>> wrote:

>

> Hi,

>

> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.

>

> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.

>

> Could you please shed some light here? Thanks!

>

> Regards,

> Hao Tian

>




-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22716): https://lists.fd.io/g/vpp-dev/message/22716
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-16 Thread themiron via lists.fd.io
Hi Dave,

What if just to use highest bit of key as free mark?

While having clib_crc32c_uses_intrinsics defined, hash result will be 32bit 
only, highest bit of 64bit key is completely “free”.

If not defined, - highest but can be just stripped off while computation in 
clib_bihash_*.

With such approach, 64bit key update will be atomic, no special (and therefore 
– never about to be used) value will be necessary, can it work?

--

Best Regards, Vladislav Grishenko

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22715): https://lists.fd.io/g/vpp-dev/message/22715
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-16 Thread Dave Barach
Please see https://gerrit.fd.io/r/c/vpp/+/38507 

 

-Original Message-

From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>  mailto:vpp-dev@lists.fd.io> > On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> 

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

 

Hi Dave,

 

Thanks for your work. I am ready to test whenever needed.

 

Best regards,

Hao Tian

 



From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>  mailto:vpp-dev@lists.fd.io> > on behalf of Dave Barach mailto:v...@barachs.net> >

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> 

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

 

I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.

 

Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.

 

D.

 

-Original Message-

From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>  mailto:vpp-dev@lists.fd.io> > On Behalf Of Andrew Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> 

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

 

Hao,

 

I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.

 

As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.

 

My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.

 

--a

 

> On 15 Mar 2023, at 14:45, Hao Tian  <mailto:tianhao...@outlook.com> > wrote:

> 

> Hi,

> 

> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.

> 

> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.

> 

> Could you please shed some light here? Thanks!

> 

> Regards,

> Hao Tian

> 

 

 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22714): https://lists.fd.io/g/vpp-dev/message/22714
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Hao Tian
Hi Dave,

Thanks for your work. I am ready to test whenever needed.

Best regards,
Hao Tian


From: vpp-dev@lists.fd.io  on behalf of Dave Barach 

Sent: Thursday, March 16, 2023 7:02 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.

Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.

D.

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Andrew Yourtchenko
Sent: Wednesday, March 15, 2023 12:33 PM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.

My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.

--a

> On 15 Mar 2023, at 14:45, Hao Tian  wrote:
>
> Hi,
>
> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.
>
> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.
>
> Could you please shed some light here? Thanks!
>
> Regards,
> Hao Tian
>


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22713): https://lists.fd.io/g/vpp-dev/message/22713
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Hao Tian
Hi Andrew,

Thanks for the insight, the idea about treating it as optimization is quite 
interesting.

This is fine for newly added code, but I'm a bit concerned about existing code 
- there are a lot of bihash callers that use the value from bihash as 
vec/pool/... index as long as the search call returned 0, without any check 
against ~0. It would be difficult to fix them all. A fix in bihash might be 
more preferable, considering these existing use cases.

Best regards,
Hao Tian

From: vpp-dev@lists.fd.io  on behalf of Andrew Yourtchenko 

Sent: Thursday, March 16, 2023 12:33 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.

My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.

--a

> On 15 Mar 2023, at 14:45, Hao Tian  wrote:
>
> Hi,
>
> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.
>
> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.
>
> Could you please shed some light here? Thanks!
>
> Regards,
> Hao Tian
>

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22712): https://lists.fd.io/g/vpp-dev/message/22712
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Dave Barach
I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.

Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.

D.

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Andrew Yourtchenko
Sent: Wednesday, March 15, 2023 12:33 PM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.

My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.

--a

> On 15 Mar 2023, at 14:45, Hao Tian  wrote:
> 
> Hi,
> 
> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.
> 
> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.
> 
> Could you please shed some light here? Thanks!
> 
> Regards,
> Hao Tian
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22711): https://lists.fd.io/g/vpp-dev/message/22711
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Andrew Yourtchenko
Hao,

I noticed the same behavior when stress-testing the multi thread session 
handling for the ACL plugin a while ago. I thought this trade off is there to 
avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and 
it is always the same. So I just treated that case in my code same as “not 
found”.

My logic was: if an entry is just in process of being deleted, there is very 
little use for its old value anyway.

--a

> On 15 Mar 2023, at 14:45, Hao Tian  wrote:
> 
> Hi,
> 
> I tried but could not come up with any way that is able to ensure the kvp 
> being valid upon return without using the full bucket lock.
> 
> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. Critical section can be shrinked to cover only the 
> copying process, which seems to perform better, but I'm not sure if this is 
> the best approach.
> 
> Could you please shed some light here? Thanks!
> 
> Regards,
> Hao Tian
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22710): https://lists.fd.io/g/vpp-dev/message/22710
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Hao Tian
Hi,

I tried but could not come up with any way that is able to ensure the kvp being 
valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and 
return that copy instead. Critical section can be shrinked to cover only the 
copying process, which seems to perform better, but I'm not sure if this is the 
best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22709): https://lists.fd.io/g/vpp-dev/message/22709
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-15 Thread Dave Barach
Doubling the number of atomic ops was expected to cause a severe performance
regression. This hack demonstrates that the test code wasn't responsible. A
bit of extra suspicion in clib_bihash_search_inline_2_with_hash() will
likely solve the problem without the huge performance regression. 

D.  

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Hao Tian
Sent: Tuesday, March 14, 2023 9:47 PM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching
- misuse or bug?

Hi,

I've done the change you asked, and the "suspicious value" warnings are all
gone. Here is the diff:

diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h
index c4e120e4a..b9f658db3 100644
--- a/src/vppinfra/bihash_template.h
+++ b/src/vppinfra/bihash_template.h
@@ -532,12 +532,7 @@ static inline int BV
(clib_bihash_search_inline_2_with_hash)
   if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
 return -1;
 
-  if (PREDICT_FALSE (b->lock))
-{
-  volatile BVT (clib_bihash_bucket) * bv = b;
-  while (bv->lock)
-   CLIB_PAUSE ();
-}
+  BV (clib_bihash_lock_bucket) (b);
 
   v = BV (clib_bihash_get_value) (h, b->offset);
 
@@ -557,9 +552,11 @@ static inline int BV
(clib_bihash_search_inline_2_with_hash)
   if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key))
{
  *valuep = v->kvp[i];
+ BV (clib_bihash_unlock_bucket) (b);
  return 0;
}
 }
+  BV (clib_bihash_unlock_bucket) (b);
   return -1;
 }
 

Some questions regarding the change:

1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose
that this change needs to be applied there as well. Am I right?

2. This patch does eliminate the race condition, but appears to introduce a
huge performance regression. The clocks in `show runtime` will double after
the change (with the 10-ops limit removed, so the test node runs
indefinitely).

Regards,
Hao Tian


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22708): https://lists.fd.io/g/vpp-dev/message/22708
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-14 Thread Hao Tian
Hi,

I've done the change you asked, and the "suspicious value" warnings are all 
gone. Here is the diff:

diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h
index c4e120e4a..b9f658db3 100644
--- a/src/vppinfra/bihash_template.h
+++ b/src/vppinfra/bihash_template.h
@@ -532,12 +532,7 @@ static inline int BV 
(clib_bihash_search_inline_2_with_hash)
   if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
 return -1;
 
-  if (PREDICT_FALSE (b->lock))
-{
-  volatile BVT (clib_bihash_bucket) * bv = b;
-  while (bv->lock)
-   CLIB_PAUSE ();
-}
+  BV (clib_bihash_lock_bucket) (b);
 
   v = BV (clib_bihash_get_value) (h, b->offset);
 
@@ -557,9 +552,11 @@ static inline int BV 
(clib_bihash_search_inline_2_with_hash)
   if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key))
{
  *valuep = v->kvp[i];
+ BV (clib_bihash_unlock_bucket) (b);
  return 0;
}
 }
+  BV (clib_bihash_unlock_bucket) (b);
   return -1;
 }
 

Some questions regarding the change:

1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose 
that this change needs to be applied there as well. Am I right?

2. This patch does eliminate the race condition, but appears to introduce a 
huge performance regression. The clocks in `show runtime` will double after the 
change (with the 10-ops limit removed, so the test node runs indefinitely).

Regards,
Hao Tian
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22706): https://lists.fd.io/g/vpp-dev/message/22706
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-14 Thread Dave Barach
Quick experiment: in
src/vppinfra/bihash_template.h:clib_bihash_search_inline_2_with_hash(),
replace this:

  if (PREDICT_FALSE (b->lock))
{
  volatile BVT (clib_bihash_bucket) * bv = b;
  while (bv->lock)
CLIB_PAUSE ();
}

With:

BV(clib_bihash_lock_bucket(b));

and make sure to BV(clib_bihash_unlock_bucket(b)); just prior to return 0
and return -1 in that function.

Please let me know what happens. 

Thanks... Dave

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Hao Tian
Sent: Tuesday, March 14, 2023 4:13 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Race condition between bihash deletion and searching -
misuse or bug?

Hi all,

I found that bihash might return zero'ed value (0xff's) if deletion and
searching were performed in parallel on different threads. This is
reproducible by only ~100 lines of code, from 21.06 all the way up to git
master, and on multiple machines from Coffee Lake to Tiger Lake.

The code (a plugin) and test command is shown below. Given how easy it is to
reproduce the problem, I'm not sure whether this is a bug or something
missing on my end. Any advice is welcomed. Thanks!

 bihash_race.c 

#include 
#include 
#include 

static volatile int br_run;
static clib_bihash_8_8_t br_bihash_8_8;
static int test_run_cnt[2];

static void
bihash_race_set (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (_bihash_8_8, , BIHASH_ADD); }

static void
bihash_race_del (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (_bihash_8_8, , BIHASH_DEL); }

static void
bihash_race_get (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };
  clib_bihash_kv_8_8_t value;

  if (!clib_bihash_search_8_8 (_bihash_8_8, , ))
    {
  if (value.value != 1)
    {
  clib_warning ("suspicious value: 0x%lx\n", value.value);
    }
    }
}

static clib_error_t *
bihash_race_init (vlib_main_t *vm)
{
  clib_bihash_init_8_8 (_bihash_8_8, "bihash_race hash", 4096, 4096 *
256);

  return 0;
}

VLIB_NODE_FN (br_test_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame) {
  u64 k = 0x01010101UL;

  if (!br_run || vm->thread_index == 0)
    {
  return 0;
    }

  // perform bihash set and delete for 5 times each in thread 1
  if (vm->thread_index == 1 && test_run_cnt[0] < 10)
    {
  if (test_run_cnt[1] & 1)
    bihash_race_del (k);
  else
    bihash_race_set (k);
  test_run_cnt[0]++;
  return 0;
    }

  // perform bihash lookup for 10 times in thread 2
  if (vm->thread_index == 2 && test_run_cnt[1] < 10)
    {
  bihash_race_get (k);
  test_run_cnt[1]++;
  return 0;
    }
  return 0;
}

void
bihash_race_test_reset ()
{
  memset (test_run_cnt, 0, sizeof (test_run_cnt));
  clib_warning ("== test reset =="); }

static clib_error_t *
bihash_race_test_command_fn (vlib_main_t *vm, unformat_input_t *input,
 vlib_cli_command_t *cmd) {
  u8 state = ~0;

  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
    {
  if (unformat (input, "on"))
    state = 1;
  else if (unformat (input, "off"))
    state = 0;
  else
    return clib_error_return (0, "invalid input");
    }

  if (state)
    {
  br_run = 1;
    }
  else
    {
  br_run = 0;
  bihash_race_test_reset ();
  vlib_cli_output (vm, "== test reset ==");
    }

  return 0;
}

VLIB_CLI_COMMAND (bihash_race_test_command, static) = {
  .path = "set bihash-race-test",
  .short_help = "set bihash-race-test |",
  .function = bihash_race_test_command_fn, };

VLIB_REGISTER_NODE (br_test_node) = {
  .name = "bihash-race-test",
  .type = VLIB_NODE_TYPE_INPUT,
  .state = VLIB_NODE_STATE_POLLING,
};

VLIB_INIT_FUNCTION (bihash_race_init);

VLIB_PLUGIN_REGISTER () = {
  .version = VPP_BUILD_VER,
  .description = "bihash race test",
};

= CMakeLists.txt =

add_vpp_plugin(bihash_race
  SOURCES
  bihash_race.c
)

= startup.conf =

unix {
  nodaemon
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid 1000
  interactive
}

cpu {
  workers 3
  main-core 1
}

dpdk {
  no-pci
}

= test command =

terminal1 # vpp -c startup.conf
terminal2 # while true; do vppctl set bihash-race-test on; sleep 1; vppctl
set bihash-race-test off; done

VPP will spit out a lot of "suspicious value: 0x" in
terminal1, while the code above never saves such value into bihash - the
value comes from the !is_add branch in clib_bihash_add_del_inline_with_hash.
Changing the memset value (and clib_bihash_is_free_* obviously) in this
branch will lead to this new value being returned.

Regards,
Hao Tian


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22703): https://lists.fd.io/g/vpp-dev/message/22703
Mute This Topic: https://lists.fd.io/mt/97599770/21656
Group