On Sun, Aug 21, 2022 at 12:34 AM Peng He <[email protected]> wrote:
>
> Hi Mike
>
> Intuitionally, I do not think it is a good idea.

I agree with you, that change is really a sledge hammer. I'll collect
some telemetry from the Massif check to see why exactly the race
condition is occurring.

Cheers,
M

>
> if src->ukey_state = DELETED, but dst is to bring it back to ALIVE (some 
> state),
> then this is a bug, because it breaks the state machine behind.
>
> I think the checking we can relax here is only when dst and ukey stat is in 
> EVICTED and DELETED,
> and this is only because specific state machine we are using there.
>
> transition_ukey should be as simple as possible, not related to any specific 
> state machine used.
>
>
> Mike Pattrick <[email protected]>于2022年8月16日 周二22:05写道:
>>
>> On Tue, Aug 16, 2022 at 4:04 AM Peng He <[email protected]> wrote:
>> >
>> > Hi, Mike
>> >
>> > by changing it into the function, you mean relax the checking if the state 
>> > is
>> > either EVICTED or DELETED?
>> >
>>
>> In my testing code, I changed it to this:
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 57f94df54..224bdfbb8 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -1937,6 +1937,12 @@ transition_ukey_at(struct udpif_key *ukey, enum
>> ukey_state dst,
>>      OVS_REQUIRES(ukey->mutex)
>>  {
>>      if (dst < ukey->state) {
>> +        if (ukey->state >= UKEY_EVICTED) {
>> +            VLOG_DBG("Possible race condition on ukey transition %d->%d "
>> +                     "(last transitioned from thread %u at %s)"ukey->state,
>> +                     dst, ukey->state_thread, ukey->state_where);
>> +            return;
>> +        }
>>          VLOG_ABORT("Invalid ukey transition %d->%d (last transitioned from "
>>                     "thread %u at %s)", ukey->state, dst, ukey->state_thread,
>>                     ukey->state_where);
>>
>>
>> >
>> > Mike Pattrick <[email protected]> 于2022年8月15日周一 20:16写道:
>> >>
>> >> On Sat, Jun 4, 2022 at 11:19 AM Peng He <[email protected]> wrote:
>> >> >
>> >> > There is a race condition between the revalidator threads and
>> >> > the handler/pmd threads.
>> >> >
>> >> > revalidator                          PMD threads
>> >> > push_dp_ops deletes a key and tries
>> >> > to del the dp magaflow.
>> >> >                                      does the upcall, generates a new 
>> >> > ukey,
>> >> >                                      and replaces the old ukey, now the 
>> >> > old
>> >> >                                      ukey state is UKEY_DELETED
>> >> >
>> >> > dp_ops succeeds, tries to change
>> >> > the old ukey's state into
>> >> > UKEY_EVICTED, however, the old
>> >> > ukey's state is already UKEY_DELETED,
>> >> > so OVS aborts.
>> >> >
>> >> > I did not observe this in the real environment, as it takes time for
>> >> > PMDs to finish the upcall and replace the old ukeys. Normally, the
>> >> > revalidator will change ukey state into UKEY_EVICTED first.
>> >> > But it's better to cover this case.
>> >>
>> >> Hello Peng,
>> >>
>> >> I've run into a similar issue by running OVS under Massif, but with a
>> >> different caller.
>> >>
>> >> Do you think it makes sense to move this change into transition_ukey() ?
>> >>
>> >>
>> >> Cheers,
>> >> M
>> >>
>> >> >
>> >> > Signed-off-by: Peng He <[email protected]>
>> >> > ---
>> >> >  ofproto/ofproto-dpif-upcall.c | 8 ++++++--
>> >> >  1 file changed, 6 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/ofproto/ofproto-dpif-upcall.c 
>> >> > b/ofproto/ofproto-dpif-upcall.c
>> >> > index 4f68f14f2..9c55e43c1 100644
>> >> > --- a/ofproto/ofproto-dpif-upcall.c
>> >> > +++ b/ofproto/ofproto-dpif-upcall.c
>> >> > @@ -2411,7 +2411,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op 
>> >> > *ops, size_t n_ops)
>> >> >          if (op->dop.error) {
>> >> >              if (op->ukey) {
>> >> >                  ovs_mutex_lock(&op->ukey->mutex);
>> >> > -                transition_ukey(op->ukey, UKEY_EVICTED);
>> >> > +                if (op->ukey->state < UKEY_EVICTED) {
>> >> > +                    transition_ukey(op->ukey, UKEY_EVICTED);
>> >> > +                }
>> >> >                  ovs_mutex_unlock(&op->ukey->mutex);
>> >> >              }
>> >> >              /* if it's a flow_del error, 'stats' is unusable, it's ok
>> >> > @@ -2432,7 +2434,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op 
>> >> > *ops, size_t n_ops)
>> >> >
>> >> >          if (op->ukey) {
>> >> >              ovs_mutex_lock(&op->ukey->mutex);
>> >> > -            transition_ukey(op->ukey, UKEY_EVICTED);
>> >> > +            if (op->ukey->state < UKEY_EVICTED) {
>> >> > +                transition_ukey(op->ukey, UKEY_EVICTED);
>> >> > +            }
>> >> >              push->used = MAX(stats->used, op->ukey->stats.used);
>> >> >              push->tcp_flags = stats->tcp_flags | 
>> >> > op->ukey->stats.tcp_flags;
>> >> >              push->n_packets = stats->n_packets - 
>> >> > op->ukey->stats.n_packets;
>> >> > --
>> >> > 2.25.1
>> >> >
>> >> > _______________________________________________
>> >> > dev mailing list
>> >> > [email protected]
>> >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >> >
>> >>
>> >
>> >
>> > --
>> > hepeng
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to