Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-25 Thread Hrvoje Popovski
On 25.6.2021. 10:02, Alexandr Nedvedicky wrote:
> Hello David,
> 
> 
>>
>> during the drive to work it occurred to me that we should basically have
>> the same logic around whether we should insert or replace or do nothing
>> in both the smr and mutex critical sections.
>>
>> it at least makes the code easier to understand. i think?
> 
> yes, the new diff is easier to reads. I would not call it dumb, just
> 'easier to read' sounds more appropriate.
> 
> I think combination of ebt_find() and ebt_insert() (price doubles, when
> inserting a new entry) is acceptable cost for extra diagnostic panic. 
> We can always change it later. I'm OK if it will get committed as-is.
> 
> OK sashan
> 


Hi,

with this diff i can't trigger panic what ever i do... but it could be
that i'm not very creative right now :)



Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-25 Thread Alexandr Nedvedicky
Hello David,


> 
> during the drive to work it occurred to me that we should basically have
> the same logic around whether we should insert or replace or do nothing
> in both the smr and mutex critical sections.
> 
> it at least makes the code easier to understand. i think?

yes, the new diff is easier to reads. I would not call it dumb, just
'easier to read' sounds more appropriate.

I think combination of ebt_find() and ebt_insert() (price doubles, when
inserting a new entry) is acceptable cost for extra diagnostic panic. 
We can always change it later. I'm OK if it will get committed as-is.

OK sashan



Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-24 Thread David Gwynne
On Thu, Jun 24, 2021 at 09:31:20AM +0200, Alexandr Nedvedicky wrote:
> Hello David,
> 
> 
> > 
> > i think we can get away with not refcounting eb_entry structures at all.
> > either they're in the etherbridge map/table or they're not, and the
> > thing that takes them out of the map while holding the eb_lock mutex
> > becomes responsible for their cleanup.
> > 
> > i feel like most of the original logic can still hold up if we fix my
> > stupid refcnting mistake(s) and do a better job of avoiding a double
> > free.
> 
> I'm not sure. It seems to me the code in your diff deals with
> insert vs. insert race properly. how about delete vs. insert?

hrm. i can half convince myself that the outcome of losing a delete vs
insert race would have a semantically correct outcome, but while i'm
trying to do that it occurs to me that i'm trying to make the code too
clever and i should dumb it down.

> 350 mtx_enter(>eb_lock);
> 351 num = eb->eb_num + (oebe == NULL);
>
> 352 if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {  
>
> 353 /* we won, do the update */   
>
> 354 ebl_insert(ebl, nebe);
> 355 
> 356 if (oebe != NULL) {
> 357 ebl_remove(ebl, oebe);
> 358 ebt_replace(eb, oebe, nebe);
> 359 }
> 360 
> 361 nebe = NULL; /* give nebe reference to the table */
> 362 eb->eb_num = num; 
>
> 363 } else {  
>
> 364 /* we lost, we didn't end up replacing oebe */
> 365 oebe = NULL;
> 366 }
> 367 mtx_leave(>eb_lock);
> 368 
> 
> assume cpu0 got oebe and assumes it is going to perform update (oebe != 
> NULL).
> the cpu1 runs ahead and won mutex (->eb_lock) in etherbridge_del_addr() 
> and
> removed the entry successfully. as soon as cpu1 leaves ->eb_lock, it's
> cpu0's turn. In this case ebt_insert() returns NULL, because there is
> no conflict any more. However 'NULL != oebe'.

in this situation it would look like etherbridge_del_addr ran after nebe
was inserted, which i think is a plausible history.

> I'm not sure we can fix insert vs. delete race properly without atomic
> reference counter.

during the drive to work it occurred to me that we should basically have
the same logic around whether we should insert or replace or do nothing
in both the smr and mutex critical sections.

it at least makes the code easier to understand. i think?

Index: if_etherbridge.c
===
RCS file: /cvs/src/sys/net/if_etherbridge.c,v
retrieving revision 1.6
diff -u -p -r1.6 if_etherbridge.c
--- if_etherbridge.c10 Mar 2021 10:21:47 -  1.6
+++ if_etherbridge.c25 Jun 2021 03:56:37 -
@@ -44,7 +44,6 @@
 
 #include 
 
-static inline void ebe_take(struct eb_entry *);
 static inline void ebe_rele(struct eb_entry *);
 static voidebe_free(void *);
 
@@ -233,16 +232,9 @@ ebt_remove(struct etherbridge *eb, struc
 }
 
 static inline void
-ebe_take(struct eb_entry *ebe)
-{
-   refcnt_take(>ebe_refs);
-}
-
-static void
 ebe_rele(struct eb_entry *ebe)
 {
-   if (refcnt_rele(>ebe_refs))
-   smr_call(>ebe_smr_entry, ebe_free, ebe);
+   smr_call(>ebe_smr_entry, ebe_free, ebe);
 }
 
 static void
@@ -309,19 +301,21 @@ etherbridge_map(struct etherbridge *eb, 
 
smr_read_enter();
oebe = ebl_find(ebl, eba);
-   if (oebe == NULL)
-   new = 1;
-   else {
+   if (oebe == NULL) {
+   /*
+* peek at the space to see if it's worth trying
+* to make a new entry.
+*/
+   if (eb->eb_num < eb->eb_max)
+   new = 1;
+   } else {
if (oebe->ebe_age != now)
oebe->ebe_age = now;
 
/* does this entry need to be replaced? */
if (oebe->ebe_type == EBE_DYNAMIC &&
-   !eb_port_eq(eb, oebe->ebe_port, port)) {
+   !eb_port_eq(eb, oebe->ebe_port, port))
new = 1;
-   ebe_take(oebe);
-   } else
-   oebe = NULL;
}
smr_read_leave();
 
@@ -342,7 +336,6 @@ etherbridge_map(struct etherbridge *eb, 
}
 
smr_init(>ebe_smr_entry);
-   refcnt_init(>ebe_refs);
nebe->ebe_etherbridge = eb;
 
nebe->ebe_addr = eba;
@@ -351,40 +344,49 @@ etherbridge_map(struct etherbridge *eb, 

Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-24 Thread Alexandr Nedvedicky
Hello David,


> 
> i think we can get away with not refcounting eb_entry structures at all.
> either they're in the etherbridge map/table or they're not, and the
> thing that takes them out of the map while holding the eb_lock mutex
> becomes responsible for their cleanup.
> 
> i feel like most of the original logic can still hold up if we fix my
> stupid refcnting mistake(s) and do a better job of avoiding a double
> free.

I'm not sure. It seems to me the code in your diff deals with
insert vs. insert race properly. how about delete vs. insert?

350 mtx_enter(>eb_lock);
351 num = eb->eb_num + (oebe == NULL);  
 
352 if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
 
353 /* we won, do the update */ 
 
354 ebl_insert(ebl, nebe);
355 
356 if (oebe != NULL) {
357 ebl_remove(ebl, oebe);
358 ebt_replace(eb, oebe, nebe);
359 }
360 
361 nebe = NULL; /* give nebe reference to the table */
362 eb->eb_num = num;   
 
363 } else {
 
364 /* we lost, we didn't end up replacing oebe */
365 oebe = NULL;
366 }
367 mtx_leave(>eb_lock);
368 

assume cpu0 got oebe and assumes it is going to perform update (oebe != 
NULL).
the cpu1 runs ahead and won mutex (->eb_lock) in etherbridge_del_addr() and
removed the entry successfully. as soon as cpu1 leaves ->eb_lock, it's
cpu0's turn. In this case ebt_insert() returns NULL, because there is
no conflict any more. However 'NULL != oebe'.

I'm not sure we can fix insert vs. delete race properly without atomic
reference counter.

thanks and
regards
sashan