Hello David,
</snip>
>
> i think we can get away with not refcounting eb_entry structures at all.
> either they're in the etherbridge map/table or they're not, and the
> thing that takes them out of the map while holding the eb_lock mutex
> becomes responsible for their cleanup.
>
> i feel like most of the original logic can still hold up if we fix my
> stupid refcnting mistake(s) and do a better job of avoiding a double
> free.
I'm not sure. It seems to me the code in your diff handles the
insert vs. insert race properly. How about delete vs. insert?

    mtx_enter(&eb->eb_lock);
    num = eb->eb_num + (oebe == NULL);
    if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
        /* we won, do the update */
        ebl_insert(ebl, nebe);

        if (oebe != NULL) {
            ebl_remove(ebl, oebe);
            ebt_replace(eb, oebe, nebe);
        }

        nebe = NULL; /* give nebe reference to the table */
        eb->eb_num = num;
    } else {
        /* we lost, we didn't end up replacing oebe */
        oebe = NULL;
    }
    mtx_leave(&eb->eb_lock);

Assume cpu0 found oebe and expects to perform an update (oebe !=
NULL).
Then cpu1 runs ahead, wins the mutex (->eb_lock) in etherbridge_del_addr()
and removes the entry successfully. As soon as cpu1 leaves ->eb_lock, it's
cpu0's turn. In this case ebt_insert() returns NULL, because there is
no conflict any more. However, 'NULL != oebe', so the comparison fails and
cpu0 takes the 'we lost' branch.
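
To make the interleaving concrete, here is a minimal single-threaded
sketch that replays it step by step. Everything in it is a hypothetical
stand-in (struct entry, struct table, tbl_insert(), tbl_delete(),
update(), replay_delete_vs_insert()); it is not the etherbridge code. It
assumes ebt_insert() behaves as described above: it returns the
conflicting entry, or NULL after inserting the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real etherbridge structures. */
struct entry {
	int port;
};

struct table {
	struct entry *slot;	/* one-bucket "map" for a single address */
	unsigned int num;	/* number of entries in the table */
	unsigned int max;	/* table capacity */
};

/* Assumed semantics: return the conflicting entry, or NULL once n is in. */
static struct entry *
tbl_insert(struct table *t, struct entry *n)
{
	if (t->slot != NULL)
		return (t->slot);
	t->slot = n;
	return (NULL);
}

/* Stand-in for the delete path: drop e if it is still in the table. */
static void
tbl_delete(struct table *t, struct entry *e)
{
	if (t->slot == e) {
		t->slot = NULL;
		t->num--;
	}
}

/* Same shape as the quoted update step; returns 1 for "we won". */
static int
update(struct table *t, struct entry *o, struct entry *n)
{
	unsigned int num = t->num + (o == NULL);

	if (num <= t->max && tbl_insert(t, n) == o) {
		if (o != NULL)
			t->slot = n;	/* stands in for ebt_replace() */
		t->num = num;
		return (1);
	}
	return (0);			/* "we lost" */
}

/* cpu0 looks up o, cpu1 deletes it, then cpu0 runs the update step. */
static int
replay_delete_vs_insert(struct table *t, struct entry *o, struct entry *n)
{
	struct entry *seen;

	t->slot = o;
	t->num = 1;
	t->max = 8;

	seen = t->slot;			/* cpu0: lookup finds o */
	tbl_delete(t, seen);		/* cpu1: delete wins the lock */
	return (update(t, seen, n));	/* cpu0: update with stale o */
}
```

Calling replay_delete_vs_insert(&t, &o, &n) on two distinct entries
reports "we lost" in this sketch, yet t.slot points at n and t.num stays
at 0. If the real code behaves the same way, the caller would still treat
nebe as its own to clean up while the table references it.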
I'm not sure we can fix the insert vs. delete race properly without an
atomic reference counter.
thanks and
regards
sashan