On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote:
> Hrvoje managed to trigger diagnostic panics with the diff [1] sent by
> bluhm@ some time ago. The panic Hrvoje sees comes from ether_input() here:
>
>     /*
>      * Third phase: bridge processing.
>      *
>      * Give the packet to a bridge interface, ie, bridge(4),
>      * switch(4), or tpmr(4), if it is configured. A bridge
>      * may take the packet and forward it to another port, or it
>      * may return it here to ether_input() to support local
>      * delivery to this port.
>      */
>
>     ac = (struct arpcom *)ifp;
>
>     smr_read_enter();
>     eb = SMR_PTR_GET(&ac->ac_brport);
>     if (eb != NULL) {
>         m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);
>         if (m == NULL) {
>             smr_read_leave();
>             return;
>         }
>     }
>     smr_read_leave();
>
> In the current tree, ether_input() is protected by NET_LOCK(), which the
> caller grabs as a writer. bluhm's diff changes NET_LOCK() to a read lock, so
> ether_input() can run concurrently. Switching NET_LOCK() to a read lock has
> implications for the SMR read section above. The thing is, the call to
> eb->eb_input() can sleep now. This is something that needs to be avoided
> within an SMR read section.

Is the new sleeping point introduced by the fact that PF_LOCK() is an
rwlock? Did you consider using a mutex, at least for the time being,
in order not to run into such issues?
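
To make the constraint in the quoted code concrete, here is a minimal
single-threaded userland model of it. This is only a sketch and not kernel
code: read_enter(), read_leave(), might_sleep(), struct port and both
input_*() functions are hypothetical stand-ins invented for illustration,
and the plain reference counter would have to be atomic in real concurrent
code. It only shows why a handler that may sleep is a problem when called
inside a read section, and how pinning the object and calling the handler
after leaving the section avoids that. It does not claim to be what the
diff does or should do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: none of these names are OpenBSD kernel APIs. */
static int read_depth;        /* > 0 while inside the modelled read section */
static int sleep_violations;  /* sleepable calls made inside the section */

struct port {
    int refs;                              /* plain counter, model only */
    int (*input)(struct port *, int);      /* stand-in for an input handler */
};

static struct port *current_port;          /* pointer read inside the section */

static void read_enter(void) { read_depth++; }

static void read_leave(void)
{
    assert(read_depth > 0);
    read_depth--;
}

/* Stand-in for taking a lock that may sleep, such as an rwlock. */
static void might_sleep(void)
{
    if (read_depth > 0)
        sleep_violations++;
}

/* A handler that may sleep and then hands the packet back unchanged. */
static int port_input(struct port *p, int pkt)
{
    (void)p;
    might_sleep();
    return pkt;
}

/* Shape of the quoted code: the handler runs inside the read section. */
static int input_inside(int pkt)
{
    struct port *p;
    int out = pkt;

    read_enter();
    p = current_port;
    if (p != NULL)
        out = p->input(p, pkt);
    read_leave();
    return out;
}

/* Alternative shape: pin the port, leave the section, then call. */
static int input_outside(int pkt)
{
    struct port *p;
    int out = pkt;

    read_enter();
    p = current_port;
    if (p != NULL)
        p->refs++;             /* keep the port alive past the section */
    read_leave();

    if (p != NULL) {
        out = p->input(p, pkt);
        p->refs--;             /* drop the reference once done */
    }
    return out;
}
```

With current_port pointing at a port whose handler is port_input(),
input_inside() records a violation while input_outside() does not. Whether
a mutex in place of the rwlock would be the simpler way out is exactly the
question above; the model does not answer it.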