Hello,

On Mon, May 17, 2021 at 06:59:36PM +0200, Martin Pieuchot wrote:
> On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote:
> > Hrvoje,
> > 
> > managed to trigger diagnostic panics with diff [1] sent by bluhm@
> > some time ago. The panic Hrvoje sees comes from ether_input() here:
> > 
> >  414 
> >  415         /*
> >  416          * Third phase: bridge processing.
> >  417          *
> >  418          * Give the packet to a bridge interface, ie, bridge(4),
> >  419          * switch(4), or tpmr(4), if it is configured. A bridge
> >  420          * may take the packet and forward it to another port, or it
> >  421          * may return it here to ether_input() to support local
> >  422          * delivery to this port.
> >  423          */
> >  424 
> >  425         ac = (struct arpcom *)ifp;
> >  426 
> >  427         smr_read_enter();
> >  428         eb = SMR_PTR_GET(&ac->ac_brport);
> >  429         if (eb != NULL) {
> >  430                 m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);
> >  431                 if (m == NULL) {
> >  432                         smr_read_leave();
> >  433                         return;
> >  434                 }
> >  435         }
> >  436         smr_read_leave();
> >  437 
> > 
> > in current tree the ether_input() is protected by NET_LOCK(), which is 
> > grabbed
> > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so
> > ether_input() can run concurrently. Switching NET_LOCK() to r-lock has
> > implications on smr read section above. The ting is the call to 
> > eb->eb_input()
> > can sleep now. This is something what needs to be avoided within smr 
> > section.
> 
> Is the new sleeping point introduced by the fact the PF_LOCK() is a
> rwlock?  Did you consider using a mutex, at least for the time being,
> in order to not run in such issues?

    there are two locks in pf_test():
        PF_LOCK()

        PF_STATE_LOCK()

    we would have to convert both locks into mutexes.

    trading PF_STATE_LOCK() into mutex is step backward IMO.
    bluhm's parallel diff enables PF to run in parallel for packets.
    if packet matches existing state, it just grabs state lock as
    a reader. This brings huge benefit, because majority of packets
    just do the state look up. Only some packets need to go expensive
    way through rules and state creation, which involves PF_LOCK() and
    write lock on state rw.

    as readers and are done with PF.

trading both locks for mutexes will do the trick too. However I'm not
sure I see a benefit there (when comparing to current NET_LOCK() we have
in tree).

I think all safety belts we have kicked in. We learned smr read block is
not an option for ether_input() when comes to bridges.  perhaps alternative
solution would be to introduce an RW-lock, which would protect 'ac->br_port'.
Such R-lock would cover call ->eb_input(). However I'm not sure about it.

I don't mind to trade pf_lock and pf_state_lock for mutexes, however
I see such step is kind of 'NO-OP'. We do have sufficient measure
currently, which is: keep NET_LOCK() as is. May be I'm not seeing
your idea/plan behind changing pf's rw-locks to mutexes. If you feel
there is a benefit to go that way, then let's do it, but I'd like
to understand where we will be going/what to expect.

thanks and
regards
sashan

Reply via email to