On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
> > > On 23 May 2020, at 12:54, Martin Pieuchot <m...@openbsd.org> wrote:
> > > On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
> > >> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
> > >>> [...] 
> > >>> can you try the following diff?
> > >>> 
> > >> 
> > >> I tested this diff and it works for me. But the problem I pointed
> > >> out is about pipex(4) locking.
> > >> 
> > >> pipex(4) requires the NET_LOCK() to be grabbed not only for the
> > >> underlying ip{,6}_output() but for itself too. But since
> > >> pppac_start() has unpredictable behavior I suggested making it
> > >> predictable [1].
> > > 
> > > What needs the NET_LOCK() in there?  We're talking about
> > > pipex_ppp_output(), right?  Does it really need the NET_LOCK(), or
> > > is it the KERNEL_LOCK() that protects those data structures?
> > 
> > Yes, about pipex_ppp_output() and pipex_output(). Apart from
> > ip{,6}_output(), nothing requires the NET_LOCK(). As David Gwynne
> > pointed out, they can be replaced by ip{,6}_send().
> 
> Locks protect data structures, but you're talking about functions.
> Which data structures are serialized by this lock?  I'm questioning
> whether there is one.
> 
> > [...]
> > > In the case of pipex(4) it isn't clear that the NET_LOCK() is necessary.
> > 
> > I guess pipex(4) was wrapped by the NET_LOCK() to protect it while it's
> > accessed through `pr_input'. Is the NET_LOCK() required for this case?
> 
> pipex(4), like all of the network stack, has been wrapped in the
> NET_LOCK() because it was easy to do.  That means it wasn't a conscious
> decision or design.  The fact that pipex(4) code runs under the
> NET_LOCK() is a side effect of how the rest of the stack evolved.  I'm
> questioning whether this lock is required there.  In theory it
> shouldn't be.  What is the reality?

pipex and pppx pre-date the NET_LOCK, which means you can assume
that any implicit locking was and is done by the KERNEL_LOCK. mpi is
asking the right questions here.

As for the ifq maxlen difference between pppx and pppac, that's about
when and how quickly they were written more than anything else. The
IFQ_SET_MAXLEN(&ifp->if_snd, 1) in pppx is there because that was
historically the only way to bypass transmit mitigation for
pseudo/virtual interfaces. It was not put there as a hack to keep
hold of the NET_LOCK over a call to a start routine.
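
To make the mechanism concrete, here's a paraphrased sketch of the
relevant attach lines (not a verbatim quote of if_pppx.c). Roughly,
with a send queue length of 1, every enqueued packet makes the queue
look full enough that ifq_start() runs the start routine immediately
in the enqueueing context instead of deferring it to the transmit
mitigation task:

	ifp->if_start = pppx_if_start;
	IFQ_SET_MAXLEN(&ifp->if_snd, 1);	/* queue is "full" after one
						 * packet, so the stack runs
						 * the start routine directly
						 * rather than deferring it */
	if_attach(ifp);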

As a rule of thumb, network interface drivers should not (maybe
cannot) rely on the NET_LOCK in their if_start handlers. To be
clear: they should not rely on it being held when if_start is
called, because sometimes the stack calls it without holding the
NET_LOCK; and they should not take it themselves, because they may
be called while the stack is already holding it.
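
As a minimal sketch of what that rule looks like in practice (the
foo(4) driver, its softc layout, and foo_transmit() are all
hypothetical), an if_start handler should only touch state protected
by the driver's own locking:

struct foo_softc {
	struct ifnet	 sc_if;
	struct mutex	 sc_mtx;	/* protects the driver's tx state */
};

void
foo_start(struct ifnet *ifp)
{
	struct foo_softc *sc = ifp->if_softc;
	struct mbuf *m;

	/* the NET_LOCK may or may not be held here; don't depend on it */
	while ((m = ifq_dequeue(&ifp->if_snd)) != NULL) {
		mtx_enter(&sc->sc_mtx);
		foo_transmit(sc, m);	/* hypothetical encap/tx path */
		mtx_leave(&sc->sc_mtx);
	}
}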

Also, be aware that the ifq machinery makes sure that the start
routine is not called concurrently or recursively. You can queue
packets for transmission on an ifq from anywhere in the kernel at
any time, but only one cpu will run the start routine. Other cpus
can queue packets while one is running if_start, but that first cpu
ends up responsible for trying to transmit them.
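
The pattern behind that guarantee is roughly the following, as a
simplified sketch with hypothetical names (the real implementation
is ifq_serialize() in sys/net/ifq.c):

struct sketch_ifq {
	struct mutex	 q_mtx;
	struct mbuf_list q_list;	/* the queued packets */
	unsigned int	 q_serializer;	/* a cpu is in the start routine */
};

void
sketch_ifq_start(struct sketch_ifq *q, void (*start)(struct sketch_ifq *))
{
	mtx_enter(&q->q_mtx);
	if (q->q_serializer) {
		/*
		 * Another cpu is already running the start routine;
		 * it will see the packets we just queued, so leave.
		 */
		mtx_leave(&q->q_mtx);
		return;
	}
	q->q_serializer = 1;
	mtx_leave(&q->q_mtx);

	/* loop in case more packets were queued while we transmitted */
	for (;;) {
		(*start)(q);

		mtx_enter(&q->q_mtx);
		if (ml_empty(&q->q_list)) {
			q->q_serializer = 0;
			mtx_leave(&q->q_mtx);
			return;
		}
		mtx_leave(&q->q_mtx);
	}
}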

ifqs also take the KERNEL_LOCK before calling if_start if the interface
is not marked as IFXF_MPSAFE.
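
For contrast, a driver that is ready to run unlocked opts in at
attach time. A sketch, assuming the usual pattern for mpsafe drivers
(foo_qstart is hypothetical):

	ifp->if_xflags = IFXF_MPSAFE;	/* ifq won't take the KERNEL_LOCK */
	ifp->if_qstart = foo_qstart;	/* mpsafe start routine, which
					 * takes a struct ifqueue * */
	if_attach(ifp);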

The summary is that pppx and pppac are not marked as mpsafe, so their
start routines are called with the KERNEL_LOCK held. Currently pppx
accidentally gets the NET_LOCK too, because the IFQ_SET_MAXLEN makes
the stack run the start routine directly from the enqueueing context
(which holds the NET_LOCK), but it shouldn't rely on that.

Cheers,
dlg
