Hello,

On Mon, May 17, 2021 at 08:19:37PM +0200, Alexander Bluhm wrote:
> On Mon, May 17, 2021 at 04:24:15PM +0200, Alexandr Nedvedicky wrote:
> > in the current tree ether_input() is protected by NET_LOCK(), which is
> > grabbed by the caller as a writer. bluhm's diff changes NET_LOCK() to a
> > read lock, so ether_input() can run concurrently. Switching NET_LOCK()
> > to an r-lock has implications for the smr read section above. The thing
> > is that the call to eb->eb_input() can sleep now. This is something that
> > needs to be avoided within an smr section.
> 
> Do you think the sleeping occurs in PF_LOCK()?  Could it be that
> this rwlock never slept before, but was acquired instantly?
> 

    yes, the sleep occurs on PF_LOCK(), right here in pf_test():
    7052                            a = s->anchor.ptr;
    7053    #if NPFLOG > 0
    7054                            pd.pflog |= s->log;
    7055    #endif  /* NPFLOG > 0 */
    7056                    } else if (s == NULL) {
    7057                            PF_LOCK();
    7058                            have_pf_lock = 1;
    7059                            action = pf_test_rule(&pd, &r, &s, &a, &ruleset,
    7060                                &reason);
    7061                            s = pf_state_ref(s);
    (gdb)

    with an exclusive NET_LOCK() the PF_LOCK() never blocks/sleeps, because
    the exclusive NET_LOCK() ensures there is only a single packet running
    in pf_test().  Changing the exclusive NET_LOCK() to a reader lock allows
    more packets to run in pf_test() concurrently, therefore PF_LOCK() can
    block.


> Having sleeps in the packet forwarding path is a bad idea.  And
> refcounting per packet looks like a workaround.  It does not make
> things faster.

    I'm afraid sleeps (exclusive locks) cannot be avoided with the current
    data structures we have in PF. For example, if a packet is going to
    create state, it must grab the state lock exclusively.

    We have to make sure no sleep occurs in code which is protected by such
    a lock. The risk comes from a blocking malloc being called by a caller
    which owns the lock (either as a reader or a writer).
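
    To illustrate the risk (a minimal sketch only; the names and the M_TEMP
    malloc type are placeholders, not actual pf code): if the writer does
    something like

        PF_LOCK();
        /* M_WAITOK means malloc(9) may sleep while PF_LOCK() is held */
        tbl = malloc(sizeof(*tbl), M_TEMP, M_WAITOK | M_ZERO);
        ...
        PF_UNLOCK();

    then every packet reaching PF_LOCK() in pf_test() is stalled for as
    long as malloc(9) sleeps. Either the allocation moves in front of the
    lock, or it uses M_NOWAIT and handles the failure.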

> 
> How about implementing PF_LOCK() with a mutex?

    As I've already answered to mpi@, there are two locks which we would
    need to change to mutexes:
        PF_LOCK() and the state lock
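
    Just to sketch what that would mean (this is not bluhm's diff, the
    names are illustrative): both locks would become a struct mutex and
    the macros would map to mtx_enter()/mtx_leave(), e.g.

        struct mutex pf_mtx;

        /* done once, e.g. at pf initialization time */
        mtx_init(&pf_mtx, IPL_SOFTNET);

        #define PF_LOCK()       mtx_enter(&pf_mtx)
        #define PF_UNLOCK()     mtx_leave(&pf_mtx)

    with the obvious consequence that no code path may sleep while either
    mutex is held.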
> 
> I made such a diff a few months ago.  It is not well tested yet.
> There were some problems when allocating pf mutex in ioctl path.
> mpi@ did not like it, as it removes the pf lock in pf ioctl and
> relies on net lock for reconfigure.

    I've read through your diff. If I understand it right, two things
    are being changed:
        PF_LOCK() is thrown away completely and we use NET_LOCK() instead

        the state_lock is changed to a mutex

    IMO removing PF_LOCK() is a step backward. I think every subsystem
    (including PF) should provide its own lock and should not piggyback on
    NET_LOCK().
> 
> It is a design question.  Do we want to keep net lock for reconfigure?

    No, we don't want NET_LOCK() for reconfigure. Configuration changes
    should be synchronized using PF_LOCK().

> If we process packets with shared netlock and reconfigure pf with
> exclusive netlock this looks reasonable.
> 
> Do we want to go this way?  Then I will polish and test this diff.

    No, I think we don't want to go that way.

When it comes to pf(4), the configuration side is more colorful. The ioctl
which updates the firewall configuration grabs the lock exclusively (be it
the current pf_lock or the net_lock). For certain operations (queue
management and tables) we also allocate memory _without_ NOSLEEP, so the
allocation may sleep. We may end up in a situation where we are sleeping
in malloc while holding a lock which we share with packets.

Therefore I think we actually need an extra rw-lock just for ioctl.
This new lock will serialize all callers which come from userland.
We don't mind sleeping on malloc with the ioctl rw-lock held exclusively.

As soon as the memory for the desired operation has been acquired, it will
be time to grab PF_LOCK() as a writer and update the firewall config.
This way we should avoid blocking packet processing due to memory
allocations.
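
A rough sketch of the flow I have in mind (pf_ioctl_rwl is a made-up name
here, the real diff may end up looking different):

        /* serialize userland callers; packets never take this lock */
        rw_enter_write(&pf_ioctl_rwl);
        /* the allocation may sleep here, packets are not blocked yet */
        mem = malloc(len, M_TEMP, M_WAITOK | M_ZERO);
        /* now exclude the packet path; no sleeping below */
        PF_LOCK();
        /* splice 'mem' into the firewall config */
        PF_UNLOCK();
        rw_exit_write(&pf_ioctl_rwl);

The packet path keeps taking PF_LOCK() only, so it never has to wait
behind the malloc.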

I think Vitaliy implemented something similar for another subsystem,
I can't remember which one (pipex?).

thanks and
regards
sashan
