Hello David,

I'm still not sure if your change is ultimate fix, or just significantly
minimizes risk of the bug. If I understand things right, the problem we are
trying to solve:

    DIOCGETSTATES we have in current, grabs NET_LOCK() and pf_state_lock
    as a reader.

    it then walks through the whole state list and copies out (copyout(9f))
    data for each state into ioctl(2) buffer provided by calling process.

    we may trip assert down in copyout(9f):
> panic: acquiring blockable sleep lock with spinlock or critical section
> held (rwlock) vmmaplk
> Stopped at      db_enter+0x10:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *512895  28841      0         0x3          0    3K pfctl
> db_enter() at db_enter+0x10
> panic(ffffffff81e19411) at panic+0x12a
> witness_checkorder(fffffd83b09b4d18,1,0) at witness_checkorder+0xbce
> rw_enter_read(fffffd83b09b4d08) at rw_enter_read+0x38
> uvmfault_lookup(ffff8000238e3418,0) at uvmfault_lookup+0x8a
> uvm_fault_check(ffff8000238e3418,ffff8000238e3450,ffff8000238e3478) at
> uvm_fault_check+0x32
> uvm_fault(fffffd83b09b4d00,e36553c000,0,2) at uvm_fault+0xfc
> kpageflttrap(ffff8000238e3590,e36553c000) at kpageflttrap+0x131
> kerntrap(ffff8000238e3590) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> copyout() at copyout+0x53


I'm just afraid that although your change significantly reduces the risk we
will die with similar call stack as the one above, the new code is not bullet
proof. We still do copyout() while holding pf_state_list.pfs_rwl as a reader
(in pf_states_get() from your diff). I agree packets do not grab
pf_state_list.pfs_rwl at all. Your fix solves this problem in this respect.

let's take a look at this part of pf_purge_expired_states()
from your diff:

1543         NET_LOCK();
1544         rw_enter_write(&pf_state_list.pfs_rwl);
1545         PF_LOCK();
1546         PF_STATE_ENTER_WRITE();
1547         SLIST_FOREACH(st, &gcl, gc_list) {
1548                 if (st->timeout != PFTM_UNLINKED)
1549                         pf_remove_state(st);
1550 
1551                 pf_free_state(st);
1552         }
1553         PF_STATE_EXIT_WRITE();
1554         PF_UNLOCK();
1555         rw_exit_write(&pf_state_list.pfs_rwl);

at line 1543 we grab NET_LOCK(), at line 1544 we are trying
to grab new lock (pf_state_list.pfs_rwl) exclusively. 

with your change we might be running into situation, where we do copyout() as a
reader on pf_state_list.pfs_rwl. Then we grab NET_LOCK() and attempt to acquire
pf_state_list.pfs_rwl exclusively, which is still occupied by guy, who might be
doing uvm_fault() in copyout(9f).

I'm just worried we may be trading one bug for another bug.  may be my concern
is just a false alarm here. I don't know.

anyway there are few more nits in your diff.

</snip>

> Index: pf.c
> ===================================================================
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1118
> diff -u -p -r1.1118 pf.c
> --- pf.c      1 Jun 2021 09:57:11 -0000       1.1118
> +++ pf.c      3 Jun 2021 06:24:48 -0000
> @@ -1247,7 +1278,8 @@ pf_purge_expired_rules(void)
>  void
>  pf_purge_timeout(void *unused)
>  {
> -     task_add(net_tq(0), &pf_purge_task);
> +     /* XXX move to systqmp to avoid KERNEL_LOCK */
> +     task_add(systq, &pf_purge_task);
>  }

    I would just clean up the comment. looks like we should be
    able to get pf's ioctl operations  out of KERNEL_LOCK completely.
    I'll take a further look at it, while be working in pf_ioctl.c
  
</snip>
> @@ -1280,11 +1311,10 @@ pf_purge(void *xnloops)
>        * Fragments don't require PF_LOCK(), they use their own lock.
>        */
>       if ((*nloops) >= pf_default_rule.timeout[PFTM_INTERVAL]) {
> -             pf_purge_expired_fragments();
> +             pfgpurge_expired_fragment(s);
>               *nloops = 0;
>       }
>       NET_UNLOCK();
> -     KERNEL_UNLOCK();
>  
>       timeout_add_sec(&pf_purge_to, 1);
>  }

    I guess 'pfgpurge_expired_fragment(s);' is unintentional change, right?

</snip>
> Index: pfvar_priv.h
> ===================================================================
> RCS file: /cvs/src/sys/net/pfvar_priv.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 pfvar_priv.h
> --- pfvar_priv.h      9 Feb 2021 14:06:19 -0000       1.6
> +++ pfvar_priv.h      3 Jun 2021 06:24:48 -0000
> @@ -38,6 +38,115 @@
>  #ifdef _KERNEL
>  
>  #include <sys/rwlock.h>
> +#include <sys/mutex.h>
> +
> +/*
> +
> +states are linked into a global list to support the following
> +functionality:
> +
> + - garbage collection
> + - pfsync bulk send operations
> + - bulk state fetches via the DIOCGETSTATES ioctl
> + - bulk state clearing via the DIOCCLRSTATES ioctl
> +
> +states are inserted into the global pf_state_list once it has also
> +been successfully added to the various trees that make up the state
> +table. states are only removed from the pf_state_list by the garbage
> +collection process.
> +

    the multiline comment does not match style, I think.

I kind of got stuck with pf.c/pf_ioctl.c I have not seen if_pfsync.c yet.  will
do it later today/early tomorrow.

thanks and
regards
sashan

Reply via email to