On 20.5.2021. 3:23, Alexandr Nedvedicky wrote:
> Hello,
> 
> Hrvoje gave a try to experimental diff, which trades rw-locks in pf(4)
> for mutexes [1]. Hrvoje soon discovered machine panics, when doing 'pfctl -ss'
> The callstack looks as follows:
> 
> panic: acquiring blockable sleep lock with spinlock or critical section
> held (rwlock) vmmaplk
> Stopped at      db_enter+0x10:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *512895  28841      0         0x3          0    3K pfctl
> db_enter() at db_enter+0x10
> panic(ffffffff81e19411) at panic+0x12a
> witness_checkorder(fffffd83b09b4d18,1,0) at witness_checkorder+0xbce
> rw_enter_read(fffffd83b09b4d08) at rw_enter_read+0x38
> uvmfault_lookup(ffff8000238e3418,0) at uvmfault_lookup+0x8a
> uvm_fault_check(ffff8000238e3418,ffff8000238e3450,ffff8000238e3478) at
> uvm_fault_check+0x32
> uvm_fault(fffffd83b09b4d00,e36553c000,0,2) at uvm_fault+0xfc
> kpageflttrap(ffff8000238e3590,e36553c000) at kpageflttrap+0x131
> kerntrap(ffff8000238e3590) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> copyout() at copyout+0x53
> VOP_IOCTL(fffffd83b24340e0,c0104419,ffff8000238e3930,1,fffffd842f7d7120,ffff800
> 0239597a8) at VOP_IOCTL+0x68
> vn_ioctl(fffffd83b294edc0,c0104419,ffff8000238e3930,ffff8000239597a8) at
> vn_ioctl+0x75
> sys_ioctl(ffff8000239597a8,ffff8000238e3a40,ffff8000238e3aa0) at
> sys_ioctl+0x2c4
> end trace frame: 0xffff8000238e3b00, count: 0
> https://www.openbsd.org/ddb.html
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{3}>
> 
> The problem comes from DIOCGETSTATES in pfioctl() here:
> 1775 
> 1776                 NET_LOCK();
> 1777                 PF_STATE_ENTER_READ();
> 1778                 state = TAILQ_FIRST(&state_list);
> 1779                 while (state) {
> 1780                         if (state->timeout != PFTM_UNLINKED) {
> 1781                                 if ((nr+1) * sizeof(*p) > ps->ps_len)
> 1782                                         break;
> 1783                                 pf_state_export(pstore, state);
> 1784                                 error = copyout(pstore, p, sizeof(*p));
> 1785                                 if (error) {
> 1786                                         free(pstore, M_TEMP, 
> sizeof(*pstore));
> 1787                                         PF_STATE_EXIT_READ();
> 1788                                         NET_UNLOCK();
> 1789                                         goto fail;
> 1790                                 }
> 1791                                 p++;
> 1792                                 nr++;
> 1793                         }
> 1794                         state = TAILQ_NEXT(state, entry_list);
> 1795                 }
> 1796                 PF_STATE_EXIT_READ();
> 1797                 NET_UNLOCK();
> 1798 
> 
> at line 1784 we do copyout() while holding mutex. As I've mentioned glitch is 
> a
> specific to experimental diff [1]. However this made me thinking, that
> it's not a good idea to do copyout() while holding NET_LOCK() and state_lock.
> 
> Diff below moves copyout() at line 1784 outside of protection of both locks.
> The approach I took is relatively straightforward:
> 
>     let DIOCGETSTATES to allocate hold_states array, which will keep
>     references to states.
> 
>     grab locks and take references, keep those references in hold
>     array.
> 
>     drop locks, export states and do copyout, while walking
>     array of references.
> 
>     drop references, release hold_states array.
> 
> does it make sense? If we agree that this approach makes sense
> I'll commit this diff and revisit other such places we currently
> have in pfioctl().


Hi,

with this diff i can't reproduce panic as before. i tried pf with
routing, veb, tpmr and bridge.

Do you want me to test this diff with parallel diff?




Reply via email to