Hello, Some time ago I've tripped over ASSERT() in pf_state_key_link_reverse(), when testing my changes to pfsync. I was using HP 710 systems with 8 cores I got from Hrvoje. The panic happened rarely when running performance tesst over 10Gb interfaces.
At time of panic the firewall was running with 'pass all' rule only. The rule creates state for inbound and outbound packets. At time of panic my box was was running a custom kernel (compiled with WITH_PF_LOCK option and multiple input net tasks). My explanation how I could trip the assert is as follows: client sends SYN packet, which creates two states. now assume the server delays a response, so the response crosses with retransmitted SYN in firewall. In this case PF will be running two instances of pf_state_key_link_reverse(). In this case there is one winner and one looser, which trips assert. the fix in patch below is straightforward. Use atomic ops to link keys, while keeping same asserts in place, when either atomic op fails. OK? thanks and regards sashan --------8<---------------8<---------------8<------------------8<-------- diff --git a/sys/net/pf.c b/sys/net/pf.c index ebe339921fa..738759e556c 100644 --- a/sys/net/pf.c +++ b/sys/net/pf.c @@ -7398,11 +7398,20 @@ pf_inp_unlink(struct inpcb *inp) void pf_state_key_link_reverse(struct pf_state_key *sk, struct pf_state_key *skrev) { - /* Note that sk and skrev may be equal, then we refcount twice. */ - KASSERT(sk->reverse == NULL); - KASSERT(skrev->reverse == NULL); - sk->reverse = pf_state_key_ref(skrev); - skrev->reverse = pf_state_key_ref(sk); + struct pf_state_key *old_reverse; + + old_reverse = atomic_cas_ptr(&sk->reverse, NULL, skrev); + if (old_reverse != NULL) + KASSERT(old_reverse == skrev); + else + pf_state_key_ref(skrev); + + + old_reverse = atomic_cas_ptr(&skrev->reverse, NULL, sk); + if (old_reverse != NULL) + KASSERT(old_reverse == sk); + else + pf_state_key_ref(sk); } #if NPFLOG > 0