Some time ago I've tripped over ASSERT() in pf_state_key_link_reverse(), when
testing my changes to pfsync. I was using HP 710 systems with 8 cores I got
from Hrvoje. The panic happened rarely when running performance tesst over 10Gb

At time of panic the firewall was running with 'pass all' rule only.  The rule
creates state for inbound and outbound packets. At time of panic my box was
was running a custom kernel (compiled with WITH_PF_LOCK option and multiple
input net tasks).  My explanation how I could trip the assert is as follows:

    client sends SYN packet, which creates two states.

    now assume the server delays a response, so the response
    crosses with retransmitted SYN in firewall. In this case
    PF will be running two instances of pf_state_key_link_reverse().
    In this case there is one winner and one looser, which trips assert.

the fix in patch below is straightforward. Use atomic ops to link keys, while
keeping same asserts in place, when either atomic op fails.


thanks and

diff --git a/sys/net/pf.c b/sys/net/pf.c
index ebe339921fa..738759e556c 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -7398,11 +7398,20 @@ pf_inp_unlink(struct inpcb *inp)
 pf_state_key_link_reverse(struct pf_state_key *sk, struct pf_state_key *skrev)
-       /* Note that sk and skrev may be equal, then we refcount twice. */
-       KASSERT(sk->reverse == NULL);
-       KASSERT(skrev->reverse == NULL);
-       sk->reverse = pf_state_key_ref(skrev);
-       skrev->reverse = pf_state_key_ref(sk);
+       struct pf_state_key *old_reverse;
+       old_reverse = atomic_cas_ptr(&sk->reverse, NULL, skrev);
+       if (old_reverse != NULL)
+               KASSERT(old_reverse == skrev);
+       else
+               pf_state_key_ref(skrev);
+       old_reverse = atomic_cas_ptr(&skrev->reverse, NULL, sk);
+       if (old_reverse != NULL)
+               KASSERT(old_reverse == sk);
+       else
+               pf_state_key_ref(sk);
 #if NPFLOG > 0

Reply via email to