Re: [External] : Re: parallel forwarding vs. bridges

2021-06-22 Thread Alexandr Nedvedicky
Hello, new diff still looks good to me. I could just catch typos in comment. > > I've been running versions of this diff in production at work, and have > hit a few panics and asserts. All the issues we've hit should be > addressed in this diff. > > The first issue was that pfsync

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-21 Thread David Gwynne
On Wed, Jun 16, 2021 at 02:59:19AM +0200, Alexandr Nedvedicky wrote: > Hello, > > > > > > as above, copyout with a sleeping lock is fine. > > > > the whole point of my change is to give us the ability to lock in the > > forwarding path separately to locking in the ioctl path. half of that is >

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-16 Thread Alexandr Nedvedicky
Hello, re-sending the same diff updated to current if_pfsync.c. diff avoids a recursion to PF_LOCK(), which is indicated by panic stack here: > > production firewalls panicked with the pf lock trying to lock against > > itself a couple of nights ago: > > > > db_enter() at db_enter+0x5 > >

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-15 Thread Alexandr Nedvedicky
Hello, > > as above, copyout with a sleeping lock is fine. > > the whole point of my change is to give us the ability to lock in the > forwarding path separately to locking in the ioctl path. half of that is > so we can copyout safely. the other half is to avoid letting the ioctl > path block

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-15 Thread David Gwynne
On Tue, Jun 08, 2021 at 06:54:25PM +0200, Alexandr Nedvedicky wrote: > Hello David, > > I'm still not sure if your change is ultimate fix, or just significantly > minimizes risk of the bug. If I understand things right, the problem we are > trying to solve: > DIOCGETSTATES we have in

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-14 Thread Alexandr Nedvedicky
Hello, > > it seems to me we may leak `pd` we took at line 1944. The leak > > happens in case timeout_del() return zero. > > > > diff below ignores the return value from timeout_del() and makes > > sure we always call pfsync_undefefer() > > > > I have not seen such panic in

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-13 Thread David Gwynne
On Sat, Jun 05, 2021 at 01:09:01PM +0200, Alexandr Nedvedicky wrote: > Hello David, > > > > > the scope of the pf locks likely needs reduction anyway. one of my > > I agree. pf_lock covers too much in PF currently. it protects, > all rules, tables and fragment caches. > > > production

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-08 Thread Alexandr Nedvedicky
Hello David, I'm still not sure if your change is ultimate fix, or just significantly minimizes risk of the bug. If I understand things right, the problem we are trying to solve: DIOCGETSTATES we have in current, grabs NET_LOCK() and pf_state_lock as a reader. it then walks through

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-07 Thread Hrvoje Popovski
On 7.6.2021. 9:25, Alexandr Nedvedicky wrote: > Hello, > > On Sun, Jun 06, 2021 at 09:54:50PM +0200, Hrvoje Popovski wrote: > >> >> this one? > yes, exactly this one. Ok, great .. this and similar panics are products of accidental stp loops. I have two same boxes with the same network

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-07 Thread Alexandr Nedvedicky
Hello, On Sun, Jun 06, 2021 at 09:54:50PM +0200, Hrvoje Popovski wrote: > > > this one? yes, exactly this one. > > > ddb{5}> show panic > cpu5: kernel diagnostic assertion "rvebe == NULL" failed: file > "/sys/net/if_etherbridge.c", line 226 ebt_replace eb 0x80682e68 > nebe

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-06 Thread Hrvoje Popovski
On 5.6.2021. 18:43, Alexandr Nedvedicky wrote: > According to early tests it works well. Currently there is a one > mysterious panic, which Hrvoje may be able to comment on > how to trigger it. The stack looks as follows: > > kernel diagnostic assertion "smr->smr_func == NULL"

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-05 Thread Alexandr Nedvedicky
Hello David, > > my best argument for mutexes in pf is that currently we use smr critical > sections around things like the bridge and aggr port selection, which > are very obviously read-only workloads. pf may be a largely read-only > workload, but where it is at the moment means it's not

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-05 Thread Alexandr Nedvedicky
Hello David, > the scope of the pf locks likely needs reduction anyway. one of my I agree. pf_lock covers too much in PF currently. it protects, all rules, tables and fragment caches. > production firewalls panicked with the pf lock trying to lock against > itself a couple of nights

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-03 Thread David Gwynne
On Thu, Jun 03, 2021 at 01:09:48AM +0200, Alexandr Nedvedicky wrote: > > pf_purge_expired_states(u_int32_t maxcheck) > > { > > > - cur = pf_state_ref(next); > > + do { > > + if ((cur->timeout == PFTM_UNLINKED) || > > + (pf_state_expires(cur) <= getuptime()))

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-03 Thread David Gwynne
On Thu, Jun 03, 2021 at 01:09:48AM +0200, Alexandr Nedvedicky wrote: > Hello, > > looks like my earlier mail got eaten by SPAM filter, let's try again... > > > > > > moving pf locks to mutexes makes sense to me, but like you say, > > this will need testing and experimentation. one of the

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-02 Thread David Gwynne
On Wed, May 19, 2021 at 01:48:26AM +0200, Alexandr Nedvedicky wrote: > Hello, > > just for the record... > > > > > > in current tree the ether_input() is protected by NET_LOCK(), which is > > > grabbed > > > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so > > >

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-18 Thread Alexandr Nedvedicky
Hello, just for the record... > > in current tree the ether_input() is protected by NET_LOCK(), which is > > grabbed > > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so > > ether_input() can run concurrently. Switching NET_LOCK() to r-lock has > > implications on smr read

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-17 Thread Alexandr Nedvedicky
Hello, On Mon, May 17, 2021 at 08:19:37PM +0200, Alexander Bluhm wrote: > On Mon, May 17, 2021 at 04:24:15PM +0200, Alexandr Nedvedicky wrote: > > in current tree the ether_input() is protected by NET_LOCK(), which is > > grabbed > > by caller as a writer. bluhm's diff changes NET_LOCK()

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-17 Thread Alexandr Nedvedicky
On Mon, May 17, 2021 at 08:27:12PM +0200, Martin Pieuchot wrote: > On 17/05/21(Mon) 19:52, Alexandr Nedvedicky wrote: > > [...] > > I don't mind to trade pf_lock and pf_state_lock for mutexes, however > > I see such step is kind of 'NO-OP'. We do have sufficient measure > > currently, which is:

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-17 Thread Martin Pieuchot
On 17/05/21(Mon) 19:52, Alexandr Nedvedicky wrote: > [...] > I don't mind to trade pf_lock and pf_state_lock for mutexes, however > I see such step is kind of 'NO-OP'. We do have sufficient measure > currently, which is: keep NET_LOCK() as is. May be I'm not seeing > your idea/plan behind

Re: parallel forwarding vs. bridges

2021-05-17 Thread Alexander Bluhm
On Mon, May 17, 2021 at 04:24:15PM +0200, Alexandr Nedvedicky wrote: > in current tree the ether_input() is protected by NET_LOCK(), which is grabbed > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so > ether_input() can run concurrently. Switching NET_LOCK() to r-lock has >

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-17 Thread Alexandr Nedvedicky
Hello, On Mon, May 17, 2021 at 06:59:36PM +0200, Martin Pieuchot wrote: > On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote: > > Hrvoje, > > > > managed to trigger diagnostic panics with diff [1] sent by bluhm@ > > some time ago. The panic Hrvoje sees comes from ether_input() here: > > > > 414

Re: parallel forwarding vs. bridges

2021-05-17 Thread Hrvoje Popovski
On 17.5.2021. 16:24, Alexandr Nedvedicky wrote: > Hrvoje, > > managed to trigger diagnostic panics with diff [1] sent by bluhm@ > some time ago. The panic Hrvoje sees comes from ether_input() here: > [1] https://marc.info/?l=openbsd-tech=161903387904923=2 > > [2]

Re: parallel forwarding vs. bridges

2021-05-17 Thread Martin Pieuchot
On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote: > Hrvoje, > > managed to trigger diagnostic panics with diff [1] sent by bluhm@ > some time ago. The panic Hrvoje sees comes from ether_input() here: > > 414 > 415 /* > 416 * Third phase: bridge processing. > 417 *

parallel forwarding vs. bridges

2021-05-17 Thread Alexandr Nedvedicky
Hrvoje, managed to trigger diagnostic panics with diff [1] sent by bluhm@ some time ago. The panic Hrvoje sees comes from ether_input() here: 414 415 /* 416 * Third phase: bridge processing. 417 * 418 * Give the packet to a bridge interface, ie,