Hello,
new diff still looks good to me. I could just catch typos
in comment.
>
> I've been running versions of this diff in production at work, and have
> hit a few panics and asserts. All the issues we've hit should be
> addressed in this diff.
>
> The first issue was that pfsync
On Wed, Jun 16, 2021 at 02:59:19AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
>
> >
> > as above, copyout with a sleeping lock is fine.
> >
> > the whole point of my change is to give us the ability to lock in the
> > forwarding path separately to locking in the ioctl path. half of that is
>
Hello,
re-sending the same diff updated to current if_pfsync.c.
diff avoids a recursion to PF_LOCK(), which is indicated
by panic stack here:
> > production firewalls panicked with the pf lock trying to lock against
> > itself a couple of nights ago:
> >
> > db_enter() at db_enter+0x5
> >
Hello,
>
> as above, copyout with a sleeping lock is fine.
>
> the whole point of my change is to give us the ability to lock in the
> forwarding path separately to locking in the ioctl path. half of that is
> so we can copyout safely. the other half is to avoid letting the ioctl
> path block
On Tue, Jun 08, 2021 at 06:54:25PM +0200, Alexandr Nedvedicky wrote:
> Hello David,
>
> I'm still not sure if your change is ultimate fix, or just significantly
> minimizes risk of the bug. If I understand things right, the problem we are
> trying to solve:
> DIOCGETSTATES we have in
Hello,
> > it seems to me we may leak `pd` we took at line 1944. The leak
> > happens in case timeout_del() return zero.
> >
> > diff below ignores the return value from timeout_del() and makes
> > sure we always call pfsync_undefefer()
> >
> > I have not seen such panic in
On Sat, Jun 05, 2021 at 01:09:01PM +0200, Alexandr Nedvedicky wrote:
> Hello David,
>
>
>
> > the scope of the pf locks likely needs reduction anyway. one of my
>
> I agree. pf_lock covers too much in PF currently. it protects,
> all rules, tables and fragment caches.
>
> > production
Hello David,
I'm still not sure if your change is ultimate fix, or just significantly
minimizes risk of the bug. If I understand things right, the problem we are
trying to solve:
DIOCGETSTATES we have in current, grabs NET_LOCK() and pf_state_lock
as a reader.
it then walks through
On 7.6.2021. 9:25, Alexandr Nedvedicky wrote:
> Hello,
>
> On Sun, Jun 06, 2021 at 09:54:50PM +0200, Hrvoje Popovski wrote:
>
>>
>> this one?
> yes, exactly this one.
Ok, great ..
this and similar panics are products of accidental stp loops. I have two
same boxes with the same network
Hello,
On Sun, Jun 06, 2021 at 09:54:50PM +0200, Hrvoje Popovski wrote:
>
>
> this one?
yes, exactly this one.
>
>
> ddb{5}> show panic
> cpu5: kernel diagnostic assertion "rvebe == NULL" failed: file
> "/sys/net/if_etherbridge.c", line 226 ebt_replace eb 0x80682e68
> nebe
On 5.6.2021. 18:43, Alexandr Nedvedicky wrote:
> According to early tests it works well. Currently there is a one
> mysterious panic, which Hrvoje may be able to comment on
> how to trigger it. The stack looks as follows:
>
> kernel diagnostic assertion "smr->smr_func == NULL"
Hello David,
>
> my best argument for mutexes in pf is that currently we use smr critical
> sections around things like the bridge and aggr port selection, which
> are very obviously read-only workloads. pf may be a largely read-only
> workload, but where it is at the moment means it's not
Hello David,
> the scope of the pf locks likely needs reduction anyway. one of my
I agree. pf_lock covers too much in PF currently. it protects,
all rules, tables and fragment caches.
> production firewalls panicked with the pf lock trying to lock against
> itself a couple of nights
On Thu, Jun 03, 2021 at 01:09:48AM +0200, Alexandr Nedvedicky wrote:
> > pf_purge_expired_states(u_int32_t maxcheck)
> > {
>
> > - cur = pf_state_ref(next);
> > + do {
> > + if ((cur->timeout == PFTM_UNLINKED) ||
> > + (pf_state_expires(cur) <= getuptime()))
On Thu, Jun 03, 2021 at 01:09:48AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> looks like my earlier mail got eaten by SPAM filter, let's try again...
>
>
> >
> > moving pf locks to mutexes makes sense to me, but like you say,
> > this will need testing and experimentation. one of the
On Wed, May 19, 2021 at 01:48:26AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> just for the record...
>
>
>
> > > in current tree the ether_input() is protected by NET_LOCK(), which is
> > > grabbed
> > > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so
> > >
Hello,
just for the record...
> > in current tree the ether_input() is protected by NET_LOCK(), which is
> > grabbed
> > by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so
> > ether_input() can run concurrently. Switching NET_LOCK() to r-lock has
> > implications on smr read
Hello,
On Mon, May 17, 2021 at 08:19:37PM +0200, Alexander Bluhm wrote:
> On Mon, May 17, 2021 at 04:24:15PM +0200, Alexandr Nedvedicky wrote:
> > in current tree the ether_input() is protected by NET_LOCK(), which is
> > grabbed
> > by caller as a writer. bluhm's diff changes NET_LOCK()
On Mon, May 17, 2021 at 08:27:12PM +0200, Martin Pieuchot wrote:
> On 17/05/21(Mon) 19:52, Alexandr Nedvedicky wrote:
> > [...]
> > I don't mind to trade pf_lock and pf_state_lock for mutexes, however
> > I see such step is kind of 'NO-OP'. We do have sufficient measure
> > currently, which is:
On 17/05/21(Mon) 19:52, Alexandr Nedvedicky wrote:
> [...]
> I don't mind to trade pf_lock and pf_state_lock for mutexes, however
> I see such step is kind of 'NO-OP'. We do have sufficient measure
> currently, which is: keep NET_LOCK() as is. May be I'm not seeing
> your idea/plan behind
On Mon, May 17, 2021 at 04:24:15PM +0200, Alexandr Nedvedicky wrote:
> in current tree the ether_input() is protected by NET_LOCK(), which is grabbed
> by caller as a writer. bluhm's diff changes NET_LOCK() readlock, so
> ether_input() can run concurrently. Switching NET_LOCK() to r-lock has
>
Hello,
On Mon, May 17, 2021 at 06:59:36PM +0200, Martin Pieuchot wrote:
> On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote:
> > Hrvoje,
> >
> > managed to trigger diagnostic panics with diff [1] sent by bluhm@
> > some time ago. The panic Hrvoje sees comes from ether_input() here:
> >
> > 414
On 17.5.2021. 16:24, Alexandr Nedvedicky wrote:
> Hrvoje,
>
> managed to trigger diagnostic panics with diff [1] sent by bluhm@
> some time ago. The panic Hrvoje sees comes from ether_input() here:
> [1] https://marc.info/?l=openbsd-tech=161903387904923=2
>
> [2]
On 17/05/21(Mon) 16:24, Alexandr Nedvedicky wrote:
> Hrvoje,
>
> managed to trigger diagnostic panics with diff [1] sent by bluhm@
> some time ago. The panic Hrvoje sees comes from ether_input() here:
>
> 414
> 415 /*
> 416 * Third phase: bridge processing.
> 417 *
Hrvoje,
managed to trigger diagnostic panics with diff [1] sent by bluhm@
some time ago. The panic Hrvoje sees comes from ether_input() here:
414
415 /*
416 * Third phase: bridge processing.
417 *
418 * Give the packet to a bridge interface, ie,
25 matches
Mail list logo