Re: sysupgrade vs. -stable, [was: Re: -current crash]

2022-06-01 Thread Stuart Henderson
On 2022/06/01 08:26, Florian Obser wrote: > On 2022-06-01 06:57 +02, Florian Obser wrote: > > On 2022-05-31 23:27 +01, Stuart Henderson wrote: > >> I accidentally updated a router to -current instead of 7.1 and hit this. > >> (Thanks sysupgrade - it was running a 7.0-stable kernel before...) > >

Re: -current crash

2022-06-01 Thread Stuart Henderson
On 2022/06/01 06:57, Florian Obser wrote: > On 2022-05-31 23:27 +01, Stuart Henderson wrote: > > I accidentally updated a router to -current instead of 7.1 and hit this. > > (Thanks sysupgrade - it was running a 7.0-stable kernel before...) > > Hmm? Are you saying just running

sysupgrade vs. -stable, [was: Re: -current crash]

2022-06-01 Thread Florian Obser
On 2022-06-01 06:57 +02, Florian Obser wrote: > On 2022-05-31 23:27 +01, Stuart Henderson wrote: >> I accidentally updated a router to -current instead of 7.1 and hit this. >> (Thanks sysupgrade - it was running a 7.0-stable kernel before...) > > Hmm? Are you saying just running

Re: -current crash

2022-05-31 Thread Florian Obser
On 2022-05-31 23:27 +01, Stuart Henderson wrote: > I accidentally updated a router to -current instead of 7.1 and hit this. > (Thanks sysupgrade - it was running a 7.0-stable kernel before...) Hmm? Are you saying just running 'sysupgrade', without any flags, moves you from 7.0-stable to

Re: -current crash

2022-05-31 Thread Hrvoje Popovski
On 1.6.2022. 0:27, Stuart Henderson wrote: > I accidentally updated a router to -current instead of 7.1 and hit this. > (Thanks sysupgrade - it was running a 7.0-stable kernel before...) > > Unfortunately it runs with ddb.panic=0 and this time it hung, I won't > have time to figure anything out

-current crash

2022-05-31 Thread Stuart Henderson
I accidentally updated a router to -current instead of 7.1 and hit this. (Thanks sysupgrade - it was running a 7.0-stable kernel before...) Unfortunately it runs with ddb.panic=0 and this time it hung, I won't have time to figure anything out with it when I get it back online, but might be able

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-18 Thread Alexander Bluhm
On Mon, May 16, 2022 at 05:06:28PM +0200, Claudio Jeker wrote: > > In veb configuration we are holding the netlock and sleep in > > smr_barrier() and refcnt_finalize(). An additional sleep in malloc() > > is fine here. > > Are you sure about this? smr_barrier() on busy systems with many cpus can

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-16 Thread Claudio Jeker
On Sat, May 14, 2022 at 12:41:00AM +0200, Alexander Bluhm wrote: > On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote: > > at this point we hold a NET_LOCK(). So basically if there won't > > be enough memory we might start sleeping waiting for memory > > while we will

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Alexander Bluhm
On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote: > at this point we hold a NET_LOCK(). So basically if there won't > be enough memory we might start sleeping waiting for memory > while we will be holding a NET_LOCK. > > This is something we should try to avoid,

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Alexander Bluhm
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > sorry i'm late to the party. can you try this diff? Thanks for having a look. I added veb(4) to my setup. With this diff, I cannot trigger a crash anymore. OK bluhm@ > this diff replaces the list of ports with an array/map of

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Alexandr Nedvedicky
Hello Dave, > > sorry i'm late to the party. can you try this diff? glad to see you are here. I think your diff looks good. I'm just concerned about the memory allocation in veb_ports_insert(). The memory is allocated with the `M_WAITOK` flag, which essentially means we may give

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Claudio Jeker
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > > >>> I think the

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Hrvoje Popovski
On 13.5.2022. 4:19, David Gwynne wrote: > sorry i'm late to the party. can you try this diff? > > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread David Gwynne
On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > >>> I think the diff below may be enough to fix this issue. It drops the SMR > >>>

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 20:04, Hrvoje Popovski wrote: > On 12.5.2022. 16:22, Hrvoje Popovski wrote: >> On 12.5.2022. 14:48, Claudio Jeker wrote: >>> I think the diff below may be enough to fix this issue. It drops the SMR >>> critical section around the enqueue operation but uses a reference on the >>>

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 16:22, Hrvoje Popovski wrote: > On 12.5.2022. 14:48, Claudio Jeker wrote: >> I think the diff below may be enough to fix this issue. It drops the SMR >> critical section around the enqueue operation but uses a reference on the >> port instead to ensure that the device can't be

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Alexandr Nedvedicky
Hello, > > I think the diff below may be enough to fix this issue. It drops the SMR > critical section around the enqueue operation but uses a reference on the > port instead to ensure that the device can't be removed during the > enqueue. Once the enqueue is finished we enter the SMR critical

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 14:48, Claudio Jeker wrote: > I think the diff below may be enough to fix this issue. It drops the SMR > critical section around the enqueue operation but uses a reference on the > port instead to ensure that the device can't be removed during the > enqueue. Once the enqueue is

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Claudio Jeker
On Wed, May 11, 2022 at 11:01:21AM +0200, Alexandr Nedvedicky wrote: > Hello Hrvoje, > > thank you for testing. > On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote: > > On 10.5.2022. 22:55, Alexander Bluhm wrote: > > > Yes. It is similar. > > > > > > I have read the whole mail

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Alexandr Nedvedicky
Hello Hrvoje, thank you for testing. On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote: > On 10.5.2022. 22:55, Alexander Bluhm wrote: > > Yes. It is similar. > > > > I have read the whole mail thread and the final fix got committed. > > But it looks incomplete, pf is still

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Claudio Jeker
On Wed, May 11, 2022 at 10:29:57AM +0200, Claudio Jeker wrote: > On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote: > > Hello, > > > > > > > > Can we limit the number of span ports per bridge to a small number so that > > > instead of a heap object for the SLIST a simple

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Hrvoje Popovski
On 10.5.2022. 22:55, Alexander Bluhm wrote: > Yes. It is similar. > > I have read the whole mail thread and the final fix got committed. > But it looks incomplete, pf is still sleeping. > > Hrvoje, can you run the tests again that triggered the panics a > year ago? Hi, year ago panics was

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Claudio Jeker
On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote: > Hello, > > > > > Can we limit the number of span ports per bridge to a small number so that > > instead of a heap object for the SLIST a simple stack array of > > MAX_SPAN_PORTS pointers could be used? > > > > Who needs

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Alexandr Nedvedicky
Hello, > > Can we limit the number of span ports per bridge to a small number so that > instead of a heap object for the SLIST a simple stack array of > MAX_SPAN_PORTS pointers could be used? > > Who needs more than a handful of spanports per veb? I just want to make sure I follow your

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Claudio Jeker
On Tue, May 10, 2022 at 12:21:02AM +0200, Alexandr Nedvedicky wrote: > Hello, > > On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote: > > Hello, > > > > I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9. > > My system ran as a firewall under OpenBSD 6.9 and 7.0 quite

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-11 Thread Claudio Jeker
On Wed, May 11, 2022 at 12:38:56AM +0200, Alexandr Nedvedicky wrote: > Hello, > > > > > Yes. It is similar. > > > > I have read the whole mail thread and the final fix got committed. > > But it looks incomplete, pf is still sleeping. > > > > Hrvoje, can you run the tests again that triggered

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Alexandr Nedvedicky
Hello, > > Yes. It is similar. > > I have read the whole mail thread and the final fix got committed. > But it looks incomplete, pf is still sleeping. > > Hrvoje, can you run the tests again that triggered the panics a > year ago? > > Sasha, I still think the way to go is mutex for pf locks.

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Alexander Bluhm
On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote: > On 9.5.2022. 22:04, Alexander Bluhm wrote: > > Can some veb or smr hacker explain how this is supposed to work? > > > > Sleeping in pf is also not ideal as it is in the hot path and slows > > down packets. But that is not easy to

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Alexandr Nedvedicky
Hello, On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote: > On 9.5.2022. 22:04, Alexander Bluhm wrote: > > Can some veb or smr hacker explain how this is supposed to work? > > > > Sleeping in pf is also not ideal as it is in the hot path and slows > > down packets. But that is not

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Vitaliy Makkoveev
> On 10 May 2022, at 22:37, Hrvoje Popovski wrote: > > On 9.5.2022. 22:04, Alexander Bluhm wrote: >> Can some veb or smr hacker explain how this is supposed to work? >> >> Sleeping in pf is also not ideal as it is in the hot path and slows >> down packets. But that is not easy to fix as we

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Hrvoje Popovski
On 9.5.2022. 22:04, Alexander Bluhm wrote: > Can some veb or smr hacker explain how this is supposed to work? > > Sleeping in pf is also not ideal as it is in the hot path and slows > down packets. But that is not easy to fix as we have to refactor > the memory allocations before converting pf

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Barbaros Bilek
Thanks for your support. I'll try to test when you get it done. On Mon, May 9, 2022 at 8:51 PM Alexandr Nedvedicky < alexandr.nedvedi...@oracle.com> wrote: > Hello Barbaros, > > thank you for testing and excellent report. > > > > > ddb{1}> trace > > db_enter() at db_enter+0x10 > >

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Vitaliy Makkoveev
Hi, I’m not a fan of this. > > + if (span_port_pool.pr_size == 0) { > + pool_init(&span_port_pool, sizeof(struct veb_span_port), > + 0, IPL_SOFTNET, 0, "vebspl", NULL); > + } Does an initialized pool consume significant resources? Why don’t we do this within

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexandr Nedvedicky
Hello, On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote: > Hello, > > I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9. > My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable. > Also I've used 7.1 for a limited time and there was no crash. > After

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexander Bluhm
On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote: > I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9. > My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable. > Also I've used 7.1 for a limited time and there was no crash. > After OpenBSD's NET_TASKQ

Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexandr Nedvedicky
Hello Barbaros, thank you for testing and excellent report. > ddb{1}> trace > db_enter() at db_enter+0x10 > panic(81f22e39) at panic+0xbf > __assert(81f96c9d,81f85ebc,a3,81fd252f) at > __assert+0x25 > assertwaitok() at assertwaitok+0xcc > mi_switch() at

7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Barbaros Bilek
Hello, I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9. My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable. Also I've used 7.1 for a limited time and there was no crash. After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days. Here is the crash report and dmesg: