Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-31 Thread Kristof Provost

On 31 Mar 2020, at 17:28, Kristof Provost wrote:
> On 31 Mar 2020, at 17:17, Mark Johnston wrote:
>> On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
>>> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>>>> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
>>>>> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>>>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>>>>>> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
>>>>>>>> of test cases timed out.
>>>>>>>>
>>>>>>>> Can you help check these?
>>>>>>>
>>>>>>> I see, it is actually caused by r359438.  I'm looking at it now.
>>>>>>
>>>>>> I verified that the netpfil and netinet6 tests pass with r359477.
>>>>>
>>>>> Thanks for the fix. The latest test run panics at epair_qflush:
>>>>>
>>>>> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>>>>>
>>>>> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
>>>>> related or caused by previous commits (I suspect the latter). I'll
>>>>> look into this.
>>>>
>>>> That’s a known issue with epair (since EPOCH, I believe).
>>>> A number of the pf tests are disabled due to this. See 238870.
>>>
>>> I think so too. By the way, every test run currently panics, so I am
>>> afraid that the recent commits might have made things worse (or, say,
>>> made the issue easier to reproduce?)
>>
>> I haven't been able to reproduce any panics or test failures so far.
>
> Once you disable the ‘atf_skip’ lines in the pf tests a simple
> `sudo kldload pfsync && cd /usr/tests/sys/netpfil/pf && sudo kyua
> test` is likely sufficient.


The names:names test is a great candidate for this. Remove the `atf_skip 
…` line in /usr/tests/sys/netpfil/pf/names and run that a few times.
It’s not 100% reliable, but the test is very fast and will likely 
panic every other run or more.
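
The "run that a few times" step could be scripted roughly as follows. This is only a sketch: the function name `repeat_test` and the run count are hypothetical, and only `kldload pfsync`, the test directory, and `kyua test` come from this thread. If the kernel panics, the machine stops in the debugger and the loop never reports back, so a completed loop only means that no panic occurred this time.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the FreeBSD test suite):
# run one command COUNT times and stop at the first failing run.
# usage: repeat_test COUNT command [args...]
repeat_test() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count"
        # "$@" is the command under test, e.g. a kyua invocation.
        "$@" || { echo "failed on run $i"; return 1; }
        i=$((i + 1))
    done
    echo "completed $count runs"
}
```

With the skip removed and pfsync loaded, something like `cd /usr/tests/sys/netpfil/pf && repeat_test 10 sudo kyua test names:names` would then exercise the test repeatedly; the count of 10 is arbitrary.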


Example backtrace:

panic: epair_qflush: ifp=0xf800079c9000, epair_softc gone? sc=0

cpuid = 1
time = 1585666518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe001bd7e790
vpanic() at vpanic+0x182/frame 0xfe001bd7e7e0
panic() at panic+0x43/frame 0xfe001bd7e840
epair_qflush() at epair_qflush+0x1a8/frame 0xfe001bd7e890
if_down() at if_down+0x12d/frame 0xfe001bd7e8c0
if_detach_internal() at if_detach_internal+0x2ee/frame 0xfe001bd7e920
if_vmove() at if_vmove+0x3c/frame 0xfe001bd7e970
vnet_if_return() at vnet_if_return+0x50/frame 0xfe001bd7e990
vnet_destroy() at vnet_destroy+0x130/frame 0xfe001bd7e9c0
prison_deref() at prison_deref+0x29d/frame 0xfe001bd7ea00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfe001bd7ea80
taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfe001bd7eab0
fork_exit() at fork_exit+0x80/frame 0xfe001bd7eaf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe001bd7eaf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100014 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10927a6(%rip)
db>

You might see different panics too. The epair teardown flow is complex, 
and broken.


Best regards,
Kristof
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-31 Thread Kristof Provost

On 31 Mar 2020, at 17:17, Mark Johnston wrote:
> On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
>> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>>> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
>>>> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>>>>> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
>>>>>>> of test cases timed out.
>>>>>>>
>>>>>>> Can you help check these?
>>>>>>
>>>>>> I see, it is actually caused by r359438.  I'm looking at it now.
>>>>>
>>>>> I verified that the netpfil and netinet6 tests pass with r359477.
>>>>
>>>> Thanks for the fix. The latest test run panics at epair_qflush:
>>>>
>>>> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>>>>
>>>> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
>>>> related or caused by previous commits (I suspect the latter). I'll
>>>> look into this.
>>>
>>> That’s a known issue with epair (since EPOCH, I believe).
>>> A number of the pf tests are disabled due to this. See 238870.
>>
>> I think so too. By the way, every test run currently panics, so I am
>> afraid that the recent commits might have made things worse (or, say,
>> made the issue easier to reproduce?)
>
> I haven't been able to reproduce any panics or test failures so far.


Once you disable the ‘atf_skip’ lines in the pf tests a simple `sudo 
kldload pfsync && cd /usr/tests/sys/netpfil/pf && sudo kyua test` is 
likely sufficient.
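
As a rough illustration of the "disable the atf_skip lines" step, the sketch below prints a test script with those lines commented out. It assumes each skip is a line whose first word is `atf_skip`; the function name `strip_atf_skip` is hypothetical and nothing like it ships with the tests.

```shell
#!/bin/sh
# Hypothetical helper: print a test script to stdout with every
# atf_skip line commented out, leaving all other lines untouched.
# Assumption: a skip is a line whose first non-blank word is atf_skip.
strip_atf_skip() {
    # Keep the original indentation (\1) and prefix the call with "# ".
    sed 's/^\([[:space:]]*\)atf_skip/\1# atf_skip/' "$1"
}
```

For example, `strip_atf_skip /usr/tests/sys/netpfil/pf/names > /tmp/names.noskip` produces a copy to inspect before replacing the installed script. Writing to a copy rather than editing in place keeps the original available for restoring the skip afterwards.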


There’s a complex race around tearing down epair interfaces and moving 
them back to their home vnet that’s proven very tricky to resolve.


Best regards,
Kristof


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-31 Thread Mark Johnston
On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
> >
> > On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> > > On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> > >>>> It seems this could be triggered by the sys.netinet6.frag6.*,
> > >>>> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
> > >>>> of test cases timed out.
> > >>>>
> > >>>> Can you help check these?
> > >>>
> > >>> I see, it is actually caused by r359438.  I'm looking at it now.
> > >>
> > >> I verified that the netpfil and netinet6 tests pass with r359477.
> > >
> > > Thanks for the fix. The latest test run panics at epair_qflush:
> > >
> > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
> > >
> > > while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> > > related or caused by previous commits (I suspect the latter). I'll
> > > look into this.
> > >
> > That’s a known issue with epair (since EPOCH, I believe).
> > A number of the pf tests are disabled due to this. See 238870.
>
> I think so too. By the way, every test run currently panics, so I am
> afraid that the recent commits might have made things worse (or, say,
> made the issue easier to reproduce?)

I haven't been able to reproduce any panics or test failures so far.


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-31 Thread Li-Wen Hsu
On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>
> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> > On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> >>>> It seems this could be triggered by the sys.netinet6.frag6.*,
> >>>> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
> >>>> of test cases timed out.
> >>>>
> >>>> Can you help check these?
> >>>
> >>> I see, it is actually caused by r359438.  I'm looking at it now.
> >>
> >> I verified that the netpfil and netinet6 tests pass with r359477.
> >
> > Thanks for the fix. The latest test run panics at epair_qflush:
> >
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
> >
> > while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> > related or caused by previous commits (I suspect the latter). I'll
> > look into this.
> >
> That’s a known issue with epair (since EPOCH, I believe).
> A number of the pf tests are disabled due to this. See 238870.

I think so too. By the way, every test run currently panics, so I am
afraid that the recent commits might have made things worse (or, say,
made the issue easier to reproduce?)

Best,
Li-Wen


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-31 Thread Kristof Provost
On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
>>>> of test cases timed out.
>>>>
>>>> Can you help check these?
>>>
>>> I see, it is actually caused by r359438.  I'm looking at it now.
>>
>> I verified that the netpfil and netinet6 tests pass with r359477.
>
> Thanks for the fix. The latest test run panics at epair_qflush:
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>
> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> related or caused by previous commits (I suspect the latter). I'll
> look into this.
>
That’s a known issue with epair (since EPOCH, I believe).
A number of the pf tests are disabled due to this. See 238870.

Best regards,
Kristof


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-30 Thread Li-Wen Hsu
On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> > > It seems this could be triggered by the sys.netinet6.frag6.*,
> > > sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
> > > of test cases timed out.
> > >
> > > Can you help check these?
> >
> > I see, it is actually caused by r359438.  I'm looking at it now.
>
> I verified that the netpfil and netinet6 tests pass with r359477.

Thanks for the fix. The latest test run panics at epair_qflush:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull

while executing the sys.netpfil.pf.* tests. I'm not sure if this is
related or caused by previous commits (I suspect the latter). I'll
look into this.

Best,
Li-Wen


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-30 Thread Mark Johnston
On Mon, Mar 30, 2020 at 09:59:05PM -0400, Mark Johnston wrote:
> On Tue, Mar 31, 2020 at 09:40:51AM +0800, Li-Wen Hsu wrote:
> > On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston  wrote:
> > >
> > > Author: markj
> > > Date: Mon Mar 30 14:22:52 2020
> > > New Revision: 359436
> > > URL: https://svnweb.freebsd.org/changeset/base/359436
> > >
> > > Log:
> > >   Simplify taskqgroup initialization.
> > >
> > >   taskqgroup initialization was broken into two steps:
> > >
> > >   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
> > >   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
> > >      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
> > >
> > >   Step 2 tries to handle the case where tasks have already been attached
> > >   to a queue, by migrating them to their intended queue.  In particular,
> > >   tasks can't be enqueued before step 2 has completed.  This breaks NFS
> > >   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
> > >   is not defined, since mountroot happens before SI_SUB_SMP in this case.
> > >
> > >   Simplify initialization: do all initialization except for CPU binding at
> > >   SI_SUB_TASKQ.  This means that until CPU binding is completed, group
> > >   tasks may be executed on a CPU other than that to which they were bound,
> > >   but this should not be a problem for existing users of the taskqgroup
> > >   KPIs.
> > >
> > >   Reported by:  sbruno
> > >   Tested by:    bdragon, sbruno
> > >   MFC after:    1 month
> > >   Sponsored by: The FreeBSD Foundation
> > >   Differential Revision: https://reviews.freebsd.org/D24188
> > >
> > > Modified:
> > >   head/sys/kern/subr_gtaskqueue.c
> > >   head/sys/net/iflib.c
> > >   head/sys/sys/gtaskqueue.h
> > 
> > Hi Mark,
> > 
> > I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in
> > the CI after this commit:
> > 
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull
> > 
> > It seems this could be triggered by the sys.netinet6.frag6.*,
> > sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
> > of test cases timed out.
> > 
> > Can you help check these?
> 
> I see, it is actually caused by r359438.  I'm looking at it now.

I verified that the netpfil and netinet6 tests pass with r359477.


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-30 Thread Mark Johnston
On Tue, Mar 31, 2020 at 09:40:51AM +0800, Li-Wen Hsu wrote:
> On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston  wrote:
> >
> > Author: markj
> > Date: Mon Mar 30 14:22:52 2020
> > New Revision: 359436
> > URL: https://svnweb.freebsd.org/changeset/base/359436
> >
> > Log:
> >   Simplify taskqgroup initialization.
> >
> >   taskqgroup initialization was broken into two steps:
> >
> >   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
> >   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
> >      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
> >
> >   Step 2 tries to handle the case where tasks have already been attached
> >   to a queue, by migrating them to their intended queue.  In particular,
> >   tasks can't be enqueued before step 2 has completed.  This breaks NFS
> >   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
> >   is not defined, since mountroot happens before SI_SUB_SMP in this case.
> >
> >   Simplify initialization: do all initialization except for CPU binding at
> >   SI_SUB_TASKQ.  This means that until CPU binding is completed, group
> >   tasks may be executed on a CPU other than that to which they were bound,
> >   but this should not be a problem for existing users of the taskqgroup
> >   KPIs.
> >
> >   Reported by:  sbruno
> >   Tested by:    bdragon, sbruno
> >   MFC after:    1 month
> >   Sponsored by: The FreeBSD Foundation
> >   Differential Revision: https://reviews.freebsd.org/D24188
> >
> > Modified:
> >   head/sys/kern/subr_gtaskqueue.c
> >   head/sys/net/iflib.c
> >   head/sys/sys/gtaskqueue.h
> 
> Hi Mark,
> 
> I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in
> the CI after this commit:
> 
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull
> 
> It seems this could be triggered by the sys.netinet6.frag6.*,
> sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
> of test cases timed out.
> 
> Can you help check these?

I see, it is actually caused by r359438.  I'm looking at it now.


Re: svn commit: r359436 - in head/sys: kern net sys

2020-03-30 Thread Li-Wen Hsu
On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston  wrote:
>
> Author: markj
> Date: Mon Mar 30 14:22:52 2020
> New Revision: 359436
> URL: https://svnweb.freebsd.org/changeset/base/359436
>
> Log:
>   Simplify taskqgroup initialization.
>
>   taskqgroup initialization was broken into two steps:
>
>   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
>   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
>      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
>
>   Step 2 tries to handle the case where tasks have already been attached
>   to a queue, by migrating them to their intended queue.  In particular,
>   tasks can't be enqueued before step 2 has completed.  This breaks NFS
>   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
>   is not defined, since mountroot happens before SI_SUB_SMP in this case.
>
>   Simplify initialization: do all initialization except for CPU binding at
>   SI_SUB_TASKQ.  This means that until CPU binding is completed, group
>   tasks may be executed on a CPU other than that to which they were bound,
>   but this should not be a problem for existing users of the taskqgroup
>   KPIs.
>
>   Reported by:  sbruno
>   Tested by:    bdragon, sbruno
>   MFC after:    1 month
>   Sponsored by: The FreeBSD Foundation
>   Differential Revision: https://reviews.freebsd.org/D24188
>
> Modified:
>   head/sys/kern/subr_gtaskqueue.c
>   head/sys/net/iflib.c
>   head/sys/sys/gtaskqueue.h

Hi Mark,

I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in
the CI after this commit:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull

It seems this could be triggered by the sys.netinet6.frag6.*,
sys.netpfil.common.*, and sbin.pfctl.pfctl_test.* tests, and lots
of test cases timed out.

Can you help check these?
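
For triaging several console logs at once, a small helper along these lines might be useful. It is a sketch under assumptions: the logs have already been saved locally as plain text (downloading them is left out), panic messages start a line with "panic: " as in the message quoted above, and the function name `panic_summary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical triage helper: count panics per reporting function
# across one or more saved console logs.
# Assumption: each panic is logged as "panic: <function>: <details>".
panic_summary() {
    # -h drops file names so identical panics from different logs merge.
    grep -h '^panic: ' "$@" |
        sed 's/^panic: \([^:]*\):.*/\1/' |   # keep only the function name
        sort | uniq -c | sort -rn            # most frequent first
}
```

Called as `panic_summary build-*.log` (file names are placeholders), it prints one line per panicking function with its count, which makes it easier to see whether a run of failures shares a single cause.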

Thanks,
Li-Wen