Re: svn commit: r359436 - in head/sys: kern net sys
On 31 Mar 2020, at 17:28, Kristof Provost wrote:
> On 31 Mar 2020, at 17:17, Mark Johnston wrote:
>> On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
>>> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>>>> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
>>>>> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>>>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>>>>>> sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
>>>>>>>> of test cases timed out.
>>>>>>>>
>>>>>>>> Can you help check these?
>>>>>>>
>>>>>>> I see, it is actually caused by r359438. I'm looking at it now.
>>>>>>
>>>>>> I verified that the netpfil and netinet6 tests pass with r359477.
>>>>>
>>>>> Thanks for the fix. The latest test run panics at epair_qflush:
>>>>>
>>>>> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>>>>>
>>>>> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
>>>>> related or caused by previous commits (I suspect the latter). I'll
>>>>> look into this.
>>>>
>>>> That’s a known issue with epair (since EPOCH, I believe).
>>>> A number of the pf tests are disabled due to this. See 238870.
>>>
>>> I think so too. By the way, every test run currently panics, so I am
>>> afraid the recent commits might have made things worse (or, say, made
>>> the issue easier to reproduce?)
>>
>> I haven't been able to reproduce any panics or test failures so far.
>
> Once you disable the ‘atf_skip’ lines in the pf tests, a simple
> `sudo kldload pfsync && cd /usr/tests/sys/netpfil/pf && sudo kyua test`
> is likely sufficient.

The names:names test is a great candidate for this. Remove the `atf_skip …` line in /usr/tests/sys/netpfil/pf/names and run that a few times. It’s not 100% reliable, but the test is very fast and will likely panic every other run or more often.

Example backtrace:

panic: epair_qflush: ifp=0xf800079c9000, epair_softc gone? sc=0
cpuid = 1
time = 1585666518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe001bd7e790
vpanic() at vpanic+0x182/frame 0xfe001bd7e7e0
panic() at panic+0x43/frame 0xfe001bd7e840
epair_qflush() at epair_qflush+0x1a8/frame 0xfe001bd7e890
if_down() at if_down+0x12d/frame 0xfe001bd7e8c0
if_detach_internal() at if_detach_internal+0x2ee/frame 0xfe001bd7e920
if_vmove() at if_vmove+0x3c/frame 0xfe001bd7e970
vnet_if_return() at vnet_if_return+0x50/frame 0xfe001bd7e990
vnet_destroy() at vnet_destroy+0x130/frame 0xfe001bd7e9c0
prison_deref() at prison_deref+0x29d/frame 0xfe001bd7ea00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfe001bd7ea80
taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfe001bd7eab0
fork_exit() at fork_exit+0x80/frame 0xfe001bd7eaf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe001bd7eaf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100014 ]
Stopped at kdb_enter+0x37: movq $0,0x10927a6(%rip)
db>

You might see different panics too. The epair teardown flow is complex, and broken.

Best regards,
Kristof
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
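The "run that a few times" step above can be scripted. The following is only a sketch: `repeat_until_failure` is a hypothetical helper name, not part of kyua or the FreeBSD test suite, and it can only report ordinary test failures. If the kernel panics as in the backtrace above, the machine stops and the loop goes down with it.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to max_runs times and stop at
# the first run that exits non-zero.
# Returns 0 if some run failed, 1 if every run passed.
repeat_until_failure() {
    ruf_max=$1
    shift
    ruf_i=1
    while [ "$ruf_i" -le "$ruf_max" ]; do
        echo "run $ruf_i of $ruf_max"
        if ! "$@"; then
            echo "command failed on run $ruf_i"
            return 0
        fi
        ruf_i=$((ruf_i + 1))
    done
    echo "no failure in $ruf_max runs"
    return 1
}

# Possible usage on a test machine (the count of 10 is arbitrary):
#   sudo kldload pfsync
#   cd /usr/tests/sys/netpfil/pf
#   repeat_until_failure 10 sudo kyua test names:names
```

Running this on a machine with a serial or dump-capable console is probably wise, since the interesting outcome is a panic rather than a failed test.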
Re: svn commit: r359436 - in head/sys: kern net sys
On 31 Mar 2020, at 17:17, Mark Johnston wrote:
> On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
>> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>>> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
>>>> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>>>>> sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
>>>>>>> of test cases timed out.
>>>>>>>
>>>>>>> Can you help check these?
>>>>>>
>>>>>> I see, it is actually caused by r359438. I'm looking at it now.
>>>>>
>>>>> I verified that the netpfil and netinet6 tests pass with r359477.
>>>>
>>>> Thanks for the fix. The latest test run panics at epair_qflush:
>>>>
>>>> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>>>>
>>>> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
>>>> related or caused by previous commits (I suspect the latter). I'll
>>>> look into this.
>>>
>>> That’s a known issue with epair (since EPOCH, I believe).
>>> A number of the pf tests are disabled due to this. See 238870.
>>
>> I think so too. By the way, every test run currently panics, so I am
>> afraid the recent commits might have made things worse (or, say, made
>> the issue easier to reproduce?)
>
> I haven't been able to reproduce any panics or test failures so far.

Once you disable the ‘atf_skip’ lines in the pf tests, a simple `sudo kldload pfsync && cd /usr/tests/sys/netpfil/pf && sudo kyua test` is likely sufficient.

There’s a complex race around tearing down epair interfaces and moving them back to their home vnet that’s proven very tricky to resolve.

Best regards,
Kristof
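Disabling the ‘atf_skip’ lines can be done by hand in an editor. As a rough sketch, the filter below prints a copy of a test script with those calls commented out. `comment_out_atf_skip` is a made-up name, and the pattern assumes each `atf_skip` call starts its own line (after optional indentation), which may not hold for every test file.

```shell
#!/bin/sh
# Hypothetical helper: print the given atf-sh script(s), or standard
# input when no file is named, with every line that begins with an
# atf_skip call commented out. The input files are not modified.
comment_out_atf_skip() {
    sed 's/^\([[:space:]]*\)atf_skip\([[:space:]]\)/\1# atf_skip\2/' "$@"
}

# Possible usage; review the output before replacing an installed test:
#   comment_out_atf_skip /usr/tests/sys/netpfil/pf/names > /tmp/names.noskip
```

Writing to a separate file rather than editing in place keeps the installed test intact until the result has been checked.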
Re: svn commit: r359436 - in head/sys: kern net sys
On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
> >
> > On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> > > On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> > > > > > It seems this could be triggered by the sys.netinet6.frag6.*,
> > > > > > sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
> > > > > > of test cases timed out.
> > > > > >
> > > > > > Can you help check these?
> > > > >
> > > > > I see, it is actually caused by r359438. I'm looking at it now.
> > > >
> > > > I verified that the netpfil and netinet6 tests pass with r359477.
> > >
> > > Thanks for the fix. The latest test run panics at epair_qflush:
> > >
> > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
> > >
> > > while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> > > related or caused by previous commits (I suspect the latter). I'll
> > > look into this.
> > >
> > That’s a known issue with epair (since EPOCH, I believe).
> > A number of the pf tests are disabled due to this. See 238870.
>
> I think so too. By the way, every test run currently panics, so I am
> afraid the recent commits might have made things worse (or, say, made
> the issue easier to reproduce?)

I haven't been able to reproduce any panics or test failures so far.
Re: svn commit: r359436 - in head/sys: kern net sys
On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost wrote:
>
> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> > On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> > > > > It seems this could be triggered by the sys.netinet6.frag6.*,
> > > > > sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
> > > > > of test cases timed out.
> > > > >
> > > > > Can you help check these?
> > > >
> > > > I see, it is actually caused by r359438. I'm looking at it now.
> > >
> > > I verified that the netpfil and netinet6 tests pass with r359477.
> >
> > Thanks for the fix. The latest test run panics at epair_qflush:
> >
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
> >
> > while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> > related or caused by previous commits (I suspect the latter). I'll
> > look into this.
> >
> That’s a known issue with epair (since EPOCH, I believe).
> A number of the pf tests are disabled due to this. See 238870.

I think so too. By the way, every test run currently panics, so I am afraid the recent commits might have made things worse (or, say, made the issue easier to reproduce?)

Best,
Li-Wen
Re: svn commit: r359436 - in head/sys: kern net sys
On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
>>>> It seems this could be triggered by the sys.netinet6.frag6.*,
>>>> sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
>>>> of test cases timed out.
>>>>
>>>> Can you help check these?
>>>
>>> I see, it is actually caused by r359438. I'm looking at it now.
>>
>> I verified that the netpfil and netinet6 tests pass with r359477.
>
> Thanks for the fix. The latest test run panics at epair_qflush:
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>
> while executing the sys.netpfil.pf.* tests. I'm not sure if this is
> related or caused by previous commits (I suspect the latter). I'll
> look into this.
>
That’s a known issue with epair (since EPOCH, I believe).
A number of the pf tests are disabled due to this. See 238870.

Best regards,
Kristof
Re: svn commit: r359436 - in head/sys: kern net sys
On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston wrote:
> > > It seems this could be triggered by the sys.netinet6.frag6.*,
> > > sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
> > > of test cases timed out.
> > >
> > > Can you help check these?
> >
> > I see, it is actually caused by r359438. I'm looking at it now.
>
> I verified that the netpfil and netinet6 tests pass with r359477.

Thanks for the fix. The latest test run panics at epair_qflush:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull

while executing the sys.netpfil.pf.* tests. I'm not sure if this is related or caused by previous commits (I suspect the latter). I'll look into this.

Best,
Li-Wen
Re: svn commit: r359436 - in head/sys: kern net sys
On Mon, Mar 30, 2020 at 09:59:05PM -0400, Mark Johnston wrote:
> On Tue, Mar 31, 2020 at 09:40:51AM +0800, Li-Wen Hsu wrote:
> > On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston wrote:
> > >
> > > Author: markj
> > > Date: Mon Mar 30 14:22:52 2020
> > > New Revision: 359436
> > > URL: https://svnweb.freebsd.org/changeset/base/359436
> > >
> > > Log:
> > >   Simplify taskqgroup initialization.
> > >
> > >   taskqgroup initialization was broken into two steps:
> > >
> > >   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
> > >   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
> > >      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
> > >
> > >   Step 2 tries to handle the case where tasks have already been attached
> > >   to a queue, by migrating them to their intended queue. In particular,
> > >   tasks can't be enqueued before step 2 has completed. This breaks NFS
> > >   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
> > >   is not defined, since mountroot happens before SI_SUB_SMP in this case.
> > >
> > >   Simplify initialization: do all initialization except for CPU binding at
> > >   SI_SUB_TASKQ. This means that until CPU binding is completed, group
> > >   tasks may be executed on a CPU other than that to which they were bound,
> > >   but this should not be a problem for existing users of the taskqgroup
> > >   KPIs.
> > >
> > >   Reported by: sbruno
> > >   Tested by: bdragon, sbruno
> > >   MFC after: 1 month
> > >   Sponsored by: The FreeBSD Foundation
> > >   Differential Revision: https://reviews.freebsd.org/D24188
> > >
> > > Modified:
> > >   head/sys/kern/subr_gtaskqueue.c
> > >   head/sys/net/iflib.c
> > >   head/sys/sys/gtaskqueue.h
> >
> > Hi Mark,
> >
> > I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in
> > the CI after this commit:
> >
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
> > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull
> >
> > It seems this could be triggered by the sys.netinet6.frag6.*,
> > sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
> > of test cases timed out.
> >
> > Can you help check these?
>
> I see, it is actually caused by r359438. I'm looking at it now.

I verified that the netpfil and netinet6 tests pass with r359477.
Re: svn commit: r359436 - in head/sys: kern net sys
On Tue, Mar 31, 2020 at 09:40:51AM +0800, Li-Wen Hsu wrote:
> On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston wrote:
> >
> > Author: markj
> > Date: Mon Mar 30 14:22:52 2020
> > New Revision: 359436
> > URL: https://svnweb.freebsd.org/changeset/base/359436
> >
> > Log:
> >   Simplify taskqgroup initialization.
> >
> >   taskqgroup initialization was broken into two steps:
> >
> >   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
> >   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
> >      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
> >
> >   Step 2 tries to handle the case where tasks have already been attached
> >   to a queue, by migrating them to their intended queue. In particular,
> >   tasks can't be enqueued before step 2 has completed. This breaks NFS
> >   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
> >   is not defined, since mountroot happens before SI_SUB_SMP in this case.
> >
> >   Simplify initialization: do all initialization except for CPU binding at
> >   SI_SUB_TASKQ. This means that until CPU binding is completed, group
> >   tasks may be executed on a CPU other than that to which they were bound,
> >   but this should not be a problem for existing users of the taskqgroup
> >   KPIs.
> >
> >   Reported by: sbruno
> >   Tested by: bdragon, sbruno
> >   MFC after: 1 month
> >   Sponsored by: The FreeBSD Foundation
> >   Differential Revision: https://reviews.freebsd.org/D24188
> >
> > Modified:
> >   head/sys/kern/subr_gtaskqueue.c
> >   head/sys/net/iflib.c
> >   head/sys/sys/gtaskqueue.h
>
> Hi Mark,
>
> I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in
> the CI after this commit:
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull
>
> It seems this could be triggered by the sys.netinet6.frag6.*,
> sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots
> of test cases timed out.
>
> Can you help check these?

I see, it is actually caused by r359438. I'm looking at it now.
Re: svn commit: r359436 - in head/sys: kern net sys
On Mon, Mar 30, 2020 at 10:32 PM Mark Johnston wrote:
>
> Author: markj
> Date: Mon Mar 30 14:22:52 2020
> New Revision: 359436
> URL: https://svnweb.freebsd.org/changeset/base/359436
>
> Log:
>   Simplify taskqgroup initialization.
>
>   taskqgroup initialization was broken into two steps:
>
>   1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
>   2. initialize taskqueues, start taskqueue threads, enqueue "binder"
>      tasks to bind threads to specific CPUs, at SI_SUB_SMP.
>
>   Step 2 tries to handle the case where tasks have already been attached
>   to a queue, by migrating them to their intended queue. In particular,
>   tasks can't be enqueued before step 2 has completed. This breaks NFS
>   mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
>   is not defined, since mountroot happens before SI_SUB_SMP in this case.
>
>   Simplify initialization: do all initialization except for CPU binding at
>   SI_SUB_TASKQ. This means that until CPU binding is completed, group
>   tasks may be executed on a CPU other than that to which they were bound,
>   but this should not be a problem for existing users of the taskqgroup
>   KPIs.
>
>   Reported by: sbruno
>   Tested by: bdragon, sbruno
>   MFC after: 1 month
>   Sponsored by: The FreeBSD Foundation
>   Differential Revision: https://reviews.freebsd.org/D24188
>
> Modified:
>   head/sys/kern/subr_gtaskqueue.c
>   head/sys/net/iflib.c
>   head/sys/sys/gtaskqueue.h

Hi Mark,

I see many "panic: deadlres_td_sleep_q: possible deadlock detected" in the CI after this commit:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14738/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14739/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14741/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14742/consoleFull
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14743/console
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14744/consoleFull

It seems this could be triggered by the sys.netinet6.frag6.*, sys.netpfil.common.* and sbin.pfctl.pfctl_test.* tests, and lots of test cases timed out.

Can you help check these?

Thanks,
Li-Wen
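When several CI runs panic, it can help to see at a glance whether they all hit the same panic string. The sketch below assumes the console logs have already been saved as local text files; `panic_summary` is a hypothetical name and nothing here is part of the CI tooling. Panic messages that embed pointer values will show up as separate entries.

```shell
#!/bin/sh
# Hypothetical helper: list each distinct "panic: ..." message found in
# the given console logs (or standard input), with its occurrence count,
# most frequent first.
panic_summary() {
    grep -ho 'panic: .*' "$@" | sort | uniq -c | sort -rn
}

# Possible usage, assuming the logs were saved beforehand:
#   panic_summary console-14738.txt console-14739.txt console-14741.txt
```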