Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Thanks for your support. I'll test the diff as soon as it is ready.

On Mon, May 9, 2022 at 8:51 PM Alexandr Nedvedicky <alexandr.nedvedi...@oracle.com> wrote:

> Hello Barbaros,
>
> thank you for testing and for the excellent report.
>
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(81f22e39) at panic+0xbf
> > __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> > assertwaitok() at assertwaitok+0xcc
> > mi_switch() at mi_switch+0x40
>
> The assert indicates that we attempt to sleep inside an SMR section,
> which must be avoided.
>
> > sleep_finish(800025574da0,1) at sleep_finish+0x10b
> > rw_enter(822cfe50,1) at rw_enter+0x1cb
> > pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> > ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> > ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> > ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> > ether_input(851c3800,fd806064c200) at ether_input+0x100
> > vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> > ether_input(8095a050,fd806064c200) at ether_input+0x85
> > if_input_process(8095a050,800025575358) at if_input_process+0x6f
> > ifiq_process(8095a460) at ifiq_process+0x69
> > taskq_thread(80035080) at taskq_thread+0x100
>
> The call stack above has done a bad thing (sleeping inside an SMR
> section).
>
> In my opinion the primary suspect is veb_port_input(), whose code reads
> as follows:
>
>  966 static struct mbuf *
>  967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
>  968 {
>  969         struct veb_port *p = brport;
>  970         struct veb_softc *sc = p->p_veb;
>  971         struct ifnet *ifp = &sc->sc_if;
>  972         struct ether_header *eh;
>  ...
> 1021         counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
> 1022             m->m_pkthdr.len);
> 1023
> 1024         /* force packets into the one routing domain for pf */
> 1025         m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
> 1026
> 1027 #if NBPFILTER > 0
> 1028         if_bpf = READ_ONCE(ifp->if_bpf);
> 1029         if (if_bpf != NULL) {
> 1030                 if (bpf_mtap_ether(if_bpf, m, 0) != 0)
> 1031                         goto drop;
> 1032         }
> 1033 #endif
> 1034
> 1035         veb_span(sc, m);
> 1036
> 1037         if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
> 1038             veb_ip_filter(m))
> 1039                 goto drop;
> 1040
> 1041         if (!ISSET(ifp->if_flags, IFF_LINK0) &&
> 1042             veb_vlan_filter(m))
> 1043                 goto drop;
> 1044
> 1045         if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
> 1046                 goto drop;
>
> The call to veb_span() at line 1035 seems to be the culprit (in my
> opinion):
>
>  356         smr_read_enter();
>  357         SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  358                 ifp0 = p->p_ifp0;
>  359                 if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  360                         continue;
>  361
>  362                 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
>  363                 if (m == NULL) {
>  364                         /* XXX count error */
>  365                         continue;
>  366                 }
>  367
>  368                 if_enqueue(ifp0, m); /* XXX count error */
>  369         }
>  370         smr_read_leave();
>
> The loop above comes from veb_span(), which calls if_enqueue() from
> within an SMR section. Line 368 ends up here:
>
> 2191 static int
> 2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
> 2193 {
> 2194         /*
> 2195          * switching an l2 packet toward a vport means pushing it
> 2196          * into the network stack. this function exists to make
> 2197          * if_vinput compat with veb calling if_enqueue.
> 2198          */
> 2199
> 2200         if_vinput(ifp, m);
> 2201
> 2202         return (0);
> 2203 }
>
> This in turn calls if_vinput(), which calls further down into the IP
> stack, and the IP stack may sleep. We must change veb_span() so that the
> calls to if_vinput() happen outside of the SMR section.
>
> I don't have such a complex setup with vlans and virtual ports. I'll try
> to cook up a diff and pass it to you for testing.
>
> thanks again for coming back to us with the report.
>
> regards
> sashan
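
The reasoning in the reply (work that may sleep must not run between smr_read_enter() and smr_read_leave()) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code and not the diff sashan intends to write: smr_depth stands in for spc_smrdepth from the panic message, and model_port, model_read_enter(), model_read_leave(), model_enqueue(), span_inside(), span_deferred() and MAX_SPAN_PORTS are hypothetical names made up for this example. A real fix would also have to keep each collected port alive (for instance with a reference count) once the read section has been left, which the model ignores.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPAN_PORTS 8	/* hypothetical bound, only for this sketch */

static int smr_depth;		/* stands in for spc_smrdepth */
static int delivered;		/* packets that reached the modeled stack */

struct model_port {
	int running;		/* stands in for IFF_RUNNING */
};

void
model_read_enter(void)
{
	smr_depth++;
}

void
model_read_leave(void)
{
	assert(smr_depth > 0);
	smr_depth--;
}

/*
 * Stands in for if_enqueue() reaching code that may sleep.
 * The kernel would panic in assertwaitok(); the model returns -1.
 */
int
model_enqueue(const struct model_port *p)
{
	(void)p;
	if (smr_depth != 0)
		return (-1);
	delivered++;
	return (0);
}

/* Pattern from the quoted loop: enqueue while inside the read section. */
int
span_inside(const struct model_port *ports, size_t n)
{
	size_t i;
	int errors = 0;

	model_read_enter();
	for (i = 0; i < n; i++) {
		if (!ports[i].running)
			continue;
		if (model_enqueue(&ports[i]) != 0)
			errors++;
	}
	model_read_leave();
	return (errors);
}

/* One possible restructuring: collect inside, enqueue after leaving. */
int
span_deferred(const struct model_port *ports, size_t n)
{
	const struct model_port *todo[MAX_SPAN_PORTS];
	size_t i, ntodo = 0;
	int errors = 0;

	model_read_enter();
	for (i = 0; i < n && ntodo < MAX_SPAN_PORTS; i++) {
		if (ports[i].running)
			todo[ntodo++] = &ports[i];
	}
	model_read_leave();

	/* The read section is closed, so sleeping is allowed again. */
	for (i = 0; i < ntodo; i++) {
		if (model_enqueue(todo[i]) != 0)
			errors++;
	}
	return (errors);
}
```

With three ports of which two are running, span_inside() reports an error for each running port because the modeled enqueue happens at a nonzero depth, while span_deferred() delivers to both running ports without error.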
7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

I have been using veb (veb+vlan+ixl) interfaces since 6.9. The system ran
as a firewall under OpenBSD 6.9 and 7.0 without problems. I also ran 7.1
for a limited time with no crash. After NET_TASKQ was raised to 4 in
OpenBSD, it crashed after 5 days. Here are the crash report and dmesg:

ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
end trace frame: 0x800025575290, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{1}> show panic
*cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
ddb{1}> trace
db_enter() at db_enter+0x10
panic(81f22e39) at panic+0xbf
__assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
assertwaitok() at assertwaitok+0xcc
mi_switch() at mi_switch+0x40
sleep_finish(800025574da0,1) at sleep_finish+0x10b
rw_enter(822cfe50,1) at rw_enter+0x1cb
pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
ether_input(8095a050,fd806064c200) at ether_input+0x85
if_input_process(8095a050,800025575358) at if_input_process+0x6f
ifiq_process(8095a460) at ifiq_process+0x69
taskq_thread(80035080) at taskq_thread+0x100
end trace frame: 0x0, count: -19
ddb{1}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 422021  80579      0         0x2          0    7  ifconfig
 292011  89065    502        0x12      0x400    8  mariadbd
 427181  89065    502        0x12      0x400    6K mariadbd
  86788  89065    502        0x12      0x400    3  mariadbd
 302453  98158      0     0x14000      0x200    9  softnet
  88346  66890      0     0x14000      0x200    5  softnet
ddb{1}> machine ddbcpu 2
Stopped at      x86_ipi_db+0x12:        leave
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: 10
ddb{2}> trace
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5
ddb{2}> machine ddbcpu 2
Invalid cpu 2
ddb{2}> t[A[A
Bad character
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5
ddb{2}> machine ddbcpu 3
Stopped at      x86_ipi_db+0x12:        leave
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: 6
ddb{3}> trace
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: -9
ddb{3}> machine ddbcpu 4
Stopped at      x86_ipi_db+0x12:        leave
x86_ipi_db(80001d1d5ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1d5ff0) at