Re: Panic: 12.2 fails to use VIMAGE jails
On 9 Dec 2020, at 2:31, Peter wrote:

> On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
> ! > Sorry for the bad news.
> ! >
> ! You appear to be triggering two or three different bugs there.
>
> That is possible. Then there are two or three different bugs in the
> production code.
>
> In any case, my current workaround, i.e. delaying in the exec.poststop
>
>     exec.poststop = "
>         sleep 6 ;
>         /usr/sbin/ngctl shutdown ${ifname1l}: ;
>     ";
>
> helps for it all and makes the system behave solid. This is true
> with and without Your patch.
>
> ! Can you reduce your netgraph use case to a small test case that can trigger
> ! the problem?
>
> I'm sorry, I fear I don't get Your point.
> Assumed there are actually two or three bugs here, You are asking me
> to reduce config so that it will trigger only one of them? Is that
> correct?

No, we need a simple case to reproduce these problems. It’s fine if that
test case triggers multiple issues.

> Then let me put this different: assuming this is the OS for the life
> support system of the manned Jupiter mission. Then, which one of the
> bugs do You want to get fixed, and which would You prefer to keep and
> make Your oxygen supply cut off?
>
> https://www.youtube.com/watch?v=BEo2g-w545A

Happily we’re not in space.

> ! I’m not likely to be able to do anything unless I can reproduce
> ! the problem(s).
>
> I understand that. From Your former mail I get the impression that you
> prefer to rely on tests. I consider this a bad habit[1] and prefer
> logical thinking.
>
> (Background: It is not that I would be unwilling to create clean and
> precisely reproducible scenarios. But one of my problems is currently,
> I only have two machines available: the graphical one where I'm just
> typing, and the backend server with the jails that does practically
> everything.

These issues should trigger just fine in VMs. There’s no need for
hardware pain.

Regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 'ifconfig lagg0.300 create' fails with 'SIOIFCREATE2: Device not configured' on r368459
On 9/12/20 at 2:45, Raúl Muñoz - CUSTOS via freebsd-stable wrote:

[]
> maybe related to:
> https://svnweb.freebsd.org/base?view=revision=368346

Excuse me, I meant this one:
https://svnweb.freebsd.org/base?view=revision=368297
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 8, 2020 at 7:45 PM Peter wrote:
>
> On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
> > Can you reduce your netgraph use case to a small test case that can trigger
> > the problem?
>
> I'm sorry, I fear I don't get Your point.
> Assumed there are actually two or three bugs here, You are asking me
> to reduce config so that it will trigger only one of them? Is that
> correct?
>
> Then let me put this different: assuming this is the OS for the life
> support system of the manned Jupiter mission. Then, which one of the
> bugs do You want to get fixed, and which would You prefer to keep and
> make Your oxygen supply cut off?
>
> https://www.youtube.com/watch?v=BEo2g-w545A

You seem to have misinterpreted this; he doesn't want to narrow it down
to one bug, he wants simple steps that he can follow to reproduce any
failure, preferably steps that can actually be followed by just about
anyone and don't require immense amounts of setup time or additional
hardware.

Unfortunately, your tone following the misunderstanding was pretty
discouraging.

Thanks,

Kyle Evans
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop

> exec.poststop = "
>    sleep 6 ;
>    /usr/sbin/ngctl shutdown ${ifname1l}: ;
>    ";

helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assumed there are actually two or three bugs here, You are asking me
to reduce config so that it will trigger only one of them? Is that
correct?

Then let me put this different: assuming this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
make Your oxygen supply cut off?

https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that. From Your former mail I get the impression that you
prefer to rely on tests. I consider this a bad habit[1] and prefer
logical thinking.

So let's try that:
We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r".
We know this problem can be solidly worked around by delaying the
interface takedown for a short time.
Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.
I think this is not an additional problem; it is instead valuable
information (albeit not the one You might like to get).

Furthermore, we get these new crashes always invoked by "ifconfig", and
they seem to have in common that somebody tries to obtain information
about some interface configuration and receives something bogus.

I might conclude, just out of the belly without looking into details,
that either
 - Your patch manages to garble some internal interface data, instead
   of what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not manage to
   solve this, but only protects from the immediate consequence.

It might also be worth considering that, while the problem may be
easier to reproduce with epair, this effect may or may not be a
netgraph specific one[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in-vivo)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems is currently,
I only have two machines available: the graphical one where I'm just
typing, and the backend server with the jails that does practically
everything. Therefore, experimenting on any of them creates
considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so to get the current one free for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find on yardsales - and seldom for an acceptable price.)

cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us that
1 equals 1, which we knew already before.
In fact, tests tell us *nothing at all* about the state of our code,
and specifically, 'successful' outcomes do NOT mean that things are
all correct.
The only true usefulness of tests is to protect against re-introducing
a fault that was already fixed before, i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges and
then attaching the jails to them. Here is the bridge starter (only
respective component, there are more of these populated, but probably
not influencing the issue):

#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"
netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
    local _ifname
    if ngctl info svcswitch: > /dev/null 2>&1; then
        netgraphs_svc_stop
    fi
    echo "Creating SVC Switch"
    ngctl -f - < /dev/null 2>&1; then
        $_cmd
    else
        echo "netgraphs-start: object $i not found" >&2
    fi
done
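
The exec.poststop workaround described earlier in this message (wait a few seconds, then shut down the jail's netgraph node) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name delayed_ng_shutdown, the NGCTL override variable and the node name in the usage note are hypothetical; only the sleep-then-"ngctl shutdown" sequence and the 6 second delay come from the message.

```shell
#!/bin/sh
# Sketch of the delayed netgraph shutdown used as a workaround.
# NGCTL is a hypothetical override so the helper can be dry-run
# (e.g. NGCTL=echo) on a machine without netgraph.
NGCTL="${NGCTL:-/usr/sbin/ngctl}"

# delayed_ng_shutdown NODE [DELAY_SECONDS]
# Sleeps for DELAY_SECONDS (default 6, as in the quoted exec.poststop),
# then asks ngctl to shut down the node "NODE:".
delayed_ng_shutdown() {
    _node="$1"
    _delay="${2:-6}"
    sleep "$_delay"
    $NGCTL shutdown "${_node}:"
}
```

A hypothetical call from a jail's exec.poststop script would be `delayed_ng_shutdown nge_example_1l 6`; whether 6 seconds is enough on other systems is not established in the thread.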
'ifconfig lagg0.300 create' fails with 'SIOIFCREATE2: Device not configured' on r368459
But it works on r368184, and there is no problem on regular,
non-aggregated interfaces.

Can anyone give it a try?

Maybe related to:
https://svnweb.freebsd.org/base?view=revision=368346
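
For anyone willing to give it a try, the comparison between a lagg parent and a regular interface could be scripted roughly as follows. This is a sketch, not a verified test: the helper name try_vlan and the IFCONFIG override are hypothetical, and the only command taken from the report is "ifconfig lagg0.300 create" (the matching "destroy" is added here for cleanup).

```shell
#!/bin/sh
# Sketch: try to create (and then destroy) a VLAN interface named
# PARENT.TAG and report the outcome.
# IFCONFIG is a hypothetical override for dry runs (e.g. IFCONFIG=true).
IFCONFIG="${IFCONFIG:-ifconfig}"

# try_vlan PARENT TAG
try_vlan() {
    _if="${1}.${2}"
    if $IFCONFIG "$_if" create; then
        echo "ok: $_if"
        # Clean up the interface we just created.
        $IFCONFIG "$_if" destroy
    else
        echo "FAIL: $_if"
        return 1
    fi
}
```

Running `try_vlan lagg0 300` and then the same tag on a physical interface (name depends on the machine) on both r368184 and r368459 would show whether the failure is specific to aggregated interfaces.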
Re: Panic: 12.2 fails to use VIMAGE jails
Here is the next funny crashdump - I obtained this one twice and also
the sysctl_rtsock() again. I can reproduce this by just starting and
stopping a most simple jail that does only

    exec.start = "/bin/sleep 4 &";

(And as usual, when I let it time out, nothing bad happens.)

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer     = 0x20:0x80a2ac45
stack pointer           = 0x28:0xfe0047cf2890
frame pointer           = 0x28:0xfe0047cf2890
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13557 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 1
time = 1607469295
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0
vpanic() at vpanic+0x17b/frame 0xfe0047cf25f0
panic() at panic+0x43/frame 0xfe0047cf2650
trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0
trap() at trap+0x67/frame 0xfe0047cf27c0
calltrap() at calltrap+0x8/frame 0xfe0047cf27c0
--- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 0xfe0047cf2890 ---
strncmp() at strncmp+0x15/frame 0xfe0047cf2890
ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0
ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe3b8, rbp = 0x7fffe450 ---
Uptime: 8m54s
Dumping 880 out of 3959 MB:
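
The reproduction described above (start a trivial jail, then stop it before its exec.start command has finished on its own) could be turned into a loop along these lines. This is a minimal sketch under assumptions: the function name cycle_jail and the RUN dry-run variable are hypothetical, the jail name is whatever is defined in jail.conf, and the "service jail start/stop" commands are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch of a start/stop loop for one jail.
# RUN is a hypothetical dry-run prefix: it defaults to "echo" so the
# commands are only printed; set RUN="" (as root, on a test machine)
# to really execute them.
RUN="${RUN-echo}"

# cycle_jail NAME COUNT
# Starts and immediately stops jail NAME, COUNT times in a row.
cycle_jail() {
    _name="$1"
    _count="$2"
    _i=1
    while [ "$_i" -le "$_count" ]; do
        $RUN service jail start "$_name" || return 1
        $RUN service jail stop "$_name" || return 1
        _i=$((_i + 1))
    done
}
```

For example, `RUN="" cycle_jail testjail 10` would cycle a jail named testjail ten times; since the expected failure mode is a kernel panic, this belongs in a VM or on a scratch machine.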
Re: Panic: 12.2 fails to use VIMAGE jails
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most easily
! seen.

Ack.

! Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application jails (with all their installed stuff) it took about
ten up and down's then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80aad73c
stack pointer           = 0x28:0xfe003f80e810
frame pointer           = 0x28:0xfe003f80e810
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15486 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 0x7fffe358, rbp = 0x7fffe450 ---
Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
JID IP Address Hostname Path
1 1***gate.***.org /j/gate
3 1***raix.***.org /j/raix
4 oper.***.org /j/oper
5 admn.***.org /j/admn
6 data.***.org /j/data
7 conn.***.org /j/conn
8 kerb.***.org /j/kerb
9 tele.***.org /j/tele
10 rail.***.org /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
JID IP Address Hostname Path
1 1***gate.***.org /j/gate
3 1***raix.***.org /j/raix
4 oper.***.org /j/oper
5 admn.***.org /j/admn
6 data.***.org /j/data
7 conn.***.org /j/conn
root@edge:/var/crash # jls -d
JID IP Address Hostname Path
1 1***gate.***.org /j/gate
3 1***raix.***.org /j/raix
4 oper.***.org /j/oper
5 admn.***.org /j/admn
6 data.***.org /j/data
7 conn.***.org /j/conn
9 tele.***.org /j/tele
10 rail.***.org /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: Broken pipe

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfe00540ea658
frame pointer           = 0x28:0xfe00540ea670
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13420 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: stack backtrace:
db_trace_self_wrapper() at
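
In the transcript above, "jls -d" still lists tele and rail after plain "jls" no longer does, i.e. those jails were still in the dying state when kerb was started again. A small helper to spot such leftovers before restarting anything might look like this. It is a sketch only: the function name dying_jails is hypothetical, and it simply compares two newline-separated name lists such as the output of "jls name" and "jls -d name".

```shell
#!/bin/sh
# Sketch: print every entry that appears in the second list (all jails,
# including dying ones) but not in the first list (active jails).
# dying_jails ACTIVE_LIST ALL_LIST
dying_jails() {
    _active="$1"   # newline-separated names, e.g. from: jls name
    _all="$2"      # newline-separated names, e.g. from: jls -d name
    printf '%s\n' "$_all" | while IFS= read -r _j; do
        # Skip empty lines, then print names missing from the active list.
        [ -n "$_j" ] || continue
        printf '%s\n' "$_active" | grep -qxF -- "$_j" || printf '%s\n' "$_j"
    done
}
```

On the affected host this would be called as `dying_jails "$(jls name)" "$(jls -d name)"`; an empty result would mean no jail is lingering in the dying state.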
virtio-9p support in bhyve coming to 12?
Hi all, Any chance of the virtio-9p support in bhyve (to mount a host directory directly inside a VM) landing in 12.3? I ask mainly because r366413 from October says "MFC after: 1 month" (and I'd like to avoid setting up local NFS shares if I can help it). Thanks! M.
Re: Panic: 12.2 fails to use VIMAGE jails
On 8 Dec 2020, at 0:34, Peter wrote:

> Hi Kristof,
> it's great to read You!
>
> On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:
>
> ! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703,
> ! 250870.
>
> epair? No. It is purely Netgraph here.

Yeah, the bug is not exclusive to epair but that’s where it’s most
easily seen.

> ! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go into
> ! stable/12 sometime next week, but it’d be good to know that it fixes your
> ! problem too before I merge it.
> ! In other words: can you test a recent CURRENT? It’s likely fixed there, and
> ! if it’s not I may be able to fix it quickly.
>
> Oh my Gods. No offense meant, but this is not really a good time
> for that. This is the most horrible upgrade I experienced in 25
> years FreeBSD (and it was prepared, 12.2 did run fine on the other
> machine).
>
> I have issue with mem config
> https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
> I have issue with damaged filesystem, for no apparent reason
> https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/
> Then I have this issue here which is now gladly workarounded
> https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365
> and when I then dare to have a look at my applications, they look
> like sheer horror, segfaults all over, and I don't even know where
> to begin with these.
>
> Other option: can you make this fix so that I can patch it into 12.2
> source and just redeploy?

Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

That’s currently running the regression tests that used to provoke the
panic nearly instantly, and no panics so far.

Best regards.
Kristof