Re: More CARP issues under 12
So, another datapoint on this - I just PXE booted the 12.0-RELEASE image downloaded from https://mfsbsd.vx.sk/ and that works fine. Which means that it is either something which has crept in since 12.0-RELEASE, or it is something to do with my config on that machine.

I did try to build an mfsroot image of the kernel I am trying to deploy, but that failed, which is a bit of a shame, as that's easier to try than the full upgrade (because rolling that back after a crash is tricky!). The alternative is to build 12.0-RELEASE and see if that boots up. Not sure when I will get around to trying either of those though.

-pete.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: More CARP issues under 12
On 06/02/2019 12:16, Andrey V. Elsukov wrote:

> Hi, this doesn't look very useful. Do you have some specificity with this
> host except carp? Some modifications to kernel config, lagg, jails, etc.

No, none of those. It's a Supermicro motherboard, runs FreeBSD GENERIC and mysql+redis on top, that's it. The only oddity is carp (used to fail over the redis), but the panic happens when I disable carp and have removed all the ports too.

My only customisation to the build is to disable sendmail and lpr. We do use geli for the drives, and load aesni as a module as well to speed that up.

loader.conf below:

kern.geom.label.disk_ident.enable=0
kern.geom.label.gptid.enable=0
ahci_load="YES"
console="comconsole"
aesni_load="YES"
cryptodev_load="YES"
geom_eli_load="YES"
carp_load="YES"
zfs_load="YES"
vfs.zfs.arc_max="1G"
vfs.zfs.prefetch_disable="1"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.cache.size="10M"
vfs.zfs.vdev.cache.max="10M"

rc.conf below:

geli_enable="YES"
geli_autodetach="NO"
geli_devices="ada0p4 ada1p4"
hostname="serpentine-passive.telehouse-internal.ingresso.co.uk"
ifconfig_igb0="inet 10.32.10.4/16"
ifconfig_igb0_ipv6="inet6 2a02:1658:1:2:e550::4/64"
ifconfig_igb0_alias0="inet 10.32.10.8/16 vhid 80 advskew 160 pass redacted"
defaultrouter="10.32.10.6"
ipv6_defaultrouter="2a02:1658:1:2:e550::6"
ifconfig_igb1="down"
pf_enable="NO"
pf_rules="/usr/local/etc/pf.conf"
redis_enable="YES"
stunnel_enable="YES"
mysql_enable="YES"
mysql_dbdir="/usr/home/mysql/data"
tsw_redis_capture_enable="YES"
tsw_redis_capture_if="igb0"
datadog_enable="YES"
datadog_user="root"
datadog_chdir="/usr/local/datadog"
sshd_enable="YES"
named_enable="YES"
zfs_enable="YES"
ntpd_enable="YES"
syslogd_enable="NO"
syslog_ng_enable="YES"
exim_enable="YES"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
nfs_server_enable="NO"
nfs_client_enable="YES"
nfsv4_server_enable="NO"
nfsuserd_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_lockd_flags="-p 819"
rpc_statd_enable="YES"
rpc_statd_flags="-p 823"
mountd_enable="NO"
fluentd_enable="YES"

The tsw_redis_capture script just sets the carp to MASTER if redis is enabled - which means that if the machine boots without redis running then carp won't grab the address anyway.
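All of the carp settings on this host live in the single ifconfig_igb0_alias0 string in rc.conf. As a rough sketch of how that line breaks down, the Python below splits it into the shared address and its keyword/value options. The helper name parse_ifconfig_alias is made up for illustration, and the parsing is deliberately naive: it assumes the simple "family address keyword value ..." form used here and does not attempt the full ifconfig(8) grammar.

```python
def parse_ifconfig_alias(rc_line):
    """Split an rc.conf ifconfig_* line into name, address and options.

    Hypothetical helper: only handles 'family addr key value ...' strings.
    """
    name, _, value = rc_line.partition("=")
    # Drop the surrounding double quotes, then split on whitespace.
    words = value.strip('"').split()
    family, address, rest = words[0], words[1], words[2:]
    # The remaining words are assumed to come in keyword/value pairs.
    options = dict(zip(rest[0::2], rest[1::2]))
    return {"name": name, "family": family, "address": address, "options": options}


alias = parse_ifconfig_alias(
    'ifconfig_igb0_alias0="inet 10.32.10.8/16 vhid 80 advskew 160 pass redacted"'
)
print(alias["address"], alias["options"])
```

For the line above this gives the address 10.32.10.8/16 with vhid 80 and advskew 160. A non-zero advskew makes a host advertise more slowly, so it normally stays the backup, which fits the "passive" in the hostname.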
Re: More CARP issues under 12
On 05.02.2019 18:06, Pete French wrote:
> The branch and revision is 12.0-STABLE r343538 GENERIC
>
>> # kgdb
>>
>> (kgdb) list *ether_output+0x6b6
>
> trying to do this on the actual box is hard, as it panics, but on another
> machine running the same build I get this, which should suffice if you
> are just interested in seeing the line in the source code ?
>
> (kgdb) list *ether_output+0x6b6
> 0x80ca1526 is in ether_output (/usr/src/sys/net/if_ethersubr.c:435).
> 430                     if (m == NULL)
> 431                             return (0);
> 432             }
> 433
> 434             /* Continue with link-layer output */
> 435             return ether_output_frame(ifp, m);
> 436     }
> 437
> 438     static bool
> 439     ether_set_pcp(struct mbuf **mp, struct ifnet *ifp, uint8_t pcp)

Hi,

this doesn't look very useful. Do you have some specificity with this host except carp? Some modifications to kernel config, lagg, jails, etc.

--
WBR, Andrey V. Elsukov
Re: More CARP issues under 12
> Hi,
>
> What branch and revision do you use? Can you install gdb and then obtain
> this information:

The branch and revision is 12.0-STABLE r343538 GENERIC

> # kgdb
>
> (kgdb) list *ether_output+0x6b6

Trying to do this on the actual box is hard, as it panics, but on another machine running the same build I get this, which should suffice if you are just interested in seeing the line in the source code?

(kgdb) list *ether_output+0x6b6
0x80ca1526 is in ether_output (/usr/src/sys/net/if_ethersubr.c:435).
430                     if (m == NULL)
431                             return (0);
432             }
433
434             /* Continue with link-layer output */
435             return ether_output_frame(ifp, m);
436     }
437
438     static bool
439     ether_set_pcp(struct mbuf **mp, struct ifnet *ifp, uint8_t pcp)
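The kgdb output ties the backtrace entry to a source line: frame #7 is reported as ether_output+0x6b6, and on this build kgdb resolves that to address 0x80ca1526. Subtracting the offset from the address gives the start of the function, which is a handy number to have when checking whether some other address belongs to the same function. A small sketch in Python; the helper name symbol_base is only for illustration, and the addresses are copied exactly as they appear in this mail:

```python
def symbol_base(address, offset):
    """Start address of a function, given an 'address at symbol+offset' pair."""
    return address - offset


# Values copied from the kgdb output in this message.
frame_address = 0x80CA1526
frame_offset = 0x6B6

base = symbol_base(frame_address, frame_offset)
print(hex(base))  # 0x80ca0e70
```

Note that this only locates the function start in this particular kernel build; a rebuild moves the addresses, which is why the question asks for the offset form rather than a raw address.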
Re: More CARP issues under 12
On 17.01.2019 15:19, Pete French wrote:
> so, having got a workaround for yesterday's problems, I now went to upgrade my
> other pair of boxes using CARP. No 'pf' on these, just one shared address.
> This is the setup I have tested in development and it works fine.
>
> I install the new kernel and do the first reboot - and I get the panic
> below. Maybe it's not carp related, but seems suspicious as the last
> thing it spits out is a carp message.
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80ca0de1
> stack pointer           = 0x28:0xfe4da740
> frame pointer           = 0x28:0xfe4da760
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1547727391
> KDB: stack backtrace:
> #0 0x80be8597 at kdb_backtrace+0x67
> #1 0x80b9ccf3 at vpanic+0x1a3
> #2 0x80b9cb43 at panic+0x43
> #3 0x8107382f at trap_fatal+0x35f
> #4 0x81073889 at trap_pfault+0x49
> #5 0x81072eae at trap+0x29e
> #6 0x8104e1a5 at calltrap+0x8
> #7 0x80ca0ce6 at ether_output+0x6b6
> #8 0x80d0bda4 at arprequest+0x4c4
> #9 0x80d0d9fc at garp_rexmit+0xbc
> #10 0x80bb6ba9 at softclock_call_cc+0x129
> #11 0x80bb7089 at softclock+0x79
> #12 0x80b60e79 at ithread_loop+0x169
> #13 0x80b5e012 at fork_exit+0x82
> #14 0x8104f18e at fork_trampoline+0xe
> Uptime: 19s

Hi,

What branch and revision do you use? Can you install gdb and then obtain this information:

# kgdb

(kgdb) list *ether_output+0x6b6

--
WBR, Andrey V. Elsukov
Kernel panic going multiuser under 12 ( was Re: More CARP issues under 12 (maybe not CARP after all))
Just to get the subject correct, as I tested this disabling CARP and I still see the panic when going multi-user. It is networking related, as the panic is in the ARP code, and seems to happen when the network interfaces are configured. The machine was using a mix of em and igb interfaces, but is now igb only.

-pete.
Re: More CARP issues under 12 (maybe not CARP after all)
> > To point out the obvious, booting a 12.0 kernel with 11.0 userland to
> > multiuser mode is seriously unsupported. You really need to boot to
> > single user and install 12.0 userland to really expect things to work.
>
> Yes, good point. This has worked on every other machine I have upgraded
> from 11 to 12, which is why I didn't think of that, but then all the
> motherboards are slightly different.

So, I went back to this, and did it properly. Booted single user mode, which worked, then installed world, mergemaster, and rebooted single user mode.

...and I get a kernel panic as I did before. So it wasn't the 11 world with the 12 kernel after all. Panic is reproduced below. I am somewhat stuck now though - where do I go from here?

Feeding entropy
lo0: link state changed to UP
carp: demoted by 240 to 240 (interface down)

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80ca1621
stack pointer           = 0x28:0xfe4da740
frame pointer           = 0x28:0xfe4da760
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1549292394
KDB: stack backtrace:
#0 0x80be8d57 at kdb_backtrace+0x67
#1 0x80b9d293 at vpanic+0x1a3
#2 0x80b9d0e3 at panic+0x43
#3 0x8107384f at trap_fatal+0x35f
#4 0x810738a9 at trap_pfault+0x49
#5 0x81072ece at trap+0x29e
#6 0x8104ee55 at calltrap+0x8
#7 0x80ca1526 at ether_output+0x6b6
#8 0x80d0c824 at arprequest+0x4c4
#9 0x80d0e47c at garp_rexmit+0xbc
#10 0x80bb7169 at softclock_call_cc+0x129
#11 0x80bb7649 at softclock+0x79
#12 0x80b613a4 at ithread_loop+0x1d4
#13 0x80b5e2d2 at fork_exit+0x82
#14 0x8104fe3e at fork_trampoline+0xe
Uptime: 11s
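One way to check that this is the same crash as the 17 January panic, and not a new one, is to compare where the faulting instruction sits relative to the start of ether_output in each kernel. The absolute addresses differ because the kernel was rebuilt, but the relative position should match if the same code is faulting. A minimal sketch in Python, using only numbers copied from the two panic messages; the function name is illustrative:

```python
def offset_from_function_start(instruction_pointer, frame_address, frame_offset):
    """Distance of the faulting instruction from the start of the frame's function."""
    function_start = frame_address - frame_offset
    return instruction_pointer - function_start


# (instruction pointer, frame #7 address, frame #7 offset) from the two panics
first_panic = (0x80CA0DE1, 0x80CA0CE6, 0x6B6)   # time = 1547727391
second_panic = (0x80CA1621, 0x80CA1526, 0x6B6)  # time = 1549292394

for label, panic in (("first", first_panic), ("second", second_panic)):
    print(label, hex(offset_from_function_start(*panic)))
```

Both come out as 0x7b1, and both panics report the same fault virtual address of 0x28. That suggests, without proving it, that the two boots died at the same instruction. The arithmetic alone does not say which function that instruction belongs to; that needs the symbol table of the kernel in question.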
Re: More CARP issues under 12 (maybe not CARP after all)
> To point out the obvious, booting a 12.0 kernel with 11 userland to
> multiuser mode is seriously unsupported. You really need to boot to
> single user and install 12.0 userland to really expect things to work.

Yes, good point. This has worked on every other machine I have upgraded from 11 to 12, which is why I didn't think of that, but then all the motherboards are slightly different.

> Is there a reason that a standalone boot is not possible?

Sort of - I am on a serial console to do this, which works in the BIOS, and works after the kernel has started booting, but does not work in the loader for some reason, so I can't select single user. So I go to single user by booting multi user and then shutting down. Of course I could use nextboot, so it's just laziness on my part actually.

Thanks for pointing this out. I immediately jumped to the CARP conclusion due to last week's experiences on the other machine, but actually this is far more likely to be the issue.

-pete.

PS: apparently I have been playing fast and loose with this - and bothering the mailing list about it - since 2005... :-)

http://freebsd.1045724.x6.nabble.com/upgrading-5-4-gt-6-0-without-reinstalling-safe-td3932902.html

Time to change my ways I think!
Re: More CARP issues under 12
On Thu, Jan 17, 2019 at 4:21 AM Pete French wrote:

> so, having got a workaround for yesterday's problems, I now went to upgrade
> my other pair of boxes using CARP. No 'pf' on these, just one shared address.
> This is the setup I have tested in development and it works fine.
>
> I install the new kernel and do the first reboot - and I get the panic
> below. Maybe it's not carp related, but seems suspicious as the last
> thing it spits out is a carp message.
>
> Note this is the first reboot - so 12.0 kernel with the 11 userland, in
> preparation for the installworld step. Machine is booting from ZFS and
> also has a GELI partition containing some data which requires a manual
> password.

To point out the obvious, booting a 12.0 kernel with 11 userland to multiuser mode is seriously unsupported. You really need to boot to single user and install 12.0 userland to really expect things to work.

OTOH, while I would expect that MANY things might not work, panics should not come from problems in userland. Still, I would not be at all shocked if it turns out to be coincidental to CARP. I would also not be shocked if this makes no difference, but even between minor version updates, there can be issues when the kernel and userland are different versions.

Is there a reason that a standalone boot is not possible? Both installworld and mergemaster should be run before moving to multiuser mode, and a reboot is the preferred way to exit single user. In some cases even delete-old can be required before safely going to multiuser mode.

--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
More CARP issues under 12
So, having got a workaround for yesterday's problems, I now went to upgrade my other pair of boxes using CARP. No 'pf' on these, just one shared address. This is the setup I have tested in development and it works fine.

I install the new kernel and do the first reboot - and I get the panic below. Maybe it's not carp related, but it seems suspicious as the last thing it spits out is a carp message.

Note this is the first reboot - so 12.0 kernel with the 11 userland, in preparation for the installworld step. Machine is booting from ZFS and also has a GELI partition containing some data which requires a manual password.

Setting hostuuid: 49434d53-0200-9031-2500-31902500d9bf.
Setting hostid: 0xf41a3f2e.
Starting file system checks:
Mounting local filesystems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: serpentine-passive.telehouse-internal.ingresso.co.uk.
Setting up harvesting: PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: .
lo0: link state changed to UP
em0: promiscuous mode enabled
carp: demoted by 240 to 240 (interface down)

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80ca0de1
stack pointer           = 0x28:0xfe4da740
frame pointer           = 0x28:0xfe4da760
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1547727391
KDB: stack backtrace:
#0 0x80be8597 at kdb_backtrace+0x67
#1 0x80b9ccf3 at vpanic+0x1a3
#2 0x80b9cb43 at panic+0x43
#3 0x8107382f at trap_fatal+0x35f
#4 0x81073889 at trap_pfault+0x49
#5 0x81072eae at trap+0x29e
#6 0x8104e1a5 at calltrap+0x8
#7 0x80ca0ce6 at ether_output+0x6b6
#8 0x80d0bda4 at arprequest+0x4c4
#9 0x80d0d9fc at garp_rexmit+0xbc
#10 0x80bb6ba9 at softclock_call_cc+0x129
#11 0x80bb7089 at softclock+0x79
#12 0x80b60e79 at ithread_loop+0x169
#13 0x80b5e012 at fork_exit+0x82
#14 0x8104f18e at fork_trampoline+0xe
Uptime: 19s
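The "time =" line in the panic is a count of seconds since the Unix epoch, so it can be turned into a wall-clock time when matching a console capture against other logs. A short sketch in Python; the helper name panic_time_utc is only for illustration:

```python
from datetime import datetime, timezone


def panic_time_utc(epoch_seconds):
    """Convert the 'time =' value from a panic message to a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(panic_time_utc(1547727391).isoformat())  # 2019-01-17T12:16:31+00:00
```

The result is in UTC regardless of the timezone configured on the host, so it needs converting before comparing with local-time log entries.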