Re: More CARP issues under 12

2019-02-08 Thread Pete French
So, another datapoint on this - I just PXE booted the 12.0-RELEASE image
downloaded from https://mfsbsd.vx.sk/ and that works fine. Which means
that it is either something which has crept in since 12.0-RELEASE or it's
something to do with my config on that machine.

I did try to build an mfsroot image of the kernel I am trying to
deploy, but that failed, which is a bit of a shame, as that's easier to
try than the full upgrade (because rolling that back after a crash
is tricky!). The alternative is to build 12.0-RELEASE and see if that boots
up. Not sure when I will get around to trying either of those though.

-pete.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: More CARP issues under 12

2019-02-06 Thread Pete French




On 06/02/2019 12:16, Andrey V. Elsukov wrote:
> Hi,
>
> This doesn't look very useful.
> Is there anything specific about this host apart from carp? Any
> modifications to the kernel config, lagg, jails, etc.?


No, none of those. It's a Supermicro motherboard, runs FreeBSD
GENERIC and mysql+redis on top, that's it. The only oddity is
carp (used to fail over the redis), but the panic still happens when
I disable carp, even with all the ports removed. My only customisation
to the build is to disable sendmail and lpr.

We do use geli for the drives, and load aesni as a module as well to
speed that up.


loader.conf below:

kern.geom.label.disk_ident.enable=0
kern.geom.label.gptid.enable=0

ahci_load="YES"
console="comconsole"

aesni_load="YES"
cryptodev_load="YES"
geom_eli_load="YES"
carp_load="YES"

zfs_load="YES"
vfs.zfs.arc_max="1G"
vfs.zfs.prefetch_disable="1"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.cache.size="10M"
vfs.zfs.vdev.cache.max="10M"

rc.conf below

geli_enable="YES"
geli_autodetach="NO"
geli_devices="ada0p4 ada1p4"

hostname="serpentine-passive.telehouse-internal.ingresso.co.uk"

ifconfig_igb0="inet 10.32.10.4/16"
ifconfig_igb0_ipv6="inet6 2a02:1658:1:2:e550::4/64"
ifconfig_igb0_alias0="inet 10.32.10.8/16 vhid 80 advskew 160 pass redacted"

defaultrouter="10.32.10.6"
ipv6_defaultrouter="2a02:1658:1:2:e550::6"

ifconfig_igb1="down"

pf_enable="NO"
pf_rules="/usr/local/etc/pf.conf"

redis_enable="YES"
stunnel_enable="YES"

mysql_enable="YES"
mysql_dbdir="/usr/home/mysql/data"

tsw_redis_capture_enable="YES"
tsw_redis_capture_if="igb0"

datadog_enable="YES"
datadog_user="root"
datadog_chdir="/usr/local/datadog"

sshd_enable="YES"
named_enable="YES"
zfs_enable="YES"
ntpd_enable="YES"

syslogd_enable="NO"
syslog_ng_enable="YES"

exim_enable="YES"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"

nfs_server_enable="NO"
nfs_client_enable="YES"
nfsv4_server_enable="NO"
nfsuserd_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_lockd_flags="-p 819"
rpc_statd_enable="YES"
rpc_statd_flags="-p 823"
mountd_enable="NO"

fluentd_enable="YES"

The tsw_redis_capture script just sets the carp to MASTER if redis is
enabled - this means that if the machine boots without redis running then
carp won't grab the address anyway.
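
The tsw_redis_capture script itself is not shown in the thread. As a rough
sketch of the behaviour described above, something along these lines would
promote the CARP vhid only when redis is enabled. The function name
carp_promote_cmd and the IFACE/VHID variables are made up for illustration;
the interface and vhid defaults are taken from the rc.conf above.

```sh
#!/bin/sh
# Hypothetical sketch only: the real tsw_redis_capture script is not shown.
# IFACE and VHID are illustrative names; defaults come from the rc.conf above.
IFACE="${IFACE:-igb0}"   # tsw_redis_capture_if
VHID="${VHID:-80}"       # vhid on ifconfig_igb0_alias0

# Print the ifconfig command that forces this host to MASTER when the
# given rc.conf-style flag is YES; print nothing and fail otherwise.
carp_promote_cmd() {
    case "$1" in
        [Yy][Ee][Ss])
            echo "ifconfig ${IFACE} vhid ${VHID} state master"
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage (as root, on the CARP host):
#   cmd=$(carp_promote_cmd "$redis_enable") && $cmd
```

Printing the command rather than running it keeps the sketch safe to try on a
machine that has no CARP configuration at all.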




Re: More CARP issues under 12

2019-02-06 Thread Andrey V. Elsukov
On 05.02.2019 18:06, Pete French wrote:
> The branch and revision is 12.0-STABLE r343538 GENERIC
> 
>> # kgdb
>>
>> (kgdb) list *ether_output+0x6b6
> 
> Trying to do this on the actual box is hard, as it panics, but on another
> machine running the same build I get this, which should suffice if you
> are just interested in seeing the line in the source code?
> 
> (kgdb)  list *ether_output+0x6b6
> 0x80ca1526 is in ether_output (/usr/src/sys/net/if_ethersubr.c:435).
> 430 if (m == NULL)
> 431 return (0);
> 432 }
> 433
> 434 /* Continue with link-layer output */
> 435 return ether_output_frame(ifp, m);
> 436 }
> 437
> 438 static bool
> 439 ether_set_pcp(struct mbuf **mp, struct ifnet *ifp, uint8_t pcp)

Hi,

This doesn't look very useful.
Is there anything specific about this host apart from carp? Any
modifications to the kernel config, lagg, jails, etc.?

-- 
WBR, Andrey V. Elsukov





Re: More CARP issues under 12

2019-02-05 Thread Pete French
> Hi,
>
> What branch and revision do you use? Can you install gdb and then obtain
> this information:

The branch and revision is 12.0-STABLE r343538 GENERIC

> # kgdb
>
> (kgdb) list *ether_output+0x6b6

Trying to do this on the actual box is hard, as it panics, but on another
machine running the same build I get this, which should suffice if you
are just interested in seeing the line in the source code?

(kgdb)  list *ether_output+0x6b6
0x80ca1526 is in ether_output (/usr/src/sys/net/if_ethersubr.c:435).
430                     if (m == NULL)
431                             return (0);
432             }
433
434             /* Continue with link-layer output */
435             return ether_output_frame(ifp, m);
436     }
437
438     static bool
439     ether_set_pcp(struct mbuf **mp, struct ifnet *ifp, uint8_t pcp)
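
A note on why the listing above may be less helpful than hoped: the address in
frame #7 of the backtrace (ether_output+0x6b6) is a return address inside
ether_output, whereas the "instruction pointer" line in the panic header is
where the fault was actually taken. The sketch below works out how far that
instruction pointer lies from the start of ether_output, using the values from
the 2019-02-04 panic in this thread. The helper name fault_offset is invented
for illustration.

```sh
#!/bin/sh
# Worked arithmetic on values copied from the 2019-02-04 panic in this thread.
# fault_offset is a hypothetical helper name, not part of any tool.

# fault_offset IP FRAME_ADDR FRAME_OFFSET
# Prints how far IP lies from the start of the function named in the frame.
fault_offset() {
    func_start=$(( $2 - $3 ))
    printf '0x%x\n' $(( $1 - func_start ))
}

# instruction pointer 0x80ca1621; frame #7 is 0x80ca1526 at ether_output+0x6b6
fault_offset 0x80ca1621 0x80ca1526 0x6b6    # prints 0x7b1
```

The January panic (0x80ca0de1 against 0x80ca0ce6) gives the same 0x7b1, which
suggests the same instruction is faulting in both builds. That offset may well
fall outside ether_output itself, so asking kgdb about the absolute address
(list *ADDRESS, using the full instruction pointer as printed on the console)
is probably the more informative query. The addresses as they appear in this
archive seem to have lost their leading digits, so take them from the console
rather than from here.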




Re: More CARP issues under 12

2019-02-05 Thread Andrey V. Elsukov
On 17.01.2019 15:19, Pete French wrote:
> so, having got a workaround for yesterday's problems, I now went to upgrade my
> other pair of boxes using CARP. No 'pf' on these, just one shared address.
> This is the setup I have tested in development and it works fine.
>
> I install the new kernel and do the first reboot - and I get the panic
> below. Maybe it's not carp related, but seems suspicious as the last
> thing it spits out is a carp message.
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80ca0de1
> stack pointer           = 0x28:0xfe4da740
> frame pointer           = 0x28:0xfe4da760
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1547727391
> KDB: stack backtrace:
> #0 0x80be8597 at kdb_backtrace+0x67
> #1 0x80b9ccf3 at vpanic+0x1a3
> #2 0x80b9cb43 at panic+0x43
> #3 0x8107382f at trap_fatal+0x35f
> #4 0x81073889 at trap_pfault+0x49
> #5 0x81072eae at trap+0x29e
> #6 0x8104e1a5 at calltrap+0x8
> #7 0x80ca0ce6 at ether_output+0x6b6
> #8 0x80d0bda4 at arprequest+0x4c4
> #9 0x80d0d9fc at garp_rexmit+0xbc
> #10 0x80bb6ba9 at softclock_call_cc+0x129
> #11 0x80bb7089 at softclock+0x79
> #12 0x80b60e79 at ithread_loop+0x169
> #13 0x80b5e012 at fork_exit+0x82
> #14 0x8104f18e at fork_trampoline+0xe
> Uptime: 19s

Hi,

What branch and revision do you use? Can you install gdb and then obtain
this information:

# kgdb

(kgdb) list *ether_output+0x6b6

-- 
WBR, Andrey V. Elsukov





Kernel panic going multiuser under 12 ( was Re: More CARP issues under 12 (maybe not CARP after all))

2019-02-05 Thread Pete French




Just to get the subject correct, as I tested this with CARP disabled and I
still see the panic when going multi-user. It is networking related as the
panic is in the ARP code, and seems to happen when the network
interfaces are configured. The machine was using a mix of em and igb
interfaces, but is now igb only.


-pete.


Re: More CARP issues under 12 (maybe not CARP after all)

2019-02-04 Thread Pete French
> > To point out the obvious, booting a 12.0 kernel with 11.0 userland to 
> > multiuser mode is seriously unsupported. You really need to boot to 
> > single user and install 12.0 userland to really expect things to work.
>
> Yes, good point. This has worked on every other machine I have upgraded
> from 11 to 12, which is why I didn't think of that, but then all the
> motherboards are slightly different.

So, I went back to this, and did it properly. Booted single user mode,
which worked, then installed world, ran mergemaster, and rebooted single
user mode.

...and I get a kernel panic as I did before. So it wasn't the 11 world with
the 12 kernel after all. Panic is reproduced below. I am somewhat stuck
now though - where do I go from here?

Feeding entropy
lo0: link state changed to UP
carp: demoted by 240 to 240 (interface down)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80ca1621
stack pointer           = 0x28:0xfe4da740
frame pointer           = 0x28:0xfe4da760
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1549292394
KDB: stack backtrace:
#0 0x80be8d57 at kdb_backtrace+0x67
#1 0x80b9d293 at vpanic+0x1a3
#2 0x80b9d0e3 at panic+0x43
#3 0x8107384f at trap_fatal+0x35f
#4 0x810738a9 at trap_pfault+0x49
#5 0x81072ece at trap+0x29e
#6 0x8104ee55 at calltrap+0x8
#7 0x80ca1526 at ether_output+0x6b6
#8 0x80d0c824 at arprequest+0x4c4
#9 0x80d0e47c at garp_rexmit+0xbc
#10 0x80bb7169 at softclock_call_cc+0x129
#11 0x80bb7649 at softclock+0x79
#12 0x80b613a4 at ithread_loop+0x1d4
#13 0x80b5e2d2 at fork_exit+0x82
#14 0x8104fe3e at fork_trampoline+0xe
Uptime: 11s




Re: More CARP issues under 12 (maybe not CARP after all)

2019-01-20 Thread Pete French
> To point out the obvious, booting a 12.0 kernel with 11 userland to
> multiuser mode is seriously unsupported. You really need to boot to
> single user and install 12.0 userland to really expect things to work.

Yes, good point. This has worked on every other machine I have upgraded
from 11 to 12, which is why I didn't think of that, but then all the
motherboards are slightly different.

> Is there a reason that a standalone boot is not possible?


Sort of - I am on a serial console to do this, which works in the BIOS,
and works after the kernel has started booting, but does not work in the
loader for some reason, so I can't select single user. So I go to single
user by booting multi user and then shutting down. Of course I could use
nextboot, so it's just laziness on my part actually.
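
For reference, the nextboot route mentioned above might look like the sketch
below. The run helper and the APPLY variable are illustrative additions that
make the script print the commands unless explicitly told to execute them; the
nextboot arguments follow the single-user example in nextboot(8). Whether
nextboot behaves as a one-shot on a ZFS-booted machine like this one is not
confirmed anywhere in the thread, so check the man page for your release first.

```sh
#!/bin/sh
# Sketch: ask the loader for single-user mode on the next boot only.
# run and APPLY are illustrative; by default commands are printed, not run.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

single_user_next_boot() {
    # -o "-s" passes the single-user flag; -k kernel keeps the default kernel
    run nextboot -o "-s" -k kernel &&
    run shutdown -r now
}

# Usage: APPLY=1 single_user_next_boot
```

Defaulting to a dry run is a deliberate choice here, since a mistyped reboot
command on a box reachable only by serial console is expensive to undo.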

Thanks for pointing this out. I immediately jumped to the CARP
conclusion due to last week's experiences on the other machine,
but actually this is far more likely to be the issue.

-pete.

PS: apparently I have been playing fast and loose with this - and 
bothering the mailing list about it - since 2005... :-)


http://freebsd.1045724.x6.nabble.com/upgrading-5-4-gt-6-0-without-reinstalling-safe-td3932902.html

Time to change my ways I think!


Re: More CARP issues under 12

2019-01-19 Thread Kevin Oberman
On Thu, Jan 17, 2019 at 4:21 AM Pete French wrote:

> so, having got a workaround for yesterday's problems, I now went to upgrade my
> other pair of boxes using CARP. No 'pf' on these, just one shared address.
> This is the setup I have tested in development and it works fine.
>
> I install the new kernel and do the first reboot - and I get the panic
> below. Maybe it's not carp related, but seems suspicious as the last
> thing it spits out is a carp message.
>
> Note this is the first reboot - so 12.0 kernel with the 11 userland, in
> preparation for the installworld step. Machine is booting from ZFS and also
> has a GELI partition containing some data, which requires a manual password.
>

To point out the obvious, booting a 12.0 kernel with 11 userland to
multiuser mode is seriously unsupported. You really need to boot to single
user and install 12.0 userland to really expect things to work. OTOH, while
I would expect that MANY things might not work, panics should not come from
problems in userland. Still, I would not be at all shocked if it turns out
to be coincidental to CARP. I would also not be shocked if this makes no
difference, but even between minor version updates, there can be issues
when the kernel and userland are different versions.

Is there a reason that a standalone boot is not possible? Both installworld
and mergemaster should be run before moving to multiuser mode and reboot is
preferred to exit. In some cases even delete-old can be required before
safely going to multiuser mode.
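
The ordering described above roughly matches the source-upgrade sequence
documented in /usr/src/UPDATING. The sketch below groups it into two phases
separated by a reboot into single-user mode. The function names are invented
for illustration, the KERNCONF value assumes a GENERIC kernel as reported
elsewhere in this thread, and the exact mergemaster flags should be checked
against UPDATING for the release being installed.

```sh
#!/bin/sh
# Sketch of the upgrade order; nothing runs until a function is called.
# Function names are illustrative. Check /usr/src/UPDATING before use.

# Phase 1: on the running (old) system, in multi-user mode.
build_and_install_kernel() {
    cd /usr/src &&
    make buildworld &&
    make buildkernel KERNCONF=GENERIC &&
    make installkernel KERNCONF=GENERIC
    # Next: reboot into single-user mode on the new kernel.
}

# Phase 2: in single-user mode, with filesystems mounted read-write.
install_world() {
    cd /usr/src &&
    # Pre-world pass over essential files before the new userland goes in.
    mergemaster -Fp &&
    make installworld &&
    # Merge the remaining configuration files.
    mergemaster -Fi &&
    make delete-old
    # Next: reboot, rather than exiting straight to multi-user mode.
}
```

Splitting the steps at the reboot makes it harder to run installworld by
accident while the old kernel is still the one booted.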
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683


More CARP issues under 12

2019-01-17 Thread Pete French
So, having got a workaround for yesterday's problems, I now went to upgrade my
other pair of boxes using CARP. No 'pf' on these, just one shared address.
This is the setup I have tested in development and it works fine.

I install the new kernel and do the first reboot - and I get the panic
below. Maybe it's not carp related, but seems suspicious as the last
thing it spits out is a carp message.

Note this is the first reboot - so 12.0 kernel with the 11 userland, in
preparation for the installworld step. Machine is booting from ZFS and also
has a GELI partition containing some data, which requires a manual password.



Setting hostuuid: 49434d53-0200-9031-2500-31902500d9bf.
Setting hostid: 0xf41a3f2e.
Starting file system checks:
Mounting local filesystems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: serpentine-passive.telehouse-internal.ingresso.co.uk.
Setting up harvesting: PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: .
lo0: link state changed to UP
em0: promiscuous mode enabled
carp: demoted by 240 to 240 (interface down)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80ca0de1
stack pointer           = 0x28:0xfe4da740
frame pointer           = 0x28:0xfe4da760
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1547727391
KDB: stack backtrace:
#0 0x80be8597 at kdb_backtrace+0x67
#1 0x80b9ccf3 at vpanic+0x1a3
#2 0x80b9cb43 at panic+0x43
#3 0x8107382f at trap_fatal+0x35f
#4 0x81073889 at trap_pfault+0x49
#5 0x81072eae at trap+0x29e
#6 0x8104e1a5 at calltrap+0x8
#7 0x80ca0ce6 at ether_output+0x6b6
#8 0x80d0bda4 at arprequest+0x4c4
#9 0x80d0d9fc at garp_rexmit+0xbc
#10 0x80bb6ba9 at softclock_call_cc+0x129
#11 0x80bb7089 at softclock+0x79
#12 0x80b60e79 at ithread_loop+0x169
#13 0x80b5e012 at fork_exit+0x82
#14 0x8104f18e at fork_trampoline+0xe
Uptime: 19s