Re: bridge(4) and IPv6 broken?
On 2024-01-02 00:40, Lexi Winter wrote:

> hello,
>
> i'm having an issue with bridge(4) and IPv6, with a configuration which
> is essentially identical to a working system running releng/14.0.
>
> ifconfig:
>
> lo0: flags=1008049 metric 0 mtu 16384
>         options=680003
>         inet 127.0.0.1 netmask 0xff00
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>         groups: lo
>         nd6 options=21
> pflog0: flags=1000141 metric 0 mtu 33152
>         options=0
>         groups: pflog
> alc0: flags=1008943 metric 0 mtu 1500
>         options=c3098
>         ether 30:9c:23:a8:89:a0
>         inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
>         media: Ethernet autoselect (1000baseT )
>         status: active
>         nd6 options=1
> wg0: flags=10080c1 metric 0 mtu 1420
>         options=8
>         inet 172.16.145.21 netmask 0x
>         inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
>         groups: wg
>         tunnelfib: 1
>         nd6 options=101
> bridge0: flags=1008843 metric 0 mtu 1500
>         options=0
>         ether 58:9c:fc:10:ff:b6
>         inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
>         inet6 2001:8b0:aab5:104:3::101 prefixlen 64
>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>         member: tap0 flags=143
>                 ifmaxaddr 0 port 6 priority 128 path cost 200
>         member: alc0 flags=143
>                 ifmaxaddr 0 port 3 priority 128 path cost 55
>         groups: bridge
>         nd6 options=1
> tap0: flags=9903 metric 0 mtu 1500
>         options=8
>         ether 58:9c:fc:10:ff:89
>         groups: tap
>         media: Ethernet 1000baseT
>         status: no carrier
>         nd6 options=29
>
> the issue is that the bridge doesn't seem to respond to IPv6 ICMP
> Neighbour Solicitation. for example, while running ping, tcpdump shows
> this:
>
> 23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16
> 23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
> 23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16
> 23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
> 23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112
> 23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16
> 23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16
> 23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
>
> fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
> solicitations but FreeBSD doesn't send a response. if i remove alc0
> from the bridge and configure the IPv6 address directly on alc0
> instead, everything works fine.
>
> i'm testing without any packet filter (ipfw/pf) in the kernel.
>
> it's possible i'm missing something obvious here; does anyone have an
> idea?

Just an idea. I'm not sure if it is the right track...

There is code in the kernel which ignores NS messages from "non-valid"
sources (for security / anti-spoofing reasons). The NS request comes from
a link-local address. Your bridge has no link-local address (and your tap
has the auto linklocal flag set, which I would have expected to be on the
bridge instead). I'm not sure, but I would guess this could be the cause.

If my guess is not too far off, I would suggest trying:
 - remove auto linklocal from tap0 (like for alc0)
 - add auto linklocal to bridge0

If this doesn't help, there is the sysctl
net.inet6.icmp6.nd6_onlink_ns_rfc4861, which you could try setting to 1.
Please read
https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc
before you do that.

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
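The guess above can be sketched as a toy model. This is only an illustration of the hypothesis (an NS is ignored when its source address does not fall inside any prefix configured on the receiving interface); it is not the kernel's actual logic, which may well treat link-local sources differently. The function name `source_is_on_link` and the `fe80::/64` entry standing in for "bridge0 after auto linklocal is enabled" are assumptions made for the example.

```python
import ipaddress

def source_is_on_link(src, iface_prefixes):
    """Return True if src lies inside any prefix configured on the interface.

    Toy model of the hypothesis only; not the FreeBSD ND6 implementation.
    """
    addr = ipaddress.IPv6Address(src)
    return any(
        addr in ipaddress.IPv6Network(prefix, strict=False)
        for prefix in iface_prefixes
    )

# Source address of the neighbour solicitations in the tcpdump output.
ns_source = "fe80::1cab:48ff:fec1:f662"

# bridge0 as shown in the ifconfig output: one global address, no link-local.
bridge0_now = ["2001:8b0:aab5:104:3::101/64"]

# Hypothetical: bridge0 after it has gained some link-local address.
bridge0_with_linklocal = bridge0_now + ["fe80::/64"]

print(source_is_on_link(ns_source, bridge0_now))
print(source_is_on_link(ns_source, bridge0_with_linklocal))
```

Under this model the router's solicitation is off-link for bridge0 as currently configured and on-link once the bridge has a link-local address, which is the behaviour the two suggested ifconfig changes are meant to test.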
Re: ZFS problems since recently ?
Hi!

> On 2023-12-31 19:34, Kurt Jaeger wrote:
> > I already have
> >
> > vfs.zfs.dmu_offset_next_sync=0
> >
> > which is supposed to disable block-cloning.
>
> It isn't. This one is supposed to fix an issue which is unrelated to block
> cloning (but can be amplified by block cloning). This issue has been fixed
> for some weeks; your Dec 23 build should not need it (when the issue happens,
> you have files with zeros as parts of the data instead of the real data,
> and only if you copy files at the same time as those files are modified,
> and then only if you happen to get the timing right).
>
> The sysctl for block cloning is vfs.zfs.bclone_enabled.
> To check if a pool has made use of block cloning:
> zpool get all poolname | grep bclone

Thanks. I now used that sysctl and my test case (a test build of
shells/bash), and it did not crash.

-- 
p...@freebsd.org +49 171 3101372 Now what ?
Re: ZFS problems since recently ?
On 2023-12-31 19:34, Kurt Jaeger wrote:

> I already have
>
> vfs.zfs.dmu_offset_next_sync=0
>
> which is supposed to disable block-cloning.

It isn't. This one is supposed to fix an issue which is unrelated to block
cloning (but can be amplified by block cloning). This issue has been fixed
for some weeks; your Dec 23 build should not need it (when the issue
happens, you have files with zeros as parts of the data instead of the real
data, and only if you copy files at the same time as those files are
modified, and then only if you happen to get the timing right).

The sysctl for block cloning is vfs.zfs.bclone_enabled.

To check if a pool has made use of block cloning:
        zpool get all poolname | grep bclone

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
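For scripting the "has this pool used block cloning" check, a minimal sketch follows. It assumes the tab-separated name/property/value/source lines that `zpool get -H -p` prints and that a non-zero `bcloneused` means cloned blocks exist; the pool name `tank` and the sample values are made up for illustration and are not output from any system in this thread.

```python
def bclone_props(zpool_get_output):
    """Collect bclone* properties from tab-separated `zpool get -H -p` lines."""
    props = {}
    for line in zpool_get_output.splitlines():
        fields = line.split("\t")
        # Expected field order (assumption): pool, property, value, source.
        if len(fields) >= 3 and fields[1].startswith("bclone"):
            props[fields[1]] = fields[2]
    return props

def pool_has_cloned_blocks(props):
    """True if the pool reports a non-zero bcloneused value."""
    return int(props.get("bcloneused", "0")) > 0

# Made-up sample lines for a hypothetical pool called "tank".
SAMPLE_UNUSED = "tank\tbcloneused\t0\t-\ntank\tbclonesaved\t0\t-\n"
SAMPLE_USED = "tank\tbcloneused\t4096\t-\ntank\tbclonesaved\t4096\t-\n"

print(pool_has_cloned_blocks(bclone_props(SAMPLE_UNUSED)))
print(pool_has_cloned_blocks(bclone_props(SAMPLE_USED)))
```

In practice the input string would come from running the zpool command against a real pool; the parsing is kept separate so it can be checked without one.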
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> > On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > markj@ pointed me in
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > > to
> > > https://github.com/openzfs/zfs/pull/15719
> > >
> > > So it will probably be fixed sooner or later.
> > >
> > > The other ZFS crashes I've seen are still an issue.
> >
> > My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
>
> That's one of the panic messages I had as well.
>
> See
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051
>
> for additional crashes and dumps.
>
> > I didn't tweak this system off defaults for block-cloning. I haven't
> > been following that issue 100%.
>
> Do you have
> vfs.zfs.dmu_offset_next_sync=0
> ?

I reverted everything and reinstalled. The VERIFY(BP_GET_DEDUP(bp)) panic
hasn't recurred (it tended to happen on poudriere-build cleanup), which may
point more towards corruption, or maybe I just haven't been "lucky" with my
small random chance of corruption.

I did set vfs.zfs.dmu_offset_next_sync=0 after the bsdinstall was complete
(maybe I could have loaded the zfs kernel module from the shell and set it
before things kicked off).
bridge(4) and IPv6 broken?
hello,

i'm having an issue with bridge(4) and IPv6, with a configuration which
is essentially identical to a working system running releng/14.0.

ifconfig:

lo0: flags=1008049 metric 0 mtu 16384
        options=680003
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21
pflog0: flags=1000141 metric 0 mtu 33152
        options=0
        groups: pflog
alc0: flags=1008943 metric 0 mtu 1500
        options=c3098
        ether 30:9c:23:a8:89:a0
        inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=1
wg0: flags=10080c1 metric 0 mtu 1420
        options=8
        inet 172.16.145.21 netmask 0x
        inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
        groups: wg
        tunnelfib: 1
        nd6 options=101
bridge0: flags=1008843 metric 0 mtu 1500
        options=0
        ether 58:9c:fc:10:ff:b6
        inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
        inet6 2001:8b0:aab5:104:3::101 prefixlen 64
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143
                ifmaxaddr 0 port 6 priority 128 path cost 200
        member: alc0 flags=143
                ifmaxaddr 0 port 3 priority 128 path cost 55
        groups: bridge
        nd6 options=1
tap0: flags=9903 metric 0 mtu 1500
        options=8
        ether 58:9c:fc:10:ff:89
        groups: tap
        media: Ethernet 1000baseT
        status: no carrier
        nd6 options=29

the issue is that the bridge doesn't seem to respond to IPv6 ICMP
Neighbour Solicitation. for example, while running ping, tcpdump shows
this:

23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16
23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16
23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112
23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16
23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16
23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32

fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
solicitations but FreeBSD doesn't send a response. if i remove alc0
from the bridge and configure the IPv6 address directly on alc0
instead, everything works fine.

i'm testing without any packet filter (ipfw/pf) in the kernel.

it's possible i'm missing something obvious here; does anyone have an
idea?

kernel is:

FreeBSD ilythia.eden.le-fay.org 15.0-CURRENT FreeBSD 15.0-CURRENT #3 main-n267318-1b8d70b2eb71: Sat Dec 30 11:36:42 GMT 2023 l...@ilythia.eden.le-fay.org:/src/main/sys/amd64/compile/ILYTHIA amd64

thanks, lexi.
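As a reading aid for the tcpdump output above: the destination of each neighbour solicitation (ff02::1:ff00:101, sent to Ethernet address 33:33:ff:00:01:01) is the solicited-node multicast group derived from the target address 2001:8b0:aab5:104:3::101. The sketch below reproduces that derivation using the standard rules (low 24 bits of the target appended to ff02::1:ff00:0/104, and 33:33 followed by the low 32 bits of the group). The function names are made up for this example.

```python
import ipaddress

def solicited_node(target):
    """Solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group):
    """Ethernet destination for an IPv6 multicast group (33:33 + low 32 bits)."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)

group = solicited_node("2001:8b0:aab5:104:3::101")
print(group)                 # ff02::1:ff00:101
print(multicast_mac(group))  # 33:33:ff:00:01:01
```

Both values match the capture, so the router is soliciting the right group for the bridge's address; what is missing is the reply.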
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 08:42:26AM -0800, John Kennedy wrote:
> Applying the two ZFS kernel patches fixes that issue:

commit 09af4bf2c987f6f57804162cef8aeee05575ad1d (zfs: Fix SPA sysctl
handlers) landed too.

        root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
        vfs.zfs.zio.deadman_log_all: 0
        vfs.zfs.zio.dva_throttle_enabled: 1
        vfs.zfs.zio.requeue_io_start_cut_in_line: 1
        vfs.zfs.zio.slow_io_ms: 3
        vfs.zfs.zio.taskq_wr_iss_ncpus: 0
        vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
        vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
        vfs.zfs.zio.taskq_batch_tpq: 0
        vfs.zfs.zio.taskq_batch_pct: 80
        vfs.zfs.zio.exclude_metadata: 0

        root@bsd15:~ # uname -aUK
        FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n267336-09af4bf2c98: Mon Jan 1 12:04:15 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > I can crash mine with "sysctl -a" as well.
>
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719
>
> So it will probably be fixed sooner or later.
>
> The other ZFS crashes I've seen are still an issue.

Applying the two ZFS kernel patches fixes that issue:

        root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
        vfs.zfs.zio.deadman_log_all: 0
        vfs.zfs.zio.dva_throttle_enabled: 1
        vfs.zfs.zio.requeue_io_start_cut_in_line: 1
        vfs.zfs.zio.slow_io_ms: 3
        vfs.zfs.zio.taskq_wr_iss_ncpus: 0
        vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
        vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
        vfs.zfs.zio.taskq_batch_tpq: 0
        vfs.zfs.zio.taskq_batch_pct: 80
        vfs.zfs.zio.exclude_metadata: 0

        root@bsd15:~ # uname -aUK
        FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #2 main-n267335-499e84e16f5-dirty: Mon Jan 1 08:04:59 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> Do you have
>    vfs.zfs.dmu_offset_next_sync=0

I didn't initially, I do now. Like I said, I haven't been following that
one 100%. I know it isn't block-clone per se, so much as some underlying
problem it pokes with a pointy stick. Small chance multiplied by a bunch of
ZFS IOPS.

Seems like I'd have to revert it all the way back to a fresh install if I
want to get rid of all potential corruption unrelated to the sysctl panic.
But I'll do my busy-work cycle (*) with that one and maybe another with it
off and see what happens.

* full kernel+world, plus my local poudriere package build, currently
  wedged a bit with the heimdall build issue.
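The "small chance multiplied by a bunch of ZFS IOPS" remark can be made concrete with a little arithmetic: if each operation independently has probability p of hitting the race, the chance of at least one hit in n operations is 1 - (1 - p)^n. The numbers below are purely hypothetical and are not measurements of the actual bug; the function name is made up for this sketch.

```python
import math

def p_at_least_one(p_per_op, n_ops):
    """Probability of at least one hit in n_ops independent tries.

    Computes 1 - (1 - p)^n via log1p/expm1 so tiny p values stay accurate.
    """
    return -math.expm1(n_ops * math.log1p(-p_per_op))

# Hypothetical figures: a one-in-a-billion race, at increasing operation counts.
for n_ops in (10**6, 10**8, 10**9, 10**10):
    print(n_ops, round(p_at_least_one(1e-9, n_ops), 4))
```

Even a very rare per-operation race becomes likely once a full kernel+world and package build has pushed enough I/O through the pool, which is why a single clean run says little either way.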
Re: ZFS problems since recently ?
Hi!

> On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > markj@ pointed me in
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > to
> > https://github.com/openzfs/zfs/pull/15719
> >
> > So it will probably be fixed sooner or later.
> >
> > The other ZFS crashes I've seen are still an issue.
>
> My poudriere build did eventually fail as well:
> ...
> [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> [05:40:24] Stopping 2 builders
> panic: VERIFY(BP_GET_DEDUP(bp)) failed

That's one of the panic messages I had as well.

See

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051

for additional crashes and dumps.

> I didn't tweak this system off defaults for block-cloning. I haven't been
> following that issue 100%.

Do you have
vfs.zfs.dmu_offset_next_sync=0
?

-- 
p...@freebsd.org +49 171 3101372 Now what ?
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719
>
> So it will probably be fixed sooner or later.
>
> The other ZFS crashes I've seen are still an issue.

My poudriere build did eventually fail as well:

        ...
        [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
        [05:40:24] Stopping 2 builders
        panic: VERIFY(BP_GET_DEDUP(bp)) failed

        cpuid = 2
        time = 1704091946
        KDB: stack backtrace:
        db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00f62898c0
        vpanic() at vpanic+0x131/frame 0xfe00f62899f0
        spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50
        dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe00f6289b30
        bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe00f6289bf0
        bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe00f6289c80
        dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe00f6289d00
        spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe00f6289ea0
        zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0
        fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        KDB: enter: panic
        [ thread pid 9 tid 100223 ]
        Stopped at kdb_enter+0x33: movq $0,0xe3a582(%rip)
        db>

Trying to do another poudriere build fails almost immediately with that
verify error.

Your verify errors don't match up exactly. I've got snapshots from before I
started freaking it out with the sysctl calls and possibly inducing
corruption.

I didn't tweak this system off defaults for block-cloning. I haven't been
following that issue 100%.
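When comparing panics across reports (as with the "verify errors don't match up exactly" remark), it can help to reduce each KDB backtrace to its chain of function names. A minimal sketch, assuming the "func() at func+0xNN/frame 0xADDR" line shape seen in the trace above; the helper names and the set of frames treated as panic plumbing are choices made for this example, not anything taken from the kernel sources.

```python
import re

# One KDB frame line: "name() at name+0xOFF/frame 0xADDR".
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+0x[0-9a-f]+/frame 0x[0-9a-f]+$"
)

# Frames that belong to the panic machinery rather than the failing code
# (an assumption made for this sketch).
PANIC_PLUMBING = {"db_trace_self_wrapper", "vpanic", "spl_panic", "panic"}

def call_chain(backtrace):
    """Return the function names of all frame lines, innermost first."""
    chain = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            chain.append(match.group("func"))
    return chain

def panic_site(chain):
    """First frame that is not panic plumbing, or None if there is none."""
    for func in chain:
        if func not in PANIC_PLUMBING:
            return func
    return None

BACKTRACE = """\
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00f62898c0
vpanic() at vpanic+0x131/frame 0xfe00f62899f0
spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50
dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe00f6289b30
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe00f6289bf0
bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe00f6289c80
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe00f6289d00
spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe00f6289ea0
zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0
fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

chain = call_chain(BACKTRACE)
print(panic_site(chain))
print(" <- ".join(chain))
```

Two reports with the same panic string but different call chains are likely different code paths into the same assertion, which is worth noting in the bug report.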