Re: bridge(4) and IPv6 broken?

2024-01-01 Alexander Leidinger

On 2024-01-02 00:40, Lexi Winter wrote:

> hello,
>
> i'm having an issue with bridge(4) and IPv6, with a configuration which
> is essentially identical to a working system running releng/14.0.
>
> [...]
> bridge0: flags=1008843 metric 0 mtu 1500
> [...]
> 	inet6 2001:8b0:aab5:104:3::101 prefixlen 64
> [...]
> 	nd6 options=1
> tap0: flags=9903 metric 0 mtu 1500
> [...]
> 	nd6 options=29
>
> the issue is that the bridge doesn't seem to respond to IPv6 ICMP
> Neighbour Solicitation.
> [...]
> fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
> solicitations, but FreeBSD doesn't send a response.
>
> if i remove alc0 from the bridge and configure the IPv6 address directly
> on alc0 instead, everything works fine.
>
> i'm testing without any packet filter (ipfw/pf) in the kernel.
>
> it's possible i'm missing something obvious here; does anyone have an
> idea?


Just an idea. I'm not sure if it is the right track...

There is code in the kernel which ignores neighbor solicitations (NS) 
from "non-valid" sources (for security / anti-spoofing reasons). The NS 
here is sent from a link-local address, but your bridge has no 
link-local address (and your tap0 has the auto linklocal flag set, which 
I would have expected to be on the bridge instead). I'm not sure, but I 
would guess this could be the cause.
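The "nd6 options=" values in the ifconfig output are a hex bitmask; ifconfig(8) normally prints the flag names next to them, but those seem to have been lost in this archive. As a sketch, assuming the ND6_IFF_* bit values from FreeBSD's netinet6/nd6.h (worth checking against your own tree), this POSIX sh function decodes them. For the two values above it prints PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL for tap0 and PERFORMNUD for bridge0, which is the asymmetry described above.

```sh
#!/bin/sh
# Decode the hex value that ifconfig(8) prints as "nd6 options=<hex>".
# Bit values assumed from FreeBSD's netinet6/nd6.h (ND6_IFF_*).
decode_nd6() {
    mask=$((0x$1))    # ifconfig prints the mask in hex without "0x"
    out=
    for pair in 1:PERFORMNUD 2:ACCEPT_RTADV 4:PREFER_SOURCE 8:IFDISABLED \
        10:DONT_SET_IFROUTE 20:AUTO_LINKLOCAL 40:NO_RADR \
        80:NO_PREFER_IFACE 100:NO_DAD; do
        bit=$((0x${pair%%:*}))    # hex bit value before the colon
        name=${pair#*:}           # flag name after the colon
        if [ $((mask & bit)) -ne 0 ]; then
            out=${out:+$out,}$name
        fi
    done
    printf '%s\n' "$out"
}

decode_nd6 29    # tap0
decode_nd6 1     # bridge0
```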


If my guess is not too far off, I would suggest trying:
 - remove auto_linklocal from tap0 (like alc0, which doesn't have it)
 - add auto_linklocal to bridge0

If this doesn't help, there is the sysctl 
net.inet6.icmp6.nd6_onlink_ns_rfc4861, which you could try setting to 1. 
Please read 
https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc 
before you do that.
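A minimal sketch of trying these suggestions, assuming the interface names from this thread (bridge0 with member tap0). APPLY, BRIDGE, MEMBER and ONLINK_NS are hypothetical knobs made up for this sketch: by default it only prints the commands so they can be reviewed, and with APPLY=1 (as root) it runs them. The auto_linklocal / -auto_linklocal parameters are the inet6 options described in ifconfig(8); the sysctl step is only the fallback from the paragraph above.

```sh
#!/bin/sh
# Sketch: move IPv6 link-local autoconfiguration from a bridge member to
# the bridge. Dry run by default; set APPLY=1 (as root) to execute.
# APPLY, BRIDGE, MEMBER and ONLINK_NS are hypothetical knobs for this sketch.
set -eu

BRIDGE=${BRIDGE:-bridge0}
MEMBER=${MEMBER:-tap0}

run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Member: stop auto-configuring its own fe80:: address.
run ifconfig "$MEMBER" inet6 -auto_linklocal
# Bridge: auto-configure an fe80:: address on the bridge itself.
run ifconfig "$BRIDGE" inet6 auto_linklocal

# Fallback only if the above does not help; read FreeBSD-SA-08:10 first.
if [ "${ONLINK_NS:-0}" = 1 ]; then
    run sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861=1
fi
```

If the change takes effect, `ifconfig bridge0 inet6` should then list an fe80:: address scoped to bridge0.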


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: ZFS problems since recently ?

2024-01-01 Kurt Jaeger
Hi!

> On 2023-12-31 19:34, Kurt Jaeger wrote:
> > I already have
> > 
> > vfs.zfs.dmu_offset_next_sync=0
> > 
> > which is supposed to disable block-cloning.
> 
> It isn't. This one is supposed to fix an issue which is unrelated to block
> cloning (but can be amplified by block cloning). That issue has been fixed
> for some weeks, so your Dec 23 build should not need it. (When the issue
> happens, you get files with zeros in parts of the data instead of the real
> data, and only if you copy files while those files are being modified,
> and then only if you happen to get the timing right.)
> 
> The sysctl for block cloning is vfs.zfs.bclone_enabled.
> To check if a pool has made use of block cloning:
> zpool get all poolname | grep bclone

Thanks. I have now used that sysctl with my test case (a test build of
shells/bash), and it did not crash.

-- 
p...@freebsd.org +49 171 3101372  Now what ?



Re: ZFS problems since recently ?

2024-01-01 Alexander Leidinger

On 2023-12-31 19:34, Kurt Jaeger wrote:

> I already have
>
> vfs.zfs.dmu_offset_next_sync=0
>
> which is supposed to disable block-cloning.


It isn't. This one is supposed to fix an issue which is unrelated to 
block cloning (but can be amplified by block cloning). That issue has 
been fixed for some weeks, so your Dec 23 build should not need it. (When 
the issue happens, you get files with zeros in parts of the data instead 
of the real data, and only if you copy files while those files are being 
modified, and then only if you happen to get the timing right.)


The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone
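A sketch of the same check in scriptable form, assuming the bcloneused pool property that OpenZFS 2.2 reports (one of the bclone* lines the grep above would show) and a hypothetical pool name zroot. The helper succeeds when the raw byte count printed by `zpool get -Hp -o value bcloneused` is greater than zero, i.e. when the pool already contains cloned blocks.

```sh
#!/bin/sh
# Sketch: succeed (exit 0) if the pool has cloned blocks, based on the raw
# bcloneused value read from stdin. Intended use (pool name hypothetical):
#   zpool get -Hp -o value bcloneused zroot | bclone_used && echo "cloned blocks"
# Block cloning itself is disabled with: sysctl vfs.zfs.bclone_enabled=0
bclone_used() {
    read -r bytes || return 1
    case $bytes in
        ''|*[!0-9]*) return 1 ;;   # not a plain byte count (e.g. "-")
    esac
    [ "$bytes" -gt 0 ]
}
```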

Bye,
Alexander.





Re: ZFS problems since recently ?

2024-01-01 John Kennedy
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> > On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > markj@ pointed me in
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > > to
> > > https://github.com/openzfs/zfs/pull/15719 
> > > 
> > > So it will probably be fixed sooner or later.
> > > 
> > > The other ZFS crashes I've seen are still an issue.
> > 
> >   My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
> 
> That's one of the panic messages I had as well.
> 
> See
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051
> 
> for additional crashes and dumps.
> 
> >   I didn't tweak this system off defaults for block-cloning.  I haven't
> > been following that issue 100%.
> 
> Do you have
>   vfs.zfs.dmu_offset_next_sync=0
> ?

  I reverted everything and reinstalled.  The VERIFY(BP_GET_DEDUP(bp)) panic
hasn't recurred (it tended to happen during poudriere-build cleanup), which
may point more towards corruption, or maybe I just haven't been "lucky" with
my small random chance of corruption.

  I did set vfs.zfs.dmu_offset_next_sync=0 after the bsdinstall was complete
(maybe I could have loaded the zfs kernel module from the shell and set it
before things kicked off).
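For reference, a sketch of setting the tunable before any pool is in use, assuming a shell in the installer environment where the zfs module may not be loaded yet (kldload -n does nothing if it already is). Persisting it in /boot/loader.conf assumes the parameter is accepted as a boot-time tunable, which should be the case for OpenZFS module parameters on FreeBSD; verify before relying on it.

```sh
# Before the pool is created or imported (e.g. from the installer's shell):
kldload -n zfs
sysctl vfs.zfs.dmu_offset_next_sync=0

# On the installed system, to apply it from boot onwards:
echo 'vfs.zfs.dmu_offset_next_sync="0"' >> /boot/loader.conf
```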




bridge(4) and IPv6 broken?

2024-01-01 Lexi Winter
hello,

i'm having an issue with bridge(4) and IPv6, with a configuration which 
is essentially identical to a working system running releng/14.0.

ifconfig:

lo0: flags=1008049 metric 0 mtu 16384
	options=680003
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	groups: lo
	nd6 options=21
pflog0: flags=1000141 metric 0 mtu 33152
	options=0
	groups: pflog
alc0: flags=1008943 metric 0 mtu 1500
	options=c3098
	ether 30:9c:23:a8:89:a0
	inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
	media: Ethernet autoselect (1000baseT )
	status: active
	nd6 options=1
wg0: flags=10080c1 metric 0 mtu 1420
	options=8
	inet 172.16.145.21 netmask 0x
	inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
	groups: wg
	tunnelfib: 1
	nd6 options=101
bridge0: flags=1008843 metric 0 mtu 1500
	options=0
	ether 58:9c:fc:10:ff:b6
	inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
	inet6 2001:8b0:aab5:104:3::101 prefixlen 64
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: tap0 flags=143
	        ifmaxaddr 0 port 6 priority 128 path cost 200
	member: alc0 flags=143
	        ifmaxaddr 0 port 3 priority 128 path cost 55
	groups: bridge
	nd6 options=1
tap0: flags=9903 metric 0 mtu 1500
	options=8
	ether 58:9c:fc:10:ff:89
	groups: tap
	media: Ethernet 1000baseT
	status: no carrier
	nd6 options=29

the issue is that the bridge doesn't seem to respond to IPv6 ICMP
Neighbour Solicitation.  for example, while running ping, tcpdump shows
this:

23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16
23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16
23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112
23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16
23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16
23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32

fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
solicitations, but FreeBSD doesn't send a response.
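For reference, the router's solicitations in the trace are addressed to the solicited-node multicast group of the target (RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the address), carried in an Ethernet frame whose destination is 33:33 followed by the low 32 bits of that group (RFC 2464). A small POSIX sh sketch of the mapping; to avoid parsing "::" in shell, it assumes the caller passes the last two 16-bit groups of the fully expanded address (0 and 101 for 2001:8b0:aab5:104:3::101).

```sh
#!/bin/sh
# Print the solicited-node multicast group (RFC 4291) for a unicast IPv6
# address and the Ethernet multicast MAC it is sent to (RFC 2464).
# Assumption: args are the last two 16-bit groups of the *expanded*
# address, in hex (e.g. "0 101" for 2001:8b0:aab5:104:3::101).
solicited_node() {
    g7=$((0x$1))
    g8=$((0x$2))
    b1=$((g7 & 0xff))          # bits 23..16 of the address
    b2=$(((g8 >> 8) & 0xff))   # bits 15..8
    b3=$((g8 & 0xff))          # bits 7..0
    # Group: ff02::1:ff00:0/104 with the low 24 bits appended.
    printf 'ff02::1:ff%02x:%x\n' "$b1" "$g8"
    # MAC: 33:33 followed by the low 32 bits of the group (ff + 24 bits).
    printf '33:33:ff:%02x:%02x:%02x\n' "$b1" "$b2" "$b3"
}

solicited_node 0 101
```

For this address it prints ff02::1:ff00:101 and 33:33:ff:00:01:01, matching the destinations of the solicitations above; that group can be compared against the memberships ifmcstat(8) reports for bridge0.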

if i remove alc0 from the bridge and configure the IPv6 address directly
on alc0 instead, everything works fine.

i'm testing without any packet filter (ipfw/pf) in the kernel.

it's possible i'm missing something obvious here; does anyone have an
idea?

kernel is: FreeBSD ilythia.eden.le-fay.org 15.0-CURRENT FreeBSD
15.0-CURRENT #3 main-n267318-1b8d70b2eb71: Sat Dec 30 11:36:42 GMT 2023
l...@ilythia.eden.le-fay.org:/src/main/sys/amd64/compile/ILYTHIA amd64

thanks, lexi.





Re: ZFS problems since recently ?

2024-01-01 John Kennedy
On Mon, Jan 01, 2024 at 08:42:26AM -0800, John Kennedy wrote:
>   Applying the two ZFS kernel patches fixes that issue:

commit 09af4bf2c987f6f57804162cef8aeee05575ad1d (zfs: Fix SPA sysctl handlers) 
landed too.

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #1 
main-n267336-09af4bf2c98: Mon Jan  1 12:04:15 PST 2024 
warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
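To tell whether a running kernel already contains that commit, one rough heuristic (an assumption, not an official method) is to compare the main-nNNNNNN revision count in `uname -v` with the one this build reports (267336 for 09af4bf2c98). It only makes sense for kernels built from the linear main branch; a -dirty tree such as the earlier main-n267335 build can carry local patches that the counter does not reflect.

```sh
#!/bin/sh
# Extract NNNNNN from a "main-nNNNNNN-<hash>" build string (uname -v).
kernel_rev() {
    printf '%s\n' "$1" | sed -n 's/.*main-n\([0-9][0-9]*\)-.*/\1/p'
}

# Succeed if the build string's revision count is at least $2.
# Heuristic only: assumes a build from the linear main branch.
has_rev() {
    rev=$(kernel_rev "$1")
    [ -n "$rev" ] && [ "$rev" -ge "$2" ]
}

if has_rev "$(uname -v)" 267336; then
    echo "kernel revision is at or past 09af4bf2c98"
else
    echo "older kernel, not a main build, or local patches only"
fi
```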




Re: ZFS problems since recently ?

2024-01-01 John Kennedy
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > >   I can crash mine with "sysctl -a" as well.
> 
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719 
> 
> So it will probably be fixed sooner or later.
> 
> The other ZFS crashes I've seen are still an issue.

  Applying the two ZFS kernel patches fixes that issue:

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #2 
main-n267335-499e84e16f5-dirty: Mon Jan  1 08:04:59 PST 2024 
warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158




Re: ZFS problems since recently ?

2024-01-01 John Kennedy
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> Do you have
>   vfs.zfs.dmu_offset_next_sync=0

  I didn't initially; I do now.  Like I said, I haven't been following that one
100%.  I know it isn't block cloning per se, so much as some underlying problem
that block cloning pokes with a pointy stick.  A small chance multiplied by a
lot of ZFS IOPS.
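As a worked bound for "a small chance multiplied by a lot of IOPS", with p a hypothetical per-operation probability of hitting the underlying bug and n the number of operations (both assumptions, not figures from this thread), and operations treated as independent:

```latex
P(\text{at least one hit in } n \text{ operations}) = 1 - (1 - p)^{n} \le n\,p
```

The inequality is Bernoulli's inequality; when np is much smaller than 1 the two sides are nearly equal, so the risk grows roughly linearly with the amount of I/O.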

  It seems I'd have to revert all the way back to a fresh install if I want
to get rid of all potential corruption unrelated to the sysctl panic.

  But I'll do my busy-work cycle (*) with that setting, and maybe another with
it off, and see what happens.


  * full kernel+world, plus my local poudriere package build, currently wedged
a bit by the heimdall build issue.



Re: ZFS problems since recently ?

2024-01-01 Kurt Jaeger
Hi!

> On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > markj@ pointed me in
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > to
> > https://github.com/openzfs/zfs/pull/15719 
> > 
> > So it will probably be fixed sooner or later.
> > 
> > The other ZFS crashes I've seen are still an issue.
> 
>   My poudriere build did eventually fail as well:
>   ...
>   [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
>   [05:40:24] Stopping 2 builders
>   panic: VERIFY(BP_GET_DEDUP(bp)) failed

That's one of the panic messages I had as well.

See

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051

for additional crashes and dumps.

>   I didn't tweak this system off defaults for block-cloning.  I haven't been
> following that issue 100%.

Do you have

vfs.zfs.dmu_offset_next_sync=0

?




Re: ZFS problems since recently ?

2024-01-01 John Kennedy
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719 
> 
> So it will probably be fixed sooner or later.
> 
> The other ZFS crashes I've seen are still an issue.

  My poudriere build did eventually fail as well:

...
[05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
[05:40:24] Stopping 2 builders
panic: VERIFY(BP_GET_DEDUP(bp)) failed

cpuid = 2
time = 1704091946
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00f62898c0
vpanic() at vpanic+0x131/frame 0xfe00f62899f0
spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50
dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe00f6289b30
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe00f6289bf0
bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe00f6289c80
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe00f6289d00
spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe00f6289ea0
zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0
fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 9 tid 100223 ]
Stopped at      kdb_enter+0x33: movq    $0,0xe3a582(%rip)
db>

  Trying to do another poudriere build fails almost immediately with that
VERIFY error.

  Your verify errors don't match up exactly.  I've got snapshots from before I
started freaking it out with the sysctl calls and possibly inducing corruption.

  I didn't tweak this system off defaults for block-cloning.  I haven't been
following that issue 100%.