Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable
On 2021-May-23, at 01:27, Mark Millard  wrote:

> On 2021-May-23, at 00:44, Mark Millard  wrote:
> 
>> On 2021-May-21, at 17:56, Rick Macklem  wrote:
>> 
>>> Mark Millard wrote:
>>> [stuff snipped]
 Well, why is it that ls -R, find, and diff -r all get file
 name problems via genet0 but diff -r gets no problems
 comparing the content of files that it does match up (the
 vast majority)? Any clue how could the problems possibly
 be unique to the handling of file names/paths? Does it
 suggest anything else to look into for getting some more
 potentially useful evidence?
>>> Well, all I can do is describe the most common TSO related
>>> failure:
>>> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>>> is slightly less than 64K bytes (many TSO implementations are
>>> limited to 64K or 32 discontiguous segments, think 32 2K
>>> mbuf clusters), the driver decides it is ok, but when the MAC
>>> header is added it exceeds what the hardware can handle correctly...
>>> --> This will happen when reading a regular file that is slightly less
>>> than a multiple of 64K in size.
>>> or
>>> --> This will happen when reading just about any large directory,
>>>since the directory reply for a 64K request is converted to Sun XDR
>>>format and clipped at the last full directory entry that will fit within 
>>> 64K.
>>> For ports, where most files are small, I think you can tell which is more
>>> likely to happen.
>>> --> If TSO is disabled, I have no idea how this might matter, but??
>>> 
 I'll note that netstat -I ue0 -d and netstat -I genet0 -d
 do not report changes in Ierrs or Idrop in a before vs.
 after failures comparison. (There may be better figures
 to look at for all I know.)
 
 I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
 and got no obvious change in behavior.
>>> All we know is that the data is getting corrupted somehow.
>>> 
>>> NFS traffic looks very different than typical TCP traffic. It is
>>> mostly small messages travelling in both directions concurrently,
>>> with some large messages thrown in the mix.
>>> All I'm saying is that, testing a net interface with something like
>>> bulk data transfer in one direction doesn't verify it works for NFS
>>> traffic.
>>> 
>>> Also, the large RPC messages are a chain of about 33 mbufs of
>>> various lengths, including a mix of partial clusters and regular
>>> data mbufs, whereas a bulk send on a socket will typically
>>> result in an mbuf chain of a lot of full 2K clusters.
>>> --> As such, NFS can be good at tickling subtle bugs it the
>>>net driver related to mbuf handling.
>>> 
>>> rick
>>> 
> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?
 
 That update is reported to be causing "rack" related panics:
 
 https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
 
 reports (via links):
 
 panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
 /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
 
 Still, I have a non-debug update to main building and will
 likely do a debug build as well. llvm is rebuilding, so
 the builds will take a notable time.
>> 
>> I got the following built and installed on the two
>> machines:
>> 
>> # uname -apKU
>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
>> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>>   arm64 aarch64 1400013 1400013
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
>> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>>   arm64 aarch64 1400013 1400013
>> 
>> Note that both are booted with debug builds of main.
>> 
>> Using the context with the alternate EtherNet device that has not
>> had an associated diff -r, find, pr ls -R failure yet
>> yet got a panic that looks likely to be unrelated:
>> 
>> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
>> # diff -r /usr/ports/ /mnt/ | more
>> nvme0: cpl does not map to outstanding cmd
>> cdw0: sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
>> panic: received completion for unknown cmd
>> cpuid = 3
>> time = 1621743752
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x188
>> panic() at panic+0x44
>> nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
>> nvme_timeout() at nvme_timeout+0x3c
>> softclock_call_cc() at softclock_call_cc+0x124
>> softclock() 

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable
On 2021-May-23, at 00:44, Mark Millard  wrote:

> On 2021-May-21, at 17:56, Rick Macklem  wrote:
> 
>> Mark Millard wrote:
>> [stuff snipped]
>>> Well, why is it that ls -R, find, and diff -r all get file
>>> name problems via genet0 but diff -r gets no problems
>>> comparing the content of files that it does match up (the
>>> vast majority)? Any clue how could the problems possibly
>>> be unique to the handling of file names/paths? Does it
>>> suggest anything else to look into for getting some more
>>> potentially useful evidence?
>> Well, all I can do is describe the most common TSO related
>> failure:
>> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>> is slightly less than 64K bytes (many TSO implementations are
>> limited to 64K or 32 discontiguous segments, think 32 2K
>> mbuf clusters), the driver decides it is ok, but when the MAC
>> header is added it exceeds what the hardware can handle correctly...
>> --> This will happen when reading a regular file that is slightly less
>>  than a multiple of 64K in size.
>> or
>> --> This will happen when reading just about any large directory,
>> since the directory reply for a 64K request is converted to Sun XDR
>> format and clipped at the last full directory entry that will fit within 
>> 64K.
>> For ports, where most files are small, I think you can tell which is more
>> likely to happen.
>> --> If TSO is disabled, I have no idea how this might matter, but??
>> 
>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>>> do not report changes in Ierrs or Idrop in a before vs.
>>> after failures comparison. (There may be better figures
>>> to look at for all I know.)
>>> 
>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>>> and got no obvious change in behavior.
>> All we know is that the data is getting corrupted somehow.
>> 
>> NFS traffic looks very different than typical TCP traffic. It is
>> mostly small messages travelling in both directions concurrently,
>> with some large messages thrown in the mix.
>> All I'm saying is that, testing a net interface with something like
>> bulk data transfer in one direction doesn't verify it works for NFS
>> traffic.
>> 
>> Also, the large RPC messages are a chain of about 33 mbufs of
>> various lengths, including a mix of partial clusters and regular
>> data mbufs, whereas a bulk send on a socket will typically
>> result in an mbuf chain of a lot of full 2K clusters.
>> --> As such, NFS can be good at tickling subtle bugs it the
>> net driver related to mbuf handling.
>> 
>> rick
>> 
 W.r.t. reverting r367492...the patch to replace r367492 was just
 committed to "main" by rscheff@ with a two week MFC, so it
 should be in stable/13 soon. Not sure if an errata can be done
 for it for releng13.0?
>>> 
>>> That update is reported to be causing "rack" related panics:
>>> 
>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>> 
>>> reports (via links):
>>> 
>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
>>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>> 
>>> Still, I have a non-debug update to main building and will
>>> likely do a debug build as well. llvm is rebuilding, so
>>> the builds will take a notable time.
> 
> I got the following built and installed on the two
> machines:
> 
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>   arm64 aarch64 1400013 1400013
> 
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>   arm64 aarch64 1400013 1400013
> 
> Note that both are booted with debug builds of main.
> 
> Using the context with the alternate EtherNet device that has not
> had an associated diff -r, find, pr ls -R failure yet
> yet got a panic that looks likely to be unrelated:
> 
> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
> # diff -r /usr/ports/ /mnt/ | more
> nvme0: cpl does not map to outstanding cmd
> cdw0: sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
> panic: received completion for unknown cmd
> cpuid = 3
> time = 1621743752
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x188
> panic() at panic+0x44
> nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
> nvme_timeout() at nvme_timeout+0x3c
> softclock_call_cc() at softclock_call_cc+0x124
> softclock() at softclock+0x60
> ithread_loop() at ithread_loop+0x2a8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ 

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable
On 2021-May-21, at 17:56, Rick Macklem  wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how could the problems possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>  is slightly less than 64K bytes (many TSO implementations are
>  limited to 64K or 32 discontiguous segments, think 32 2K
>  mbuf clusters), the driver decides it is ok, but when the MAC
>  header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
>   than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
>  since the directory reply for a 64K request is converted to Sun XDR
>  format and clipped at the last full directory entry that will fit within 
> 64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
> 
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>> 
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
> 
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that, testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
> 
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs it the
>  net driver related to mbuf handling.
> 
> rick
> 
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>> 
>> That update is reported to be causing "rack" related panics:
>> 
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>> 
>> reports (via links):
>> 
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>> 
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.

I got the following built and installed on the two
machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
  arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
  arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate EtherNet device that has not
had an associated diff -r, find, pr ls -R failure yet
yet got a panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0: sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at  kdb_enter+0x48: undefined   f904411f
db> 

Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe s

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-21 Thread Rick Macklem
Mark Millard wrote:
[stuff snipped]
>Well, why is it that ls -R, find, and diff -r all get file
>name problems via genet0 but diff -r gets no problems
>comparing the content of files that it does match up (the
>vast majority)? Any clue how could the problems possibly
>be unique to the handling of file names/paths? Does it
>suggest anything else to look into for getting some more
>potentially useful evidence?
Well, all I can do is describe the most common TSO related
failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
  is slightly less than 64K bytes (many TSO implementations are
  limited to 64K or 32 discontiguous segments, think 32 2K
  mbuf clusters), the driver decides it is ok, but when the MAC
  header is added it exceeds what the hardware can handle correctly...
--> This will happen when reading a regular file that is slightly less
   than a multiple of 64K in size.
or
--> This will happen when reading just about any large directory,
  since the directory reply for a 64K request is converted to Sun XDR
  format and clipped at the last full directory entry that will fit within 
64K.
For ports, where most files are small, I think you can tell which is more
likely to happen.
--> If TSO is disabled, I have no idea how this might matter, but??

>I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>do not report changes in Ierrs or Idrop in a before vs.
>after failures comparison. (There may be better figures
>to look at for all I know.)
>
>I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.

NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that, testing a net interface with something like
bulk data transfer in one direction doesn't verify it works for NFS
traffic.

Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs it the
  net driver related to mbuf handling.

rick

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
/syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


___
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"


Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-21 Thread Mark Millard via freebsd-stable



On 2021-May-21, at 09:00, Rick Macklem  wrote:

> Mark Millard wrote:
>> On 2021-May-20, at 22:19, Rick Macklem  wrote:
> [stuff snipped]
>>> ps: I do not think that r367492 could cause this, but it would be
>>>nice if you try a kernel with the r367492 patch reverted.
>>>It is currently in all of releng13, stable13 and main, although
>>>the patch to fix this is was just reviewed and may hit main soon.
>> 
>> Do you want a debug kernel to be used? Do you have a preference
>> for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
>> stick to the base version things are now based on --or do you
>> want me to update to more recent? (That last only applies if
>> main or stable/13 is to be put to use.)
> Well, it sounds like you've isolated it to the genet interface.
> Good sluething.
> Unfortunately, NFS is only as good as the network fabric under it.
> However, it's usually hangs or poor performance. Except maybe
> for the readdir issue that Jason Bacon reported and resolved via
> an upgrade, this is a first.
> --> In the old days, I would have expected IP checksums to catch
>   this, but I'm guessing the hardware/net driver are doing them
>   these days?

Well, why is it that ls -R, find, and diff -r all get file
name problems via genet0 but diff -r gets no problems
comparing the content of files that it does match up (the
vast majority)? Any clue how could the problems possibly
be unique to the handling of file names/paths? Does it
suggest anything else to look into for getting some more
potentially useful evidence?

I'll note that netstat -I ue0 -d and netstat -I genet0 -d
do not report changes in Ierrs or Idrop in a before vs.
after failures comparison. (There may be better figures
to look at for all I know.)

I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
and got no obvious change in behavior.

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
/syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"


Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-21 Thread Rick Macklem
Mark Millard wrote:
>On 2021-May-20, at 22:19, Rick Macklem  wrote:
[stuff snipped]
>> ps: I do not think that r367492 could cause this, but it would be
>> nice if you try a kernel with the r367492 patch reverted.
>> It is currently in all of releng13, stable13 and main, although
>> the patch to fix this is was just reviewed and may hit main soon.
>
>Do you want a debug kernel to be used? Do you have a preference
>for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
>stick to the base version things are now based on --or do you
>want me to update to more recent? (That last only applies if
>main or stable/13 is to be put to use.)
Well, it sounds like you've isolated it to the genet interface.
Good sluething.
Unfortunately, NFS is only as good as the network fabric under it.
However, it's usually hangs or poor performance. Except maybe
for the readdir issue that Jason Bacon reported and resolved via
an upgrade, this is a first.
--> In the old days, I would have expected IP checksums to catch
   this, but I'm guessing the hardware/net driver are doing them
   these days?

W.r.t. reverting r367492...the patch to replace r367492 was just
committed to "main" by rscheff@ with a two week MFC, so it
should be in stable/13 soon. Not sure if an errata can be done
for it for releng13.0?

Thanks for isolating this, rick
ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
> . . . old history deleted . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


___
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"


Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context) [RPi4B genet0 involved in problem]

2021-05-21 Thread Mark Millard via freebsd-stable
[Looks like the RPi4B genet0 handling is involved.]

On 2021-May-20, at 22:56, Mark Millard  wrote:
> 
> On 2021-May-20, at 22:19, Rick Macklem  wrote:
> 
>> Ok, so it isn't related to "soft".
>> I am wondering if it is something specific to what
>> "diff -r" does?
>> 
>> Could you try:
>> # cd /usr/ports
>> # ls -R > /tmp/x
>> # cd /mnt
>> # ls -R > /tmp/y
>> # cd /tmp
>> # diff -u -p x y
>> --> To see if "ls -R" finds any difference?
>> 
> 
> # diff -u -p x y 
> --- x   2021-05-20 22:35:48.021663000 -0700
> +++ y   2021-05-20 22:39:03.691936000 -0700
> @@ -227209,10 +227209,10 @@ 
> patch-chrome_browser_background_background__mode__mana
> patch-chrome_browser_background_background__mode__optimizer.cc
> patch-chrome_browser_browser__resources.grd
> patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
> +patch-chrome_browser_chrome__browser
> patch-chrome_browser_chrome__browser__interface__binders.cc
> patch-chrome_browser_chrome__browser__main.cc
> patch-chrome_browser_chrome__browser__main__linux.cc
> -patch-chrome_browser_chrome__browser__main__posix.cc
> patch-chrome_browser_chrome__content__browser__client.cc
> patch-chrome_browser_chrome__content__browser__client.h
> patch-chrome_browser_crash__upload__list_crash__upload__list.cc
> 
> # find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> 
> find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> 
> So: patch-chrome_browser_chrome__browser appears to be a
> truncated: patch-chrome_browser_chrome__browser__main__posix.cc
> file name and find also gets the same oddity.
> 
> (Note: This had /usr/ports in a main context and /mnt/
> referring to a release/13.0.0 context.)
> 
>> ps: I do not think that r367492 could cause this, but it would be
>>nice if you try a kernel with the r367492 patch reverted.
>>It is currently in all of releng13, stable13 and main, although
>>the patch to fix this is was just reviewed and may hit main soon.
> 
> Do you want a debug kernel to be used? Do you have a preference
> for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
> stick to the base version things are now based on --or do you
> want me to update to more recent? (That last only applies if
> main or stable/13 is to be put to use.)
> 
>> . . . old history deleted . . .

I reversed the roles of the faster vs. somewhat slower
machine and so far my diff -r attempts for this found
no differences. The machines were using different types
of EtherNet devices.

So I've substituted a different EtherNet device onto
the slower machine: the same type of USB3 EtherNet
device in use on the faster machine (instead of
using the RPi4B's builtin EtherNet). So the below
testing is with both machines having a:

ugen0.6:  at usbus0
ure0 on uhub0
ure0:  on usbus0
miibus1:  on ure0
rgephy0:  PHY 0 on miibus1
rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 
1000baseT-FDX-master, auto

in use.

I rebooted with this connected instead of the genet0
interface.

Mounting the slower machine's /usr/ports/ as /mnt/ from the faster machine:
No differences found by diff -r this way (expected result).

Mounting the faster machine's /usr/ports/ as /mnt/ from the slower machine:
No differences found by diff -r this way (expected result).

Doing diff -r's from both sides at the same time:
No differences found by diff -r this way (expected result).


So it looks like genet0 or its supporting software
is contributing to the problems that I had reported.

It is interesting that there were no examples of the
content of files reporting a mismatch, just some file
names/paths not findin

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable



On 2021-May-20, at 22:19, Rick Macklem  wrote:

> Ok, so it isn't related to "soft".
> I am wondering if it is something specific to what
> "diff -r" does?
> 
> Could you try:
> # cd /usr/ports
> # ls -R > /tmp/x
> # cd /mnt
> # ls -R > /tmp/y
> # cd /tmp
> # diff -u -p x y
> --> To see if "ls -R" finds any difference?
> 

# diff -u -p x y 
--- x   2021-05-20 22:35:48.021663000 -0700
+++ y   2021-05-20 22:39:03.691936000 -0700
@@ -227209,10 +227209,10 @@ 
patch-chrome_browser_background_background__mode__mana
 patch-chrome_browser_background_background__mode__optimizer.cc
 patch-chrome_browser_browser__resources.grd
 
patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
+patch-chrome_browser_chrome__browser
 patch-chrome_browser_chrome__browser__interface__binders.cc
 patch-chrome_browser_chrome__browser__main.cc
 patch-chrome_browser_chrome__browser__main__linux.cc
-patch-chrome_browser_chrome__browser__main__posix.cc
 patch-chrome_browser_chrome__content__browser__client.cc
 patch-chrome_browser_chrome__content__browser__client.h
 patch-chrome_browser_crash__upload__list_crash__upload__list.cc

# find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

 find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

So: patch-chrome_browser_chrome__browser appears to be a
truncated: patch-chrome_browser_chrome__browser__main__posix.cc
file name and find also gets the same oddity.

(Note: This had /usr/ports in a main context and /mnt/
referring to a release/13.0.0 context.)

> ps: I do not think that r367492 could cause this, but it would be
> nice if you try a kernel with the r367492 patch reverted.
> It is currently in all of releng13, stable13 and main, although
> the patch to fix this is was just reviewed and may hit main soon.

Do you want a debug kernel to be used? Do you have a preference
for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
stick to the base version things are now based on --or do you
want me to update to more recent? (That last only applies if
main or stable/13 is to be put to use.)

> . . . old history deleted . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"


Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Rick Macklem
Ok, so it isn't related to "soft".
I am wondering if it is something specific to what
"diff -r" does?

Could you try:
# cd /usr/ports
# ls -R > /tmp/x
# cd /mnt
# ls -R > /tmp/y
# cd /tmp
# diff -u -p x y
--> To see if "ls -R" finds any difference?

rick
ps: I do not think that r367492 could cause this, but it would be
 nice if you try a kernel with the r367492 patch reverted.
 It is currently in all of releng13, stable13 and main, although
 the patch to fix this is was just reviewed and may hit main soon.



From: Mark Millard 
Sent: Friday, May 21, 2021 12:40 AM
To: Rick Macklem
Cc: FreeBSD-STABLE Mailing List
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in 
a zfs file systems context)

CAUTION: This email originated from outside of the University of Guelph. Do not 
click links or open attachments unless you recognize the sender and know the 
content is safe. If in doubt, forward suspicious emails to [email protected]


[main test example and main/releng/13 mixed example]

On 2021-May-20, at 20:36, Mark Millard  wrote:

> [stable/13 test: example ends up being odder. That might
> allow eliminating some potential alternatives.]
>
> On 2021-May-20, at 19:38, Mark Millard  wrote:
>>
>> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>>>
>>> Oh, one additional thing that I'll dare to top post...
>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>> This has not yet been resolved in "main" etc and could explain
>>> why an RPC could time out for a soft mount.
>>
>> See later notes that I added: soft mount is not required
>> to see the problem.
>>
>>> You can revert the patch in r367492 to avoid the problem.
>>
>> If I understand right, you are indicating that this would
>> not apply to the non-soft mount case that I got.
>>
>>> Disabling TSO, LRO are also de-facto standard things to do when
>>> you observe weird NFS  behaviour, because they are often broken
>>> in various network device drivers.
>>
>> I'll have to figure out how to experiment with such. Things
>> are at defaults rather generally on the systems. I'm not
>> literate in the subject areas.
>>
>> I'm the only user of the machines and network. It is not
>> outward facing. It is a rather small EtherNet network.
>>
>>> rick
>>>
>>> ________
>>> From: [email protected]  
>>> on behalf of Rick Macklem 
>>> Sent: Thursday, May 20, 2021 8:55 PM
>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>>> (in a zfs file systems context)
>>>
>>> Mark Millard wrote:
>>>> [I warn that I'm a fairly minimal user of NFS
>>>> mounts, not knowing all that much. I'm mostly
>>>> reporting this in case it ends up as evidence
>>>> via eventually matching up with others observing
>>>> possibly related oddities.]
>>>>
>>>> I got the following odd sequence (that I've
>>>> mixed notes into). It involved a diff -r over NFS
>>>> showing differences (files missing) and then a
>>>> later diff finding matches for the same files,
>>>> no file system changes made on either machine.
>>>> I'm unable to reproduce the oddity on demand.
>>>>
>>>> Note: A larger scope diff -r originally returned the
>>>> below as well, but doing the narrower diff -r did
>>>> repeat the result and that is what I show. (I
>>>> make no use of devel/ice .)
>>>>
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> . . .
>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>
>>>> Note: The above was not expected. So I tried:
>>>>
>>>> # ls -Tld /mnt/devel/ice/files/*
>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/Make.rules.FreeBSD
>> . . .
>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>
>>>> Note: So that indicated that the files were there on the
>>>> machine that /mnt references. So attempting the original
>>>> diff -r 

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable
[Direct drive connection to machine: no problem.]

On 2021-May-20, at 21:40, Mark Millard  wrote:

> [main test example and main/releng/13 mixed example]
> 
> On 2021-May-20, at 20:36, Mark Millard  wrote:
> 
>> [stable/13 test: example ends up being odder. That might
>> allow eliminating some potential alternatives.]
>> 
>> On 2021-May-20, at 19:38, Mark Millard  wrote:
>>> 
>>> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>>>> 
>>>> Oh, one additional thing that I'll dare to top post...
>>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>>> This has not yet been resolved in "main" etc and could explain
>>>> why an RPC could time out for a soft mount.
>>> 
>>> See later notes that I added: soft mount is not required
>>> to see the problem.
>>> 
>>>> You can revert the patch in r367492 to avoid the problem.
>>> 
>>> If I understand right, you are indicating that this would
>>> not apply to the non-soft mount case that I got.
>>> 
>>>> Disabling TSO, LRO are also de-facto standard things to do when
>>>> you observe weird NFS  behaviour, because they are often broken
>>>> in various network device drivers.
>>> 
>>> I'll have to figure out how to experiment with such. Things
>>> are at defaults rather generally on the systems. I'm not
>>> literate in the subject areas.
>>> 
>>> I'm the only user of the machines and network. It is not
>>> outward facing. It is a rather small EtherNet network.
>>> 
>>>> rick
>>>> 
>>>> 
>>>> From: [email protected]  
>>>> on behalf of Rick Macklem 
>>>> Sent: Thursday, May 20, 2021 8:55 PM
>>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>>>> (in a zfs file systems context)
>>>> 
>>>> Mark Millard wrote:
>>>>> [I warn that I'm a fairly minimal user of NFS
>>>>> mounts, not knowing all that much. I'm mostly
>>>>> reporting this in case it ends up as evidence
>>>>> via eventually matching up with others observing
>>>>> possibly related oddities.]
>>>>> 
>>>>> I got the following odd sequence (that I've
>>>>> mixed notes into). It involved a diff -r over NFS
>>>>> showing differences (files missing) and then a
>>>>> later diff finding matches for the same files,
>>>>> no file system changes made on either machine.
>>>>> I'm unable to reproduce the oddity on demand.
>>>>> 
>>>>> Note: A larger scope diff -r originally returned the
>>>>> below as well, but doing the narrower diff -r did
>>>>> repeat the result and that is what I show. (I
>>>>> make no use of devel/ice .)
>>>>> 
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> . . .
>>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>> 
>>>>> Note: The above was not expected. So I tried:
>>>>> 
>>>>> # ls -Tld /mnt/devel/ice/files/*
>>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>>>> /mnt/devel/ice/files/Make.rules.FreeBSD
>>> . . .
>>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>> 
>>>>> Note: So that indicated that the files were there on the
>>>>> machine that /mnt references. So attempting the original
>>>>> diff -r again:
>>>>> 
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> #
>>>>> 
>>>>> (Empty difference.)
>>>>> 
>>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>>> the odd result of the diff -r no longer happened: no
>>>>> differences reported.
>>>>> 
>>>>> 
>>>>> 
>>>>> For reference (both machines reported):
>>>>> 
>>>>> . . .
>>&

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable
[main test example and main/releng/13 mixed example]

On 2021-May-20, at 20:36, Mark Millard  wrote:

> [stable/13 test: example ends up being odder. That might
> allow eliminating some potential alternatives.]
> 
> On 2021-May-20, at 19:38, Mark Millard  wrote:
>> 
>> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>>> 
>>> Oh, one additional thing that I'll dare to top post...
>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>> This has not yet been resolved in "main" etc and could explain
>>> why an RPC could time out for a soft mount.
>> 
>> See later notes that I added: soft mount is not required
>> to see the problem.
>> 
>>> You can revert the patch in r367492 to avoid the problem.
>> 
>> If I understand right, you are indicating that this would
>> not apply to the non-soft mount case that I got.
>> 
>>> Disabling TSO, LRO are also de-facto standard things to do when
>>> you observe weird NFS  behaviour, because they are often broken
>>> in various network device drivers.
>> 
>> I'll have to figure out how to experiment with such. Things
>> are at defaults rather generally on the systems. I'm not
>> literate in the subject areas.
>> 
>> I'm the only user of the machines and network. It is not
>> outward facing. It is a rather small EtherNet network.
>> 
>>> rick
>>> 
>>> ________________________
>>> From: [email protected]  
>>> on behalf of Rick Macklem 
>>> Sent: Thursday, May 20, 2021 8:55 PM
>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>>> (in a zfs file systems context)
>>> 
>>> Mark Millard wrote:
>>>> [I warn that I'm a fairly minimal user of NFS
>>>> mounts, not knowing all that much. I'm mostly
>>>> reporting this in case it ends up as evidence
>>>> via eventually matching up with others observing
>>>> possibly related oddities.]
>>>> 
>>>> I got the following odd sequence (that I've
>>>> mixed notes into). It involved a diff -r over NFS
>>>> showing differences (files missing) and then a
>>>> later diff finding matches for the same files,
>>>> no file system changes made on either machine.
>>>> I'm unable to reproduce the oddity on demand.
>>>> 
>>>> Note: A larger scope diff -r originally returned the
>>>> below as well, but doing the narrower diff -r did
>>>> repeat the result and that is what I show. (I
>>>> make no use of devel/ice .)
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> . . .
>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>> 
>>>> Note: The above was not expected. So I tried:
>>>> 
>>>> # ls -Tld /mnt/devel/ice/files/*
>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/Make.rules.FreeBSD
>> . . .
>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>> 
>>>> Note: So that indicated that the files were there on the
>>>> machine that /mnt references. So attempting the original
>>>> diff -r again:
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> #
>>>> 
>>>> (Empty difference.)
>>>> 
>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>> the odd result of the diff -r no longer happened: no
>>>> differences reported.
>>>> 
>>>> 
>>>> 
>>>> For reference (both machines reported):
>>>> 
>>>> . . .
>>>> The original mount command was on CA72_16Gp_ZFS:
>>>> 
>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>> The likely explanation for this is your use of a "soft" mount.
>>> - If the NFS server is slow to respond or there is a temporary network 
>>> issue,
>>> the RPC request can time out and then the
>>> syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the
&g

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable
[stable/13 test: example ends up being odder. That might
allow eliminating some potential alternatives.]

On 2021-May-20, at 19:38, Mark Millard  wrote:
> 
> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>> 
>> Oh, one additional thing that I'll dare to top post...
>> r367492 broke the TCP upcalls that the NFS server uses, such
>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>> This has not yet been resolved in "main" etc and could explain
>> why an RPC could time out for a soft mount.
> 
> See later notes that I added: soft mount is not required
> to see the problem.
> 
>> You can revert the patch in r367492 to avoid the problem.
> 
> If I understand right, you are indicating that this would
> not apply to the non-soft mount case that I got.
> 
>> Disabling TSO, LRO are also de-facto standard things to do when
>> you observe weird NFS  behaviour, because they are often broken
>> in various network device drivers.
> 
> I'll have to figure out how to experiment with such. Things
> are at defaults rather generally on the systems. I'm not
> literate in the subject areas.
> 
> I'm the only user of the machines and network. It is not
> outward facing. It is a rather small EtherNet network.
> 
>> rick
>> 
>> 
>> From: [email protected]  on 
>> behalf of Rick Macklem 
>> Sent: Thursday, May 20, 2021 8:55 PM
>> To: FreeBSD-STABLE Mailing List; Mark Millard
>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>> (in a zfs file systems context)
>> 
>> Mark Millard wrote:
>>> [I warn that I'm a fairly minimal user of NFS
>>> mounts, not knowing all that much. I'm mostly
>>> reporting this in case it ends up as evidence
>>> via eventually matching up with others observing
>>> possibly related oddities.]
>>> 
>>> I got the following odd sequence (that I've
>>> mixed notes into). It involved a diff -r over NFS
>>> showing differences (files missing) and then a
>>> later diff finding matches for the same files,
>>> no file system changes made on either machine.
>>> I'm unable to reproduce the oddity on demand.
>>> 
>>> Note: A larger scope diff -r originally returned the
>>> below as well, but doing the narrower diff -r did
>>> repeat the result and that is what I show. (I
>>> make no use of devel/ice .)
>>> 
>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
> . . .
>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>> 
>>> Note: The above was not expected. So I tried:
>>> 
>>> # ls -Tld /mnt/devel/ice/files/*
>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>> /mnt/devel/ice/files/Make.rules.FreeBSD
> . . .
>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>> 
>>> Note: So that indicated that the files were there on the
>>> machine that /mnt references. So attempting the original
>>> diff -r again:
>>> 
>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>> #
>>> 
>>> (Empty difference.)
>>> 
>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>> the odd result of the diff -r no longer happened: no
>>> differences reported.
>>> 
>>> 
>>> 
>>> For reference (both machines reported):
>>> 
>>> . . .
>>> The original mount command was on CA72_16Gp_ZFS:
>>> 
>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>> The likely explanation for this is your use of a "soft" mount.
>> - If the NFS server is slow to respond or there is a temporary network issue,
>>  the RPC request can time out and then the
>>  syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the
>>   readdir(3) libc functions expect syscalls to fail this way...
>>   Then the cached directory is messed up.
>>   Doing the "ls" read the directory again and fixed the problem.
>> 
>> Try to reproduce it for a mount without the "soft" option.
>> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
>> can usually get rid of it.)
>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 
>&

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable



> On 2021-May-20, at 18:09, Rick Macklem  wrote:
> 
> Oh, one additional thing that I'll dare to top post...
> r367492 broke the TCP upcalls that the NFS server uses, such
> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
> This has not yet been resolved in "main" etc and could explain
> why an RPC could time out for a soft mount.

See later notes that I added: soft mount is not required
to see the problem.

> You can revert the patch in r367492 to avoid the problem.

If I understand right, you are indicating that this would
not apply to the non-soft mount case that I got.

> Disabling TSO, LRO are also de-facto standard things to do when
> you observe weird NFS  behaviour, because they are often broken
> in various network device drivers.

I'll have to figure out how to experiment with such. Things
are at defaults rather generally on the systems. I'm not
literate in the subject areas.

I'm the only user of the machines and network. It is not
outward facing. It is a rather small EtherNet network.

> rick
> 
> 
> From: [email protected]  on 
> behalf of Rick Macklem 
> Sent: Thursday, May 20, 2021 8:55 PM
> To: FreeBSD-STABLE Mailing List; Mark Millard
> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
> (in a zfs file systems context)
> 
> Mark Millard wrote:
>> [I warn that I'm a fairly minimal user of NFS
>> mounts, not knowing all that much. I'm mostly
>> reporting this in case it ends up as evidence
>> via eventually matching up with others observing
>> possibly related oddities.]
>> 
>> I got the following odd sequence (that I've
>> mixed notes into). It involved a diff -r over NFS
>> showing differences (files missing) and then a
>> later diff finding matches for the same files,
>> no file system changes made on either machine.
>> I'm unable to reproduce the oddity on demand.
>> 
>> Note: A larger scope diff -r originally returned the
>> below as well, but doing the narrower diff -r did
>> repeat the result and that is what I show. (I
>> make no use of devel/ice .)
>> 
>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
. . .
>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>> 
>> Note: The above was not expected. So I tried:
>> 
>> # ls -Tld /mnt/devel/ice/files/*
>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>> /mnt/devel/ice/files/Make.rules.FreeBSD
. . .
>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>> 
>> Note: So that indicated that the files were there on the
>> machine that /mnt references. So attempting the original
>> diff -r again:
>> 
>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>> #
>> 
>> (Empty difference.)
>> 
>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>> the odd result of the diff -r no longer happened: no
>> differences reported.
>> 
>> 
>> 
>> For reference (both machines reported):
>> 
>> . . .
>> The original mount command was on CA72_16Gp_ZFS:
>> 
>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
> The likely explanation for this is your use of a "soft" mount.
> - If the NFS server is slow to respond or there is a temporary network issue,
>   the RPC request can time out and then the
>   syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the
>readdir(3) libc functions expect syscalls to fail this way...
>Then the cached directory is messed up.
>Doing the "ls" read the directory again and fixed the problem.
> 
> Try to reproduce it for a mount without the "soft" option.
> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
> can usually get rid of it.)
> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 
> 1985
> and I still feel that way.
> --> If you can reproduce it without "soft" then I can't explain it.
>  To be honest, the directory reading/caching code in the NFSv3 client
>  hasn't changed significantly in literally decades, as far as I can 
> remember.

Well . . . trying an even wider scope diff than
the original . . .

# umount /mnt/
# mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
Only in /usr/port

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Rick Macklem
Oh, one additional thing that I'll dare to top post...
r367492 broke the TCP upcalls that the NFS server uses, such
that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
This has not yet been resolved in "main" etc and could explain
why an RPC could time out for a soft mount.

You can revert the patch in r367492 to avoid the problem.

Disabling TSO, LRO are also de-facto standard things to do when
you observe weird NFS  behaviour, because they are often broken
in various network device drivers.

rick


From: [email protected]  on 
behalf of Rick Macklem 
Sent: Thursday, May 20, 2021 8:55 PM
To: FreeBSD-STABLE Mailing List; Mark Millard
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in 
a zfs file systems context)

Mark Millard wrote:
>[I warn that I'm a fairly minimal user of NFS
>mounts, not knowing all that much. I'm mostly
>reporting this in case it ends up as evidence
>via eventually matching up with others observing
>possibly related oddities.]
>
>I got the following odd sequence (that I've
>mixed notes into). It involved a diff -r over NFS
>showing differences (files missing) and then a
>later diff finding matches for the same files,
>no file system changes made on either machine.
>I'm unable to reproduce the oddity on demand.
>
>Note: A larger scope diff -r originally returned the
>below as well, but doing the narrower diff -r did
>repeat the result and that is what I show. (I
>make no use of devel/ice .)
>
># diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-allTests.py
>Only in /usr/ports/devel/ice/files: patch-cpp-config-Make.rules
>Only in /usr/ports/devel/ice/files: patch-cpp-include-Ice-FactoryTableInit.h
>Only in /usr/ports/devel/ice/files: patch-cpp-include-IceUtil-Config.h
>Only in /usr/ports/devel/ice/files: patch-cpp-include-IceUtil-ScannerConfig.h
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-src-Glacier2CryptPermissionsVerifier-CryptPermissionsVerifierI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-Ice-ProxyFactory.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceGrid-PluginFacadeI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceGrid-RegistryI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceSSL-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Glacier2-ssl-Server.cpp
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-Glacier2-staticFiltering-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-info-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-metrics-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-objects-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-properties-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-admin-run.py
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-deployer-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-deployer-Makefile
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-deployer-application.xml
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-distribution-AllTests.cpp
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-distribution-application.xml
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-distribution-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-session-run.py
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceSSL-configuration-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceSSL-configuration-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Slice-headers-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Slice-unicodePaths-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-include-TestCommon.h
>Only in /usr/ports/devel/ice/files: patch-php-Makefile
>Only in /usr/ports/devel/ice/files: patch-php-config-Make.rules.php
>Only in /usr/ports/devel/ice/files: patch-php-lib-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-config-Make.rules
>Only in /usr/ports/devel/ice/files: patch-python-modules-IcePy-Types.cpp
>Only in /usr/ports/devel/ice/files: patch-python-modules-IcePy-Types.h
>Only in /usr/ports/devel/ice/files: patch-python-python-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-test-Ice-info-AllTests.py
>Only in /usr/ports/devel/ice/files: patch-python-test-Ice-properties-run.py
>Only in /usr/ports/devel/ice

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Rick Macklem
Mark Millard wrote:
>[I warn that I'm a fairly minimal user of NFS
>mounts, not knowing all that much. I'm mostly
>reporting this in case it ends up as evidence
>via eventually matching up with others observing
>possibly related oddities.]
>
>I got the following odd sequence (that I've
>mixed notes into). It involved a diff -r over NFS
>showing differences (files missing) and then a
>later diff finding matches for the same files,
>no file system changes made on either machine.
>I'm unable to reproduce the oddity on demand.
>
>Note: A larger scope diff -r originally returned the
>below as well, but doing the narrower diff -r did
>repeat the result and that is what I show. (I
>make no use of devel/ice .)
>
># diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-allTests.py
>Only in /usr/ports/devel/ice/files: patch-cpp-config-Make.rules
>Only in /usr/ports/devel/ice/files: patch-cpp-include-Ice-FactoryTableInit.h
>Only in /usr/ports/devel/ice/files: patch-cpp-include-IceUtil-Config.h
>Only in /usr/ports/devel/ice/files: patch-cpp-include-IceUtil-ScannerConfig.h
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-src-Glacier2CryptPermissionsVerifier-CryptPermissionsVerifierI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-Ice-ProxyFactory.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceGrid-PluginFacadeI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceGrid-RegistryI.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-src-IceSSL-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Glacier2-ssl-Server.cpp
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-Glacier2-staticFiltering-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-info-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-metrics-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-objects-Makefile
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Ice-properties-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-admin-run.py
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-deployer-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-deployer-Makefile
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-deployer-application.xml
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-distribution-AllTests.cpp
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceGrid-distribution-application.xml
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-distribution-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceGrid-session-run.py
>Only in /usr/ports/devel/ice/files: 
>patch-cpp-test-IceSSL-configuration-AllTests.cpp
>Only in /usr/ports/devel/ice/files: patch-cpp-test-IceSSL-configuration-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Slice-headers-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-Slice-unicodePaths-run.py
>Only in /usr/ports/devel/ice/files: patch-cpp-test-include-TestCommon.h
>Only in /usr/ports/devel/ice/files: patch-php-Makefile
>Only in /usr/ports/devel/ice/files: patch-php-config-Make.rules.php
>Only in /usr/ports/devel/ice/files: patch-php-lib-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-config-Make.rules
>Only in /usr/ports/devel/ice/files: patch-python-modules-IcePy-Types.cpp
>Only in /usr/ports/devel/ice/files: patch-python-modules-IcePy-Types.h
>Only in /usr/ports/devel/ice/files: patch-python-python-Makefile
>Only in /usr/ports/devel/ice/files: patch-python-test-Ice-info-AllTests.py
>Only in /usr/ports/devel/ice/files: patch-python-test-Ice-properties-run.py
>Only in /usr/ports/devel/ice/files: patch-python-test-Slice-unicodePaths-run.py
>Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
>Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
>Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>
>Note: The above was not expected. So I tried:
>
># ls -Tld /mnt/devel/ice/files/*
>-rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/Make.rules.FreeBSD
>-rw-r--r--  1 root  wheel  1542 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-config-Make.common.rules
>-rw-r--r--  1 root  wheel   388 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-cpp-Makefile
>-rw-r--r--  1 root  wheel  1695 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-cpp-allTests.py
>-rw-r--r--  1 root  wheel  1112 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-cpp-config-Make.rules
>-rw-r--r--  1 root  wheel  1512 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-cpp-include-Ice-FactoryTableInit.h
>-rw-r--r--  1 root  wheel  1496 Apr 21 21:07:54 2021 
>/mnt/devel/ice/files/patch-cpp-include-IceUtil-Config.h
>-rw-