Bug#962254: Umask ignored when mounting NFSv4.2 share of an exported ZFS (with acltype=off) (was: Re: Bug#962254: NFS(v4) broken at 4.19.118-2)

2020-06-15 Thread Elliott Mitchell
On Mon, Jun 15, 2020 at 10:50:35AM -0400, J. Bruce Fields wrote:
> Honestly I don't think I currently have a regression test for this so
> it's possible I could have missed something upstream.  I haven't seen
> any reports, though.
> 
> ZFS's ACL implementation is very different from any in-tree
> filesystem's, and given limited time, a filesystem with no prospect of
> going upstream isn't going to get much attention, so, yes, I'd need to
> see a reproducer on xfs or ext4 or something.

Salvatore managed to reproduce it with ext4, yet every prior report in
which the filesystem was known involved ZFS.  That seems to suggest one
of two things.

First, it could be that enabling POSIX ACLs has been pushed very strongly
by other filesystems, while ZFS hasn't pushed them as strongly.

Second, it could be that a substantial majority of NFS users are using ZFS.

If the former, this simply means an additional test case is needed.  If
the latter, then any testing of NFS which excludes ZFS is going to have
underwhelming coverage.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




2020-06-15 Thread J. Bruce Fields
On Sat, Jun 13, 2020 at 11:45:27AM -0700, Elliott Mitchell wrote:
> I disagree with this assessment.  All of the reporters have been using
> ZFS, but this could indicate an absence of testers using other
> filesystems.  We need someone with an NFS server which has a 4.15+ kernel
> and uses a different filesystem which supports ACLs.

Honestly I don't think I currently have a regression test for this so
it's possible I could have missed something upstream.  I haven't seen
any reports, though.

ZFS's ACL implementation is very different from any in-tree
filesystem's, and given limited time, a filesystem with no prospect of
going upstream isn't going to get much attention, so, yes, I'd need to
see a reproducer on xfs or ext4 or something.

--b.




2020-06-15 Thread Christoph Hellwig
If you are violating our license please also don't spam our list when
using your crappy combination.




2020-06-13 Thread Elliott Mitchell
On Sat, Jun 13, 2020 at 02:54:31PM +0200, Salvatore Bonaccorso wrote:
> indicated this was specifically observed on ZFS on Linux only. Seth
> Arnold's answer seems to be in line with that: the issue is more on
> the ZFS on Linux side, and it keeps biting people a bit
> unexpectedly. Why does this break with ACL off settings?

I disagree with this assessment.  All of the reporters have been using
ZFS, but this could indicate an absence of testers using other
filesystems.  We need someone with an NFS server which has a 4.15+ kernel
and uses a different filesystem which supports ACLs.

I am doubtful, though, that ACLs are related to the actual problem.  My
impression from what I've read is that they're a useful tool for working
around the problem, but are not related to the actual cause.


> But there was at least one other report (again without further
> detail or follow-ups) where it was observed on an export from OpenWRT,
> with no specific details:
> 
> https://bugs.openwrt.org/index.php?do=details_id=2581

This appears to be the same reporter as the RedHat bug report (comment 3
on the RedHat report).  This is a report for the server portion of the
reporter's setup.

Analyzing the setup, I disagree with one of the prior assessments of this
report.  This is OpenWRT on x86_64 hardware, which would suggest a
high-end router or embedded device.  Such a device might well have ECC
memory and a processor fast enough to handle ZFS.



Let me add one more data point.  I had been thinking I might need the
additional features in Linux-ZFS 0.7.12.  As such my NFS server had been
running a 4.9 kernel with Debian's ZFS 0.7.12-2+deb10u1~bpo9+1 packages.
Now with the problem manifesting my NFS server is running a 4.19 kernel
with Debian's ZFS 0.7.12-2+deb10u2 packages.

I could well believe the actual root cause is a problem with the
Linux-ZFS implementation.  What made the problem manifest, though, seems
to be a change in Linux's NFS implementation between 4.9 and 4.15.  I.e.,
Linux-ZFS implemented /something/ which worked at the time, but may not
have properly implemented the intended API, and was later broken by
Linux-NFS.






2020-06-13 Thread Salvatore Bonaccorso
Hi Elliott,

[I'm adding linux-nfs upstream; hopefully J. Bruce Fields or others can
help clarify]

On Thu, Jun 11, 2020 at 03:37:11PM -0700, Elliott Mitchell wrote:
> Bit more experimentation on this issue.
> 
> I tried a very small C program meant to create files with fewer
> permissions bits set.  This succeeded which strengthens the theory of
> the umask getting ignored.
> 
> I haven't seen anything hinting whether this is more a client or server
> issue.
> 
> I can speculate perhaps somewhere between 4.9 and 4.15 the NFS client
> code stepped closer to the "proper" 4.2 protocol.  If a
> corresponding NFS server was slow at getting merged, what we're seeing
> could happen.
> 
> Alternatively someone was trying to get a Linux NFS v4.2 client to work
> better with a different NFS v4.2 server, so they fixed Linux's NFS v4.2
> client.  Yet they failed to test with Linux's v4.2 server.
> 
> 
> This though is speculation.  All I can say is sometime between kernels
> 4.9 and 4.15, NFS v4.2 got broken.  There are hints this is related to
> handling of umask.
 
I was initially confused by the mention of this only appearing with the
update to 4.19.118-2, but that is now cleared up: it shows up when
moving from 4.9.x in stretch to 4.19.x.

Now I'm quite unsure whether this should be considered a Linux kernel
issue. What follows is just what I found with respect to the mentioned
behaviour. There is a specific aspect of the NFSv4.2 implementation:

Upstream, NFSv4.2 umask support was added with [nfsv4.2-umask-support]
and [47057abde515]. The respective RFC describing it is [RFC8275].

[nfsv4.2-umask-support]: 

[47057abde515]: 

[RFC8275]: 

Since that change, the umask can be ignored in the presence of
inheritable NFSv4 ACLs.

What is confusing now is that the behaviour is reproducible with the ZFS
default of acltype=off (aclinherit=restricted, sharenfs=off).

Reproducing the issue is easy, as follows (all done on Debian unstable
to verify that the behaviour can be triggered there as well, with the
more current kernel 5.6.14-2 and zfs-linux 0.8.4-1):

# zpool create zfs_test /dev/vdb

and exporting /zfs_test in /etc/exports as

/zfs_test 192.168.122.1/24(rw,sync,no_subtree_check,no_root_squash)

The properties of zfs_test would be:

# zfs get acltype,aclinherit,sharenfs zfs_test
NAME      PROPERTY    VALUE       SOURCE
zfs_test  acltype     off         local
zfs_test  aclinherit  restricted  local
zfs_test  sharenfs    off         default

And reproducing then with

# mount -t nfs 192.168.122.150:/zfs_test /mnt
# mkdir /mnt/foo && ls -ld /mnt/foo && rmdir /mnt/foo
drwxrwxrwx 2 root root 2 Jun 13 14:25 /mnt/fo
# umount /mnt
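
For comparison, the same mkdir check can be scripted so it is easy to
run against different directories. This is only a minimal Python sketch:
the helper name created_mode is hypothetical, and the temporary
directory is a local stand-in for the NFS mount point. On a filesystem
that honours the umask and has no default ACLs, a directory created with
mode 0777 under umask 022 should come back as 0755.

```python
import os
import stat
import tempfile


def created_mode(parent, name="foo", mode=0o777, umask=0o022):
    """Create parent/name under the given umask and return its rwx bits.

    Hypothetical helper for illustration, not part of the original report.
    """
    path = os.path.join(parent, name)
    old_umask = os.umask(umask)  # set the process umask, remember the old one
    try:
        os.mkdir(path, mode)
        # Keep only the rwx bits; drop setgid/sticky inherited from the parent.
        return stat.S_IMODE(os.stat(path).st_mode) & 0o777
    finally:
        os.umask(old_umask)  # restore the caller's umask
        if os.path.isdir(path):
            os.rmdir(path)


if __name__ == "__main__":
    # Local stand-in for /mnt; pass the NFS mount point to test the export.
    with tempfile.TemporaryDirectory() as parent:
        print(oct(created_mode(parent)))
```

Calling created_mode("/mnt") on the client with the share mounted would
exercise the same path as the mkdir above.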

The comment from J. Bruce Fields, in
https://bugzilla.redhat.com/show_bug.cgi?id=1667761#c1 can help debug
it further:

> To start debugging this, I'd recommend running wireshark to
> sniff traffic while running your reproducer (mount, mkdir) and
> compare to what's expected from the umask RFC.  Somewhere there
> should be a getattr from the client for the supported_attrs
> attribute, and the reply from the server will probably indicate
> support for the new mode_umask attribute.  If you find the CREATE
> operation that creates the new directory, you should see the client
> set the mode_umask attribute, with the mode part set to the open
> mode and the umask to the process umask.  If those values look
> right, then the problem is likely on the server side.

Indeed, when sniffing the traffic, the getattr from the client is there
and the server does indicate support for the new mode_umask attribute.
Later, in the CREATE operation, the client sets the mode_umask attribute
with the mode part set to '0777' and the umask to '022'. The mode in the
reply is then '0777' as well.
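
What the server would be expected to do with those values can be worked
out from the rule in RFC 8275: if the new object inherits any ACEs from
its parent directory, the mode is used and the umask is ignored;
otherwise every mode bit that is set in the umask must be turned off.
A small Python sketch of that rule follows. The function effective_mode
and the inherits_aces flag are illustrative names only, not actual nfsd
or ZFS interfaces; mu_mode and mu_umask follow the field names of the
mode_umask attribute.

```python
def effective_mode(mu_mode, mu_umask, inherits_aces):
    """Mode a server should assign for a CREATE carrying mode_umask.

    Sketch of the RFC 8275 rule; the names are illustrative only.
    """
    if inherits_aces:
        # Inherited ACEs take precedence: the umask is ignored.
        return mu_mode
    # No inherited ACEs: turn off every bit that is set in the umask.
    return mu_mode & ~mu_umask & 0o7777


if __name__ == "__main__":
    # Values seen in the capture: mode 0777, umask 022.
    print(oct(effective_mode(0o777, 0o022, inherits_aces=False)))  # 0o755
    print(oct(effective_mode(0o777, 0o022, inherits_aces=True)))   # 0o777
```

With acltype=off one would expect nothing to be inherited, hence 0755;
the 0777 in the reply corresponds to the branch where the umask is
ignored.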

If further debugging is needed, we should try to distill a wireshark
capture and provide the respective pcap.

https://bugzilla.redhat.com/show_bug.cgi?id=1667761

did not contain any further specific information or follow-ups.

https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736

indicated this was specifically observed on ZFS on Linux only. Seth
Arnold's answer seems to be in line with that: the issue is more on
the ZFS on Linux side, and it keeps biting people a bit
unexpectedly. Why does this break with ACL off settings?

But there was at least one other report (again without further
detail or follow-ups) where it was observed on an export from OpenWRT,
with no specific details:

https://bugs.openwrt.org/index.php?do=details_id=2581

Both Debian bugs themselves also involved an exported ZFS filesystem:
https://bugs.debian.org/934160
https://bugs.debian.org/962254

Any hint on where to pinpoint the issue? Is it on both the Linux and the
ZFS on Linux side, or only in one of the components?

Regards,
Salvatore