I have created a 100% reliable reproducer test case. I have also determined that
the Ubuntu-specific patch 4701-enable-ARC-FILL-LOCKED-flag.patch, which was added
to fix Bug #1900889, is likely the cause.

[Test Case]

The important parts are:
- Use encryption
- rsync the zfs git tree
- Access the data after a reboot using parallel I/O from silversearcher-ag.
  A simple "find ." or "find . -exec cat {} > /dev/null \;" does not
  reproduce the issue.

The reproduction was done in a libvirt VM installed from the Ubuntu Impish
daily live CD. It has a normal ext4 root filesystem and a second 4GB disk
(/dev/vdb) that is used for ZFS.

= Preparation
apt install silversearcher-ag git zfs-dkms zfsutils-linux
echo -n testkey2 > /root/testkey
git clone https://github.com/openzfs/zfs /root/zfs

= Test Execution
zpool create test /dev/vdb
zfs create test/test -o encryption=on -o keyformat=passphrase \
    -o keylocation=file:///root/testkey
rsync -va --progress -HAX /root/zfs/ /test/test/zfs/

# Accessing the data at this point (before the reboot) works fine.
reboot

zfs load-key test/test
zfs mount -a
cd /test/test/zfs/
ag DISKS= 
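
The steps above can be gathered into a small two-phase script. This is a sketch that makes several assumptions: it runs as root, /dev/vdb is a scratch disk that may be overwritten, and the preparation steps (packages installed, /root/testkey written, tree cloned to /root/zfs) are already done. The function names and the before/after split are illustrative and are not part of the original test case.

```shell
#!/usr/bin/env bash
# Sketch of the reproducer (hypothetical helper script).
# Assumptions: run as root; /dev/vdb is a scratch disk; the preparation
# steps above have already been done.

# Phase 1: create an encrypted dataset and copy the zfs tree onto it.
reproduce_before_reboot() (
  set -euo pipefail
  zpool create test /dev/vdb
  zfs create -o encryption=on -o keyformat=passphrase \
    -o keylocation=file:///root/testkey test/test
  rsync -va --progress -HAX /root/zfs/ /test/test/zfs/
  echo "Data copied. Reboot, then run this script with the 'after' argument."
)

# Phase 2 (after the reboot): load the key, mount, and read in parallel.
reproduce_after_reboot() (
  set -euo pipefail
  zfs load-key test/test
  zfs mount -a
  cd /test/test/zfs/
  # On an affected build, ag is expected to hang here. Check "dmesg".
  ag DISKS=
)

# Act only when a phase is given, so the file can also be sourced.
case "${1:-}" in
  before) reproduce_before_reboot ;;
  after) reproduce_after_reboot ;;
  "") ;;
  *) echo "usage: $0 before|after" >&2; exit 2 ;;
esac
```

Run it with "before", reboot, and then run it with "after". Each phase runs in a subshell with "set -euo pipefail", so any failing step stops that phase without affecting the calling shell.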

= Test Result
ag hangs, and "sudo dmesg" shows a kernel exception.
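
To check for the failure without reading the log by hand, you can search the kernel log for the panic message from this bug's title. The sketch below uses a hypothetical function name. It matches the source line number (335 in the title) loosely, on the assumption that the number may differ between package builds.

```shell
# Hypothetical helper: exit 0 if the kernel log read from stdin contains
# the zfs_znode_sa_init() panic from this bug's title, non-zero otherwise.
# The line number is matched as any run of digits, in case it differs
# between builds.
has_znode_sa_init_panic() {
  grep -q 'PANIC at zfs_znode\.c:[0-9]*:zfs_znode_sa_init()'
}
```

Example usage: sudo dmesg | has_znode_sa_init_panic && echo "reproduced"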

[Analysis]
I rebuilt the zfs-linux 2.0.6-1ubuntu1 package from ppa:colin-king/zfs-impish
without the Ubuntu-specific patch ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch,
which was added to fix Bug #1900889. With this patch disabled, the issue does
not reproduce. With the patch re-enabled, it reproduces reliably every time.
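
One way to rebuild without the patch is to remove its entry from the source package's patch series file before building. The sketch below makes two assumptions: the package uses a quilt-style debian/patches/series file, and the patch is listed there on its own line as "ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch". The function name is hypothetical.

```shell
# Hypothetical helper: remove one patch entry from a quilt "series" file
# so the package is rebuilt without that patch. Only a line that exactly
# equals the patch name is removed.
drop_patch_from_series() {
  local series_file=$1 patch_name=$2 rc=0
  # -v: invert match, -x: whole line, -F: fixed string (no regex).
  grep -vxF -- "$patch_name" "$series_file" > "$series_file.tmp" || rc=$?
  # grep exits 1 when no lines remain; only 2 or more is a real error.
  if [ "$rc" -gt 1 ]; then
    rm -f -- "$series_file.tmp"
    return 1
  fi
  mv -- "$series_file.tmp" "$series_file"
}
```

For example, from the top of an unpacked zfs-linux source tree (assumed layout), run
drop_patch_from_series debian/patches/series ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch
followed by "dpkg-buildpackage -b -us -uc".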

It seems this patch was never sent upstream. As far as I can see, no upstream
code changes that set the ARC_FILL_IN_PLACE flag have been added since.
Interestingly, however, the code handling ARC_FILL_IN_PLACE was added in the
following commit, which fixed a similar-sounding issue ("Raw receive fix and
encrypted objset security fix"):
https://github.com/openzfs/zfs/commit/69830602de2d836013a91bd42cc8d36bbebb3aae
That commit first shipped in zfs 0.8.0, and the original bug was filed against
0.8.3.

I have also found the same issue as the original Launchpad bug reported
upstream. It has no fixes but a lot of discussion, and quite a few duplicates
link back to 11679:
https://github.com/openzfs/zfs/issues/11679
https://github.com/openzfs/zfs/issues/12014

I don't fully understand the ZFS code relating to this flag, but the code at
https://github.com/openzfs/zfs/blob/ce2bdcedf549b2d83ae9df23a3fa0188b33327b7/module/zfs/arc.c#L2026
describes it as having to do with decrypting blocks in the ARC and doing so
'in place'. That would explain three things: why encryption is needed to
reproduce the issue, why it reproduces best after a reboot (which flushes the
ARC), and why the data in the test case can still be read before the reboot
but fails to read after it.

This patch was added in 0.8.4-1ubuntu15, and I first experienced the
issue somewhere between 0.8.4-1ubuntu11 and 0.8.4-1ubuntu16.

So it all adds up, and I suggest that this patch be reverted.

** Bug watch added: github.com/openzfs/zfs/issues #11679
   https://github.com/openzfs/zfs/issues/11679

** Bug watch added: github.com/openzfs/zfs/issues #12014
   https://github.com/openzfs/zfs/issues/12014

-- 
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed
