[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-12-08 Thread Bug Watch Updater
** Changed in: zfs-linux (Debian)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1916486

Title:
  zfs_zrele_async can cause txg sync deadlocks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-18 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu29

---
zfs-linux (0.6.5.6-0ubuntu29) xenial; urgency=medium

  * Fix race condition in zfs_iput_async (LP: #1916486)
    - Upstream ZFS fix 43eaef6de817 ("Fix zrele race in zrele_async that can
      cause hang")

 -- Heitor Alves de Siqueira  Thu, 25 Feb 2021 18:21:48 +

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: Fix Committed => Fix Released


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-18 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu16.11

---
zfs-linux (0.7.5-1ubuntu16.11) bionic; urgency=medium

  * Fix race condition in zfs_iput_async (LP: #1916486)
    - Upstream ZFS fix 43eaef6de817 ("Fix zrele race in zrele_async that can
      cause hang")

 -- Heitor Alves de Siqueira  Thu, 25 Feb 2021 17:20:20 +

** Changed in: zfs-linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-18 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.8.3-1ubuntu12.7

---
zfs-linux (0.8.3-1ubuntu12.7) focal; urgency=medium

  * Fix race condition in zfs_iput_async (LP: #1916486)
    - Upstream ZFS fix 43eaef6de817 ("Fix zrele race in zrele_async that can
      cause hang")

 -- Heitor Alves de Siqueira  Thu, 25 Feb 2021 19:48:51 +

** Changed in: zfs-linux (Ubuntu Focal)
   Status: Fix Committed => Fix Released


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-18 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.8.4-1ubuntu11.2

---
zfs-linux (0.8.4-1ubuntu11.2) groovy; urgency=medium

  * Fix race condition in zfs_iput_async (LP: #1916486)
    - Upstream ZFS fix 43eaef6de817 ("Fix zrele race in zrele_async that can
      cause hang")

 -- Heitor Alves de Siqueira  Thu, 25 Feb 2021 19:53:32 +

** Changed in: zfs-linux (Ubuntu Groovy)
   Status: Fix Committed => Fix Released


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-18 Thread Heitor Alves de Siqueira
There's a build failure for zfs-linux on riscv64/focal, but that should
be safe to ignore for this patch. The riscv64 builds seem to have failed
ever since they were enabled in 0.8.3-1ubuntu11, and the failures are
related to an "Unsupported ISA type" error in the libspl header files.
It's likely that we're missing some compatibility patches from upstream
for this specific architecture.


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-12 Thread Heitor Alves de Siqueira
Verified for Xenial, using the ZFS test suite. I don't think Xenial
ships zfs-test, so I pulled it from upstream and ran it against the DKMS
drivers. No new regressions were reported.

ubuntu@z-rotomvm27:~$ dpkg -l | grep zfs-dkms
ii  zfs-dkms  0.6.5.6-0ubuntu29  amd64  Native OpenZFS filesystem kernel modules for Linux


** Tags added: verification-done-xenial


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-12 Thread Heitor Alves de Siqueira
Verified for Groovy, through the ZFS test suite. No new regressions were
reported.

ubuntu@racah:/usr/share/zfs$ dpkg -l | grep zfs-dkms
ii  zfs-dkms  0.8.4-1ubuntu11.2  all  OpenZFS filesystem kernel modules for Linux


** Tags added: verification-done-groovy


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-12 Thread Heitor Alves de Siqueira
Verified for Focal, through the ZFS test suite. No new regressions were
reported.

ubuntu@z-rotomvm29:/usr/share/zfs$ dpkg -l | rg zfs-dkms
ii  zfs-dkms  0.8.3-1ubuntu12.7  all  OpenZFS filesystem kernel modules for Linux


** Tags added: verification-done-focal


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-12 Thread Heitor Alves de Siqueira
Verified for Bionic, through the ZFS test suite. No new regressions were
reported, and results of the test suite are the same as for the previous
version.

ubuntu@z-rotomvm28:/usr/share/zfs$ dpkg -l |grep zfs-dkms
ii  zfs-dkms  0.7.5-1ubuntu16.11  all  OpenZFS filesystem kernel modules for Linux

** Tags added: verification-done-bionic


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-04 Thread Łukasz Zemczak
Hello Heitor, or anyone else affected,

Accepted zfs-linux into groovy-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.8.4-1ubuntu11.2 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-groovy to verification-done-groovy. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-groovy. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: zfs-linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

** Changed in: zfs-linux (Ubuntu Focal)
   Status: In Progress => Fix Committed


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-03 Thread Colin Ian King
FYI, I've sponsored these and uploaded them; they are now waiting in
-proposed.  I also tested these patches on an AMD64 VM using the more
exhaustive kernel team ZFS test suite and they passed.


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira
** Description changed:

  [Impact]
  TXG sync stalls, causing ZFS workloads to hang indefinitely
  
  [Description]
  For certain ZFS workloads, we can see hung task timeouts in the kernel logs
  due to a transaction group deadlock. Userspace processes will hang and display
  stack traces similar to the one below:
  [49181.619711] clnt_server D0 21699  28868 0x0320
  [49181.619715] Call Trace:
  [49181.619725]  __schedule+0x24e/0x880
  [49181.619730]  schedule+0x2c/0x80
  [49181.619750]  cv_wait_common+0x11e/0x140 [spl]
  [49181.619763]  ? wait_woken+0x80/0x80
  [49181.619775]  __cv_wait+0x15/0x20 [spl]
  [49181.619872]  zil_commit.part.14+0x80/0x8c0 [zfs]
  [49181.619884]  ? _cond_resched+0x19/0x40
  [49181.619887]  ? mutex_lock+0x12/0x40
  [49181.619959]  zil_commit+0x17/0x20 [zfs]
  [49181.620026]  zfs_fsync+0x77/0xe0 [zfs]
  [49181.620093]  zpl_fsync+0x68/0xa0 [zfs]
  [49181.620100]  vfs_fsync_range+0x51/0xb0
  [49181.620105]  do_fsync+0x3d/0x70
  [49181.620109]  SyS_fsync+0x10/0x20
  [49181.620114]  do_syscall_64+0x73/0x130
  [49181.620119]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  
  We also might see a kworker thread blocking in the zfs writeback/evict path:
  [49181.881570] kworker/u17:3   D0  4915  2 0x8000
  [49181.881576] Workqueue: writeback wb_workfn (flush-zfs-10)
  [49181.881577] Call Trace:
  [49181.881580]  __schedule+0x24e/0x880
  [49181.881582]  ? atomic_t_wait+0x60/0x60
  [49181.881584]  schedule+0x2c/0x80
  [49181.881588]  bit_wait+0x11/0x60
  [49181.881592]  __wait_on_bit+0x4c/0x90
  [49181.881596]  ? atomic_t_wait+0x60/0x60
  [49181.881599]  __inode_wait_for_writeback+0xb9/0xf0
  [49181.881601]  ? bit_waitqueue+0x40/0x40
  [49181.881605]  inode_wait_for_writeback+0x26/0x40
  [49181.881609]  evict+0xb5/0x1a0
  [49181.881611]  iput+0x19c/0x230
  [49181.881648]  zfs_iput_async+0x1d/0x80 [zfs]
  [49181.881682]  zfs_get_data+0x1d4/0x2a0 [zfs]
  [49181.881718]  zil_commit.part.14+0x640/0x8c0 [zfs]
  [49181.881752]  zil_commit+0x17/0x20 [zfs]
  [49181.881784]  zpl_writepages+0xd5/0x160 [zfs]
  [49181.881787]  do_writepages+0x4b/0xe0
  [49181.881790]  __writeback_single_inode+0x45/0x350
  [49181.881792]  ? __writeback_single_inode+0x45/0x350
  [49181.881794]  writeback_sb_inodes+0x1d7/0x530
  [49181.881796]  wb_writeback+0xfb/0x300
  [49181.881799]  wb_workfn+0xad/0x400
  [49181.881800]  ? wb_workfn+0xad/0x400
  [49181.881803]  ? __switch_to_asm+0x35/0x70
  [49181.881809]  process_one_work+0x1de/0x420
  [49181.881811]  worker_thread+0x32/0x410
  [49181.881813]  kthread+0x121/0x140
  [49181.881815]  ? process_one_work+0x420/0x420
  [49181.881817]  ? kthread_create_worker_on_cpu+0x70/0x70
  [49181.881819]  ret_from_fork+0x35/0x40
  
  This is caused by a race between ZFS writeback and evict threads,
  usually during a transaction group sync operation. It's possible to have
  two iput() threads racing for the same inode: one of them scheduled
  async and the other executed synchronously as part of the writeback
  path. If the writeback thread tries to evict the inode while the async
  thread is running, it might re-enter the block layer for the same inode
  due to ZFS counters being in an inconsistent state. This then causes the
  kworker thread to stall the writeback, which in turn prevents the
  transaction group sync from completing and blocks other ZFS threads.
  
  This is fixed by the upstream commit:
  - Fix zrele race in zrele_async that can cause hang (43eaef6de817) [0]
  
  [Test Case]
  Being a race condition, this issue has been hard to reproduce consistently.
  This has been reported on heavy I/O workloads, mixing file creation and
  deletion. We have some reports both from upstream and from Ubuntu users that
  this is usually reproducible on e.g. heavy SQL workloads or on complex
  ccache-enabled builds [1].
  
  [0] https://github.com/openzfs/zfs/pull/11530
  [1] https://github.com/openzfs/zfs/issues/11527
  
  [Regression Potential]
- The patch has been tested in the ZFS test suite and in production
- environments, so the potential for further regressions should be fairly
- controlled. Potential regressions might arise in the ZFS writeback path, so we
- should monitor heavy I/O workloads that put a lot of stress in the sync and
- evict paths.
+ The patch has been tested in the ZFS test suite and in production
+ environments, so the potential for further regressions should be fairly
+ controlled. Potential regressions might arise in the ZFS writeback path,
+ causing write hangs and eventually stalling all ZFS-backed operations
+ indefinitely. We should monitor heavy I/O workloads that put a lot of stress in
+ the sync and evict paths to exercise the new changes.
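
For readers unfamiliar with this class of bug, the check-then-release
window described in the [Description] can be illustrated outside the
kernel. The following is a minimal user-space sketch in C11, not the
upstream patch: struct demo_inode, demo_put_unless_last() and
demo_release_async() are hypothetical names invented for this example.
It assumes the general approach of dropping a reference atomically only
when it is not the last one, and reporting that the final release
should be deferred to a worker rather than run in the caller's context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; only the reference count matters. */
struct demo_inode {
    atomic_int refcount;
};

/* Outcome of a release attempt in this sketch. */
enum demo_release { DEMO_DROPPED_INLINE, DEMO_DEFERRED };

/*
 * Drop one reference unless the caller holds the last one.
 * The test and the decrement happen as a single atomic step, so another
 * thread cannot slip in between "count is not 1" and "count - 1".
 * Returns true if the count was decremented, false if it was left at 1.
 */
static bool demo_put_unless_last(struct demo_inode *ip)
{
    int old = atomic_load(&ip->refcount);

    assert(old > 0);
    while (old != 1) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&ip->refcount, &old, old - 1))
            return true;
    }
    return false;
}

/*
 * Release a reference without ever performing the final put inline.
 * A real implementation would queue the final put on a worker thread
 * in the DEMO_DEFERRED case; this sketch only reports the decision.
 */
static enum demo_release demo_release_async(struct demo_inode *ip)
{
    if (demo_put_unless_last(ip))
        return DEMO_DROPPED_INLINE;
    return DEMO_DEFERRED;
}
```

Starting from a count of 2, the first call drops a reference inline and
the second reports that the final release should be deferred, so the
caller never performs the last put itself. The upstream commit
referenced in [0] remains the authoritative fix.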

[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira
** Patch added: "lp1916486-groovy.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+attachment/5471921/+files/lp1916486-groovy.debdiff


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira
** Patch added: "lp1916486-focal.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+attachment/5471920/+files/lp1916486-focal.debdiff


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira
** Patch added: "lp1916486-bionic.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+attachment/5471919/+files/lp1916486-bionic.debdiff


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira
** Patch added: "lp1916486-xenial.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+attachment/5471918/+files/lp1916486-xenial.debdiff


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-03-02 Thread Heitor Alves de Siqueira

[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-02-27 Thread Bug Watch Updater
** Changed in: zfs-linux (Debian)
   Status: Unknown => New


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-02-25 Thread Launchpad Bug Tracker
** Merge proposal linked:
   https://code.launchpad.net/~halves/ubuntu/+source/zfs-linux/+git/zfs-linux/+merge/398738

** Merge proposal linked:
   https://code.launchpad.net/~halves/ubuntu/+source/zfs-linux/+git/zfs-linux/+merge/398740

** Merge proposal linked:
   https://code.launchpad.net/~halves/ubuntu/+source/zfs-linux/+git/zfs-linux/+merge/398741

** Merge proposal linked:
   https://code.launchpad.net/~halves/ubuntu/+source/zfs-linux/+git/zfs-linux/+merge/398743


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-02-25 Thread Heitor Alves de Siqueira
** Changed in: zfs-linux (Ubuntu Hirsute)
   Status: In Progress => Fix Released

** Changed in: zfs-linux (Ubuntu Hirsute)
 Assignee: Heitor Alves de Siqueira (halves) => (unassigned)

** Description changed:

  [Impact]
  TXG sync stalls, causing ZFS workloads to hang indefinitely
  
  [Description]
  For certain ZFS workloads, we can see hung task timeouts in the kernel logs
  due to a transaction group deadlock. Userspace processes will hang and display
  stack traces similar to the one below:
  [49181.619711] clnt_server D0 21699  28868 0x0320
  [49181.619715] Call Trace:
  [49181.619725]  __schedule+0x24e/0x880
  [49181.619730]  schedule+0x2c/0x80
  [49181.619750]  cv_wait_common+0x11e/0x140 [spl]
  [49181.619763]  ? wait_woken+0x80/0x80
  [49181.619775]  __cv_wait+0x15/0x20 [spl]
  [49181.619872]  zil_commit.part.14+0x80/0x8c0 [zfs]
  [49181.619884]  ? _cond_resched+0x19/0x40
  [49181.619887]  ? mutex_lock+0x12/0x40
  [49181.619959]  zil_commit+0x17/0x20 [zfs]
  [49181.620026]  zfs_fsync+0x77/0xe0 [zfs]
  [49181.620093]  zpl_fsync+0x68/0xa0 [zfs]
  [49181.620100]  vfs_fsync_range+0x51/0xb0
  [49181.620105]  do_fsync+0x3d/0x70
  [49181.620109]  SyS_fsync+0x10/0x20
  [49181.620114]  do_syscall_64+0x73/0x130
  [49181.620119]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  
  We also might see a kworker thread blocking in the zfs writeback/evict path:
  [49181.881570] kworker/u17:3   D0  4915  2 0x8000
  [49181.881576] Workqueue: writeback wb_workfn (flush-zfs-10)
  [49181.881577] Call Trace:
  [49181.881580]  __schedule+0x24e/0x880
  [49181.881582]  ? atomic_t_wait+0x60/0x60
  [49181.881584]  schedule+0x2c/0x80
  [49181.881588]  bit_wait+0x11/0x60
  [49181.881592]  __wait_on_bit+0x4c/0x90
  [49181.881596]  ? atomic_t_wait+0x60/0x60
  [49181.881599]  __inode_wait_for_writeback+0xb9/0xf0
  [49181.881601]  ? bit_waitqueue+0x40/0x40
  [49181.881605]  inode_wait_for_writeback+0x26/0x40
  [49181.881609]  evict+0xb5/0x1a0
  [49181.881611]  iput+0x19c/0x230
  [49181.881648]  zfs_iput_async+0x1d/0x80 [zfs]
  [49181.881682]  zfs_get_data+0x1d4/0x2a0 [zfs]
  [49181.881718]  zil_commit.part.14+0x640/0x8c0 [zfs]
  [49181.881752]  zil_commit+0x17/0x20 [zfs]
  [49181.881784]  zpl_writepages+0xd5/0x160 [zfs]
  [49181.881787]  do_writepages+0x4b/0xe0
  [49181.881790]  __writeback_single_inode+0x45/0x350
  [49181.881792]  ? __writeback_single_inode+0x45/0x350
  [49181.881794]  writeback_sb_inodes+0x1d7/0x530
  [49181.881796]  wb_writeback+0xfb/0x300
  [49181.881799]  wb_workfn+0xad/0x400
  [49181.881800]  ? wb_workfn+0xad/0x400
  [49181.881803]  ? __switch_to_asm+0x35/0x70
  [49181.881809]  process_one_work+0x1de/0x420
  [49181.881811]  worker_thread+0x32/0x410
  [49181.881813]  kthread+0x121/0x140
  [49181.881815]  ? process_one_work+0x420/0x420
  [49181.881817]  ? kthread_create_worker_on_cpu+0x70/0x70
  [49181.881819]  ret_from_fork+0x35/0x40
  
  This is caused by a race between ZFS writeback and evict threads,
  usually during a transaction group sync operation. It's possible to have
  two iput() threads racing for the same inode: one of them scheduled
  async and the other executed synchronously as part of the writeback
  path. If the writeback thread tries to evict the inode while the async
  thread is running, it might re-enter the block layer for the same inode
  due to ZFS counters being in an inconsistent state. This then causes the
  kworker thread to stall the writeback, which in turn prevents the
  transaction group sync from completing and blocks other ZFS threads.
  
  This is fixed by the upstream commit:
- - Fix zrele race in zrele_async that can cause hang (2921ad6cba54) [0]
+ - Fix zrele race in zrele_async that can cause hang (43eaef6de817) [0]
  
  [Test Case]
  Being a race condition, this issue has been hard to reproduce consistently.
  This has been reported on heavy I/O workloads, mixing file creation and
  deletion. We have some reports both from upstream and from Ubuntu users that
  this is usually reproducible on e.g. heavy SQL workloads or on complex
  ccache-enabled builds [1].
  
  [0] https://github.com/openzfs/zfs/pull/11530
  [1] https://github.com/openzfs/zfs/issues/11527
  
  [Regression Potential]
  The patch has been tested in the ZFS test suite and in production
  environments, so the potential for further regressions should be fairly
  controlled. Potential regressions might arise in the ZFS writeback path, so we
  should monitor heavy I/O workloads that put a lot of stress in the sync and
  evict paths.


[Bug 1916486] Re: zfs_zrele_async can cause txg sync deadlocks

2021-02-22 Thread Heitor Alves de Siqueira
** Changed in: zfs-linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Focal)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Groovy)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Hirsute)
   Status: Confirmed => In Progress

** Bug watch added: Debian Bug tracker #983331
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983331

** Also affects: zfs-linux (Debian) via
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983331
   Importance: Unknown
   Status: Unknown
