[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
** Changed in: zfs-linux (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu Xenial)
     Assignee: (unassigned) => Colin Ian King (colin-king)

** No longer affects: zfs-linux (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1587686

Title:
  ZFS: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1587686/+subscriptions
-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu10

---
zfs-linux (0.6.5.6-0ubuntu10) xenial; urgency=medium

  * Sync with relevant upstream fixes (LP: #1594871)
    - Fix user namespaces uid/gid mapping
      As described in torvalds/linux@5f3a4a2, the &init_user_ns, and not
      the current user_ns, should be passed to posix_acl_from_xattr() and
      posix_acl_to_xattr(). Conveniently, the init_user_ns is available
      through the init credential (kcred).
      (upstream commit 874bd959f4f15b3d4b007160ee7ad3f4111dd341) ZFS #4177
    - Fix ZPL miswrite of default POSIX ACL
      Commit 4967a3e introduced a typo that caused the ZPL to store the
      intended default ACL as an access ACL. Due to caching, this problem
      may not become visible until the filesystem is remounted or the
      inode is evicted from the cache. Fix the typo.
      (upstream commit 98f03691a4c08f38ca4538c468e9523f8e6b24be) ZFS #4520
    - Create unique partition labels
      When partitioning a device, a name may be specified for each
      partition. Internally, ZFS doesn't use this partition name for
      anything, so it has always just been set to "zfs". However, this
      isn't optimal, because udev creates symlinks using this name in
      /dev/disk/by-partlabel/; if the name isn't unique, not all of the
      links can be created. Therefore a random 64-bit value has been added
      to the partition label, e.g. "zfs-1234567890abcdef". Additional
      information could be encoded here, but since partitions may be
      reused that might result in confusion, so it was decided against.
      (upstream commit fbffa53a5cdb9b796de5afc9be8c1f79619253d4) ZFS #4517
    - Fix inverted logic on none elevator comparison
      Commit d1d7e2689db9e03f1 ("cstyle: Resolve C style issues") inverted
      the logic on the none elevator comparison. Fix this and make it
      cstyle warning clean.
      (upstream commit 60a4ea3f948f1596b92b666fc7dd21202544edbb) ZFS #4507
    - Remove wrong ASSERT in annotate_ecksum
      When using large blocks like 1M, there will be more than UINT16_MAX
      qwords in one block, so this ASSERT would go off. Also, it is
      possible for the histogram to overflow; we cap the counts to
      UINT16_MAX to prevent this.
      (upstream commit 21ea9460fa880bb072a9ca9d845aef740f9d3af6) ZFS #4257
    - Fix 'zpool import' blkid device names
      When importing a pool using the blkid cache, only the device node
      path was added to the list of known paths for a device. This
      resulted in 'zpool import' always using the sdX names in preference
      to the 'path' name stored in the label. To fix the issue, the blkid
      import path has been updated to add the 'path', 'devid', and
      'devname' names from the label to the known paths. A sanity check is
      done to ensure these paths do refer to the same device identified by
      blkid.
      (upstream commit c9ca152fd1de1b0fd959e772b9a25d14a891952b)
      ZFS #4523, #3043
    - Use udev for partition detection
      When ZFS partitions a block device, it must wait for udev to create
      both a device node and all the device symlinks. This process takes a
      variable length of time, depending on factors such as how many links
      must be created, the complexity of the rules, etc. Complicating the
      situation further, it is not uncommon for udev to create and then
      remove a link multiple times while processing the udev rules. To
      address this, the zpool_label_disk_wait() function has been updated
      to use libudev: the function waits until the registered system
      device acknowledges that it is fully initialized, then all device
      links are checked and allowed to settle for 50ms. This makes it far
      more likely that all the device nodes will exist when the kernel
      modules need to open them. For systems without libudev, the
      alternate zpool_label_disk_wait() was updated to include a settle
      time. In addition, the kernel modules were updated to include retry
      logic for this ENOENT case. Due to the improved checks in the
      utilities, it is unlikely this logic will be invoked; however, in
      the rare event it is needed, it will prevent a failure.
      (upstream commit 2cb77346cb698ae0c233c7baf8b4c787205b54e9)
      ZFS #4523, #3708, #4077, #4144, #4214, #4517
  * Fix ztest truncated cache file (LP: #1587686)
    - Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
      truncate and overwrite rather than rename the cache file. This is
      the correct fix, but it should only have been applied for the kernel
      build. In user space, rename(2) is needed because ztest depends on
      the cache file.
      (upstream commit 151f84e2c32f690b92c424d8c55d2dfccaa76e51) ZFS #4129

 -- Colin Ian King  Tue, 21 Jun 2016 15:49:12 +0100

** Changed in: zfs-linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released
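The ztest fix above comes down to how the cache file is replaced. A minimal shell sketch of the two strategies (file name and contents made up for illustration; this is not ZFS source, and mv(1) within one filesystem performs a rename(2)):

```shell
# Illustrative only: two ways to replace a cache file.
CACHE=/tmp/zpool.cache.demo

# 1) Truncate and overwrite (the kernel-build strategy): between the
#    truncation and the rewrite, a concurrent reader sees an empty file.
printf 'old config\n' > "$CACHE"
: > "$CACHE"                        # a reader at this instant sees nothing
printf 'new config\n' > "$CACHE"

# 2) Write then rename (the user-space strategy ztest depends on):
#    rename(2) swaps the file in atomically, so a concurrent reader such
#    as zdb sees either the old contents or the new, never a truncated file.
printf 'newer config\n' > "$CACHE.tmp"
mv "$CACHE.tmp" "$CACHE"

cat "$CACHE"                        # -> newer config
rm -f "$CACHE"
```

This is why applying the truncate-and-overwrite change to the user-space build broke ztest: zdb can open the cache file exactly in the window where it is empty.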
This bug was fixed in the package linux - 4.4.0-30.49

---
linux (4.4.0-30.49) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597897

  * FCP devices are not detected correctly nor deterministically
    (LP: #1567602)
    - scsi_dh_alua: Disable ALUA handling for non-disk devices
    - scsi_dh_alua: Use vpd_pg83 information
    - scsi_dh_alua: improved logging
    - scsi_dh_alua: sanitze sense code handling
    - scsi_dh_alua: use standard logging functions
    - scsi_dh_alua: return standard SCSI return codes in submit_rtpg
    - scsi_dh_alua: fixup description of stpg_endio()
    - scsi_dh_alua: use flag for RTPG extended header
    - scsi_dh_alua: use unaligned access macros
    - scsi_dh_alua: rework alua_check_tpgs() to return the tpgs mode
    - scsi_dh_alua: simplify sense code handling
    - scsi: Add scsi_vpd_lun_id()
    - scsi: Add scsi_vpd_tpg_id()
    - scsi_dh_alua: use scsi_vpd_tpg_id()
    - scsi_dh_alua: Remove stale variables
    - scsi_dh_alua: Pass buffer as function argument
    - scsi_dh_alua: separate out alua_stpg()
    - scsi_dh_alua: Make stpg synchronous
    - scsi_dh_alua: call alua_rtpg() if stpg fails
    - scsi_dh_alua: switch to scsi_execute_req_flags()
    - scsi_dh_alua: allocate RTPG buffer separately
    - scsi_dh_alua: Use separate alua_port_group structure
    - scsi_dh_alua: use unique device id
    - scsi_dh_alua: simplify alua_initialize()
    - revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach
      should succeed while TPG is transitioning")
    - scsi_dh_alua: move optimize_stpg evaluation
    - scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
    - scsi_dh_alua: Use workqueue for RTPG
    - scsi_dh_alua: Allow workqueue to run synchronously
    - scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
    - scsi_dh_alua: Recheck state on unit attention
    - scsi_dh_alua: update all port states
    - scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
    - scsi_dh_alua: do not fail for unknown VPD identification

linux (4.4.0-29.48) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597015

  * Wireless hotkey fails on Dell XPS 15 9550 (LP: #1589886)
    - intel-hid: new hid event driver for hotkeys
    - intel-hid: fix incorrect entries in intel_hid_keymap
    - intel-hid: allocate correct amount of memory for private struct
    - intel-hid: add a workaround to ignore an event after waking up from S4.
    - [Config] CONFIG_INTEL_HID_EVENT=m

  * cgroupfs mounts can hang (LP: #1588056)
    - Revert "UBUNTU: SAUCE: (namespace) mqueue: Super blocks must be owned
      by the user ns which owns the ipc ns"
    - Revert "UBUNTU: SAUCE: kernfs: Do not match superblock in another user
      namespace when mounting"
    - Revert "UBUNTU: SAUCE: cgroup: Use a new super block when mounting in
      a cgroup namespace"
    - (namespace) bpf: Use mount_nodev not mount_ns to mount the bpf
      filesystem
    - (namespace) bpf, inode: disallow userns mounts
    - (namespace) ipc: Initialize ipc_namespace->user_ns early.
    - (namespace) vfs: Pass data, ns, and ns->userns to mount_ns
    - SAUCE: (namespace) Sync with upstream s_user_ns patches
    - (namespace) kernfs: The cgroup filesystem also benefits from
      SB_I_NOEXEC
    - (namespace) ipc/mqueue: The mqueue filesystem should never contain
      executables

  * KVM system crashes after starting guest (LP: #1596635)
    - xhci: Cleanup only when releasing primary hcd

  * Upstream patch "crypto: vmx - IV size failing on skcipher API" for
    Ubuntu 16.04 (LP: #1596557)
    - crypto: vmx - IV size failing on skcipher API

  * [Bug] tpm initialization fails on x86 (LP: #1596469)
    - tpm_crb: drop struct resource res from struct crb_priv
    - tpm_crb: fix mapping of the buffers

  * Device shutdown notification for CAPI Flash cards (LP: #1592114)
    - cxlflash: Fix regression issue with re-ordering patch
    - cxlflash: Fix to drain operations from previous reset
    - cxlflash: Add device dependent flags
    - cxlflash: Shutdown notify support for CXL Flash cards

  * scsi-modules udeb should include pm80xx (LP: #1595628)
    - [Config] Add pm80xx scsi driver to d-i

  * Sync up latest relevant upstream bug fixes (LP: #1594871)
    - SAUCE: (noup) Update zfs to 0.6.5.6-0ubuntu10

  * Cannot compile module tda10071 (LP: #1592531)
    - [media] tda10071: Fix dependency to REGMAP_I2C

  * lsvpd doesn't show correct location code for devices attached to a CAPI
    card (LP: #1594847)
    - cxl: Make vPHB device node match adapter's

  * enable CRC32 and AES ARM64 by default or as module (LP: #1594455)
    - [Config] Enable arm64 AES and CRC32 crypto

  * VMX kernel crypto module exhibits poor performance in Ubuntu 16.04
    (LP: #1592481)
    - crypto: vmx - comply with ABIs that specify vrsave as reserved.
    - crypto: vmx - Fix ABI detection
    - crypto: vmx - Increase priority of aes-cbc cipher

  * build squashfs i
Colin, please fix this in yakkety so that the SRU can be released.
I've completed 14 hours of testing and cannot reproduce the issue with the -proposed kernel. The -proposed kernel also passes all the ZFS regression tests, so it looks good to me.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial
Currently soak testing this; it will be complete in ~8 hours' time.
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-xenial
Tests completed without any issues. I'm marking this as verified.

** Tags added: verification-done
I'm running some tests on this overnight. Will report back later over the weekend.
Hello LeetMiniWheat, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: zfs-linux (Ubuntu Xenial)
       Status: New => Fix Committed
** Description changed:

+ [SRU Justification][XENIAL]
+
+ Problem: Running ztest repeatedly for long periods of time eventually
+ results in "zdb: can't open 'ztest': No such file or directory"
+
+ [FIX]
+
+ Upstream commit
+ https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
+
+ [TEST CASE]
+
+ Without the fix, ztest will fail after hours of soak testing. With the
+ fix, the issue can't be reproduced.
+
+ [REGRESSION POTENTIAL]
+
+ This fix is an upstream fix and has therefore passed upstream ZFS
+ integration testing. I have also tested it thoroughly with the kernel
+ team's ZFS regression tests and found no issues, so the regression
+ potential is slim to zero.
+
+ --
+
  Problem: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

  This bug affects the xenial kernel's built-in ZFS as well as the
  package zfs-dkms. I don't believe ZFS 0.6.3-stable or 0.6.4-release are
  affected; 0.6.5-release seems to have included the offending commit.

  Sorry for the excessive "Affects" tagging, I'm still new to this and
  unsure of the proper packages to report this against and/or how to
  properly add the upstream issues/commits.

  Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
  "ztest can occasionally fail because zdb cannot locate the pool after
  several hours of run time. This appears to be caused by an empty cache
  file."
  How to reproduce: run ztest repeatedly with a command like the
  following and it will eventually fail:

  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z* && sleep 3 &&
  ztest -T 3600 && rm /tmp/z*

  (I have /tmp mounted on tmpfs with a 10G limit, but I don't believe
  this is related in any way, and I've confirmed it's not running out of
  space.)

  Upstream fix:
  https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

  Description: Fix ztest truncated cache file
  "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
  truncate and overwrite rather than rename the cache file. This is the
  correct fix but it should have only been applied for the kernel build.
  In user space rename(2) is needed because ztest depends on the cache
  file."

  Associated pull request for the above commit:
  https://github.com/zfsonlinux/zfs/pull/4130

  I'm not sure why this wasn't backported to a release, but it's in zfs
  master. I've reproduced this bug on xenial kernels 4.4.0-22-generic,
  4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency, as
  well as various xenial master-next builds. After applying the above
  commit's patch to the kernel and building/installing the kernel
  manually, ztest runs fine. I've also separately tested the commit's
  patch on the zfs-dkms package, which also appears to fix the issue.

  Note, however, that there may still be some other outstanding
  ztest-related issues upstream, especially when preempt and hires
  timers are used. I'm currently testing more heavily against lowlatency
  builds and master-next.
  (I'm unsure how to associate this bug with multiple packages, but the
  zfs-dkms and linux-image-* packages are both affected.)

  P.S. Also of note is
  https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb
  "Fix inverted logic on none elevator comparison" - which,
  interestingly, was signed off by Canonical but curiously not included
  in the xenial kernel or zfs-dkms packages. It was, however, backported
  to 0.6.5-release upstream.
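For reference, the chained reproducer in the description can be expressed as a small POSIX shell loop. This is only a restatement of that command; the helper name (soak) and the run count are made up for illustration:

```shell
# soak N CMD...: run CMD up to N times, removing /tmp/z* artifacts and
# pausing 3 seconds between runs, stopping early if any run fails.
# A compact restatement of "ztest -T 3600 && rm /tmp/z* && sleep 3 && ...".
soak() {
    runs=$1; shift
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" || { echo "run $i of $runs failed" >&2; return 1; }
        rm -f /tmp/z*
        sleep 3
        i=$((i + 1))
    done
}

# Equivalent to the twelve chained invocations:
#   soak 12 ztest -T 3600
```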
Appreciate you looking into this. I was only able to test your builds for about 5 hours on the generic kernel version so far (doing some hardware upgrades at the moment, but my current test system is torture-test stable). My test hardware was two (Westmere) Intel Xeon E5620s (2 NUMA nodes) with 12GB (2GBx6) of ECC RDIMMs on each CPU (24GB total), on ubuntu-server 16.04. ztest was run on the default /tmp; however, I had /tmp mounted on tmpfs with a 10G limit, and from what I could tell it was not exceeding that limit.

I believe this issue becomes more apparent in 4.4.11 and 4.4.12 (and possibly 4.4.13 now) for some reason, since those were failing for me within a few hours with this "fix" applied, whereas the latest stable I compiled with the fix seemed okay. I think there are race conditions of some sort in newer kernels, especially since I saw different results on the lowlatency kernel a while back (on the same stable release). I'll do some more testing if I have some time, and I want to test this on some other distros as well, but I think the fix might not work on future kernel releases that integrate 4.4.11, 4.4.12, and 4.4.13, since some of those patches may have changed some core functions which uncovered ZFS bugs again. It's still possible it somehow affects my hardware/OS only.

Unless I was compiling the kernel strangely, I was doing a git clone of master-next, checking out the latest stable (detached HEAD), and applying/committing the patch. My 4.4.11 and 4.4.12 builds had the patches manually applied cleanly from upstream on top of xenial master-next (neither was merged into master-next at the time), so that could also have been a possible issue - there were a few redundant patches I skipped that were already in master-next, though. However, the bug still stands on the stock stable xenial kernel, and this patch seems to fix it (at least on generic; still unsure about lowlatency). Compiling Debian/Ubuntu kernels from git is pretty complicated, though, with conflicting documentation. I was using these commands after checking out and applying the patch:

fakeroot debian/rules clean
fakeroot debian/rules updateconfigs
fakeroot debian/rules binary-headers binary-generic binary-perarch
(or binary-lowlatency for lowlatency builds)

I'm not using the cloud-tools packages. Anyway, I guess you can close this; it can be reopened if I find time to attempt to reproduce the bug again. It's not a critical patch, but it's queued for 0.6.5-release upstream, so there's probably no harm in including it in the Ubuntu kernel. Thanks
I've now run these tests for 24 hours on both the original kernel and my 'fixed' kernel, with no issues on either, so I'm unable to reproduce this issue. Are there any specific configuration options in your H/W setup that I should try to duplicate? Perhaps my configuration is not able to trip the issue.
Thanks for looking into this. I'll test that build tonight, but I assume I'll see similar results. In my previous tests with this commit applied, I still occasionally ran into the error (albeit far less often - sometimes not at all, sometimes rather quickly), and also some other traces regarding pthreads (the current 0.6.5-release sort of incorrectly uses pthreads, ASSERTs, and other things at the moment, from what I understand, and there's a lot of upstream work being done on it in master). lowlatency kernels seem to fail faster on it too, which is a bit confusing. I still think there are a lot of corner-case bugs in ZFS and ztest.

Fully fixing ztest/ZFS/SPL for 0.6.5-release would likely be way too invasive to backport, and it looks like bandaids such as this only prolong the inevitable failure. After much cherry-picking, trial & error, and commenting on some upstream commits, I don't believe ztest was intended for end users or as a reliable long-term stress tool - nor does it get as much developer attention for releases, since it's not a real-world test. One upstream developer/maintainer even commented that ztest is intended for ZFS developers (implying end users shouldn't be using it?), which makes me question why it's even included in zfsutils-linux if it's fundamentally broken in release versions. If it's this unreliable, it will create many more false positives for others looking to test the stability of Ubuntu's ZFS, resulting in people thinking ZFS, or Ubuntu's ZFS implementation, is broken when in fact it may be perfectly fine under real-world workloads. ztest still works as a short-term test of ZFS functions, though, and this commit probably did belong in a release (they've marked it for the 0.6.5.8 milestone), but as mentioned above there are many other outstanding issues this tool brings to light (whether falsely positive or not).

On a side note, I'd be interested in seeing ZFS run under AFL. Filesystem fuzzing with AFL recently discovered many bugs in existing kernel filesystems, with fixes incoming for backport to 4.4.13, 4.5.7, and 4.6.2; however, the Linux Foundation's Oracle AFL event/presentation only covered the most commonly used in-kernel filesystems.

Sorry if this is noise, but hopefully it will bring more awareness to this issue, which may not even be an issue - the correct fix may be to move ztest to another (dev or debug?) package.
I've applied the upstream fix
https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
and built some test kernels. I'm currently testing these, but it appears the reproducer takes a while to run to completion.

http://kernel.ubuntu.com/~cking/lp-1587686/

If you can test this, I'll get this fix applied as a Stable Release Update.
Sorry, a correction to the above: I meant perhaps something changed since 4.4.11, since the 4.4.10-based xenial kernels seem fine (except for the curious issue with the lowlatency build). I've run into some stack traces regarding __pthread, and some occasional failures in the spare tests with "returned 0, expected 75" on newer builds. There are a few interesting upstream patches that potentially fix these new errors, though this should probably be brought to the attention of upstream if ztest has issues on 0.6.5-release:

"Skip ctldir znode in zfs_rezget to fix snapdir issues"
"OpenZFS 6739 - assumption in cv_timedwait_hires"
"Fix do_div() types in condvar:timeout"

Either way, I think xenial should probably sync with upstream 0.6.5.x, though I understand this is a sensitive matter. I reported the missing "Fix ztest truncated cache file" patch upstream against 0.6.5-release, so it should hopefully be queued for the next point release, and maybe these other issues will be discovered and fixed as well.
Interestingly, this fixes the issue in 4.4.0-23-generic but not 4.4.0-23-lowlatency, and it appears my local builds of master-next on both 4.4.0-24-generic and 4.4.0-24-lowlatency still fail. Perhaps something in the 4.4.10 sublevel patches changed something. I've also merged 4.4.11 and 4.4.12 into master-next for my local build to test against those; so far my generic build hasn't failed, but master-next with 4.4.11 merged was failing. Something is fishy and I'm not sure where, but there are multiple bug reports upstream regarding ztest, and some patches in master - this truncated cache file patch should have been in 0.6.5-release anyway.

Also, in case this matters, I've been testing on a two-node NUMA machine (2x Xeon) with ECC memory, with no reported memory errors.
[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
   Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu)
   Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium
[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
** Description changed:

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel built-in ZFS as well as the package zfs-dkms. I don't believe ZFS 0.6.3-stable is affected, as the offending commit that caused this issue was introduced in 0.6.4-release and 0.6.5-release. Sorry for excessive "Affects" tagging, I'm still new to this and unsure of the proper packages to report this against and/or how to properly add the upstream issues/commits.

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."

How to reproduce: run ztest repeatedly with a command like this and it will eventually fail:

ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*

(I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)

Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

- Description:
+ Description: Fix ztest truncated cache file

"Commit efc412b updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the kernel build. In user space rename(2) is needed because ztest depends on the cache file."
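The mechanism behind that fix can be illustrated with a small shell sketch (this is not ZFS code, and the file names are hypothetical): `mv(1)` performs a `rename(2)`, so replacing the cache file by renaming a fully written temporary file over it is atomic. A concurrent reader such as zdb therefore always sees either the old or the new contents, never the empty window that truncate-and-overwrite leaves.

```shell
#!/bin/sh
# Sketch: atomic replacement of a cache/config file via rename(2).
# Truncating and rewriting in place briefly leaves the file empty;
# writing a temp file and renaming it over the target does not.
cache=$(mktemp /tmp/zpool.cache.XXXXXX)   # hypothetical cache file
printf 'old-config\n' > "$cache"

# Write the new contents elsewhere first, then atomically replace.
printf 'new-config\n' > "$cache.tmp"
mv "$cache.tmp" "$cache"                  # rename(2) under the hood

cat "$cache"                              # -> new-config
rm -f "$cache"
```

This is why the truncate-and-overwrite behaviour, correct for the Linux 4.2+ kernel build, breaks ztest in user space: ztest re-reads the cache file while it is being rewritten.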
Associated pull request for the above commit: https://github.com/zfsonlinux/zfs/pull/4130

I'm not sure why this wasn't backported to release, but it's in zfs master. I've reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency, as well as various xenial master-next builds. After applying the above commit patch to the kernel and building/installing it manually, ztest runs fine. I've also separately tested the commit patch on the zfs-dkms package, which also appears to fix the issue. Note, however, that there may still be other outstanding ztest-related issues upstream, especially when preempt and hires timers are used. I'm currently testing more heavily against lowlatency builds and master-next.

(I'm unsure how to associate this bug with multiple packages, but the zfs-dkms and linux-image-* packages are both affected.)

P.S. Also of note is https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb "Fix inverted logic on none elevator comparison", which interestingly was signed off by Canonical but curiously not included in the xenial kernel or zfs-dkms packages. It was, however, backported to 0.6.5-release upstream.

** Description changed:

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel built-in ZFS as well as the package
- zfs-dkms. I don't believe ZFS 0.6.3-stable is affected, as the offending
- commit that caused this issue was introduced in 0.6.4-release and
- 0.6.5-release. Sorry for excessive "Affects" tagging, I'm still new to
- this and unsure of the proper packages to report this against and/or how
- to properly add the upstream issues/commits.
+ zfs-dkms. I don't believe ZFS 0.6.3-stable or 0.6.4-release are
+ affected; 0.6.5-release seems to have included the offending commit.
+ Sorry for excessive "Affects" tagging, I'm still new to this and unsure
+ of the proper packages to report this against and/or how to properly add
+ the upstream issues/commits.

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."

How to reproduce: run ztest repeatedly with a command like this and it will eventually fail:

ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*

(I have /tmp mount
[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
** Description changed:

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel built-in ZFS as well as the package
- zfs-dkms.
+ zfs-dkms. I don't believe ZFS 0.6.3-stable is affected, as the offending
+ commit that caused this issue was introduced in 0.6.4-release and
+ 0.6.5-release

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."

How to reproduce: run ztest repeatedly with a command like this and it will eventually fail:

ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*

(I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)

Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

Description: "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the kernel build. In user space rename(2) is needed because ztest depends on the cache file."

Associated pull request for the above commit: https://github.com/zfsonlinux/zfs/pull/4130

I'm not sure why this wasn't backported to release, but it's in zfs master.

I've reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency, as well as various xenial master-next builds. After applying the above commit patch to the kernel and building/installing it manually, ztest runs fine. I've also separately tested the commit patch on the zfs-dkms package, which also appears to fix the issue. Note, however, that there may still be other outstanding ztest-related issues upstream, especially when preempt and hires timers are used. I'm currently testing more heavily against lowlatency builds and master-next.

(I'm unsure how to associate this bug with multiple packages, but the zfs-dkms and linux-image-* packages are both affected.)

P.S. Also of note is https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb "Fix inverted logic on none elevator comparison", which interestingly was signed off by Canonical but curiously not included in the xenial kernel or zfs-dkms packages. It was, however, backported to 0.6.5-release upstream.

** Also affects: zfs
   Importance: Undecided
   Status: New

** Description changed:

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel built-in ZFS as well as the package zfs-dkms. I don't believe ZFS 0.6.3-stable is affected, as the offending commit that caused this issue was introduced in 0.6.4-release and
- 0.6.5-release
+ 0.6.5-release. Sorry for excessive "Affects" tagging, I'm still new to
+ this and unsure of the proper packages to report this against and/or how
+ to properly add the upstream issues/commits.

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."

How to reproduce: run ztest repeatedly with a command like this and it will eventually fail:

ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*

(I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)

Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

Description: "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the k