Public bug reported:

We periodically see an issue where unmounting a ZFS filesystem fails
with EBUSY, even though there appears to be no one using it.

    # cat /proc/self/mounts | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
    domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive zfs rw,nosuid,nodev,noexec,relatime,xattr,noacl 0 0
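To make the same check from a script rather than by eye, a minimal Python sketch could parse /proc/self/mounts. It assumes the usual fstab-style field layout (source, target, type, options, freq, passno) with octal escapes such as \040 for spaces. The helper names (`parse_mounts`, `find_mount`, `find_submounts`) are hypothetical. It also lists anything mounted underneath the target, since a child mount is another reason a plain unmount can fail with EBUSY.

```python
from typing import Dict, List, Optional


def unescape_mount_field(field: str) -> str:
    """Decode octal escapes such as \\040 (space) used in /proc/*/mounts."""
    out = []
    i = 0
    while i < len(field):
        digits = field[i + 1:i + 4]
        if field[i] == "\\" and len(digits) == 3 and all(c in "01234567" for c in digits):
            out.append(chr(int(digits, 8)))
            i += 4
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


def parse_mounts(text: str) -> List[Dict]:
    """Parse fstab-style lines: source target fstype options freq passno."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entries.append({
            "source": unescape_mount_field(fields[0]),
            "target": unescape_mount_field(fields[1]),
            "fstype": fields[2],
            "options": fields[3].split(","),
        })
    return entries


def _read_mounts(text: Optional[str]) -> str:
    """Use the supplied text, or read the live mount table of this process."""
    if text is not None:
        return text
    with open("/proc/self/mounts") as f:
        return f.read()


def find_mount(target: str, text: Optional[str] = None) -> List[Dict]:
    """Entries mounted exactly at target (more than one means stacked mounts)."""
    return [e for e in parse_mounts(_read_mounts(text)) if e["target"] == target]


def find_submounts(target: str, text: Optional[str] = None) -> List[Dict]:
    """Entries mounted below target; these normally make a plain umount fail with EBUSY."""
    prefix = target.rstrip("/") + "/"
    return [e for e in parse_mounts(_read_mounts(text)) if e["target"].startswith(prefix)]
```

Called with no `text` argument, both lookups read the live table, so `find_submounts(path)` returning an empty list rules out child mounts as the cause.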

'lsof' and 'fuser' show no processes using any of the files in the
problematic filesystem:

    # ls -l /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
    total 221
    -rw-r----- 1 500 500  52736 May 22 11:01 1_19_1008904362.dbf
    -rw-r----- 1 500 500 541696 May 22 11:03 1_20_1008904362.dbf
    # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_20_1008904362.dbf
    # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_19_1008904362.dbf
    # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
    # lsof | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
    #

The filesystem was shared over NFS, but has since been unshared:

    # showmount -e | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
    #

Since no one appears to be using the filesystem, our expectation is that
it should be possible to unmount the filesystem. However, attempts to
unmount the filesystem fail with EBUSY:

    # zfs destroy domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
    umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.
    cannot unmount '/domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive': umount failed
    # umount /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
    umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.
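The "target is busy" message from umount(8) is how it reports the EBUSY errno returned by the umount2(2) system call. When scripting retries, it can help to get the raw errno directly. A minimal Python sketch, assuming Linux and a C library reachable through ctypes, might look like this; `try_umount` is a hypothetical helper name. It distinguishes EBUSY from, for example, EPERM (caller is not privileged) or EINVAL (target is not a mountpoint).

```python
import ctypes
import ctypes.util
import errno
import os
import sys

# Load the C library; if find_library returns None, CDLL(None) falls back to
# the symbols already linked into the Python process (which include libc).
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def try_umount(target: str, flags: int = 0) -> int:
    """Call umount2(2) once; return 0 on success, else the errno it set."""
    if _libc.umount2(os.fsencode(target), flags) == 0:
        return 0
    return ctypes.get_errno()


if __name__ == "__main__":
    err = try_umount(sys.argv[1])
    if err == 0:
        print("unmounted")
    elif err == errno.EBUSY:
        print("EBUSY: mount still in use (extra references or child mounts)")
    else:
        print(errno.errorcode.get(err, str(err)), os.strerror(err))
```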


Using bpftrace, we can see that the unmount is failing in 
'propagate_mount_busy()' in the kernel. Using a live kernel debugger, we can 
look at the 'mount' struct for this particular mount and see that the 
'mnt_count' refcount summed across all CPUs is 2. For filesystems that are 
eligible for unmounting, the refcount is 1.
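To make the arithmetic explicit, here is a toy Python model of the check as we read the 4.15 source (fs/namespace.c and fs/pnode.c). It is an illustration, not kernel code. mnt_get_count() sums the per-CPU mnt_count values. The quick check at the top of propagate_mount_busy() treats the mount as busy if it has child mounts or if that sum exceeds the refcnt argument. do_umount() passes 2 here, because the unmount path itself holds one reference from looking up the target. A mount that idles at 1 therefore passes at 2, while one that idles at 2 fails at 3. The per-CPU splits below are hypothetical; only their sums come from this report.

```python
from typing import Sequence

# do_umount() passes refcnt=2: the mount's own reference plus the one the
# unmount path holds from looking up the target (our reading of 4.15).
UMOUNT_REFCNT = 2


def mnt_get_count(per_cpu: Sequence[int]) -> int:
    """Like mnt_get_count(): sum the per-CPU counters. Individual entries can be
    negative because a reference taken on one CPU may be dropped on another."""
    return sum(per_cpu)


def busy_during_umount(idle_per_cpu: Sequence[int], has_child_mounts: bool = False) -> bool:
    """Toy version of the quick check in propagate_mount_busy() during umount(2).

    idle_per_cpu is the per-CPU split seen when nobody is unmounting; the
    unmount path adds one more reference before the check runs."""
    during = mnt_get_count(idle_per_cpu) + 1
    return has_child_mounts or during > UMOUNT_REFCNT


healthy = [3, -1, -1, 0]  # hypothetical split, sums to 1 (the eligible case)
stuck = [2, 1, -1, 0]     # hypothetical split, sums to 2 (the stuck mount)
print(busy_during_umount(healthy), busy_during_umount(stuck))  # False True
```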

The only way to work around this issue that we have found is to reboot,
at which point the filesystem can be unmounted and destroyed.


So far, we have only been able to reproduce this using a workload driven
by our application. The application manages ZFS filesystems in groups,
and the lifecycle of each group looks something like this:

    - Create and mount a group of filesystems, 1 parent and 4 children:
        /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370
        /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/datafile
        /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/external
        /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
        /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/temp
    - Share all 5 filesystems over NFS
    - A client mounts all 5 shares using NFSv3
    - For a few hours, the client does NFS operations on the filesystems and
      the server occasionally takes ZFS snapshots of them
    - Unshare filesystems
    - Unmount filesystems
    - Delete filesystems

These groups of filesystems are constantly being created and destroyed.
At any given time, we have ~30k filesystems on the system, about 5k of
which are shared. On average, one out of ~200-300k unmounts fails with
this EBUSY error, and it takes us about a week to create and destroy
that many filesystems.
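For sizing a reproducer, those figures translate into unmount rates as follows. This small Python calculation only restates the report's own numbers (200k to 300k unmounts per week, one failure per 200k to 300k unmounts). The "one failure a day" column assumes, without evidence either way, that the failure probability per unmount does not depend on the rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def unmounts_per_second(unmounts: float, days: float) -> float:
    """Average unmount rate needed to get through `unmounts` in `days` days."""
    return unmounts / (days * SECONDS_PER_DAY)


for n in (200_000, 300_000):
    weekly = unmounts_per_second(n, 7)  # current pace: ~1 expected failure per week
    daily = unmounts_per_second(n, 1)   # pace for ~1 expected failure per day
    print(f"1 in {n}: {weekly:.2f}/s now, {daily:.2f}/s for one failure a day")
```

This prints 0.33/s and 2.31/s for the 1-in-200k case, and 0.50/s and 3.47/s for the 1-in-300k case.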

Note that we are using ZFS built from https://github.com/delphix/zfs,
which is essentially master ZFS on Linux.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-50-generic 4.15.0-50.54
ProcVersionSignature: Ubuntu 4.15.0-50.54-generic 4.15.18
Uname: Linux 4.15.0-50-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 May 20 19:10 seq
 crw-rw---- 1 root audio 116, 33 May 20 19:10 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Tue Jun 11 05:28:21 2019
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:
 
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/ROOT/username.QbVhgpM/root@/boot/vmlinuz-4.15.0-50-generic root=ZFS=rpool/ROOT/username.QbVhgpM/root ro console=tty0 console=ttyS0,38400n8 ipv6.disable=1 crashkernel=1024M-:512M
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-50-generic N/A
 linux-backports-modules-4.15.0-50-generic  N/A
 linux-firmware                             1.173.6
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:
 
dmi.bios.date: 09/21/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: amd64 apport-bug bionic uec-images

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1832384

Title:
  Unable to unmount apparently unused filesystem
