Public bug reported:

Binary package hint: jfsutils

Power failure leads to file system corruption and data loss, probably
because fsck.jfs does not correctly detect the damage in the first run.

See also the jfs mailing list discussion:
http://www.mail-archive.com/jfs-discuss...@lists.sourceforge.net/msg01682.html
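
To see whether a fsck pass has missed damage, the same two-step check the
test script below uses can be run by hand on the unmounted test partition
(/dev/sdb1 as in the reproduction steps below):

# repair pass, then a read-only pass; any errors reported by the second,
# read-only run are damage the first run did not detect or repair
fsck.jfs /dev/sdb1
jfs_fsck -n /dev/sdb1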

The problem is easily reproducible on a minimal Ubuntu Lucid install in
VMware. Corruption can be detected using ls -alR, which reports a
"stale NFS lock" on the JFS filesystem. I haven't found a pattern in which
directory or file inodes are affected. Even unmodified files can be lost
and are sometimes reconnected to /lost+found (e.g. /etc/resolv.conf or
/usr/local/share vanished without a trace, others show up in /lost+found,
others show up as "stale NFS lock" inodes in /lost+found), so one knows
that an inode was lost but not its content.
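
A quick way to scan a mounted tree for these damage markers (the same
pattern the test script below greps for; /data is the test mount point
used in the reproduction steps):

# any match ("?" entries in the ls output or a stale handle error)
# indicates a damaged inode
ls -alR /data 2>&1 | grep -E -e '(\?|stale )'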

It is not clear whether the sequence is: a reboot triggers the corruption,
fsck fails to detect it, the mount therefore succeeds and the error becomes
visible later; or: corruption occurs, fsck makes an invalid repair, further
modifications cause secondary corruption, and another invalid fsck repair
makes the corruption visible.

To verify this, one would have to run the reproducer on a completely sane
(freshly formatted) filesystem often enough to find the minimal number of
successive reboots needed to trigger the problem.
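
A possible wrapper for that experiment (a sketch only, not part of the
reproducer below; the counter file location is arbitrary): reformat the
partition on the first run, keep a reboot counter on the root filesystem,
and stop as soon as a damage marker appears.

#!/bin/bash -e
# hypothetical variant of DiskTest.sh: count how many hard reboots a
# freshly formatted filesystem survives before the first damage marker
dev=/dev/sdb1
counter=/root/DiskTest/reboot.count

if [ ! -f "$counter" ]; then
  # first run: start from a known-good, freshly formatted filesystem
  mkfs.jfs -f "$dev"
  echo 0 > "$counter"
fi

count=$(cat "$counter")

# same two-pass check as DiskTest.sh
if ! fsck.jfs "$dev" || ! jfs_fsck -n "$dev"; then
  echo "Fsck failed after $count reboots" >&2
  exit 1
fi

mount "$dev" /data
if ls -alR /data 2>&1 | grep -E -e '(\?|stale )'; then
  echo "Damage marker after $count reboots" >&2
  umount /data
  exit 1
fi
# ... apply the same modifications as DiskTest.sh here ...
umount /data

echo $((count + 1)) > "$counter"
echo "b" > /proc/sysrq-trigger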


To reproduce it on lucid:

* Create init script to trigger test on each reboot:

# cat /etc/init/DiskTest.conf 
description "Start Disktest"

start on filesystem

task

script
  /root/DiskTest/DiskTest.sh >> /root/DiskTest/DiskTest.log 2>&1
end script
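
Once the file is in place, upstart should list the job (optional sanity
check with standard upstart commands):

initctl list | grep DiskTest
status DiskTest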

* Format a small disk partition

I only did this step to produce a smaller 20 MB corrupted image with about
60% disk usage; corruption also occurs on the root partition, but then you
have to run multiple test runs to get a result with "non-root" (data)
corruption.

dd if=/dev/zero of=/dev/sdb1
mkfs.jfs -f /dev/sdb1
mkdir /data
mount /dev/sdb1 /data
# fill data to approx. 60% and create a tar dump of it (one possible way is
# sketched below); adjust the tar name in DiskTest.sh
umount /data
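
The fill-and-dump step (between the mount and umount above) could look like
this, for example; the choice of content is arbitrary, only the tar name has
to match the one used in DiskTest.sh:

# copy small file trees until df reports roughly 60% usage; DiskTest.sh
# later deletes and restores files under usr/bin, so include such a path
mkdir -p /data/usr
cp -a /bin /data/usr/bin
df -h /data
# snapshot the known-good content so every test run can restore it
tar -C /data -cf /root/DiskTest/2011-04-08-ContentOriginal.tar .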

* Add the test script

# cat /root/DiskTest/DiskTest.sh
#!/bin/bash -e

echo "$(date): Starting disktest" >&2

mountDev=/dev/sdb1

# normal fsck run, then a read-only run that should find nothing
# if the first run detected and repaired everything
if ! fsck.jfs "${mountDev}" || ! jfs_fsck -n "${mountDev}"; then
  echo "Fsck failed!" >&2
  exit 1
fi

mount "${mountDev}" /data

# scan for damage markers: "?" entries or stale handle errors in the listing
if ls -alR / 2>&1 | grep -E -e '(\?|stale )'; then
  echo "Damage marker found" >&2
  exit 1
fi

# modify the filesystem: delete some files, then restore them from the baseline tar
rm -rf /data/usr/bin/*d*
tar -C /data -xf /root/DiskTest/2011-04-08-ContentOriginal.tar
umount /data

# simulate a power failure: immediate reboot without sync or clean unmount
echo "Killing system with hard reboot"
echo "b" > /proc/sysrq-trigger

* Start test

start DiskTest
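
The script then re-runs after every reboot (via the upstart job) until it
hits a damage marker; the accumulated output can be checked after each boot
with, e.g.:

tail -n 50 /root/DiskTest/DiskTest.log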


The problem also occurs after replacing fsck.jfs and jfs_fsck with version
1.1.15 from jfsutils trunk. It seems to be unrelated to a JFS root node
corruption issue, which does not produce stale NFS locks but destroys the
root directory simply by mounting and unmounting multiple times.
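
For reference, the replacement 1.1.15 binaries can be built from the
jfsutils source roughly like this (a sketch; the tarball name, the uuid-dev
build dependency and the install location are assumptions, not taken from
the report):

apt-get install build-essential uuid-dev
tar xzf jfsutils-1.1.15.tar.gz
cd jfsutils-1.1.15
./configure --sbindir=/sbin    # overwrite the packaged binaries
make
make install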

$ lsb_release -rd
Description:    Ubuntu 10.04.2 LTS
Release:        10.04

$ apt-cache policy jfsutils
jfsutils:
  Installed: 1.1.12-2.1
  Candidate: 1.1.12-2.1
  Version table:
 *** 1.1.12-2.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

** Affects: jfsutils (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/754495

Title:
  jfs filesystem corruption after power failure, fast reboot sequences
  (stale NFS lock)
