Public bug reported:

Binary package hint: jfsutils
Power failure leads to file system corruption and data loss, probably because fsck.jfs does not correctly detect the damage in the first run. See also the jfs mailing list discussion: http://www.mail-archive.com/jfs-discuss...@lists.sourceforge.net/msg01682.html

The problem is reliably reproducible on a minimal Ubuntu Lucid install in VMware. Corruption can be detected using ls -alR, which reports a "stale NFS lock" on the jfs filesystem. I have not found a pattern in which directory or file inodes are affected. It seems that even unmodified files can be lost and are sometimes reconnected to /lost+found (e.g. /etc/resolv.conf or /usr/local/share vanished without a trace, others show up in /lost+found, and others show up as "stale NFS lock" inodes in /lost+found, so one knows that an inode was lost but not its content).

It is not clear which of two sequences occurs: (a) a reboot triggers the corruption, fsck fails to detect it, the mount therefore succeeds, and the error can then be detected; or (b) corruption, an invalid fsck repair, modifications causing secondary corruption, and a further invalid fsck repair finally making the corruption visible. To verify this, one would have to run the reproducer on a completely sane (fresh) filesystem often enough to find the minimal number of successive reboots that triggers the problem.
To reproduce it on lucid:

* Create an init script to trigger the test on each reboot:

  # cat /etc/init/DiskTest.conf
  description "Start Disktest"
  start on filesystem
  task
  script
      /root/DiskTest/DiskTest.sh >> /root/DiskTest/DiskTest.log 2>&1
  end script

* Format a small disk partition. I did this step only to produce a smaller
  20 MB corrupted image with 60% disk use; corruption also occurs on the
  root partition, so you have to run multiple test runs to get a result
  with "non-root" but "data" corruption:

  dd if=/dev/zero of=/dev/sdb1
  mkfs.jfs -f /dev/sdb1
  mkdir /data
  mount /dev/sdb1 /data
  # fill /data to approx. 60%, create a tar dump of this data, and adjust
  # the tar name in DiskTest.sh
  umount /data

* Add the test script:

  # cat /root/DiskTest/DiskTest.sh
  #!/bin/bash -e
  echo "$(date): Starting disktest" >&2
  mountDev=/dev/sdb1
  if ! fsck.jfs "${mountDev}" || ! jfs_fsck -n "${mountDev}"; then
      echo "Fsck failed!" >&2
      exit 1
  fi
  mount "${mountDev}" /data
  if ls -alR / 2>&1 | grep -E -e '(\?|stale )'; then
      echo "Damage marker found" >&2
      exit 1
  fi
  rm -rf /data/usr/bin/*d*
  tar -C /data -xf /root/DiskTest/2011-04-08-ContentOriginal.tar
  umount /data
  echo "Killing system with hard reboot"
  echo "b" > /proc/sysrq-trigger

* Start the test:

  start DiskTest

The problem also occurs after replacing fsck.jfs and jfs_fsck with version
1.1.15 from jfsutils trunk. The problem seems to be unrelated to a jfs root
node corruption, which does not produce stale NFS locks but destroys the
root directory just by mounting and unmounting multiple times.

$ lsb_release -rd
Description:    Ubuntu 10.04.2 LTS
Release:        10.04

$ apt-cache policy jfsutils
jfsutils:
  Installed: 1.1.12-2.1
  Candidate: 1.1.12-2.1
  Version table:
 *** 1.1.12-2.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

** Affects: jfsutils (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/754495

Title:
  jfs filesystem corruption after power failure, fast reboot sequences
  (stale NFS lock)

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs