[Bug 485562] [NEW] Data loss on ext3, maybe related to data=journal

Jürgen Kreileder Thu, 19 Nov 2009 15:45:31 -0800

Public bug reported:

I'm currently testing a backup scheme on a new karmic installation.  The
procedure worked flawlessly on jaunty and older Ubuntu/Debian
distributions (albeit with hardware RAID on those; the new machine uses
software RAID).  With karmic, however, I'm experiencing data loss (at
least on the designated backup partition).


The partition in question gets mounted once per hour.  The respective
entry in /etc/fstab is

UUID="7420cd8f-dd47-4fdb-b64e-4fd02f945e43"  /srv/backup  ext3  noatime,nodiratime,user_xattr,acl,noauto,nodev,nosuid,data=journal  0  2

The partition is an LVM2 logical volume which runs on a single PV on a
RAID 1 composed of 2 disks (driver is AHCI).

I noticed the data loss because I use sitecopy to push the backups to
another machine after each backup run.  On about 1 out of 3 backup runs
sitecopy complains about a corrupted state file.  I haven't checked the
backups themselves for integrity yet, as I can easily reproduce the
problem with sitecopy alone.

To reproduce it I do:

# cd /srv/backup/backup2l/scripts/
# cp data.1001.all.tar.gpg xxxx # change something so sitecopy has something to push
# sitecopy -r /srv/backup/backup2l/scripts/.sitecopyrc -p /srv/backup/backup2l/scripts/.sitecopy -q -u backup
# cd /
# umount /srv/backup
# mount /srv/backup
# less /srv/backup/backup2l/scripts/.sitecopy/backup

In about one out of three runs, the last step shows a corrupted file:
either the old contents with the rest filled with zeros, or a truncated
file.
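
The manual steps above could be scripted roughly as follows, to run the
cycle repeatedly and count failures.  This is only a sketch: the function
names (size_of, looks_intact, repro_once) are made up for illustration,
"stat -c %s" assumes GNU coreutils, and the check is a heuristic that flags
a changed size or any NUL byte in the state file.  It deliberately records
only the size (metadata) before the unmount, on the assumption that this
does not mask the problem the way reading the file does.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of sitecopy or ext3.

# Print a file's size in bytes from metadata only (GNU stat),
# so the file contents are not read before the unmount.
size_of() {
    stat -c %s "$1"
}

# Return 0 if the file still has the expected size and contains no NUL bytes.
# $1 = path, $2 = size in bytes recorded before the unmount.
looks_intact() {
    [ "$(size_of "$1")" -eq "$2" ] || return 1   # size changed (e.g. truncated)
    stripped=$(tr -d '\000' < "$1" | wc -c)      # byte count without NULs
    [ "$stripped" -eq "$2" ] || return 1         # zero-filled region present
    return 0
}

# One reproduction cycle mirroring the manual steps; must run as root.
# Returns 0 if the state file looks intact, 1 if it looks corrupted,
# 2 if a setup step (cd, umount, mount) failed.
repro_once() {
    dir=/srv/backup/backup2l/scripts
    state=$dir/.sitecopy/backup
    cd "$dir" || return 2
    cp data.1001.all.tar.gpg xxxx
    sitecopy -r "$dir/.sitecopyrc" -p "$dir/.sitecopy" -q -u backup
    before=$(size_of "$state")
    cd / && umount /srv/backup && mount /srv/backup || return 2
    looks_intact "$state" "$before"
}
```

After sourcing the file, something like
"for i in 1 2 3 4 5 6; do repro_once || echo "run $i: failed"; done"
would repeat the cycle and report runs that fail either the check or a
setup step.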

dmesg and syslog show nothing, in particular no journal-replay related
messages.  Running "fsck.ext3 -f  /dev/vg0/srv_backup" before mounting
reports no problems either, yet the file still gets corrupted every now
and then.

So far I've discovered two ways to work around the problem:
* Don't use "data=journal".  Both data=writeback and data=ordered seem to work fine
* Do "less /srv/backup/backup2l/scripts/.sitecopy/backup" before the unmount

The latter in particular seems to suggest a strange flush problem with
the data=journal code in karmic's current x86-64 kernel (2.6.31.15.28).
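
For the first workaround, the only change needed is the data= mount option
in the fstab entry.  A minimal sketch (the function name use_ordered_mode
is made up for illustration) that prints the modified line without touching
/etc/fstab:

```shell
#!/bin/sh
# Sketch only: rewrites data=journal to data=ordered in fstab lines read
# from stdin and prints the result; it does not edit any file.
use_ordered_mode() {
    sed 's/data=journal/data=ordered/'
}
```

For example, "grep /srv/backup /etc/fstab | use_ordered_mode" would show
the entry as it would look with data=ordered.  Since the partition is
mounted with noauto once per hour, an edited entry would take effect on
the next mount.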

# sudo lvdisplay /dev/vg0/srv_backup
  --- Logical volume ---
  LV Name                /dev/vg0/srv_backup
  VG Name                vg0
  LV UUID                KXZqxv-v8MQ-UD4x-41Vf-2c2t-0wsr-etUNjQ
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                128.00 GB
  Current LE             32768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:12
   
# sudo pvdisplay 
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               vg0
  PV Size               693.63 GB / not usable 4.12 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              177567
  Free PE               64927
  Allocated PE          112640
  PV UUID               FHAWPv-otHj-jpDD-x35T-nE0Q-13uB-30GuSt

# cat /proc/mdstat 
Personalities : [raid1] 
md2 : active raid1 sda3[0] sdb3[1]
      727318656 blocks [2/2] [UU]
      
md1 : active raid1 sda2[0] sdb2[1]
      1052160 blocks [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      4192896 blocks [2/2] [UU]
      
unused devices: <none>

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
Data loss on ext3, maybe related to data=journal
https://bugs.launchpad.net/bugs/485562
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
