Greetings all, I'm wondering if anyone can shed some light here.
A few days ago a user reported problems with a specific directory. After further investigation, I now suspect there was data corruption during a specific time period.

Addressing the initial issue, I checked /var/log/messages just in case, and it had lots of messages like this:

  Jan 18 14:24:40 fsnode01 kernel: (4626,1):ocfs2_check_dir_entry:111 ERROR: bad entry in directory #8075816: directory entry across blocks - offset=0, inode=1164370764863544510, rec_len=57912, name_len=167
  Jan 18 14:24:40 fsnode01 kernel: (4626,1):ocfs2_prepare_dir_for_insert:1734 ERROR: status = -2
  Jan 18 14:24:40 fsnode01 kernel: (4626,1):ocfs2_mknod:240 ERROR: status = -2
  Jan 18 14:24:59 fsnode01 kernel: (4264,0):ocfs2_check_dir_entry:111 ERROR: bad entry in directory #8075816: directory entry across blocks - offset=0, inode=1164370764863544510, rec_len=57912, name_len=167
  Jan 18 14:24:59 fsnode01 kernel: (4264,0):ocfs2_prepare_dir_for_insert:1734 ERROR: status = -2
  Jan 18 14:24:59 fsnode01 kernel: (4264,0):ocfs2_mknod:240 ERROR: status = -2
  [...]

Indeed, that inode was bound to the problematic directory:

  [r...@fsnode01 ~]# debugfs.ocfs2 -R "findpath <8075816>" /dev/sdc1
  8075816	/storage/problematic/directory/

So I brought the cluster down and requested a filesystem check, which dumped a lot of messages like this:

  Cluster 1135086 is claimed by the following inodes:
    /storage/unrelated/file1
    /storage/unrelated/file2
  [DUP_CLUSTERS_CLONE] Inode "/storage/unrelated/file1" may be cloned or deleted to break the claim it has on its clusters. Clone inode "/storage/unrelated/file1" to break claims on clusters it shares with other inodes? y
  pass1d: Invalid argument passed to OCFS2 library while reading inode to clone

Note that last (pass1d) message. I've checked my tools, and although their versions don't match, they are the latest versions available:

  [r...@fsnode01 ~]# rpm -qa | grep ocfs
  ocfs2-tools-1.4.3-1.el5
  ocfs2-2.6.18-164.el5-1.4.4-1.el5
  ocfs2console-1.4.3-1.el5

Notice the kernel modules are 1.4.4 while the tools are 1.4.3. Could this version mismatch cause the pass1d error? Does it have any consequences? I've checked again; those were the only versions available.

I should mention that /storage/unrelated/* are all PDF files. Some of them are damaged, and using 'file -bi' I've tracked some of them down to a time interval between 'Jan 18 09:47' and 'Jan 18 12:24'. I could only track those files because 'file' reported a damaged PDF header; I can't be sure the remaining ones are all OK, only that their headers are OK. Also worth mentioning is that there are other files from that time interval that seem to be fine (again, I can't be sure).

I can't be certain when this mess started or when the cluster recovered from it. I'm almost sure the files were OK when they were "about to be" stored on /storage; this investigation suggests they were damaged *during* their existence on /storage. I've now taken appropriate measures to be able to prove this in the future.

What is puzzling me is:

* I now know there are corrupted files, but I don't even know how many there are. I only know there are at least as many as 'file' detected. Some of the files whose inodes fsck.ocfs2 tried to clone fall within the above time period, which suggests the cluster was somehow writing parts of different files to the same blocks. What could have caused this, and how do I avoid it happening again? (A rough sketch of how I've been looking for suspect files follows below.)
* Should I turn on tracing for a particular bit? Which one?
* How can I monitor OCFS2 health on a running cluster?
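For reference, this is roughly the kind of scan I mean when I say I tracked the damaged PDFs down with 'file -bi', together with a loop over the inode numbers from the kernel log resolved with the same debugfs.ocfs2 "findpath" call shown above. It is only a rough sketch, not exactly what I ran: the year in the timestamps is assumed (the window itself is the 'Jan 18 09:47' to 'Jan 18 12:24' one mentioned above), /storage, the .pdf extension and /dev/sdc1 are from my setup, and I'm using reference files with 'find -newer' because the findutils shipped with EL5 has no -newermt.

  #!/bin/sh
  # Rough sketch: list PDFs under /storage modified inside the suspect
  # window whose header 'file' no longer recognises as a PDF.
  # The year (2010) in the timestamps is an assumption -- adjust as needed.
  START=/tmp/window.start
  END=/tmp/window.end
  touch -t 201001180947 "$START"   # Jan 18 09:47
  touch -t 201001181224 "$END"     # Jan 18 12:24

  find /storage -type f -name '*.pdf' -newer "$START" ! -newer "$END" -print |
  while read -r f; do
      mime=$(file -bi "$f")
      case "$mime" in
          application/pdf*) : ;;              # header looks OK (says nothing about the rest)
          *) echo "SUSPECT: $f ($mime)" ;;
      esac
  done

  # Resolve the inode numbers the kernel complained about into paths,
  # using the same read-only debugfs.ocfs2 invocation shown earlier.
  grep 'ocfs2_check_dir_entry' /var/log/messages |
      sed -n 's/.*bad entry in directory #\([0-9]*\):.*/\1/p' | sort -u |
  while read -r ino; do
      debugfs.ocfs2 -R "findpath <$ino>" /dev/sdc1
  done

Of course this only catches files whose headers are visibly broken, which is exactly why I can't tell how many more are silently corrupted.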
Regards,
--
Nuno Tavares
DRI, Consultoria Informática
Telef: +351 936 184 086