Re: [lustre-discuss] corrupt FID on zfs?

2018-04-09 Thread Stu Midgley
> Try it with the last field as 0x0, like "[0x20a48:0x1e86e:0x0]".
> On the OST, we use the last field to store the stripe index for the file,
> so that LFSCK can reconstruct the file layout even if the MDT inode is
> corrupted.


OK, thanks.  Setting it to 0x0 worked and returned

No such file or directory

as expected.
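
For anyone scripting this, here is a minimal Python sketch of the fix described above. The helper name split_ost_fid is hypothetical; it is not part of zfsobj2fid or lfs. It zeroes the last field so lfs fid2path accepts the FID, and returns the original last field separately as the stripe index.

```python
import re

# Matches "[seq:oid:ver]" with hexadecimal fields, e.g. "[0x20a48:0x1e86e:0x1]".
_FID_RE = re.compile(r"^\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]$")


def split_ost_fid(fid):
    """Return (fid_for_fid2path, stripe_index) for an OST object FID string."""
    m = _FID_RE.match(fid.strip())
    if m is None:
        raise ValueError("not a [seq:oid:ver] FID: %r" % fid)
    seq, oid, last = (int(x, 16) for x in m.groups())
    # On the OST the last field holds the stripe index, so zero it for fid2path.
    return "[0x%x:0x%x:0x0]" % (seq, oid), last


if __name__ == "__main__":
    print(split_ost_fid("[0x20a48:0x1e86e:0x1]"))
```

The first element of the returned tuple is what gets passed to lfs fid2path; the second is kept only for reference.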


> That is not unusual, since the parent (MDT inode) FID is only stored into the
> object if it is modified by a client, or if an LFSCK layout check is run.


Interesting.  So these can only be files that were created without any
content.  I can safely ignore those :)


> It would be great if you could submit this as a patch to Gerrit.


Will do.  This tool is good and can be extended to handle many cases.

 * unmounted (existing)
 * mounted (my version)
 * mounted RO snapshot (my version)
 * version that uses getfattr which is WAY faster than calling zdb

My workflow was

ssh to OSS
snapshot OST
mount RO snapshot
find O/ -type f
pass file names to zfsobj2fid
ssh to lustre client
sudo
pass FIDs to lfs fid2path

The zfsobj2fid step was very slow for a large number of files, so I also
have a getfattr version that is much faster.
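
As a rough illustration of the getfattr-style approach (a sketch only, not the actual script), the Python below walks a mounted read-only snapshot and reads trusted.fid directly from each object file. The function name iter_trusted_fid and the injectable read_xattr parameter are hypothetical. It assumes Linux, where os.getxattr is available, and root privileges, since trusted.* attributes are not readable otherwise.

```python
import os


def iter_trusted_fid(root, read_xattr=None):
    """Yield (path, raw_bytes_or_None) for every regular file under root."""
    if read_xattr is None:
        read_xattr = os.getxattr  # Linux only; trusted.* attributes need root
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Mirror "find -type f": skip symlinks and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                raw = read_xattr(path, "trusted.fid")
            except OSError:
                raw = None  # attribute not set (or not readable) on this object
            yield path, raw


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        for path, raw in iter_trusted_fid(sys.argv[1]):
            print(path, raw.hex() if raw is not None else "-")
```

Objects without trusted.fid are reported with None rather than skipped, so they can be counted separately. Turning the raw bytes into a FID is a separate step.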

I'll see what I can put together.

As a side note, it would make things a LOT easier if lfs fid2path worked
with the value stored in the trusted.fid extended attribute without
modification.

eg.


trusted.fid=0xcc0e0200a02a1100
FID=[0x20ecc:0x2aa0:0x0]
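
Until then, the conversion has to be done by hand. Below is a hedged Python sketch, assuming the attribute begins with the parent FID laid out little-endian as a 64-bit sequence, a 32-bit object ID and a 32-bit stripe index (16 bytes in total). The value quoted above is shorter than that, so it would need to be supplied in full. The name decode_parent_fid is hypothetical, and the sample bytes are built from a FID in this thread rather than read from a real OST.

```python
import struct


def decode_parent_fid(raw):
    """Decode the leading 16 bytes of a trusted.fid value.

    Returns (fid_string_for_fid2path, stripe_index).
    """
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes, got %d" % len(raw))
    # Assumed layout: u64 sequence, u32 object ID, u32 stripe index, little-endian.
    seq, oid, stripe = struct.unpack_from("<QII", raw, 0)
    # The last field is printed as 0x0 because fid2path rejects a stripe index there.
    return "[0x%x:0x%x:0x0]" % (seq, oid), stripe


if __name__ == "__main__":
    # Hypothetical sample packed from a FID quoted in this thread.
    sample = struct.pack("<QII", 0x20a48, 0x1e86e, 0x1)
    print(sample.hex())
    print(decode_parent_fid(sample))
```

Any bytes after the first 16 are ignored here; whether they carry further information depends on the Lustre version and is not handled by this sketch.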




--
Dr Stuart Midgley
sdm...@gmail.com
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] corrupt FID on zfs?

2018-04-09 Thread Dilger, Andreas
On Apr 9, 2018, at 02:10, Stu Midgley wrote:
> 
> Afternoon
> 
> We have copied off all the files from an OST (lfs find identifies no files on 
> the OST) but the OST still has some left over files
> 
> eg.
> 
> 9.6G  O/0/d22/1277942
> 
> when I get the FID of this file using zfsobj2fid it appears to get a corrupt 
> FID
> 
> [0x20a48:0x1e86e:0x1]
> 
> which then returns
> 
> bad FID format '[0x20a48:0x1e86e:0x1]', should be [seq:oid:ver] (e.g. [0x20400:0x2:0x0])
> 
> fid2path: error on FID [0x20a48:0x1e86e:0x1]: Invalid argument
> 
> when I check it with lfs fid2path

Try it with the last field as 0x0, like "[0x20a48:0x1e86e:0x0]".
On the OST, we use the last field to store the stripe index for the file,
so that LFSCK can reconstruct the file layout even if the MDT inode is
corrupted.

> WTF?
> 
> Checking a few OSTs, this isn't isolated.  I've seen a few different
> corruptions, e.g.
> 
> [0x20a48:0x1e86e:0x7]
> [0x20a48:0x1e684:0x3]
> 
> 
> Also, quite a few files under the O/0/ directory didn't have trusted.fid
> set... which seemed strange.

That is not unusual, since the parent (MDT inode) FID is only stored into the
object if it is modified by a client, or if an LFSCK layout check is run.

> So a few questions.  
> How did the FID type get corrupt?
> How did this file get orphaned?
> 
> I had to modify zfsobj2fid  to work with a mounted snapshot of the ZFS volume
> 
> # diff ../zfsobj2fid /sbin/zfsobj2fid
> 38c38
> < p = subprocess.Popen(["zdb", "-O", "-vvv", sys.argv[1], sys.argv[2]],
> ---
> > p = subprocess.Popen(["zdb", "-e", "-vvv", sys.argv[1], sys.argv[2]],

It would be great if you could submit this as a patch to Gerrit.


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation






