Re: [Lustre-discuss] Inode errors at time of job failure

2009-08-07 Thread Thomas Roth
Hi Oleg, thanks for your reply. I'm not able to reproduce this error at will, though. There are files reported missing by our users, but I couldn't correlate these with the ll_inode_revalidate_fini errors, at least not directly. In fact, some of the missing files reappeared later, as reported in

Re: [Lustre-discuss] Inode errors at time of job failure

2009-08-06 Thread Thomas Roth
Hi, these ll_inode_revalidate_fini errors are unfortunately quite known to us. So what would you guess if that happens again and again, on a number of clients - MDT softly dying away? Because we haven't seen any mass evictions (and no reasons for that) in connection with these errors. Or could the

Re: [Lustre-discuss] Inode errors at time of job failure

2009-08-06 Thread Oleg Drokin
Hello!

On Aug 6, 2009, at 12:57 PM, Thomas Roth wrote:
> Hi, these ll_inode_revalidate_fini errors are unfortunately quite known to us. So what would you guess if that happens again and again, on a number of clients - MDT softly dying away?

No, I do not think this is an MDT problem of any

[Lustre-discuss] Inode errors at time of job failure

2009-08-05 Thread Daniel Kulinski
What would cause the following error to appear?

LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) failure -2 inode 14520180

This happened at the same time a job failed. Error number 2 is ENOENT which means that this inode does not exist? Is there a way to query the MDS to
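
The reading of "failure -2" as ENOENT can be checked mechanically. The following is a minimal sketch, assuming the log line has the shape quoted above; `decode_failure` and `LINE_RE` are hypothetical names chosen for illustration and are not part of Lustre or any of its tools.

```python
import errno
import os
import re

# Hypothetical pattern: matches the "failure <rc> inode <number>" tail of
# the client log line quoted in the message above.
LINE_RE = re.compile(r"failure (-?\d+) inode (\d+)")

def decode_failure(line):
    """Return (errno name, description, inode number), or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    inode = int(match.group(2))
    # Kernel code reports errors as negative errno values, so flip the sign.
    name = errno.errorcode.get(-rc, "UNKNOWN")
    return name, os.strerror(-rc), inode

line = ("LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) "
        "failure -2 inode 14520180")
result = decode_failure(line)
print(result)
```

The description string comes from the local C library, so its wording may differ between platforms; the symbolic name is the stable part.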

Re: [Lustre-discuss] Inode errors at time of job failure

2009-08-05 Thread Oleg Drokin
Hello!

On Aug 5, 2009, at 3:12 PM, Daniel Kulinski wrote:
> What would cause the following error to appear?

Typically this is some sort of a race where you presume an inode exists (because you have some traces of it in memory), but it no longer does (on the MDS, anyway). So when the client comes to
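
The general shape of such a race can be illustrated on an ordinary local filesystem. This is only an analogy under stated assumptions, not a reproduction of the Lustre client or MDS code path: the program remembers a file it has seen, the file is removed in the meantime, and the later lookup fails with ENOENT. The function name `stat_after_unlink` is made up for this sketch.

```python
import errno
import os
import tempfile

def stat_after_unlink():
    """Remember a file, remove it, then look it up again."""
    # Create a file and keep its path and inode number; these stand in
    # for the "traces in memory" mentioned in the reply.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    remembered_inode = os.stat(path).st_ino

    # The file disappears before the remembered information is used again.
    os.unlink(path)

    try:
        os.stat(path)
    except OSError as exc:
        # The lookup fails because the object no longer exists.
        return remembered_inode, exc.errno
    return remembered_inode, 0

inode, rc = stat_after_unlink()
print(inode, errno.errorcode.get(rc, "OK"))
```

In the local case the removal and the lookup happen in one process; in the situation discussed in the thread they would happen on different nodes, which is what makes the timing hard to reproduce at will.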