Re: [NILFS users] nilfs_check_page: bad entry in directory #37945

admin Tue, 17 Mar 2009 04:17:42 -0700

Hi,

Actually I have lots of free space, so I will hold the image until you
tell me that it is no longer needed.


Bye,
David Arendt

> Hi,
> On Mon, 16 Mar 2009 23:17:03 +0100, David Arendt wrote:
>> Hi,
>>
>> I also received numerous nilfs-related error messages. I have
>> attached them to the end of this mail.  As yesterday's backup
>> worked flawlessly, the corruption must have occurred today.
>>
>> As I am doing video editing on nilfs2 filesystems and don't want
>> the cleaner to interfere with it, I always do mount -i, and if I am short
>> on space, I call the cleaner manually and send it a SIGTERM signal once
>> I see no more cleaning activity. I have changed the number of segments
>> to clean from 2 to 20 in order to speed up cleaning.
>>
>> As I had 95% of space used this morning, I called the cleaner manually.
>> The cleaner seemed to work normally, so once there was no more disk
>> activity, I sent it a SIGTERM signal. As I am running Gentoo, I did
>> an emerge --sync followed by an emerge --update --deep --ask world, so
>> there was lots of I/O on this partition (it was the root one). During
>> this, the system crashed completely and did not react to any key. As I
>> found no data on this crash in the log files, I can't say whether the
>> nilfs corruption led to the crash or the crash led to the nilfs
>> corruption.
>
> Thank you for the detailed report.
>
> We are now testing nilfs under such nearly-full conditions.
>
> I first suspected a problem in the directory implementation, but it
> now seems to be corruption related to GC.
>
> I suspect that, for some reason, your nilfs read blocks that had
> already been reclaimed by cleanerd and overwritten by new logs.
>
> The cycle of reusing segments is shorter than usual in such a case.
> So it's a likely cause, though we carefully designed nilfs not to
> suffer from such trouble.
>
> Another possibility is that the secondary superblock introduced in
> nilfs-2.0.10 broke an in-use segment.  If the corruption does not
> occur with nilfs-2.0.9, that would be the cause.
>
>> After this crash, I first saw error messages from nilfs.
>>
>> I did a dd if=/dev/sda3 of=imagefile bs=8192 of the partition prior to
>> the restore, so if you would like to investigate this further, the
>> corrupted partition is still mountable via a loop device. If this data
>> is of no use to you, please tell me so, and I will delete the image
>> file.
>>
>> Bye,
>> David Arendt
>
> Could we have some time?  If we can reproduce the problem, that would
> be best.  If we cannot, I'd like to ask for more help.
>
> Sorry for the inconvenience.
>
> Regards,
> Ryusuke Konishi
>
> Ryusuke Konishi wrote:
>> On Tue, 17 Mar 2009 00:09:05 +0900 (JST), Ryusuke Konishi wrote:
>>
>>> Hi David,
>>> On Mon, 16 Mar 2009 11:18:13 +0100, David Arendt wrote:
>>>
>>>> Hi,
>>>>
>>>> this morning I discovered this in /var/log/messages
>>>>
>>>> Mar 16 10:59:00 server NILFS error (device sda3): nilfs_check_page: bad
>>>> entry in directory #37945: unaligned directory entry - offset=0,
>>>> inode=1919250021, rec_len=14411, name_len=67
>>>>
>>>> I am using nilfs 2.0.10.
>>>>
>>>> What should I do about this error? Should I ignore it, or should
>>>> special care be taken about it?
>>>>
>>>>
>>> offset=0 --> first entry of the directory.
>>> inode=1919250021 --> unnatural inode number. seems invalid.
>>> rec_len=14411 --> invalid.
>>>
>>> So, this directory is completely broken. Maybe the b-tree of the
>>> directory is pointing to a wrong block.
>>>
>>> Is this reproducible by umount/mount and ls -R?
>>>
>>
>> The directory's inode number was shown in the log!
>> You can find it by:
>>
>>  $ find /mount-dir -inum 37945 -ls
>>
>>
>> Regards,
>> Ryusuke Konishi
>
>
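
The field-by-field reading in the quoted reply (offset, inode, rec_len)
can be reproduced with a small sanity check. The sketch below is only an
illustration in Python, not the nilfs kernel code: the 8-byte alignment,
the 12-byte entry header and the 4096-byte chunk size are assumptions,
and check_entry and min_rec_len are hypothetical helper names.

```python
# Illustrative sanity check for one on-disk directory entry.
# ASSUMPTIONS (not taken from the thread): entries are padded to 8 bytes,
# the fixed header is 12 bytes, and a directory chunk is 4096 bytes.
DIR_PAD = 8
HEADER_LEN = 12


def min_rec_len(name_len):
    """Smallest padded record length that can hold a name of name_len bytes."""
    return (name_len + HEADER_LEN + DIR_PAD - 1) & ~(DIR_PAD - 1)


def check_entry(offset, rec_len, name_len, chunk_size=4096):
    """Return a list of problems found in the entry (empty list if none)."""
    problems = []
    if rec_len < min_rec_len(1):
        problems.append("rec_len is smaller than minimal")
    if rec_len % DIR_PAD != 0:
        problems.append("unaligned directory entry")
    if rec_len < min_rec_len(name_len):
        problems.append("rec_len is too small for name_len")
    if (offset % chunk_size) + rec_len > chunk_size:
        problems.append("directory entry across blocks")
    return problems


if __name__ == "__main__":
    # Values from the log line: offset=0, rec_len=14411, name_len=67
    for problem in check_entry(0, 14411, 67):
        print(problem)
```

With the values from the log line, check_entry(0, 14411, 67) flags the
entry as unaligned (14411 is not a multiple of 8) and, under the assumed
chunk size, as running past the end of the chunk.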


_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users
