Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> The journal will prevent inconsistencies in the filesystem in case of a crash.
> It cannot prevent corruption of the on-disk data, inconsistencies caused by
> cache enabled on the disks or in a RAID controller, software bugs, memory
> corruption, bad cables, etc.

The OSS is part of a 'Snowbird' installation, so the RAID/disk part should be fine. I hope that we 'just' hit a small software bug :-/

> That is why it is still a good idea for users to run e2fsck periodically on a
> filesystem.

OK, we will keep this in mind (e2fsck was surprisingly fast anyway!)

Regards,
Adrian

--
RFC 1925: (11) Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] O_DIRECT
On 2010-08-14, at 1:32, Michael Kluge wrote:
> how does Lustre handle write() requests to files opened with O_DIRECT?
> Does the OSS enforce that the OST has physically written the data to the
> OST before the op is completed or does the write() call return on the
> client before this?

The write will be submitted directly from the client to the OST, and the OST always does synchronous writes, regardless of whether it is O_DIRECT or not. It cannot return from the syscall until the write is complete, because those pages are shared from userspace.

Cheers,
Andreas
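For readers who want to reproduce the client side of this, here is a minimal sketch of an O_DIRECT write. It is an illustration only, not taken from the thread: the helper names `direct_write` and `_write_with_flags`, the 4096-byte alignment, and the fallback to O_SYNC on filesystems that reject O_DIRECT are all assumptions. It shows only the local syscall pattern (aligned buffer, blocking write()); it does not by itself prove anything about what the OST has committed.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; some devices or filesystems need a different value


def _write_with_flags(path, buf, extra_flags):
    """Open path with the given extra flags, write buf once, return bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o600)
    try:
        return os.write(fd, buf)  # blocks until the kernel completes the write
    finally:
        os.close(fd)


def direct_write(path, payload):
    """Write payload, zero-padded to a multiple of BLOCK.

    Returns (bytes_written, used_direct). Falls back to O_SYNC when the
    filesystem rejects O_DIRECT with EINVAL (hypothetical policy for this sketch).
    """
    size = max(1, -(-len(payload) // BLOCK)) * BLOCK  # round up, at least one block
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned and zero-filled
    try:
        buf[:len(payload)] = payload
        direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        if direct:
            try:
                return _write_with_flags(path, buf, direct), True
            except OSError as exc:
                if exc.errno != errno.EINVAL:
                    raise
        return _write_with_flags(path, buf, os.O_SYNC), False
    finally:
        buf.close()
```

The aligned mmap buffer is used because O_DIRECT transfers go straight from the user pages, which is also why the syscall has to block until the write is done.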
Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On 2010-08-14, at 2:28, Adrian Ulrich wrote:
>> - the on-disk structure of the object directory for this OST is corrupted.
>>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.
>
> e2fsck fixed it: The OST has now been running for 40 minutes without problems:
>
> But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

The journal will prevent inconsistencies in the filesystem in case of a crash. It cannot prevent corruption of the on-disk data, inconsistencies caused by cache enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc.

That is why it is still a good idea for users to run e2fsck periodically on a filesystem. If you are using LVM, there is an lvcheck script I wrote that can check a filesystem snapshot on a running system; otherwise you should run e2fsck whenever the opportunity arises.

Cheers,
Andreas
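If periodic checks are scripted, the e2fsck exit status is the natural thing to act on. The following is a small sketch, not part of lvcheck or of anything posted in this thread: the function names are hypothetical, and the bit meanings are the ones listed in the e2fsck(8) man page, where the exit code is the sum of the conditions that occurred.

```python
# Exit status bits as documented in e2fsck(8); 0 means no errors.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_e2fsck_status(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if status & bit]


def needs_attention(status):
    """True for anything beyond 'clean' (0) or 'errors corrected' (1)."""
    return bool(status & ~1)
```

A wrapper around a scheduled check could log `describe_e2fsck_status(rc)` and raise an alert only when `needs_attention(rc)` is true; whether status 1 should also alert is a site policy choice.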
Re: [Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> - the on-disk structure of the object directory for this OST is corrupted.
>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

e2fsck fixed it: the OST has now been running for 40 minutes without problems:

e2fsck 1.41.6.sun1 (30-May-2009)
lustre1-OST0005: recovering journal
lustre1-OST0005 has been mounted 72 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 440696867, block 493, offset 0: directory corrupted
Salvage? yes
Directory inode 440696853, block 517, offset 0: directory corrupted
Salvage? yes
Directory inode 440696842, block 560, offset 0: directory corrupted
Salvage? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 17769156
Connect to /lost+found? yes
Inode 17769156 ref count is 2, should be 1. Fix? yes
Unattached zero-length inode 17883901. Clear? yes
Pass 5: Checking group summary information
lustre1-OST0005: * FILE SYSTEM WAS MODIFIED *
lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks

But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

Have a nice weekend and thanks a lot for the fast reply!

Regards,
Adrian
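The last line of the e2fsck output above carries the inode and block usage of the OST, which is handy to track between checks. Below is a rough sketch of pulling those numbers out; the regular expression and the function name `parse_e2fsck_summary` are assumptions based only on the line format shown in this message, and other e2fsprogs versions may print it differently.

```python
import re

# Matches e.g. "label: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks"
SUMMARY_RE = re.compile(
    r"(?P<label>\S+): (?P<files>\d+)/(?P<inodes>\d+) files "
    r"\((?P<noncontig>[\d.]+)% non-contiguous\), "
    r"(?P<used>\d+)/(?P<total>\d+) blocks"
)


def parse_e2fsck_summary(line):
    """Return usage figures from an e2fsck summary line, or None if it does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    files, inodes = int(m["files"]), int(m["inodes"])
    used, total = int(m["used"]), int(m["total"])
    return {
        "label": m["label"],
        "files": files,
        "inodes": inodes,
        "non_contiguous_pct": float(m["noncontig"]),
        "blocks_used": used,
        "blocks_total": total,
        "inode_use": files / inodes,  # fraction of inodes in use
        "block_use": used / total,    # fraction of blocks in use
    }
```

Fed the summary line from this message, `block_use` comes out at roughly 14% while `inode_use` is well under 1%.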
[Lustre-discuss] O_DIRECT
Hi all,

how does Lustre handle write() requests to files opened with O_DIRECT? Does the OSS enforce that the OST has physically written the data to the OST before the op is completed, or does the write() call return on the client before this? I do not see the whole file content passing through the FC port of the RAID controller, but it may also be that my measurement is wrong ...

Michael

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW: http://www.tu-dresden.de/zih