Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Jeff Johnson
David,

If it’s a brand-new DDN, I’m sure you paid enough for it to be able to call
Alex on his yacht and get this issue sorted out… but I digress.

Previously in this thread it appeared that ldiskfs was unhappy about
something. For a fresh, clean ldiskfs format to go sideways like that is
*highly* unusual.

Lustre, imho, is best debugged in layers. I’d start by running some safe,
low-level I/O, like dd reads on your dm device, to make sure I/O is error-free.
Then run an fsck, making sure your e2fsprogs is the Whamcloud (WC) build with
ldiskfs support. If it still won’t mount after that, be sure to review the logs
on the MDS as well as on the OSS where the faulty OST resides.
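A minimal sketch of the first two layers, assuming the device path from this thread and an arbitrary 1 GiB read sample; `check_layers`, `run`, `DEV` and `DRY_RUN` are hypothetical names. Every command is read-only: `dd` with `iflag=direct` reads past the page cache, `e2fsck -V` reports which e2fsprogs build is installed (Whamcloud builds usually carry a `wc` tag in the version string), and `e2fsck -fn` forces a full check while answering "no" to every repair prompt. Run it with the OST unmounted.

```bash
#!/usr/bin/env bash
# Sketch: layered, read-only health checks on an OST block device.
# Assumptions: DEV is the path used in this thread; the 1 GiB sample is arbitrary.
# Set DRY_RUN=1 to print the commands instead of running them.
set -u

DEV="${DEV:-/dev/mapper/OST0051}"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_layers() {
  # Layer 1: raw sequential reads, bypassing the page cache; nothing is written.
  run dd if="$DEV" of=/dev/null bs=1M count=1024 iflag=direct
  # Layer 2a: show the installed e2fsprogs version (should be the ldiskfs-aware build).
  run e2fsck -V
  # Layer 2b: forced full check (-f), read-only (-n answers "no" to every fix).
  run e2fsck -fn "$DEV"
}
```

`DRY_RUN=1 check_layers` prints the three commands; `check_layers` alone runs them. If both layers come back clean, the logs on the MDS and the OSS are the next layer.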

I don’t usually debug a competitor’s gear, so I’ll have to make a few extra
trips to the bar and chow line at DDN’s SC party as compensation ;-)

—Jeff


Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread David Cohen
Hi Jeff,
The logs are clean: the new OST is a brand-new DDN pool, there are no alerts
on the physical storage, and there are no indications of malfunctioning disks
in the machine's logs.

After a reboot, the dm device changes:
ls -la /dev/mapper/OST0051
lrwxrwxrwx 1 root root 8 Jul  6 07:59 /dev/mapper/OST0051 -> ../dm-30

ls /sys/block/dm-30/slaves
sdag  sdbm  sdcs  sddy
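Because the `dm-N` number can change across reboots (dm-21 earlier in this thread, dm-30 after this reboot), it is safer to resolve it from the stable `/dev/mapper` name each time. A small sketch with hypothetical helper names `dm_node` and `dm_slaves`; `SYSROOT` exists only so the function can be pointed at a test copy of `/sys`.

```bash
#!/usr/bin/env bash
# Sketch: map a stable /dev/mapper name to its current dm-N node and list
# the SCSI paths underneath it. Helper names are hypothetical.
set -u

dm_node() {
  # /dev/mapper/NAME is a symlink to ../dm-N; readlink -f follows it.
  basename "$(readlink -f "$1")"
}

dm_slaves() {
  # SYSROOT defaults to /sys; override it only for testing.
  local node
  node="$(dm_node "$1")"
  ls "${SYSROOT:-/sys}/block/$node/slaves"
}
```

On this OSS, `dm_slaves /dev/mapper/OST0051` should list the same four paths as the manual `ls` above, whatever dm number the device received at boot.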

[root@oss03 ~]# grep sdag /var/log/messages
Jul  4 05:50:45 oss03 kernel: sd 12:0:0:92: [sdag] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 05:50:45 oss03 kernel: sd 12:0:0:92: [sdag] Write Protect is off
Jul  4 05:50:45 oss03 kernel: sd 12:0:0:92: [sdag] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 05:50:45 oss03 kernel: sd 12:0:0:92: [sdag] Attached SCSI disk
Jul  4 05:50:46 oss03 multipathd: sdag: add path (uevent)
Jul  4 05:50:46 oss03 multipathd: sdag [66:0]: path added to devmap OST0051
Jul  4 06:01:30 oss03 kernel: sd 10:0:0:92: [sdag] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 06:01:30 oss03 kernel: sd 10:0:0:92: [sdag] Write Protect is off
Jul  4 06:01:30 oss03 kernel: sd 10:0:0:92: [sdag] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 06:01:31 oss03 kernel: sd 10:0:0:92: [sdag] Attached SCSI disk
Jul  4 06:01:31 oss03 multipathd: sdag: add path (uevent)
Jul  4 06:01:31 oss03 multipathd: sdag [66:0]: path added to devmap OST0051
Jul  4 06:25:21 oss03 kernel: sd 12:0:0:92: [sdag] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 06:25:21 oss03 kernel: sd 12:0:0:92: [sdag] Write Protect is off
Jul  4 06:25:21 oss03 kernel: sd 12:0:0:92: [sdag] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 06:25:21 oss03 kernel: sd 12:0:0:92: [sdag] Attached SCSI disk
Jul  4 06:25:22 oss03 multipathd: sdag: add path (uevent)
Jul  4 06:25:22 oss03 multipathd: sdag [66:0]: path added to devmap OST0051
Jul  4 07:21:47 oss03 kernel: sd 10:0:0:92: [sdag] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 07:21:47 oss03 kernel: sd 10:0:0:92: [sdag] Write Protect is off
Jul  4 07:21:47 oss03 kernel: sd 10:0:0:92: [sdag] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 07:21:47 oss03 kernel: sd 10:0:0:92: [sdag] Attached SCSI disk
Jul  4 07:21:48 oss03 multipathd: sdag: add path (uevent)
Jul  4 07:21:48 oss03 multipathd: sdag [66:0]: path added to devmap OST0051
Jul  6 07:59:06 oss03 kernel: sd 10:0:0:92: [sdag] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  6 07:59:06 oss03 kernel: sd 10:0:0:92: [sdag] Write Protect is off
Jul  6 07:59:06 oss03 kernel: sd 10:0:0:92: [sdag] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  6 07:59:06 oss03 kernel: sd 10:0:0:92: [sdag] Attached SCSI disk
Jul  6 07:59:06 oss03 multipathd: sdag: add path (uevent)
Jul  6 07:59:06 oss03 multipathd: sdag [66:0]: path added to devmap OST0051
[root@oss03 ~]# grep sdbm /var/log/messages
Jul  4 05:50:49 oss03 kernel: sd 13:0:0:92: [sdbm] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 05:50:49 oss03 kernel: sd 13:0:0:92: [sdbm] Write Protect is off
Jul  4 05:50:49 oss03 kernel: sd 13:0:0:92: [sdbm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 05:50:49 oss03 kernel: sd 13:0:0:92: [sdbm] Attached SCSI disk
Jul  4 05:50:49 oss03 multipathd: sdbm: add path (uevent)
Jul  4 05:50:49 oss03 multipathd: sdbm [68:0]: path added to devmap OST0051
Jul  4 06:01:34 oss03 kernel: sd 11:0:0:92: [sdbm] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 06:01:34 oss03 kernel: sd 11:0:0:92: [sdbm] Write Protect is off
Jul  4 06:01:34 oss03 kernel: sd 11:0:0:92: [sdbm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 06:01:34 oss03 kernel: sd 11:0:0:92: [sdbm] Attached SCSI disk
Jul  4 06:01:34 oss03 multipathd: sdbm: add path (uevent)
Jul  4 06:01:34 oss03 multipathd: sdbm [68:0]: path added to devmap OST0051
Jul  4 06:25:25 oss03 kernel: sd 13:0:0:92: [sdbm] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 06:25:25 oss03 kernel: sd 13:0:0:92: [sdbm] Write Protect is off
Jul  4 06:25:25 oss03 kernel: sd 13:0:0:92: [sdbm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 06:25:25 oss03 kernel: sd 13:0:0:92: [sdbm] Attached SCSI disk
Jul  4 06:25:25 oss03 multipathd: sdbm: add path (uevent)
Jul  4 06:25:25 oss03 multipathd: sdbm [68:0]: path added to devmap OST0051
Jul  4 07:21:50 oss03 kernel: sd 11:0:0:92: [sdbm] 34863054848 4096-byte
logical blocks: (142 TB/129 TiB)
Jul  4 07:21:50 oss03 kernel: sd 11:0:0:92: [sdbm] Write Protect is off
Jul  4 07:21:50 oss03 kernel: sd 11:0:0:92: [sdbm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul  4 07:21:50 oss03 kernel: sd 11:0:0:92: [sdbm] Attached SCSI disk
Jul  4 07:21:50 oss03 multipathd: sdbm: add path (uevent)
Jul  4 07:21:50 oss03 multipathd: sdbm [68:0]: path added to devmap OST0051
Jul  6 07:59:09 oss03 kernel: sd 11:0:0:92: [sdbm] 34863054848 4096-byte
logical 

Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Jeff Johnson
What devices are underneath dm-21 and are there any errors in
/var/log/messages for those devices? (assuming /dev/sdX devices underneath)

Run `ls /sys/block/dm-21/slaves` to see what devices are beneath dm-21
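Both questions can be answered in one pass. A sketch, assuming a syslog-style `/var/log/messages` and a keyword list (`error|fail|timeout|reset|offline`) that is only a guess at what path trouble looks like, not an exhaustive filter; `scan_dm_paths`, `LOG` and `SYSROOT` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: for each path under a dm device, print syslog lines that look like
# I/O trouble. The keyword list is an assumption; widen it as needed.
set -u

scan_dm_paths() {
  local node="$1" log="${LOG:-/var/log/messages}"
  local dir="${SYSROOT:-/sys}/block/$node/slaves" p sd
  for p in "$dir"/*; do
    sd="${p##*/}"
    echo "== $sd =="
    # -w matches sdag as a whole word, so sdag1 or sdagx lines are skipped.
    if ! grep -w "$sd" "$log" | grep -Ei 'error|fail|timeout|reset|offline'; then
      echo "no matching errors"
    fi
  done
}
```

For example, `scan_dm_paths dm-21` prints one section per underlying `/dev/sdX` device.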





On Tue, Jul 6, 2021 at 20:09 David Cohen 
wrote:

> Hi,
> The index of the OST is unique in the system and free for the new one, as
> it is increased by "1" for every new OST created, so whatever it converts
> to should not be relevant to its refusal to mount, or am I mistaken?
>
> I'm pasting the log messages again, in case they were lost up the thread,
> adding the output of "fdisk -l", should the OST size be the issue:
>
> lctl dk shows tens of thousands of lines repeating the same error after
> attempting to mount the OST:
>
> 0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>
> in /var/log/messages I see the following corresponding to dm-21, which is
> the new OST:
>
> Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
> please wait.
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
> maximum tree depth=5
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
> mount: IO failure
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs
> with errors, running e2fsck is recommended
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
> ordered data mode. Opts:
> user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
> htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
> entry in directory: rec_len is too small for name_len - offset=4084(4084),
> inode=0, rec_len=12
> , name_len=0
> Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
> read-only
> Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21):
> kmmpd:187: kmmpd being stopped since filesystem has been remounted as
> readonly.
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
> fsck: 6
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
> 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
> 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
>
> fdisk -l /dev/mapper/OST0051
>
> Disk /dev/mapper/OST0051: 142799.1 GB, 142799072657408 bytes, 34863054848
> sectors
> Units = sectors of 1 * 4096 = 4096 bytes
> Sector size (logical/physical): 4096 bytes / 4096 bytes
> I/O size (minimum/optimal): 2097152 bytes / 2097152 bytes
>
>
> Thanks,
> David
>
> On Tue, Jul 6, 2021 at 10:35 PM Spitz, Cory James 
> wrote:
>
>> What OST index (number) were you trying to add?
>>
>>
>>
>> Andreas is right:
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>>
>>
>> And you said:
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>>
>>
>>
>> Ok, 0051 (in octal by way of the leading zeros*) translates to decimal 41
>> as Andreas pointed out, but that’s 0x29 in hexadecimal, not 0x33.  Assuming
>> you wanted to use decimal 51, then you’d have tried to mkfs.lustre the wrong
>> index.  So, if you wanted to use decimal 51, you’d have used --index=0x33 or
>> --index=0063.
>>
>>
>>
>> -Cory
>>
>>
>>
>> p.s.
>>
>> (*) BTW, the convention with leading zeros for octal can be googled or
>> read about at https://en.wikipedia.org/wiki/Octal.

Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Abdeslam Tahari via lustre-discuss
Hello
I think you have a disk problem: bad blocks are causing it. Unmount the
disk, run e2fsck, and repair the bad blocks as well (see man e2fsck).
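A repairing e2fsck should only ever run against an unmounted target. A hedged sketch of that guard, using util-linux `findmnt --source` to detect an existing mount; `repair_ost`, `is_mounted` and `DRY_RUN` are hypothetical names, and the device path is only an example. `-c` is left out on purpose: it makes e2fsck run a read-only `badblocks` scan, which on a ~130 TiB volume like the one in this thread could take a very long time.

```bash
#!/usr/bin/env bash
# Sketch: refuse to run a repairing e2fsck while the target is mounted.
# Names are hypothetical; DRY_RUN=1 prints the command instead of running it.
set -u

is_mounted() {
  # findmnt exits 0 when it finds a mount whose source is this device.
  findmnt --source "$1" >/dev/null 2>&1
}

repair_ost() {
  local dev="$1"
  if is_mounted "$dev"; then
    echo "refusing: $dev is mounted" >&2
    return 1
  fi
  # -f forces a full check; -y answers "yes" to every repair prompt.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "e2fsck -fy $dev"
  else
    e2fsck -fy "$dev"
  fi
}
```

Usage: `DRY_RUN=1 repair_ost /dev/mapper/OST0051` to preview, then drop `DRY_RUN` to run the repair.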



-- 
Tahari.Abdeslam
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread David Cohen
Thanks Artem,
I already tried that (e2fsck), to no avail.
I even tried tunefs.lustre --writeconf --erase-params on the MDS and all
the other targets, but the behaviour remains the same.

Best regards,
David





Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Благодаренко Артём via lustre-discuss
Hello David,

> On 6 Jul 2021, at 08:34, David Cohen  wrote:
> 
> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs with 
> errors, running e2fsck is recommended


It looks like the LDISKFS partition is now in an inconsistent state. It is
best to follow the recommendation and run e2fsck.

Best regards,
Artem Blagodarenko.



Re: [lustre-discuss] Unable to mount new OST

2021-07-05 Thread David Cohen
Thanks Andreas,
I'm aware that index 51 actually translates to hex 33 (local-OST0033_UUID).
I don't believe that's the reason for the failed mount as it is only an
index that I increase for every new OST and there are no duplicates.

lctl dk shows tens of thousands of lines repeating the same error after
attempting to mount the OST:

0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
local-OST0033: fail to set LMA for init OI scrub: rc = -30
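The `rc` values in Lustre debug logs are negative kernel errno codes. A small decoding sketch, assuming python3 is installed on the node (`rc_name` is a hypothetical helper). On Linux, 30 is `EROFS`, which lines up with the "Remounting filesystem read-only" message in /var/log/messages below: once ldiskfs went read-only, the OI scrub could no longer write the LMA attribute, which would be consistent with the same error repeating over and over.

```bash
#!/usr/bin/env bash
# Sketch: translate "rc = -N" from Lustre logs into an errno name and message.
# Assumes python3 is available; the numbers are Linux errno values.
set -u

rc_name() {
  python3 -c 'import errno, os, sys
n = abs(int(sys.argv[1]))
print(errno.errorcode.get(n, "UNKNOWN"), os.strerror(n), sep=": ")' "$1"
}

rc_name -30   # EROFS: Read-only file system
```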

in /var/log/messages I see the following corresponding to dm-21, which is the
new OST:

Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
please wait.
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
maximum tree depth=5
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
mount: IO failure
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs with
errors, running e2fsck is recommended
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
ordered data mode. Opts:
user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
entry in directory: rec_len is too small for name_len - offset=4084(4084),
inode=0, rec_len=12
, name_len=0
Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
read-only
Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21): kmmpd:187:
kmmpd being stopped since filesystem has been remounted as readonly.
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
fsck: 6
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
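The `initial error at time` and `last error at time` fields are Unix epoch seconds. A sketch for converting them with GNU `date` (`epoch_utc` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Sketch: convert ldiskfs error timestamps (epoch seconds) to UTC.
# Uses GNU date's -d @SECONDS syntax.
set -u

epoch_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_utc 1625367384   # 2021-07-04 02:56:24 UTC (initial error)
epoch_utc 1625546362   # 2021-07-06 04:39:22 UTC (last error)
```

If the server clock is UTC+3, the last-error value matches the 07:39:22 `htree_dirblock_to_tree` line above, which would place the first recorded error on July 4 around 05:56 local time, two days before this mount attempt.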

As I mentioned before, the mount never completes, so the only way out is a
forced reboot.

Thanks,
David



Re: [lustre-discuss] Unable to mount new OST

2021-07-05 Thread Andreas Dilger via lustre-discuss


On Jul 5, 2021, at 09:05, David Cohen <cda...@physics.technion.ac.il> wrote:

> Hi,
> I'm using Lustre 2.10.5 and lately tried to add a new OST.
> The OST was formatted with the command below, which other than the index is the
> exact same one used for all the other OSTs in the system.
>
> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost --fsname=local
> --index=0051 --param ost.quota_type=ug
> --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp
> --mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp
> --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051

Note that your "--index=0051" value is probably interpreted as an octal number 
"41", it should be "--index=0x0051" or "--index=0x51" (hex, to match the OST 
device name) or "--index=81" (decimal).
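To see what each spelling means under C-style numeric rules (leading `0` is octal, `0x` is hex), bash arithmetic is a convenient stand-in. This sketch uses a hypothetical `ost_name` helper and shows shell parsing only, not necessarily how mkfs.lustre parses `--index`. Note that the `local-OST0033` name in the lctl dk output quoted in this thread corresponds to decimal 51.

```bash
#!/usr/bin/env bash
# Sketch: interpret an index string with shell (C-style) rules and show the
# hex OSTxxxx name that index maps to. Helper name is hypothetical.
set -u

ost_name() {
  local idx=$(( $1 ))   # 0NNN -> octal, 0xNN -> hex, otherwise decimal
  printf '%s -> %d -> OST%04x\n' "$1" "$idx" "$idx"
}

ost_name 0051   # 0051 -> 41 -> OST0029
ost_name 0x51   # 0x51 -> 81 -> OST0051
ost_name 81     # 81 -> 81 -> OST0051
ost_name 51     # 51 -> 51 -> OST0033
```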


> When trying to mount the OST with:
> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051
>
> The system stays on 100% CPU (one core) forever and the mount never completes,
> not even after a week.
>
> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the other
> targets, but the behaviour remains the same.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud









Re: [lustre-discuss] Unable to mount new OST

2021-07-05 Thread Abdeslam Tahari via lustre-discuss
Hello


Could you send the log files, please?



-- 
Tahari.Abdeslam


[lustre-discuss] Unable to mount new OST

2021-07-05 Thread David Cohen
Hi,
I'm using Lustre 2.10.5 and lately tried to add a new OST.
The OST was formatted with the command below, which other than the index is
the exact same one used for all the other OSTs in the system.

mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost
--fsname=local --index=0051 --param ost.quota_type=ug
--mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp
--mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp
--servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051

When trying to mount the OST with:
mount.lustre /dev/mapper/OST0051 /Lustre/OST0051

The system stays on 100% CPU (one core) forever and the mount never
completes, not even after a week.

I tried tunefs.lustre --writeconf --erase-params on the MDS and all the
other targets, but the behaviour remains the same.

David