Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index

2024-01-25 Thread Backer via lustre-discuss
Thank you, Andreas.
Are you aware of any paid engagement/support options for requests like
these, to get changes done quickly?

On Wed, 24 Jan 2024 at 20:52, Andreas Dilger  wrote:

> This is more like a bug report and should be filed in Jira.
> That said, there is no guarantee that someone will be able to
> work on it in a timely manner.

Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index

2024-01-24 Thread Backer via lustre-discuss
Just pushing this to the top of the inbox. :)  Is there another
distribution list that is more appropriate for this type of question? I am
also trying the devel mailing list.


Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index

2024-01-21 Thread Backer via lustre-discuss
Just to clarify: OSS-2 is completely powered off (a hard power-off without
any graceful shutdown) before I start working on OSS-3.


[lustre-discuss] Odd behavior with tunefs.lustre and device index

2024-01-21 Thread Backer via lustre-discuss
Hi All,

I am seeing odd behavior with tunefs.lustre. After changing the failover
node and trying to mount an OST, I get the following error:

The target service's index is already in use. (/dev/sdd)


After hitting the error above and performing --writeconf once, I can repeat
these steps (see below) any number of times on any OSS without --writeconf.
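
For reference, the one-time --writeconf step that clears the error is
roughly the following (a sketch only; run with the target unmounted, using
the device and mount point from this reproduction):

umount /testfs-OST0040
tunefs.lustre --writeconf /dev/sdd
mount -t lustre /dev/sdd /testfs-OST0040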


This is an effort to mount an OST on a new OSS. I simplified the steps and
can reproduce the behavior (see below) consistently. Could anyone help me
understand this?

[root@OSS-2 opc]# lctl list_nids
10.99.101.18@tcp1
[root@OSS-2 opc]#


[root@OSS-2 opc]# mkfs.lustre --reformat  --ost --fsname="testfs"
--index="64"  --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1"
--servicenode "10.99.101.18@tcp1" "/dev/sdd"

   Permanent disk data:
Target: testfs:OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1062
  (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

device size = 51200MB
formatting backing filesystem ldiskfs on /dev/sdd
target name   testfs:OST0040
kilobytes 52428800
options-J size=1024 -I 512 -i 69905 -q -O
extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
-G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040  -J size=1024 -I 512 -i
69905 -q -O
extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
-G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
/dev/sdd 52428800k
Writing CONFIGS/mountdata


[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1062
  (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target: testfs:OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1062
  (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#


[root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode
10.99.101.18@tcp1 /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1062
  (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target: testfs:OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1062
  (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

Writing CONFIGS/mountdata

[root@OSS-2 opc]# mkdir /testfs-OST0040
[root@OSS-2 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
mount.lustre: increased
'/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb'
from 1024 to 16384
[root@OSS-2 opc]#


[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1002
  (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1002
  (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#


Going over to OSS-3 and trying to mount the OST.

[root@OSS-3 opc]# lctl list_nids
10.99.101.19@tcp1
[root@OSS-3 opc]#


The parameters look the same as on OSS-2:


[root@OSS-3 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x1002
  (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters:  mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1
failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target: testfs-OST0040
Index:  64
Lustre FS:  testfs
Mount type: ldiskfs

Flags:

Re: [lustre-discuss] Recommendation on number of OSTs

2024-01-12 Thread Backer via lustre-discuss
Thanks for the advice!

On Fri, 12 Jan 2024 at 19:23, Andreas Dilger  wrote:

> I would recommend *not* to use too many OSTs as this causes fragmentation
> of the free space, and excess overhead in managing the connections.  Today,
> single OSTs can be up to 500TiB in size (or larger, though not necessarily
> optimal for performance). Depending on your cluster size and total
> capacity, it is typical for large systems to have a couple hundred OSTs,
> with 2-4 per OSS to balance storage and network bandwidth.
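>
> As a rough illustration of the tradeoff (numbers purely for example):
> 10 PiB built from 500 TiB OSTs is only ~20 OSTs, while the same 10 PiB
> built from 10 TiB OSTs is ~1000 OSTs, with far more free-space
> fragmentation and connection state to manage.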
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Mixing ZFS and LDISKFS

2024-01-12 Thread Backer via lustre-discuss
Sounds good. Thank you!

On Fri, 12 Jan 2024 at 19:28, Andreas Dilger  wrote:

> All of the OSTs and MDTs are "independently managed" (have their own
> connection state between each client and target) so this should be
> possible, though I don't know of sites that are doing this.  Possibly this
> makes sense to put NVMe flash OSTs on ldiskfs, and HDD OSTs on ZFS, and
> then put them in OST pools so that they are managed separately.
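>
> A minimal sketch of that kind of pool setup (pool and OST names here are
> illustrative only): create the pools with lctl on the MGS, then direct
> new files with lfs on a client:
>
> mgs# lctl pool_new testfs.flash
> mgs# lctl pool_add testfs.flash testfs-OST[0000-0003]
> mgs# lctl pool_new testfs.hdd
> mgs# lctl pool_add testfs.hdd testfs-OST[0004-0007]
> client# lfs setstripe --pool flash /mnt/testfs/scratch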
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Mixing ZFS and LDISKFS

2024-01-12 Thread Backer via lustre-discuss
Thank you, Andreas! How about mixing OSTs? The requirement is to do RAID
over small volumes using ZFS to get a large OST. This is to reduce the
overall number of OSTs as the cluster is being extended.

On Fri, 12 Jan 2024 at 11:26, Andreas Dilger  wrote:

> Yes, some systems use ldiskfs for the MDT (for performance) and ZFS for
> the OSTs (for low-cost RAID).  The IOPS performance of ZFS is low vs.
> ldiskfs, but the streaming bandwidth is fine.
>
> Cheers, Andreas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Mixing ZFS and LDISKFS

2024-01-12 Thread Backer via lustre-discuss
Hi,

Could we mix ZFS and LDISKFS together in a cluster?

Thank you,
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Recommendation on number of OSTs

2024-01-12 Thread Backer via lustre-discuss
Hi All,

What is the recommendation on the total number of OSTs?

In order to maximize throughput, should we go with a larger number of OSSs
with small OSTs? That would mean ending up with thousands of OSTs. Any
suggestions or recommendations?

Thank you,
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Extending Lustre file system

2024-01-09 Thread Backer via lustre-discuss
Thank you all for the valuable information. Are there any tools that I
could use to migrate (rebalance) OSTs? I know about lfs_migrate. Is there a
tool that walks the file system and balances OST usage?
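
One common pattern (a sketch rather than a turnkey tool; the OST index and
size threshold below are illustrative) is to pick the fullest OSTs from
lfs df and drain them with lfs find piped into lfs_migrate:

lfs df /mnt/testfs        # identify the most-used OST index
lfs find /mnt/testfs --ost testfs-OST0040 --size +1G -type f | lfs_migrate -y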

Thank you!


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Extending Lustre file system

2024-01-08 Thread Backer via lustre-discuss
Hi,

Good morning and happy new year!

I have a quick question on extending a Lustre file system. The extension is
performed online. I am looking for best practices or anything to watch out
for while doing the file system extension. The extension is done by adding
new OSSs with many OSTs on those servers.

Really appreciate your help on this.

Regards,
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] What is the meaning of these messages?

2023-12-08 Thread Backer via lustre-discuss
Hi All,

Just sending this again.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] What is the meaning of these messages?

2023-12-05 Thread Backer via lustre-discuss
Hi All,

From time to time, I see the following messages on multiple OSSs about a
particular client IP. What do they mean? All the OSSs and OSTs are online
and have been online throughout.

Dec  4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not
available for connect from @tcp1 (no target). If you are running
an HA pair check that the target is mounted on the other server.
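
When this appears, a quick way to check both sides (a sketch; adjust names
to your system) is to confirm on each OSS that the target is mounted on
exactly one server, and to ask a client which OSTs it can reach:

oss# mount -t lustre      # the OST should be listed on one server only
client# lfs check osts    # prints the status of each OST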
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-04 Thread Backer via lustre-discuss
I do not want to hijack this thread, but I am checking here before starting
another new thread. I am getting similar messages randomly. The IP involved
here is one client IP. I get messages from multiple OSSs about multiple
OSTs at the same time, and then they stop. These messages appear
occasionally on multiple OSSs, and each burst relates to one client at a
time. I wonder whether it is a client-side issue, as this FS has hundreds
of clients and only one client is reported at a time. Unfortunately, there
is no easy way for me to figure out whether the specified client had an
access issue around the time frame mentioned in the log (I have no access
to the clients).

Dec  4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not
available for connect from @tcp1 (no target). If you are running
an HA pair check that the target is mounted on the other server.

On Mon, 4 Dec 2023 at 05:27, Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> It wasn't clear from your mail which message(s) you are concerned about.
> These look like normal mount messages to me.
>
> The "error" is pretty normal, it just means there were multiple services
> starting at once and one wasn't yet ready for the other.
>
>  LustreError: 137-5: lustrevm-MDT_UUID: not available for
> connect
>  from 0@lo (no target). If you are running an HA pair check that
> the target
> is mounted on the other server.
>
> It probably makes sense to quiet this message right at mount time to avoid
> this.
>
> Cheers, Andreas
>
> On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
>
> 
>
> Hello Lustre community,
>
>
> Has anyone ever seen messages like these in "/var/log/messages" on a
> Lustre server?
>
> Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with
> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with
> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with
> ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID:
> not available for connect from 0@lo (no target). If you are running an HA
> pair check that the target is mounted on the other server.
> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery
> not enabled, recovery window 300-900
> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan
> objects from 0x0:227 to 0x0:513
>
> This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9
> VM hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT
> and two OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note
> that this happens at every boot, well before the clients (AlmaLinux 9.3 or
> 8.9 VMs) connect, and even when the clients are powered off. The network
> connecting the clients and the server is a "virtual" 10GbE network (of
> course there is no virtual IB). We also had the same messages previously
> with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2
> clients (also using VMs). Note also that we compile the Lustre RPMs
> ourselves from the sources in the git repository, and we chose to use a
> patched kernel. Our build procedure for the RPMs seems to work well,
> because our real cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and
> IB (MOFED) networking.
>
> So, has anyone seen these messages?
>
> Are they problematic? If so, how do we avoid them?
>
> We would like to make sure our small test system using VMs works well
> before we upgrade our real cluster.
>
> Thanks in advance !
>
> Martin Audet
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST is not mounting

2023-11-08 Thread Backer via lustre-discuss
Thanks for the explanation. There was a problem with the iSCSI target; it
is already multipathed. Anyhow, I was expecting things to come back online
after the problem was resolved. This created a near data-loss situation,
and I thought Lustre was resilient enough not to lose the whole OST. Here
the OST became completely unmountable.

On Tue, 7 Nov 2023 at 13:56, Andreas Dilger  wrote:

> The OST went read-only because that is what happens when the block device
> disappears underneath it. That is a behavior of ext4 and other local
> filesystems as well.
>
> If you look in the console logs you would see SCSI errors and the
> filesystem being remounted read-only.
>
> To have reliability in the face of such storage issues you need to use
> dm-multipath.
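>
> A minimal dm-multipath setup on EL-family servers looks roughly like this
> (a sketch; the defaults shown are common, but site-specific device
> sections and blacklists usually apply):
>
> oss# mpathconf --enable --with_multipathd y
> oss# cat /etc/multipath.conf
> defaults {
>         user_friendly_names yes
>         find_multipaths yes
> }
> oss# multipath -ll      # verify both paths to each LUN
>
> The OST would then be formatted and mounted on the /dev/mapper/mpathX
> device rather than the underlying /dev/sdX path.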
>
> Cheers, Andreas
>
> > On Nov 5, 2023, at 09:13, Backer via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
> >
> > - Why did OST become in this state after the write failure and was
> mounted RO.  The write error was due to iSCSI target going offline and
> coming back after a few seconds later.
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST is not mounting

2023-11-07 Thread Backer via lustre-discuss
Hi,

Sending this again. Appreciate your help.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST is not mounting

2023-11-05 Thread Backer via lustre-discuss
Hi,

I am new to this email list. I am looking for some help on why an OST is
not mounting.


The cluster was running healthily until the OST experienced an issue and
Linux remounted it read-only. After fixing the issue and rebooting the node
multiple times, the OST wouldn't mount.

When the mount is attempted, the mount command errors out stating that the
index is already in use. The index for the device is 33, and that index is
not mounted anywhere.

The debug messages from the MGS during the mount are attached at the end of
this email. They suggest using writeconf. After using writeconf, the device
mounted. I am looking to understand a couple of things here.

- I am hoping that the writeconf method is the right thing to do here.
- Why did the OST end up in this state after the write failure, when it had
merely been remounted read-only? The write error was due to the iSCSI
target going offline and coming back a few seconds later.
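
For reference, the conservative form of the writeconf procedure regenerates
the configuration logs for every target, roughly as below (a sketch with
placeholder device names; the whole filesystem must be stopped first, i.e.
unmount all clients, then the MDT, then the OSTs):

mds# tunefs.lustre --writeconf /dev/mdtdev
oss# tunefs.lustre --writeconf /dev/ostdev    # repeat for each OST
# then remount the MGS/MDT first, the OSTs next, and the clients last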

2000:0100:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg())
updating fs1-OST0021, index=33

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target())
Process entered

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index())
Process entered

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb())
Process entered

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock())
Process entered

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock())
Process leaving (rc=0 : 0 : 0)

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb())
Process leaving (rc=0 : 0 : 0)

2000:0202:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index())
140-5: Server fs1-OST0021 requested index 33, but that index is already in
use. Use --writeconf to force

2000:0001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index())
Process leaving via out_up (rc=18446744073709551518 : -98 :
0xff9e)

2000:0001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target())
Process leaving (rc=18446744073709551518 : -98 : ff9e)

2000:0002:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg())
Failed to write fs1-OST0021 log (-98)

2000:0001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg())
Process leaving via out (rc=18446744073709551518 : -98 : 0xff9e)
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org