Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index
Thank you Andreas. Are you aware of any paid engagements/support for requests like these to get changes done quickly?

On Wed, 24 Jan 2024 at 20:52, Andreas Dilger wrote:
> This is more like a bug report and should be filed in Jira. That said,
> no guarantee that someone would be able to work on this in a timely manner.
>
> On Jan 24, 2024, at 09:47, Backer via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Just pushing it to the top of the inbox :) Or is there any other
> distribution list that is more appropriate for this type of question? I am
> also trying the devel mailing list.
>
> On Sun, 21 Jan 2024 at 18:34, Backer wrote:
>
>> Just to clarify: OSS-2 is completely powered off (hard power off without
>> any graceful shutdown) before starting work on OSS-3.
>>
>> On Sun, 21 Jan 2024 at 12:12, Backer wrote:
>>
>>> Hi All,
>>>
>>> I am seeing odd behavior with tunefs.lustre. After changing the failover
>>> node and trying to mount an OST, I get the following error:
>>>
>>>   The target service's index is already in use. (/dev/sdd)
>>>
>>> After the above error, and after performing --writeconf once, I can repeat
>>> these steps (see below) any number of times on any OSS without
>>> --writeconf.
>>>
>>> This is an effort to mount an OST on a new OSS. I reproduced this issue
>>> after simplifying some steps, and the behavior (see below) reproduces
>>> consistently. I was wondering if anyone could help me understand this?
>>>
>>> [root@OSS-2 opc]# lctl list_nids
>>> 10.99.101.18@tcp1
>>> [root@OSS-2 opc]#
>>>
>>> [root@OSS-2 opc]# mkfs.lustre --reformat --ost --fsname="testfs" --index="64" --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1" --servicenode "10.99.101.18@tcp1" "/dev/sdd"
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>             (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>>
>>> device size = 51200MB
>>> formatting backing filesystem ldiskfs on /dev/sdd
>>>         target name   testfs:OST0040
>>>         kilobytes     52428800
>>>         options       -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
>>> mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040 -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdd 52428800k
>>> Writing CONFIGS/mountdata
>>>
>>> [root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>             (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>             (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>>
>>> exiting before disk write.
>>> [root@OSS-2 opc]#
>>>
>>> [root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.18@tcp1 /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>             (OST first_time update no_primnode )
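A side note on the naming seen in the output above: the decimal --index value is rendered as four uppercase hex digits in the target name, which is why index 64 shows up as OST0040. A quick sanity check of the mapping (a sketch, not a Lustre tool):

```shell
# The target name embeds the decimal index as 4 uppercase hex digits:
# --index=64 -> testfs-OST0040
index=64
printf 'testfs-OST%04X\n' "$index"   # prints testfs-OST0040
```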
Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index
Just pushing it to the top of the inbox :) Or is there any other distribution list that is more appropriate for this type of question? I am also trying the devel mailing list.

On Sun, 21 Jan 2024 at 18:34, Backer wrote:
> Just to clarify: OSS-2 is completely powered off (hard power off without
> any graceful shutdown) before starting work on OSS-3.
>
> On Sun, 21 Jan 2024 at 12:12, Backer wrote:
>
>> Hi All,
>>
>> I am seeing odd behavior with tunefs.lustre. After changing the failover
>> node and trying to mount an OST, I get the following error:
>>
>>   The target service's index is already in use. (/dev/sdd)
>>
>> After the above error, and after performing --writeconf once, I can repeat
>> these steps (see below) any number of times on any OSS without --writeconf.
>>
>> This is an effort to mount an OST on a new OSS. I reproduced this issue
>> after simplifying some steps, and the behavior (see below) reproduces
>> consistently. I was wondering if anyone could help me understand this?
>>
>> [root@OSS-2 opc]# lctl list_nids
>> 10.99.101.18@tcp1
>> [root@OSS-2 opc]#
>>
>> [root@OSS-2 opc]# mkfs.lustre --reformat --ost --fsname="testfs" --index="64" --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1" --servicenode "10.99.101.18@tcp1" "/dev/sdd"
>>
>>    Permanent disk data:
>> Target:     testfs:OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1062
>>             (OST first_time update no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>> device size = 51200MB
>> formatting backing filesystem ldiskfs on /dev/sdd
>>         target name   testfs:OST0040
>>         kilobytes     52428800
>>         options       -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
>> mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040 -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdd 52428800k
>> Writing CONFIGS/mountdata
>>
>> [root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>> checking for existing Lustre data: found
>>
>>    Read previous values:
>> Target:     testfs-OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1062
>>             (OST first_time update no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>>    Permanent disk data:
>> Target:     testfs:OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1062
>>             (OST first_time update no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>> exiting before disk write.
>> [root@OSS-2 opc]#
>>
>> [root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.18@tcp1 /dev/sdd
>> checking for existing Lustre data: found
>>
>>    Read previous values:
>> Target:     testfs-OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1062
>>             (OST first_time update no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>>    Permanent disk data:
>> Target:     testfs:OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1062
>>             (OST first_time update no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>> Writing CONFIGS/mountdata
>>
>> [root@OSS-2 opc]# mkdir /testfs-OST0040
>> [root@OSS-2 opc]# mount -t lustre /dev/sdd /testfs-OST0040
>> mount.lustre: increased '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb' from 1024 to 16384
>> [root@OSS-2 opc]#
>>
>> [root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>> checking for existing Lustre data: found
>>
>>    Read previous values:
>> Target:     testfs-OST0040
>> Index:      64
>> Lustre FS:  testfs
>> Mount type: ldiskfs
>> Flags:      0x1002
>>             (OST no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>>
>>    Permanent disk data:
>> Target:     testfs-OST0040
>> Index:      64
>> Lus
Re: [lustre-discuss] Odd behavior with tunefs.lustre and device index
Just to clarify: OSS-2 is completely powered off (hard power off without any graceful shutdown) before starting work on OSS-3.

On Sun, 21 Jan 2024 at 12:12, Backer wrote:
> Hi All,
>
> I am seeing odd behavior with tunefs.lustre. After changing the failover
> node and trying to mount an OST, I get the following error:
>
>   The target service's index is already in use. (/dev/sdd)
>
> After the above error, and after performing --writeconf once, I can repeat
> these steps (see below) any number of times on any OSS without --writeconf.
>
> This is an effort to mount an OST on a new OSS. I reproduced this issue
> after simplifying some steps, and the behavior (see below) reproduces
> consistently. I was wondering if anyone could help me understand this?
>
> [root@OSS-2 opc]# lctl list_nids
> 10.99.101.18@tcp1
> [root@OSS-2 opc]#
>
> [root@OSS-2 opc]# mkfs.lustre --reformat --ost --fsname="testfs" --index="64" --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1" --servicenode "10.99.101.18@tcp1" "/dev/sdd"
>
>    Permanent disk data:
> Target:     testfs:OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1062
>             (OST first_time update no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
> device size = 51200MB
> formatting backing filesystem ldiskfs on /dev/sdd
>         target name   testfs:OST0040
>         kilobytes     52428800
>         options       -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
> mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040 -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdd 52428800k
> Writing CONFIGS/mountdata
>
> [root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
> checking for existing Lustre data: found
>
>    Read previous values:
> Target:     testfs-OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1062
>             (OST first_time update no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
>    Permanent disk data:
> Target:     testfs:OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1062
>             (OST first_time update no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
> exiting before disk write.
> [root@OSS-2 opc]#
>
> [root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.18@tcp1 /dev/sdd
> checking for existing Lustre data: found
>
>    Read previous values:
> Target:     testfs-OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1062
>             (OST first_time update no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
>    Permanent disk data:
> Target:     testfs:OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1062
>             (OST first_time update no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
> Writing CONFIGS/mountdata
>
> [root@OSS-2 opc]# mkdir /testfs-OST0040
> [root@OSS-2 opc]# mount -t lustre /dev/sdd /testfs-OST0040
> mount.lustre: increased '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb' from 1024 to 16384
> [root@OSS-2 opc]#
>
> [root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
> checking for existing Lustre data: found
>
>    Read previous values:
> Target:     testfs-OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1002
>             (OST no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
>    Permanent disk data:
> Target:     testfs-OST0040
> Index:      64
> Lustre FS:  testfs
> Mount type: ldiskfs
> Flags:      0x1002
>             (OST no_primnode )
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
>
> exiting before disk write.
> [root@OSS-2 opc]#
>
> Going over to OSS-3 and trying to mount the OST.
>
> [root@OSS-3 opc]# lctl list_nids
> 10.99.101.19@tcp1
> [root@OSS-3 opc]#
>
> Param
[lustre-discuss] Odd behavior with tunefs.lustre and device index
Hi All,

I am seeing odd behavior with tunefs.lustre. After changing the failover node and trying to mount an OST, I get the following error:

  The target service's index is already in use. (/dev/sdd)

After the above error, and after performing --writeconf once, I can repeat these steps (see below) any number of times on any OSS without --writeconf.

This is an effort to mount an OST on a new OSS. I reproduced this issue after simplifying some steps, and the behavior (see below) reproduces consistently. I was wondering if anyone could help me understand this?

[root@OSS-2 opc]# lctl list_nids
10.99.101.18@tcp1
[root@OSS-2 opc]#

[root@OSS-2 opc]# mkfs.lustre --reformat --ost --fsname="testfs" --index="64" --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1" --servicenode "10.99.101.18@tcp1" "/dev/sdd"

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062
            (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

device size = 51200MB
formatting backing filesystem ldiskfs on /dev/sdd
        target name   testfs:OST0040
        kilobytes     52428800
        options       -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040 -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdd 52428800k
Writing CONFIGS/mountdata

[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062
            (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062
            (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#

[root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.18@tcp1 /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062
            (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062
            (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

Writing CONFIGS/mountdata

[root@OSS-2 opc]# mkdir /testfs-OST0040
[root@OSS-2 opc]# mount -t lustre /dev/sdd /testfs-OST0040
mount.lustre: increased '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb' from 1024 to 16384
[root@OSS-2 opc]#

[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#

Going over to OSS-3 and trying to mount the OST.

[root@OSS-3 opc]# lctl list_nids
10.99.101.19@tcp1
[root@OSS-3 opc]#

Parameters look the same as on OSS-2:

[root@OSS-3 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:
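For scripted comparisons like the one above (checking whether the on-disk parameters seen from OSS-3 match OSS-2), grepping the Parameters line out of the --dryrun output is handy. A minimal sketch, with a here-doc standing in for real `tunefs.lustre --dryrun` output so it runs anywhere:

```shell
# Extract the failover.node setting from tunefs.lustre --dryrun output.
# The here-doc below stands in for the real command output shown above.
dryrun_output=$(cat <<'EOF'
   Read previous values:
Target:     testfs-OST0040
Index:      64
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1
EOF
)
printf '%s\n' "$dryrun_output" | grep -o 'failover\.node=[^ ]*'
# prints: failover.node=10.99.101.18@tcp1
```

Running the same extraction on each OSS and diffing the results makes parameter drift obvious at a glance.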
Re: [lustre-discuss] Recommendation on number of OSTs
Thanks for the advice!

On Fri, 12 Jan 2024 at 19:23, Andreas Dilger wrote:
> I would recommend *not* to use too many OSTs, as this causes fragmentation
> of the free space and excess overhead in managing the connections. Today,
> single OSTs can be up to 500TiB in size (or larger, though not necessarily
> optimal for performance). Depending on your cluster size and total
> capacity, it is typical for large systems to have a couple hundred OSTs,
> 2-4 per OSS, balancing the storage and network bandwidth.
>
> On Jan 12, 2024, at 07:37, Backer via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Hi All,
>
> What is the recommendation on the total number of OSTs?
>
> To maximize throughput, one option is to use a larger number of OSS nodes
> with small OSTs. This means ending up with thousands of OSTs. Any
> suggestions or recommendations?
>
> Thank you,
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
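Andreas's point can be made concrete with rough arithmetic (the capacity figures below are illustrative assumptions, not numbers from this thread):

```shell
# Illustrative sizing: with 400 TiB OSTs, a 24 PiB file system needs on the
# order of 60 OSTs (ceiling division) -- nowhere near "1000s".
total_tib=$((24 * 1024))   # 24 PiB expressed in TiB (assumed cluster size)
ost_tib=400                # assumed per-OST capacity
echo $(( (total_tib + ost_tib - 1) / ost_tib ))   # prints 62
```

At 2-4 OSTs per OSS, that is roughly 16-31 OSS nodes, consistent with the "couple hundred OSTs" guidance for even larger systems.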
Re: [lustre-discuss] Mixing ZFS and LDISKFS
Sounds good. Thank you!

On Fri, 12 Jan 2024 at 19:28, Andreas Dilger wrote:
> All of the OSTs and MDTs are "independently managed" (they have their own
> connection state between each client and target), so this should be
> possible, though I don't know of sites that are doing this. Possibly it
> makes sense to put NVMe flash OSTs on ldiskfs and HDD OSTs on ZFS, and
> then put them in OST pools so that they are managed separately.
>
> On Jan 12, 2024, at 10:38, Backer wrote:
>
> Thank you Andreas! How about mixing OSTs? The requirement is to do RAID
> with small volumes using ZFS and have a large OST. This is to reduce the
> overall number of OSTs as the cluster is being extended.
>
> On Fri, 12 Jan 2024 at 11:26, Andreas Dilger wrote:
>
>> Yes, some systems use ldiskfs for the MDT (for performance) and ZFS for
>> the OSTs (for low-cost RAID). The IOPS performance of ZFS is low vs.
>> ldiskfs, but the streaming bandwidth is fine.
>>
>> Cheers, Andreas
>>
>> On Jan 12, 2024, at 08:40, Backer via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>
>> Hi,
>>
>> Could we mix ZFS and LDISKFS together in a cluster?
>>
>> Thank you,
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
Re: [lustre-discuss] Mixing ZFS and LDISKFS
Thank you Andreas! How about mixing OSTs? The requirement is to do RAID with small volumes using ZFS and have a large OST. This is to reduce the overall number of OSTs as the cluster is being extended.

On Fri, 12 Jan 2024 at 11:26, Andreas Dilger wrote:
> Yes, some systems use ldiskfs for the MDT (for performance) and ZFS for
> the OSTs (for low-cost RAID). The IOPS performance of ZFS is low vs.
> ldiskfs, but the streaming bandwidth is fine.
>
> Cheers, Andreas
>
> On Jan 12, 2024, at 08:40, Backer via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Hi,
>
> Could we mix ZFS and LDISKFS together in a cluster?
>
> Thank you,
[lustre-discuss] Mixing ZFS and LDISKFS
Hi,

Could we mix ZFS and LDISKFS together in a cluster?

Thank you,
[lustre-discuss] Recommendation on number of OSTs
Hi All,

What is the recommendation on the total number of OSTs?

To maximize throughput, one option is to use a larger number of OSS nodes with small OSTs. This means ending up with thousands of OSTs. Any suggestions or recommendations?

Thank you,
Re: [lustre-discuss] Extending Lustre file system
Thank you all for the valuable information. Are there any tools that I could use to migrate (rebalance) OSTs? I know about lfs_migrate. Is there a tool that walks the file system and balances OST usage? Thank you!

On Mon, 8 Jan 2024 at 09:38, Backer wrote:
> Hi,
>
> Good morning and happy new year!
>
> I have a quick question about extending a Lustre file system. The
> extension is performed online. I am looking for any best practices or
> anything to watch out for while doing the file system extension. The
> extension is done by adding new OSS nodes and many OSTs within those
> servers.
>
> Really appreciate your help on this.
>
> Regards,
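There is no single built-in "walk and rebalance" tool; the usual approach is to pick the most-full OSTs from `lfs df`, then feed files that live on them through `lfs find --ost ... | lfs_migrate`. A hedged sketch of the selection step, with hard-coded "index usage%" pairs standing in for parsed `lfs df` output:

```shell
# Pick migration-source OSTs: any OST more than 10 points above the mean
# usage. The pairs below stand in for values parsed from `lfs df`.
usage='0 82
1 79
2 35
3 30'
printf '%s\n' "$usage" | awk '
  { idx[NR] = $1; pct[NR] = $2; sum += $2 }
  END { mean = sum / NR
        for (i = 1; i <= NR; i++)
          if (pct[i] > mean + 10) print idx[i] }'
# prints OST indices 0 and 1
```

The printed indices would then be used roughly as `lfs find /mnt/fs --ost <idx> -size +1G | lfs_migrate -y` (paths and thresholds are illustrative); new writes also rebalance naturally because the MDS allocator favors emptier OSTs.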
[lustre-discuss] Extending Lustre file system
Hi,

Good morning and happy new year!

I have a quick question about extending a Lustre file system. The extension is performed online. I am looking for any best practices or anything to watch out for while doing the file system extension. The extension is done by adding new OSS nodes and many OSTs within those servers.

Really appreciate your help on this.

Regards,
Re: [lustre-discuss] What is the meaning of these messages?
Hi All,

Just sending this again.

On Tue, 5 Dec 2023 at 15:03, Backer wrote:
> Hi All,
>
> From time to time, I see the following messages on multiple OSS about a
> particular client IP. What do they mean? All the OSS and OSTs are online
> and have been online in the past.
>
> Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not
> available for connect from @tcp1 (no target). If you are running
> an HA pair check that the target is mounted on the other server.
[lustre-discuss] What is the meaning of these messages?
Hi All,

From time to time, I see the following messages on multiple OSS about a particular client IP. What do they mean? All the OSS and OSTs are online and have been online in the past.

Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not available for connect from @tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1
I do not want to hijack this thread, but I am checking here before I start another new thread. I am getting similar messages randomly. The IP involved here is one client IP. I get messages from multiple OSS about multiple OSTs at the same time, and then they stop. These messages appear occasionally on multiple OSS, and all are related to one client at a time. I am wondering if it is a client-related issue, as this FS has hundreds of clients and only one client is reported at a time. Unfortunately, there is no easy way for me to figure out whether the specified client had an access issue around the time frame mentioned in the log (no access to clients).

Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not available for connect from @tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.

On Mon, 4 Dec 2023 at 05:27, Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
> It wasn't clear from your mail which message(s) you are concerned about.
> These look like normal mount messages to me.
>
> The "error" is pretty normal; it just means there were multiple services
> starting at once and one wasn't yet ready for the other.
>
> LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
> from 0@lo (no target). If you are running an HA pair check that the
> target is mounted on the other server.
>
> It probably makes sense to quiet this message right at mount time to
> avoid this.
>
> Cheers, Andreas
>
> On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Hello Lustre community,
>
> Has someone ever seen messages like these in "/var/log/messages" on a
> Lustre server?
>
> Dec 1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
> Dec 1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec 1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec 1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Dec 1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
> Dec 1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery not enabled, recovery window 300-900
> Dec 1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513
>
> This happens on every boot of a Lustre server named vlfs (an AlmaLinux
> 8.9 VM hosted on VMware) playing the role of both MGS and OSS (it hosts
> an MDT and two OSTs using "virtual" disks). We chose LDISKFS and not ZFS.
> Note that this happens at every boot, well before the clients (AlmaLinux
> 9.3 or 8.9 VMs) connect, and even when the clients are powered off. The
> network connecting the clients and the server is a "virtual" 10GbE
> network (of course there is no virtual IB). We also had the same messages
> previously with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux
> 8.8 / 9.2 clients (also using VMs). Note also that we compile the Lustre
> RPMs ourselves from the sources in the git repository. We also chose to
> use a patched kernel. Our build procedure for RPMs seems to work well,
> because our real cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and
> IB (MOFED) networking.
>
> So has anyone seen these messages?
>
> Are they problematic? If yes, how do we avoid them?
>
> We would like to make sure our small test system using VMs works well
> before we upgrade our real cluster.
>
> Thanks in advance!
>
> Martin Audet
Re: [lustre-discuss] OST is not mounting
Thanks for the explanation. There was a problem with the iSCSI target. It is already multi-path. Anyhow, I was expecting things to come back online after the problem was resolved. This created a data-loss situation, and I thought Lustre was resilient enough not to lose the whole OST. Here the OST became completely unmountable.

On Tue, 7 Nov 2023 at 13:56, Andreas Dilger wrote:
> The OST went read-only because that is what happens when the block device
> disappears underneath it. That is the behavior of ext4 and other local
> filesystems as well.
>
> If you look in the console logs you will see SCSI errors and the
> filesystem being remounted read-only.
>
> To have reliability in the face of such storage issues you need to use
> dm-multipath.
>
> Cheers, Andreas
>
> On Nov 5, 2023, at 09:13, Backer via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> - Why did the OST end up in this state after the write failure, and why
> was it mounted RO? The write error was due to the iSCSI target going
> offline and coming back a few seconds later.
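For readers hitting the same transient-iSCSI-outage problem, a minimal `/etc/multipath.conf` sketch along the lines Andreas suggests (all values are illustrative assumptions; check your distribution's dm-multipath documentation before use):

```
defaults {
    user_friendly_names yes
    # Queue I/O instead of failing it while all paths are down, for up to
    # no_path_retry * polling_interval seconds (here 12 * 5 = 60s), so a
    # brief iSCSI target outage does not surface as a write error that
    # forces ldiskfs read-only.
    no_path_retry    12
    polling_interval 5
}
```

Note this only masks short outages; an outage longer than the queueing window still fails I/O to the filesystem.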
Re: [lustre-discuss] OST is not mounting
Hi, Sending this again. Appreciate your help.

On Sun, 5 Nov 2023 at 11:11, Backer wrote:
> Hi,
>
> I am new to this email list. Looking to get some help on why an OST is
> not getting mounted.
>
> The cluster was running healthy when the OST experienced an issue and
> Linux re-mounted the OST read-only. After fixing the issue and rebooting
> the node multiple times, it wouldn't mount.
>
> When the mount is attempted, the mount command errors out stating that
> the index is already in use. The index for the device is 33, and that
> index is not mounted anywhere else.
>
> The debug output from the MGS during the mount attempt is attached at the
> end of this email. It suggests using --writeconf. After running
> writeconf, the device mounted. I am looking for two things here:
>
> - Confirmation that the writeconf method is the right thing to do here.
> - An explanation of why the OST ended up in this state after the write
>   failure and the read-only remount. The write error was due to the
>   iSCSI target going offline and coming back a few seconds later.
>
> 2000:0100:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg()) updating fs1-OST0021, index=33
> 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target()) Process entered
> 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index()) Process entered
> 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb()) Process entered
> 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock()) Process entered
> 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0)
> 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0)
> 2000:0202:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index()) 140-5: Server fs1-OST0021 requested index 33, but that index is already in use. Use --writeconf to force
> 2000:0001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index()) Process leaving via out_up (rc=18446744073709551518 : -98 : 0xff9e)
> 2000:0001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target()) Process leaving (rc=18446744073709551518 : -98 : ff9e)
> 2000:0002:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write fs1-OST0021 log (-98)
> 2000:0001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg()) Process leaving via out (rc=18446744073709551518 : -98 : 0xff9e)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
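For reference, the documented writeconf recovery procedure (Lustre Operations Manual, "Regenerating Lustre Configuration Logs") runs --writeconf on all targets, not just the affected OST, with the filesystem fully stopped. The sketch below is illustrative only; the device paths and mount points (/dev/mgs, /dev/mdt0, /dev/ost33, /mnt/...) are hypothetical placeholders, not values from this thread:

```shell
# Hedged sketch of the full --writeconf procedure. All device paths
# and mount points below are assumed examples for illustration.

# 1. Unmount clients first, then all targets (OSTs, then MDT, then MGS).
umount /mnt/lustre            # on each client
umount /mnt/ost33             # on each OSS
umount /mnt/mdt0              # on the MDS
umount /mnt/mgs               # on the MGS (if separate from the MDS)

# 2. Run writeconf on every target so the MGS regenerates all config logs.
tunefs.lustre --writeconf /dev/mgs     # on the MGS
tunefs.lustre --writeconf /dev/mdt0    # on the MDS
tunefs.lustre --writeconf /dev/ost33   # on each OSS, for each OST

# 3. Remount in order: MGS, then MDT, then OSTs, then clients.
mount -t lustre /dev/mgs /mnt/mgs
mount -t lustre /dev/mdt0 /mnt/mdt0
mount -t lustre /dev/ost33 /mnt/ost33
```

Note that running --writeconf on a single OST while the rest of the filesystem stays mounted, as described above, is not the documented form of the procedure, which may be relevant to why the question was asked.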
[lustre-discuss] OST is not mounting
Hi,

I am new to this email list. Looking to get some help on why an OST is not
getting mounted.

The cluster was running healthy when the OST experienced an issue and Linux
re-mounted the OST read-only. After fixing the issue and rebooting the node
multiple times, it wouldn't mount.

When the mount is attempted, the mount command errors out stating that the
index is already in use. The index for the device is 33, and that index is
not mounted anywhere else.

The debug output from the MGS during the mount attempt is attached at the
end of this email. It suggests using --writeconf. After running writeconf,
the device mounted. I am looking for two things here:

- Confirmation that the writeconf method is the right thing to do here.
- An explanation of why the OST ended up in this state after the write
  failure and the read-only remount. The write error was due to the iSCSI
  target going offline and coming back a few seconds later.

2000:0100:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg()) updating fs1-OST0021, index=33
2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target()) Process entered
2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index()) Process entered
2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb()) Process entered
2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock()) Process entered
2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0)
2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0)
2000:0202:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index()) 140-5: Server fs1-OST0021 requested index 33, but that index is already in use. Use --writeconf to force
2000:0001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index()) Process leaving via out_up (rc=18446744073709551518 : -98 : 0xff9e)
2000:0001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target()) Process leaving (rc=18446744073709551518 : -98 : ff9e)
2000:0002:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write fs1-OST0021 log (-98)
2000:0001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg()) Process leaving via out (rc=18446744073709551518 : -98 : 0xff9e)
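When chasing an index clash like this one, it can help to compare what the target's disk records against what the MGS configuration logs believe. A minimal sketch, assuming the filesystem name fs1 from the log above; /dev/sdX is a placeholder device, not one mentioned in this thread:

```shell
# On the OSS: read the target's on-disk configuration without modifying
# it (--dryrun). /dev/sdX is a hypothetical placeholder device.
tunefs.lustre --dryrun /dev/sdX
# Check that the output shows "Target: fs1-OST0021" and "Index: 33",
# and whether the "first_time update" flags are still set.

# On the MGS: show the live configuration state for the filesystem,
# including which target indices the MGS already has registered.
lctl get_param -n mgs.MGS.live.fs1
```

If the MGS already lists OST0021 at index 33 but the OST presents itself as newly registering (for example after its configuration was damaged by the write failure), the MGS refuses the registration with -98 (EADDRNOTAVAIL), which matches the log above.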