Re: [lustre-discuss] Missing OST's from 1 node only
I don't know the problem here, but you might want to look for connectivity issues from the client to the OSS(s) that house the missing OSTs. I would imagine the Lustre log would show such errors in bulk. I've seen where an IB subnet manager gets in a weird state such that some nodes can no longer find a path to certain other nodes.

Cameron

On 10/7/21 4:54 PM, Sid Young via lustre-discuss wrote:

G'Day all,

I have an odd situation where one compute node mounts /home and /lustre but only half the OSTs are present, while all the other nodes are fine. Not sure where to start on this one?

Good node:

[root@n02 ~]# lfs df
UUID                   1K-blocks         Used    Available Use% Mounted on
home-MDT_UUID         4473970688     30695424   4443273216   1% /home[MDT:0]
home-OST_UUID        51097721856  39839794176  11257662464  78% /home[OST:0]
home-OST0001_UUID    51097897984  40967138304  10130627584  81% /home[OST:1]
home-OST0002_UUID    51097705472  37731089408  13366449152  74% /home[OST:2]
home-OST0003_UUID    51097773056  41447411712   9650104320  82% /home[OST:3]
filesystem_summary: 204391098368 159985433600  44404843520  79% /home

UUID                   1K-blocks         Used    Available Use% Mounted on
lustre-MDT_UUID       5368816128     28246656   5340567424   1% /lustre[MDT:0]
lustre-OST_UUID      51098352640  10144093184  40954257408  20% /lustre[OST:0]
lustre-OST0001_UUID  51098497024   9584398336  41514096640  19% /lustre[OST:1]
lustre-OST0002_UUID  51098414080  11683002368  39415409664  23% /lustre[OST:2]
lustre-OST0003_UUID  51098514432  10475310080  40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098506240  11505326080  39593178112  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440   9272059904  41826367488  19% /lustre[OST:5]
filesystem_summary: 306590713856  62664189952 243926511616  21% /lustre

[root@n02 ~]#

The bad node:

[root@n04 ~]# lfs df
UUID                   1K-blocks         Used    Available Use% Mounted on
home-MDT_UUID         4473970688     30726400   4443242240   1% /home[MDT:0]
home-OST0002_UUID    51097703424  37732352000  13363446784  74% /home[OST:2]
home-OST0003_UUID    51097778176  41449634816   9646617600  82% /home[OST:3]
filesystem_summary: 102195481600  79181986816  23010064384  78% /home

UUID                   1K-blocks         Used    Available Use% Mounted on
lustre-MDT_UUID       5368816128     28246656   5340567424   1% /lustre[MDT:0]
lustre-OST0003_UUID  51098514432  10475310080  40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098511360  11505326080  39593183232  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440   9272059904  41826367488  19% /lustre[OST:5]
filesystem_summary: 153295455232  31252696064 122042753024  21% /lustre

[root@n04 ~]#

Sid Young
Translational Research Institute

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
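A few generic checks on the bad client may help narrow this down (not from the original thread; the NID below is a placeholder for whichever OSS serves the missing OSTs):

[root@n04 ~]# lfs check osts                 # report which OST connections are active on this client
[root@n04 ~]# lctl dl                        # the OSC devices for the missing OSTs may not show as UP
[root@n04 ~]# lctl ping <NID-of-OSS>         # confirm LNet can still reach that OSS from this client
[root@n04 ~]# dmesg | grep -i LustreError    # look for connection/eviction errors mentioning those OSTs

If lctl ping fails from n04 but works from a good node, the problem is in the fabric/LNet path, as suggested above, rather than in Lustre itself.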
[lustre-discuss] Missing OST's from 1 node only
G'Day all,

I have an odd situation where one compute node mounts /home and /lustre but only half the OSTs are present, while all the other nodes are fine. Not sure where to start on this one?

Good node:

[root@n02 ~]# lfs df
UUID                   1K-blocks         Used    Available Use% Mounted on
home-MDT_UUID         4473970688     30695424   4443273216   1% /home[MDT:0]
home-OST_UUID        51097721856  39839794176  11257662464  78% /home[OST:0]
home-OST0001_UUID    51097897984  40967138304  10130627584  81% /home[OST:1]
home-OST0002_UUID    51097705472  37731089408  13366449152  74% /home[OST:2]
home-OST0003_UUID    51097773056  41447411712   9650104320  82% /home[OST:3]
filesystem_summary: 204391098368 159985433600  44404843520  79% /home

UUID                   1K-blocks         Used    Available Use% Mounted on
lustre-MDT_UUID       5368816128     28246656   5340567424   1% /lustre[MDT:0]
lustre-OST_UUID      51098352640  10144093184  40954257408  20% /lustre[OST:0]
lustre-OST0001_UUID  51098497024   9584398336  41514096640  19% /lustre[OST:1]
lustre-OST0002_UUID  51098414080  11683002368  39415409664  23% /lustre[OST:2]
lustre-OST0003_UUID  51098514432  10475310080  40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098506240  11505326080  39593178112  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440   9272059904  41826367488  19% /lustre[OST:5]
filesystem_summary: 306590713856  62664189952 243926511616  21% /lustre

[root@n02 ~]#

The bad node:

[root@n04 ~]# lfs df
UUID                   1K-blocks         Used    Available Use% Mounted on
home-MDT_UUID         4473970688     30726400   4443242240   1% /home[MDT:0]
home-OST0002_UUID    51097703424  37732352000  13363446784  74% /home[OST:2]
home-OST0003_UUID    51097778176  41449634816   9646617600  82% /home[OST:3]
filesystem_summary: 102195481600  79181986816  23010064384  78% /home

UUID                   1K-blocks         Used    Available Use% Mounted on
lustre-MDT_UUID       5368816128     28246656   5340567424   1% /lustre[MDT:0]
lustre-OST0003_UUID  51098514432  10475310080  40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098511360  11505326080  39593183232  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440   9272059904  41826367488  19% /lustre[OST:5]
filesystem_summary: 153295455232  31252696064 122042753024  21% /lustre

[root@n04 ~]#

Sid Young
Translational Research Institute

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [Lustre-discuss] missing ost's?
On Tue, Jun 16, 2009 at 11:08 PM, Mag Gam magaw...@gmail.com wrote:
do you have many small files?

There was a mix of small and medium-sized files. I reread the Sizing MDT section in the manual and see my error. That section should be in big bold letters at the very beginning... :)

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
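For anyone hitting the same thing, the sizing rule of thumb from the manual is one MDT inode per file in the filesystem; assuming the old ldiskfs default of roughly 4 KB of MDT space per inode (an assumption, not something stated in this thread), that translates directly into the MDT size needed. For example:

# ~500,000 files => ~500,000 MDT inodes => roughly 2 GB of MDT space, plus headroom
$ echo "$((500000 * 4 / 1024)) MB"
1953 MB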
Re: [Lustre-discuss] missing ost's?
2009/6/17 Timh Bergström timh.bergst...@diino.net:
As long as the inode discussion is up, two questions: what exactly is stored in the inode (how big should I make them)? I've read the manual about this and it doesn't really say, except the note about stripes/OSTs. Is there a proper way of moving or recreating the MDT filesystem to hold more inodes, or is the backup - reformat - restore procedure the proper way? Sorry to hijack your thread.

It's okay. I have roughly the same question. In my current case the filesystem is only a test, so I can just recreate it, but I can see this happening in production. Preparing for it not to happen I can do, but users are unpredictable...

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
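As far as I know there is no way to add inodes to an existing ldiskfs MDT in place, so the practical answer is to reformat with a smaller bytes-per-inode ratio (restoring from backup first if the data matters). A rough sketch only - the fsname and MGS NID are placeholders, /dev/md4 is just the MDT device from the df -i output elsewhere in this thread, and this destroys whatever is on the target:

# One inode per 2048 bytes of MDT space instead of the ~4096 default,
# i.e. roughly twice as many inodes for the same size MDT.
mkfs.lustre --reformat --fsname=lustre --mdt --mgsnode=<mgs NID> \
    --mkfsoptions="-i 2048" /dev/md4

The extra inode capacity costs MDT space, so it is worth estimating the expected file count first rather than just picking the smallest ratio.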
Re: [Lustre-discuss] missing ost's?
Michael Di Domenico wrote:

On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico mdidomeni...@gmail.com wrote:
I have a small lustre test cluster with eight OST's running. The servers were shut off over the weekend; upon turning them back on and trying to start up lustre I seem to have lost my OST's.

[r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT lustre-MDT_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST lustre-OST_UUID 3

Everything in the messages log appears to be fine, as if it was just a normal startup of lustre, except for the message below. I'm not sure what logfile the error is referring to, and the message gives little detail on where I should start looking for an error.

Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation CONFIGS/lustre-MDTT: -28
Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log lustre-MDT (-28)

Apparently, from the lustre manual, the -28 at the end of the line is an error code, which points to:

-28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df (query the amount of file system space) or lfs df -i (query the number of inodes).

Verified by:

[r...@node1 ~]$ df -i
Filesystem    Inodes   IUsed    IFree IUse% Mounted on
/dev/md2         128   42132  1237868    4% /
/dev/md0      255232      45   255187    1% /boot
tmpfs         124645       1   124644    1% /dev/shm
/dev/md3       63872      24    63848    1% /mgs
/dev/md4      255040  255040        0  100% /mdt
/dev/md5    29892608   28726 29863882    1% /ost

I only put 500k files in the filesystem; I would not have thought the mdt would have used up the inodes that fast.

The MDT will consume one inode for each file in the global Lustre file system. You have plenty of OST space, but no inodes. You have 255K inodes on the MDS, but you are trying to create 500k files.

cliffw

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
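For completeness, the same exhaustion is visible from the Lustre side and from the MDT device itself; these are just the standard checks, not something run in the original thread:

# Inode usage as Lustre reports it - the MDT line will show 100% IUse%
[r...@node1 ~]$ lfs df -i

# Superblock inode counters of the MDT device (run on the MDS)
[r...@node1 ~]$ dumpe2fs -h /dev/md4 | grep -iE 'inode count|free inodes'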
Re: [Lustre-discuss] missing ost's?
do you have many small files?

On Tue, Jun 16, 2009 at 8:58 PM, Michael Di Domenico mdidomeni...@gmail.com wrote:
On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico mdidomeni...@gmail.com wrote:
I have a small lustre test cluster with eight OST's running. The servers were shut off over the weekend, upon turning them back on and trying to startup lustre I seem to have lost my OST's.

[r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT lustre-MDT_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST lustre-OST_UUID 3

Everything in the messages log appears to be fine as if it was just a normal startup of lustre, except for the below message. I'm not sure what logfile the error is referring to, and the message gives little detail on where i should start looking for an error.

Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation CONFIGS/lustre-MDTT: -28
Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log lustre-MDT (-28)

Apparently from the lustre manual the -28 at the end of the line is an error code, which points to

-28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df (query the amount of file system space) or lfs df -i (query the number of inodes).

verified by

[r...@node1 ~]$ df -i
Filesystem    Inodes   IUsed    IFree IUse% Mounted on
/dev/md2         128   42132  1237868    4% /
/dev/md0      255232      45   255187    1% /boot
tmpfs         124645       1   124644    1% /dev/shm
/dev/md3       63872      24    63848    1% /mgs
/dev/md4      255040  255040        0  100% /mdt
/dev/md5    29892608   28726 29863882    1% /ost

I only put 500k files in the filesystem i would not have thought the mdt would have used up the inodes that fast

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] missing ost's?
As long as the inode discussion is up, two questions: what exactly is stored in the inode (how big should I make them)? I've read the manual about this and it doesn't really say, except the note about stripes/OSTs. Is there a proper way of moving or recreating the MDT filesystem to hold more inodes, or is the backup - reformat - restore procedure the proper way? Sorry to hijack your thread.

Regards,
Timh

2009/6/17 Mag Gam magaw...@gmail.com:
do you have many small files?

On Tue, Jun 16, 2009 at 8:58 PM, Michael Di Domenico mdidomeni...@gmail.com wrote:
On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico mdidomeni...@gmail.com wrote:
I have a small lustre test cluster with eight OST's running. The servers were shut off over the weekend, upon turning them back on and trying to startup lustre I seem to have lost my OST's.

[r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT lustre-MDT_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST lustre-OST_UUID 3

Everything in the messages log appears to be fine as if it was just a normal startup of lustre, except for the below message. I'm not sure what logfile the error is referring to, and the message gives little detail on where i should start looking for an error.

Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation CONFIGS/lustre-MDTT: -28
Jun 16 20:13:55 node1-eth0 kernel: LustreError: 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log lustre-MDT (-28)

Apparently from the lustre manual the -28 at the end of the line is an error code, which points to

-28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df (query the amount of file system space) or lfs df -i (query the number of inodes).

verified by

[r...@node1 ~]$ df -i
Filesystem    Inodes   IUsed    IFree IUse% Mounted on
/dev/md2         128   42132  1237868    4% /
/dev/md0      255232      45   255187    1% /boot
tmpfs         124645       1   124644    1% /dev/shm
/dev/md3       63872      24    63848    1% /mgs
/dev/md4      255040  255040        0  100% /mdt
/dev/md5    29892608   28726 29863882    1% /ost

I only put 500k files in the filesystem i would not have thought the mdt would have used up the inodes that fast

--
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
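On the backup - reformat - restore question: the file-level MDT backup procedure in the Lustre manual goes roughly as sketched below. This is from memory, the mount point is hypothetical, and /dev/md4 is just the MDT device from the df -i output above, so treat it as an outline to verify against the manual rather than a recipe:

# On the MDS, with Lustre stopped; mount the MDT as plain ldiskfs
mount -t ldiskfs /dev/md4 /mnt/mdt
cd /mnt/mdt
getfattr -R -d -m '.*' -e hex -P . > /root/mdt_ea.bak   # save Lustre's extended attributes
tar czf /root/mdt_backup.tgz --sparse .                 # save the metadata file tree
cd /; umount /mnt/mdt

# ... reformat the MDT with more inodes (see the mkfs.lustre sketch earlier) ...

mount -t ldiskfs /dev/md4 /mnt/mdt
cd /mnt/mdt
tar xzpf /root/mdt_backup.tgz
setfattr --restore=/root/mdt_ea.bak                     # put the extended attributes back
cd /; umount /mnt/mdt

Getting the extended attributes back intact is the part that matters; without them the restored MDT is useless, which is why some sites prefer a raw device-level copy instead.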