Re: [lustre-discuss] Missing OST's from 1 node only

2021-10-12 Thread Cameron Harr via lustre-discuss

  
I don't know the problem here, but you might want to look for
connectivity issues from the client to the OSS(s) that house those
two missing OSTs. I would imagine the Lustre logs would show such
errors in bulk. I've seen cases where an IB subnet manager gets into a
weird state such that some nodes can no longer find a path to
certain other nodes.
Cameron
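A minimal sketch of automating that comparison — diffing `lfs df` output from a good and a bad client to list the missing OST indices. The helper name and the abbreviated sample strings below are mine, not from the thread:

```python
import re

def ost_indices(lfs_df_output, mountpoint):
    """Collect the OST indices that `lfs df` reports for one mountpoint."""
    pat = re.compile(re.escape(mountpoint) + r'\[OST:(\d+)\]')
    return {int(m.group(1)) for m in pat.finditer(lfs_df_output)}

# Abbreviated `lfs df` output from the good node (n02) ...
good = """
/home[OST:0] /home[OST:1] /home[OST:2] /home[OST:3]
/lustre[OST:0] /lustre[OST:1] /lustre[OST:2]
/lustre[OST:3] /lustre[OST:4] /lustre[OST:5]
"""
# ... and from the bad node (n04).
bad = """
/home[OST:2] /home[OST:3]
/lustre[OST:3] /lustre[OST:4] /lustre[OST:5]
"""

for fs in ("/home", "/lustre"):
    missing = sorted(ost_indices(good, fs) - ost_indices(bad, fs))
    print(f"{fs}: missing OSTs {missing}")
```

Running it against the full outputs from both nodes would report /home missing OSTs 0-1 and /lustre missing OSTs 0-2 on n04.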

On 10/7/21 4:54 PM, Sid Young via lustre-discuss wrote:

G'Day all,


I have an odd situation where one compute node mounts /home
and /lustre but only half the OSTs are present, while all the
other nodes are fine. Not sure where to start on this one?


Good node:
[root@n02 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
home-MDT0000_UUID     4473970688    30695424  4443273216   1% /home[MDT:0]
home-OST0000_UUID    51097721856 39839794176 11257662464  78% /home[OST:0]
home-OST0001_UUID    51097897984 40967138304 10130627584  81% /home[OST:1]
home-OST0002_UUID    51097705472 37731089408 13366449152  74% /home[OST:2]
home-OST0003_UUID    51097773056 41447411712  9650104320  82% /home[OST:3]

filesystem_summary:  204391098368 159985433600 44404843520  79% /home

UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID   5368816128    28246656  5340567424   1% /lustre[MDT:0]
lustre-OST0000_UUID  51098352640 10144093184 40954257408  20% /lustre[OST:0]
lustre-OST0001_UUID  51098497024  9584398336 41514096640  19% /lustre[OST:1]
lustre-OST0002_UUID  51098414080 11683002368 39415409664  23% /lustre[OST:2]
lustre-OST0003_UUID  51098514432 10475310080 40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098506240 11505326080 39593178112  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440  9272059904 41826367488  19% /lustre[OST:5]

filesystem_summary:  306590713856 62664189952 243926511616  21% /lustre

[root@n02 ~]#

The bad Node:
[root@n04 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
home-MDT0000_UUID     4473970688    30726400  4443242240   1% /home[MDT:0]
home-OST0002_UUID    51097703424 37732352000 13363446784  74% /home[OST:2]
home-OST0003_UUID    51097778176 41449634816  9646617600  82% /home[OST:3]

filesystem_summary:  102195481600 79181986816 23010064384  78% /home

UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID   5368816128    28246656  5340567424   1% /lustre[MDT:0]
lustre-OST0003_UUID  51098514432 10475310080 40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098511360 11505326080 39593183232  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440  9272059904 41826367488  19% /lustre[OST:5]

filesystem_summary:  153295455232 31252696064 122042753024  21% /lustre

[root@n04 ~]#

Sid Young
Translational Research Institute
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Missing OST's from 1 node only

2021-10-07 Thread Sid Young via lustre-discuss
G'Day all,

I have an odd situation where one compute node mounts /home and /lustre but
only half the OSTs are present, while all the other nodes are fine. Not
sure where to start on this one?

Good node:
[root@n02 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
home-MDT0000_UUID     4473970688    30695424  4443273216   1% /home[MDT:0]
home-OST0000_UUID    51097721856 39839794176 11257662464  78% /home[OST:0]
home-OST0001_UUID    51097897984 40967138304 10130627584  81% /home[OST:1]
home-OST0002_UUID    51097705472 37731089408 13366449152  74% /home[OST:2]
home-OST0003_UUID    51097773056 41447411712  9650104320  82% /home[OST:3]

filesystem_summary:  204391098368 159985433600 44404843520  79% /home

UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID   5368816128    28246656  5340567424   1% /lustre[MDT:0]
lustre-OST0000_UUID  51098352640 10144093184 40954257408  20% /lustre[OST:0]
lustre-OST0001_UUID  51098497024  9584398336 41514096640  19% /lustre[OST:1]
lustre-OST0002_UUID  51098414080 11683002368 39415409664  23% /lustre[OST:2]
lustre-OST0003_UUID  51098514432 10475310080 40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098506240 11505326080 39593178112  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440  9272059904 41826367488  19% /lustre[OST:5]

filesystem_summary:  306590713856 62664189952 243926511616  21% /lustre

[root@n02 ~]#



The bad Node:

[root@n04 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
home-MDT0000_UUID     4473970688    30726400  4443242240   1% /home[MDT:0]
home-OST0002_UUID    51097703424 37732352000 13363446784  74% /home[OST:2]
home-OST0003_UUID    51097778176 41449634816  9646617600  82% /home[OST:3]

filesystem_summary:  102195481600 79181986816 23010064384  78% /home

UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID   5368816128    28246656  5340567424   1% /lustre[MDT:0]
lustre-OST0003_UUID  51098514432 10475310080 40623202304  21% /lustre[OST:3]
lustre-OST0004_UUID  51098511360 11505326080 39593183232  23% /lustre[OST:4]
lustre-OST0005_UUID  51098429440  9272059904 41826367488  19% /lustre[OST:5]

filesystem_summary:  153295455232 31252696064 122042753024  21% /lustre

[root@n04 ~]#



Sid Young
Translational Research Institute
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [Lustre-discuss] missing ost's?

2009-06-17 Thread Michael Di Domenico
On Tue, Jun 16, 2009 at 11:08 PM, Mag Gam <magaw...@gmail.com> wrote:
 do you have many small files?

There was a mix of small and medium-sized files.  I reread the Sizing MDT
section in the manual and see my error.  That section should be in big
bold letters at the very beginning... :)
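The sizing lesson here reduces to one multiplication. A minimal sketch under stated assumptions — the 2048 bytes-per-inode planning ratio and the 2x headroom factor below are illustrative choices of mine, not values from the manual or this thread:

```python
def mdt_size_bytes(expected_files, bytes_per_inode=2048, margin=2.0):
    """Rough minimum MDT size: one MDT inode per Lustre file, plus headroom.

    bytes_per_inode and margin are planning assumptions, not Lustre defaults.
    """
    return int(expected_files * bytes_per_inode * margin)

# The test filesystem in this thread wanted 500k files:
size = mdt_size_bytes(500_000)
print(f"~{size / 2**30:.1f} GiB MDT for 500k files")
```

The point is that MDT capacity is bounded by its inode count, not its byte count, so the estimate scales with file count rather than data volume.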
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] missing ost's?

2009-06-17 Thread Michael Di Domenico
2009/6/17 Timh Bergström <timh.bergst...@diino.net>:
 As long as the inode discussion is up, two questions: what exactly is
 stored in the inode (and how big should I make them)? I've read the manual
 about this and it doesn't really say, except the notation about
 stripes/OSTs.

 Is there a proper way of moving or recreating the MDT filesystem
 to hold more inodes, or is the backup - reformat - restore procedure
 the proper way?

 Sorry to hijack your thread.


It's okay.  I have roughly the same question.  In my current case, the
filesystem is only a test, so I can just recreate it, but I can see
this happening in production; preparing for it not to happen I can
do, but users are unpredictable...
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] missing ost's?

2009-06-17 Thread Cliff White
Michael Di Domenico wrote:
 On Tue, Jun 16, 2009 at 8:25 PM, Michael Di
 Domenico <mdidomeni...@gmail.com> wrote:
 I have a small lustre test cluster with eight OST's running.  The
 servers were shut off over the weekend, upon turning them back on and
 trying to startup lustre I seem to have lost my OST's.

 [r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3

 Everything in the messages log appears to be fine as if it was just a
 normal startup of lustre, except for the below message.  I'm not sure
 what logfile the error is referring to, and the message gives little
 detail on where i should start looking for an error.

 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
 CONFIGS/lustre-MDT0000T: -28
 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
 lustre-MDT0000 (-28)
 
 Apparently from the lustre manual the -28 at the end of the line is an
 error code, which points to
 
 -28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df
 (query the amount of file system space) or lfs df -i
 (query the number of inodes).
 
 verified by
 
 [r...@node1 ~]$ df -i
 Filesystem            Inodes   IUsed   IFree IUse% Mounted on
 /dev/md2             1280000   42132 1237868    4% /
 /dev/md0              255232      45  255187    1% /boot
 tmpfs                 124645       1  124644    1% /dev/shm
 /dev/md3               63872      24   63848    1% /mgs
 /dev/md4              255040  255040       0  100% /mdt
 /dev/md5             29892608   28726 29863882    1% /ost
 
 I only put 500k files in the filesystem; I would not have thought the
 MDT would have used up the inodes that fast

The MDT will consume one inode for each file in the global Lustre file
system. You have plenty of OST space, but no free inodes left on the MDT.

You have 255K inodes on the MDS, but you are trying to create 500k files.

cliffw
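Cliff's arithmetic, spelled out against the `df -i` numbers quoted above (a trivial illustrative snippet, not from the thread):

```python
# Numbers taken from the `df -i` output above (/dev/md4 is the MDT).
mdt_inodes_total = 255_040   # Inodes column for /mdt
files_attempted  = 500_000   # files the poster tried to create

shortfall = files_attempted - mdt_inodes_total
print(f"the MDT falls {shortfall} inodes short of the 500k-file target")
```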

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] missing ost's?

2009-06-16 Thread Mag Gam
do you have many small files?



On Tue, Jun 16, 2009 at 8:58 PM, Michael Di
Domenico <mdidomeni...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 8:25 PM, Michael Di
 Domenico <mdidomeni...@gmail.com> wrote:
 I have a small lustre test cluster with eight OST's running.  The
 servers were shut off over the weekend, upon turning them back on and
 trying to startup lustre I seem to have lost my OST's.

 [r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3

 Everything in the messages log appears to be fine as if it was just a
 normal startup of lustre, except for the below message.  I'm not sure
 what logfile the error is referring to, and the message gives little
 detail on where i should start looking for an error.

 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
 CONFIGS/lustre-MDT0000T: -28
 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
 lustre-MDT0000 (-28)

 Apparently from the lustre manual the -28 at the end of the line is an
 error code, which points to

 -28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df
 (query the amount of file system space) or lfs df -i
 (query the number of inodes).

 verified by

 [r...@node1 ~]$ df -i
 Filesystem            Inodes   IUsed   IFree IUse% Mounted on
 /dev/md2             1280000   42132 1237868    4% /
 /dev/md0              255232      45  255187    1% /boot
 tmpfs                 124645       1  124644    1% /dev/shm
 /dev/md3               63872      24   63848    1% /mgs
 /dev/md4              255040  255040       0  100% /mdt
 /dev/md5             29892608   28726 29863882    1% /ost

 I only put 500k files in the filesystem; I would not have thought the
 MDT would have used up the inodes that fast
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] missing ost's?

2009-06-16 Thread Timh Bergström
As long as the inode discussion is up, two questions: what exactly is
stored in the inode (and how big should I make them)? I've read the manual
about this and it doesn't really say, except the notation about
stripes/OSTs.

Is there a proper way of moving or recreating the MDT filesystem
to hold more inodes, or is the backup - reformat - restore procedure
the proper way?

Sorry to hijack your thread.

Regards,
Timh

2009/6/17 Mag Gam <magaw...@gmail.com>:
 do you have many small files?



 On Tue, Jun 16, 2009 at 8:58 PM, Michael Di
 Domenico <mdidomeni...@gmail.com> wrote:
 On Tue, Jun 16, 2009 at 8:25 PM, Michael Di
 Domenico <mdidomeni...@gmail.com> wrote:
 I have a small lustre test cluster with eight OST's running.  The
 servers were shut off over the weekend, upon turning them back on and
 trying to startup lustre I seem to have lost my OST's.

 [r...@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc mgc192.168.1@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3

 Everything in the messages log appears to be fine as if it was just a
 normal startup of lustre, except for the below message.  I'm not sure
 what logfile the error is referring to, and the message gives little
 detail on where i should start looking for an error.

 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
 CONFIGS/lustre-MDT0000T: -28
 Jun 16 20:13:55 node1-eth0 kernel: LustreError:
 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
 lustre-MDT0000 (-28)

 Apparently from the lustre manual the -28 at the end of the line is an
 error code, which points to

 -28 -ENOSPC The file system is out-of-space or out of inodes. Use lfs df
 (query the amount of file system space) or lfs df -i
 (query the number of inodes).

 verified by

 [r...@node1 ~]$ df -i
 Filesystem            Inodes   IUsed   IFree IUse% Mounted on
 /dev/md2             1280000   42132 1237868    4% /
 /dev/md0              255232      45  255187    1% /boot
 tmpfs                 124645       1  124644    1% /dev/shm
 /dev/md3               63872      24   63848    1% /mgs
 /dev/md4              255040  255040       0  100% /mdt
 /dev/md5             29892608   28726 29863882    1% /ost

 I only put 500k files in the filesystem; I would not have thought the
 MDT would have used up the inodes that fast
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




-- 
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss