[Lustre-discuss] small cluster
Hi

I'm new to the world of HPC and clusters :-( and I was given the task of setting up a new (small) HPC cluster based on 36 Sun X2200 M2 systems. Each system has one 136GB SAS HD. I've installed CentOS 5.2 as the base OS on all the systems and through partitioning managed to keep an unused partition of 65GB (with 36 nodes that gives +2TB of distributed storage). The rest of the HD is laid out as follows: 4x RAM = 64GB swap and ~10GB for the OS itself. I want to put a clustered FS on the unused partitions, and from looking around it seems that Lustre is king of the hill ... The question I have is: will I gain anything from using Lustre in such a tight environment, or am I going to degrade the cluster's performance so much that it's simply not worth the effort?

--
TIA
Paolo
Re: [Lustre-discuss] small cluster
Paolo,

The biggest problem with using Lustre in this environment is that the Lustre _servers_ (those providing the disk space) cannot _use_ the filesystem (you cannot have a client and a server on the same machine).

Kevin

Paolo Supino wrote:
> I'm new to the world of HPC and clusters and I was given the task of setting up a new (small) HPC cluster based on 36 Sun X2200 M2 systems. [...] Will I gain anything from using Lustre in such a tight environment, or am I going to degrade the cluster's performance so much that it's simply not worth the effort?
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
On Fri, 2008-09-05 at 00:15 -0400, Brock Palen wrote:
> Looks like that didn't fix it. One of the login nodes repeated the behavior.

So what are the messages the client logged when the problem occurred? And what, if anything, was logged on the MDS at the same time?

b.
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
I had to reboot the MDS to get the problem to go away. I will watch and see if it reappears. I screwed up and deleted the wrong /var/log/messages, so I don't have the messages. I am watching this issue.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Sep 5, 2008, at 10:01 AM, Brian J. Murrell wrote:
> On Fri, 2008-09-05 at 00:15 -0400, Brock Palen wrote:
>> Looks like that didn't fix it. One of the login nodes repeated the behavior.
>
> So what are the messages the client logged when the problem occurred? And what, if anything, was logged on the MDS at the same time?
>
> b.
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
For what it's worth... I've seen similar problems with clients not being able to connect to OSSs.

SERVER OS: Linux oss1 2.6.18-53.1.14.el5_lustre.1.6.5.1smp #1 SMP Thu Jun 26 01:38:50 EDT 2008 i686 i686 i386 GNU/Linux
CLIENT OS: Linux x15 2.6.18-53.1.14.el5_lustre.1.6.5smp #1 SMP Mon May 12 22:24:24 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

On the client side I see this...

CLIENT LOG =
Sep 1 15:17:22 x15 kernel: Lustre: Request x30990319 sent from data-OST0004-osc-81022067ec00 to NID [EMAIL PROTECTED] 100s ago has timed out (limit 100s).
Sep 1 15:17:22 x15 kernel: Lustre: Skipped 9 previous similar messages
Sep 1 15:17:22 x15 kernel: Lustre: data-OST0004-osc-81022067ec00: Connection to service data-OST0004 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Sep 1 15:17:22 x15 kernel: LustreError: 3834:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Sep 1 15:17:22 x15 kernel: LustreError: 3834:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 1 15:17:22 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:17:22 x15 kernel: LustreError: Skipped 2 previous similar messages
Sep 1 15:17:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 6s
Sep 1 15:17:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 4 previous similar messages
Sep 1 15:17:47 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:18:37 x15 last message repeated 2 times
Sep 1 15:19:02 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:19:27 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 26s
Sep 1 15:19:27 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Sep 1 15:19:27 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:20:17 x15 last message repeated 2 times
Sep 1 15:21:07 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:21:07 x15 kernel: LustreError: Skipped 1 previous similar message
Sep 1 15:21:57 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:21:57 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 5 previous similar messages
Sep 1 15:22:22 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:22:22 x15 kernel: LustreError: Skipped 2 previous similar messages
Sep 1 15:24:52 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:24:52 x15 kernel: LustreError: Skipped 5 previous similar messages
Sep 1 15:27:22 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:27:22 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 12 previous similar messages
Sep 1 15:29:27 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:29:27 x15 kernel: LustreError: Skipped 10 previous similar messages
Sep 1 15:37:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:37:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
Sep 1 15:38:12 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:38:12 x15 kernel: LustreError: Skipped 20 previous similar messages
Sep 1 15:48:12 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
END CLIENT LOG =

Server log at corresponding time...

SERVER LOG =
Aug 31 04:02:04 oss1 syslogd
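A few client-side checks that often help narrow down this kind of repeated ost_connect failure; a minimal sketch for a 1.6-era client, where the OSS NID is a placeholder standing in for the redacted values above:

  # Is the OSS reachable over the Lustre network (LNET) at all?
  lctl ping <oss1-NID>

  # List local Lustre devices and their state; the affected OSC
  # (data-OST0004-osc-...) should appear here.
  lctl dl
  cat /proc/fs/lustre/devices

  # On the OSS itself, check the kernel log for errors around the
  # same timestamps.
  dmesg | tail -100

-16 is -EBUSY, which usually means the OST is refusing new connections (for example while it is still in recovery), so the server-side log for the same window is the more interesting half.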
Re: [Lustre-discuss] Starting a new MGS/MDS
Does the new MDS actually have an MGS running? FYI - you only need one MGS per Lustre setup. In the commands you issued it doesn't look like you actually set up an MGS on the host mds2. Can you run an lctl dl on mds2 and send the output?

On Sep 4, 2008, at 4:54 PM, Ms. Megan Larko wrote:
> Hi,
>
> I have a new MGS/MDS that I would like to start. It is another of the same CentOS 5 kernel 2.6.18-53.1.13.el5 lustre-1.6.4.3smp as my other boxes. Initially I had an IP number that was used elsewhere in our group. I changed it using the tunefs.lustre command below for the new MDT.
>
> [EMAIL PROTECTED] ~]# tunefs.lustre --erase-params --writeconf [EMAIL PROTECTED] /dev/sdd1
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: crew8-MDT
> Index: unassigned
> Lustre FS: crew8
> Mount type: ldiskfs
> Flags: 0x71 (MDT needs_index first_time update )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED]
>
> Permanent disk data:
> Target: crew8-MDT
> Index: unassigned
> Lustre FS: crew8
> Mount type: ldiskfs
> Flags: 0x171 (MDT needs_index first_time update writeconf )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED]
>
> Writing CONFIGS/mountdata
>
> Next I try to mount this new MDT onto the system:
>
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT
> mount.lustre: mount /dev/sdd1 at /srv/lustre/mds/crew8-MDT failed: Input/output error
> Is the MGS running?
>
> Ummm--- yeah, I thought the MGS is running.
>
> [EMAIL PROTECTED] ~]# tail /var/log/messages
> Sep 4 16:28:08 mds2 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Sep 4 16:28:13 mds2 kernel: LustreError: 3526:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1220560088, 5s ago) [EMAIL PROTECTED] x3/t0 o250-[EMAIL PROTECTED]@o2ib_0:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-5)
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1054:server_start_targets()) Required registration failed for crew8-MDT: -5
> Sep 4 16:28:13 mds2 kernel: LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -5
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1368:server_put_super()) no obd crew8-MDT
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:119:server_deregister_mount()) crew8-MDT not registered
> Sep 4 16:28:13 mds2 kernel: Lustre: server umount crew8-MDT complete
> Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-5)
>
> The o2ib network is up. It is ping-able via bash and lctl. I can get to it from itself and from other computers on this local subnet.
>
> [EMAIL PROTECTED] ~]# lctl
> lctl ping [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> lctl ping [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> lctl quit
>
> On this net, there are no firewalls as the computers are using only non-routable IP numbers. So there is not a firewall issue of which I am aware...
>
> [EMAIL PROTECTED] ~]# iptables -L
> -bash: iptables: command not found
>
> The only oddity I have found is that the modules in my working MGS/MDS are used more than the modules in my new MGS/MDT.
>
> Correctly functioning MGS/MDT:
> [EMAIL PROTECTED] ~]# lsmod | grep mgs
> mgs 181512 1
> mgc 86744 2 mgs
> ptlrpc 659512 8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
> obdclass 542200 13 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
> lvfs 84712 12 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
> libcfs 183128 14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> [EMAIL PROTECTED] ~]# lsmod | grep osc
> osc 172136 11
> ptlrpc 659512 8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
> obdclass 542200 13 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
> lvfs 84712 12 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
> libcfs 183128 14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
> [EMAIL PROTECTED] ~]# lsmod | grep lnet
> lnet 255656 4 lustre,ko2iblnd,ptlrpc,obdclass
> libcfs 183128 14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
>
> Failing MGS/MDT:
> [EMAIL PROTECTED] ~]# lsmod | grep mgs
> mgs 181512 0
> mgc
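For comparison, the usual way to get a combined MGS+MDT is to format the device with both roles rather than pointing the MDT at an external MGS NID; a minimal sketch using the device and filesystem name from the mail above (note that reformatting destroys the data already on that MDT):

  # Format one device as both MGS and MDT for filesystem "crew8"
  # (this wipes the existing target).
  mkfs.lustre --fsname=crew8 --mgs --mdt /dev/sdd1

  # Mounting the device starts both the MGS and the MDT services.
  mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT

  # On a working combined node, 'lctl dl' should list an MGS device
  # as well as the MDT.
  lctl dl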
Re: [Lustre-discuss] Starting a new MGS/MDS
On Sep 05, 2008 11:11 -0400, Aaron Knister wrote:
> Does the new MDS actually have an MGS running? FYI - you only need one MGS per Lustre setup. In the commands you issued it doesn't look like you actually set up an MGS on the host mds2. Can you run an lctl dl on mds2 and send the output?

There are tradeoffs between having a single MGS for multiple filesystems, and having one MGS per filesystem (assuming different MDS nodes). In general, there isn't much benefit to sharing an MGS between multiple MDS nodes, and the drawback is that it is a single point of failure, so you may as well have one per MDS.

> On Sep 4, 2008, at 4:54 PM, Ms. Megan Larko wrote:
>> Hi, I have a new MGS/MDS that I would like to start. It is another of the same CentOS 5 kernel 2.6.18-53.1.13.el5 lustre-1.6.4.3smp as my other boxes. [...]
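To make the layout Andreas describes concrete, the alternative to a combined MGS/MDT is a standalone MGS that one or more filesystems register against; a rough sketch with hypothetical device names and NID:

  # A small device formatted and mounted as a standalone MGS.
  mkfs.lustre --mgs /dev/sda1
  mount -t lustre /dev/sda1 /mnt/mgs

  # Each filesystem's MDT then points at that MGS at format time.
  mkfs.lustre --fsname=crew8 --mdt --mgsnode=<mgs-ip>@o2ib /dev/sdd1

With one MGS per filesystem (the combined MGS/MDT case) there is no shared single point of failure across filesystems, which is the tradeoff being weighed above.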
Re: [Lustre-discuss] small cluster
On Sep 05, 2008 07:59 -0600, Kevin Van Maren wrote:
> The biggest problem with using Lustre in this environment is that the Lustre _servers_ (those providing the disk space) cannot _use_ the filesystem (you cannot have a client and a server on the same machine).

Well, it depends on how much memory pressure there is. It definitely works with client-on-OST (I run my home system that way, and lots of Lustre developers do functional tests in that config), but it has the chance of deadlock if there are memory-hungry apps running on the same OSS. This could be avoided by forcing all writes to go to remote OSS nodes, possibly at the cost of some performance, though network IO on a 1GigE is faster than to a single disk if you have a good switch.

The other (more major, IMHO) issue is that if the client/OSS node crashes, it takes its disk with it, and now other clients cannot access the data there. Similarly, without RAID of the disks, any disk loss would mean the loss of 1/36 of the filesystem. This could be avoided by having RAID1 of the disks using DRBD to a remote node that is configured as the OST failover; there is a page on the Lustre wiki about using DRBD (a rough sketch of that layout is below). Please keep the list informed (and the wiki updated) if you decide to use this configuration.

Paolo Supino wrote:
> I'm new to the world of HPC and clusters and I was given the task of setting up a new (small) HPC cluster based on 36 Sun X2200 M2 systems. [...] Will I gain anything from using Lustre in such a tight environment, or am I going to degrade the cluster's performance so much that it's simply not worth the effort?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
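A rough sketch of the DRBD + failover layout mentioned above, with hypothetical hostnames, addresses and devices (DRBD 8.x-style configuration; the Lustre wiki page has the full recipe):

  # /etc/drbd.conf (excerpt): mirror the spare partition of node01
  # to node02 over the network, with synchronous replication.
  resource ost0 {
    protocol C;
    on node01 {
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   192.168.1.1:7788;
      meta-disk internal;
    }
    on node02 {
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   192.168.1.2:7788;
      meta-disk internal;
    }
  }

  # The OST is then formatted on the DRBD device, with the peer
  # declared as the failover node.
  mkfs.lustre --fsname=testfs --ost --mgsnode=<mgs-ip>@tcp \
      --failnode=<node02-ip>@tcp /dev/drbd0

mkfs.lustre --mgsnode and --failnode are standard options; the DRBD resource syntax here is from memory and should be checked against the DRBD documentation for the version in use.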
Re: [Lustre-discuss] Lustre directory sizes - fast du
On Sep 04, 2008 18:55 +0100, Peter Grandi wrote:
> Hi all, since our users have managed to write several TBs to Lustre by now, they sometimes would like to know what and how much there is in their directories. Is there any smarter way to find out than to do a du -hs dirname and wait for 30min for the 12TB-answer?
>
> One possibility would be to enable quotas with large limits for every user that won't hamper usage. This will track space usage for each user. Depending on your coding skills it might even be possible to change the quota code so that it only tracked space usage but didn't enforce usage limits. This would potentially reduce the overhead of quotas because there is no need for OSTs to check the quota limits during IO.

If you have any patches that speed up the fetching of (what are likely to be) millions of records from random places on a disk very quickly, and also speed up the latency of the associated network roundtrips, please let us know :-). That is always our goal as well :-).

> I've already told them to substitute ls -l by find -type f -exec ls -l {};, although I'm not too sure about that either.

I don't think that will help at all. ls is a crazy bunch of code that does stat on the directory and all kinds of extra work. Possibly better would be:

  lfs find ${dir} -type f | xargs stat -c %b

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
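A concrete form of that last suggestion, as a rough sketch (the directory path is a placeholder, and the plain xargs pipeline assumes filenames without embedded whitespace; %b counts blocks of %B bytes each, normally 512):

  dir=/lustre/some/directory
  lfs find "$dir" -type f \
    | xargs stat -c '%b %B' \
    | awk '{ bytes += $1 * $2 } END { printf "%.1f GiB\n", bytes / 2^30 }'

And once quotas are enabled as suggested above, per-user totals come back without walking the tree at all, e.g. lfs quota -u <username> /mnt/lustre.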