[Lustre-discuss] Performance Question
We have a small Lustre setup with two OSTs on two OSS servers, and I'm curious whether moving to one OST per OSS, with four OSS servers, would increase performance?
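Before adding OSS servers it may be worth checking how evenly the existing OSTs are loaded and whether files are actually striped across both of them. A minimal sketch, assuming the client mount point is /lustre (a placeholder):
  lfs df -h /lustre                  # fill level and balance of each OST
  lfs getstripe /lustre/some/file    # stripe count/size of an existing file
  lfs setstripe -c -1 /lustre/dir    # new files in this dir stripe across all OSTs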
[Lustre-discuss] Upgrade from 1.8.4-oracle to 1.8.8-whamcloud question
We are still running 1.8.4, from back when Lustre was still hosted by Oracle, and it's been mostly stable except for a few bugs here and there that I see have been fixed in the latest 1.8.8 release from Whamcloud. I'm wondering: can I update the server side with Whamcloud's RPMs without updating the client side right away (which would require a full shutdown)?
Re: [Lustre-discuss] How smart is Lustre?
In my experience, if there is a particular driver for multipathing from the vendor, go for that. In our setup, we have Oracle/Sun disk arrays, and with the standard Linux multipathing daemon I would get lots of weird I/O errors. It turns out the disk arrays had picked their preferred path, but Linux was trying to talk to the LUNs on both paths and would only receive a response on the preferred one. There is an RDAC driver that can be installed. Simply disable the multipathing daemon, or configure it to ignore the disk arrays, and use the vendor solution. I had no more I/O errors (which had only served to slow down the boot process anyway). On Wed, Dec 19, 2012 at 11:36 AM, Jason Brooks brook...@ohsu.edu wrote: Hello, I am building a 2.3.x filesystem right now, and I am looking at setting up some active-active failover abilities for my OSSs. I have been looking at Dell's MD3xxx arrays, as they have redundant controllers and allow up to four hosts to connect to each controller. I can see how Linux multipath can be used with redundant disk controllers. I can even (slightly) understand how Lustre fails over when an OSS goes down. 1. Is Lustre smart enough to use redundant paths, or fail over OSSs if an OSS is congested? (It would be cool, no?) 2. Does the Linux multipath module slow performance? 3. How much does a RAID array such as the one listed above act as a bottleneck, say if I have as many volumes available on the RAID controllers as there are OSS hosts? 4. Are there arrays similar to Dell's model that would work? Thanks! --jason
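For reference, telling the stock multipath daemon to ignore the arrays is normally done with a blacklist stanza in /etc/multipath.conf; a sketch only, and the vendor/product strings are assumptions (check yours with cat /sys/block/sdX/device/vendor and .../model):
  # /etc/multipath.conf
  blacklist {
      device {
          vendor  "SUN"
          product "LCSM100_.*"
      }
  }
  # or simply stop the daemon if nothing else needs it
  service multipathd stop
  chkconfig multipathd off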
[Lustre-discuss] Monitoring program io usage
We run a small cluster with a two-node Lustre setup, so it's easy to see when some program thrashes the file system. Not being a programmer, what tools or methods could I use to monitor and log data that would help the developer understand their I/O usage on Lustre?
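A few low-effort places to sample I/O activity without touching the application, assuming Lustre 1.8-style proc parameters:
  # on a client: cumulative read/write counts and bytes since mount
  lctl get_param llite.*.stats
  # on a client: RPC size distribution per OST (useful for spotting lots of tiny I/O)
  lctl get_param osc.*.rpc_stats
  # on an OSS: per-client-export counters, to see which node generates the load
  lctl get_param obdfilter.*.exports.*.stats
  # clear the counters before a test run of the program
  lctl set_param llite.*.stats=clear
For a single suspect program, strace -c -e trace=read,write,open -p <pid> also gives a rough syscall profile.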
Re: [Lustre-discuss] Service thread count parameter
How does one estimate a good number of service threads? I'm not sure I understand the following: 1 thread / 128MB * number of cpus On Wed, Oct 10, 2012 at 9:17 AM, Jean-Francois Le Fillatre jean-francois.lefilla...@clumeq.ca wrote: Hi David, It needs to be specified as a module parameter at boot time, in /etc/modprobe.conf. Check the Lustre tuning page: http://wiki.lustre.org/manual/LustreManual18_HTML/LustreTuning.html http://wiki.lustre.org/manual/LustreManual20_HTML/LustreTuning.html Note that once created, the threads won't be destroyed, so if you want to lower your thread count you'll need to reboot your system. Thanks, JF On Tue, Oct 9, 2012 at 6:00 PM, David Noriega tsk...@my.utsa.edu wrote: Is this a parameter, ost.OSS.ost_io.threads_max, when set via lctl conf_parm will persist between reboots/remounts? ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Jean-François Le Fillâtre Calcul Québec / Université Laval, Québec, Canada jean-francois.lefilla...@clumeq.ca -- David Noriega CSBC/CBI System Administrator University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.114 Phone: 210-458-7100 http://www.cbi.utsa.edu Please remember to acknowledge the RCMI grant , wording should be as stated below:This project was supported by a grant from the National Institute on Minority Health and Health Disparities (G12MD007591) from the National Institutes of Health. Also, remember to register all publications with PubMed Central. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
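The rule of thumb quoted above works out to roughly (RAM in MB / 128) * number of CPU cores; a back-of-the-envelope sketch, with the module option it would feed into (option name per the 1.8 tuning docs, the value 128 is just an example):
  ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
  cpus=$(grep -c ^processor /proc/cpuinfo)
  echo "suggested OSS threads: $(( ram_mb / 128 * cpus ))"
  # then, in /etc/modprobe.conf on the OSS:
  #   options ost oss_num_threads=128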
[Lustre-discuss] Service thread count parameter
Will this parameter, ost.OSS.ost_io.threads_max, persist between reboots/remounts when set via lctl conf_param?
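My understanding is that lctl set_param only lasts until the next remount, while lctl conf_param (run on the MGS) is written to the configuration log and reapplied at mount time. Roughly, with the filesystem name 'lustre' assumed and the exact conf_param prefix possibly differing by version:
  # temporary, on the OSS itself
  lctl set_param ost.OSS.ost_io.threads_max=128
  # persistent, run on the MGS
  lctl conf_param lustre.ost.OSS.ost_io.threads_max=128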
Re: [Lustre-discuss] Lustre missing physical volume
What an adventure this turned into. Turns out when I had to relabel the physical volumes, I got two of them backwards(realized this when I checked /proc/fs/luster/devices) and somehow this was tripping things up. I swapped them back using pvremove and pvcreate, remounted and after a few minutes, the clients reconnected and the system is happy again. On Mon, Jul 2, 2012 at 12:42 AM, David Noriega tsk...@my.utsa.edu wrote: Sorry for the rushed email. For some reason the LVM metadata got screwed up, managed to restore it, though now running into another issue. I've mounted the OSTs yet it seems they are not all cooperating. One of the OSTs will stay listed as Resource Unavailable and this seems to be the main message on the OSS node: LustreError: 137-5: UUID 'lustre-OST0002_UUID' is not available for connect (no target) LustreError: Skipped 470 previous similar messages LustreError: 5214:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) req@8103ffc73400 x1404513746630678/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1341207057 ref 1 fl Interpret:/0/0 rc -19/0 LustreError: 5214:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 470 previous similar messages I've tried remounting this ost on the other data node but still won't connect from the client side. I've even rebooted the mds and still no go. I've run e2fsck to check the OSTs and no issues and the disk arrays report no problems on their end and fibre connections are good and the multipath driver doesnt report anything(These are Sun disk arrays so using the rdac driver instead of the basic multpath daemon). On the client side I'll see this: Lustre: 3289:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1404591888147958 sent from lustre-OST0002-osc-8104104ad800 to NID 192.168.5.101@tcp 0s ago has failed due to network error (30s prior to deadline). req@81015113b400 x1404591888147958/t0 o8-lustre-OST0002_UUID@192.168.5.101@tcp:28/4 lens 368/584 e 0 to 1 dl 1341187631 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3290:0:(import.c:517:import_select_connection()) lustre-OST0002-osc-8104104ad800: tried all connections, increasing latency to 22s Lustre: 3290:0:(import.c:517:import_select_connection()) Skipped 39 previous similar messages On Sun, Jul 1, 2012 at 8:10 PM, Mark Day mark@rsp.com.au wrote: Does the device show up in /dev ? Have you physically checked for Fibre/SAS connectivity, RAID controller errors etc? You may need to supply more information about your setup. It sounds more like a RAID/disk issue than a Lustre issue. From: David Noriega tsk...@my.utsa.edu To: lustre-discuss@lists.lustre.org Sent: Monday, 2 July, 2012 8:51:18 AM Subject: [Lustre-discuss] Lustre missing physical volume Just recently used heartbeat to failover resources so that I could power down a lustre node to add more ram and failed back to do the same to our second lustre node. Only then do I find that now our lustre install is missing a physical volume out of lvm. pvscan only shows three out of four partitions. Any hints? I've tried some recovery steps in lvm with pvcreate using the archived config for the missing pv but no luck, says no device with such uuid. I'm lost on what to do now. 
This is lustre 1.8.4 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- David Noriega CSBC/CBI System Administrator University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu -- David Noriega CSBC/CBI System Administrator University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
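For the archives, the usual recipe for recreating a lost PV label from LVM's archived metadata looks roughly like the following; the VG name, archive file and device path are placeholders:
  # find the UUID the missing PV used to have
  grep -A5 'physical_volumes' /etc/lvm/archive/myvg_00042-*.vg
  # recreate the PV label with that UUID, using the archived metadata
  pvcreate --uuid <old-pv-uuid> --restorefile /etc/lvm/archive/myvg_00042-*.vg /dev/mapper/mpath3
  # restore the VG metadata and reactivate
  vgcfgrestore -f /etc/lvm/archive/myvg_00042-*.vg myvg
  vgchange -ay myvg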
[Lustre-discuss] Lustre missing physical volume
I just used Heartbeat to fail over resources so that I could power down a Lustre node to add more RAM, and then failed back to do the same to our second Lustre node. Only then did I find that our Lustre install is missing a physical volume out of LVM; pvscan only shows three of the four partitions. Any hints? I've tried some recovery steps in LVM with pvcreate using the archived config for the missing PV, but no luck: it says there is no device with such a UUID. I'm lost on what to do now. This is Lustre 1.8.4.
Re: [Lustre-discuss] Lustre missing physical volume
Sorry for the rushed email. For some reason the LVM metadata got screwed up, managed to restore it, though now running into another issue. I've mounted the OSTs yet it seems they are not all cooperating. One of the OSTs will stay listed as Resource Unavailable and this seems to be the main message on the OSS node: LustreError: 137-5: UUID 'lustre-OST0002_UUID' is not available for connect (no target) LustreError: Skipped 470 previous similar messages LustreError: 5214:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) req@8103ffc73400 x1404513746630678/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1341207057 ref 1 fl Interpret:/0/0 rc -19/0 LustreError: 5214:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 470 previous similar messages I've tried remounting this ost on the other data node but still won't connect from the client side. I've even rebooted the mds and still no go. I've run e2fsck to check the OSTs and no issues and the disk arrays report no problems on their end and fibre connections are good and the multipath driver doesnt report anything(These are Sun disk arrays so using the rdac driver instead of the basic multpath daemon). On the client side I'll see this: Lustre: 3289:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1404591888147958 sent from lustre-OST0002-osc-8104104ad800 to NID 192.168.5.101@tcp 0s ago has failed due to network error (30s prior to deadline). req@81015113b400 x1404591888147958/t0 o8-lustre-OST0002_UUID@192.168.5.101@tcp:28/4 lens 368/584 e 0 to 1 dl 1341187631 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3290:0:(import.c:517:import_select_connection()) lustre-OST0002-osc-8104104ad800: tried all connections, increasing latency to 22s Lustre: 3290:0:(import.c:517:import_select_connection()) Skipped 39 previous similar messages On Sun, Jul 1, 2012 at 8:10 PM, Mark Day mark@rsp.com.au wrote: Does the device show up in /dev ? Have you physically checked for Fibre/SAS connectivity, RAID controller errors etc? You may need to supply more information about your setup. It sounds more like a RAID/disk issue than a Lustre issue. From: David Noriega tsk...@my.utsa.edu To: lustre-discuss@lists.lustre.org Sent: Monday, 2 July, 2012 8:51:18 AM Subject: [Lustre-discuss] Lustre missing physical volume Just recently used heartbeat to failover resources so that I could power down a lustre node to add more ram and failed back to do the same to our second lustre node. Only then do I find that now our lustre install is missing a physical volume out of lvm. pvscan only shows three out of four partitions. Any hints? I've tried some recovery steps in lvm with pvcreate using the archived config for the missing pv but no luck, says no device with such uuid. I'm lost on what to do now. This is lustre 1.8.4 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- David Noriega CSBC/CBI System Administrator University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Client kernel panic
I've seen this happen every once in a while on nodes in our cluster. Since they crash hard, I'm unable to get much in the way of logs, and this is all I can see via remote console from their ILOM: Code: 8b 17 85 d2 74 73 8b 47 28 85 c0 74 f6 05 d1 58 d1 ff 01 RIP [88781ce1] :lustre:ll_intent_drop_lock+0x11/0xb0 RSP 810c3608d388 Kernel panic - not syncing: Fatal exception I don't see anything on the OSS or metadata nodes except for the "I think it's dead, I'm evicting it" message. -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu
[Lustre-discuss] Lustre Read Tuning
On our system we typically have more reading than writing going on, and I was wondering which parameters are best to tune. I have set lnet.debug to 0 and have increased max_rpcs_in_flight as well as max_dirty_mb. I left lru_size dynamic, as setting it didn't seem to have any effect. -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu
Re: [Lustre-discuss] Thread might be hung, Heavy IO Load messages
We have two OSSs, each with two quad core AMD Opterons and 8GB of ram and two OSTs each(4.4T and 3.5T). Backend storage is a pair of Sun StorageTek 2540 connected with 8Gb fiber. What about tweaking max_dirty_mb on the client side? On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz ctho...@ddn.com wrote: David, The oss service threads is a function of your RAM size and CPUs. It's difficult to say what would be a good upper limit without knowing the size of your OSS, # clients, storage back-end and workload. But the good thing you can give a try on the fly via lctl set_param command. Assuming you are running lustre 1.8, here is a good explanation on how to do it: http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_ 87260 Some remarks: - reducing the number of OSS threads may impact the performance depending on how is your workload. - unfortunately I guess you will need to try and see what happens. I would go for 128 and analyze the behavior of your OSSs (via log files) and also keeping an eye on your workload. Seems to me that 300 is a bit too high (but again, I don't know what you have on your storage back-end or OSS configuration). I can't tell you much about the lru_size, but as far as I understand the values are dynamic and there's not much to do rather than clear the last recently used queue or disable the lru sizing. I can't help much on this other than pointing you out the explanation for it (see 31.2.11): http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html Regards, Carlos -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 2:11 PM, David Noriega tsk...@my.utsa.edu wrote: zone_reclaim_mode is 0 on all clients/servers When changing number of service threads or the lru_size, can these be done on the fly or do they require a reboot of either client or server? For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started give about 300(300, 359) so I'm thinking try half of that and see how it goes? Also checking lru_size, I get different numbers from the clients. cat /proc/fs/lustre/ldlm/namespaces/*/lru_size Client: MDT0 OST0 OST1 OST2 OST3 MGC head node: 0 22 22 22 22 400 (only a few users logged in) busy node: 1 501 504 503 505 400 (Fully loaded with jobs) samba/nfs server: 4 440070 44370 44348 26282 1600 So my understanding is the lru_size is set to auto by default thus the varying values, but setting it manually is effectively setting a max value? Also what does it mean to have a lower value(especially in the case of the samba/nfs server)? On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor tay...@hpc.ufl.edu wrote: You may also want to check and, if necessary, limit the lru_size on your clients. I believe there are guidelines in the ops manual. We have ~750 clients and limit ours to 600 per OST. That, combined with the setting zone_reclaim_mode=0 should make a big difference. Regards, Charlie Taylor UF HPC Center On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote: Hi David, You may be facing the same issue discussed on previous threads, which is the issue regarding the zone_reclaim_mode. Take a look on the previous thread where myself and Kevin replied to Vijesh Ek. 
If you don't have access to the previous emails, look at your kernel settings for the zone reclaim: cat /proc/sys/vm/zone_reclaim_mode It should be set to 0. Also, look at the number of Lustre OSS service threads. It may be set to high... Rgds. Carlos. -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 11:57 AM, David Noriega tsk...@my.utsa.edu wrote: indicates the system was overloaded (too many service threads, or ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Charles A. Taylor, Ph.D. Associate Director, UF HPC Center (352) 392-4036 -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One
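Both knobs can be poked on the fly with lctl set_param (no reboot needed, though as noted in the thread, OSS threads that have already started stick around until a reboot); the values below are just the ones under discussion, not recommendations:
  # on each OSS: cap the ost_io service threads
  lctl set_param ost.OSS.ost_io.threads_max=128
  # on each client: pin the lock LRU per namespace instead of dynamic sizing
  lctl set_param ldlm.namespaces.*.lru_size=600
  # setting it back to 0 should re-enable dynamic LRU sizing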
Re: [Lustre-discuss] Thread might be hung, Heavy IO Load messages
On a side note, what about increasing the MDS service threads? Checking that, its running at its max of 128. On Thu, Feb 2, 2012 at 9:54 AM, David Noriega tsk...@my.utsa.edu wrote: We have two OSSs, each with two quad core AMD Opterons and 8GB of ram and two OSTs each(4.4T and 3.5T). Backend storage is a pair of Sun StorageTek 2540 connected with 8Gb fiber. What about tweaking max_dirty_mb on the client side? On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz ctho...@ddn.com wrote: David, The oss service threads is a function of your RAM size and CPUs. It's difficult to say what would be a good upper limit without knowing the size of your OSS, # clients, storage back-end and workload. But the good thing you can give a try on the fly via lctl set_param command. Assuming you are running lustre 1.8, here is a good explanation on how to do it: http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_ 87260 Some remarks: - reducing the number of OSS threads may impact the performance depending on how is your workload. - unfortunately I guess you will need to try and see what happens. I would go for 128 and analyze the behavior of your OSSs (via log files) and also keeping an eye on your workload. Seems to me that 300 is a bit too high (but again, I don't know what you have on your storage back-end or OSS configuration). I can't tell you much about the lru_size, but as far as I understand the values are dynamic and there's not much to do rather than clear the last recently used queue or disable the lru sizing. I can't help much on this other than pointing you out the explanation for it (see 31.2.11): http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html Regards, Carlos -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 2:11 PM, David Noriega tsk...@my.utsa.edu wrote: zone_reclaim_mode is 0 on all clients/servers When changing number of service threads or the lru_size, can these be done on the fly or do they require a reboot of either client or server? For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started give about 300(300, 359) so I'm thinking try half of that and see how it goes? Also checking lru_size, I get different numbers from the clients. cat /proc/fs/lustre/ldlm/namespaces/*/lru_size Client: MDT0 OST0 OST1 OST2 OST3 MGC head node: 0 22 22 22 22 400 (only a few users logged in) busy node: 1 501 504 503 505 400 (Fully loaded with jobs) samba/nfs server: 4 440070 44370 44348 26282 1600 So my understanding is the lru_size is set to auto by default thus the varying values, but setting it manually is effectively setting a max value? Also what does it mean to have a lower value(especially in the case of the samba/nfs server)? On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor tay...@hpc.ufl.edu wrote: You may also want to check and, if necessary, limit the lru_size on your clients. I believe there are guidelines in the ops manual. We have ~750 clients and limit ours to 600 per OST. That, combined with the setting zone_reclaim_mode=0 should make a big difference. Regards, Charlie Taylor UF HPC Center On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote: Hi David, You may be facing the same issue discussed on previous threads, which is the issue regarding the zone_reclaim_mode. 
Take a look on the previous thread where myself and Kevin replied to Vijesh Ek. If you don't have access to the previous emails, look at your kernel settings for the zone reclaim: cat /proc/sys/vm/zone_reclaim_mode It should be set to 0. Also, look at the number of Lustre OSS service threads. It may be set to high... Rgds. Carlos. -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 11:57 AM, David Noriega tsk...@my.utsa.edu wrote: indicates the system was overloaded (too many service threads, or ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Charles A. Taylor, Ph.D. Associate Director, UF HPC Center (352) 392-4036 -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http
Re: [Lustre-discuss] Thread might be hung, Heavy IO Load messages
I found this thread Luster clients getting evicted as I've also seen the ost_connect operation failed with -16 message and there they recommend increasing the timeout, though that was for 1.6 and as I've read 1.8 has a different timeout system. Reading that, would increasing at_min(currently 0) or at_max(currently 600) be best? On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger adil...@whamcloud.com wrote: On 2012-02-02, at 8:54 AM, David Noriega wrote: We have two OSSs, each with two quad core AMD Opterons and 8GB of ram and two OSTs each(4.4T and 3.5T). Backend storage is a pair of Sun StorageTek 2540 connected with 8Gb fiber. Running 32-64 threads per OST is the optimum number, based on previous experience. What about tweaking max_dirty_mb on the client side? Probably unrelated. On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz ctho...@ddn.com wrote: David, The oss service threads is a function of your RAM size and CPUs. It's difficult to say what would be a good upper limit without knowing the size of your OSS, # clients, storage back-end and workload. But the good thing you can give a try on the fly via lctl set_param command. Assuming you are running lustre 1.8, here is a good explanation on how to do it: http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_ 87260 Some remarks: - reducing the number of OSS threads may impact the performance depending on how is your workload. - unfortunately I guess you will need to try and see what happens. I would go for 128 and analyze the behavior of your OSSs (via log files) and also keeping an eye on your workload. Seems to me that 300 is a bit too high (but again, I don't know what you have on your storage back-end or OSS configuration). I can't tell you much about the lru_size, but as far as I understand the values are dynamic and there's not much to do rather than clear the last recently used queue or disable the lru sizing. I can't help much on this other than pointing you out the explanation for it (see 31.2.11): http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html Regards, Carlos -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 2:11 PM, David Noriega tsk...@my.utsa.edu wrote: zone_reclaim_mode is 0 on all clients/servers When changing number of service threads or the lru_size, can these be done on the fly or do they require a reboot of either client or server? For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started give about 300(300, 359) so I'm thinking try half of that and see how it goes? Also checking lru_size, I get different numbers from the clients. cat /proc/fs/lustre/ldlm/namespaces/*/lru_size Client: MDT0 OST0 OST1 OST2 OST3 MGC head node: 0 22 22 22 22 400 (only a few users logged in) busy node: 1 501 504 503 505 400 (Fully loaded with jobs) samba/nfs server: 4 440070 44370 44348 26282 1600 So my understanding is the lru_size is set to auto by default thus the varying values, but setting it manually is effectively setting a max value? Also what does it mean to have a lower value(especially in the case of the samba/nfs server)? On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor tay...@hpc.ufl.edu wrote: You may also want to check and, if necessary, limit the lru_size on your clients. I believe there are guidelines in the ops manual. 
We have ~750 clients and limit ours to 600 per OST. That, combined with the setting zone_reclaim_mode=0 should make a big difference. Regards, Charlie Taylor UF HPC Center On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote: Hi David, You may be facing the same issue discussed on previous threads, which is the issue regarding the zone_reclaim_mode. Take a look on the previous thread where myself and Kevin replied to Vijesh Ek. If you don't have access to the previous emails, look at your kernel settings for the zone reclaim: cat /proc/sys/vm/zone_reclaim_mode It should be set to 0. Also, look at the number of Lustre OSS service threads. It may be set to high... Rgds. Carlos. -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 11:57 AM, David Noriega tsk...@my.utsa.edu wrote: indicates the system was overloaded (too many service threads, or ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Charles A. Taylor, Ph.D. Associate
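With adaptive timeouts (the default in 1.8), at_min is the floor and at_max the ceiling on the estimated service time, so raising at_min is the usual first step when evictions happen under bursty load. A sketch of checking and changing them; the persistent form assumes the fsname 'lustre' and the exact conf_param syntax may differ by version:
  cat /proc/sys/lustre/at_min /proc/sys/lustre/at_max   # current values
  lctl set_param at_min=40                              # temporary, per node
  lctl conf_param lustre.sys.at_min=40                  # persistent, run on the MGS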
Re: [Lustre-discuss] Thread might be hung, Heavy IO Load messages
zone_reclaim_mode is 0 on all clients/servers When changing number of service threads or the lru_size, can these be done on the fly or do they require a reboot of either client or server? For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started give about 300(300, 359) so I'm thinking try half of that and see how it goes? Also checking lru_size, I get different numbers from the clients. cat /proc/fs/lustre/ldlm/namespaces/*/lru_size Client: MDT0 OST0 OST1 OST2 OST3 MGC head node: 0 22 22 22 22 400 (only a few users logged in) busy node: 1 501 504 503 505 400 (Fully loaded with jobs) samba/nfs server: 4 440070 44370 44348 26282 1600 So my understanding is the lru_size is set to auto by default thus the varying values, but setting it manually is effectively setting a max value? Also what does it mean to have a lower value(especially in the case of the samba/nfs server)? On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor tay...@hpc.ufl.edu wrote: You may also want to check and, if necessary, limit the lru_size on your clients. I believe there are guidelines in the ops manual. We have ~750 clients and limit ours to 600 per OST. That, combined with the setting zone_reclaim_mode=0 should make a big difference. Regards, Charlie Taylor UF HPC Center On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote: Hi David, You may be facing the same issue discussed on previous threads, which is the issue regarding the zone_reclaim_mode. Take a look on the previous thread where myself and Kevin replied to Vijesh Ek. If you don't have access to the previous emails, look at your kernel settings for the zone reclaim: cat /proc/sys/vm/zone_reclaim_mode It should be set to 0. Also, look at the number of Lustre OSS service threads. It may be set to high... Rgds. Carlos. -- Carlos Thomaz | HPC Systems Architect Mobile: +1 (303) 519-0578 ctho...@ddn.com | Skype ID: carlosthomaz DataDirect Networks, Inc. 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless http://twitter.com/ddn_limitless | 1.800.TERABYTE On 2/1/12 11:57 AM, David Noriega tsk...@my.utsa.edu wrote: indicates the system was overloaded (too many service threads, or ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Charles A. Taylor, Ph.D. Associate Director, UF HPC Center (352) 392-4036 -- David Noriega System Administrator Computational Biology Initiative High Performance Computing Center University of Texas at San Antonio One UTSA Circle San Antonio, TX 78249 Office: BSE 3.112 Phone: 210-458-7100 http://www.cbi.utsa.edu ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre error with nfs?
I get these errors, any ideas? Running Lustre 1.8.4. This client is also the server where we nfs export the filesystem. LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110 LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107 LustreError: 28410:0:(dir.c:384:ll_readdir_18()) error reading dir 579577179/4015460576 page 0: rc -110 LustreError: Skipped 12 previous similar messages Lustre: lustre-MDT-mdc-810338e81400: Connection to service lustre-MDT via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete. LustreError: 167-0: This client was evicted by lustre-MDT; in progress operations using this service will fail. LustreError: 25118:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@8101f87d8c00 x1383759180968916/t0 o35-lustre-MDT_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -108 LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108 LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) inode 579577179 mdc close failed: rc = -108 LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message Lustre: lustre-MDT-mdc-810338e81400: Connection restored to service lustre-MDT using nid 192.168.5.104@tcp. nfsd: non-standard errno: -43 nfsd: non-standard errno: -43 LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110 LustreError: 4994:0:(dir.c:384:ll_readdir_18()) Skipped 29 previous similar messages LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107 Lustre: lustre-MDT-mdc-810338e81400: Connection to service lustre-MDT via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete. LustreError: 167-0: This client was evicted by lustre-MDT; in progress operations using this service will fail. LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@8102a576c000 x1383759180969003/t0 o37-lustre-MDT_UUID@192.168.5.104@tcp:23/10 lens 408/600 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 34 previous similar messages nfsd: non-standard errno: -108 nfsd: non-standard errno: -4 nfsd: non-standard errno: -4 nfsd: non-standard errno: -108 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -4 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108 LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 4 previous similar messages LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 558497795 LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 3 previous similar messages nfsd: non-standard errno: -108 Lustre: lustre-MDT-mdc-810338e81400: Connection restored to service lustre-MDT using nid 192.168.5.104@tcp. LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. 
The mds_close operation failed with -116 LustreError: Skipped 1 previous similar message LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) inode 558497794 mdc close failed: rc = -116 LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) Skipped 4 previous similar messages LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116 -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre error with nfs?
Overloaded on the client or mds? All the lustre nodes use nic bonding, so I suppose since we have alot of io traffic on this client, should bump up the number of nics in use? On Thu, Oct 27, 2011 at 3:28 PM, Colin Faber colin_fa...@xyratex.com wrote: Hi, Just quickly looking at the log you've posted, it looks like you're timing out with overloaded network. -cf On 10/27/2011 10:08 AM, David Noriega wrote: I get these errors, any ideas? Running Lustre 1.8.4. This client is also the server where we nfs export the filesystem. LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110 LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107 LustreError: 28410:0:(dir.c:384:ll_readdir_18()) error reading dir 579577179/4015460576 page 0: rc -110 LustreError: Skipped 12 previous similar messages Lustre: lustre-MDT-mdc-810338e81400: Connection to service lustre-MDT via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete. LustreError: 167-0: This client was evicted by lustre-MDT; in progress operations using this service will fail. LustreError: 25118:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@8101f87d8c00 x1383759180968916/t0 o35-lustre-MDT_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -108 LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108 LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) inode 579577179 mdc close failed: rc = -108 LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message Lustre: lustre-MDT-mdc-810338e81400: Connection restored to service lustre-MDT using nid 192.168.5.104@tcp. nfsd: non-standard errno: -43 nfsd: non-standard errno: -43 LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110 LustreError: 4994:0:(dir.c:384:ll_readdir_18()) Skipped 29 previous similar messages LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107 Lustre: lustre-MDT-mdc-810338e81400: Connection to service lustre-MDT via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete. LustreError: 167-0: This client was evicted by lustre-MDT; in progress operations using this service will fail. 
LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@8102a576c000 x1383759180969003/t0 o37-lustre-MDT_UUID@192.168.5.104@tcp:23/10 lens 408/600 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 34 previous similar messages nfsd: non-standard errno: -108 nfsd: non-standard errno: -4 nfsd: non-standard errno: -4 nfsd: non-standard errno: -108 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -4 LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108 LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 4 previous similar messages LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 558497795 LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 3 previous similar messages nfsd: non-standard errno: -108 Lustre: lustre-MDT-mdc-810338e81400: Connection restored to service lustre-MDT using nid 192.168.5.104@tcp. LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116 LustreError: Skipped 1 previous similar message LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) inode 558497794 mdc close failed: rc = -116 LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) Skipped 4 previous similar messages LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116 __ This email may contain privileged or confidential information, which should only be used for the purpose for which it was sent by Xyratex. No further rights or licenses are granted to use such information. If you are not the intended recipient of this message, please notify the sender by return and delete it. You may not use, copy, disclose or rely on the information contained in it. Internet email is susceptible to data corruption, interception and unauthorised amendment for which Xyratex does not accept liability. While we have taken reasonable
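If the bonded link on the NFS/Samba gateway really is saturating, adding slaves to the bond (and making sure LNET rides on the bond interface) is the straightforward fix. A rough EL5-style sketch; interface names and bonding mode are assumptions, and the switch must support the chosen mode:
  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=802.3ad miimon=100
  options lnet networks=tcp0(bond0)
  # /etc/sysconfig/network-scripts/ifcfg-eth2  (one file per additional slave)
  DEVICE=eth2
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none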
[Lustre-discuss] Upgrade from 1.8.6 to 2.1?
How easy would it be to upgrade from 1.8.6 to 2.1? Would simply dropping in the new packages be enough? Would it require downtime of the whole system? Also, could I move the servers to 2.1 while still keeping the clients at 1.8.6? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
Re: [Lustre-discuss] Client unable to connect after reboot: Unable to process log 108
I think I'll add the lctl ping to a start up script as a workaround, but any ideas why this is happening? On Mon, Aug 29, 2011 at 10:26 AM, David Noriega tsk...@my.utsa.edu wrote: I've begun to notice this behavor in my clients. Not sure whats going on, but when a client reboots, its unable to mount lustre. I have to use 'lctrl ping' to ping any of the lustre nodes before I'm able to mount the lustre filesystem. Any ideas? Lustre: OBD class driver, http://www.lustre.org/ Lustre: Lustre Version: 1.8.4 Lustre: Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4 Lustre: Added LNI 192.168.1.2@tcp [8/256/0/180] Lustre: Accept secure, port 988 Lustre: Lustre Client File System; http://www.lustre.org/ Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855041 sent from MGC192.168.5.104@tcp to NID 192.168.5.104@tcp 5s ago has timed out (5s prior to deadline). req@81032d28dc00 x1378464080855041/t0 o250-MGS@MGC192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1314605796 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855043 sent from MGC192.168.5.104@tcp to NID 192.168.5.105@tcp 5s ago has timed out (5s prior to deadline). req@81033f410c00 x1378464080855043/t0 o250-MGS@MGC192.168.5.104@tcp_1:26/25 lens 368/584 e 0 to 1 dl 1314605821 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 3839:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81032d28d800 x1378464080855044/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. LustreError: 3839:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Lustre: client 81033887dc00 umount complete LustreError: 3839:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) Installing knfsd (copyright (C) 1996 o...@monad.swb.de). NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period FS-Cache: Loaded Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855045 sent from MGC192.168.5.104@tcp to NID 192.168.5.104@tcp 0s ago has failed due to network error (5s prior to deadline). req@810324d67400 x1378464080855045/t0 o250-MGS@MGC192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1314605832 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855047 sent from MGC192.168.5.104@tcp to NID 192.168.5.105@tcp 0s ago has failed due to network error (5s prior to deadline). req@810330d9c800 x1378464080855047/t0 o250-MGS@MGC192.168.5.104@tcp_1:26/25 lens 368/584 e 0 to 1 dl 1314605857 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 5178:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@810324d67000 x1378464080855048/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. 
LustreError: 5178:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Lustre: client 81032f4a3400 umount complete LustreError: 5178:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
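A minimal sketch of that workaround, e.g. appended to /etc/rc.local so LNET is up and the MGS has been reached before the mount is attempted (the NIDs are from the log above, the mount point /lustre is a guess, and note the utility is lctl, not lctrl):
  modprobe lustre
  lctl network up
  lctl ping 192.168.5.104@tcp
  lctl ping 192.168.5.105@tcp
  mount -t lustre 192.168.5.104@tcp:192.168.5.105@tcp:/lustre /lustre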
[Lustre-discuss] Client unable to connect after reboot: Unable to process log 108
I've begun to notice this behavor in my clients. Not sure whats going on, but when a client reboots, its unable to mount lustre. I have to use 'lctrl ping' to ping any of the lustre nodes before I'm able to mount the lustre filesystem. Any ideas? Lustre: OBD class driver, http://www.lustre.org/ Lustre: Lustre Version: 1.8.4 Lustre: Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4 Lustre: Added LNI 192.168.1.2@tcp [8/256/0/180] Lustre: Accept secure, port 988 Lustre: Lustre Client File System; http://www.lustre.org/ Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855041 sent from MGC192.168.5.104@tcp to NID 192.168.5.104@tcp 5s ago has timed out (5s prior to deadline). req@81032d28dc00 x1378464080855041/t0 o250-MGS@MGC192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1314605796 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855043 sent from MGC192.168.5.104@tcp to NID 192.168.5.105@tcp 5s ago has timed out (5s prior to deadline). req@81033f410c00 x1378464080855043/t0 o250-MGS@MGC192.168.5.104@tcp_1:26/25 lens 368/584 e 0 to 1 dl 1314605821 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 3839:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81032d28d800 x1378464080855044/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. LustreError: 3839:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Lustre: client 81033887dc00 umount complete LustreError: 3839:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) Installing knfsd (copyright (C) 1996 o...@monad.swb.de). NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period FS-Cache: Loaded Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855045 sent from MGC192.168.5.104@tcp to NID 192.168.5.104@tcp 0s ago has failed due to network error (5s prior to deadline). req@810324d67400 x1378464080855045/t0 o250-MGS@MGC192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1314605832 ref 1 fl Rpc:N/0/0 rc 0/0 Lustre: 3977:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1378464080855047 sent from MGC192.168.5.104@tcp to NID 192.168.5.105@tcp 0s ago has failed due to network error (5s prior to deadline). req@810330d9c800 x1378464080855047/t0 o250-MGS@MGC192.168.5.104@tcp_1:26/25 lens 368/584 e 0 to 1 dl 1314605857 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 5178:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@810324d67000 x1378464080855048/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. LustreError: 5178:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Lustre: client 81032f4a3400 umount complete LustreError: 5178:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! 
You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] multipathd or sun rdac driver?
We already use multipathd in our install, but this was something I wondered about. We use Sun disk arrays, and they mention using their RDAC driver for multipathing on Linux. Since it's from the vendor, one would think it would be better. What does the collective think? Sun StorageTek RDAC Multipath Failover Driver for Linux http://download.oracle.com/docs/cd/E19373-01/820-4738-13/chapsing.html David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
Re: [Lustre-discuss] multipathd or sun rdac driver?
They are 2540 and I'm running EL5(centos). Well the thought came around since I had to rebuild a node after a hardware problem. So I went ahead and gave it a shot. I think I posted about this problem before somewhere in the mailing list about getting stray I/O errors which were for /dev/sdX devices that were the other path to the same device(Well thats the idea we came to). Well after installing the Sun RDAC module and disabling multipathd, I can happily say those messages are gone, so I suppose Sun's module is able to talk to the disk array in a better manner then multipathd. Though I haven't failed back the lustre ost's to this particular node just yet(will wait till the weekend). I'll post again if anything goes wrong, but I think going with this RDAC module might be better. ps: One thing that has nagged me since Lustre was installed and setup by a vendor, was the disk arrays were never setup with initiators or hosts in the configuration(Using CAM). We have another similar disk array(6140) we setup for another filesystem and I know initiators/hosts were setup on the array. I can't say that this has caused any problems, but its something in the back of my mind. Thanks, David On Wed, Jul 20, 2011 at 4:15 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: We already use multipathd in our install already, but this was something I wondered about. We use Sun disk arrays and they mention the use of their RDAC driver to multipathing on Linux. Since its from the vendor, one would think it be better. What does the collective think? Sun StorageTek RDAC Multipath Failover Driver for Linux http://download.oracle.com/docs/cd/E19373-01/820-4738-13/chapsing.html David I assume you are using the ST25xx or ST6xxx storage with Lustre? Exactly which arrays? I've been happy with RDAC, but I don't think Oracle has released RHEL6 support yet (but Oracle also does not support Lustre servers on RHEL6 yet). If your multupath config is working (ie, you've tested it by unplugging/replugging cables under load and were happy with the behavior), I'm not going to tell you to change. Kevin -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Client doesn't mount at boot
Just installed a new node on the cluster, imaged just like the rest, but it was unable to mount lustre on boot. I tried to mount but got the following from dmesg: Lustre: OBD class driver, http://www.lustre.org/ Lustre: Lustre Version: 1.8.4 Lustre: Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4 Lustre: Added LNI 192.168.255.194@tcp [8/256/0/180] Lustre: Accept secure, port 988 Lustre: Lustre Client File System; http://www.lustre.org/ Lustre: 4872:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1373071042674689 sent from MGC192.168.5.104@tcp to NID 192.168.5.104@tcp 5s ago has timed out (5s prior to deadline). req@811070397800 x1373071042674689/t0 o250-MGS@MGC192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1309462593 ref 1 fl Rpc:N/0/0 rc 0/0 eth0: no IPv6 routers present Lustre: 4872:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1373071042674691 sent from MGC192.168.5.104@tcp to NID 192.168.5.105@tcp 5s ago has timed out (5s prior to deadline). req@81107dc57000 x1373071042674691/t0 o250-MGS@MGC192.168.5.104@tcp_1:26/25 lens 368/584 e 0 to 1 dl 1309462618 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 4735:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81107039b800 x1373071042674692/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. LustreError: 4735:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Lustre: client 81106881fc00 umount complete LustreError: 4735:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) and from /var/log/messages: Jun 30 14:52:18 compute-6-3 kernel: LustreError: 4395:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81106f017c00 x1373072007364612/t0 o501-MGS@MGC192.168.5.104@tcp_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 Jun 30 14:52:18 compute-6-3 kernel: LustreError: 15c-8: MGC192.168.5.104@tcp: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. Jun 30 14:52:18 compute-6-3 kernel: LustreError: 4395:0:(llite_lib.c:1086:ll_fill_super()) Unable to process log: -108 Jun 30 14:52:18 compute-6-3 kernel: LustreError: 4395:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-108) Only after I ran lctl ping x.x.x.x to the MDS/MGS was I able to manually mount lustre. I got the idea to run lctl ping from a post from someone with the same problem but over infinaband, we are using ethernet here. David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
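One thing that sometimes avoids this race is letting the init scripts mount the filesystem only after the network is fully up, via _netdev in /etc/fstab; a hedged sketch, with the mount point a guess and the NIDs taken from the log:
  192.168.5.104@tcp:192.168.5.105@tcp:/lustre  /lustre  lustre  defaults,_netdev  0 0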
[Lustre-discuss] ZFS question: HW raid5 vs raidz?
I was checking out zfsonlinux.org to see how things have been going lately, and I had a question. What's the difference, or which is better: hardware RAID5 (or 6), or using ZFS to create a raidz pool? In terms of Lustre, is one preferred over the other? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
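For comparison, the ZFS side of that question looks roughly like the sketch below (device names are made up). Note that ZFS-backed Lustre OSTs only arrive with the later 2.x releases, not 1.8, so on 1.8 the practical choice is hardware RAID under ldiskfs:
  # double-parity software RAID across six disks, checksummed end to end by ZFS
  zpool create -o ashift=12 ostpool raidz2 sdb sdc sdd sde sdf sdg
  zpool status ostpool
  # with a ZFS-capable Lustre release the pool could then back an OST, roughly:
  #   mkfs.lustre --ost --backfstype=zfs --fsname=lustre --mgsnode=<mgs-nid> ostpool/ost0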
Re: [Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
We are running lustre 1.8.4 and I can confirm that I see this message on one of our clients, the 'file server.' It serves up the lustre fs to machines outside our network via samba and nfs. On other clients(nodes in our compute cluster), I see the same message on a few times, though it says -19 or in one case -107 as the error number. Though just as they reported, we've had a few users say they have gotten a message saying the filesystem is full, even though its not. On Fri, Apr 29, 2011 at 10:04 AM, Rajendra prasad rajendra...@gmail.com wrote: Hi All, I am running lustre servers on 1.8.5 (recently upgraded from 1.8.2). Clients are still on 1.8.2 . I am getting the error ost_write operation failed with -28 in the clients. Due to this i am getting error message as No space left on the device oftenly. As per lfs df -h output all the OSTs are occupied around 55% only. lfs df -h UUID bytes Used Available Use% Mounted on lustre-MDT_UUID 52.3G 4.2G 48.1G 8% /opt/lustre[MDT:0] lustre-OST_UUID 442.9G 245.6G 197.3G 55% /opt/lustre[OST:0] lustre-OST0001_UUID 442.9G 238.7G 204.3G 53% /opt/lustre[OST:1] lustre-OST0002_UUID 442.9G 243.2G 199.7G 54% /opt/lustre[OST:2] lustre-OST0003_UUID 442.9G 236.5G 206.5G 53% /opt/lustre[OST:3] lustre-OST0004_UUID 442.9G 234.8G 208.1G 53% /opt/lustre[OST:4] lustre-OST0005_UUID 442.9G 239.7G 203.3G 54% /opt/lustre[OST:5] lustre-OST0006_UUID 442.9G 237.2G 205.7G 53% /opt/lustre[OST:6] lustre-OST0007_UUID 442.9G 227.9G 215.0G 51% /opt/lustre[OST:7] filesystem summary: 3.5T 1.9T 1.6T 53% /opt/lustre As per the below bugzilla, i have upgraded one of the lustre client verstion to 1.8.5 but still the issue persist in that client. https://bugzilla.lustre.org/show_bug.cgi?id=22755 Lustre clients are on Suse linux 10.1 . In order to install lustre client packages of 1.8.5, i have upgraded the Suse kernel also. I have also checked and found that no quota are enabled in the clients. lfs quota -u 36401 /opt/lustre Disk quotas for user 36401 (uid 36401): Filesystem kbytes quota limit grace files quota limit grace /opt/lustre 127315748 0 0 - 1001083 0 0 - Below are the lustre client packages i have installed. lustre-client-modules-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp lustre-client-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp Suse kernel packages installed: kernel-default-2.6.16.60-0.69.1 kernel-source-2.6.16.60-0.69.1 kernel-smp-2.6.16.60-0.69.1 kernel-syms-2.6.16.60-0.69.1 Error: Apr 29 15:35:55 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x@tcp. The ost_write operation failed with -28 Apr 29 15:35:55 hostname kernel: LustreError: Skipped 9657 previous similar messages Apr 29 15:38:03 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x@tcp. The ost_write operation failed with -28 Kindly suggest. Regards, Prasad ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
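A short checklist, as a sketch, for chasing a -28 (ENOSPC) that appears while lfs df still shows free blocks; the mountpoint is the /opt/lustre from the post above, the parameter names are standard 1.8 client-side ones:
  lfs df -h /opt/lustre                    # per-OST block usage, as already posted
  lfs df -i /opt/lustre                    # per-OST/MDT inode usage; a full inode table also returns -28
  lctl get_param osc.*.cur_grant_bytes     # on a failing client: per-OST space grants, a grant of 0 can stall writes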
Re: [Lustre-discuss] aacraid kernel panic caused failover
It is adaptec based, just branded by sun and built by intel. Anyways I reseated the card and will wait and see. If it still goes wonky, is there a card anyone recommends? It has to be a low profile pcie 8x with two x4 sas internal connectors. On Wed, Apr 6, 2011 at 10:38 AM, Thomas Roth t.r...@gsi.de wrote: Provided your card is actually a Adaptec Raid controller (it says Adaptec ASR 5405 on our cards, not Intel or Sun), this is definitely not the problem. We have had a number of broken or aged batteries amongs our 60 or so controller cards, but never any relation with the kernel panic and the controller complaining about its BBU. Cheers, Thomas On 04/06/2011 04:58 PM, David Noriega wrote: Our adaptec raid card is a Sun StorageTek RAID INT card, made by intel of all people. So I installed the raid manager software, which of course doesn't say anything is wrong, but it does come with a monitoring daemon and it printed this message after the last aacraid kernel panic: Sun StorageTek RAID Manager Agent: [203] The battery-backup cache device needs a new battery: controller 1. So could that be the problem? On Wed, Apr 6, 2011 at 7:52 AM, Jeff Johnson jeff.john...@aeoncomputing.com wrote: I have seen similar behavior on these controllers. On dissimilar configs and different aged systems. These happened to be non-Lustre standalone nfs and iscsi target boxes. Went through controller and drive firmware upgrades, low-level fw dumps and analysis from dev engineers. In the end it was never really explained or resolved. It appears that these controllers, like small children, have tantrums and fall apart. A power cycle clears the condition. Not the best controller for an OSS. --Jeff ---mobile signature--- Jeff Johnson - Aeon Computing jeff.john...@aeoncomputing.com On Apr 6, 2011, at 1:05, Thomas Rotht.r...@gsi.de wrote: We have ~ 60 servers with these Adaptec controllers, and found this problem just to happen from time to time. Upgrade of the aacraid module wouldn't help. We had contacts to Adaptec, but they had no clue either. Only good thing is it seems that this adapter panic happens in an instant, halting the machine, but has no prior phase of degradation: the controller doesn't start leaving out every second bit or just writing the '1's and not the '0's or ... - so whatever data has made it to the disks before the crash seems to be quite sensible. Reboot and never buy Adaptec again. Cheers, Thomas On 04/06/2011 07:03 AM, David Noriega wrote: Ok I updated the aacraid driver and the raid firmware, yet I still had the problem happen, so I did more research and applied the following tweaks: 1) Rebuilt mkinitrd with the following options: a) edit /etc/sysconfig/mkinitrid/multipath to contain MULTIPATH=yes b) mkinitrid initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac 2) Added the local hard disk to the multipath black list 3) Edited modprobe.conf to have the following aacraid options: options aacraid firmware_debug=2 startup_timeout=60 #the debug doesn't seem to print anything to dmesg 4) Added pcie_aspm=off to the kernel boot options So things looked good for a while. I did have a problem mounting the lustre partitions but this was my fault in misconfiguring some lnet options I was experimenting with. I fixed that and just as a test, I ran 'modprobe lustre' since I wasn't ready to fail back the partitions just yet(wanted to wait till when activity was the lowest). That was earlier today. 
I was about to fail back tonight, yet when I checked the server again I saw in dmesg the same aacraid problems from before. Is it possible lustre is interfering with aacraid? Its weird since I do have a duplicate machine and its not having any of thise problems. On Fri, Mar 25, 2011 at 9:55 AM, Temple Jasonjtem...@cscs.ch wrote: Adaptec should have the firmware and drivers on their site for your card. If not adaptec, then SOracle will have it available somewhere. The firmware and system drivers usually have a utility that will check the current version and upgrade it for you. Hope this helps (I use different cards, so I can't tell you exactly). -Jason -Original Message- From: David Noriega [mailto:tsk...@my.utsa.edu] Sent: venerdì, 25. marzo 2011 15:47 To: Temple Jason Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover Hmm not sure, whats the best way to find out? On Fri, Mar 25, 2011 at 9:46 AM, Temple Jasonjtem...@cscs.ch wrote: Hi, Are you using the latest firmware? This sort of thing used to happen to me, but with different raid cards. -Jason -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Noriega Sent: venerdì, 25. marzo 2011 15:38 To: lustre-discuss@lists.lustre.org Subject: [Lustre-discuss] aacraid kernel panic caused failover
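For reference, the initrd/module steps from this thread written out with the usual RHEL 5 spellings, as a sketch (the kernel version string is the one quoted above; 'mkinitrid' in the original is presumably mkinitrd):
  # /etc/modprobe.conf
  options aacraid firmware_debug=2 startup_timeout=60
  # rebuild the initrd so the RDAC device handler is preloaded
  mkinitrd -f --preload=scsi_dh_rdac /boot/initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img 2.6.18-194.3.1.el5_lustre.1.8.4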
Re: [Lustre-discuss] LNET routing question
What about this example? http://comments.gmane.org/gmane.comp.file-systems.lustre.user/6687 Also to my second question, would these changes have to be done all at once? or could I edit one modprobe.conf at a time and fail over then back as I make changes to each oss/mds? Thanks David On Tue, Apr 5, 2011 at 11:52 AM, Cliff White cli...@whamcloud.com wrote: Lustre routing is to connect different types of network. If all your networks are TCP, you should be able to use standard TCP routing/addressing without needing Lustre routers. Again, if the Linux workstations in your lab are TCP, you should be able to create a TCP route to the Lustre servers without needing a Lustre router in the middle, unless you have some barrier, and you need a lustre router to cross that barrier. Generally, people do not use routers as clients, there is nothing stopping your from doing this, but a) the router will take resources away from the clients, impacting performance of both. b) again,clients are typically endpoints, and routers sit in the middle, so from a network design perspective it's usually silly. Also, lustre routers function as a pool, failed routers are bypassed, so a pool of dedicated routers can tolerate individual machine outages. That's not good for clients. But the main reason people do dedicated boxes for Lustre routing, is that Lustre routing is designed to bridge different network hardware. If all your nets are TCP, I think using standard networking methods will be better for you, simpler and easier to maintain. cliffw On Mon, Apr 4, 2011 at 6:50 PM, David Noriega tsk...@my.utsa.edu wrote: The file server does sit on both networks, internal and external. I would just like to have a thrid option beyond nfs/samba, such as making the linux workstations up in our lab, lustre clients. But you are saying either 1) I do some sort of regular tcp routing? or 2) an existing client cannot also work as a router? On Mon, Apr 4, 2011 at 3:43 PM, Cliff White cli...@whamcloud.com wrote: On Mon, Apr 4, 2011 at 1:32 PM, David Noriega tsk...@my.utsa.edu wrote: Reading up on LNET routing and have a question. Currently have nothing special going on, simply specified tcp0(bond0) on the OSSs and MDS. Same for all the clients as well, we have an internal network for our cluster, 192.168.x.x. How would I go about doing the following? Data1,Data2 = OSS, Meta1,Meta2 = MDS. Internally its 192.168.1.x for cluster nodes, 192.168.5.x for lustre nodes. But I would like a 1) a 'forwarding' sever, which would be our file server which exports lustre via samba/nfs to also be the outside world's access point to lustre(outside world being the rest of the campus). 2) a second internal network simply connecting the OSSs and MDS to the backup client to do backups outside of the cluster network. Slightly confused am I. 1) is just a samba/nfs exporter, while you might have two networks in the one box, you wouldn't be doing any routing, the Lustre client is re-exporting the FS. The Lustre client has to find the Lustre servers, the samba/NFS clients only have to find the Lustre client. 2) if the second internal net connects backup clients directly to OSS/MDS you again need no routing. Lustre Routing is really to connect disparte network hardware for Lustre traffic, for example Infiniband routed to TCP/IP, or Quadratics to IB. Also, file servers are never routers, since they have direct connections to all clients. Routers are dedicated nodes that have both hardware interfaces and sit between a client and server. 
Typical setup are things like a cluster with server and clients on IB, you wish to add a second client pool on TCP/IP, you have to build nodes that have both TCP/IP and IB interfaces, and those are Lustre Routers. Since all your traffic is TCP/IP, sounds like normal TCP/IP network manipulation is all you are needing. You would need the 'lnet networks' stuff to align nets with interfaces, and that part looks correct. cliffw So would I do the following? OSS/MDS options lnet networks=tcp0(bond0),tcp1(eth3) routes=tcp2 192.168.2.1 Backup client options lnet networks=tcp1(eth1) Cluster clients options lnet networks=tcp0(eth0) File Server options lnet networks=tcp0(eth1),tcp2(eth2) forwarding=enabled And for any outside clients I would do the following? options lnet networks=tcp2(eth0) And when mounting from the outside I would use in /etc/fstab the external ip? x.x.x.x@tcp2:/lustre /lustre lustre defaults,_netdev 0 0 Is this how it would work? Also can I do this piece-meal or does it have to be done all at once? Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector
Re: [Lustre-discuss] LNET routing question
Well I would call our setup a barrier case. The internal 192.168.x.x network is completely internal to the cluster, inaccessible from the outside. So following this I can setup a router machine to allow access from the external network to lustre, correct? On part two, i would simply be adding tcp1 to the oss/mds, tcp0 which everything already connects to would still be there. So it was my guess and looks like you agree that so long as the clients continue to use tcp0, which they will, they will still be able to connect just fine. tcp1 would be just for the backup client. Thanks David On Tue, Apr 5, 2011 at 1:48 PM, Cliff White cli...@whamcloud.com wrote: That's the 'barrier' case i was talking about - using routers to separate public/private networks - basically using a Lustre router as a hole through a firewall. Second question - depends on your world. Obviously, machines with mis-matched network configs may not be able to comunicate, so depends on whether you can tolerate some clients not reaching some OST while the changes are rolling through. I would think adding the second net for the backup would be transparent to existing clients, modulo the OST restart needed. cliffw On Tue, Apr 5, 2011 at 11:36 AM, David Noriega tsk...@my.utsa.edu wrote: What about this example? http://comments.gmane.org/gmane.comp.file-systems.lustre.user/6687 Also to my second question, would these changes have to be done all at once? or could I edit one modprobe.conf at a time and fail over then back as I make changes to each oss/mds? Thanks David On Tue, Apr 5, 2011 at 11:52 AM, Cliff White cli...@whamcloud.com wrote: Lustre routing is to connect different types of network. If all your networks are TCP, you should be able to use standard TCP routing/addressing without needing Lustre routers. Again, if the Linux workstations in your lab are TCP, you should be able to create a TCP route to the Lustre servers without needing a Lustre router in the middle, unless you have some barrier, and you need a lustre router to cross that barrier. Generally, people do not use routers as clients, there is nothing stopping your from doing this, but a) the router will take resources away from the clients, impacting performance of both. b) again,clients are typically endpoints, and routers sit in the middle, so from a network design perspective it's usually silly. Also, lustre routers function as a pool, failed routers are bypassed, so a pool of dedicated routers can tolerate individual machine outages. That's not good for clients. But the main reason people do dedicated boxes for Lustre routing, is that Lustre routing is designed to bridge different network hardware. If all your nets are TCP, I think using standard networking methods will be better for you, simpler and easier to maintain. cliffw On Mon, Apr 4, 2011 at 6:50 PM, David Noriega tsk...@my.utsa.edu wrote: The file server does sit on both networks, internal and external. I would just like to have a thrid option beyond nfs/samba, such as making the linux workstations up in our lab, lustre clients. But you are saying either 1) I do some sort of regular tcp routing? or 2) an existing client cannot also work as a router? On Mon, Apr 4, 2011 at 3:43 PM, Cliff White cli...@whamcloud.com wrote: On Mon, Apr 4, 2011 at 1:32 PM, David Noriega tsk...@my.utsa.edu wrote: Reading up on LNET routing and have a question. Currently have nothing special going on, simply specified tcp0(bond0) on the OSSs and MDS. 
Same for all the clients as well, we have an internal network for our cluster, 192.168.x.x. How would I go about doing the following? Data1,Data2 = OSS, Meta1,Meta2 = MDS. Internally its 192.168.1.x for cluster nodes, 192.168.5.x for lustre nodes. But I would like a 1) a 'forwarding' sever, which would be our file server which exports lustre via samba/nfs to also be the outside world's access point to lustre(outside world being the rest of the campus). 2) a second internal network simply connecting the OSSs and MDS to the backup client to do backups outside of the cluster network. Slightly confused am I. 1) is just a samba/nfs exporter, while you might have two networks in the one box, you wouldn't be doing any routing, the Lustre client is re-exporting the FS. The Lustre client has to find the Lustre servers, the samba/NFS clients only have to find the Lustre client. 2) if the second internal net connects backup clients directly to OSS/MDS you again need no routing. Lustre Routing is really to connect disparte network hardware for Lustre traffic, for example Infiniband routed to TCP/IP, or Quadratics to IB. Also, file servers are never routers, since they have direct connections to all clients. Routers are dedicated nodes that have both hardware interfaces
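To make the 'barrier' case concrete, a minimal modprobe.conf sketch for a TCP-only LNET router and the nodes on either side of it; the interface names are illustrative and the router NIDs are placeholders (a full NID such as 192.168.2.1@tcp2):
  # router box (one leg on the cluster net tcp0, one on the campus net tcp2)
  options lnet networks="tcp0(eth1),tcp2(eth2)" forwarding="enabled"
  # campus-side client: reach tcp0 via the router's tcp2 NID
  options lnet networks="tcp2(eth0)" routes="tcp0 <router-nid-on-tcp2>"
  # servers on tcp0 need the reverse route so replies can get back to tcp2
  options lnet networks="tcp0(bond0)" routes="tcp2 <router-nid-on-tcp0>"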
Re: [Lustre-discuss] aacraid kernel panic caused failover
Ok I updated the aacraid driver and the raid firmware, yet I still had the problem happen, so I did more research and applied the following tweaks: 1) Rebuilt mkinitrd with the following options: a) edit /etc/sysconfig/mkinitrid/multipath to contain MULTIPATH=yes b) mkinitrid initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac 2) Added the local hard disk to the multipath black list 3) Edited modprobe.conf to have the following aacraid options: options aacraid firmware_debug=2 startup_timeout=60 #the debug doesn't seem to print anything to dmesg 4) Added pcie_aspm=off to the kernel boot options So things looked good for a while. I did have a problem mounting the lustre partitions but this was my fault in misconfiguring some lnet options I was experimenting with. I fixed that and just as a test, I ran 'modprobe lustre' since I wasn't ready to fail back the partitions just yet(wanted to wait till when activity was the lowest). That was earlier today. I was about to fail back tonight, yet when I checked the server again I saw in dmesg the same aacraid problems from before. Is it possible lustre is interfering with aacraid? Its weird since I do have a duplicate machine and its not having any of thise problems. On Fri, Mar 25, 2011 at 9:55 AM, Temple Jason jtem...@cscs.ch wrote: Adaptec should have the firmware and drivers on their site for your card. If not adaptec, then SOracle will have it available somewhere. The firmware and system drivers usually have a utility that will check the current version and upgrade it for you. Hope this helps (I use different cards, so I can't tell you exactly). -Jason -Original Message- From: David Noriega [mailto:tsk...@my.utsa.edu] Sent: venerdì, 25. marzo 2011 15:47 To: Temple Jason Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover Hmm not sure, whats the best way to find out? On Fri, Mar 25, 2011 at 9:46 AM, Temple Jason jtem...@cscs.ch wrote: Hi, Are you using the latest firmware? This sort of thing used to happen to me, but with different raid cards. -Jason -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Noriega Sent: venerdì, 25. marzo 2011 15:38 To: lustre-discuss@lists.lustre.org Subject: [Lustre-discuss] aacraid kernel panic caused failover Had some crazyness happen to our lustre system. We have two OSSs, both identical sun x4140 servers and on only one of them have I've seen this pop up in the kernel messages and then a kernel panic. The panic seemed to then spread and caused the network to go down and the second OSS to try to failover(or failback?). Anyways 'splitbrain' occurred and I was able to get in and set them straight. I researched this aacraid module messages and so far all I can find says to increase the timeout, but these are old messages and currently they are set to 60. Anyone else have any ideas? aacraid: Host adapter abort request (0,0,0,0) aacraid: Host adapter reset request. SCSI hang ? AAC: Host adapter BLINK LED 0xef AAC0: adapter kernel panic'd ef. -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. 
They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] LNET routing question
Reading up on LNET routing and have a question. Currently have nothing special going on, simply specified tcp0(bond0) on the OSSs and MDS. Same for all the clients as well, we have an internal network for our cluster, 192.168.x.x. How would I go about doing the following? Data1,Data2 = OSS, Meta1,Meta2 = MDS. Internally it's 192.168.1.x for cluster nodes, 192.168.5.x for lustre nodes. But I would like 1) a 'forwarding' server, which would be our file server that exports lustre via samba/nfs, to also be the outside world's access point to lustre (the outside world being the rest of the campus), and 2) a second internal network simply connecting the OSSs and MDS to the backup client, to do backups outside of the cluster network. So would I do the following? OSS/MDS options lnet networks=tcp0(bond0),tcp1(eth3) routes=tcp2 192.168.2.1 Backup client options lnet networks=tcp1(eth1) Cluster clients options lnet networks=tcp0(eth0) File Server options lnet networks=tcp0(eth1),tcp2(eth2) forwarding=enabled And for any outside clients I would do the following? options lnet networks=tcp2(eth0) And when mounting from the outside I would use the external IP in /etc/fstab? x.x.x.x@tcp2:/lustre /lustre lustre defaults,_netdev 0 0 Is this how it would work? Also, can I do this piecemeal or does it have to be done all at once? Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] aacraid kernel panic caused failover
Had some craziness happen to our lustre system. We have two OSSs, both identical Sun x4140 servers, and only on one of them have I seen this pop up in the kernel messages, followed by a kernel panic. The panic seemed to then spread and caused the network to go down and the second OSS to try to fail over (or fail back?). Anyway, 'split-brain' occurred and I was able to get in and set them straight. I researched these aacraid module messages and so far all I can find says to increase the timeout, but those are old reports and the timeout is already set to 60. Anyone else have any ideas? aacraid: Host adapter abort request (0,0,0,0) aacraid: Host adapter reset request. SCSI hang ? AAC: Host adapter BLINK LED 0xef AAC0: adapter kernel panic'd ef. -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Help debugging a client
kernel ver 2.6.18-194.3.1.el5_lustre.1.8.4, downloaded from lustre recompiled. How can I check the stack size and how would I increase it? On Fri, Mar 11, 2011 at 1:17 PM, Michael Barnes michael.bar...@jlab.org wrote: David, What kernel are you running on the file server? I've heard on the list that the stock RedHat kernels are compiled with too small of a stack size option and that running NFS and lustre on the same node will not behave well together. A minimum of a 8k stack size is needed for this configuration. -mb On Mar 11, 2011, at 12:37 PM, David Noriega wrote: We've been running Lustre happily for a few months now, but we have one client that can be troublesome at times and it happens to be the most important client. Its our file server client as it runs NFS and Samba. I'm not sure where to start. I've seen this client disconnect from lustre nodes, but then recover and reconnect. There are hundreds of messages in dmesg about a few inodes. The big problem happened a few weeks ago when this client was booted and never could reconnect. The client and the lustre nodes simply kept saying HELLO to each other. Anyways as of right now this is what I see in dmesg: nfsd: non-standard errno: -108 LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108 LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 2114 previous similar messages LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 561619132 LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 777 previous similar messages LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) inode 18382976 mdc close failed: rc = -108 nfsd: non-standard errno: -108 LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) Skipped 17238 previous similar messages nfsd: non-standard errno: -108 nfsd: non-standard errno: -108 nfsd: non-standard errno: -108 nfsd: non-standard errno: -108 nfsd: non-standard errno: -108 LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81032da81800 x1360479978792199/t0 o35-lustre-MDT_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 19011 previous similar messages nfsd: non-standard errno: -108 LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116 LustreError: 520:0:(file.c:116:ll_close_inode_openhandle()) inode 12094041 mdc close failed: rc = -116 LustreError: 30271:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 560111661 Any ideas? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- +--- | Michael Barnes | | Thomas Jefferson National Accelerator Facility | Scientific Computing Group | 12000 Jefferson Ave. | Newport News, VA 23606 | (757) 269-7634 +--- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. 
-Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
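A small sketch of how the stack-size question can be checked on a RHEL-style build; CONFIG_4KSTACKS only exists on 32-bit kernels (x86_64 already uses 8 KB stacks), and changing it means rebuilding the kernel:
  grep CONFIG_4KSTACKS /boot/config-$(uname -r)   # "=y" means 4 KB stacks; unset or absent means the larger default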
[Lustre-discuss] Setting up quotas after the fact
I've been reading up on setting up quotas and it looks like lustre needs to be shut down for that, as it scans the entire filesystem. The thing is we already have ours up and running, with quite a bit of data on it. So any idea on how to estimate how long it would take to set up quotas on lustre? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Setting up quotas after the fact
Well we are running lustre 1.8.4, so that's great to hear. Thanks On Thu, Mar 10, 2011 at 12:15 PM, Johann Lombardi joh...@whamcloud.com wrote: On Thu, Mar 10, 2011 at 11:51:44AM -0600, David Noriega wrote: I've been reading up on setting up quotas and it looks like lustre needs to be shut down for that as it scans the entire filesystem. The problem is that accounting can be wrong if files/blocks are allocated/freed during the scan. The thing is we already have ours up and running and with quite a bit of data on it. So any idea on how to estimate how long it would take to set up quotas on lustre? quotacheck has been greatly improved in 1.8.2 (see bugzilla ticket 19763 for more information). As an example, quotacheck takes approximately 5min to complete when run against a 3.4TB filesystem (2 OSTs) which is 87% full. Cheers, Johann -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
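For completeness, a sketch of the 1.8-era commands under discussion, with /lustre as the client mountpoint and made-up limits; quotacheck is the step whose runtime Johann estimates above:
  lfs quotacheck -ug /lustre        # build user/group usage tables; run once from a client
  lfs setquota -u someuser -b 0 -B 524288000 -i 0 -I 1000000 /lustre   # hard limits of ~500 GB (KB units) and 1M files
  lfs quota -u someuser /lustre     # report usage against those limits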
[Lustre-discuss] Metadata performance question
If I'm wrong please let me know, but my understanding of how lustre 1.8 works is that metadata is only served from a single host. So should there be a lot of activity, the metadata server becomes a bottleneck. But I've heard that in ver 2.x we'll be able to set up multiple machines for metadata, just like for the OSSs, and that should cut down on the bottleneck when accessing metadata information. -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Setting up quotas
Can I set up quotas after lustre is active? Or does that require taking everything offline? Or could I just run lfs quotaon and then start setting quotas for every user? Will running this command on one client then affect all of them? Or do I have to run it everywhere? And is there a way to notify users or at least the admins via email? Or is it simply something that is returned on the shell? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Strange messages from samba
So then Samba isn't Lustre-aware in the sense it checks and respects quotas? On Tue, Oct 5, 2010 at 7:18 AM, Johann Lombardi johann.lomba...@oracle.com wrote: Hi David, On Mon, Oct 04, 2010 at 12:09:21PM -0500, David Noriega wrote: Moved our samba server to use Lustre as its backend file system and things look like they are working, but I'm seeing the following message repeat over and over [2010/10/04 11:09:40, 0] lib/sysquotas.c:sys_get_quota(421) sys_path_to_bdev() failed for path [.]! [...] Any ideas? A quick google search shows that others - who don't export a lustre fs - get the same error message, so i would recommend to try the samba mailing list instead. That being said, if samba tries to access quota information through standard quotactl(2) calls, it cannot work since Lustre has its own quota administrative interface (see llapi_quotactl(3)). HTH Cheers, Johann -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Strange messages from samba
Moved our samba server to use Lustre as its backend file system and things look like they are working, but I'm seeing the following message repeat over and over [2010/10/04 11:09:40, 0] lib/sysquotas.c:sys_get_quota(421) sys_path_to_bdev() failed for path [.]! [2010/10/04 11:09:40, 0] lib/sysquotas.c:sys_get_quota(421) sys_path_to_bdev() failed for path [.]! [2010/10/04 11:09:45, 0] lib/sysquotas.c:sys_get_quota(421) sys_path_to_bdev() failed for path [.]! [2010/10/04 11:09:45, 0] lib/sysquotas.c:sys_get_quota(421) sys_path_to_bdev() failed for path [.]! Any ideas? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Profiling data
This question isn't really about Lustre, but about file system administration. I was wondering what tools exist, particularly anything free/open source, that can scan for old files and report to the admin or the user that said files are, say, a year old and should be archived or deleted. Also, any tools that can profile file types, for example to check whether someone is keeping their mp3 library on our server. Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
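There is no Lustre-specific reporting tool for this, but the scan itself is a one-liner; a sketch with illustrative path, age and output location (on a large filesystem, run it from a client during a quiet period, since it stats every file):
  find /lustre -type f -mtime +365 -printf '%u %s %p\n' > /tmp/old_files.txt   # owner, size, path of files untouched for ~1 year
  awk '{sum[$1]+=$2} END {for (u in sum) print u, sum[u]}' /tmp/old_files.txt  # rough per-user total of that old data
The same find pass with -name '*.mp3' (or similar patterns) covers the file-type profiling question.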
[Lustre-discuss] Exporting lustre over nfs
I've read you can export lustre via nfs but I'm running into some trouble. I tried nfs3, but when I would check a directory, all the files were labeled red and ls -al showed no username or permissions, just ?. This was on the server: nfsd: non-standard errno: -43 LustreError: 11-0: an error occurred while communicating with 192.168.5@tcp. The mds_getxattr operation failed with -43 nfsd: non-standard errno: -43 LustreError: 11-0: an error occurred while communicating with 192.168.5@tcp. The mds_getxattr operation failed with -43 nfsd: non-standard errno: -43 So then I tried out nfs4, and trying to navigate or ls into the nfs mount would hang and I would only get the mds_getxattr error. Something I'm doing wrong? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
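As a sketch of the NFSv3 side (the subnet and fsid number are placeholders): re-exporting a Lustre client mount generally needs an explicit fsid in /etc/exports, since nfsd cannot derive a stable one from a network filesystem:
  # /etc/exports on the lustre client acting as the NFS server
  /lustre   192.168.1.0/24(rw,async,no_root_squash,fsid=101,no_subtree_check)
  exportfs -ra    # re-read the export table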
Re: [Lustre-discuss] Samba and file locking
No, we will only have a single samba server sharing out lustre-backed files. What do you mean in a way similar to samba? What does samba do that is different? We are using lustre to replace our old nfs server for serving up home directories in our cluster and the rest of our systems. On Fri, Aug 27, 2010 at 6:15 PM, Oleg Drokin oleg.dro...@oracle.com wrote: Hello! On Aug 27, 2010, at 6:41 PM, David Noriega wrote: But I also found out about the flock option for lustre. Should I set flock on all clients? or can I just use localflock option on the fileserver? It depends. If you are 100% sure none of your other clients use flocks in a way similar to samba to guard their file accesses AND you don't export (same fs with) samba from more than one node, you can mount with localflock on samba-exporting node. Otherwise you need to mount with flock, but please be aware that flock is not exactly cheap in lustre, every flock operation is a synchronous RPC plus it puts even more load on MDS and some applications start to use flock once they see it as available resulting in possible unexpected slowdowns (MPI apps in some IO modes without lustre ADIO driver tend to do this, I think) Bye, Oleg -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Samba and file locking
Well the samba server will be just for that, but we only have the single filesystem '/lustre' So because of that I'm going to have to put the flock option on all of the clients? this was my original question. On Mon, Aug 30, 2010 at 10:52 AM, Mark Hahn h...@mcmaster.ca wrote: No, we will only have a single samba server sharing out lustre-backed files. What do you mean in a way similar to samba? What does samba do that is different? We are using lustre to replace our old nfs server for serving up home directories in our cluster and the rest of our systems. what he meant is that if lustre is backing a single samba server, and the shared filesystem is only available via samba, you can turn optimize from flock to localflock. that is, since flock is relatively expensive, localflock provides the behavior within a single client, such as the machine running samba. if you have other lustre clients also mounting that filesystem, you'll need flock not localflock to provide consistency. -mark On Fri, Aug 27, 2010 at 6:15 PM, Oleg Drokin oleg.dro...@oracle.com wrote: Hello! On Aug 27, 2010, at 6:41 PM, David Noriega wrote: But I also found out about the flock option for lustre. Should I set flock on all clients? or can I just use localflock option on the fileserver? It depends. If you are 100% sure none of your other clients use flocks in a way similar to samba to guard their file accesses AND you don't export (same fs with) samba from more than one node, you can mount with localflock on samba-exporting node. Otherwise you need to mount with flock, but please be aware that flock is not exactly cheap in lustre, every flock operation is a synchronous RPC plus it puts even more load on MDS and some applications start to use flock once they see it as available resulting in possible unexpected slowdowns (MPI apps in some IO modes without lustre ADIO driver tend to do this, I think) Bye, Oleg -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Samba and file locking
Are there issues with Samba and Lustre working together? I remember something about turning oplocks off in samba, and while testing samba I noticed this: [2010/08/27 17:30:59, 3] lib/util.c:fcntl_getlock(2064) fcntl_getlock: lock request failed at offset 75694080 count 65536 type 1 (Function not implemented) But I also found out about the flock option for lustre. Should I set flock on all clients? Or can I just use the localflock option on the fileserver? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
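A minimal sketch of the two variants being weighed, using the MGS NID and mountpoint that appear elsewhere in this archive; localflock keeps flock semantics local to the one samba node, while flock makes them coherent (and more expensive) across every client:
  # file server only, if nothing else relies on cross-client flocks
  mount -t lustre -o localflock 192.168.5.104@tcp0:/lustre /lustre
  # every client, if cluster-wide flock semantics are required
  mount -t lustre -o flock 192.168.5.104@tcp0:/lustre /lustre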
[Lustre-discuss] LNET internal/external question
OK, our lustre system is up and running, but currently it's hooked into our internal network. How do we go about accessing it from the external network (the university)? It's the basic setup: two OSSs and two MDS/MGS, all set up with failover, and all mount options are currently set using their internal IPs (192.168.x.x). When these machines are given a public IP, do I have to change anything to allow access from external clients (i.e. not from the 192.168.x.x space)? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Configuration question
I'm curious about the underlying framework of lustre with regard to failover. When creating the filesystems, one can provide --failnode=x.x@tcp0, and even for the OSTs you can provide two nids for the MDS/MGS. What do these options tell lustre and the clients? Are they required for use with heartbeat? If so, why doesn't that section of the manual reference this? Also, I think there is a typo in 4.5 Operational Scenarios, where it says one can use 'mkfs.lustre --ost --mgs --fsname='. That of course returns an error. David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] mkfs.lustre and failover question
I've read through the 'More Complicated Configurations' section in the manual and it says as part of setting up failover with two(active/passive) MDS/MGS and two OSSs(active/active) to use the following: mkfs.lustre --fsname=lustre --ost --failnode=192.168.5@tcp0 --mgsnode=192.168.5@tcp0,192.168.5@tcp0 /dev/lustre-ost1-dg1/lv1 Yet it fails when I try to mount: kjournald starting. Commit interval 5 seconds LDISKFS FS on dm-9, internal journal LDISKFS-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds LDISKFS FS on dm-9, internal journal LDISKFS-fs: mounted filesystem with ordered data mode. LDISKFS-fs: file extents enabled LDISKFS-fs: mballoc enabled Lustre: 5984:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1343800163193112 sent from mgc192.168.5@tcp to NID 192.168.5@tcp 5s ago has timed out (5s prior to deadline). r...@810118a53400 x1343800163193112/t0 o250-m...@mgc192.168.5.104@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1282144240 ref 1 fl Rpc:N/0/0 rc 0/0 LustreError: 4854:0:(obd_mount.c:1095:server_start_targets()) Required registration failed for lustre-OST: -4 LustreError: 4854:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -4 LustreError: 4854:0:(obd_mount.c:1436:server_put_super()) no obd lustre-OST LustreError: 4854:0:(obd_mount.c:147:server_deregister_mount()) lustre-OST not registered LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost LDISKFS-fs: mballoc: 0 generated and it took 0 LDISKFS-fs: mballoc: 0 preallocated, 0 discarded Reading that makes me think its looking for 192.168.5.105 to be an active MGS/MDS as well as 192.168.5.104(which is the primary). David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
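For comparison, a sketch of the same format command with the MGS failover pair spelled out (the form later messages in this archive settle on); the --failnode NID assumes this OST normally runs on OSS1 with OSS2 (192.168.5.101) as its partner:
  mkfs.lustre --fsname=lustre --ost --failnode=192.168.5.101@tcp0 \
      --mgsnode=192.168.5.104@tcp0,192.168.5.105@tcp0 /dev/lustre-ost1-dg1/lv1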
[Lustre-discuss] Splitting lustre space
OK, hooray! Lustre is set up with failover on all nodes, but now we have this one huge lustre mount point. How can I, say, create /lustre/home and /lustre/groups and mount them on the client? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Splitting lustre space
Ok, so I could do mount --bind /lustre/home /home mount --bind /lustre/groups /groups Is this a generally accepted practice with Lustre? This just seems so much like a nifty trick, but if its what the community uses, then ok. But ultimately if I wanted two separate filesystems, I would need more hardware? An OST can't be put into a general 'pool' for use between the two? David On Wed, Aug 18, 2010 at 12:33 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: OK hooray! Lustre setup with failover of all nodes, but now we have this huge lustre mount point. How can I say create /lustre/home and /lustre/groups and mount on the client? David Two choices: 1) create two Lustre file systems (separate MDT and OSTs for each) 2) use mount --bind on the client to make one filesystem's directories show up in different places -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
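On the 'general pool' question: separate filesystems do need their own MDT and OSTs, but OST pools (present in 1.8) let a directory's allocations be steered to a named subset of OSTs while capacity and quota stay shared; a sketch with made-up pool and OST names:
  lctl pool_new lustre.home                                # on the MGS
  lctl pool_add lustre.home lustre-OST0000 lustre-OST0001
  lfs setstripe --pool home /lustre/home                   # on a client: new files under /lustre/home allocate from that pool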
Re: [Lustre-discuss] Question on setting up fail-over
Some info: MDS/MGS 192.168.5.104 Passive failover MDS/MGS 192.168.5.105 OSS1 192.168.5.100 OSS2 192.168.5.101 I've got some more questions about setting up failover. Besides having heartbeat setup, what about using tunefs.lustre to set options? On the MDS/MGS I set the following options tunefs.lustre --failnode=192.168.5.105 /dev/lustre-mdt-dg/lv1 Heartbeat works just fine, can mount on the primary node and then failover to the other and back. Now on the OSSs things get a bit more confusing. Reading these two blog posts: http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html http://jermen.posterous.com/lustre-mds-failover From these I tried these options: tunefs.lustre --erase-params --mgsnode=192.168.5@tcp0 --mgsnode=192.168.5@tcp0 --failover=192.168.5@tcp0 -write-params /dev/lustre-ost1-dg1/lv1 I ran that for all for OSTs, changing the failover option on the last two OSTs to point OSS1 while the first two point to OST2. My understanding is that you mount the OSTs first, then the MDS, but the OSTs are failing to mount. Are all these options needed? Or is simply specifying the primary MDS is enough for it to find out about the second MDS? David On Mon, Aug 16, 2010 at 2:14 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: Ok I've gotten heartbeat setup with the two OSSs, but I do have a question that isn't stated in the documentation. Shouldn't the lustre mounts be removed from fstab once they are given to heartbeat since when it comes online, it will mount the resources, correct? David Yes: on the servers, they must be not there or noauto. Once you start running heartbeat, you have given control of the resource away, and must not mount/umount it yourself (unless you stop heartbeat on both nodes in the HA pair to get control back). Kevin -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
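A sketch of what the heartbeat v1 resource definition for this usually looks like, with an assumed node hostname and mountpoint (the device path is the one from this thread); heartbeat's Filesystem resource does the mounting, which is why the entries leave fstab:
  # /etc/ha.d/haresources, identical on both OSS nodes
  data1 Filesystem::/dev/lustre-ost1-dg1/lv1::/mnt/lustre/ost1::lustre
  # the partner node gets an analogous line for the OSTs it normally serves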
Re: [Lustre-discuss] Question on setting up fail-over
Oppps some how I changed the target name of all OSTs to lustre-OST and trying to mount any other ost fails. I've gone and found the 'More Complicated Configuration' section which details the usage of --mgsnode=nid1,nid2 and so using this I think I'll just reformat. On Tue, Aug 17, 2010 at 11:26 AM, David Noriega tsk...@my.utsa.edu wrote: Some info: MDS/MGS 192.168.5.104 Passive failover MDS/MGS 192.168.5.105 OSS1 192.168.5.100 OSS2 192.168.5.101 I've got some more questions about setting up failover. Besides having heartbeat setup, what about using tunefs.lustre to set options? On the MDS/MGS I set the following options tunefs.lustre --failnode=192.168.5.105 /dev/lustre-mdt-dg/lv1 Heartbeat works just fine, can mount on the primary node and then failover to the other and back. Now on the OSSs things get a bit more confusing. Reading these two blog posts: http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html http://jermen.posterous.com/lustre-mds-failover From these I tried these options: tunefs.lustre --erase-params --mgsnode=192.168.5@tcp0 --mgsnode=192.168.5@tcp0 --failover=192.168.5@tcp0 -write-params /dev/lustre-ost1-dg1/lv1 I ran that for all for OSTs, changing the failover option on the last two OSTs to point OSS1 while the first two point to OST2. My understanding is that you mount the OSTs first, then the MDS, but the OSTs are failing to mount. Are all these options needed? Or is simply specifying the primary MDS is enough for it to find out about the second MDS? David On Mon, Aug 16, 2010 at 2:14 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: Ok I've gotten heartbeat setup with the two OSSs, but I do have a question that isn't stated in the documentation. Shouldn't the lustre mounts be removed from fstab once they are given to heartbeat since when it comes online, it will mount the resources, correct? David Yes: on the servers, they must be not there or noauto. Once you start running heartbeat, you have given control of the resource away, and must not mount/umount it yourself (unless you stop heartbeat on both nodes in the HA pair to get control back). Kevin -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Question on setting up fail-over
That is good to know, but already started formatting. No issues as it hasn't been put into production, just playing with it and working kinks like this out. Though formatting the OSTs was rather quick while the MDT is taking some time. Is this normal? 192.168.5.105 is the other(standby) mds node. [r...@meta1 ~]# mkfs.lustre --reformat --fsname=lustre --mgs --mdt --failnode=192.168.5@tcp0 /dev/lustre-mdt-dg/lv1 Permanent disk data: Target: lustre-MDT Index: unassigned Lustre FS: lustre Mount type: ldiskfs Flags: 0x75 (MDT MGS needs_index first_time update ) Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro Parameters: failover.node=192.168.5@tcp mdt.group_upcall=/usr/sbin/l_getgroups device size = 2323456MB 2 6 18 formatting backing filesystem ldiskfs on /dev/lustre-mdt-dg/lv1 target name lustre-MDT 4k blocks 594804736 options-J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups,mmp -F mkfs_cmd = mke2fs -j -b 4096 -L lustre-MDT -J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups,mmp -F /dev/lustre-mdt-dg/lv1 594804736 David On Tue, Aug 17, 2010 at 12:27 PM, Wojciech Turek wj...@cam.ac.uk wrote: Hi David, You need to umount your OSTs and MDTs and run tunefs.lustre --writeconf /dev/lustre device on all Lustre OSTs and MDTs This will force the lustre targets to fetch new configuration next time they are mounted. The order of mounting is: MGT - MDT - OSTs Best regards, Wojciech On 17 August 2010 18:19, David Noriega tsk...@my.utsa.edu wrote: Oppps some how I changed the target name of all OSTs to lustre-OST and trying to mount any other ost fails. I've gone and found the 'More Complicated Configuration' section which details the usage of --mgsnode=nid1,nid2 and so using this I think I'll just reformat. On Tue, Aug 17, 2010 at 11:26 AM, David Noriega tsk...@my.utsa.edu wrote: Some info: MDS/MGS 192.168.5.104 Passive failover MDS/MGS 192.168.5.105 OSS1 192.168.5.100 OSS2 192.168.5.101 I've got some more questions about setting up failover. Besides having heartbeat setup, what about using tunefs.lustre to set options? On the MDS/MGS I set the following options tunefs.lustre --failnode=192.168.5.105 /dev/lustre-mdt-dg/lv1 Heartbeat works just fine, can mount on the primary node and then failover to the other and back. Now on the OSSs things get a bit more confusing. Reading these two blog posts: http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html http://jermen.posterous.com/lustre-mds-failover From these I tried these options: tunefs.lustre --erase-params --mgsnode=192.168.5@tcp0 --mgsnode=192.168.5@tcp0 --failover=192.168.5@tcp0 -write-params /dev/lustre-ost1-dg1/lv1 I ran that for all for OSTs, changing the failover option on the last two OSTs to point OSS1 while the first two point to OST2. My understanding is that you mount the OSTs first, then the MDS, but the OSTs are failing to mount. Are all these options needed? Or is simply specifying the primary MDS is enough for it to find out about the second MDS? David On Mon, Aug 16, 2010 at 2:14 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: Ok I've gotten heartbeat setup with the two OSSs, but I do have a question that isn't stated in the documentation. Shouldn't the lustre mounts be removed from fstab once they are given to heartbeat since when it comes online, it will mount the resources, correct? David Yes: on the servers, they must be not there or noauto. 
Once you start running heartbeat, you have given control of the resource away, and must not mount/umount it yourself (unless you stop heartbeat on both nodes in the HA pair to get control back). Kevin -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Wojciech Turek Senior System Architect High Performance Computing Service University of Cambridge Email: wj...@cam.ac.uk Tel: (+)44 1223 763517 -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked
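Spelled out, the regeneration Wojciech describes looks roughly like this, as a sketch (NIDs from this thread, the OST device path from earlier messages, and the --failnode assumes the OST's partner OSS; note the flag is --writeconf):
  # on every server, with all lustre targets unmounted:
  tunefs.lustre --erase-params --mgsnode=192.168.5.104@tcp0 --mgsnode=192.168.5.105@tcp0 \
      --failnode=192.168.5.101@tcp0 --writeconf /dev/lustre-ost1-dg1/lv1
  # then remount in order: MGS/MDT first, OSTs after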
[Lustre-discuss] needs_recovery flag?
Still very new to Lustre, and now I'm going over the failover part. I used tune2fs to set MMP, but I kept getting a warning about needs_recovery: replay the journal first or the setting will be lost. With dumpe2fs I could see the needs_recovery flag was set on all of the OSTs/MDT. Reading over the recovery section, nothing really matched what was going on here, so I ran e2fsck -fn and then e2fsck -fp on all of the OSTs/MDT; now the needs_recovery flag is gone and I was able to turn MMP on. So my question is, what did I do to cause this sort of thing, and how do I avoid it? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
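For anyone hitting the same warning, a minimal sketch of the check-and-enable sequence described above, run against an unmounted target (the device path is just borrowed from earlier messages in this archive, and enabling mmp needs an mmp-capable e2fsprogs such as the Lustre-patched one):

  # needs_recovery means the ext3 journal still has to be replayed,
  # typically because the target was not cleanly unmounted
  dumpe2fs -h /dev/lustre-ost1-dg1/lv1 | grep -i features

  # e2fsck -fn is read-only and will not clear it; a writable pass replays the journal
  e2fsck -fp /dev/lustre-ost1-dg1/lv1

  # with the flag cleared, multi-mount protection can be enabled
  tune2fs -O mmp /dev/lustre-ost1-dg1/lv1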
Re: [Lustre-discuss] Getting weird disk errors, no apparent impact
We have three Sun StorageTek 2150 arrays: one connected to the metadata server and two cross-connected to the two data storage nodes. They are connected via fibre using the qla2xxx driver that comes with CentOS 5.5. The multipath daemon has the following config:

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
        path_checker            rdac
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Commented out of the multipath.conf file: blacklist { devnode "*" }

On Fri, Aug 13, 2010 at 4:31 AM, Wojciech Turek wj...@cam.ac.uk wrote: Hi David, I have seen similar errors given out by some storage arrays. They were caused by arrays exporting volumes via more than a single path without a multipath driver installed or configured properly. Sometimes the array controllers require a special driver to be installed on the Linux host (for example the RDAC mpp driver) to properly present and handle the configured volumes in the OS. What sort of disk RAID array are you using? Best regards, Wojciech

On 12 August 2010 17:58, David Noriega tsk...@my.utsa.edu wrote: We just set up a Lustre system and all looks good, but there is this nagging error that keeps floating about. When I reboot any of the nodes, be it an OSS or MDS, I will get this:

[r...@meta1 ~]# dmesg | grep sdc
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0

This doesn't seem to affect anything; fdisk -l doesn't even report the device. The same thing (though of course with different block devices, sdd, sde, on the OSSs) happens on all the nodes. If I run pvdisplay or lvdisplay, I'll get this: /dev/sdc: read failed after 0 of 4096 at 0: Input/output error Any ideas? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Wojciech Turek Senior System Architect High Performance Computing Service University of Cambridge Email: wj...@cam.ac.uk Tel: (+)44 1223 763517 -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
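Not from the original thread, but to make the fix concrete: on dual-controller arrays of this class a LUN only answers on the controller that currently owns it, so reads sent down the other controller's path come back as I/O errors, and the posted multibus grouping sends I/O down both paths. Besides installing the vendor mpp/RDAC driver, one common alternative is to let multipathd group paths by controller priority so I/O stays on the preferred path. A rough sketch of such a device stanza for that era of device-mapper-multipath (the vendor and product strings are placeholders; check the output of multipath -ll for the real ones):

devices {
        device {
                vendor                  "SUN"
                product                 "StorageTek 2150"    # placeholder; use the string the array reports
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                hardware_handler        "1 rdac"
                path_checker            rdac
                failback                immediate
                no_path_retry           30
        }
}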
[Lustre-discuss] Getting weird disk errors, no apparent impact
We just set up a Lustre system and all looks good, but there is this nagging error that keeps floating about. When I reboot any of the nodes, be it an OSS or MDS, I will get this:

[r...@meta1 ~]# dmesg | grep sdc
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0

This doesn't seem to affect anything; fdisk -l doesn't even report the device. The same thing (though of course with different block devices, sdd, sde, on the OSSs) happens on all the nodes. If I run pvdisplay or lvdisplay, I'll get this: /dev/sdc: read failed after 0 of 4096 at 0: Input/output error Any ideas? David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Question on setting up fail-over
Could you describe this resource fencing in more detail? As regards STONITH, the PDU already has the grubby hands of IT plugged into it, and I doubt they would be happy if I unplugged them. What about the network management port or ILOM? On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: On Aug 9, 2010, at 11:45 AM, David Noriega tsk...@my.utsa.edu wrote: My understanding of setting up fail-over is that you need some control over the power, so that a script can turn off a machine by cutting its power. Is this correct? It is the recommended configuration because it is simple to understand and implement. But the only _hard_ requirement is that both nodes can access the storage. Is there a way to do fail-over without having access to the PDU (power strips)? If you have IPMI support, that can be used for power control, instead of a switched PDU. Depending on the storage, you may be able to do resource fencing of the disks instead of STONITH. Or you can run fast-and-loose, without any way to ensure the dead node is really dead and not accessing storage (at your risk). While Lustre has MMP, it is really more to protect against a mount typo than to guarantee resource fencing. Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Question on setting up fail-over
I think I'll go the IPMI route. So, reading up on STONITH, it's just a script, so all I would need is a script that uses IPMI to tell the server to power off, right? Also, while reading through the Lustre manual, it seems some things are being deleted from the wiki: http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists, and I noticed this too when I found the Lustre quick guide is no longer available. Thanks David On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: Could you describe this resource fencing in more detail? As regards STONITH, the PDU already has the grubby hands of IT plugged into it, and I doubt they would be happy if I unplugged them. What about the network management port or ILOM? Resource fencing is needed to ensure that a node does not take over a resource (i.e., an OST) while the other node is still accessing it (as could happen if the node only partly crashes, where it is not responding to the HA package but still writing to the disk). STONITH is a pretty common way to ensure the other node is dead and can no longer access the resource. If you can't use your switched PDU, then using the ILOM for IPMI-based power control works. The other common way to do resource fencing is to use SCSI reserve commands (if supported by the hardware and the HA package) to ensure exclusive access. Kevin On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: On Aug 9, 2010, at 11:45 AM, David Noriega tsk...@my.utsa.edu wrote: My understanding of setting up fail-over is that you need some control over the power, so that a script can turn off a machine by cutting its power. Is this correct? It is the recommended configuration because it is simple to understand and implement. But the only _hard_ requirement is that both nodes can access the storage. Is there a way to do fail-over without having access to the PDU (power strips)? If you have IPMI support, that can be used for power control, instead of a switched PDU. Depending on the storage, you may be able to do resource fencing of the disks instead of STONITH. Or you can run fast-and-loose, without any way to ensure the dead node is really dead and not accessing storage (at your risk). While Lustre has MMP, it is really more to protect against a mount typo than to guarantee resource fencing. Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
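A minimal sketch of the kind of check being discussed here, using ipmitool against a node's ILOM/BMC. The BMC hostname, credentials, and password file are placeholders, and a real Heartbeat STONITH plugin does considerably more than this; the point of the final status check is exactly Kevin's, namely that the script must not report success unless the node is known to be off:

  #!/bin/sh
  # power off a peer server via its BMC, then verify it really went off
  BMC=oss2-ilom              # hypothetical BMC hostname
  USER=admin                 # hypothetical credentials
  PASSFILE=/etc/ipmi.pass

  ipmitool -I lanplus -H "$BMC" -U "$USER" -f "$PASSFILE" chassis power off || exit 1

  # do not report success unless the chassis actually reports "off"
  sleep 5
  ipmitool -I lanplus -H "$BMC" -U "$USER" -f "$PASSFILE" chassis power status | grep -q "is off"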
Re: [Lustre-discuss] Question on setting up fail-over
Another question. Is it possible to use CentOS/Red Hat's clustering software? The manual mentions using it for metadata failover (since having more than one metadata server online isn't possible right now), so why not use it for all of Lustre? But since the information is missing, can someone fill in the blanks on setting up metadata failover? David On Tue, Aug 10, 2010 at 11:11 AM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: Depends on the HA package you are using. Heartbeat comes with a script that supports IPMI. The important thing is that STONITH NOT succeed if you don't _know_ that the node is off. So it is absolutely not a one-line script. Kevin David Noriega wrote: I think I'll go the IPMI route. So, reading up on STONITH, it's just a script, so all I would need is a script that uses IPMI to tell the server to power off, right? Also, while reading through the Lustre manual, it seems some things are being deleted from the wiki: http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists, and I noticed this too when I found the Lustre quick guide is no longer available. Thanks David On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: David Noriega wrote: Could you describe this resource fencing in more detail? As regards STONITH, the PDU already has the grubby hands of IT plugged into it, and I doubt they would be happy if I unplugged them. What about the network management port or ILOM? Resource fencing is needed to ensure that a node does not take over a resource (i.e., an OST) while the other node is still accessing it (as could happen if the node only partly crashes, where it is not responding to the HA package but still writing to the disk). STONITH is a pretty common way to ensure the other node is dead and can no longer access the resource. If you can't use your switched PDU, then using the ILOM for IPMI-based power control works. The other common way to do resource fencing is to use SCSI reserve commands (if supported by the hardware and the HA package) to ensure exclusive access. Kevin On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren kevin.van.ma...@oracle.com wrote: On Aug 9, 2010, at 11:45 AM, David Noriega tsk...@my.utsa.edu wrote: My understanding of setting up fail-over is that you need some control over the power, so that a script can turn off a machine by cutting its power. Is this correct? It is the recommended configuration because it is simple to understand and implement. But the only _hard_ requirement is that both nodes can access the storage. Is there a way to do fail-over without having access to the PDU (power strips)? If you have IPMI support, that can be used for power control, instead of a switched PDU. Depending on the storage, you may be able to do resource fencing of the disks instead of STONITH. Or you can run fast-and-loose, without any way to ensure the dead node is really dead and not accessing storage (at your risk). While Lustre has MMP, it is really more to protect against a mount typo than to guarantee resource fencing. Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything!
You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
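Since the wiki page mentioned above is gone, here is a rough sketch of what a Heartbeat v1 resource definition for an MDS failover pair can look like, reusing the device path from earlier messages in this archive. The second node name and the mount point are placeholders, and this is an illustration rather than a tested configuration:

  # /etc/ha.d/haresources -- identical on both MDS nodes; meta1 is the preferred owner
  meta1 Filesystem::/dev/lustre-mdt-dg/lv1::/mnt/mdt::lustre

  # /etc/ha.d/ha.cf (fragment)
  keepalive 2
  deadtime 30
  node meta1 meta2
  auto_failback off
  # plus a stonith/stonith_host directive wired to the IPMI/ILOM fencing
  # discussed in the previous messages; its exact arguments depend on the plugin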
[Lustre-discuss] Two file servers question.
We just got our Lustre system online, and as we continue to play with it, I need some help supporting my argument that we should have two file servers: one NFS server to serve users' home directories, and the Lustre file system to provide space for their jobs to run in. My manager's concern is confusing users, which I don't think is entirely valid for anyone using a cluster, but any technical details supporting a two-file-server solution would be helpful. Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
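As an illustration of how little of the split users actually see, a compute node's /etc/fstab in such a layout only needs two extra lines; the hostnames, filesystem name, and mount points below are purely hypothetical:

  nfs-home:/export/home   /home      nfs     defaults,_netdev   0 0
  meta1@tcp0:/lustre      /scratch   lustre  defaults,_netdev   0 0

From a user's point of view, /home and /scratch are just two directories; only the administrators deal with the fact that they live on different servers.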
[Lustre-discuss] Question on setting up fail-over
My understanding of setting up fail-over is that you need some control over the power, so that a script can turn off a machine by cutting its power. Is this correct? Is there a way to do fail-over without having access to the PDU (power strips)? Thanks David -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre and Automount
We are pre-Lustre right now and have some questions. Currently our cluster uses LDAP+automount to mount users' home directories from our file server. Once we go Lustre, is any sort of modification to LDAP or automount (besides the installation of the Lustre client software) needed? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
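Not an answer from the thread, but for a sense of what such a setup can look like: autofs can mount a Lustre client much as it mounts NFS, by passing the MGS NID and filesystem name as the map location with -fstype=lustre. A rough sketch, with purely hypothetical host, filesystem, and mount names, worth testing before relying on it:

  # /etc/auto.master (fragment)
  /lustre   /etc/auto.lustre

  # /etc/auto.lustre -- mounts <mgs-nid>:/<fsname> under /lustre/scratch on demand
  scratch   -fstype=lustre   meta1@tcp0:/lustre

The LDAP side only needs changing if the automount maps themselves are stored in LDAP, in which case the equivalent entry goes there instead of a local file.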
[Lustre-discuss] Monitoring filesystem usage
What tools do you use to keep track of who is using the filesystem and how much of it? Are there any free tools to keep track of old files, temp files, large files, etc.? Basically, how do you keep things running in an orderly fashion and keep users in line, besides adding more space? -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
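For what it's worth, the stock client tools cover the basics; a minimal sketch, where the mount point and username are placeholders and quotas must already be enabled (quotacheck/quotaon) for the per-user numbers to mean anything:

  # per-user usage and quota limits on the Lustre filesystem
  lfs quota -u someuser /lustre

  # overall space per OST and MDT
  lfs df -h /lustre

  # crude sweeps for large or stale files (run from a client; can be slow on big trees)
  find /lustre -type f -size +10G -printf '%s\t%u\t%p\n'
  find /lustre -type f -mtime +180 -printf '%T@\t%u\t%p\n'

Cron these into a report and the worst offenders tend to show up quickly.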
[Lustre-discuss] What do you think of this idea?
My supervisor has this idea, and I would like the input of the Lustre community, as we are still very new to Lustre. We have 7 workstations, and the idea is to put three 2 TB drives into each of them, for a total of 42 TB, and set them up as object storage servers, with another workstation as a metadata server. How feasible is this idea? I know the downfalls (what if a student reboots a machine, etc.), but barring those events, would this setup work? David ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] What do you think of this idea?
I just had another idea, but again, since I know very little about how Lustre works, I'll need some input. What if I take a single workstation and attach two Drobo disk arrays to it via iSCSI? Would it be possible to run both the metadata and the object storage off a single machine? Two maxed-out Drobo arrays give 32 TB of space. It costs more, but would this be better than adding disks to existing workstations where we can't control the environment (i.e., users)? On Wed, May 19, 2010 at 1:01 PM, hungsheng Tsao hungsheng.t...@oracle.com wrote: As a playground it is fine. Even in this type of environment, IMHO, you will need some RAID for the 3x2TB OSTs and mirroring for the MDS. Make sure you have enough memory, CPU power, and network for the MDS and OSTs; these should be dedicated to Lustre, and you need other compute nodes. Regards On 5/19/2010 1:31 PM, David Noriega wrote: My supervisor has this idea, and I would like the input of the Lustre community, as we are still very new to Lustre. We have 7 workstations, and the idea is to put three 2 TB drives into each of them, for a total of 42 TB, and set them up as object storage servers, with another workstation as a metadata server. How feasible is this idea? I know the downfalls (what if a student reboots a machine, etc.), but barring those events, would this setup work? David -- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Hung-Sheng Tsao, Ph.D. | Principal Sales Consultant Higher Education | +1.973.495.0840 Oracle - North America Commercial Hardware 400 Atrium Dr. Somerset, NJ 08873 Email hungsheng.t...@oracle.com -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
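To make the single-machine idea concrete, a rough sketch of how the targets could be formatted and mounted on one box. The hostname, filesystem name, device names for the iSCSI-backed volumes, and mount points are all hypothetical; co-locating the MDT and OSTs on one server works, it just concentrates every failure and performance bottleneck on that one machine:

  # one combined MGS/MDT plus two OSTs, all on the same host ("single1")
  mkfs.lustre --fsname=scratch --mgs --mdt /dev/sdb
  mkfs.lustre --fsname=scratch --ost --mgsnode=single1@tcp0 /dev/sdc
  mkfs.lustre --fsname=scratch --ost --mgsnode=single1@tcp0 /dev/sdd

  mkdir -p /mnt/mdt /mnt/ost0 /mnt/ost1
  mount -t lustre /dev/sdb /mnt/mdt
  mount -t lustre /dev/sdc /mnt/ost0
  mount -t lustre /dev/sdd /mnt/ost1

  # clients then mount the filesystem as:
  #   mount -t lustre single1@tcp0:/scratch /scratch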