Re: [Lustre-discuss] no handle for file close
On Thu, May 07, 2009 at 10:45:31AM -0500, Nirmal Seenu wrote:

> I am getting quite a few errors similar to the following on the MDS server, which is running the latest 1.6.7.1 patched kernel. The clients are running the 1.6.7 patchless client on the 2.6.18-128.1.6.el5 kernel; this cluster has 130 nodes/Lustre clients and uses a GigE network.
>
> May 7 04:13:48 lustre3 kernel: LustreError: 7213:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 772769: cookie 0xcfe66441310829d4 r...@8101ca8a3800 x2681218/t0 o35-fedc91f9-4de7-c789-6bdd-1de1f5e3d...@net_0x2c0a8f109_uuid:0/0 lens 296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc 0/0
> May 7 04:13:48 lustre3 kernel: LustreError: 7213:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116) r...@8101ca8a3800 x2681218/t0 o35-fedc91f9-4de7-c789-6bdd-1de1f5e3d...@net_0x2c0a8f109_uuid:0/0 lens 296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc -116/0
>
> I don't see the same errors on another cluster/Lustre installation with 2000 Lustre clients which uses an Infiniband network.

We see this sometimes when a job that is using a shared library that lives on Lustre is killed - presumably the un-memory-mapping of the .so from a bunch of nodes at once confuses Lustre a bit.

What is your inode 772769? e.g.

    find /some/lustre/fs/ -inum 772769

If the file is a .so then that would be similar to what we are seeing. We have this listed in the "probably harmless" section of the errors that we get from Lustre, so if it's not harmless then we'd very much like to know about it :)

This cluster is IB, rhel5, x86_64, 1.6.6 on servers and patchless 1.6.4.3 on clients w/ 2.6.23.17 kernels.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility

> I looked at the following bugs: 19328, 18946, 18192 and 19085, but I am not sure if any of those bugs apply to this error. I would appreciate it if someone could help me understand these errors and possibly suggest a solution.
>
> TIA
> Nirmal
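A minimal way to run that check from a client, assuming the filesystem is mounted at /mnt/lustre (a placeholder path, not one mentioned by the posters):

    # look up inode 772769 on the client mount and report what kind of file it is
    # (scanning a large filesystem this way can take a while)
    find /mnt/lustre -inum 772769 -exec file {} \;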
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
Thanks for the screen shot Arden. What is the maximum # of slaves you can have on a bonded interface?

On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe albert...@yahoo.com wrote:

> Bond0 knows which interface to utilize because all the other eth0-5 are designated as slaves in their configuration files. The manual is fairly clear on that.
>
> In the screenshot the memory used in the GNOME system monitor is 452.4 MiB of 7.8 GiB, and the sustained bandwidth to the OSS and OST is 404.2 MiB/s, which corresponds roughly to what collectl is showing for KBWrite for Disks. Collectl shows a few different results for Disks, Network and Lustre OST, and I believe it is measuring the other OST on the network at around 170 MiB/s if you view the other screenshot for OST1 or lustrethree.
>
> In the screenshots: Lustreone=MGS, Lustretwo=MDT, Lustrethree=OSS+raid10 target, Lustrefour=OSS+raid10 target.
>
> To help clarify the entire network and the stress testing I did with all the clients I could give it, see www.ioio.ca/Lustre-tcp-bonding/images/html and www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html
>
> Proper benchmarking would be nice though, as I just hit it with everything I could and it lived, so I was happy. I found the manual to be lacking in benchmarking and really wanted to make nice graphs of it all, but failed with iozone to do so for some reason.
>
> I'll be taking a run at upgrading everything to 1.8 in the coming week or so, and when I do I'll grab some new screenshots and post the relevant items to the wiki. Otherwise, if someone else wants to post the existing screenshots, you're welcome to use them as they do detail a ground-up build. Apparently 1.8 is great with small files now, so it should work even better with www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
>
> --- On Sat, 5/9/09, Andreas Dilger adil...@sun.com wrote:
>
> On May 09, 2009 09:18 -0700, Arden Wiebe wrote:
>
> This might help answer some questions: http://ioio.ca/Lustre-tcp-bonding/OST2.png, which shows my mostly untuned OSS and OSTs pulling 400+ MiB/s over TCP bonding provided by the kernel, complete with a cat of the modprobe.conf file. You have the other links I've sent you, but the picture above is relevant to your questions.
>
> Arden, thanks for sharing this info. Any chance you could post it to wiki.lustre.org? It would seem there is one bit of info missing somewhere - how does bond0 know which interfaces to use?
>
> Also, another oddity - the network monitor is showing 450 MiB/s Received, yet the disk is showing only about 170 MiB/s going to the disk. Either something is wacky with the monitoring (e.g. it is counting Received for both the eth* networks AND bond0), or Lustre is doing something very weird and retransmitting the bulk data like crazy (seems unlikely).
>
> --- On Thu, 5/7/09, Michael Ruepp mich...@schwarzfilm.ch wrote:
>
> Hi there, I have configured a simple tcp Lustre 1.8 setup with one mdc (one nic) and two oss (four nics per oss). As in the 1.6 documentation, the multihomed section is a little bit unclear to me.
> I give every NID an IP in the same subnet, e.g. 10.111.20.35-38 for oss0 and 10.111.20.39-42 for oss1. Do I have to make modprobe.conf.local look like this to force Lustre to use all four interfaces in parallel:
>
>     options lnet networks=tcp0(eth0,eth1,eth2,eth3)
>
> because on page 138 the 1.8 manual says: "Note – In the case of TCP-only clients, the first available non-loopback IP interface is used for tcp0 since the interfaces are not specified." Or do I have to specify it like this:
>
>     options lnet networks=tcp
>
> because on page 112 the Lustre 1.6 manual says: "Note – In the case of TCP-only clients, all available IP interfaces are used for tcp0 since the interfaces are not specified. If there is more than one, the IP of the first one found is used to construct the tcp0 ID." Which is the opposite of the 1.8 manual.
>
> My goal is to let Lustre utilize all four Gb links in parallel. My Lustre clients are equipped with two Gb links which should be utilized by the clients as well (eth0, eth1). Or is bonding the better solution in terms of performance?
>
> Thanks very much for input,
> Michael Ruepp
> Schwarzfilm AG
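For reference, the modprobe.conf variants under discussion side by side, plus the bonded alternative; the interface names and the bond0 device are assumptions about this particular setup rather than a recommendation:

    # name the interfaces explicitly for the tcp0 network
    options lnet networks=tcp0(eth0,eth1,eth2,eth3)

    # leave the interfaces unspecified and let LNET choose, per the manual's note
    options lnet networks=tcp

    # if the links are bonded by the kernel, point LNET at the bonded device instead
    options lnet networks=tcp0(bond0)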
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
Mag, you're welcome. From the page referenced first in a search for Linux bonding, it states:

How many bonding devices can I have? There is no limit.

How many slaves can a bonding device have? This is limited only by the number of network interfaces Linux supports and/or the number of network cards you can place in your system.

--- On Sun, 5/10/09, Mag Gam magaw...@gmail.com wrote:

> Thanks for the screen shot Arden. What is the maximum # of slaves you can have on a bonded interface?
>
> [earlier quoted thread trimmed]
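To illustrate the slave designation Arden describes, a sketch of Red Hat-style ifcfg files; the device names and address are placeholders, not his actual configuration:

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.0.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0  (one such file per slave, eth0 through eth5)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none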
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
On May 10, 2009, at 7:12 AM, Arden Wiebe albert...@yahoo.com wrote:

> Mag, you're welcome. From the page referenced first in a search for Linux bonding, it states: How many bonding devices can I have? There is no limit. How many slaves can a bonding device have? This is limited only by the number of network interfaces Linux supports and/or the number of network cards you can place in your system.

In practice, most configurations are limited to the (typical) 4 or 8 maximum supported by the switch you are using.

> [earlier quoted thread trimmed]
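For context, the switch-imposed slave limit mentioned above typically comes from the size of a link aggregation group (802.3ad/LACP) on the switch. A sketch of loading the bonding driver in that mode on a RHEL5-style system; the mode and miimon values are illustrative assumptions, not taken from the posters' setups:

    # /etc/modprobe.conf
    alias bond0 bonding
    options bonding mode=802.3ad miimon=100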
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
Mag Gam wrote:

> Thanks for the screen shot Arden. What is the maximum # of slaves you can have on a bonded interface?
>
> On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe albert...@yahoo.com wrote:
>
> [...] Proper benchmarking would be nice though, as I just hit it with everything I could and it lived, so I was happy. I found the manual to be lacking in benchmarking and really wanted to make nice graphs of it all, but failed with iozone to do so for some reason.

I too have been trying to benchmark a Lustre filesystem, with iozone 3.321. Sometimes it works, and sometimes it hangs.

I turned on debugging and ran a test with 2 clients on each of 40 machines. In the output I get lines like:

    loop: R_STAT_DATA for client 9

For 79 of the clients there are two of these messages in the output, and for one of them only one.

I've had a brief skim of the source code, and I think the problem is that iozone uses UDP packets to communicate. On a heavily loaded network, one of these is bound to get lost. Presumably iozone doesn't have the right retry strategy. The iozone author has suggested using a different network for the timing packets - but I don't think I can justify the time or expense involved in building one purely to do some benchmarking.

Chris

PS On a machine with 2 bonded Gigabit ethernet cards, I found I needed two iozone threads to get the available bandwidth. One iozone thread seemed to get the bandwidth from one card only.
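A sketch of the kind of two-thread iozone run Chris describes in his PS; the mount point, file size, and record size are placeholders, and the flags are standard iozone throughput-mode options rather than his exact command line:

    # two throughput-mode threads running the write/rewrite and read/reread tests,
    # so that both slaves of a two-port bond can carry traffic
    iozone -t 2 -s 4g -r 1m -i 0 -i 1 -F /mnt/lustre/iozone.0 /mnt/lustre/iozone.1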
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
On Sun, 2009-05-10 at 15:07 +0100, Christopher J. Walker wrote:

> I've had a brief skim of the source code, and I think that the problem is that iozone uses UDP packets to communicate. On a heavily loaded network, one of these is bound to get lost. Presumably iozone doesn't have the right retry strategy.

Why not use a benchmark built on an established MPI library (such as MPICH or LAM, which can run their message-passing infrastructure over TCP and be launched via rsh or ssh)? IOR is one such benchmark.

Of course, if your network is really so loaded as to be dropping UDP packets, then that will probably impact the latency of the MPI messages. Not sure if that will have a meaningful impact on IOR or not. I tend to think the messaging is quite low volume, so perhaps not. In any case, it can add another data point to your debugging efforts to help prove or disprove your hypothesis.

b.
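A sketch of the sort of MPI-driven IOR run being suggested; the task count, mount point, and sizes are placeholders, and it assumes IOR has been built against the cluster's MPI:

    # one MPI task per client, POSIX I/O, file-per-process,
    # 4 GiB per task written and read in 1 MiB transfers
    mpirun -np 80 ./IOR -a POSIX -F -b 4g -t 1m -o /mnt/lustre/ior.out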