Re: [Lustre-discuss] no handle for file close

2009-05-10 Thread Robin Humble
On Thu, May 07, 2009 at 10:45:31AM -0500, Nirmal Seenu wrote:
I am getting quite a few errors similar to the following on the MDS server, 
which is running the latest 1.6.7.1 patched kernel. The clients are running 
the 1.6.7 patchless client on the 2.6.18-128.1.6.el5 kernel; this cluster has 
130 nodes/Lustre clients and uses a GigE network.

May  7 04:13:48 lustre3 kernel: LustreError: 
7213:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 772769: 
cookie 0xcfe66441310829d4  r...@8101ca8a3800 x2681218/t0 
o35-fedc91f9-4de7-c789-6bdd-1de1f5e3d...@net_0x2c0a8f109_uuid:0/0 lens 
296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc 0/0

May  7 04:13:48 lustre3 kernel: LustreError: 
7213:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116)  
r...@8101ca8a3800 x2681218/t0 
o35-fedc91f9-4de7-c789-6bdd-1de1f5e3d...@net_0x2c0a8f109_uuid:0/0 lens 
296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc -116/0

I don't see the same errors on another cluster/Lustre installation with 
2000 Lustre clients which uses an InfiniBand network.

we see this sometimes when a job that is using a shared library that
lives on Lustre is killed - presumably unmapping the .so from a bunch
of nodes at once confuses Lustre a bit.

what is your inode 772769?
eg.
   find /some/lustre/fs/ -inum 772769
if the file is a .so then that would be similar to what we are seeing.
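
a purely illustrative example of what a hit would look like (the path
returned below is hypothetical):
   find /some/lustre/fs/ -inum 772769
   /some/lustre/fs/apps/lib/libexample.so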

we have this listed in the "probably harmless" section of the errors we get
from Lustre, so if it's not harmless then we'd very much like to know
about it :)

this cluster is IB, rhel5, x86_64, 1.6.6 on servers and patchless
1.6.4.3 on clients w/ 2.6.23.17 kernels.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility

I looked at the following bugs: 19328, 18946, 18192 and 19085, but I am not 
sure whether any of them apply to this error. I would appreciate it if 
someone could help me understand these errors and possibly suggest a 
solution.

TIA
Nirmal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Mag Gam
Thanks for the screen shot Arden.

What is the maximum # of slaves you can have on a bonded interface?



On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe albert...@yahoo.com wrote:

 Bond0 knows which interfaces to utilize because the other interfaces (eth0-5) 
 are designated as slaves in their configuration files.  The manual is fairly 
 clear on that.
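
A minimal sketch of what those slave configuration files typically look like
on a RHEL-style system; the device names and address below are only examples:

   # /etc/sysconfig/network-scripts/ifcfg-bond0
   DEVICE=bond0
   IPADDR=192.168.1.10
   NETMASK=255.255.255.0
   ONBOOT=yes
   BOOTPROTO=none

   # /etc/sysconfig/network-scripts/ifcfg-eth0   (and likewise for eth1-eth5)
   DEVICE=eth0
   MASTER=bond0
   SLAVE=yes
   ONBOOT=yes
   BOOTPROTO=none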

 In the screenshot the memory used in GNOME System Monitor is at 452.4 MiB of 
 7.8 GiB and the sustained bandwidth to the OSS and OST is 404.2 MiB/s, which 
 corresponds roughly to what collectl is showing for KBWrite for Disks.  
 Collectl shows a few different results for Disks, Network and Lustre OST, and 
 I believe it is measuring the other OST on the network at around 170MiB/s if 
 you view the other screenshot for OST1 (lustrethree).

 In the screenshots: Lustreone=MGS, Lustretwo=MDT, Lustrethree=OSS+RAID10 target, 
 Lustrefour=OSS+RAID10 target.

 To help clarify the entire network: the stress testing I did, with all the 
 clients I could give it, is at www.ioio.ca/Lustre-tcp-bonding/images/html and 
 www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

 Proper benchmarking would be nice, though; I just hit it with everything I 
 could and it lived, so I was happy. I found the manual to be lacking on 
 benchmarking and really wanted to make nice graphs of it all, but for some 
 reason failed to do so with iozone.

 I'll be taking a run at upgrading everything to 1.8 in the coming week or so, 
 and when I do I'll grab some new screenshots and post the relevant items to 
 the wiki.  Otherwise, if someone else wants to post the existing screenshots, 
 you're welcome to use them as they do detail a ground-up build. Apparently 1.8 
 is great with small files now, so it should work even better with 
 www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo


 --- On Sat, 5/9/09, Andreas Dilger adil...@sun.com wrote:

 From: Andreas Dilger adil...@sun.com
 Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
 To: Arden Wiebe albert...@yahoo.com
 Cc: lustre-discuss@lists.lustre.org, Michael Ruepp mich...@schwarzfilm.ch
 Date: Saturday, May 9, 2009, 11:31 AM
 On May 09, 2009  09:18 -0700, Arden Wiebe wrote:
  This might help answer some questions:
  http://ioio.ca/Lustre-tcp-bonding/OST2.png, which shows my mostly untuned
  OSS and OSTs pulling 400+MiB/s over TCP bonding provided by the kernel,
  complete with a cat of the modprobe.conf file.  You have the other links
  I've sent you, but the picture above is relevant to your questions.

 Arden, thanks for sharing this info.  Any chance you could post it to
 wiki.lustre.org?  It would seem there is one bit of info missing somewhere -
 how does bond0 know which interfaces to use?


 Also, another oddity - the network monitor is showing 450MiB/s Received,
 yet the disk is showing only about 170MiB/s going to the disk.  Either
 something is wacky with the monitoring (e.g. it is counting Received for
 both the eth* networks AND bond0), or Lustre is doing something very
 weird and retransmitting the bulk data like crazy (seems unlikely).
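
A quick way to check whether the monitor is double counting (purely
illustrative; bond0 and the individual eth* devices each get their own row
of byte counters here):

   watch -d 'grep -E "bond0|eth" /proc/net/dev'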


  --- On Thu, 5/7/09, Michael Ruepp mich...@schwarzfilm.ch wrote:

   From: Michael Ruepp mich...@schwarzfilm.ch
   Subject: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
   To: lustre-discuss@lists.lustre.org
   Date: Thursday, May 7, 2009, 5:50 AM
   Hi there,

   I have configured a simple tcp lustre 1.8 setup with one mdc (one nic) and
   two oss (four nics per oss).  As with the 1.6 documentation, the multihomed
   section is a little bit unclear to me.

   I give every NID an IP in the same subnet, eg: 10.111.20.35-38 - oss0
   and 10.111.20.39-42 - oss1.

   Do I have to make modprobe.conf.local look like this to force lustre
   to use all four interfaces in parallel:

   options lnet networks=tcp0(eth0,eth1,eth2,eth3)

   Because on Page 138 the 1.8 Manual says:
   Note – In the case of TCP-only clients, the first available non-loopback
   IP interface is used for tcp0 since the interfaces are not specified.

   or do I have to specify it like this:
   options lnet networks=tcp

   Because on Page 112 the lustre 1.6 Manual says:
   Note – In the case of TCP-only clients, all available IP interfaces are
   used for tcp0 since the interfaces are not specified. If there is more
   than one, the IP of the first one found is used to construct the tcp0 ID.

   Which is the opposite of the 1.8 Manual.

   My goal is to let lustre utilize all four Gb links in parallel. And my
   Lustre clients are equipped with two Gb links which should be utilized
   by the lustre clients as well (eth0, eth1).

   Or is bonding the better solution in terms of performance?

   Thanks very much for input,

   Michael Ruepp
   Schwarzfilm AG
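
A rough sketch contrasting the two approaches in the question (the first line
repeats Michael's own example; the bonded device name in the second is only
illustrative, and assumes bond0 has already been set up in the kernel):

   # multi-interface LNET: one tcp network spread over several NICs
   options lnet networks=tcp0(eth0,eth1,eth2,eth3)

   # bonding alternative: hand LNET only the bonded device
   options lnet networks=tcp0(bond0)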
  
  

Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Arden Wiebe

Mag, you're welcome. From the first page returned by a search for Linux 
Bonding, it states:

How many bonding devices can I have?
   There is no limit.

How many slaves can a bonding device have?
   This is limited only by the number of network interfaces Linux supports and/or 
   the number of network cards you can place in your system.

--- On Sun, 5/10/09, Mag Gam magaw...@gmail.com wrote:

 From: Mag Gam magaw...@gmail.com
 Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
 To: Arden Wiebe albert...@yahoo.com
 Cc: Andreas Dilger adil...@sun.com, Michael Ruepp mich...@schwarzfilm.ch, lustre-discuss@lists.lustre.org
 Date: Sunday, May 10, 2009, 5:48 AM
 Thanks for the screen shot Arden.

 What is the maximum # of slaves you can have on a bonded interface?
 
 
 
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Kevin Van Maren


On May 10, 2009, at 7:12 AM, Arden Wiebe albert...@yahoo.com wrote:


 Mag, you're welcome. From the first page returned by a search for 
 Linux Bonding, it states:

 How many bonding devices can I have?

 There is no limit.
 How many slaves can a bonding device have?

 This is limited only by the number of network interfaces Linux  
 supports and/or the number of network cards you can place in your  
 system.


In practice, most configurations are limited to the (typical) 4 or 8  
maximum supported by the switch you are using.
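
For the switch-limited case, the relevant knob is the bonding mode; a rough
sketch (RHEL 5 style /etc/modprobe.conf; the values are only examples, not a
recommendation):

   alias bond0 bonding
   # 802.3ad (LACP) needs a matching link-aggregation group on the switch,
   # and that group is where the typical 4- or 8-port limit comes from
   options bond0 mode=802.3ad miimon=100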



Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Christopher J. Walker
Mag Gam wrote:
 Thanks for the screen shot Arden.
 
 What is the maximum # of slaves you can have on a bonded interface?
 
 
 
 On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe albert...@yahoo.com wrote:

 Proper benchmarking would be nice though as I just hit it with everything I 
 could and it lived so I was happy. I found the manual to be lacking in 
 benchmarking and really wanted to make nice graphs of it all but failed with 
 iozone to do so for some reason.

I too have been trying to benchmark a lustre filesystem with iozone 3.321.

Sometimes it works, and sometimes it hangs.

I turned on debugging, and ran a test with 2 clients on each of 40 
machines. In the output, I get lines like:
  loop: R_STAT_DATA for client 9

For 79 of the 80 clients there are two of these messages in the output; for 
the remaining client there is only one.

I've had a brief skim of the source code, and I think that the problem 
is that iozone uses UDP packets to communicate. On a heavily loaded 
network, one of these is bound to get lost. Presumably iozone doesn't 
have the right retry strategy.

The iozone author has suggested using a different network for the timing 
packets - but I don't think I can justify the time or expense involved 
in building one purely to do some benchmarking.

Chris

PS On a machine with 2 bonded Gigabit ethernet cards, I found I needed 
two iozone threads to get the available bandwidth. One iozone thread 
seemed to get the bandwidth from one card only.
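
For reference, a rough sketch of a distributed iozone run of the shape
described above (thread count, sizes and the client-list file name are only
examples; each line of the list file gives a client hostname, its working
directory on Lustre, and the path to the iozone binary):

   iozone -+m clients.txt -t 80 -s 4g -r 1m -i 0 -i 1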


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Brian J. Murrell
On Sun, 2009-05-10 at 15:07 +0100, Christopher J. Walker wrote:
 
 I've had a brief skim of the source code, and I think that the problem 
 is that iozone uses UDP packets to communicate. On a heavily loaded 
 network, one of these is bound to get lost. Presumably iozone doesn't 
 have the right retry strategy.

Why not use a benchmark that uses an established MPI library (such as MPICH
or LAM, which can run their message passing over TCP and be launched via rsh
or ssh)?  IOR is one such benchmark.
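
A rough sketch of such an IOR run (process count, host file, sizes and the
target path are only examples):

   mpirun -np 16 -machinefile hosts ./IOR -a POSIX -b 1g -t 1m -F -o /mnt/lustre/ior.out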

Of course, if your network is really so loaded as to be dropping UDP
packets then that will probably impact the latency of the MPI messages.
Not sure if that will have a meaningful impact on IOR or not.  I tend to
think the messaging is quite low volume so perhaps not.

In any case, it can add another data point to your debugging efforts to
help prove or disprove your hypothesis.

b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss