Re: [Lustre-discuss] Line rate performance for clients

2011-08-02 Thread Isaac Huang
On Mon, Aug 01, 2011 at 02:52:07PM +0200, Peter Kjellström wrote:
   On 2011-07-29, at 11:33 AM, Brock Palen wrote:
  ..
  Does that make sense?  Is it even right for me to expect that I could
  combine the performance together and expect full speed in and full speed
  out if I can consistently get them independent of each other?

I believe yes. I remember that we once did a test on 1GigE where one
client read from and another wrote to the same server, and observed
about 223MB/s aggregate read/write throughput.

 Can your setup do wirespeed full duplex in the simplest case (never mind with 
 lustre)? I'd try iperf or something similar before investing too much time 
 looking for lost performance in higher layers.

Agreed. And if the 'iperf' results look good, I'd suggest moving on to
LNet selftest, which will tell you whether the Lustre networking stack
is capable of saturating the link in both directions.
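
For the iperf step, a simultaneous two-direction run between the client and
a peer across the same path would look roughly like the sketch below
(hostnames are placeholders; iperf's -d flag runs both directions at once):

# on the remote peer, e.g. the archive host or any box across the same link
iperf -s

# on the Lustre client: 30-second bidirectional TCP test, reporting every 5s
iperf -c remote-host -d -t 30 -i 5

If the two streams together fall well short of ~2x 117MB/s, the problem is
below Lustre.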

Here's a script we once used, with outputs:

[root@sata16 ~]# export LST_SESSION=$$
[root@sata16 ~]# lst new_session --timeout 100 read/write
SESSION: read/write TIMEOUT: 100 FORCE: No
[root@sata16 ~]# lst add_group servers sata14@tcp
sata14@tcp are added to session
[root@sata16 ~]# lst add_group readers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_group writers sata16@tcp
sata16@tcp are added to session
[root@sata16 ~]# lst add_batch bulk_rw
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from readers --to servers brw read size=1M
Test was added successfully
[root@sata16 ~]# lst add_test --batch bulk_rw --concurrency 8 --from writers --to servers brw write size=1M
Test was added successfully
[root@sata16 ~]# lst run bulk_rw
bulk_rw is running now
[root@sata16 ~]# lst stat servers
[LNet Rates of servers]
[R] Avg: 335  RPC/s Min: 335  RPC/s Max: 335  RPC/s
[W] Avg: 446  RPC/s Min: 446  RPC/s Max: 446  RPC/s
[LNet Bandwidth of servers]
[R] Avg: 111.83   MB/s  Min: 111.83   MB/s  Max: 111.83   MB/s
[W] Avg: 111.23   MB/s  Min: 111.23   MB/s  Max: 111.23   MB/s

The script can easily be adapted to run on your system; a parameterized
sketch follows below. Please load the lnet_selftest kernel module on all
test nodes before running it. Lustre itself does not need to be running.
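
Something along these lines (an untested sketch modeled on the transcript
above; replace SERVER and CLIENT with your own NIDs) runs the whole test in
one go:

#!/bin/bash
# LNet selftest sketch: bidirectional 1MB bulk read+write between two nodes
SERVER=sata14@tcp          # node acting as the server side
CLIENT=sata16@tcp          # node acting as the client side
export LST_SESSION=$$
lst new_session --timeout 100 read_write
lst add_group servers $SERVER
lst add_group readers $CLIENT
lst add_group writers $CLIENT
lst add_batch bulk_rw
lst add_test --batch bulk_rw --concurrency 8 --from readers --to servers brw read size=1M
lst add_test --batch bulk_rw --concurrency 8 --from writers --to servers brw write size=1M
lst run bulk_rw
# print server-side LNet rates for 30 seconds, then tear down
lst stat servers & sleep 30; kill $!
lst end_session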

- Isaac



Re: [Lustre-discuss] Line rate performance for clients

2011-08-01 Thread Peter Kjellström
  On 2011-07-29, at 11:33 AM, Brock Palen wrote:
  I think this is a networking question.
  
  We have lustre 1.8 clients with 1gig-e interfaces that according to
  ethtool are running full duplex.
  
  If I do the following:
  
   cp /lustre/largefile.h5 /tmp/
  
  I get 117MB/s
  
   If I then use globus-url-copy to move that file from /tmp/ to the remote
   tape archive I get 117MB/s
  
   If I go directly from /lustre to the archive I get 50MB/s.
...
 It's just when the client is reading from lustre and sending the data out at
 the same time that I only get 50MB/s.
 
 Does that make sense?  Is it even right for me to expect that I could
 combine the performance together and expect full speed in and full speed
 out if I can consistently get them independent of each other?

Can your setup do wirespeed full duplex in the simplest case (never mind with 
lustre)? I'd try iperf or something similar before investing too much time 
looking for lost performance in higher layers.

/Peter




[Lustre-discuss] Line rate performance for clients

2011-07-29 Thread Brock Palen
I think this is a networking question.

We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool are 
running full duplex.

If I do the following:

cp /lustre/largefile.h5 /tmp/

I get 117MB/s

If I then use globus-url-copy to move that file from /tmp/ to the remote tape 
archive I get 117MB/s

If I go directly from /lustre to the archive I get 50MB/s.

this is consistently reproducible.  It doesn't matter if I just copy a large 
file on lustre to lustre, or scp, or globus.  If I try to ingest and outgest 
data I get what looks like half duplex performance.

Anyone have ideas why I cannot do 1Gig-e full duplex?
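
One way to watch what each direction is actually doing while reproducing
this (a sketch, assuming the sysstat package is installed) is to leave a
per-interface rate monitor running in a second terminal:

# receive/transmit rates for every interface, sampled once per second
sar -n DEV 1

Then rerun the single-direction copies and the combined lustre-to-archive
copy, and compare the rx/tx columns for the GigE interface in each case.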

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985





Re: [Lustre-discuss] Line rate performance for clients

2011-07-29 Thread Colin Faber
Try testing it with LNet selftest and see what kind of results you get.

-cf


On 07/29/2011 11:33 AM, Brock Palen wrote:
 I think this is a networking question.

 We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool 
 are running full duplex.

 If I do the following:

 cp /lustre/largefile.h5 /tmp/

 I get 117MB/s

 If I then use globus-url-copy to move that file from /tmp/ to the remote tape 
 archive I get 117MB/s

 If I go directly from /lustre to the archive I get 50MB/s.

 this is consistently reproducible.  It doesn't matter if I just copy a large 
 file on lustre to lustre,  or scp, or globus.  If I try to ingest and outgest 
 data I get what looks like half duplex performance.

 Anyone have ideas why I cannot do 1Gig-e full duplex?

 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 bro...@umich.edu
 (734)936-1985





Re: [Lustre-discuss] Line rate performance for clients

2011-07-29 Thread Andreas Dilger
On 2011-07-29, at 11:33 AM, Brock Palen wrote:
 I think this is a networking question.
 
 We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool 
 are running full duplex.
 
 If I do the following:
 
 cp /lustre/largefile.h5 /tmp/
 
 I get 117MB/s
 
 If I then use globus-url-copy to move that file from /tmp/ to the remote tape 
 archive I get 117MB/s
 
 If I go directly from /lustre to the archive I get 50MB/s.

Strace your globus-url-copy and see what IO size it is using.  cp has long 
ago been modified to use the blocksize reported by stat(2) for copying, and 
Lustre reports a 2MB IO size for striped files (1MB for unstriped).  If your 
globus tool is using e.g. 4kB reads then it will be very inefficient for 
Lustre, but much less so than from /tmp.
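
A quick way to check (a sketch; substitute your usual globus-url-copy
arguments for the trailing "..."):

strace -f -tt -T -e trace=open,read,write -o /tmp/globus.strace globus-url-copy ...
grep 'read(' /tmp/globus.strace | less

The byte counts in the read() calls on the /lustre file descriptor are the
IO size in question. If they do turn out to be small, it may be worth
checking globus-url-copy's block-size/buffer options (see its -help).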

 this is consistently reproducible.  It doesn't matter if I just copy a large 
 file on lustre to lustre,  or scp, or globus.  If I try to ingest and outgest 
 data I get what looks like half duplex performance. 
 
 Anyone have ideas why I cannot do 1Gig-e full duplex?

I don't think this has anything to do with full duplex.  117MB/s is pretty 
much  the maximum line rate for GigE (and pretty good for Lustre, if I do say 
so myself) in one direction.  There is presumably no data moving in the other 
direction at that time.
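
For reference, a rough line-rate calculation (assuming a 1500-byte MTU,
20-byte IP and 20-byte TCP headers, 12 bytes of TCP timestamp options, and
38 bytes of Ethernet preamble/header/FCS/inter-frame gap per frame):

1448 / 1538 * 125 MB/s = ~117.7 MB/s of TCP payload

so 117MB/s is essentially the wire limit in one direction.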

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





Re: [Lustre-discuss] Line rate performance for clients

2011-07-29 Thread Brock Palen


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:

 On 2011-07-29, at 11:33 AM, Brock Palen wrote:
 I think this is a networking question.
 
 We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool 
 are running full duplex.
 
 If I do the following:
 
 cp /lustre/largefile.h5 /tmp/
 
 I get 117MB/s
 
 If I then use globus-url-copy to move that file from /tmp/ to the remote tape 
 archive I get 117MB/s
 
 If I go directly from /lustre to the archive I get 50MB/s.
 
 Strace your globus-url-copy and see what IO size it is using.  cp has long 
 ago been modified to use the blocksize reported by stat(2) for copying, and 
 Lustre reports a 2MB IO size for striped files (1MB for unstriped).  If your 
 globus tool is using e.g. 4kB reads then it will be very inefficient for 
 Lustre, but much less so than from /tmp.
 
 this is consistently reproducible.  It doesn't matter if I just copy a large 
 file on lustre to lustre,  or scp, or globus.  If I try to ingest and 
 outgest data I get what looks like half duplex performance. 
 
 Anyone have ideas why I cannot do 1Gig-e full duplex?
 
 I don't think this has anything to do with full duplex.  117MB/s is pretty 
 much  the maximum line rate for GigE (and pretty good for Lustre, if I do say 
 so myself) in one direction.  There is presumably no data moving in the other 
 direction at that time.

Ah, I guess I wasn't clear: I only get 117MB/s when I do one direction on the
network, e.g. copying from lustre to /tmp (local drive), or sending from /tmp
out using globus.

It's just when the client is reading from lustre and sending the data out at the
same time that I only get 50MB/s.

Does that make sense?  Is it even right for me to expect that I could combine 
the performance together and expect full speed in and full speed out if I can 
consistently get them independent of each other? 

 
 Cheers, Andreas
 --
 Andreas Dilger 
 Principal Engineer
 Whamcloud, Inc.
 
 
 
 
 
