Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Andreas Dilger
On 2010-03-04, at 14:18, Jeffrey Bennett wrote:
 I just noticed that the sequential performance is OK, but the random IO
 (which is what I am measuring) is not. Is there any way to increase
 random IO performance on Lustre? We have LUNs that can provide
 around 250,000 random 4 kB read IOPS, but we are only seeing 3,000 to
 10,000 on Lustre.

There is work currently underway to improve the SMP scaling  
performance for the RPC handling layer in Lustre.  Currently that  
limits the delivered RPC rate to 10-15k/sec or so.

 -Original Message-
 From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
 Sent: Thursday, March 04, 2010 12:49 PM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] One or two OSS, no difference?

 Hello!

   This is pretty strange. Are there any differences in network
 topology that can explain this?
   If you remove the first client, does the second one show performance
 at the level of the first, but as soon as you start the load on
 the first again, does the second client's performance drop?

 Bye,
Oleg
 On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

 Hi Oleg, thanks for your reply

 I was actually testing with only one client. When adding a second  
 client using a different file, one client gets all the performance  
 and the other one gets very low performance, any recommendation?

 Thanks in advance

 jab


 -Original Message-
 From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
 Sent: Wednesday, March 03, 2010 5:20 PM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] One or two OSS, no difference?

 Hello!

 On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:
 We are building a very small Lustre cluster with 32 clients  
 (patchless) and two OSS servers. Each OSS server has 1 OST with 1  
 TB of Solid State Drives. All is connected using dual-port DDR IB.

 For testing purposes, I am enabling/disabling one of the OSS/OST  
 by using the lfs setstripe command. I am running XDD and vdbench  
 benchmarks.
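
For reference, restricting a test directory to a single OST versus striping over both OSTs might look like the following (the mount point and OST indices here are only placeholders):

  lfs setstripe -c 1 -i 0 /mnt/lustre/one_ost    # stripe count 1, starting at OST index 0
  lfs setstripe -c 2 -i 0 /mnt/lustre/two_osts   # stripe count 2, uses both OSTs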

 Does anybody have an idea why there is no difference in MB/sec or  
 random IOPS when using one OSS or two OSS? A quick test with dd  
 also shows the same MB/sec when using one or two OSTs.

 I wonder if you just don't saturate even one OST (both backend SSD
 and IB interconnect) with this number of clients? Does the total
 throughput decrease as you decrease the number of active clients and
 increase as you increase it even further?
 Increasing the maximum number of in-flight RPCs might help in that case.
 Also, are all of your clients writing to the same file, or does each
 client do IO to a separate file (I hope)?

 Bye,
   Oleg

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] problems restoring from MDT backup (test file system)

2010-03-05 Thread Andreas Dilger
On 2010-03-04, at 05:46, Frederik Ferner wrote:
 Brian J. Murrell wrote:
 On Thu, 2010-03-04 at 11:21 +, Frederik Ferner wrote:
 tar tizf test_MDT_Backup.tar.gz
 <snip>
 ./ROOT/tmp/frederik/cs04r-sc-com02-04/
 ./ROOT/tmp/frederik/cs04r-sc-com02-04/iozone.DUMMY.47
 tar: Unexpected EOF in archive
 tar: Error is not recoverable: exiting now
 </snip>

 Looks to me like either your tar executable is broken or your  
 archive is
 broken.  A typical process of elimination should help you discover  
 which
 is the case.

 It certainly looks like it's the tar archive that is broken. I get the
 same when I copy it over to a different machine. Unless it is the tar
 executable that is broken so that it creates broken archives, since every
 time I create a new archive it seems to be broken at the same place.

 Other tar files created on the same machine don't have that problem,  
 but
 I'll try creating a new archive with a new executable.


Make sure you use --sparse so that tar isn't mistakenly creating  
huge archives full of zeroes.
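
For example, creating and then listing a sparse-aware archive might look like this (the archive name and source path are just placeholders for your MDT backup):

  tar czSf test_MDT_Backup.tar.gz ./ROOT     # -S/--sparse stores sparse files efficiently
  tar tizf test_MDT_Backup.tar.gz            # list the contents to verify the archive is readable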

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Extremely high load and hanging processes on a Lustre client

2010-03-05 Thread Götz Waschk
Hi everyone,

I have a critical problem on one of my Lustre client machines running
Scientific Linux 5.4 and the patchless Lustre 1.8.2 client. After a
few days of usage, some processes like cp and kswapd0 start to use
100% CPU. Only 180k of swap space is in use, though.

Processes that try to access Lustre use a lot of CPU and seem to hang.

There is some output in the kernel log I'll attach to this mail.

Do you have any idea what to test before rebooting the machine?

Regards, Götz Waschk

-- 
AL I:40: Do what thou wilt shall be the whole of the Law.


kernel-log.txt.bz2
Description: BZip2 compressed data


kernel-lustre-log.txt.bz2
Description: BZip2 compressed data
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Problem with flock and perl on Lustre FS

2010-03-05 Thread Jagga Soorma
Hi Guys,

How does Lustre handle locking?  One of our users is complaining that a Perl
module (Storable) has trouble with its lock_nstore method when it tries to
use flock.  The following is how they are reproducing this issue:

--
 perl -d -e ''

Loading DB routines from perl5db.pl version 1.3
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

Debugged program terminated.  Use q to quit or R to restart,
 use o inhibit_exit to avoid stopping after program termination,
 h q, h R or h o to get additional info.
 DB<1> use Fcntl ':flock'

 DB<2> open(FOO, "/tmp/gh") or die "darn"

 DB<3> flock(FOO, LOCK_EX) || die "SHIE: $!"

 DB<4> close FOO

 DB<5> open(FOO, "gh") or die "darn"

 DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!"
SHIE: Function not implemented at (eval
10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2.

 DB<7>
--

Thanks in advance for your assistance.

Regards,
-J
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Jeffrey Bennett
Andreas, if we are using 4 kB blocks, I understand we only transfer one page per
RPC call, so are we limited to 10-15K RPCs per second, or, equivalently,
10,000-15,000 IOPS?

jab



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Problem with flock and perl on Lustre FS

2010-03-05 Thread Andreas Dilger
On 2010-03-05, at 14:49, Jagga Soorma wrote:
 How does Lustre handle locking?  One of our users is complaining
 that a Perl module (Storable) has trouble with its lock_nstore
 method when it tries to use flock.  The following is how they are
 reproducing this issue:

  DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!"
 SHIE: Function not implemented at (eval
 10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2.

Search the list or manual for -o flock, -o localflock, and -o  
noflock mount options for the client(s).
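
For example, enabling coherent flock support on the clients might look like this (the MGS NID, filesystem name, and mount point below are only placeholders):

  mount -t lustre -o flock mgs01@o2ib:/lustre /mnt/lustre
  # or, for node-local (non-coherent, lower-overhead) flock semantics:
  mount -t lustre -o localflock mgs01@o2ib:/lustre /mnt/lustre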

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Andreas Dilger
On 2010-03-05, at 14:53, Jeffrey Bennett wrote:
 Andreas, if we are using 4 kB blocks, I understand we only transfer one
 page per RPC call, so are we limited to 10-15K RPCs per second, or,
 equivalently, 10,000-15,000 IOPS?

That depends on whether you are doing read or write requests, whether  
it is in the client cache, etc.  Random read requests would definitely  
fall under the RPC limit, random write requests can benefit from  
aggregation on the client, assuming you aren't doing O_DIRECT or  
O_SYNC IO operations.
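
One way to see how much write aggregation the clients are actually getting is the per-OSC RPC histogram, which shows the distribution of pages per RPC (parameter names as on 1.8.x clients; any write to the stats file clears it):

  lctl set_param osc.*.rpc_stats=0     # clear the counters
  # ... run the benchmark ...
  lctl get_param osc.*.rpc_stats       # look at the pages-per-RPC histogram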

Increasing your max_rpcs_in_flight and max_dirty_mb on the clients can  
improve IOPS, assuming the servers are not handling enough requests  
from the clients.  Check the RPC req_waittime, req_qdepth,  
ost_{read,write} service time via:

 lctl get_param ost.OSS.ost_io.stats

to see whether the servers are saturated, or idle.  CPU usage may also  
be a factor.
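
For example, the client-side tunables can be raised with lctl; the values below are only illustrative starting points, not recommendations:

  lctl set_param osc.*.max_rpcs_in_flight=32
  lctl set_param osc.*.max_dirty_mb=128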

There was also bug 22074 fixed recently (post 1.8.2) that addresses a  
performance problem with lots of small IOs to different files (NOT  
related to small IOs to a single file).



Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option

2010-03-05 Thread Jagga Soorma
Hi Guys,

Thanks Andreas for pointing me to the flock options.  However, I see the
following caution statement for the consistent mode:

--
CAUTION: This mode has a noticeable performance impact and may affect
stability, depending on the Lustre version used. Consider using a newer
Lustre version which is more stable.
--

Is there an impact if the option is turned on, or only if it is turned on
and used?  Is the impact local to the file being locked, the machine on
which that file is locked, or the entire set of machines mounting that
lustre file system?

Thanks in advance,
-J
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option

2010-03-05 Thread Andreas Dilger
On 2010-03-05, at 15:18, Jagga Soorma wrote:
 Thanks Andreas for pointing me to the flock options.  However, I see  
 the following caution statement for the consistent mode:

 --
 CAUTION: This mode has a noticeable performance impact and may  
 affect stability, depending on the Lustre version used. Consider  
 using a newer Lustre version which is more stable.
 --

 Is there an impact if the option is turned on, or only if it is  
 turned on and used?  Is the impact local to the file being locked,  
 the machine on which that file is locked, or the entire set of  
 machines mounting that lustre file system?


It only affects the performance of the file that is being flocked.  If  
it is enabled and no applications are using flock then it has no effect.

It used to be that we defaulted to localflock behaviour, which has
minimal performance impact, but that was confusing to applications.
The noflock default now reports an error, as you saw, and it is up to
the administrator to pick either localflock (fastest, low impact,
not coherent between nodes) or flock (slower, some performance impact
when used, coherent between nodes).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Extremely high load and hanging processes on a Lustre client

2010-03-05 Thread Bernd Schubert
On Friday 05 March 2010, Götz Waschk wrote:
 Hi everyone,
 
 I have a critical problem on one of my Lustre client machines running
 Scientific Linux 5.4 and the patchless Lustre 1.8.2 client. After a
 few days of usage, some processes like cp and kswapd0 start to use
  100% CPU. Only 180k of swap space is in use, though.
 
 Processes that try to access Lustre use a lot of CPU and seem to hang.
 
 There is some output in the kernel log I'll attach to this mail.
 
 Do you have any idea what to test before rebooting the machine?

Don't reboot, but disable LRU resizing. 

for i in /proc/fs/lustre/ldlm/namespaces/*; do echo 800 > ${i}/lru_size; done
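
To confirm the locks are actually being dropped, the lock counts can be watched in the same namespace directories (layout as in 1.8.x):

for i in /proc/fs/lustre/ldlm/namespaces/*; do echo -n "${i##*/}: "; cat ${i}/lock_count; done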


At least that has always helped before when we had this problem. I hoped it
would be fixed in 1.8.2, but it seems it is not. Please open a bug report.


Thanks,
Bernd

-- 
Bernd Schubert
DataDirect Networks
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option

2010-03-05 Thread Oleg Drokin
Hello!

On Mar 5, 2010, at 5:25 PM, Andreas Dilger wrote:

 On 2010-03-05, at 15:18, Jagga Soorma wrote:
 Is there an impact if the option is turned on, or only if it is  
 turned on and used?  Is the impact local to the file being locked,  
 the machine on which that file is locked, or the entire set of  
 machines mounting that lustre file system?
 It only affects the performance of the file that is being flocked.  If  
 it is enabled and no applications are using flock then it has no effect.

Actually, another side effect is that if you have a lot of flock activity
going on, it might put a lot of (CPU) load on your MDS, especially if there
are a lot of conflicts.

Another gotcha is that some applications might try to use flock when they see
the functionality is available, and this is pretty slow on Lustre: every
lock/unlock request directly translates to an RPC.
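
A rough way to see that cost is to time lock/unlock cycles with flock(1) from util-linux on a client mounted with -o flock (the paths below are placeholders):

  # each flock/unlock pair is a round trip to the lock server
  time for i in $(seq 1000); do flock -x /mnt/lustre/locktest -c true; done
  # compare against the same loop on a local filesystem
  time for i in $(seq 1000); do flock -x /tmp/locktest -c true; done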

And lastly, speaking of real flock (not POSIX locking through fcntl), there is
one additional limitation: you can't actually pass a file descriptor to another
process and inherit the lock there. (The classic example you can find in any
book is that if you flock and then fork, your child process can close/unlock
the file and the parent process will lose the lock too. That does not happen
with Lustre.)

Bye,
Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss