Re: [Lustre-discuss] One or two OSS, no difference?
On 2010-03-04, at 14:18, Jeffrey Bennett wrote:
> I just noticed the sequential performance is OK, but the random I/O (which is what I am measuring) is not. Is there any way to increase random I/O performance on Lustre? We have LUNs that can provide around 250,000 random 4 kB read IOPS, but we are only seeing 3,000 to 10,000 on Lustre.

There is work currently underway to improve the SMP scaling performance of the RPC handling layer in Lustre. Currently that limits the delivered RPC rate to about 10-15k/sec.

> -----Original Message-----
> From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
> Sent: Thursday, March 04, 2010 12:49 PM
> To: Jeffrey Bennett
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] One or two OSS, no difference?
>
> Hello!
>
> This is pretty strange. Are there any differences in network topology that could explain this? If you remove the first client, does the second one show performance at the level of the first, but as soon as you start the load on the first again, the second client's performance drops?
>
> Bye,
> Oleg
>
> On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:
>> Hi Oleg, thanks for your reply. I was actually testing with only one client. When adding a second client using a different file, one client gets all the performance and the other one gets very low performance. Any recommendation? Thanks in advance, jab
>
> On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:
>> We are building a very small Lustre cluster with 32 clients (patchless) and two OSS servers. Each OSS server has 1 OST with 1 TB of solid state drives. All is connected using dual-port DDR IB. For testing purposes, I am enabling/disabling one of the OSS/OSTs by using the lfs setstripe command. I am running XDD and vdbench benchmarks. Does anybody have an idea why there is no difference in MB/sec or random IOPS when using one OSS or two OSS? A quick test with dd also shows the same MB/sec when using one or two OSTs.
>
> I wonder if you just don't saturate even one OST (both the backend SSD and the IB interconnect) with this number of clients. Does the total throughput decrease as you decrease the number of active clients, and increase as you increase it even further? Increasing the maximum number of in-flight RPCs might help in that case. Also, are all of your clients writing to the same file, or does each client do I/O to a separate file (I hope)?
>
> Bye,
> Oleg

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
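For reference, the kind of OST enabling/disabling Jeffrey describes is done with lfs setstripe on a test directory. A minimal sketch, assuming a hypothetical client mount point /mnt/lustre and a test directory created for the benchmark (not commands from the original message):

    # Restrict new files under the test directory to a single OST (index 0),
    # effectively taking the second OSS out of the test
    lfs setstripe -c 1 -i 0 /mnt/lustre/testdir

    # Stripe new files across all available OSTs to use both OSS nodes
    lfs setstripe -c -1 /mnt/lustre/testdir

Note that setstripe only affects files created after it is set; existing test files keep their original layout.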
Re: [Lustre-discuss] problems restoring from MDT backup (test file system)
On 2010-03-04, at 05:46, Frederik Ferner wrote:
> Brian J. Murrell wrote:
>> On Thu, 2010-03-04 at 11:21, Frederik Ferner wrote:
>>> tar tizf test_MDT_Backup.tar.gz
>>> ./ROOT/tmp/frederik/cs04r-sc-com02-04/
>>> ./ROOT/tmp/frederik/cs04r-sc-com02-04/iozone.DUMMY.47
>>> tar: Unexpected EOF in archive
>>> tar: Error is not recoverable: exiting now
>> /snip
>>
>> Looks to me like either your tar executable is broken or your archive is broken. A typical process of elimination should help you discover which is the case.
>
> It certainly looks like it's the tar archive that is broken. I get the same when I copy it over to a different machine. Unless it is the tar executable that is broken in a way that makes it create broken archives, since every time I create a new archive it seems to be broken at the same place. Other tar files created on the same machine don't have that problem, but I'll try creating a new archive with a new executable.

Make sure you use --sparse so that tar isn't mistakenly creating huge archives full of zeroes.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
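A minimal sketch of the --sparse suggestion, assuming GNU tar and hypothetical paths (the mounted MDT backup at /mnt/mdt_backup and an archive destination under /backup):

    # Create the archive with --sparse so long runs of zeroes in sparse
    # files are stored compactly and restored as holes on extraction
    cd /mnt/mdt_backup
    tar czf /backup/test_MDT_Backup.tar.gz --sparse .

Without --sparse, tar reads sparse files in full, which can both bloat the archive and expose read errors that a normal copy would never hit.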
[Lustre-discuss] Extremely high load and hanging processes on a Lustre client
Hi everyone,

I have a critical problem on one of my Lustre client machines running Scientific Linux 5.4 and the patchless Lustre 1.8.2 client. After a few days of usage, some processes like cp and kswapd0 start to use 100% CPU, even though only 180k of swap space is in use. Processes that try to access Lustre use a lot of CPU and seem to hang. There is some output in the kernel log that I'll attach to this mail. Do you have any idea what to test before rebooting the machine?

Regards, Götz Waschk
--
AL I:40: Do what thou wilt shall be the whole of the Law.

[Attachment: kernel-log.txt.bz2 -- BZip2 compressed data]
[Attachment: kernel-lustre-log.txt.bz2 -- BZip2 compressed data]
[Lustre-discuss] Problem with flock and perl on Lustre FS
Hi Guys,

How does Lustre handle locking? One of our users is complaining that a Perl module (Storable) has trouble with its lock_nstore method when it tries to use flock. The following is how they are reproducing this issue:

--
perl -d -e ''

Loading DB routines from perl5db.pl version 1.3
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.

Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.

  DB<1> use Fcntl ':flock'
  DB<2> open(FOO, "/tmp/gh") or die "darn"
  DB<3> flock(FOO, LOCK_EX) || die "SHIE: $!"
  DB<4> close FOO
  DB<5> open(FOO, "gh") or die "darn"
  DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!"
SHIE: Function not implemented at (eval 10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2.
  DB<7>
--

Note that the flock on /tmp/gh (a local filesystem) succeeds, while the flock on gh (in the current directory on the Lustre filesystem) fails.

Thanks in advance for your assistance.

Regards,
-J
Re: [Lustre-discuss] One or two OSS, no difference?
Andreas, if we are using 4 kB blocks, I understand we only transfer 1 page per RPC, so are we limited to 10-15k RPCs per second, or, what is the same thing, 10-15,000 IOPS?

jab

> -----Original Message-----
> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
> Sent: Friday, March 05, 2010 2:05 AM
> To: Jeffrey Bennett
> Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] One or two OSS, no difference?
>
> There is work currently underway to improve the SMP scaling performance of the RPC handling layer in Lustre. Currently that limits the delivered RPC rate to about 10-15k/sec.
>
> [snip]
Re: [Lustre-discuss] Problem with flock and perl on Lustre FS
On 2010-03-05, at 14:49, Jagga Soorma wrote:
> How does Lustre handle locking? One of our users is complaining that a Perl module (Storable) has trouble with its lock_nstore method when it tries to use flock. The following is how they are reproducing this issue:
>
>   DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!"
> SHIE: Function not implemented at (eval 10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2.

Search the list or the manual for the -o flock, -o localflock, and -o noflock mount options for the client(s).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
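As a minimal sketch of those options, assuming a hypothetical MGS NID of mgs@o2ib and a filesystem named testfs, the flock behaviour is chosen at client mount time:

    # Coherent flock across all clients (slower; lock requests go to the server)
    mount -t lustre -o flock mgs@o2ib:/testfs /mnt/testfs

    # Node-local flock only (fast, but not coherent between client nodes)
    mount -t lustre -o localflock mgs@o2ib:/testfs /mnt/testfs

With neither option (the noflock default), flock() returns "Function not implemented", which is exactly the error the Perl debugger session shows.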
Re: [Lustre-discuss] One or two OSS, no difference?
On 2010-03-05, at 14:53, Jeffrey Bennett wrote:
> Andreas, if we are using 4 kB blocks, I understand we only transfer 1 page per RPC, so are we limited to 10-15k RPCs per second, or, what is the same thing, 10-15,000 IOPS?

That depends on whether you are doing read or write requests, whether the data is in the client cache, etc. Random read requests would definitely fall under the RPC limit; random write requests can benefit from aggregation on the client, assuming you aren't doing O_DIRECT or O_SYNC IO operations.

Increasing your max_rpcs_in_flight and max_dirty_mb on the clients can improve IOPS, assuming the servers are not handling enough requests from the clients. Check the RPC req_waittime, req_qdepth, and ost_{read,write} service times via:

    lctl get_param ost.OSS.ost_io.stats

to see whether the servers are saturated or idle. CPU usage may also be a factor.

There was also bug 22074, fixed recently (post 1.8.2), that addresses a performance problem with lots of small IOs to different files (NOT related to small IOs to a single file).

> [snip]

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
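A minimal sketch of the client-side tuning Andreas mentions, assuming a hypothetical filesystem named testfs and the 1.8-era lctl parameter names; the values are illustrative, not recommendations:

    # On a client: show the current per-OSC RPC concurrency and dirty-cache limits
    lctl get_param osc.testfs-*.max_rpcs_in_flight
    lctl get_param osc.testfs-*.max_dirty_mb

    # Raise them to allow more outstanding I/O per OST (illustrative values)
    lctl set_param osc.testfs-*.max_rpcs_in_flight=32
    lctl set_param osc.testfs-*.max_dirty_mb=64

    # On the OSS: check the ost_io service stats for signs of saturation
    lctl get_param ost.OSS.ost_io.stats

Settings made with lctl set_param do not persist across a remount, so they are convenient for benchmarking different values.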
[Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option
Hi Guys,

Thanks Andreas for pointing me to the flock options. However, I see the following caution statement for the consistent mode:

--
CAUTION: This mode has a noticeable performance impact and may affect stability, depending on the Lustre version used. Consider using a newer Lustre version which is more stable.
--

Is there an impact if the option is turned on, or only if it is turned on and used? Is the impact local to the file being locked, to the machine on which that file is locked, or to the entire set of machines mounting that Lustre file system?

Thanks in advance,
-J
Re: [Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option
On 2010-03-05, at 15:18, Jagga Soorma wrote:
> Thanks Andreas for pointing me to the flock options. However, I see the following caution statement for the consistent mode:
>
> --
> CAUTION: This mode has a noticeable performance impact and may affect stability, depending on the Lustre version used. Consider using a newer Lustre version which is more stable.
> --
>
> Is there an impact if the option is turned on, or only if it is turned on and used? Is the impact local to the file being locked, to the machine on which that file is locked, or to the entire set of machines mounting that Lustre file system?

It only affects the performance of the file that is being flocked. If it is enabled and no applications are using flock, then it has no effect.

It used to be that we defaulted to the localflock behaviour, which has minimal performance impact, but that was confusing to applications. The noflock default now reports an error, as you saw, and it is up to the administrator to pick either localflock (fastest, low impact, not coherent between nodes) or flock (slower, performance impact when used, coherent between nodes).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [Lustre-discuss] Extremely high load and hanging processes on a Lustre client
On Friday 05 March 2010, Götz Waschk wrote:
> Hi everyone, I have a critical problem on one of my Lustre client machines running Scientific Linux 5.4 and the patchless Lustre 1.8.2 client. After a few days of usage, some processes like cp and kswapd0 start to use 100% CPU, even though only 180k of swap space is in use. Processes that try to access Lustre use a lot of CPU and seem to hang. There is some output in the kernel log I'll attach to this mail. Do you have any idea what to test before rebooting the machine?

Don't reboot, but disable LRU resizing:

    for i in /proc/fs/lustre/ldlm/namespaces/*; do
        echo 800 > ${i}/lru_size
    done

At least that has helped every time before when we had this problem. I hoped it would be fixed in 1.8.2, but it seems it is not. Please open a bug report.

Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
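Equivalently, a one-liner sketch using the lctl interface (assuming lctl set_param is available on this client; the value 800 matches Bernd's loop):

    # Setting a fixed lru_size on every ldlm namespace disables LRU auto-resizing
    lctl set_param ldlm.namespaces.*.lru_size=800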
Re: [Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option
Hello!

On Mar 5, 2010, at 5:25 PM, Andreas Dilger wrote:
> On 2010-03-05, at 15:18, Jagga Soorma wrote:
>> Is there an impact if the option is turned on, or only if it is turned on and used? Is the impact local to the file being locked, to the machine on which that file is locked, or to the entire set of machines mounting that Lustre file system?
>
> It only affects the performance of the file that is being flocked. If it is enabled and no applications are using flock then it has no effect.

Actually, another side effect is that if you have a lot of flock activity going on, it might put a lot of (CPU) load on your MDS, especially if there are a lot of conflicts. Another gotcha is that some applications might try to use flock when they see the functionality is available, and this is pretty slow on Lustre: every lock/unlock request directly translates to an RPC.

And lastly, speaking of real flock (not POSIX locking through fcntl), there is one additional limitation: you can't actually pass a file descriptor to another process and inherit the lock there. (The classic example you can find in any book is that if you flock, then fork, your child process can close/unlock the file and the parent process will lose the lock too. That does not happen with Lustre.)

Bye,
Oleg
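As an illustration of the coherent flock mode, a sketch assuming a client mounted with -o flock at the hypothetical path /mnt/testfs and the flock(1) utility from util-linux:

    # Terminal 1: hold an exclusive flock on a shared file for 60 seconds
    flock -x /mnt/testfs/lockfile -c "sleep 60"

    # Terminal 2 (on the same or a different client node): blocks until the
    # lock above is released; each lock/unlock request costs an RPC
    flock -x /mnt/testfs/lockfile -c "echo got lock"

With -o localflock instead, the second command would only block when run on the same node as the first, since the locks would not be coherent between clients.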