Re: [Lustre-discuss] Lustre on WAN

2009-04-03 Thread Peter Grandi
[ ... ]

 Lustre was originally designed to target HPC clusters,
 i.e., systems in a single LAN environment.

It is not so much a single-LAN design as one aimed at streaming access and low latency.

 On the other hand, the cloud we are building is physically
 distributed at different cities in the province of Alberta.
 [ ... ] performance is impressively good, partly due to the
 fast network we are running in the province.

The question here is whether the *clients* and/or the *servers*
are physically distributed.

If the servers are physically distributed, then the question is what
the resilience requirements are; this largely comes down to the
redundancy strategy for the underlying storage and whether files are
striped across OSSes at different sites.

In the example below the servers are centralized but maybe this
is not what you mean by a cloud.

 This is very similar to the environment that is being used at
 Indiana University.  They have the Lustre servers at a
 central site, but several labs/campuses in other cities are
 mounting the filesystem and they can saturate 10GigE links
 between the sites.

That is quite plausible, but the relevant performance depends on
whether the access patterns are streaming or not, and on doing some
decent TCP tuning to maximize link utilization.
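By decent TCP tuning I mean the usual long-fat-pipe settings: window
scaling on and socket buffer ceilings large enough to cover the
bandwidth-delay product. A minimal sketch, assuming Linux hosts on a
10GigE path with roughly 10ms RTT; the 16MB figures are illustrative
(roughly bandwidth times RTT), not recommendations:

  # enable window scaling and SACK (usually already on)
  sysctl -w net.ipv4.tcp_window_scaling=1
  sysctl -w net.ipv4.tcp_sack=1
  # raise socket buffer ceilings to ~16MB so a single stream can fill the path
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"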

 [ ... ] one application we've tried over the WAN didn't fare
 very well out of the box - lots of small random reads (read
 ~4k, seek a bunch, read ~4k, etc., ad nauseam).

That looks more like unrealistic expectations about latency and
synchronous IO than an issue with Lustre itself; Lustre is indeed
mostly targeted at streaming workloads on low-latency networks, even
if it is not too bad in other circumstances.
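To put rough, purely illustrative numbers on the latency point: with a
10ms round-trip time and strictly serialized 4KB reads, a single thread
can complete at most about 100 reads per second, i.e. roughly 400KB/s,
regardless of link bandwidth; the same pattern over a 0.1ms LAN could
go about a hundred times faster. Readahead or more concurrency in the
application changes that, but that is the scale of the effect.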


Re: [Lustre-discuss] Client evictions and RMDA failures

2009-04-03 Thread syed haider
Thanks for your help, Brian. We resolved the problem by upgrading the
firmware on the HCAs from 1.0.7 to 1.2.0; mounts have stabilized. We
also upgraded to OFED 1.4 (minus the kernel-ib patches).


On Tue, Mar 31, 2009 at 4:29 PM, Brian J. Murrell brian.murr...@sun.com wrote:
 On Tue, 2009-03-31 at 16:02 -0400, syed haider wrote:

 What would cause this? Could this be because of the fabric also?

 Sure.  When the fabric is flaky all sorts of unexpected things (can)
 happen.  Really, your primary task should be making your network stable
 rather than continuing to muck with Lustre on it.

 b.






Re: [Lustre-discuss] e2scan question

2009-04-03 Thread Wojciech Turek
We have been using e2scan for a few days and have noticed that the date
specification is not processed correctly by e2scan.
  date
Fri Apr  3 15:56:49 BST 2009
  /usr/sbin/e2scan -C /ROOT -l -N "2009-03-29 19:44:00" /dev/dm-0 > file_list
generating list of files with
mtime newer than Sun Mar 29 15:56:54 2009
ctime newer than Sun Mar 29 15:56:54 2009
inode bitmap is read, 0 seconds
visible root: /ROOT
scanning inode tables ..

The workaround for us is to use the epoch time format instead.
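A sketch of such an invocation (assuming GNU date; device and output
file as in the transcript above):

  /usr/sbin/e2scan -C /ROOT -l -N $(date -d "2009-03-29 19:44:00" +%s) /dev/dm-0 > file_list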
Let me know if it works for you.

Cheers

Wojciech

Chad Kerner wrote:
 We have been trying to use e2scan to generate a list of files that were
 changed since a specific date.  We are seeing that it has been missing
 some of the modified files, and we were wondering if anyone else has
 seen this behavior.

 I executed:
 e2scan -N "Mar 27 08:00:00 2009" /dev/mapper/mdt_scratch > chad.out

 [r...@abe-mds04 projects_sync]# grep ILRocstar chad.out
 ./ROOT/.projects/roc/fnajjar/ILRocstar
 [r...@abe-mds04 projects_sync]#

 The following list is a very small subset of files within this directory 
 that were changed after the scan date:
 [r...@abe-bk1 ILRocstar]# pwd
 /cfs/scratch/.projects/roc/fnajjar/ILRocstar
 [r...@abe-bk1 ILRocstar]# ls -lR | grep "Mar 28"
 drwxr-x--- 4 fnajjar gac  4096 Mar 28 00:56 RSRM105_3D_HG
 drwxr-x--- 2 fnajjar gac  4096 Mar 28 00:56 Turb_LES
 -rw--- 1 fnajjar gac  4424 Mar 28 01:05 IVHG_j01.o863001
 -rw--- 1 fnajjar gac  4412 Mar 28 02:00 IVHG_j02.o863005
 -rw--- 1 fnajjar gac  4427 Mar 28 05:53 IVHG_j03.o863017
 -rw--- 1 fnajjar gac  5616 Mar 28 06:54 IVHG_j04.o863026
 -rw--- 1 fnajjar gac  5084 Mar 28 11:47 IVHG_j05.o863591
 -rw--- 1 fnajjar gac  4489 Mar 28 11:58 IVHG_j06.o863592
 -rw--- 1 fnajjar gac  4428 Mar 28 20:55 IVHG_j07.o863637

 [r...@abe-bk1 ILRocstar]# ls -lR | grep "Mar 28" | wc -l
 164
 [r...@abe-bk1 ILRocstar]#


 We want to use e2scan to generate a list of files to back up with
 Tivoli, because it is much faster to pass Tivoli a list than to have it
 traverse the entire tree.  However, it looks like e2scan cannot be
 trusted to generate a complete list.

 The version of e2scan we are running is:
 [r...@abe-mds04 projects_sync]# rpm -qilf `which e2scan`
 Name        : e2fsprogs                     Relocations: (not relocatable)
 Version     : 1.40.11.sun1                  Vendor: (none)
 Release     : 0redhat                       Build Date: Thu 10 Jul 2008 11:21:23 AM CDT
 Install Date: Sat 20 Dec 2008 04:07:31 AM CST   Build Host: lts-build-x86-64-0.co.cfs
 Group       : System Environment/Base       Source RPM: e2fsprogs-1.40.11.sun1-0redhat.src.rpm
 Size        : 2085263                       License: GPLv2
 Signature   : (none)
 URL         : ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/
 Summary     : Utilities for managing the second and third extended (ext2/ext3) filesystems
 Description :


 Any thoughts or suggestions would be greatly appreciated.

 Thanks,
 Chad
 --
 Chad Kerner - cker...@ncsa.uiuc.edu
 Systems Engineer, Storage Enabling Technologies
 National Center for Supercomputing Applications
 http://www.ncsa.uiuc.edu/~ckerner



Re: [Lustre-discuss] OSS Cache Size for read optimization

2009-04-03 Thread Cliff White
Jordan Mendler wrote:
 Hi all,
 
 I deployed Lustre on some legacy hardware, and as a result my (4) OSSes
 each have 32GB of RAM. Our workflow is such that all nodes on our
 cluster frequently reread the same 15GB indexes from Lustre (they are
 striped across all OSSes). As such, is there any way to increase the
 amount of memory that either Lustre or the Linux kernel uses to cache
 files read from disk by the OSSes? This would allow much of the indexes
 to be served from memory on the OSSes rather than from disk.
 
 I see a lustre.memused_max = 48140176 parameter, but I am not sure what
 it does. If it matters, my setup is such that each of the 4 OSSes serves
 one OST consisting of a software RAID10 across 4 SATA disks internal to
 that OSS.
 
 Any other suggestions for tuning for fast reads of large files would 
 also be greatly appreciated.
 

Current Lustre does not cache on OSTs at all. All IO is direct.
Future Lustre releases will provide an OST cache.

For now, you can increase the amount of data cached on clients, which
might help a little. Client caching is set with 
/proc/fs/lustre/osc/*/max_dirty_mb.
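As a sketch of what that looks like on a client (256 is just an
illustrative value, not a recommendation):

  # show the current per-OSC dirty (write-back) cache limit, in MB
  cat /proc/fs/lustre/osc/*/max_dirty_mb
  # raise it on every OSC, e.g. to 256 MB
  for f in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 256 > $f; done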

cliffw

 Thanks so much,
 Jordan
 
 
 
 



Re: [Lustre-discuss] OSS Cache Size for read optimization

2009-04-03 Thread Lundgren, Andrew
The parameter is called dirty; is that write cache only, or is it read-write?

 
 Current Lustre does not cache on OSTs at all. All IO is direct.
 Future Lustre releases will provide an OST cache.
 
 For now, you can increase the amount of data cached on clients, which
 might help a little. Client caching is set with
 /proc/fs/lustre/osc/*/max_dirty_mb.
 


Re: [Lustre-discuss] OSS Cache Size for read optimization

2009-04-03 Thread Oleg Drokin
Yes, it is for limiting the dirty (write) cache on a per-OSC basis.
There is also /proc/fs/lustre/llite/*/max_cached_mb, which regulates how
much cached data a client can hold overall (the default is 3/4 of RAM).
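For example (a sketch; 24576 is just an illustrative cap for a 32GB
client, not a recommendation):

  # show and adjust the overall amount of file data Lustre caches on this client, in MB
  cat /proc/fs/lustre/llite/*/max_cached_mb
  for f in /proc/fs/lustre/llite/*/max_cached_mb; do echo 24576 > $f; done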

On Apr 3, 2009, at 2:52 PM, Lundgren, Andrew wrote:

 The parameter is called dirty; is that write cache only, or is it
 read-write?


 Current Lustre does not cache on OSTs at all. All IO is direct.
 Future Lustre releases will provide an OST cache.

 For now, you can increase the amount of data cached on clients, which
 might help a little. Client caching is set with
 /proc/fs/lustre/osc/*/max_dirty_mb.

