Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Cliff White
burlen wrote:
 System limits are sometimes provided in a header; I wasn't sure if 
 Lustre adopted that approach. The llapi_* functions are great, and I see how 
 to set the stripe count and size. I wasn't sure if there was also a 
 function to query the configuration, e.g. the number of OSTs deployed?
 
 This would be for use in a global hybrid magnetospheric simulation that 
 runs at a large scale (1E4-1E5 cores). The good striping parameters 
 depend on the run, and could be calculated at run time. It can make a 
 significant difference in our run times to have these set correctly. I 
 am not sure we always want the maximum stripe count; I think this 
 depends on how many files we are synchronously writing and the total 
 number of available OSTs. E.g., if there are 256 OSTs on some system 
 and we have 2 files to write, would it not make sense to set the 
 stripe count to 128?
 
 We can't rely on our users to set the Lustre parameters correctly. We 
 can't rely on the system defaults either; they typically aren't set 
 optimally for our use case. MPI hints look promising, but the ADIO Lustre 
 optimizations are fairly new and, as far as I understand, not publicly 
 available in MPICH until the next release (maybe in May?). We run on a 
 variety of systems with a variety of MPI implementations (e.g. Cray, 
 SGI). The MPI hints will only be useful on implementations that support 
 the particular hint. From a consistency point of view we need to make 
 use of both MPI hints and direct access via the llapi so that we run 
 well on all those systems, regardless of which MPI implementation is 
 deployed.
 

I don't know what your constraints are, but I should note that this sort
of information (number of OSTs) can be obtained rather trivially from 
any Lustre client at a shell prompt, to wit:
# lctl dl |grep OST |wc -l
2
or:
# ls /proc/fs/lustre/osc | grep OST |wc -l
2

There are probably a few other ways to do that. Not as stylish as llapi_*, though.
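
If you need the same thing from C without liblustreapi, here is a quick 
(and equally unstylish) sketch of that /proc scan; like the pipelines 
above, it assumes only one Lustre filesystem is mounted:

#include <dirent.h>
#include <string.h>

/* Count OSC devices by scanning /proc/fs/lustre/osc -- the same thing
 * the "ls | grep OST | wc -l" pipeline above does. Returns the number
 * of OSTs, or -1 if the directory cannot be opened. */
int count_osts(void)
{
	DIR *d = opendir("/proc/fs/lustre/osc");
	struct dirent *de;
	int count = 0;

	if (d == NULL)
		return -1;
	while ((de = readdir(d)) != NULL)
		if (strstr(de->d_name, "OST") != NULL)
			count++;
	closedir(d);
	return count;
}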

cliffw

 Thanks
 Burlen
 
 
 Andreas Dilger wrote:
 On 2010-03-23, at 14:25, burlen wrote:
 How can one programmatically probe the Lustre system an application is
 running on?
 Lustre-specific interfaces are generally llapi_* functions, from 
 liblustreapi.

 At compile time I'd like access to the various Lustre system limits,
 for example those listed in ch. 32 of the operations manual.
 There are no llapi_* functions for this today.  Can you explain a bit 
 better what you are trying to use this for?

 statfs(2) will tell you a number of limits, as will pathconf(3), and 
 those are standard POSIX APIs.

 Incidentally, one I didn't see listed in that chapter is the maximum 
 number of OSTs a single file can be striped across.
 That is the first thing listed:

 32.1 Maximum Stripe Count
 The maximum number of stripe count is 160. This limit is hard-coded, 
 but is near the upper limit imposed by the underlying ext3 file 
 system. It may be increased in future releases. Under normal 
 circumstances, the stripe count is not affected by ACLs.

 At run time I'd like to be able to probe the size (number of OSSs, OSTs, 
 etc.) of the system the application is running on.

 One shortcut is that specifying -1 for the stripe count will stripe a 
 file across all available OSTs, which is what most applications want 
 if they are not being striped over only 1 or 2 OSTs.

 If you are using MPIIO, the Lustre ADIO layer can optimize these 
 things for you, based on application hints.

 If you could elaborate on your needs, there may not be any need to 
 make your application more Lustre-aware.

 Cheers, Andreas
 -- 
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.

 



Re: [Lustre-discuss] SLES11 'make rpms' dies

2010-03-25 Thread Brian J. Murrell
On Wed, 2010-03-24 at 18:45 -0400, Joe Landman wrote: 
 
 This seems to have worked on an unpatched kernel.  Thanks.  I presume we 
 need the OFED 1.4.2 stack installed beforehand for the o2ib kernel build?

Specifically, you need kernel-ib-devel or its equivalent -- that is,
the OFED kernel headers.  If they are from a standard OFED kernel-ib-devel
and installed at /usr/src/ofa_kernel, then configure will find them;
otherwise you need to point to them with --with-o2ib=path.
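
For example, with the OFED kernel headers unpacked in a non-standard 
location (the path here is illustrative):

# ./configure --with-o2ib=/opt/ofed/src/ofa_kernel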

b.






[Lustre-discuss] Lustre, automount and EIO

2010-03-25 Thread Stephen Willey
We're seeing errors which we believe are down to automount returning too early 
from a Lustre mount.

We're using autofs, so the Lustre filesystem may be mounted only an instant 
before the command using it is run.  We believe this may be because the 
client has not yet established connections to all the OSTs when mount 
returns and the following command is run.

We've tried creating an automounter module based on mount_generic that simply 
puts a 1s delay in the mount, and that's reduced the number of errors, but 
they're very much still there.  Putting in a larger delay is an option, but 
fairly obviously a pretty bad one.

Once the filesystem is actually mounted, things work properly, at least 
until the automounter drops the mount again, of course.
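
Another stopgap would be to retry the first access from the application 
itself; a minimal sketch of that idea (the retry count and one-second 
delay are arbitrary):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Retry fopen() a few times if it fails with EIO, on the theory that
 * the client is still establishing its OST connections. The retry
 * count and delay are arbitrary. */
FILE *fopen_retry(const char *path, const char *mode)
{
	FILE *f;
	int tries;

	for (tries = 0; tries < 10; tries++) {
		f = fopen(path, mode);
		if (f != NULL || errno != EIO)
			return f;
		sleep(1); /* give the client time to finish connecting */
	}
	return NULL;
}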

Pasted below are two example log excerpts where we've automounted a filesystem 
called /net/epsilon, then immediately tried to fopen() a file on it, which 
gives an I/O error.

I've attached a tiny C program that can regularly replicate the issue (it 
happened on 16 machines when run with pdsh across a set of roughly 400, which 
is fairly representative).

Any ideas or recommendations would be much appreciated,

Stephen



Mar 25 12:26:38 rr445 automount[6457]: open_mount: (mount):cannot open mount 
module lustre (/usr/lib64/autofs/mount_lustre.so: cannot open shared object 
file: No such file or directory)
Mar 25 12:26:38 rr445 kernel: Lustre: Client epsilon-client has started
Mar 25 12:26:38 rr445 kernel: LustreError: 
22600:0:(file.c:993:ll_glimpse_size()) obd_enqueue returned rc -5, returning 
-EIO

Mar 25 12:26:37 rr447 automount[6458]: open_mount: (mount):cannot open mount 
module lustre (/usr/lib64/autofs/mount_lustre.so: cannot open shared object 
file: No such file or directory)
Mar 25 12:26:37 rr447 kernel: Lustre: Client epsilon-client has started
Mar 25 12:26:37 rr447 kernel: LustreError: 
2370:0:(file.c:993:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO

-- 
Stephen Willey
Senior Systems Engineer
Framestore
19-23 Wells Street, London W1T 3PQ
+44 207 344 8000
www.framestore.com 
/*
 * Immediately begin writing to a file on disk, to test Lustre.
 */

#include <stdio.h>
#include <string.h>

#define DATA "kjrlewkujriojfjvclsdjfoiewujfdkjljvoisjvowjfelkjelwkjvfljifwedse"

int main(int argc, char *argv[])
{
	FILE *f;
	unsigned int times = 512;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
		return -1;
	}

	f = fopen(argv[1], "a"); /* creates the file if necessary */
	if (f == NULL) {
		perror("fopen");
		return -1;
	}

	while (times != 0) {
		if (fwrite(DATA, strlen(DATA), 1, f) != 1) {
			perror("fwrite");
			return -1;
		}
		times--;
	}

	if (fclose(f) != 0) {
		perror("fclose");
		return -1;
	}

	return 0;
}


Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread burlen
Cliff White wrote:
 I don't know what your constraints are, but I should note that this sort
 of information (number of OSTs) can be obtained rather trivially from 
 any Lustre client at a shell prompt, to wit: 
True, but parsing the output of a C system() call is something I hoped 
to avoid. It might not be portable, and it might be fragile over time.

This also gets at my motivation for asking for a header with the Lustre 
limits: if I hard-code something and down the road the limits change, we 
are suddenly shooting ourselves in the foot.

I think I made a mistake about the MPI hints in my last mail. The 
striping_* hints have been part of the MPI standard at least as far back 
as 2003. It says that they are reserved, but implementations are not 
required to interpret them. That's a pretty weak assurance.
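
For reference, a sketch of what passing those reserved hints looks like 
(the values are illustrative, and as noted an implementation is free to 
ignore them):

#include <mpi.h>

/* Pass the reserved striping_* hints at file-open time. Whether they
 * are honored is implementation-dependent; the values are examples. */
int open_with_striping(MPI_Comm comm, const char *path, MPI_File *fh)
{
	MPI_Info info;
	int rc;

	MPI_Info_create(&info);
	MPI_Info_set(info, "striping_factor", "128");   /* stripe count */
	MPI_Info_set(info, "striping_unit", "1048576"); /* stripe size  */
	rc = MPI_File_open(comm, (char *)path,
			   MPI_MODE_CREATE | MPI_MODE_WRONLY, info, fh);
	MPI_Info_free(&info);
	return rc;
}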

I'd like this thread to be considered by the Lustre team as a feature 
request for better programmatic support. I think it makes sense because 
performance is fairly sensitive to both the deployed hardware and the 
striping parameters. There can also be more information about the 
specific I/O needs available at the application level than at the MPI 
level. And MPI implementations don't have to honor hints.

Thanks, I am grateful for the help as I get up to speed on the Lustre filesystem.
Burlen


Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Daniel Kobras
On Thu, Mar 25, 2010 at 10:07:03AM -0700, burlen wrote:
  I don't know what your constraints are, but I should note that this sort
  of information (number of OSTs) can be obtained rather trivially from 
  any Lustre client at a shell prompt, to wit: 
 True, but parsing the output of a C system() call is something I hoped 
 to avoid. It might not be portable, and it might be fragile over time.
 
 This also gets at my motivation for asking for a header with the Lustre 
 limits: if I hard-code something and down the road the limits change, we 
 are suddenly shooting ourselves in the foot.

Instead of teaching the application some filesystem intrinsics, you could
also teach the queueing system about your application's output behaviour
and let it set up an adequately configured working directory. GridEngine
allows running queue-specific prolog scripts for this purpose; other systems
certainly offer similar features.
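
For instance, a minimal sketch of such a prolog for GridEngine (the 
striping policy here is just an example; SGE_O_WORKDIR is the standard 
GridEngine variable for the job's working directory):

#!/bin/sh
# Hypothetical queue prolog: give the job's working directory a default
# striping before the job runs, so files created there inherit it.
# Striping across all available OSTs (-c -1) is only an example policy.
lfs setstripe -c -1 "$SGE_O_WORKDIR"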

Regards,

Daniel.



Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Andreas Dilger
On 2010-03-24, at 18:39, burlen wrote:
 System limits are sometimes provided in a header; I wasn't sure if 
 Lustre adopted that approach.

Well, having static limits in a header doesn't help if the limits are  
dynamic.

 The llapi_* functions are great, and I see how to set the stripe count 
 and size. I wasn't sure if there was also a function to query the 
 configuration, e.g. the number of OSTs deployed?

There isn't such a function directly, but indirectly this is possible 
to get from userspace without changing Lustre or the liblustreapi 
library:

/* Return the number of OSTs configured for the filesystem on which the
 * file descriptor fd is opened.
 * Return +ve number of OSTs on success, or -ve errno on failure. */
int get_ost_count(int fd)
{
	int ost_count = 0;
	int rc;

	rc = llapi_lov_get_uuids(fd, NULL, &ost_count);
	if (rc < 0)
		return rc;

	return ost_count;
}


/* Return the maximum possible number of OSTs any file can be striped over
 * for the filesystem on which the file descriptor fd is opened.
 * Return +ve max_stripe_count on success, or -ve errno on failure. */
int get_max_stripe_count(int fd)
{
	int max_stripe_count = 0;
	int rc;

	rc = llapi_lov_get_uuids(fd, NULL, &max_stripe_count);
	if (rc < 0)
		return rc;

	if (max_stripe_count > LOV_MAX_STRIPE_COUNT)
		max_stripe_count = LOV_MAX_STRIPE_COUNT;

	return max_stripe_count;
}
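
For illustration only, a sketch combining these helpers with 
llapi_file_create() to implement the heuristic described earlier (divide 
the OSTs evenly among the files to be written). It is untested; it 
assumes the 1.8-era llapi_file_create() signature and header path, and 
the helper name and output naming scheme are hypothetical:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <lustre/liblustreapi.h>

/* Hypothetical helper: create nfiles output files in dir, dividing the
 * configured OSTs evenly among them (e.g. 256 OSTs / 2 files = 128
 * stripes each). Uses get_ost_count()/get_max_stripe_count() above. */
int create_striped_outputs(const char *dir, int nfiles)
{
	char path[4096];
	int fd, ost_count, max_count, stripes, i, rc;

	fd = open(dir, O_RDONLY); /* any fd on the target filesystem */
	if (fd < 0)
		return -errno;
	ost_count = get_ost_count(fd);
	max_count = get_max_stripe_count(fd);
	close(fd);
	if (ost_count < 0)
		return ost_count;
	if (max_count < 0)
		return max_count;

	stripes = ost_count / nfiles;
	if (stripes < 1)
		stripes = 1;
	if (stripes > max_count)
		stripes = max_count;

	for (i = 0; i < nfiles; i++) {
		snprintf(path, sizeof(path), "%s/out.%d", dir, i);
		/* stripe_size 0 and stripe_offset -1 request the defaults */
		rc = llapi_file_create(path, 0, -1, stripes, 0);
		if (rc < 0)
			return rc;
	}
	return 0;
}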

Note that there is a drawback to forcing a file to have N stripes, 
vs. letting Lustre make this decision.  If you request N stripes, but 
e.g. one OST is unavailable (full, offline, whatever), your file 
creation will fail.  If Lustre makes this decision, it will use the 
currently available OSTs, regardless of how many are configured.

 This would be for use in a global hybrid magnetospheric simulation 
 that runs at a large scale (1E4-1E5 cores). The good striping 
 parameters depend on the run, and could be calculated at run time. 
 It can make a significant difference in our run times to have these 
 set correctly. I am not sure we always want the maximum stripe 
 count; I think this depends on how many files we are synchronously 
 writing and the total number of available OSTs. E.g., if there are 
 256 OSTs on some system and we have 2 files to write, would it not 
 make sense to set the stripe count to 128?

Sure, but not many applications run in this mode.  Either they have  
1:1 (file per process), N:M (shared single file, maximally-striped) or  
M:M (shared single file, maximally-striped, 1 process writing per  
stripe).

 We can't rely on our users to set the Lustre parameters correctly. We 
 can't rely on the system defaults either; they typically aren't set 
 optimally for our use case. MPI hints look promising, but the ADIO 
 Lustre optimizations are fairly new and, as far as I understand, not 
 publicly available in MPICH until the next release (maybe in May?). We 
 run on a variety of systems with a variety of MPI implementations 
 (e.g. Cray, SGI). The MPI hints will only be useful on implementations 
 that support the particular hint. From a consistency point of view 
 we need to make use of both MPI hints and direct access via the 
 llapi so that we run well on all those systems, regardless of which 
 MPI implementation is deployed.
 Thanks
 Burlen



Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Andreas Dilger
On 2010-03-25, at 15:12, Andreas Dilger wrote:
 The llapi_* functions are great, I see how to set the stripe count
 and size. I wasn't sure if there was also a function to query about
 the configuration, eg number of OST's deployed?

 There isn't directly such a function, but indirectly this is possible
 to get from userspace without changing Lustre or the liblustreapi
 library:


I filed bug 22472 for this issue, with a proposed patch, though the 
actual implementation may change before it is included in any 
release.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



[Lustre-discuss] filter_grant_incoming()) LBUG in 1.8.1.1

2010-03-25 Thread Scott Barber
Background:
MDS and OSTs are all running CentOS 5.4 / x86_64 /
2.6.18-128.7.1.el5_lustre.1.8.1.1
2 types of clients
 - CentOS 5.4 / x86_64 / 2.6.18-128.7.1.el5_lustre.1.8.1.1
 - Ubuntu 8.04.1 / i686 / 2.6.22.19 patchless

A few days ago one of the OSSs hit an LBUG. The syslog looked like:
http://pastie.org/887643

I brought it back up by unmounting the OSTs, restarting the machine
and remounting the OSTs. The OST was just fine after that, but this
seemed to start a chain-reaction with other OSSs. I'd run into the
same LBUG and same call trace in the syslog on other OSSs. I kept
bringing them back up again and an hour later it would happen again -
interestingly never on the same OSS twice. It finally stopped when I
unmounted the MDS/MGS, rebooted the MDS server and them remounted it
again. We had no issues after that until this afternoon :(

In researching the issue, it looks as though it is bug #19338, which in
turn is a duplicate of #20278. It looks as though that bug isn't
slated for 1.8 at all. Am I reading that right? As far as I can tell the
patch hasn't been tested on 1.8.x, so I'm leery of trying to patch my
servers. Is there something else that I can do? Any more info you need?


Thanks for your help,
Scott Barber
Senior Systems Admin
iMemories.com


Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Christopher J. Morrone
Cliff White wrote:

 I don't know what your constraints are, but I should note that this sort
 of information (number of OSTs) can be obtained rather trivially from 
 any Lustre client at a shell prompt, to wit:
 # lctl dl |grep OST |wc -l
 2
 or:
 # ls /proc/fs/lustre/osc | grep OST |wc -l
 2

IF you only have one Lustre filesystem mounted on that node.


Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Jason Rappleye

On Mar 25, 2010, at 6:25 PM, Christopher J. Morrone wrote:

 Cliff White wrote:
 
 I don't know what your constraints are, but I should note that this sort
 of information (number of OSTs) can be obtained rather trivially from 
 any Lustre client at a shell prompt, to wit:
 # lctl dl |grep OST |wc -l
 2
 or:
 # ls /proc/fs/lustre/osc | grep OST |wc -l
 2
 
 IF you only have one Lustre filesystem mounted on that node.

How about /proc/fs/lustre/lov/filesystem-*/numobd?

$ cat /proc/fs/lustre/lov/nbp10-clilov-81007e33b800/numobd 
120

j

--
Jason Rappleye
System Administrator
NASA Advanced Supercomputing Division
NASA Ames Research Center
Moffett Field, CA 94035




