Re: [lustre-discuss] Lustre on Ceph Block Devices
If we do test this I'll let you know how it works.

Why Lustre on GPFS? Why not just run GPFS then, given it supports byte-range locking, MPI-IO, and POSIX (ignoring license costs)? I'm trying to limit the number of disk systems to maintain in a system of modest size where both MPI-IO and object storage are required. I have dedicated Lustre today for larger systems and those will stay that way. I was just curious if anyone had tried this.

Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Wed, Feb 22, 2017 at 4:54 AM, Shinobu Kinjo <shinobu...@gmail.com> wrote:
> Yeah, that's interesting, but it does not really make sense to use Lustre that way, and it should not be used for any computation. If anything goes wrong, troubleshooting would become a nightmare.
>
> Have you ever thought of using Lustre on top of the GPFS native client?
>
> Anyway, if you are going to build Lustre on top of any RADOS client and run MPI jobs, please share the results. I'm really, really interested in them.
>
> On Wed, Feb 22, 2017 at 2:06 PM, Brian Andrus <toomuc...@gmail.com> wrote:
>> I had looked at it, but then, why? There is no benefit to using object storage when you are putting lustre over the top; it would bog down. Normally you would want to use CephFS over the ceph storage, since it talks directly to rados.
>>
>> If you are able to export the rados block devices, you should also be able to present them directly as block devices (iSCSI at least), so lustre is able to manage where the data is stored and use its optimizations. Otherwise the data placement can't be optimized: lustre would THINK it knows where the data is, but the rados crush map would have put it somewhere else.
>>
>> Just my 2 cents.
>>
>> Brian
>>
>> On 2/21/2017 3:08 PM, Brock Palen wrote:
>>> Has anyone ever run Lustre OSTs (and maybe MDTs) on Ceph RADOS Block Devices? In theory this would work just like a SAN-attached solution. Has anyone ever done it before? I know we are seeing decent performance from RBD on our system, but I don't have a way to test lustre on it.
>>>
>>> I'm looking at a future system where Ceph and Lustre might both be needed (object storage and high-performance HPC), but without a huge budget for two full disk stacks. So one idea was to have the lustre servers consume Ceph block devices while that same cluster serves object requests.
>>>
>>> Thoughts or prior art? This probably isn't that different from the CloudFormation script that uses EBS volumes, if it works as intended.
>>>
>>> Thanks
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Director Advanced Research Computing - TS
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Lustre on Ceph Block Devices
Has anyone ever run Lustre OSTs (and maybe MDTs) on Ceph RADOS Block Devices? In theory this would work just like a SAN-attached solution. Has anyone ever done it before? I know we are seeing decent performance from RBD on our system, but I don't have a way to test lustre on it.

I'm looking at a future system where Ceph and Lustre might both be needed (object storage and high-performance HPC), but without a huge budget for two full disk stacks. So one idea was to have the lustre servers consume Ceph block devices while that same cluster serves object requests.

Thoughts or prior art? This probably isn't that different from the CloudFormation script that uses EBS volumes, if it works as intended.

Thanks

Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] How to efficiently get sizes of all files stored in Lustre?
I would like to add, though: if you want to scan the filesystem quickly with robinhood, the current versions are _very_ slow, but they work with changelogs, so the database can always be up to date, which could be good. I personally rolled back to a 2.3 version for two reasons:

* We don't use changelogs, so scan speed is important.
* The ENTRIES table schema is simple, which lets us do queries in Hive and Pig much more easily, and even commands like:

rbh-report -i -P /limited/path/

are much faster. The above on newer versions of robinhood is slower than using find. So if you find it slow, try an old version. Or, if you are using changelogs and can have it run all the time, new versions should be fast enough to keep up with changes.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Sep 17, 2014, at 8:44 AM, Alexander Oltu <alexander.o...@uni.no> wrote:
> On Tue, 16 Sep 2014 16:41:20 +0200 Marcin Barczyński wrote:
>> Hello, I would like to efficiently get the sizes of all files stored in a Lustre filesystem.
>
> Another approach can be to use Robinhood: http://sourceforge.net/projects/robinhood/
>
> Best regards,
> Alex.
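Robinhood aside, the underlying task in this thread (get the sizes of all files) can be sketched with plain POSIX tools. On a real Lustre mount you would typically substitute `lfs find` for `find` so the namespace walk is driven efficiently from the MDS; the directory tree below is a made-up example so the pattern is visible anywhere:

```shell
# Sketch: sum the sizes of all regular files under a tree.
# On Lustre, replace 'find' with 'lfs find' for a faster scan;
# the tree here is a temporary stand-in.
tree=$(mktemp -d)
mkdir -p "$tree/a" "$tree/b"
printf 'xxxx' > "$tree/a/f1"        # 4 bytes
printf 'xxxxxxxx' > "$tree/b/f2"    # 8 bytes

# stat emits "size path" pairs; awk aggregates them into a total.
find "$tree" -type f -exec stat -c '%s %n' {} + |
  awk '{ total += $1 } END { print total }'
# prints 12 for the two sample files above
```

The same awk stage can just as easily bucket by top-level directory or by owner, which is roughly what robinhood's ENTRIES table gives you after a scan.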
[Lustre-discuss] killing lfs_migrate
I will have a limited window to migrate files to a new OST, and I would like to get as far as I can in the window I have. Is it safe to kill lfs_migrate while it is still running? If so, will it leave any 'partial copies' around?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] killing lfs_migrate
On Feb 27, 2012, at 2:49 PM, Ashley Pittman wrote:
> On 27 Feb 2012, at 19:30, Brock Palen wrote:
>> I will have a limited window to migrate files to a new OST. I would like to get as far as I can in the window I have. Is it safe to kill lfs_migrate while it is still running? If so, will it leave any 'partial copies' around?
>
> The script will be limited by client bandwidth; if possible you could run multiple instances, each working on a different part of the tree you want copied.

Noted, and I planned on doing that.

> I'd also consider mounting the FS as a client on the server which hosts the OST and running it there.

Wasn't making a server also a client considered bad juju? Memory-pressure issues, panics, and other badness. It would be nice, though, because the OSSes have the biggest network pipes in our setup.

BTW, I am moving old files from old OSTs to new OSTs to balance them back out, in usage and in age distribution.

> Ashley.
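The "multiple instances over different parts of the tree" suggestion can also be done with one file list fanned out to parallel workers via `xargs -P`. A sketch of the pattern, using a stub script in place of `lfs_migrate` so it runs without a Lustre mount; on a real filesystem the pipeline would be roughly `lfs find /lustre/tree -type f -print0 | xargs -0 -n 16 -P 4 lfs_migrate -y` (check that your lfs_migrate version accepts `-y` for non-interactive use):

```shell
# Pattern sketch: fan a file list out to N parallel migrate workers.
# worker.sh is a stand-in for lfs_migrate.
workdir=$(mktemp -d)
touch "$workdir/f1" "$workdir/f2" "$workdir/f3" "$workdir/f4"

cat > "$workdir/worker.sh" <<'EOF'
#!/bin/sh
# stub: record each file it was asked to "migrate"
for f in "$@"; do echo "migrated $f"; done
EOF
chmod +x "$workdir/worker.sh"

# -n 2: two files per worker invocation; -P 2: two workers at once.
find "$workdir" -type f ! -name worker.sh -print0 |
  xargs -0 -n 2 -P 2 "$workdir/worker.sh" > "$workdir.log"
```

Because each worker invocation handles a self-contained batch of files, killing the pipeline mid-run loses at most the batches in flight, which matches the limited-window use case above.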
[Lustre-discuss] Setting lustre directory and content immutable but keep permissions
For a policy issue with scratch space, we want to lock a user's scratch space that lives on lustre 1.8.x. We want users to be able to grab their data but not be able to add any more; they also do not need to delete files.

We could recursively remove the write bit. The problem is that we may at some point wish to restore write access with the same permissions the files had before, so we would rather not change the permissions. We also don't want to put the stress of a bunch of chmods on the MDS.

So, in short: is there a simple way to say 'this directory and its children are not mutable' that is undoable by an admin?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
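If the fallback does end up being the recursive write-bit removal described above, the original modes can be recorded first so the lock is reversible. A sketch with a temporary directory standing in for a user's scratch tree (the paths are illustrative, not anything Lustre-specific):

```shell
# Sketch: save each entry's octal mode, strip write bits, and later
# restore the exact saved modes. $scratch is a stand-in for a user's
# scratch directory.
scratch=$(mktemp -d)
touch "$scratch/data"
chmod 664 "$scratch/data"
modes=$(mktemp)

# 1. lock: save "mode path" pairs, then remove write permission everywhere
find "$scratch" -exec stat -c '%a %n' {} + > "$modes"
chmod -R a-w "$scratch"

# 2. unlock: replay the saved modes (directories first, since find
#    lists a directory before its contents)
while read -r mode path; do
  chmod "$mode" "$path"
done < "$modes"
```

This still costs one setattr per file on the MDS in each direction, so it doesn't address the load concern, only the reversibility one.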
[Lustre-discuss] Line rate performance for clients
I think this is a networking question. We have lustre 1.8 clients with 1gig-e interfaces that, according to ethtool, are running full duplex. If I do the following:

cp /lustre/largefile.h5 /tmp/

I get 117MB/s. If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I get 117MB/s. If I go directly from /lustre to the archive, I get 50MB/s, and this is consistently reproducible. It doesn't matter if I copy a large file from lustre to lustre, or use scp, or globus: if I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.

Anyone have ideas why I cannot do 1Gig-e full duplex?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] Line rate performance for clients
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:
> On 2011-07-29, at 11:33 AM, Brock Palen wrote:
>> I think this is a networking question. We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool are running full duplex. If I do the following: cp /lustre/largefile.h5 /tmp/ I get 117MB/s. If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive I get 117MB/s. If I go directly from /lustre to the archive I get 50MB/s,
>
> Strace your globus-url-copy and see what IO size it is using. cp has long ago been modified to use the blocksize reported by stat(2) for copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is using e.g. 4kB reads then it will be very inefficient for Lustre, but much less so from /tmp.
>
>> this is consistently reproducible. It doesn't matter if I just copy a large file on lustre to lustre, or scp, or globus. If I try to ingest and outgest data I get what looks like half duplex performance. Anyone have ideas why I cannot do 1Gig-e full duplex?
>
> I don't think this has anything to do with full duplex. 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Ah, I guess I wasn't clear: I only get 117MB/s when I do one direction on the network at a time, e.g. copying from lustre to /tmp (a local drive), or from /tmp out via globus. It's just when the client is reading from lustre and sending the data out at the same time that I only get 50MB/s. Does that make sense? Is it even right for me to expect that I could combine the two and get full speed in and full speed out, when I can consistently get each independent of the other?

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
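Andreas's point about IO size can be checked without strace: move the same data with a small and a large block size and compare the record counts dd reports, which correspond to the number of read() calls a tool issues. A sketch with a scratch file (the 1 MiB size is just for illustration):

```shell
# Sketch: the same 1 MiB of data read with 4 KiB vs 1 MiB IOs.
# dd's "records in" line is effectively the read() syscall count,
# which is what 'strace -c' would show for a copy tool.
src=$(mktemp)
dd if=/dev/zero of="$src" bs=1M count=1 status=none

# 4 KiB IOs: 256 read() calls for the same data
dd if="$src" of=/dev/null bs=4k 2>&1 | grep 'records in'

# 1 MiB IOs: a single read() call
dd if="$src" of=/dev/null bs=1M 2>&1 | grep 'records in'
```

On Lustre the small-read case costs far more than 256x/1x suggests, since each undersized read can miss the client's readahead and stripe alignment.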
[Lustre-discuss] mv_sata module for rhel5 and write through patch
We are (finally) updating our x4500s to rhel5 and lustre 1.8.5, from rhel4 and 1.6.7. On rhel4 we had used the patch from:

https://bugzilla.lustre.org/show_bug.cgi?id=14040

for the mv_sata module. Is this still recommended on rhel5, i.e. using the mv_sata module over the stock redhat sata_mv, as well as applying this patch? That patch is quite old; is there a newer one? What are other x4500/thumper users running?

Also, I will do some digging on the list, but why is lustre 2.0 not the 'production' version? We are planning on 1.8.x for now, but if 2.0 is stable we would install that one. Could we upgrade directly from 1.6 to 2.0 if we did this?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] finding clients that is opening/closing files
This was very helpful; I found the culprit.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Oct 26, 2010, at 3:42 PM, Wojciech Turek wrote:
> One way is to check the /proc/fs/lustre/mds/*/exports/*/stats files, which contain per-client statistics. They can be cleared by writing 0 to the file; then check for files with lots of operations.
>
> On 26 October 2010 20:10, Brock Palen <bro...@umich.edu> wrote:
>> I have what I think is a badly behaving user. Looking at /proc/fs/lustre/mds/nobackup-MDT/stats, the open/close counters are running at about 1000/s. I would like to track down which clients this is coming from and knock on the users about fixing their code that is doing this. How does one look at stats by node? Do I need to look at all clients, or can I get this information from the mds?
>>
>> Thanks!
>
> --
> Wojciech Turek
> Senior System Architect
> High Performance Computing Service
> University of Cambridge
> Email: wj...@cam.ac.uk
> Tel: (+)44 1223 763517
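The per-export stats files Wojciech points at contain one line per operation, in the form "open 1234 samples [reqs]", with one directory per client NID. A sketch that ranks clients by combined open+close counts; it runs here against a mocked-up copy of the exports tree (sample NIDs and counts are invented), since the real files live under /proc on an MDS:

```shell
# Sketch: rank client NIDs by open+close operation counts.
# $statsroot mimics /proc/fs/lustre/mds/<fsname>-MDT0000/exports/.
statsroot=$(mktemp -d)
mkdir -p "$statsroot/10.0.0.1@tcp" "$statsroot/10.0.0.2@tcp"
printf 'open 9000 samples [reqs]\nclose 8000 samples [reqs]\n' \
  > "$statsroot/10.0.0.1@tcp/stats"
printf 'open 12 samples [reqs]\nclose 10 samples [reqs]\n' \
  > "$statsroot/10.0.0.2@tcp/stats"

# Sum open+close per export and sort, busiest client first.
for f in "$statsroot"/*/stats; do
  nid=$(basename "$(dirname "$f")")
  count=$(awk '$1 == "open" || $1 == "close" { n += $2 } END { print n+0 }' "$f")
  echo "$count $nid"
done | sort -rn
```

On a live MDS, pointing `statsroot` at the real exports directory (and clearing the counters first by writing 0, as described above) makes the culprit stand out after a few seconds of accumulation.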
[Lustre-discuss] controlling which eth interface lustre uses
We recently added a new oss; it has one 1Gb interface and one 10Gb interface. The 10Gb interface is eth4, 10.164.0.166. The 1Gb interface is eth0, 10.164.0.10.

In modprobe.conf I have:

options lnet networks=tcp0(eth4)

lctl list_nids
10.164.0@tcp

From a host I run:

lctl which_nid oss4
10.164.0@tcp

But I still see traffic over eth0, the 1Gb management network, much higher than I would expect (up to 100MB/s). The management interface is oss4-gb, so if I do from a client:

lctl which_nid oss4-gb
10.164.0...@tcp

Why, if I have networks=tcp0(eth4) and list_nids shows only the 10Gb interface, do I have so much traffic over the 1Gb interface? There is some traffic on the 10Gb interface, but I would like to tell lustre 'don't use the 1Gb interface'.

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
> On 10/21/2010 09:37 AM, Brock Palen wrote:
>> We recently added a new oss; it has one 1Gb interface and one 10Gb interface. The 10Gb interface is eth4, 10.164.0.166. The 1Gb interface is eth0, 10.164.0.10.
>
> They look like they are on the same subnet if you are using /24 ...

You are correct; both interfaces are on the same subnet:

[r...@oss4-gb ~]# route
Kernel IP routing table
Destination     Gateway     Genmask         Flags Metric Ref  Use Iface
10.164.0.0      *           255.255.248.0   U     0      0      0 eth0
10.164.0.0      *           255.255.248.0   U     0      0      0 eth4
169.254.0.0     *           255.255.0.0     U     0      0      0 eth4
default         10.164.0.1  0.0.0.0         UG    0      0      0 eth0

There is no way to mask the lustre service away from the 1Gb interface?

> If they are on the same subnet, it's possible that the 1GbE sees the arp response first, and then it's pretty much guaranteed to have the traffic go out that port. If your subnets are different, this shouldn't be the issue.

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: land...@scalableinformatics.com
> web: http://scalableinformatics.com
>      http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax: +1 866 888 3112
> cell: +1 734 612 4615
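The ARP behaviour Joe describes (the 1GbE port answering ARP for an address on the 10GbE port, since Linux treats addresses as host-owned rather than interface-owned) is commonly mitigated with the standard arp_ignore/arp_announce sysctls rather than by downing an interface. A hedged /etc/sysctl.conf fragment; these are generic Linux kernel settings, not anything Lustre-specific, and they need testing on the actual dual-homed host:

```text
# Only answer ARP requests for addresses configured on the interface
# the request arrived on, and prefer the outgoing interface's own
# address when sending ARP. This keeps eth0 from answering ARP for
# eth4's IP when both sit on the same subnet.
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
```

Apply with `sysctl -p` (or per-interface under net.ipv4.conf.eth0/eth4), then clear the clients' ARP caches so they re-learn the correct MAC.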
Re: [Lustre-discuss] controlling which eth interface lustre uses
> Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e. your designated primary (unless it goes down).
>
> bob

We could do this. The 10Gb drivers have been such a pain for us that we wanted a 'back door' management network to reach the box should we have issues with the 10Gb driver again.

Oddly, I ran 'ifconfig eth0 down' and could no longer ping the box over the eth4 interface; I had to power-cycle it from management. Very odd.
Re: [Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 10:35 AM, Brian J. Murrell wrote:
> On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote:
>> We could do this. The 10Gb drivers have been such a pain for us that we wanted a 'back door' management network to reach the box should we have issues with the 10Gb driver.
>
> If you really do want two separate networks, one for Lustre and one for management, then why not configure them as separate networks with different subnets? Anything else is just going to confuse the routing engine. I think at best two interfaces on the same subnet is going to cause indeterminate behaviour.
>
> b.

We settled on disabling the eth0 interface and hope the 10Gb driver will not give us any more trouble. We don't currently have a dedicated management network; setting one up was passed over for just a single host.
[Lustre-discuss] mixing server versions
We have a filesystem that we can't take down for a while to upgrade the OSSes; they are running 1.6.x. We do have a need to quickly add some storage to it, and thus the new server would run 1.8.x. Are there any problems with this? I know 1.6.x isn't supported anymore, and we would like to move everything to 1.8 soon, but we are in a bind for the moment. Is our only (safe) option to load 1.6.x on the new server as well and wait until we can shut down the filesystem?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
[Lustre-discuss] repquota for lustre
I see the bug in bugzilla from version 1.4 that was put on hold; I just want to bump interest for such a tool. If anyone has made something that does quota reports for lustre, I would be interested.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] repquota for lustre
Thanks, I am checking it out.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Oct 23, 2009, at 3:38 PM, Jim Garlick wrote:
> I wrote a 'repquota' tool that groks lustre: http://sourceforge.net/projects/rquota/
>
> I think LBL has a lustre quota reporting tool as well.
>
> Jim
>
> On Fri, Oct 23, 2009 at 02:31:49PM -0400, Brock Palen wrote:
>> I see the bug in bugzilla from version 1.4 that was put on hold; I just want to bump interest for such a tool. If anyone has made something that does quota reports for lustre, I would be interested.
Re: [Lustre-discuss] recover borked mds
Some additional details: I mounted the mds as ldiskfs and deleted the files in OBJECTS/* and CATALOGS, then remounted as lustre; same issue. I also did a writeconf and restarted all the servers, and saw messages on the MGS that new config logs were being created, but still the same error on the mds trying to start up.

Is there a way to get lustre to stop trying to open 0xf150010:80d24629: ? And not go through recovery? If not, can I format a new mds, and just untar ROOTS/ and apply the extended attributes to ROOTS from the old mds filesystem?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Aug 19, 2009, at 12:57 PM, Brock Palen wrote:

After a network event (switches bouncing), it looks like our mds got borked somewhere during all the random failovers (the switches came up and down rapidly over a few hours). Now we can not mount the mds; when we do, we get the following errors:

Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@01037c9db600 x85226/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: 7456:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@0104163a6000 x47117/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: Skipped 11 previous similar messages
Aug 19 12:37:39 mds2 kernel: LustreError: 7468:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@010350a4d200 x81788/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl
Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 7468:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 11 previous similar messages
Aug 19 12:37:40 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:40 mds2 kernel: LustreError: Skipped 18 previous similar messages
Aug 19 12:37:40 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@010414dc1850 x81855/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699960 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:40 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 18 previous similar messages
Aug 19 12:37:42 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:42 mds2 kernel: LustreError: Skipped 42 previous similar messages
Aug 19 12:37:42 mds2 kernel: LustreError: 7466:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@01037c9db600 x77144/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699962 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:42 mds2 kernel: LustreError: 7466:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 42 previous similar messages
Aug 19 12:37:43 mds2 kernel: Lustre: Request x3 sent from mgc10.164.3@tcp to NID 10.164.3@tcp 5s ago has timed out (limit 5s).
Aug 19 12:37:43 mds2 kernel: Lustre: Changing connection for mgc10.164.3@tcp to mgc10.164.3@tcp_1/0...@lo
Aug 19 12:37:43 mds2 kernel: Lustre: Enabling user_xattr
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(mds_fs.c:493:mds_init_server_data()) RECOVERY: service nobackup-MDT, 439 recoverable clients, last_transno 3647966566
Aug 19 12:37:43 mds2 kernel: Lustre: MDT nobackup-MDT now serving dev (nobackup-MDT/57dddb69-2475-b551-4100-e045f91ce38c), but will be in recovery for at least 5:00, or until 439 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/nobackup-MDT/recovery_status.
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(lproc_mds.c:273:lprocfs_wr_group_upcall()) nobackup-MDT: group upcall set to /usr/sbin/l_getgroups
Aug 19 12:37:43 mds2 kernel: Lustre: nobackup-MDT.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(mds_lov.c:1070:mds_notify()) MDS nobackup-MDT: in recovery, not resetting orphans on nobackup-OST_UUID
Aug 19 12:37:43 mds2 kernel: Lustre: nobackup-MDT: temporarily refusing client connection from 10.164.1@tcp
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0xf150010:0x80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_cat.c:176:llog_cat_id2handle
[Lustre-discuss] recover borked mds
:0x9642a0ac
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(lov_log.c:230:lov_llog_init()) error osc_llog_init idx 0 osc 'nobackup-OST-osc' tgt 'nobackup-MDT' (rc=-2)
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(mds_log.c:220:mds_llog_init()) lov_llog_init err -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(llog_obd.c:417:llog_cat_initialize()) rc: -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(lov_obd.c:727:lov_add_target()) add failed (-2), deleting nobackup-OST_UUID
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(obd_config.c:1093:class_config_llog_handler()) Err -2 on cfg command:
Aug 19 12:37:43 mds2 kernel: Lustre:cmd=cf00d 0:nobackup-mdtlov 1:nobackup-OST_UUID 2:0 3:1
Aug 19 12:37:43 mds2 kernel: LustreError: 15c-8: mgc10.164.3@tcp: The configuration from log 'nobackup-MDT' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 19 12:37:43 mds2 kernel: LustreError: 7438:0:(obd_mount.c:1113:server_start_targets()) failed to start server nobackup-MDT: -2
Aug 19 12:37:44 mds2 kernel: LustreError: 7438:0:(obd_mount.c:1623:server_fill_super()) Unable to start targets: -2
Aug 19 12:37:44 mds2 kernel: Lustre: Failing over nobackup-MDT
Aug 19 12:37:44 mds2 kernel: Lustre: *** setting obd nobackup-MDT device 'unknown-block(8,16)' read-only ***

We have run e2fsck on the volume, which found a few errors that we corrected, but the problem persists. We also tried mounting with -o abort_recov; this resulted in an assertion (LBUG) and does not work. Any thoughts?
The lines:

Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0xf150010:0x80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id 0xf150010:80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_obd.c:262:cat_cancel_cb()) Cannot find handle for log 0xf150010

caught my attention.

Thanks; we are running 1.6.6.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
[Lustre-discuss] Lustre featured on podcast (HT: Andreas Dilger)
Thanks to Andreas for taking an hour out to talk with Jeff Squyres and me (Brock Palen) about the Lustre cluster filesystem on our podcast, www.rce-cast.com. You can find the whole show at:

http://www.rce-cast.com/index.php/Podcast/rce-14-lustre-cluster-filesystem.html

Thanks again! If any of you have topics you would like to hear, please let me know!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] Lustre featured on podcast (HT: Andreas Dilger)
http://en.wikipedia.org/wiki/Nagle%27s_algorithm

It looks like you intentionally hold up data to try to make fatter payloads in packets, so they are not 99% header/CRC data. Sounds like a way to make latency bad.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Aug 3, 2009, at 8:20 PM, Mag Gam wrote:
> Very nice. 15:54, what is Nagle? He didn't say anything about SNS, but changelogs seem very promising!
>
> On Mon, Aug 3, 2009 at 8:55 AM, Brock Palen <bro...@umich.edu> wrote:
>> Thanks to Andreas for taking an hour out to talk with Jeff Squyres and me (Brock Palen) about the Lustre cluster filesystem on our podcast, www.rce-cast.com. You can find the whole show at: http://www.rce-cast.com/index.php/Podcast/rce-14-lustre-cluster-filesystem.html
>> Thanks again! If any of you have topics you would like to hear, please let me know!
Re: [Lustre-discuss] x4540 (thor) panic
On Jun 15, 2009, at 11:44 AM, Nirmal Seenu wrote: We have been running the Lustre servers on a machine with an Nvidia chipset (nVidia Corporation MCP55 Ethernet (rev a3)) for well over a year now; the following two options seem to work the best on these servers: options forcedeth max_interrupt_work=50 optimization_mode=1 Thanks, we put those in place, and disabled bonding for now (running on one overtaxed gig-e port). We also tried noapic because of some notes online for the crashes we were seeing, but that does not let the MPT disk controllers in the machine start up (it sets all drives offline). Thanks for the note, Brock. optimization_mode enables interrupt coalescing. Nirmal
[Lustre-discuss] Lustre on Podcast?
I host an HPC podcast along with Jeff Squyres at www.rce-cast.com We would like to invite Lustre to be the next guest on the show. Please contact me on or off list if you would like to do this, and if so who should be the point of contact from the Lustre group. Thanks! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] OpenMX
I had the dev of OpenMX on my podcast (www.rce-cast.com); this got me thinking: has anyone ever tried OpenMX with Lustre? In theory it should work, but that wasn't the case with some other tools when I asked around. Note we have not tried OpenMX yet, but will evaluate it soon. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] checking lustre health
I am writing a small script to monitor the health of the lustre servers by reading /proc/fs/lustre/health_check Is the regex ^healthy$ enough to make sure that I am notified if it ever changes? Should there be any other locations I should check for lustre errors that should be acted on? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
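A minimal sketch of such a check (Python; the exact text written on an unhealthy system, e.g. "NOT HEALTHY" or "LBUG", is an assumption here, so the safe approach is to alert on anything other than the single word "healthy"):

```python
import re

HEALTH_FILE = "/proc/fs/lustre/health_check"

def is_healthy(contents):
    # Treat anything other than the single word "healthy" as an alert;
    # anchoring with ^healthy$ (fullmatch) means assumed failure texts
    # like "NOT HEALTHY" or "LBUG" can never slip through.
    return re.fullmatch(r"healthy", contents.strip()) is not None

def check():
    try:
        with open(HEALTH_FILE) as f:
            return is_healthy(f.read())
    except OSError:
        # an unreadable health file is itself worth an alert
        return False
```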
[Lustre-discuss] SELinux and lustre clients
It has been stated on the list before that the lustre servers are not compatible with SELinux, but what about clients? We have some post-processing desktops that are clients of our lustre system. We don't have control over this load, and they are dedicated to using SELinux. Redhat says it is a lustre problem, after working on it a few months with them: https://bugzilla.redhat.com/show_bug.cgi?id=489583 Is this the case? Has anyone managed to run lustre clients on systems with SELinux enabled? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] RHEL4 build of lustre patched e2fsprogs
I am trying to install a newer version of e2fsprogs to see if the e2scan in newer versions was built with sqlite (the rpm I got when I built the cluster was not). The new rpm, e2fsprogs-1.40.11.sun1-0redhat.x86_64.rpm, appears to be built against a version of berkdb (not sqlite now?) that is newer than what is part of RHEL4. error: Failed dependencies: libc.so.6(GLIBC_2.4)(64bit) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 libdb-4.3.so()(64bit) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 rtld(GNU_HASH) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 up2date says that db4-4.2 is all that is available for rhel4 stock from redhat with updates. Not 4.3, which is not good. The libc error is funny also, because as far as I can tell /lib64/libc.so.6 is just that... In any case, I could only install if I said nodeps, but that is getting just silly. Does anyone have a working patched e2fsprogs for rhel4? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] e2scan for cleaning scratch space
e2scan will show me all the files that have changed since a date, but I want to know all the files that have not changed since some date. The goal is to make a system for purging scratch spaces that is fast, with minimal wear on the filesystem. How are groups doing this now? Are you using e2scan? Is there a way to have e2scan not only list the file but also the mtime/ctime in the log file, so that we can sort oldest to newest? Thank you! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
Re: [Lustre-discuss] e2scan for cleaning scratch space
The e2scan shipped in Sun's rpms does not support sqlite3 out of the box: rpm -qf /usr/sbin/e2scan e2fsprogs-1.40.7.sun3-0redhat e2scan: sqlite3 was not detected on configure, database creation is not supported Should I just rebuild only e2scan? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Mar 4, 2009, at 2:19 PM, Daire Byrne wrote: Brock, - Brock Palen bro...@umich.edu wrote: e2scan will show me all the files that have changed since a date, but I want to know all the files that have not changed since some date. The goal is to make a system for purging scratch spaces that is fast, with minimal wear on the filesystem. How are groups doing this now? Are you using e2scan? Is there a way to have e2scan not only list the file but also the mtime/ctime in the log file, so that we can sort oldest to newest? e2scan can dump its findings to a sqlite DB which has the ctime/mtime info in it. But you'll need to write some logic to construct the filepaths, because everything is stored with the inode number as the index. There is code in e2scan that can probably be recycled for that purpose though. So I suppose you would get e2scan to create the DB and then a custom app would search by ctime/mtime and spit out the full file path. Daire
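To sketch Daire's suggestion: once e2scan has produced the DB, a small query by mtime/ctime yields the purge candidates. The table and column names below are invented for illustration only (check the real schema with `.schema` in the sqlite3 shell); per the above, the rows are keyed by inode number, so reconstructing full paths is a separate step.

```python
import sqlite3

# Hypothetical schema for illustration: a "files" table with columns
# ino, mtime, ctime. e2scan's actual DB layout must be verified; it is
# indexed by inode number, so path reconstruction happens afterwards.
def inodes_older_than(db_path, cutoff_epoch):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT ino, mtime, ctime FROM files "
        "WHERE mtime < ? AND ctime < ? ORDER BY mtime",
        (cutoff_epoch, cutoff_epoch),
    ).fetchall()
    con.close()
    return rows  # purge candidates, oldest first
```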
Re: [Lustre-discuss] rdac configuration, please help
Actually I did this again recently and the directions still work. Here is what I have on our internal wiki here at Michigan. begin

The Sun array does not work with Red Hat's DM-Multipath rpm. Download the linuxrdac source from Sun and the Lustre kernel source code. Install the kernel source and link it to /usr/src/linux:

rpm -ivh kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
ln -s linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 linux
cd /usr/src/linux
make mrproper
cp /boot/config-`uname -r` .config
make oldconfig
make dep
make modules

There will be two directories, one ending in -obj and one not. The directories scripts/mod and scripts/genksyms need to be copied to the one ending in -obj/scripts. Once done, untar the linuxrdac package and copy in a working Makefile (provided). This Makefile will not install right; you will need to comment out the install of mppiscsi_umountall, it is not needed. (Makefile_linuxrdac-09.01.B2.74, rdac-LINUX-09.01.B2.40-source.tar.gz)

make clean
make
make uninstall
make
make install

Edit grub.conf to use the mpp initrd over the standard one, from:

initrd /initrd-2.6.9-67.0.7.EL_lustre.1.6.5.1smp.img

to:

initrd /mpp-2.6.9-67.0.7.EL_lustre.1.6.5.1smp.img

LUNs are accessible from a single SCSI block device; failover happens in a few seconds, but not right away. CAM should notify you.

[r...@mds1 scripts]# /opt/mpp/lsvdev
Array Name  Lun  sd device
mds-raid     0 - /dev/sdb
mds-raid     1 - /dev/sdc
mds-raid     2 - /dev/sdd

I hope that is enough detail for you. /end

Again, Sun sold us this array, but the Sun packaged kernels didn't come with support for it, which is annoying. Maybe Sun will in the future push their stuff into DM-Multipath, or just package it with Lustre.
Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Feb 27, 2009, at 6:34 PM, Adint, Eric (CIV) wrote: Ok, at this point I'm desperate. I have a Rocks cluster with the Sun 4Gb FC cards based on qla22xx drivers with a StorageTek 6140. I am trying to build the following rdac against the lustre kernel 2.6.18-92.1.10.el5_lustre.1.6.6smp using the following source: linuxrdac-09.02.C2.13. The error I get is:

[r...@nas-0-0 linuxrdac-09.02.C2.13]# make
make V=0 -C /lib/modules/2.6.18-92.1.10.el5_lustre.1.6.6smp/build M=/root/linuxrdac-09.02.C2.13 MODVERDIR=/lib/modules/2.6.18-92.1.10.el5_lustre.1.6.6smp/build/.tmp_versions SUBDIRS=/root/linuxrdac-09.02.C2.13 modules
make[1]: Entering directory `/usr/src/linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp'
make -C ../../../linux-2.6.18-92.1.10.el5_lustre.1.6.6 O=../linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp modules
ERROR: Kernel configuration is invalid. include/linux/autoconf.h or include/config/auto.conf are missing. Run 'make oldconfig && make prepare' on kernel src to fix it.
CC [M] /root/linuxrdac-09.02.C2.13/mppLnx26p_upper.o
/bin/sh: scripts/genksyms/genksyms: No such file or directory
make[4]: *** [/root/linuxrdac-09.02.C2.13/mppLnx26p_upper.o] Error 1
make[3]: *** [_module_/root/linuxrdac-09.02.C2.13] Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp'
make: *** [mppUpper] Error 2

I have tried the following suggestion from lustre: http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg01682.html I may not have changed the information enough. Does anyone know if it is necessary to recompile the rdac, and if so, is there a comprehensive lustre howto on how to compile kernel modules?
I thank you in advance for any help Eric Adint ehad...@nps.edu High Performance Computing Specialist Naval Postgraduate School 833 Dyer Road Bldg 232 Room 139a Monterey Ca 93943 831-656-3440
Re: [Lustre-discuss] Recovery without end
We used to do something similar, and still had issues. Upgrading all servers (2 OSSs, 7 OSTs each) and clients (800) to 1.6.6 fixed all our issues; we run default timeouts and default everything really, no issues. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Feb 25, 2009, at 11:22 AM, Charles Taylor wrote: I'm going to pipe in here. We too use a very large (1000) timeout value. We have two separate Lustre file systems; one of them consists of two rather beefy OSSs with 12 OSTs each (FalconIII FC-SATA RAID). The other consists of 8 OSSs with 3 OSTs each (Xyratex 4900FC). We have about 500 clients and support both tcp and o2ib NIDs. We run Lustre 1.6.4.2 on a patched 2.6.18-8.1.14 CentOS/RH kernel. It has worked *very* well for us for over a year now - very few problems with very good performance under very heavy loads. We've tried setting our timeout to lower values but settled on the 1000 value (despite the long recovery periods) because if we don't, our lustre connectivity starts to break down and our mounts come and go with errors like transport endpoint failure or transport endpoint not connected or some such (it's been a while now). File system access comes and goes randomly on nodes. We tried many tunings and looked for other sources of problems (underlying network issues). Ultimately, the only thing we found that fixed this was to extend the timeout value. I know you will be tempted to tell us that our network must be flaky but it simply is not. We'd love to understand why we need such a large timeout value and why, if we don't use a large value, we see these transport endpoint failures. However, after spending several days trying to understand and resolve the issue, we finally just accepted the long timeout as a suitable workaround. I wonder if there are others who have silently done the same.
We'll be upgrading to 1.6.6 or 1.6.7 in the not-too-distant future. Maybe then we'll be able to do away with the long timeout value, but until then, we need it. :( Just my two cents, Charlie Taylor UF HPC Center On Feb 25, 2009, at 11:03 AM, Brian J. Murrell wrote: On Wed, 2009-02-25 at 16:09 +0100, Thomas Roth wrote: Our /proc/sys/lustre/timeout is 1000 That's way too high. Long recoveries are exactly the reason you don't want this number to be huge. - there has been some debate on this large value here, but most other installations will not run in a network environment with a setup as crazy as ours. What's so crazy about your setup? Unless your network is very flaky and/or you have not tuned your OSSes properly, there should be no need for such a high timeout, and if there is you need to address the problems requiring it. Putting the timeout at 100 immediately results in Transport endpoint errors; impossible to run Lustre like this. 300 is the max that we recommend, and we have very large production clusters that use such values successfully. Since this is a 1.6.5.1 system, I activated the adaptive timeouts - and put them to equally large values: /sys/module/ptlrpc/parameters/at_max = 6000 /sys/module/ptlrpc/parameters/at_history = 6000 /sys/module/ptlrpc/parameters/at_early_margin = 50 /sys/module/ptlrpc/parameters/at_extra = 30 This is likely not good as well. I will let somebody more knowledgeable about AT comment in detail though. It's a new feature and not getting wide use at all yet, so the real-world experience is still low. b.
Re: [Lustre-discuss] Lustre NOT HEALTHY
Ok thanks. It happened again last night, sooner than normal. I will send a new message with the details. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jan 13, 2009, at 11:09 PM, Cliff White wrote: Brock Palen wrote: How common is it for servers to go NOT HEALTHY? I feel it is happening much more often than it should be with us. A few times a month. It should not happen at all, in the normal case. It indicates a problem. If this happens, we reboot the servers. Should we do something else? Maybe it depends on what the problem was? Well, determining the actual problem that caused the NOT HEALTHY would be quite useful, yes. I would not just reboot. - Examine consoles of _all_ servers for any error indications - Examine syslogs of _all_ servers for any LustreErrors or LBUG - Check network and hardware health. Are your disks happy? Is your network dropping packets? Try to figure out what was happening on the cluster. Does this relate to a specific user workload or system load condition? Can you reproduce the situation? Does it happen at a specific time of day, time of month? If we should not be getting NOT HEALTHY that often, what information should I collect to report to CFS? The lustre-diagnostics package is a good start for general system config. Beyond that, most of what we would need is listed above. cliffw Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] LBUG ASSERTION(lock->l_resource != NULL) failed
I am having servers LBUG on a regular basis. Clients are running 1.6.6 patchless on RHEL4; servers are running RHEL4 with 1.6.5.1 RPMs from the download page. All connection is over Ethernet, and the servers are x4600s. The OSS that LBUG'd has in its log:

Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Jan 13 16:35:39 oss2 kernel: Lustre: 10243:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 10243
Jan 13 16:35:39 oss2 kernel: ldlm_cn_08 R running task 0 10243 1 10244 7776 (L-TLB)
Jan 13 16:35:39 oss2 kernel: a0414629 0103d83c7e00
Jan 13 16:35:39 oss2 kernel: 0101f8c88d40 a021445e 0103e315dd98 0001
Jan 13 16:35:39 oss2 kernel: 0101f3993ea0
Jan 13 16:35:39 oss2 kernel: Call Trace: a0414629 {:ptlrpc:ptlrpc_server_handle_request+2457}
Jan 13 16:35:39 oss2 kernel: a021445e {:libcfs:lcw_update_time+30} 80133855 {__wake_up_common+67}
Jan 13 16:35:39 oss2 kernel: a0416d05 {:ptlrpc:ptlrpc_main+3989} a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0}
Jan 13 16:35:39 oss2 kernel: a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0} a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0}
Jan 13 16:35:39 oss2 kernel: 80110de3 {child_rip+8} a0415d70 {:ptlrpc:ptlrpc_main+0}
Jan 13 16:35:39 oss2 kernel: 80110ddb {child_rip+0}
Jan 13 16:35:40 oss2 kernel: LustreError: dumping log to /tmp/lustre-log.1231882539.10243

At the same time a client (nyx346) lost contact with that OSS, and is never allowed to reconnect.
Client /var/log/messages:

Jan 13 16:37:20 nyx346 kernel: Lustre: nobackup-OST000d-osc-01022c2a7800: Connection to service nobackup-OST000d via nid 10.164.3@tcp was lost; in progress operations using this service will wait for recovery to complete.
Jan 13 16:37:20 nyx346 kernel: Lustre: Skipped 6 previous similar messages
Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Jan 13 16:37:20 nyx346 kernel: LustreError: 11-0: an error occurred while communicating with 10.164.3@tcp. The ost_connect operation failed with -16
Jan 13 16:37:20 nyx346 kernel: LustreError: Skipped 10 previous similar messages
Jan 13 16:37:45 nyx346 kernel: Lustre: 3849:0:(import.c:410:import_select_connection()) nobackup-OST000d-osc-01022c2a7800: tried all connections, increasing latency to 7s

Even now the server (OSS) is refusing connection to OST000d, with the message:

Lustre: 9631:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST000d: refuse reconnection from 145a1ec5-07ef-f7eb-0ca9-2a2b6503e...@10.164.1.90@tcp to 0x0103d5ce7000; still busy with 2 active RPCs

If I reboot the OSS, the OSTs on it go through recovery like normal, and then the client is fine. Network looks clean; I found one machine with lots of dropped packets between the servers, but that is not the client in question. Thank you! If it happens again, and I find any other data, I will let you know. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] Lustre NOT HEALTHY
How common is it for servers to go NOT HEALTHY? I feel it is happening much more often than it should be with us. A few times a month. If this happens, we reboot the servers. Should we do something else? Maybe it depends on what the problem was? If we should not be getting NOT HEALTHY that often, what information should I collect to report to CFS? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] Lustre Intelligence?
So, question: I had a user who thought his problem was the disk system; it turns out his machine was just OOM. His IO code looked like this:

for (i = 0; i < kloc; i++) {
    sprintf(&(buffer[39]), "%d", i + (int)floor(rank*(double)k/(double)numprocs) + 1);
    f1 = fopen(buffer, "w");
    for (j = 0; j < N; j++) {
        fprintf(f1, "%e\n", u[i][j]);
    }
    fclose(f1);
}

So as I read this, every processor (every processor calls this function and writes to its own set of files) is writing one double at a time to its files. IO performance though was still quite good. I enabled extents_stats on rank 0 of this job and ran it. Here is what I ended up with (stats were zeroed, and this was the only job running on the client):

extents            calls  %  cum% | calls  %  cum%
0K   -    4K :        12  4     4 |     0  0     0
4K   -    8K :         0  0     4 |     0  0     0
8K   -   16K :         0  0     4 |     0  0     0
16K  -   32K :         0  0     4 |     0  0     0
32K  -   64K :         0  0     4 |     0  0     0
64K  -  128K :         0  0     4 |     0  0     0
128K -  256K :         0  0     4 |     0  0     0
256K -  512K :         0  0     4 |     0  0     0
512K - 1024K :         0  0     4 |     4  1     1
1M   -    2M :       136 47    51 |   220 98   100
2M   -    4M :       140 48   100 |     0  0   100

So 98% of writes and reads (the read code is similar and reads in about 2GB this way) were all 1-4MB. Is this lustre showing its preference for 1MB IO ops? Even though the code wanted to do 8 bytes at a time, lustre cleaned it up? Or did Linux do this some place? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
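On the "who cleaned it up" question: stdio's fprintf() buffers in userspace (typically a few KB), and the page cache plus the client's writeback aggregate those into large writes, so tiny logical writes can surface as 1-4MB extents. Here is a rough userspace analogue of that aggregation (Python, purely illustrative, not Lustre itself): a 1MB buffer in front of a counting "device" collapses two hundred thousand 13-byte records into a handful of device writes.

```python
import io

class CountingRaw(io.RawIOBase):
    """Fake 'device' that counts how many write calls actually reach it."""
    def __init__(self):
        super().__init__()
        self.writes = 0
        self.bytes = 0
    def writable(self):
        return True
    def write(self, b):
        self.writes += 1
        self.bytes += len(b)
        return len(b)

raw = CountingRaw()
buf = io.BufferedWriter(raw, buffer_size=1024 * 1024)  # 1MB buffer
for i in range(200000):
    buf.write(b"%e\n" % 1.0)  # 13-byte records, like fprintf of one double
buf.flush()
# ~2.6MB of tiny writes reach the "device" as a few ~1MB writes
print(raw.writes, raw.bytes)
```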
[Lustre-discuss] lustre/abaqus tweaks for lustre?
I have seen a few papers around, but does anyone have comments on how to optimize either lustre or abaqus to use lustre for scratch? I see reads coming in at only 20MB/s, and IO wait gets quite high on the client. I know this is probably not enough information, but are there any knobs people have twisted on their own systems for this that I can be informed of? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Clients fail every now and again,
Thanks, Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 18, 2008, at 4:47 PM, Andreas Dilger wrote: On Nov 18, 2008 12:14 -0500, Brock Palen wrote: if that is the bug causing this, is the fix, till we upgrade to the newer lustre, to set statahead_max=0 again? Yes, this is another statahead bug. I see this same behavior this morning on a compute node. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 16, 2008, at 10:49 PM, Yong Fan wrote: Brock Palen wrote: We consistently see random occurrences of a client being kicked out, and while lustre says it tries to reconnect, it almost never can without a reboot: Maybe you can check: https://bugzilla.lustre.org/show_bug.cgi?id=15927 Regards! -- Fan Yong Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:226:ptlrpc_invalidate_import()) nobackup-MDT_UUID: rc = -110 waiting for callback (3 != 0) Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:230:ptlrpc_invalidate_import()) @@@ still on sending list [EMAIL PROTECTED] x979024/t0 o101-nobackup-[EMAIL PROTECTED]@tcp:12/10 lens 448/1184 e 0 to 100 dl 1226700928 ref 1 fl Rpc:RES/0/0 rc -4/0 Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:230:ptlrpc_invalidate_import()) Skipped 1 previous similar message Nov 14 18:28:18 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0100f7ef0400: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED] Nov 14 18:30:32 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_statfs operation failed with -107 Nov 14 18:30:32 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0100f7ef0400: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Nov 14 18:30:32 nyx-login1 kernel: LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail. Nov 14 18:30:32 nyx-login1 kernel: LustreError: 16523:0: (llite_lib.c: 1549:ll_statfs_internal()) mdc_statfs fails: rc = -5 Nov 14 18:30:35 nyx-login1 kernel: LustreError: 16525:0:(client.c: 716:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x983192/t0 o41-nobackup- [EMAIL PROTECTED]@tcp:12/10 lens 128/400 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 Nov 14 18:30:35 nyx-login1 kernel: LustreError: 16525:0: (llite_lib.c: 1549:ll_statfs_internal()) mdc_statfs fails: rc = -108 Is there any way to make lustre more robust against these types of failures? According to the manual (and many times in practice, like rebooting a MDS) the filesystem will just block and comeback. This almost never comes back, after a while it will say reconnected, but will fail again right away. On the MDS I see: Nov 14 18:30:20 mds1 kernel: Lustre: nobackup-MDT: haven't heard from client 1284bfca-91bd-03f6-649c-f591e5d807d5 (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it. Nov 14 18:30:28 mds1 kernel: LustreError: 11463:0:(handler.c: 1515:mds_handle()) operation 41 on unconnected MDS from [EMAIL PROTECTED] Nov 14 18:30:28 mds1 kernel: LustreError: 11463:0:(ldlm_lib.c: 1536:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x983190/t0 o41-?@?:0/0 lens 128/0 e 0 to 0 dl 1226705528 ref 1 fl Interpret:/0/0 rc -107/0 Nov 14 18:34:15 mds1 kernel: Lustre: nobackup-MDT: haven't heard from client 1284bfca-91bd-03f6-649c-f591e5d807d5 (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it. Just keeps kicking it out, /proc/fs/lustre/health_check on client, and servers are healthy. 
Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Re: [Lustre-discuss] Is patchless ok for EL4 now?
We have been running this for a while. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 6, 2008, at 10:54 AM, Peter Kjellstrom wrote: After reading http://wiki.lustre.org/index.php?title=Patchless_Client it is my understanding that it is now (2.6.9-78.0.5.EL and lustre-1.6.6) ok to run a patchless client on EL4 (64-bit). This is based on the fact that the problems described on the wiki page were fixed in versions older than those mentioned above (the last bug/comment was for -55). Is this accurate, or is the wiki missing information here? (Brian wrote in July that EL4 simply was too old...) Anybody running this already? Tia, Peter
Re: [Lustre-discuss] Is patchless ok for EL4 now?
2.6.9-78.0.1.ELsmp Lustre-1.6.5.1 Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 6, 2008, at 11:18 AM, Peter Kjellstrom wrote: On Thursday 06 November 2008, Brock Palen wrote: We have been running this for a while. Brock Palen Thanks for the data point. What are the exact kernel and lustre versions you've been running (presumably) ok? /Peter
[Lustre-discuss] unexpectedly long timeout
New error I have never seen before; googling didn't find much other than an error involving IB. This node has IB, but lustre runs over TCP.

Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 01041802f600 [EMAIL PROTECTED] x1071812/t0 o4-nobackup-[EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0
Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) Skipped 1 previous similar message
Nov 5 02:29:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 01041802f600 [EMAIL PROTECTED] x1071812/t0 o4-nobackup-[EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0

On the OSS that provides OST000c, the only errors I see from that node are the usual 'can't hear from node':

Nov 4 18:46:02 oss2 kernel: Lustre: 6426:0:(ost_handler.c:1270:ost_brw_write()) nobackup-OST000c: ignoring bulk IO comm error with [EMAIL PROTECTED] id [EMAIL PROTECTED] - client will retry
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000c: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it.
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000d: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it.

Any thoughts? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I know you say the only addition was the RDAC, for the MDSs I assume (we use it also, just fine). When I ran faultmond from Sun's dcmu rpm (RHEL 4 here) the x4500s would crash like clockwork at ~48 hours. For a very simple bit of code I was surprised that once, when I forgot to turn it on when working on the load, this would happen. Just FYI, it was unrelated to lustre (using provided rpm's, no kernel build); this solved my problem on the x4500. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote: The X4200m2 MDS systems and the X4500 OSS were rebuilt using the stock Lustre packages (Kernel + modules + userspace). With the exception of the RDAC kernel module, no additional software was applied to the systems. We recreated our volumes and ran the servers over the weekend. However, the OSS crashed about 8 hours in. The syslog output is attached to this message. Looks like it could be similar to bug #16404, which means patching and rebuilding the kernel. Given my lack of success at trying to build from source, I am again asking for some guidance on how to do this. I sent out the steps I used to try and build from source on the 7th because I was encountering problems and was unable to get a working set of packages. Included in that message was output from quilt that implies that the kernel patching process was not working properly. Regards, Malcolm. -- Malcolm Cowe Solutions Integration Engineer Sun Microsystems, Inc. Blackness Road Linlithgow, West Lothian EH49 7LR UK Phone: x73602 / +44 1506 673 602 Email: [EMAIL PROTECTED] Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 06:53:42 oss-1 kernel: kjournald starting.
Commit interval 5 seconds Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 06:57:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive fault Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver, [EMAIL PROTECTED] Oct 10 07:56:23 oss-LDISKFS-fs: file extents enabled1 kernel: Lustre VersionLDISKFS-fs: mballoc enabled : 1.6.5.1 Oct 10 07:56:23 oss-1 kernel: Build Version: 1.6.5.1-1969123119-PRISTINE-.cache.OLDRPMS.20080618230526.linux- smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre. 1.6.5.1smp Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/64] Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System; [EMAIL PROTECTED] Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode. Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode. Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s). Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s). 
LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x3/t0 o250- [EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0 Lustre: Request x3 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 0s ago has timed out (limit 5s). LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OST: -5 LustreError: 15f-b: Communication error with the MGS. Is the MGS running? LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5 LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OST LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OST not registered LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost LDISKFS-fs: mballoc: 0 generated and it took 0 LDISKFS-fs: mballoc: 0 preallocated, 0 discarded Oct 10 07:56:50 oss-1
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I never uninstalled it (I still use some of the tools in it). Faultmond is a service -- just chkconfig it off.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

> Brock Palen wrote:
>> I know you say the only addition was the RDAC -- for the MDS's, I
>> assume (we use it too, just fine).
>
> Yes, the MDS's share a STK 6140.
>
>> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here), the X4500s
>> would crash like clockwork every ~48 hours. [...] Just FYI: it was
>> unrelated to Lustre (we used the provided RPMs, no kernel build), and
>> disabling it solved my problem on the X4500.
>
> The DCMU RPM is installed. I didn't explicitly install it, so it must
> have been bundled in with the SIA CD... I'll try removing the RPM to
> see what happens. Thanks for the heads up.
>
> Regards,
> Malcolm.
>
> [...]
Re: [Lustre-discuss] Getting random No space left on device (28)
On any client, 'lfs df -h' shows the usage of all your OSTs in one command.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Oct 12, 2008, at 3:24 PM, Kevin Van Maren wrote:

> Sounds like one (or more) of your existing OSTs is out of space. OSTs
> are assigned at file creation time, and Lustre will return an error if
> it cannot allocate space on the OST for a file you are writing. Do a
> df on your OSS nodes.
>
> Lustre does not re-stripe files; you may have to manually move (cp/rm)
> some files to the new OST to rebalance the file system. It is a manual
> process, but you can use lfs setstripe to force a specific OST, and
> lfs getstripe to see where a file's storage is allocated.
>
> Kevin
>
> Mag Gam wrote:
>> We recently added another 1TB to a filesystem: we added a new OST and
>> mounted it. On the clients, lfs df -h shows that the new space has
>> been acquired, and lfs df -i shows enough inodes. However, we
>> randomly see "No space left on device (28)" when we run our jobs, but
>> if we resubmit the jobs they work again. Is there anything special we
>> need to do after we mount a new OST? TIA

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
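Kevin's manual rebalance can be sketched as a handful of commands. This is illustrative only: the paths are made up, OST index 14 is an assumed index for the newly added OST, and the exact lfs flag spellings vary between 1.6.x releases (check 'lfs help' on your version):

```shell
lfs df -h                                    # confirm which OSTs are full
lfs getstripe /lustre/data/big.out           # see which OSTs hold the file's objects

# Recreate the file on the new, empty OST, then swap it into place:
lfs setstripe -i 14 /lustre/data/big.out.new # pin the new copy to OST index 14
cp /lustre/data/big.out /lustre/data/big.out.new
mv /lustre/data/big.out.new /lustre/data/big.out
```

The cp/mv pair is the point: Lustre assigns objects only at file creation, so copying into a freshly created file is the only way to move data onto the new OST.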
Re: [Lustre-discuss] Adding IB to tcp only cluster
On Oct 10, 2008, at 2:45 PM, Brian J. Murrell wrote:

> On Fri, 2008-10-10 at 11:08 -0400, Brock Palen wrote:
>> We have added a few IB nodes to our cluster (about 70 out of 600
>> nodes). What would it take to have Lustre go over IB as well as tcp
>> for the rest of the hosts?
>
> So I'm assuming that at least some of these IB nodes are servers
> (i.e. OSS) then.

Not right now; the question was because we were thinking about it. Would only the OSS need HCAs, or does the MDS need HCAs also?

> No. There is no requirement that the MDS use IB just because (some)
> OSSes use it.

Really? So given that LNET does the best path and is not part of Lustre itself: if we only hook some of the OSSes up via IB, is there a way to have a user (who is on IB) prefer the IB-connected OSSes for IO? If that is not possible now, I think some of the patches announced for 1.8 or 2.0 had the ability to select an OSS for only given users. Am I correct? It would be nice to have MDS traffic over TCP (fast enough for this user) and IO over IB.

> Fair enough.

How does Lustre figure out the preferred path?

> An LNET node with multiple paths to another LNET node chooses the
> best path. How that decision is made, I'm not so sure, but I tend to
> think that o2iblnd will be preferred over socklnd.

How can we have the nodes figure out: if I have IB, talk to the OSSes over IB, else use TCP?

> Assuming you get the configuration right on the nodes, LNET will just
> do that using its best-path algorithm.
>
> b.
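The mixed setup Brian describes is normally expressed per node in the LNET module options; nodes that list both networks will prefer the IB path to peers that share it, and fall back to tcp otherwise. A sketch of the relevant modprobe.conf fragments -- the interface names (ib0, eth0) are assumptions for this example:

```shell
# /etc/modprobe.conf fragment on an IB-equipped client or OSS:
options lnet networks="o2ib0(ib0),tcp0(eth0)"

# TCP-only nodes list just the tcp network:
options lnet networks="tcp0(eth0)"
```

The servers then advertise NIDs on both networks, and clients with only tcp0 still reach them over Ethernet.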
Re: [Lustre-discuss] lustre-ldiskfs
I ran into this problem myself when Sun's convoluted download system took over hosting the Lustre packages. When I tried to wget the package, I forgot that Sun makes you log in, so you download an HTML error page in place of the RPM. You will need to download to your own machine and then upload to the cluster; no command-line download was possible. If anyone knows how to get around this, let me know.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 26, 2008, at 6:39 AM, Andreas Dilger wrote:

> On Sep 26, 2008 10:26 +0530, Chirag Raval wrote:
>> When I am installing lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.i686.rpm
>> I get the following error. Can someone please help me figure out what
>> is wrong? I am installing it on CentOS 4.5.
>>
>> # rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.i686.rpm
>> error: open of <HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY> failed: No such file or directory
>> error: open of An failed: No such file or directory
>> error: open of error failed: No such file or directory
>> error: open of occurred failed: No such file or directory
>> error: open of while failed: No such file or directory
>> error: open of processing failed: No such file or directory
>> error: open of your failed: No such file or directory
>> error: open of request.p failed: No such file or directory
>> error: open of Reference failed: No such file or directory
>> error: open of </BODY></HTML> failed: No such file or directory
>
> You downloaded and are trying to install a web page (which itself
> appears to report that you had an error downloading the RPM).
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
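A quick pre-flight check catches the failure Andreas diagnosed before rpm ever sees the file: peek at the first bytes of the download. A minimal, self-contained sketch (the file here is a stand-in created just for the demonstration):

```shell
# Simulate the failure mode: the downloaded "RPM" is really an HTML
# error page from the login-gated download site.
printf '<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>An error occurred</BODY></HTML>\n' > bad-download.rpm

# A real RPM begins with the 4-byte magic ed ab ee db; an HTML page
# does not, so a peek at the first bytes catches the problem early.
if head -c 5 bad-download.rpm | grep -q '<HTML'; then
    result="html-page"
else
    result="maybe-rpm"
fi
echo "bad-download.rpm looks like: $result"
```

On a real download, 'file package.rpm' gives the same answer with less typing.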
[Lustre-discuss] l_getgroups: no such user
We are getting a bunch of:

l_getgroups: no such user ##

in our log files on the MDS. We keep our /etc/passwd and /etc/group in sync with the clusters that mount it. Only one visualization workstation has users who are not in its list. The problem is I don't see any files owned by those users on the filesystem:

find . -uid #

finds nothing. Does Lustre check when a user just cd's to that directory? Or is it for any user that logs in? Is it safe to ignore these messages for non-cluster users?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
I had to reboot the MDS to get the problem to go away. I will watch and see if it reappears. I screwed up and deleted the wrong /var/log/messages, so I don't have the messages. I am watching this issue.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 5, 2008, at 10:01 AM, Brian J. Murrell wrote:

> On Fri, 2008-09-05 at 00:15 -0400, Brock Palen wrote:
>> Looks like that didn't fix it. One of the login nodes repeated the
>> behavior.
>
> So what are the messages the client logged when the problem occurred?
> And what, if anything, was logged on the MDS at the same time?
>
> b.
[Lustre-discuss] Lustre clients failing, and cant reconnect
I am having clients lose their connection to the MDS. Messages on the clients look like this:

Sep 4 19:51:30 nyx-login2 kernel: Lustre: nobackup-MDT-mdc-0101fc44e800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Sep 4 19:51:30 nyx-login2 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16

They keep trying to connect, spitting out "mds_connect failed -16", and the clients never recover. On the MDS all I see is:

Lustre: 7653:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-MDT: refuse reconnection from 618cf36e-a7a6-[EMAIL PROTECTED]@tcp to 0x01037c109000; still busy with 3 active RPCs

I get this RPC message for many hosts. Clients and servers are all using TCP. Is this enough information?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
Looks like that didn't fix it. One of the login nodes repeated the behavior. The strange thing is that the MDS does not show anything about the NID of the client: the client just says it lost the connection, but the MDS never says it has not heard from the client and is kicking it out. Very strange.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 4, 2008, at 11:34 PM, Brock Palen wrote:

>> Is this enough information?
>
> Probably. If you are running 1.6.5, try disabling statahead on all of
> your clients:
>
> # echo 0 > /proc/fs/lustre/.../statahead_max

I thought statahead was fixed in 1.6.5? That was the main reason we upgraded. The login nodes are already showing that behavior again; I will try it out.

> Of course, this setting goes back to its default of 32 on a reboot.
>
> b.
Re: [Lustre-discuss] lru_size very small
Great! So I read this as: lru_size no longer needs to be manually adjusted. That's great! Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 23, 2008, at 7:22 AM, Andreas Dilger wrote:

> On Aug 22, 2008 15:39 -0400, Brock Palen wrote:
>> It looks like lru_size is not a static parameter. While on most of
>> our hosts it starts as zero, once the file system is accessed the
>> values start to rise. The values get highest for the MDS:
>>
>> cat nobackup-MDT-mdc-01022c433800/lru_size
>> 3877
>
> Yes, in 1.6.5 instead of having a static LRU size it is dynamic based
> on load. This optimizes the number of locks available to nodes that
> have very different workloads than others (e.g. login/build nodes vs.
> compute nodes vs. backup nodes).

So in 1.6.5.1, are locks dynamically adjusted based on the RAM available on the MDS/OSS's? Notice how the value above is _much_ higher than the default "100" in the manual.

> The total number of locks available is now a function of the RAM on
> the server. I think the maximum is 50 locks/MB, but this is hooked
> into the kernel VM so that in case of too much memory pressure the
> LRU size is shrunk.

I should point out this value was 0 until I did a 'find . | wc -l' in a directory. The same goes for regular access: users on nodes that access Lustre have locks, while nodes that have not had Lustre access yet are still 0 (by access I mean an application that uses our Lustre mount vs. our NFS mount).

>> Any feedback on the nature of locks and lru_size? We are looking to
>> do what the manual says about upping the number on the login nodes.
>
> Yes, the manual needs an update.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
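Andreas's figure above can be turned into a rough capacity estimate for the servers in this thread. A sketch, assuming the quoted (and explicitly approximate) maximum of 50 locks per MB of server RAM:

```shell
# Rough upper bound on server-side lock count, using the ~50 locks/MB
# figure quoted above (an approximation, per Andreas).
ram_gb=16                                  # MDS RAM in the setup discussed
locks_per_mb=50
max_locks=$((ram_gb * 1024 * locks_per_mb))
echo "approx. lock ceiling for a ${ram_gb}GB server: $max_locks"
```

For the 16GB MDS here that works out to roughly 800K locks cluster-wide, which puts the observed per-client lru_size of a few thousand in perspective.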
Re: [Lustre-discuss] HLRN lustre breakdown
On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:

> This is a big nasty issue, particularly for HPC applications where
> performance is a big issue. How does one even begin to benchmark the
> performance overhead of a parallel filesystem with checksumming? I am
> having nightmares over the ways vendors will try to play games with
> performance numbers.

True.

> My suspicion is that whenever a parallel filesystem with checksumming
> is available and works, all the end-users will just turn it off anyway
> because the applications will run twice as fast without it, regardless
> of what the benchmarks say... leaving us back at the same problem.

I don't think this will be a problem. On current systems it may be the case that the checksummed filesystem becomes CPU bound, but I think the OSTs will be bailed out by CPU speeds going up faster than disk speeds; you just need to limit the number of OSTs per OSS. Where I could see it being a problem is on the client side, where writes and reads compete with the application for cycles. So far on our clusters I see applications do either compute or IO on a thread/rank, not both, freeing up allocated CPUs for IO. Then again, maybe I should ask our users why they don't do any async IO. It probably depends. My 2 cents.

> On Wed, Aug 20, 2008 at 07:12:10PM +0200, Bernd Schubert wrote:
>> Oh damn, I'm always afraid of silent data corruption due to bad
>> harddisks. We also already had this issue; fortunately we found the
>> disk before taking the system into production. Will lustre-2.0 use
>> the ZFS checksum feature?
>>
>> Thanks, Bernd
>> --
>> Bernd Schubert
>> Q-Leap Networks GmbH
>>
>> On Wednesday 20 August 2008 19:08:34 Peter Jones wrote:
>>> Hi there. I got the following background information from Juergen
>>> Kreuels at SGI. It turned out that a bad disk (which did NOT report
>>> itself as being bad) killed the Lustre filesystem, leading to data
>>> corruption due to inode areas on that disk. It was finally decided
>>> to remake the whole FS, and only during that action did we finally
>>> (after nearly 48 h) find the bad drive. It had nothing to do with
>>> the Lustre FS itself. Lustre had been the victim of a HW failure on
>>> a RAID6 lun. I hope that this helps.
>>> PJones
>>>
>>> Heiko Schroeter wrote:
>>>> Hello list, does anyone have more background info on what happened
>>>> there? Regards, Heiko
>>>>
>>>> HLRN News: Since Mon Aug 18, 2008 12:00 the HLRN-II complex Berlin
>>>> is open for users again. During the maintenance it turned out that
>>>> the Lustre file system holding the users' $WORK and $TMPDIR was
>>>> damaged completely. The file system had to be reconstructed from
>>>> scratch. All user data in $WORK are lost. We hope that this event
>>>> remains an exception. SGI apologizes for this event.
>
> --
> Troy Benjegerdes 'da hozer' [EMAIL PROTECTED]
>
> Someone asked me why I work on this free
> (http://www.gnu.org/philosophy/) software stuff and not get a real
> job. Charles Schulz had the best answer: "Why do musicians compose
> symphonies and poets write poems? They do it because life wouldn't
> have any meaning for them if they didn't. That's why I draw cartoons.
> It's my life." -- Charles Schulz
Re: [Lustre-discuss] HLRN lustre breakdown
Really? You sure? I just set up a new 1.6.5.1 filesystem this week:

[EMAIL PROTECTED] ~]# cat /proc/fs/lustre/llite/nobackup-010037e27c00/checksum_pages
0

I am curious to test whether they were on. My MPI_File_write() of a large file was slower than I expected, but it looked like the OSTs were CPU bound (two X4500s).

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 21, 2008, at 2:59 PM, Andreas Dilger wrote:

> On Aug 21, 2008 10:55 -0400, Brock Palen wrote:
>> On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:
>>> This is a big nasty issue, particularly for HPC applications where
>>> performance is a big issue. How does one even begin to benchmark
>>> the performance overhead of a parallel filesystem with
>>> checksumming? I am having nightmares over the ways vendors will
>>> try to play games with performance numbers.
>>
>> True.
>
> Actually, Lustre 1.6.5 does checksumming by default, and that is how
> we do our benchmarking. Some customers will turn it off because the
> overhead hurts them. New customers may not even notice it... Also,
> for many workloads data integrity is much more important than speed.
>
>>> My suspicion is that whenever a parallel filesystem with
>>> checksumming is available and works, all the end-users will just
>>> turn it off anyway because the applications will run twice as fast
>>> without it, regardless of what the benchmarks say... leaving us
>>> back at the same problem.
>>
>> I don't think this will be a problem. On current systems it may be
>> the case that the checksummed filesystem becomes CPU bound. I think
>> the OSTs will be bailed out by CPU speeds going up faster than disk
>> speeds; you just need to limit the number of OSTs per OSS.
>
> I agree that CPU speeds will almost certainly cover this in the
> future.
>
>> Where I could see it being a problem is on the client side, where
>> writes and reads compete with the application for cycles. So far on
>> our clusters I see applications do either compute or IO on a
>> thread/rank, not both, freeing up allocated CPUs for IO.
>
> Yes, that is our experience also.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
[Lustre-discuss] New lustre message
I don't know if this is a bad thing. I was doing a stress test of our new Lustre install and managed to have a client kicked out, with the following message on the OST that kicked it out:

Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST: refuse reconnection from 749b3c01-4ac0-[EMAIL PROTECTED]@tcp to 0x0102f7cdc000; still busy with 6 active RPCs

Was this just a result of hammering the filesystem really hard? Both OSSes became CPU bound, so I would not be surprised if it was simply too much. Any other common causes of this message (I never saw it with our old setup) would be good to know.

Thanks -- the new install is working great. Nice product.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] New lustre message
On Aug 21, 2008, at 11:17 PM, Brian J. Murrell wrote:

> On Thu, 2008-08-21 at 22:23 -0400, Brock Palen wrote:
>> I don't know if this is a bad thing. I was doing a stress test of
>> our new Lustre install and managed to have a client kicked out, with
>> the following message on the OST that kicked it out:
>
> To be clear, the message below is not a client being evicted, but
> rather a client trying to reconnect after it has been evicted.

Thanks -- yes, this message appeared after the eviction notice.

>> Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST:
>> refuse reconnection from 749b3c01-4ac0-[EMAIL PROTECTED]@tcp to
>> 0x0102f7cdc000; still busy with 6 active RPCs
>
> The OSS is refusing to allow the client to reconnect because it is
> still trying to finish the transactions the client had in progress
> when it was evicted.

Good to know that it's just for 'that' client.

>> Was this just a result of hammering the filesystem really hard?
>
> Could be, if the load was atypical and you have tuned your obd_timeout
> for a more typical load. Typically, until AT is in full swing, you
> need to tune for your worst-case scenario.
>
> b.
[Lustre-discuss] lru_size very small
Sorry for throwing so many quick questions at the list in a short time.

Looking at the manual's section on locking, it states: "The default value of LRU size is 100." I looked on our login nodes intending to increase the value; currently Lustre has set lru_size to 32 for the MDS, and on the OSTs: 1 for nine of them, 3 for one, 4 for one, and 0 for three. I should note that all 14 OSTs are spread across two OSSes, both with 16GB of RAM (X4500s). Compared to what the manual says, this sounds really small. Would this be a sign that we don't have enough memory in our OSS/MDS's for our number of clients?

I looked on a few of our clients; many have only 1 for lru_size on the MDS and 0 for all the OSTs. Am I reading something wrong? Or do we have to set this at startup rather than letting Lustre figure it out from clients/RAM, as stated in the manual? This state worries me because it gives me the feeling the cache will not function at all for lack of available locks, and I don't want to end up on the wrong end of "can speed up Lustre dramatically".

Thanks. 633 clients, 16GB MDS/MGS, 2x 16GB OSSes.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] It gives error no space left while lustre still have spaces left.
If I understand right, when you use 'setstripe -c -1' Lustre will try to spread a file's data evenly over all OSTs. Because one of yours gets full, the file can no longer be added to. Lustre does not fall back to using fewer stripes -- most users say "use more stripes" for a reason, and Lustre should not ignore that (and doesn't). I don't know how you would work around this; a "use every stripe you can until it's out of space" mode doesn't exist, as far as I know.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 21, 2008, at 12:13 AM, Chris wrote:

> Hi all, I hit a problem while testing lustre-1.6.5.1 on CentOS-5.2. I
> have four machines (PCs): an MGS co-located with the MDT, OSS-1,
> OSS-2, and CLT. OSS-1 has two disks, formatted as ost01 (40GB) and
> ost02 (15GB). OSS-2 has two disks, formatted as ost03 (23GB) and
> ost04 (5GB).
>
> First, I reformatted the MGS/MDT and mounted it:
>
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --mgs --mdt /dev/hdb
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdb /mnt/mgs
>
> Second, I reformatted the OSTs and mounted them:
>
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdc
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdd
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdc /mnt/ost01
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdd /mnt/ost02
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdc
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdd
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdc /mnt/ost03
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdd /mnt/ost04
>
> Third, I mounted the Lustre file system on CLT:
>
> [EMAIL PROTECTED] ~]# mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/lfs
> [EMAIL PROTECTED] mnt]# df -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00  4.3G  1.9G  2.2G  46% /
> /dev/hda1                         99M   67M   28M  72% /boot
> tmpfs                            252M     0  252M   0% /dev/shm
> [EMAIL PROTECTED]:/testfs         82G  1.6G   77G   2% /mnt/lfs
>
> Fourth, I used lfs to set the stripe parameters on CLT:
>
> [EMAIL PROTECTED] mnt]# lfs setstripe lfs -s 8m -c -1
>
> Fifth, I used dd to test the file system. It gives "no space left"
> once just ost04 (5GB) gets full:
>
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile001 bs=128M count=24
> 24+0 records in
> 24+0 records out
> 3221225472 bytes (3.2 GB) copied, 164.585 seconds, 19.6 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile002 bs=128M count=24
> 24+0 records in
> 24+0 records out
> 3221225472 bytes (3.2 GB) copied, 164.836 seconds, 19.5 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile003 bs=128M count=48
> 48+0 records in
> 48+0 records out
> 6442450944 bytes (6.4 GB) copied, 383.2 seconds, 16.8 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile004 bs=128M count=48
> dd: write error: 'testfile004': No space left on device
> 47+0 records in
> 46+0 records out
> 6301048832 bytes (6.3 GB) copied, 418.321 seconds, 15.1 MB/s
>
> [EMAIL PROTECTED] lfs]# df -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00  4.3G  1.9G  2.2G  46% /
> /dev/hda1                         99M   67M   28M  72% /boot
> tmpfs                            252M     0  252M   0% /dev/shm
> [EMAIL PROTECTED]:/testfs         82G   20G   59G  25% /mnt/lfs
>
> [EMAIL PROTECTED] lfs]# lfs df
> UUID                 1K-blocks      Used  Available  Use%  Mounted on
> testfs-MDT_UUID        2752272    127844    2467144    4%  /mnt/lfs [MDT:0]
> testfs-OST_UUID       41284928   5145080   34042632   12%  /mnt/lfs [OST:0]
> testfs-OST0001_UUID   15481840   5134432    9560912   33%  /mnt/lfs [OST:1]
> testfs-OST0002_UUID   23738812   5141040   17391848   21%  /mnt/lfs [OST:2]
> testfs-OST0003_UUID    5160576   4898364          4   94%  /mnt/lfs [OST:3]
> filesystem summary:   85666156  20318916   60995396   23%  /mnt/lfs
>
> I have no idea about this error. Could anyone tell me how to
> configure Lustre to avoid it? Shouldn't Lustre put files onto the
> OSTs which still have free space instead of the full ones?
>
> Regards, Chris
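The arithmetic behind Chris's ENOSPC matches his lfs df output: with '-c -1' every file spreads evenly across all four OSTs, so writable capacity is capped by the smallest OST, not by the raw total. A quick sketch with the capacities from the post above:

```shell
# OST sizes (GB) from the setup above: ost01..ost04.
capacities="40 15 23 5"
stripe_count=4                  # -c -1 stripes over all four OSTs
smallest=$(printf '%s\n' $capacities | sort -n | head -1)

# Every file spreads evenly, so writes start failing once the smallest
# OST fills: usable capacity is stripe_count * smallest OST.
usable=$((stripe_count * smallest))
echo "usable with -c -1: ${usable} GB (raw total: 83 GB)"
```

That cap, 4 x 5GB = 20GB, is exactly the ~20G shown as used in his df when ost04 hit 94%.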
Re: [Lustre-discuss] Bug 15912
Hi, I never did get a reply on this. We are facing planned production on Monday, so we could really use some guidance. What is the quickest workaround for bug 15912? There are patches now, but they are for unreleased versions; we need a solution for the current 1.6.5.1. Can I change the MGS spec for the OSTs after the fact? Will that work, and how would it be done? Thanks ahead of time.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 14, 2008, at 11:15 AM, Brock Palen wrote:

> I see it is fixed now, as we were being bitten by this. We would like
> to put the new filesystem into use on Monday, so we are trying to get
> this resolved. Question: because it is just a parsing problem in
> mkfs, can the problem be corrected after the filesystem is created?
> If not, how can we work around it? Do I just need to build the mkfs
> out of CVS for 1.6.6?
Re: [Lustre-discuss] Bug 15912
Ignore this. After days of banging my head against the wall, and trying tunefs.lustre (which appears to suffer from the same bug), I found that specifying --mgsnode= more than once is valid. Combined with the wonderful --print option to mkfs.lustre, I think I have my workaround: mkfs.lustre --reformat --ost --fsname=nobackup --mgsnode=mds1 --mgsnode=mds2 --mkfsoptions="-j -J device=/dev/md27" /dev/md17 Thanks. Though I am scared about the behavior of tunefs.lustre if we ever need to re-IP the nodes; reformatting is not really an option. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Aug 18, 2008, at 8:16 PM, Brock Palen wrote: Hi, I never did get a reply for this. We are faced with planned production on Monday, so we could really use some guidance. What is the quickest way to work around bug 15912? There are patches now, but they are for unreleased versions. We need a solution for the current 1.6.5.1. Can I change the MGSSPEC for the OSTs after the fact? And will that work? How would this be done? Thanks ahead of time. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
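For the record, the --print option mentioned above makes this kind of experiment safer, since it shows what would be written without formatting anything. A sketch using the poster's devices (the quoting around --mkfsoptions is my assumption; unquoted, the shell would split the journal arguments):

```shell
# Dry run: print the config that would be written, format nothing
mkfs.lustre --print --reformat --ost --fsname=nobackup \
    --mgsnode=mds1 --mgsnode=mds2 \
    --mkfsoptions="-j -J device=/dev/md27" /dev/md17

# Re-run without --print to actually format the OST
```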
[Lustre-discuss] Bug 15912
I see it is fixed now, as we were being bit by this. We would like to put the new filesystem into use on Monday, thus we are trying to get this resolved. Question: because it is just a parsing problem in mkfs, can the problem be corrected after the filesystem is created? If not, how can we work around this? Do I just need to build the mkfs out of CVS for 1.6.6? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] mv_sata patch
Is the cache patch for mv_sata noted in the Sun paper on the x4500 available? Or has it been rolled into the source distributed by Sun? Trying to avoid data loss. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] stata_mv mv_stata which is better?
Thanks, I might look into it. Right now the performance of the stock driver that comes with the kernel is more than the 4 1-gig connections we will be using. I am having other issues now with the new filesystem that I did not have with our old one; that will be a new question though. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Aug 7, 2008, at 2:11 PM, Mike Berg wrote: Brock, It is recommended that mv_sata is used on the x4500. It has been a while since I have built this up myself, and a few Lustre releases back, but I do understand the pain. I hope that with Lustre 1.6.5.1 on RHEL 4.5 you can just build mv_sata against the provided Lustre kernel, alias it accordingly in modprobe.conf, create a new initrd, then update grub. I don't have gear handy to give it a try, unfortunately. Please let me know your experiences with this if you pursue it. Enclosed is a somewhat dated document on what we have found to be the best configuration of the x4500 for use with Lustre. Ignore the N1SM parts. We optimized for performance and RAS with some sacrifices on capacity. Hopefully this is a useful reference. Regards, Mike Berg Sr. Lustre Solutions Engineer Sun Microsystems, Inc. Office/Fax: (303) 547-3491 E-mail: [EMAIL PROTECTED] X4500-preparation.pdf On Aug 6, 2008, at 1:48 PM, Brock Palen wrote: Is it still worth the effort to try and build mv_sata when working with an x4500? sata_mv from RHEL4 does not appear to show some of the stability problems discussed online before. I am curious because the build system Sun provides with the driver does not play nicely with the Lustre kernel source packaging. If it is worth all the pain, have others already figured it out? Any help would be appreciated.
Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] operation 400 on unconnected MGS
The problem I was referring to: with the new filesystem we just created I am getting the following problem. Clients lose connection to the MGS and the MGS says it evicted them; the machines are on the same network and there are no errors on the interfaces. The MGS says: Lustre: MGS: haven't heard from client e8eb1779-5cea-9cc7-b5ae-4c5ccf54f5ca (at [EMAIL PROTECTED]) in 240 seconds. I think it's dead, and I am evicting it. LustreError: 9103:0:(mgs_handler.c:538:mgs_handle()) lustre_mgs: operation 400 on unconnected MGS LustreError: 9103:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x24929/t0 o400-?@?: 0/0 lens 128/0 e 0 to 0 dl 1218142953 ref 1 fl Interpret:/0/0 rc -107/0 The "operation 400 on unconnected MGS" is the only new message I am not familiar with. Once a client loses its connection to the MGS, I will see the OSTs start booting the client also. Servers are 1.6.5.1; clients are patchless 1.6.4.1 on RHEL4. Any insight would be great. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] stata_mv mv_stata which is better?
Is it still worth the effort to try and build mv_sata when working with an x4500? sata_mv from RHEL4 does not appear to show some of the stability problems discussed online before. I am curious because the build system Sun provides with the driver does not play nicely with the Lustre kernel source packaging. If it is worth all the pain, have others already figured it out? Any help would be appreciated. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Luster recovery when clients go away
One of our OSS's died with a panic last night. Between when it was found (no failover) and restarted, two clients had died also (nodes crashed by user OOM). Because of this the OSTs now are looking for 626 clients to recover when only 624 are up. The 624 recover in about 15 minutes, but the OSTs on that OSS hang waiting for the last two, which are dead and not coming back. Note the MDS reports only 624 clients. Is there a way to tell the OSTs to go ahead and evict those two clients and finish recovering? Also, "time remaining" has been 0 since it was booted. How long will the OSTs wait before letting operations continue? Is there any rule for speeding up recovery? The OSS that crashed sees very little CPU/disk/network traffic while recovery is going on, so any way to speed it up, even if it results in a higher load, would be great to know.

status: RECOVERING
recovery_start: 1217509142
time remaining: 0
connected_clients: 624/626
completed_clients: 624/626
replayed_requests: 0/??
queued_requests: 0
next_transno: 175342162

status: RECOVERING
recovery_start: 1217509144
time remaining: 0
connected_clients: 624/626
completed_clients: 624/626
replayed_requests: 0/??
queued_requests: 0
next_transno: 193097794

Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
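For what it's worth, recovery can be cut short by hand on the OSS; a hedged sketch (device names are illustrative, and the proc paths moved around between 1.6 releases):

```shell
# List local Lustre devices to find the obdfilter (OST) device names
lctl dl

# Abort recovery on a stuck OST: clients that never reconnect are
# evicted and the OST leaves RECOVERING without waiting out the timer
lctl --device nobackup-OST0000 abort_recovery

# Watch the connected/completed client counts
cat /proc/fs/lustre/obdfilter/nobackup-OST0000/recovery_status
```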
[Lustre-discuss] lustre 1.6.5.1 panic on failover
I have two machines I am setting up as my first MDS failover pair. The two Sun x4100's are connected to an FC disk array. I have set up heartbeat with IPMI for STONITH. The problem is when I run a test on the host that currently has the mds/mgs mounted ('killall -9 heartbeat'), I see the IPMI shutdown, and when the second 4100 tries to mount the filesystem it kernel panics. Has anyone else seen this behavior? Is there something I am running into? If I do an 'hb_takeover' or shut down heartbeat cleanly, all is well. Only if I simulate heartbeat failing does this happen. Note I have not tried yanking power yet, but I want to simulate an MDS in a semi-dead state and ran into this. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 What's a good tool to grab this? It's more than one page long, and the machine does not have serial ports. Links are ok. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 31, 2008, at 5:14 PM, Brian J. Murrell wrote: On Thu, 2008-07-31 at 16:57 -0400, Brock Palen wrote: Problem is when I run a test on the host that currently has the mds/mgs mounted 'killall -9 heartbeat' I see the IPMI shutdown and when the second 4100 tries to mount the filesystem it does a kernel panic. We'd need to see the *full* panic info to do any amount of diagnostics. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIkldGMFCQB4Bvz5QRAjEqAJ99IN1m0/JJcqyh/Dm7WF0w5nd2eQCfT9IT w39dxPiWCdXKzpLEo4WxBSU= =Gnsm -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS failover
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Thank you. I was only looking at mkfs.lustre; I didn't realize it's the hosts that need to look there, not the MGS filesystem itself. Does mgsspec also work for --mgsnode= when creating a file system? mkfs.lustre [EMAIL PROTECTED]:[EMAIL PROTECTED] Would that be valid? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 30, 2008, at 10:29 AM, Brian J. Murrell wrote: On Wed, 2008-07-30 at 09:48 -0400, Brock Palen wrote: The manual does not make much sense when it comes to MGS failover. Manual: Note - The MGS does not use the --failnode option. This is true. You need to set the command on all other nodes of the filesystem (servers and clients), about the failover options for the MGS. This is true also. Use the --mgsnode parameter on servers and mount address for clients. Also true. The servers need to contact the MGS for configuration information; Also true. they cannot query the MGS about the failover partner. This part is either unclear or wrong. I guess the question is what the writer was referring to as "they." You could file a bug about that. This does not make any sense at all, other than you can't use --failnode On the MGS, yes. On servers, yes you can use that option. and that clients can't check with two different hosts for MGS data. They sure can! That's what the mgsspec:=mgsnode[:mgsnode] syntax in the mount.lustre manpage is all about. Our MGS will be on its own LUN setup with heartbeat between two nodes that are also working as an MDS pair. Good. While Heartbeat takes care of mounting the MGS file system, how can we tell clients if mds1 is down use mds2 for MGS data $ man mount.lustre Check out mgsspec in the OPTIONS section. b. 
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIkH5JMFCQB4Bvz5QRArwBAJ0TBBFVIBWiLQIt1e6kbG/n6Ufn5wCcCx1L /KJFr81OkKkuTTW0N4LtcUk= =hfmn -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
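The mgsspec syntax Brian points at looks like this in practice (host names here are placeholders):

```shell
# Client mount: ask mds1 for the MGS, fall back to mds2 if it is down
mount -t lustre mds1@tcp0:mds2@tcp0:/testfs /mnt/testfs

# Servers get equivalent redundancy by repeating --mgsnode at format time
mkfs.lustre --ost --fsname=testfs \
    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 /dev/sdb
```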
[Lustre-discuss] rpm kernel-devel package
On the download site, I am trying to figure out which RPM I need to download that would match the 'kernel-devel' equivalent for Lustre. I need to build the Sun multipath driver against that kernel for our new MDS machines, but it is not very obvious whether I need: lustre-source-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm or: kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm Is there a reason why there is not just a normal kernel-lustre-smp-devel, just like RedHat/SLES provides? Thanks! Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Can't build sun rdac driver against lustre source.
Hi, I ran into two problems. The first was easy to resolve: /bin/sh: scripts/genksyms/genksyms: No such file or directory /bin/sh: scripts/mod/modpost: No such file or directory I just had to copy genksyms and mod from linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 to linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj I figured you should be aware of this, in case it's a problem with Sun's build system for their multipath driver or the Lustre source package. This is on RHEL4, using the Lustre RPMs from Sun's website. The next problem I am stuck on is: In file included from mppLnx26_spinlock_size.c:51: /usr/include/linux/autoconf.h:1:2: #error Invalid kernel header included in userspace mppLnx26_spinlock_size.c: In function `main': mppLnx26_spinlock_size.c:102: error: `spinlock_t' undeclared (first use in this function) mppLnx26_spinlock_size.c:102: error: (Each undeclared identifier is reported only once mppLnx26_spinlock_size.c:102: error: for each function it appears in.) make: *** [mppLnx_Spinlock_Size] Error 1 I guess what I should really ask is: has anyone ever made multipath work with a Sun 2540 array for use as the MDS/MGS file system? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Yes, that worked! Thank you very much. A hint to Sun: the 2540 is a very nice array for Lustre; it would be good if all the tools that come with it were checked to work out of the box with Lustre. Just 2 cents. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 25, 2008, at 2:19 PM, Stuart Marshall wrote: Hi, I have compiled and used the Sun rdac driver and my modified makefile is attached. The sequence I've used (perhaps not the best) is:
- cd /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/source/
- cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp .config
- make clean
- make mrproper
- make prepare-all
- cd /tmp
- tar xf path_to_rdac_tarfile/rdac-LINUX-09.01.B2.74-source.tar
- cd linuxrdac-09.01.B2.74/
- cp path_to_my_makefile/Makefile_linuxrdac-09.01.B2.74 Makefile
- make clean
- make uninstall
- make
- make install
- vim /boot/grub/menu.lst (initrd - mpp)
- reboot
The changes in the Makefile may fix your problem. I'm using 6140 Sun arrays and also plan to use a 2540 as the MDT soon. Stuart Makefile_linuxrdac-09.01.B2.74 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Stuart, It looks like you have a newer rdac package than Sun has on their website. So while your Makefile builds everything, it tries to install a bit of code that does not exist. FYI. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 25, 2008, at 2:30 PM, Brock Palen wrote: Yes, that worked! Thank you very much. A hint to Sun: the 2540 is a very nice array for Lustre; it would be good if all the tools that come with it were checked to work out of the box with Lustre. Just 2 cents. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre locking up on login/interactive nodes
Every so often Lustre locks up. It will recover eventually. The processes show up in 'D' uninterruptible I/O wait; this time it was 'ar' making an archive. Dmesg then shows:
Lustre: nobackup-MDT-mdc-0101fc467800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail.
LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912452/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 488/768 ref 1 fl Rpc:P/0/0 rc 0/0
LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912464/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 440/768 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12653753 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12195682 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped 46 previous similar messages
Lustre: nobackup-MDT-mdc-0101fc467800: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED]
LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode 11441446 mdc close failed: rc = -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped 113 previous similar messages
Are there special options that should be set on interactive/login nodes? 
I remember something about how much memory should be available on login vs batch nodes. But I don't know how to change that, I just assumed lustre would use it. Login nodes have 8GB. __ www.palen.serveftp.net Center for Advanced Computing http://cac.engin.umich.edu [EMAIL PROTECTED] ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre locking up on login/interactive nodes
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 21, 2008, at 11:51 AM, Brian J. Murrell wrote: On Mon, 2008-07-21 at 11:43 -0400, Brock Palen wrote: Every so often Lustre locks up. It will recover eventually. The processes show up in 'D' uninterruptible I/O wait; this time it was 'ar' making an archive. Dmesg then shows: Syslog is usually a better place to get messages from as it gives some context as to the time of the messages. Ok, will keep in mind. It looks the same though. It's odd: if I log in to the same machine I can move to that directory, list the files, read files on those OSTs, and so on. And notice this was an eviction by the MDS; I see no lost network connections or network errors. Strange; not good, not good at all. The syslog data is the same, it's below: Brock
Jul 21 11:38:39 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0101fc467800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail.
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912452/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 488/768 ref 1 fl Rpc:P/0/0 rc 0/0
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912464/t0 o101-nobackup-[EMAIL PROTECTED]@tcp:12 lens 440/768 ref 1 fl Rpc:/0/0 rc 0/0
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12653753 mdc close failed: rc = -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12195682 mdc close failed: rc = -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped 46 previous similar messages
Jul 21 11:38:39 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0101fc467800: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED]
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode 11441446 mdc close failed: rc = -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped 113 previous similar messages
-BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIhLOqMFCQB4Bvz5QRAgWvAJ9HhQAo9JZdcS2iyMFb19HzcgkwcQCdGosB sHaligENGxnJHdMu5116D5U= =GOlg -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] OSS load in the roof
Our OSS went crazy today. It is attached to two OSTs. The load is normally around 2-4; right now it is 123. I noticed this to be the cause: root 6748 0.0 0.0 0 0 ? D May27 8:57 [ll_ost_io_123] All of them are stuck in uninterruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? I ask about disk failure because I also see this message: mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) and: Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Thanks Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OSS load in the roof
On Jun 27, 2008, at 1:39 PM, Bernd Schubert wrote: On Fri, Jun 27, 2008 at 01:07:32PM -0400, Brian J. Murrell wrote: On Fri, 2008-06-27 at 12:44 -0400, Brock Palen wrote: All of them are stuck in uninterruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? Well, they are certainly stuck because of some blocking I/O. That could be disk failure, indeed. mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) That does not look like a picture of happiness, indeed, no. You have SCSI commands aborting. Well, these messages are not nice of course, since the mpt error handler got activated, but in principle a SCSI device can recover then. Unfortunately, the verbosity level of SCSI makes it impossible to figure out what the problem actually was. Since we suffered from severe SCSI problems, I wrote quite a number of patches to improve the situation. We now at least can understand where the problem came from and also have slightly improved error handling. These are presently for 2.6.22 only, but my plan is to send them upstream for 2.6.28. Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Those are just fallout from the above disk situation. Probably the device was offlined, and actually this also should have been printed in the logs. Brock, can you check the device status (cat /sys/block/sdX/device/state)? I/O is still flowing from both OSTs on that OSS: [EMAIL PROTECTED] ~]# cat /sys/block/sd*/device/state running running Sigh, it only needs to live till August when we install our x4500's. I think it's safe to send a notice to users that they may want to copy their data. 
Cheers, Bernd ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OSS load in the roof
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jun 27, 2008, at 1:07 PM, Brian J. Murrell wrote: On Fri, 2008-06-27 at 12:44 -0400, Brock Palen wrote: All of them are stuck in un-interruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? Well, they are certainly stuck because of some blocking I/O. That could be disk failure, indeed. mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) That does not look like a picture of happiness, indeed, no. You have SCSI commands aborting. While the array was reporting no problems one of the disk was really lagging the others. We have swapped it out. Thanks for the feedback everyone. Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup- OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Those are just fallout from the above disk situation. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIZUq/MFCQB4Bvz5QRAvacAJ9jkhi+2KgfbJ7bUI/KfHJ0Hnq1wQCeNgHO d6+tzscwCqwYtuHXmzT2kFI= =5p1N -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre delete efficency
On Jun 26, 2008, at 1:57 PM, Stew Paddaso wrote: We are considering using Lustre as our backend file platform. The specific application involves storing a high volume of sequential data writes, with a moderate amount of reads (mostly sequential, with some random seeks). Our concern is with reclaiming space. As the file system fills, we need to be able to quickly delete the oldest files. Does Lustre have an efficient file delete? I'm not expecting specific metrics (though they would be nice if available), just some general info about the Lustre delete process (does it immediately reclaim the space, or do it 'lazily' in the background? etc.). I don't know specifics about whether space reclaiming is 'lazy' or not, but from what I have seen, compared to regular ext3, deleting large files on Lustre was very fast. I expect this is because ldiskfs is extent-based and regular ext3 is not. If I am wrong on this, someone please correct me; I really would like to know this also. For me, deleting a large number of files _feels_ very quick compared to our NFS bobcat from Onstor also. Even an operation like the following was much quicker (I wish there was a better way to do this): du -h --max-depth=1 Thanks. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] lustre and multi path
Our new lustre hardware arrived from Sun today. Looking at the dual MDS and FC disk array for it, we will need multipath. Has anyone ever used multipath with lustre? Are there any issues? If we set up regular multipath via LVM, lustre won't care, as far as I can tell from browsing the archives. What about multipath without LVM? Our StorageTek array has dual controllers with dual ports going to dual-port FC cards in the MDS's. Each MDS has a connection to both controllers, so we will need multipath to get any advantage from this. Comments? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
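For the non-LVM case, the usual approach is device-mapper-multipath: give each LUN's WWID an alias and point mkfs.lustre/mount at the /dev/mapper name, so the path failover is invisible to lustre. A minimal sketch (the WWID, alias, and blacklist entry below are made-up examples; use the device {} stanza your array vendor recommends):

```
# /etc/multipath.conf -- minimal sketch, values are illustrative
defaults {
    user_friendly_names yes
}
blacklist {
    devnode "^sda$"        # local system disk, adjust to your hosts
}
multipaths {
    multipath {
        wwid  3600a0b800012345600000001aabbccdd   # example WWID from `multipath -ll`
        alias mdt0
    }
}
```

Then format and mount the stable name, e.g. mkfs.lustre ... /dev/mapper/mdt0, rather than any individual /dev/sdX path.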
[Lustre-discuss] external journals
What's a good way to find out if your (our) workload would benefit from external journals? Our OST's are x4500's and I get little if any activity on the journals from my regular benchmarks. What are the benefits? Does ldiskfs do anything intelligent with external journals? What should we see the most help with? Or should we just devote these disks to being another OST? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
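For reference, an external journal is set up at format time: the journal disk is formatted as a journal device, and the OST is created pointing at it. A sketch, assuming fresh (reformattable) targets; /dev/sdx and /dev/sdy are placeholders and the fsname/mgsnode values are examples:

```shell
# Format the dedicated disk as an ext3/ldiskfs external journal device:
mke2fs -O journal_dev -b 4096 /dev/sdx

# Create the OST telling ldiskfs to use that device for its journal:
mkfs.lustre --ost --fsname=nobackup --mgsnode=mds@tcp0 \
    --mkfsoptions="-J device=/dev/sdx" /dev/sdy
```

The benefit, when there is one, is that journal commits (small synchronous writes) stop competing with data writes for the same spindles; a streaming benchmark that barely touches the journal won't show it, so a metadata- or small-file-heavy test is a fairer trial.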
[Lustre-discuss] Lustre access locking up login nodes
I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: LustreError: 3976:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 01012ce12000 LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_statfs operation failed with -107 LustreError: Skipped 1 previous similar message Lustre: nobackup-MDT-mdc-0100e9e9ac00: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete. There are no network connection issues between the login nodes. When this happens the client does not recover until we reboot the node. This does happen at times on the compute nodes, but I see it most on login hosts. If I just go to the lustre mount and try to ls it, it will hang forever. Many times when lustre screws up it recovers, but more and more it does not, and we see these bulk errors followed by mds errors. We are using lustre 1.6.x Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Lustre access locking up login nodes
Ahh, didn't realize this was related to that. Good to know a fix is in the works (2 x4500's are on the way, so we have made a commitment to lustre). How would I make this option the default on boot? There isn't an llite module I see on the clients. I can pdsh to all the clients, but machines do get rebooted sometimes. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On May 16, 2008, at 4:13 PM, Brian J. Murrell wrote: On Fri, 2008-05-16 at 15:48 -0400, Brock Palen wrote: I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: Review the list archives for statahead problems. b.
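Since the statahead tunable on 1.6 clients lives under /proc rather than as a module option, one common workaround is to set it from a boot script after the mount comes up. A sketch (the statahead_max path matches 1.6-era clients, but verify it against your version; the glob covers all mounts):

```shell
# e.g. appended to /etc/rc.local (or an init script ordered after the
# lustre mount) on each client:
for f in /proc/fs/lustre/llite/*/statahead_max; do
    [ -e "$f" ] && echo 0 > "$f"    # 0 disables statahead entirely
done
```

That avoids having to pdsh the setting out again after every reboot.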
[Lustre-discuss] MDS Fail-Over planning.
I know some users talked about DRBD for the shared disk on the MDS. What was the conclusion of this? Bad idea? I do some highly available NFS using this exact same setup: DRBD provides shared storage, Heartbeat is used to monitor hosts, and IPMI is used by Heartbeat to power down hosts that are to be killed. The plan on our table right now is two thumpers as the OSS's, then two x4100s or x4200s with mirrored SAS drives, shared across with DRBD and Heartbeat. Any comments? Any issues to be aware of? Anyone running something similar? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
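For anyone sketching this out: the one setting that matters most for an MDT on DRBD is protocol C (fully synchronous replication), otherwise a failover can land on a peer missing the last committed metadata transactions. A minimal resource stanza, with hostnames, disks, and IPs as placeholder examples:

```
# /etc/drbd.conf sketch -- names and addresses are illustrative
resource mdt {
    protocol C;              # synchronous: write completes on both nodes
    on mds1 {
        device    /dev/drbd0;
        disk      /dev/sda3;       # the mirrored SAS volume
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on mds2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

mkfs.lustre then targets /dev/drbd0 on whichever node is primary, and Heartbeat's resource script promotes DRBD and mounts the MDT on failover, with IPMI STONITH to guarantee the old primary is really down.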
Re: [Lustre-discuss] state of sun x4500 drivers
That's disappointing, thanks for the input though. The paper at: http://wiki.lustre.org/images/7/79/Thumper-BP-6.pdf points out how to patch it to enable that functionality, but we want to keep with the CFS stock kernel. 30MB/s might be fine for us, we only plan on bonding the 4 thumper 1Gig-e interfaces. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 23, 2008, at 1:00 PM, Brian Behlendorf wrote: Recently I have also been doing some linux work with the x4500 and I have been using the sata_mv driver (v0.81). The driver will properly detect all the drives and you may access them safely. However, from what I've seen the driver needs some further development work to actually perform well. I see only 30 MB/s write rates to a single disk using a simple streaming dd test. Much of this bad performance may simply be due to the fact that the driver cannot enable the disk write-back cache, forcing you to use write-thru mode. So currently the bottom line is linux will work on the x4500. But to get it working well someone is going to need to invest some development effort to improve the linux driver. Good luck, Brian There was some discussion about the driver/module for the SATA controllers in the thumper (x4500) in the linux kernel. My question is if we bought one of these, would the CFS kernel have everything needed to use the thumper in a safe way. Thank You.
Re: [Lustre-discuss] lfs setstripe
On Apr 17, 2008, at 10:48 PM, Kaizaad Bilimorya wrote: On Thu, 17 Apr 2008, Brock Palen wrote: I don't think you need to do this. If I understand right, you can set the stripe size of the mount, and everything inside that directory inherits it, unless they themselves were explicitly set. Hi Brock, thanks for the reply. I have set the stripe count on the lustre mount using lfs setstripe, but the problem is that any subdirectories that already existed under this mount will have the default filesystem stripe count and not the new one I set, so any new files created under these existing subdirectories will inherit their parent directory stripe count and not the newly set one from the lustre mount. Ahh, I see. I really don't know; I would try walking the system and changing all the old directories. I have not had to do this myself. eg: /lustremount - lfs setstripe /lustremount 0 -1 2 /lustremount/existing_dir - has filesystem default stripe count (1 in this case) /lustremount/new_dir - gets stripe count of parent (2 in this case) /lustremount/existing_dir/newfile - has filesystem default stripe count of 1 So that is why I have to do either option 1 (change default) or 2 (traverse and set explicitly for all existing dirs) that I specified, but I would like to know if there are any performance or other reasons not to do option 2. thanks -k Also files that already are created will keep the stripe settings they were created with. You would need to copy them, and move over the old one, to change to the new stripe settings. Check the lustre manual, they have something about this. On Apr 17, 2008, at 10:34 AM, Kaizaad Bilimorya wrote: Hello, I would like to adjust the stripe count for our lustre filesystem. Would it be better to: 1) Kill all jobs, unmount the lustre filesystem from all clients, and then adjust the default stripe count for the lustre filesystem on the MDS using lctl. 
or 2) Use find and the lfs setstripe command to traverse and set the stripe count for all directories in a currently mounted lustre filesystem. Besides the traversal cost of the filesystem, are there other disadvantages, performance reasons, or other reasons not to use option 2? thanks -k
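Option 2 from the thread can be sketched in two commands, using the same positional setstripe syntax the thread itself uses (size 0 = default, offset -1 = any OST, count 2; newer lfs versions use -s/-i/-c flags instead):

```shell
# Set the new default on the mount point, so new top-level dirs inherit it:
lfs setstripe /lustremount 0 -1 2

# Walk the existing directories so files created under them inherit it too
# (files that already exist keep their old layout until copied):
find /lustremount -type d -exec lfs setstripe {} 0 -1 2 \;
```

Other than the traversal cost, the directory stripe setting is just an EA on each directory, so there should be no ongoing performance penalty from having set it explicitly everywhere.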
Re: [Lustre-discuss] lfs setstripe
I don't think you need to do this. If I understand right, you can set the stripe size of the mount, and everything inside that directory inherits it, unless they themselves were explicitly set. Also, files that already are created will keep the stripe settings they were created with. You would need to copy them, and move over the old one, to change to the new stripe settings. Check the lustre manual, they have something about this. You can use 'getstripe' to see what a file/directory uses for its settings. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 17, 2008, at 10:34 AM, Kaizaad Bilimorya wrote: Hello, I would like to adjust the stripe count for our lustre filesystem. Would it be better to: 1) Kill all jobs, unmount the lustre filesystem from all clients, and then adjust the default stripe count for the lustre filesystem on the MDS using lctl. or 2) Use find and the lfs setstripe command to traverse and set the stripe count for all directories in a currently mounted lustre filesystem. Besides the traversal cost of the filesystem, are there other disadvantages, performance reasons, or other reasons not to use option 2? thanks -k
Re: [Lustre-discuss] MGS and loop devices
I don't know if this still applies, but back when I was doing some work with the Xen hypervisor, loopback devices did not provide a safe place to put files in a power failure. Loopback did not make sure that things in memory were flushed to the file and synced to the disk, leaving dirty data in memory. Might want to verify this; just don't get caught with stuff in RAM. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 14, 2008, at 3:12 PM, Jakob Goldbach wrote: On Mon, 2008-04-14 at 17:40 +0200, Fereyre Jerome wrote: Has anybody used loop devices for MGT? Since there's not so much information stored in this Target, it can be a good alternative to disk partitions... You could place it on the same partition/volume as the MDT, but I believe you get less noise in dmesg during start/stop if you have the MGT and MDT separate, as this allows you to start the MDT after your OSSs. I'm using an LVM volume for my MGS (and MDT - wanted to try the fast scanner for backup, which requires snapshotting). My MGS size is 64MB - about 10% is used in a two OSS + 3 clients setup. I'm also interested in knowing about how much space the MGT uses for a many-node system. /Jakob
[Lustre-discuss] filesystem UID' GID's
Is an /etc/passwd with all the filesystem users' UIDs required only on the MDS? Or do the OST's need it also? Testing for me shows only the MDS, but I could be wrong. We don't use LDAP or anything like that at the moment for UID/GID mapping. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] more problems with lustre,
I found today that a very large number of nodes that are lustre clients show ptlrpcd taking 100% cpu, and the lustre mount is completely unavailable! I have attached the output of 'lctl dk /tmp/data' to this message. Any insight would be helpful. I am afraid, though, that this, along with the problem of clients being evicted all the time, means our evaluation of lustre will end, and we will not be using it in the future :-( When it works it works great, but our group cannot deal with how unstable it is. We will try lustre again when it hits version 2.0 (running 1.6.4.1 right now with patchless clients). Thanks for all the help you have given us while we have been evaluating it! [attachment: data (binary)] Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] lustre dstat plugin
On Mar 9, 2008, at 10:03 PM, Aaron Knister wrote: Just wondering if either of you have used collectl; if so, which do you prefer- dstat or collectl? Never used it; looks like they solve the same problem. I like dstat for the simple plugins (if you're a better Python programmer than me), and how you can pull out results. For example, I use the following on our lustre OSS with two OST's, sda and sdb: dstat -D sda,sdb,total That gives me per-disk stats and a total. Similar tools could be made for collectl, I'm sure. Brock -Aaron On Mar 7, 2008, at 7:03 PM, Brock Palen wrote: On Mar 7, 2008, at 6:58 PM, Kilian CAVALOTTI wrote: Hi Brock, On Wednesday 05 March 2008 05:21:51 pm Brock Palen wrote: I have written a lustre dstat plugin. You can find it on my blog: That's cool! Very useful for my daily work, thanks! Thanks! It's the first Python I ever wrote. It only works on clients, and has not been tested on multiple mounts. It's very simple; it just reads /proc/ It indeed doesn't read stats for multiple mounts. I slightly modified it so it can display read/write numbers for all the mounts it finds (see the attached patch). This is a great idea Here's a typical output for an rsync transfer from scratch to home: -- 8 --- $ dstat -M lustre Module dstat_lustre is still experimental. --scratch---home--- read write: read write 110M 0 : 0 110M 183M 0 : 0 183M 184M 0 : 0 184M -- 8 --- Maybe it could be useful to also add the other metrics from the stat file, but I'm not sure which ones would be the most relevant. And it would probably be wise to do that in a separate module, like lustre_stats, to avoid clutter. Yes, dstat comes with plugins for nfsv3 and has two modules, dstat_nfs3 and dstat_nfs3op, which has extended details. So I think this would be a good idea to follow that model. Anyway, great job, and thanks for sharing it! Thanks again. 
Cheers, -- Kilian [attachment: dstat_lustre.diff] Aaron Knister Associate Systems Analyst Center for Ocean-Land-Atmosphere Studies (301) 595-7000 [EMAIL PROTECTED]
Re: [Lustre-discuss] yet another lustre error
On Mar 9, 2008, at 10:01 PM, Aaron Knister wrote: Hi! I have a few questions for you- 1. How many nodes was his job running on? Around 64 serial jobs accessing the same directory (not the same files). 2. What version of lustre and linux kernel are you running on your servers/clients? Lustre servers: 2.6.9-55.0.9.EL_lustre.1.6.4.1smp Clients: 2.6.9-67.0.1.ELsmp 3. What ethernet module are you using on the servers/clients? Most use the tg3, some use e1000. I honestly am not sure what the RPC errors mean, but I've had similar issues caused by ethernet-level errors. Over the weekend the MDS/MGS went into an unhealthy state, forcing a reboot+fsck, and when it came back up the directory was accessible again and jobs started working again. -Aaron On Mar 7, 2008, at 6:45 PM, Brock Palen wrote: On a file system that's been up for only 57 days, I have 505 lustre-log. dumps. The problem at hand is a user whose many jobs are now hung trying to create a directory from his pbs script. On the clients I see: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16 LustreError: Skipped 2 previous similar messages On every client his jobs are on. In the most recent /tmp/lustre-log. on the MDS/MGS I see this message: @@@ processing error (-16) [EMAIL PROTECTED] x12808293/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 ldlm_lib.c target_handle_reconnect nobackup-MDT: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting ldlm_lib.c target_handle_connect nobackup-MDT: refuse reconnection from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x0100069a7000; still busy with 2 active RPCs ldlm_lib.c target_send_reply_msg @@@ processing error (-16) [EMAIL PROTECTED] x11199816/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 I also see messages about active RPCs in other logs. What would this mean? Is something stuck someplace? 
Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 Aaron Knister Associate Systems Analyst Center for Ocean-Land-Atmosphere Studies (301) 595-7000 [EMAIL PROTECTED]
Re: [Lustre-discuss] socknal_sd00 100% lower?
On Mar 7, 2008, at 8:51 AM, Maxim V. Patlasov wrote: Brock, If our IO servers are seeing extended periods of socknal_sd00 at 100% cpu, would this cause a bottleneck? Yes, I think so. If so, it's a single-homed host; would adding another interface to the host help? Probably not. It could only help in the case where you have several CPUs but something prevents ksocklnd from spreading the load over them. The servers are dual-cpu systems, but I only see a single socknal_sd thread. Is there threading anyplace? Yes, ksocklnd spawns a separate socknal_sd thread for each CPU/core that you have. There are two algorithms for spreading the load - you can play with the enable_irq_affinity modparam flag. I see some things in logs about setting cpu affinity; I'll check out the manual some more. Or is a faster cpu the only way out? I believe you either need a faster CPU or a faster system bus. If a slow system bus isn't your case, increasing the number of CPUs will also do. OK Sincerely, Maxim
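The enable_irq_affinity knob mentioned above is a ksocklnd module parameter, so it goes in the module options rather than /proc. A sketch of how it would be set (flag name per 1.6-era ksocklnd; check your version's manual for which value selects which load-spreading algorithm before relying on it):

```shell
# /etc/modprobe.conf fragment on the servers; takes effect when the
# ksocklnd module is next loaded (i.e. after unmount/remount or reboot):
options ksocklnd enable_irq_affinity=0
```

With affinity disabled, ksocklnd should let its socknal_sd threads float across CPUs instead of pinning work where the NIC interrupts land, which is the case where a second core can actually absorb the load.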
[Lustre-discuss] yet another lustre error
On a file system that's been up for only 57 days, I have 505 lustre-log. dumps. The problem at hand is a user whose many jobs are now hung trying to create a directory from his pbs script. On the clients I see: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16 LustreError: Skipped 2 previous similar messages On every client his jobs are on. In the most recent /tmp/lustre-log. on the MDS/MGS I see this message: @@@ processing error (-16) [EMAIL PROTECTED] x12808293/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 ldlm_lib.c target_handle_reconnect nobackup-MDT: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting ldlm_lib.c target_handle_connect nobackup-MDT: refuse reconnection from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x0100069a7000; still busy with 2 active RPCs ldlm_lib.c target_send_reply_msg @@@ processing error (-16) [EMAIL PROTECTED] x11199816/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 I also see messages about active RPCs in other logs. What would this mean? Is something stuck someplace? Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] socknal_sd00 100% lower?
If our IO servers are seeing extended periods of socknal_sd00 at 100% cpu, would this cause a bottleneck? If so, it's a single-homed host; would adding another interface to the host help? Is there threading anyplace? Or is a faster cpu the only way out? Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] lustre dstat plugin
I have written a lustre dstat plugin. You can find it on my blog: http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,31/ It only works on clients, and has not been tested on multiple mounts. It's very simple; it just reads /proc/ Example: dstat -a -M lustre total-cpu-usage -dsk/total- -net/total- ---paging-- ---system-- lustre-1.6- usr sys idl wai hiq siq| read writ| recv send| in out | int csw | read writ 23 53 1 21 0 0| 0 0 |3340k 4383k| 0 0 | 3476 198 | 16M 22M 13 69 16 2 0 1| 0 0 |1586k 16M| 0 0 | 3523 424 | 24M 14M 69 30 0 0 0 1| 0 8192B|1029k 18M| 0 0 | 3029 88 | 0 0 Patches/comments welcome. Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
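For the curious, the heart of such a plugin is just parsing the cumulative byte counters out of /proc/fs/lustre/llite/*/stats. A self-contained sketch against a made-up stats snapshot (the field layout - count, min, max, sum after the [bytes] tag - matches 1.6-era llite stats, but verify on your version; the numbers are invented):

```shell
# Sample of what a client's llite stats file looks like (values invented):
stats='snapshot_time         1204917.123 secs.usecs
read_bytes            1024 samples [bytes] 0 1048576 16777216
write_bytes           512 samples [bytes] 0 1048576 23068672'

# Field 7 is the running sum of bytes; the plugin diffs successive
# snapshots of this value to get a per-interval rate.
echo "$stats" | awk '$1 == "read_bytes"  { print "read:",  $7 }
                     $1 == "write_bytes" { print "write:", $7 }'
# -> read: 16777216
#    write: 23068672
```

On a real client you would read each /proc/fs/lustre/llite/*/stats file per interval instead of a literal string, which is also the natural place to add per-mount columns.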
Re: [Lustre-discuss] Luster clients getting evicted
If a client gets an eviction from the server, it might be triggered by: 1) the server did not get the client's pinger message for a long time. 2) the client is too busy to handle the server's lock cancel request. Clients show a load of 4.2 (4 cores total, 1 process per core). 3) the client cancelled the lock, but the network dropped the cancel reply to the server. I see a very small number (6339) of dropped packets on the interfaces of the OSS. Links between the switches show no errors. 4) the server is too busy to handle the lock cancel reply from the client, or is blocked somewhere. I started paying attention to the OSS more once you said this; sometimes I see the cpu use of socknal_sd00 get to 100%. Now, is this process used to keep all the obd_pings going? Both the OSS and the MDS/MGS are SMP systems and run single interfaces. If I dual-homed the servers, would that create another socknal process for lnet? It seems there are a lot of metadata operations in your job. I guess your eviction might be caused by the latter 2 reasons. If you could provide the process stack trace on the MDS when the job died, it might help us figure out what is going on there. WangDi Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Feb 4, 2008, at 2:47 PM, Brock Palen wrote: Which version of lustre do you use? Server and clients same version and same os? Which one? lustre-1.6.4.1 The servers (oss and mds/mgs) use the RHEL4 rpm from lustre.org: 2.6.9-55.0.9.EL_lustre.1.6.4.1smp The clients run patchless RHEL4 2.6.9-67.0.1.ELsmp One set of clients are on a 10.x network while the servers and the other half of the clients are on a 141. network; because we are using the tcp network type, we have not set up any lnet routes. I don't think that should cause a problem, but I include the information for clarity. We do route 10.x on campus. Harald On Monday 04 February 2008 04:11 pm, Brock Palen wrote: on our cluster that has been running lustre for about 1 month. I have 1 MDT/MGS and 1 OSS with 2 OST's. 
Our cluster uses all GigE and has about 608 nodes / 1854 cores. We have a lot of jobs that die and/or go into high IO wait; strace shows processes stuck in fstat(). The big problem (I think), and I would like some feedback on it, is that of these 608 nodes, 209 of them have in dmesg the string "This client was evicted by". Is it normal for clients to be dropped like this? Is there some tuning that needs to be done on the server to carry this many nodes out of the box? We are using a default lustre install with GigE. Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 -- Harald van Pee Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn