Re: [lustre-discuss] Lustre on Ceph Block Devices
If we do test this I'll let you know how it works.

Why Lustre on GPFS? Why not just run GPFS then, given it supports byte-range locking, MPI-IO, and POSIX (ignoring license costs)? I'm trying to limit the number of disk systems to maintain in a system of modest size where both MPI-IO and object storage are required. I have dedicated Lustre today for larger systems and those will stay that way. I was just curious if anyone had tried this.

Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Wed, Feb 22, 2017 at 4:54 AM, Shinobu Kinjo <shinobu...@gmail.com> wrote:
> Yeah, that's interesting, but it does not really make sense to use Lustre that way, and it should not be used for any computation. If anything goes wrong, troubleshooting would become a nightmare.
>
> Have you ever thought of using Lustre on top of the GPFS native client?
>
> Anyway, if you are going to build Lustre on top of any RADOS client and run MPI jobs, please share the results. I'm really, really interested in them.
>
> On Wed, Feb 22, 2017 at 2:06 PM, Brian Andrus <toomuc...@gmail.com> wrote:
>> I had looked at it, but then, why? There is no benefit to using object storage when you are putting lustre over the top; it would bog down. Normally you would want to use CephFS over the ceph storage, since it talks directly to rados.
>>
>> If you are able to export the rados block devices, you should also be able to present them directly as block devices (iSCSI at least), so lustre is able to manage where the data is stored and use its optimizations. Otherwise the data placement can't be optimized: lustre would THINK it knows where the data is, but the rados crush map would have put it somewhere else.
>>
>> Just my 2 cents.
>>
>> Brian
>>
>> On 2/21/2017 3:08 PM, Brock Palen wrote:
>>> Has anyone ever run Lustre OSTs (and maybe MDTs) on Ceph RADOS Block Devices? In theory this would work just like a SAN-attached solution. Has anyone ever done it before? I know we are seeing decent performance from RBD on our system, but I don't have a way to test lustre on it.
>>>
>>> I'm looking at a future system where Ceph and Lustre might both be needed (object storage and high-performance HPC), but without a huge budget for two full disk stacks. So one idea was to have the lustre servers consume Ceph block devices while that same cluster serves object requests.
>>>
>>> Thoughts or prior art? This probably isn't that different from the CloudFormation script that uses EBS volumes, if it works as intended.
>>>
>>> Thanks
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Director Advanced Research Computing - TS
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Lustre on Ceph Block Devices
Has anyone ever run Lustre OSTs (and maybe MDTs) on Ceph RADOS Block Devices? In theory this would work just like a SAN-attached solution. Has anyone ever done it before? I know we are seeing decent performance from RBD on our system, but I don't have a way to test lustre on it.

I'm looking at a future system where Ceph and Lustre might both be needed (object storage and high-performance HPC), but without a huge budget for two full disk stacks. So one idea was to have the lustre servers consume Ceph block devices while that same cluster serves object requests.

Thoughts or prior art? This probably isn't that different from the CloudFormation script that uses EBS volumes, if it works as intended.

Thanks

Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] How to efficiently get sizes of all files stored in Lustre?
I would like to add, though: if you want to scan the filesystem quickly with robinhood, the current versions are _very_ slow, but they work with changelogs, so the database can always be up to date, which could be good. I personally rolled back to a 2.3 version for two reasons:

* We don't use changelogs, so scan speed is important.
* The ENTRIES table schema is simple, which lets us do queries in Hive and Pig much more easily, and even commands like:

rbh-report -i -P /limited/path/

are much faster. The above on newer versions of robinhood is slower than using find. So if you find it slow, try an old version. Or, if you are using changelogs and can have it run all the time, new versions should be fast enough to keep up with changes.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Sep 17, 2014, at 8:44 AM, Alexander Oltu <alexander.o...@uni.no> wrote:
> On Tue, 16 Sep 2014 16:41:20 +0200 Marcin Barczyński wrote:
>> Hello, I would like to efficiently get the sizes of all files stored in a Lustre filesystem.
>
> Another approach can be to use Robinhood: http://sourceforge.net/projects/robinhood/
>
> Best regards,
> Alex.
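Robinhood aside, the underlying task in this thread (get the sizes of all files) can be sketched with plain POSIX tools. On a real Lustre mount you would typically substitute `lfs find` for `find` so the namespace walk is driven efficiently from the MDS; the directory tree below is a made-up example so the pattern is visible anywhere:

```shell
# Sketch: sum the sizes of all regular files under a tree.
# On Lustre, replace 'find' with 'lfs find' for a faster scan;
# the tree here is a temporary stand-in.
tree=$(mktemp -d)
mkdir -p "$tree/a" "$tree/b"
printf 'xxxx' > "$tree/a/f1"        # 4 bytes
printf 'xxxxxxxx' > "$tree/b/f2"    # 8 bytes

# stat emits "size path" pairs; awk aggregates them into a total.
find "$tree" -type f -exec stat -c '%s %n' {} + |
  awk '{ total += $1 } END { print total }'
# prints 12 for the two sample files above
```

The same awk stage can just as easily bucket by top-level directory or by owner, which is roughly what robinhood's ENTRIES table gives you after a scan.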
[Lustre-discuss] killing lfs_migrate
I will have a limited window to migrate files to a new OST, and I would like to get as far as I can in the window I have. Is it safe to kill lfs_migrate while it is still running? If so, will it leave any 'partial copies' around?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] killing lfs_migrate
On Feb 27, 2012, at 2:49 PM, Ashley Pittman wrote:
> On 27 Feb 2012, at 19:30, Brock Palen wrote:
>> I will have a limited window to migrate files to a new OST. I would like to get as far as I can in the window I have. Is it safe to kill lfs_migrate while it is still running? If so, will it leave any 'partial copies' around?
>
> The script will be limited by client bandwidth; if possible you could run multiple instances, each working on a different part of the tree you want copied.

Noted, and I planned on doing that.

> I'd also consider mounting the FS as a client on the server which hosts the OST and running it there.

Wasn't making a server also a client considered bad juju? Memory-pressure issues, panics, and other badness. It would be nice, though, because the OSSes have the biggest network pipes in our setup.

BTW, I am moving old files from old OSTs to new OSTs to balance them back out, in usage and in age distribution.

> Ashley.
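The "multiple instances over different parts of the tree" suggestion can also be done with one file list fanned out to parallel workers via `xargs -P`. A sketch of the pattern, using a stub script in place of `lfs_migrate` so it runs without a Lustre mount; on a real filesystem the pipeline would be roughly `lfs find /lustre/tree -type f -print0 | xargs -0 -n 16 -P 4 lfs_migrate -y` (check that your lfs_migrate version accepts `-y` for non-interactive use):

```shell
# Pattern sketch: fan a file list out to N parallel migrate workers.
# worker.sh is a stand-in for lfs_migrate.
workdir=$(mktemp -d)
touch "$workdir/f1" "$workdir/f2" "$workdir/f3" "$workdir/f4"

cat > "$workdir/worker.sh" <<'EOF'
#!/bin/sh
# stub: record each file it was asked to "migrate"
for f in "$@"; do echo "migrated $f"; done
EOF
chmod +x "$workdir/worker.sh"

# -n 2: two files per worker invocation; -P 2: two workers at once.
find "$workdir" -type f ! -name worker.sh -print0 |
  xargs -0 -n 2 -P 2 "$workdir/worker.sh" > "$workdir.log"
```

Because each worker invocation handles a self-contained batch of files, killing the pipeline mid-run loses at most the batches in flight, which matches the limited-window use case above.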
[Lustre-discuss] Setting lustre directory and content immutable but keep permissions
For a policy issue with scratch space, we want to lock a user's scratch space that lives on lustre 1.8.x. We want users to be able to grab their data but not be able to add any more; they also do not need to delete files.

We could recursively remove the write bit. The problem is that we may at some point wish to restore write access with the same permissions the files had before, so we would rather not change the permissions. We also don't want to put the stress of a bunch of chmods on the MDS.

So, in short: is there a simple way to say 'this directory and its children are not mutable' that is undoable by an admin?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
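If the fallback does end up being the recursive write-bit removal described above, the original modes can be recorded first so the lock is reversible. A sketch with a temporary directory standing in for a user's scratch tree (the paths are illustrative, not anything Lustre-specific):

```shell
# Sketch: save each entry's octal mode, strip write bits, and later
# restore the exact saved modes. $scratch is a stand-in for a user's
# scratch directory.
scratch=$(mktemp -d)
touch "$scratch/data"
chmod 664 "$scratch/data"
modes=$(mktemp)

# 1. lock: save "mode path" pairs, then remove write permission everywhere
find "$scratch" -exec stat -c '%a %n' {} + > "$modes"
chmod -R a-w "$scratch"

# 2. unlock: replay the saved modes (directories first, since find
#    lists a directory before its contents)
while read -r mode path; do
  chmod "$mode" "$path"
done < "$modes"
```

This still costs one setattr per file on the MDS in each direction, so it doesn't address the load concern, only the reversibility one.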
[Lustre-discuss] Line rate performance for clients
I think this is a networking question. We have lustre 1.8 clients with 1gig-e interfaces that, according to ethtool, are running full duplex. If I do the following:

cp /lustre/largefile.h5 /tmp/

I get 117MB/s. If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive, I get 117MB/s. If I go directly from /lustre to the archive, I get 50MB/s, and this is consistently reproducible. It doesn't matter if I copy a large file from lustre to lustre, or use scp, or globus: if I try to ingest and outgest data at the same time, I get what looks like half-duplex performance.

Anyone have ideas why I cannot do 1Gig-e full duplex?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] Line rate performance for clients
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:
> On 2011-07-29, at 11:33 AM, Brock Palen wrote:
>> I think this is a networking question. We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool are running full duplex. If I do the following: cp /lustre/largefile.h5 /tmp/ I get 117MB/s. If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive I get 117MB/s. If I go directly from /lustre to the archive I get 50MB/s,
>
> Strace your globus-url-copy and see what IO size it is using. cp has long ago been modified to use the blocksize reported by stat(2) for copying, and Lustre reports a 2MB IO size for striped files (1MB for unstriped). If your globus tool is using e.g. 4kB reads then it will be very inefficient for Lustre, but much less so from /tmp.
>
>> this is consistently reproducible. It doesn't matter if I just copy a large file on lustre to lustre, or scp, or globus. If I try to ingest and outgest data I get what looks like half duplex performance. Anyone have ideas why I cannot do 1Gig-e full duplex?
>
> I don't think this has anything to do with full duplex. 117MB/s is pretty much the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction. There is presumably no data moving in the other direction at that time.

Ah, I guess I wasn't clear: I only get 117MB/s when I do one direction on the network at a time, e.g. copying from lustre to /tmp (a local drive), or from /tmp out via globus. It's just when the client is reading from lustre and sending the data out at the same time that I only get 50MB/s. Does that make sense? Is it even right for me to expect that I could combine the two and get full speed in and full speed out, when I can consistently get each independent of the other?

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
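Andreas's point about IO size can be checked without strace: move the same data with a small and a large block size and compare the record counts dd reports, which correspond to the number of read() calls a tool issues. A sketch with a scratch file (the 1 MiB size is just for illustration):

```shell
# Sketch: the same 1 MiB of data read with 4 KiB vs 1 MiB IOs.
# dd's "records in" line is effectively the read() syscall count,
# which is what 'strace -c' would show for a copy tool.
src=$(mktemp)
dd if=/dev/zero of="$src" bs=1M count=1 status=none

# 4 KiB IOs: 256 read() calls for the same data
dd if="$src" of=/dev/null bs=4k 2>&1 | grep 'records in'

# 1 MiB IOs: a single read() call
dd if="$src" of=/dev/null bs=1M 2>&1 | grep 'records in'
```

On Lustre the small-read case costs far more than 256x/1x suggests, since each undersized read can miss the client's readahead and stripe alignment.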
[Lustre-discuss] mv_sata module for rhel5 and write through patch
We are (finally) updating our x4500s to rhel5 and lustre 1.8.5, from rhel4 and 1.6.7. On rhel4 we had used the patch from:

https://bugzilla.lustre.org/show_bug.cgi?id=14040

for the mv_sata module. Is this still recommended on rhel5, i.e. using the mv_sata module over the stock redhat sata_mv, as well as applying this patch? That patch is quite old; is there a newer one? What are other x4500/thumper users running?

Also, I will do some digging on the list, but why is lustre 2.0 not the 'production' version? We are planning on 1.8.x for now, but if 2.0 is stable we would install that one. Could we upgrade directly from 1.6 to 2.0 if we did this?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] finding clients that is opening/closing files
This was very helpful; I found the culprit.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Oct 26, 2010, at 3:42 PM, Wojciech Turek wrote:
> One way is to check the /proc/fs/lustre/mds/*/exports/*/stats files, which contain per-client statistics. They can be cleared by writing 0 to the file; then check for files with lots of operations.
>
> On 26 October 2010 20:10, Brock Palen <bro...@umich.edu> wrote:
>> I have what I think is a badly behaving user. Looking at /proc/fs/lustre/mds/nobackup-MDT/stats, the open/close counters are running at about 1000/s. I would like to track down which clients this is coming from and knock on the users about fixing their code that is doing this. How does one look at stats by node? Do I need to look at all clients, or can I get this information from the mds?
>>
>> Thanks!
>
> --
> Wojciech Turek
> Senior System Architect
> High Performance Computing Service
> University of Cambridge
> Email: wj...@cam.ac.uk
> Tel: (+)44 1223 763517
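The per-export stats files Wojciech points at contain one line per operation, in the form "open 1234 samples [reqs]", with one directory per client NID. A sketch that ranks clients by combined open+close counts; it runs here against a mocked-up copy of the exports tree (sample NIDs and counts are invented), since the real files live under /proc on an MDS:

```shell
# Sketch: rank client NIDs by open+close operation counts.
# $statsroot mimics /proc/fs/lustre/mds/<fsname>-MDT0000/exports/.
statsroot=$(mktemp -d)
mkdir -p "$statsroot/10.0.0.1@tcp" "$statsroot/10.0.0.2@tcp"
printf 'open 9000 samples [reqs]\nclose 8000 samples [reqs]\n' \
  > "$statsroot/10.0.0.1@tcp/stats"
printf 'open 12 samples [reqs]\nclose 10 samples [reqs]\n' \
  > "$statsroot/10.0.0.2@tcp/stats"

# Sum open+close per export and sort, busiest client first.
for f in "$statsroot"/*/stats; do
  nid=$(basename "$(dirname "$f")")
  count=$(awk '$1 == "open" || $1 == "close" { n += $2 } END { print n+0 }' "$f")
  echo "$count $nid"
done | sort -rn
```

On a live MDS, pointing `statsroot` at the real exports directory (and clearing the counters first by writing 0, as described above) makes the culprit stand out after a few seconds of accumulation.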
[Lustre-discuss] controlling which eth interface lustre uses
We recently added a new oss; it has one 1Gb interface and one 10Gb interface. The 10Gb interface is eth4, 10.164.0.166. The 1Gb interface is eth0, 10.164.0.10.

In modprobe.conf I have:

options lnet networks=tcp0(eth4)

lctl list_nids
10.164.0@tcp

From a host I run:

lctl which_nid oss4
10.164.0@tcp

But I still see traffic over eth0, the 1Gb management network, much higher than I would expect (up to 100MB/s). The management interface is oss4-gb, so if I do from a client:

lctl which_nid oss4-gb
10.164.0...@tcp

Why, if I have networks=tcp0(eth4) and list_nids shows only the 10Gb interface, do I have so much traffic over the 1Gb interface? There is some traffic on the 10Gb interface, but I would like to tell lustre 'don't use the 1Gb interface'.

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
> On 10/21/2010 09:37 AM, Brock Palen wrote:
>> We recently added a new oss; it has one 1Gb interface and one 10Gb interface. The 10Gb interface is eth4, 10.164.0.166. The 1Gb interface is eth0, 10.164.0.10.
>
> They look like they are on the same subnet if you are using /24 ...

You are correct; both interfaces are on the same subnet:

[r...@oss4-gb ~]# route
Kernel IP routing table
Destination     Gateway     Genmask         Flags Metric Ref  Use Iface
10.164.0.0      *           255.255.248.0   U     0      0      0 eth0
10.164.0.0      *           255.255.248.0   U     0      0      0 eth4
169.254.0.0     *           255.255.0.0     U     0      0      0 eth4
default         10.164.0.1  0.0.0.0         UG    0      0      0 eth0

There is no way to mask the lustre service away from the 1Gb interface?

> If they are on the same subnet, it's possible that the 1GbE sees the arp response first, and then it's pretty much guaranteed to have the traffic go out that port. If your subnets are different, this shouldn't be the issue.

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: land...@scalableinformatics.com
> web: http://scalableinformatics.com
>      http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax: +1 866 888 3112
> cell: +1 734 612 4615
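The ARP behaviour Joe describes (the 1GbE port answering ARP for an address on the 10GbE port, since Linux treats addresses as host-owned rather than interface-owned) is commonly mitigated with the standard arp_ignore/arp_announce sysctls rather than by downing an interface. A hedged /etc/sysctl.conf fragment; these are generic Linux kernel settings, not anything Lustre-specific, and they need testing on the actual dual-homed host:

```text
# Only answer ARP requests for addresses configured on the interface
# the request arrived on, and prefer the outgoing interface's own
# address when sending ARP. This keeps eth0 from answering ARP for
# eth4's IP when both sit on the same subnet.
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
```

Apply with `sysctl -p` (or per-interface under net.ipv4.conf.eth0/eth4), then clear the clients' ARP caches so they re-learn the correct MAC.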
Re: [Lustre-discuss] controlling which eth interface lustre uses
> Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e. your designated primary (unless it goes down).
>
> bob

We could do this. The 10Gb drivers have been such a pain for us that we wanted a 'back door' management network to reach the box should we have issues with the 10Gb driver again.

Oddly, I ran 'ifconfig eth0 down' and could no longer ping the box over the eth4 interface; I had to power-cycle it from management. Very odd.
Re: [Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 10:35 AM, Brian J. Murrell wrote:
> On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote:
>> We could do this. The 10Gb drivers have been such a pain for us that we wanted a 'back door' management network to reach the box should we have issues with the 10Gb driver.
>
> If you really do want two separate networks, one for Lustre and one for management, then why not configure them as separate networks with different subnets? Anything else is just going to confuse the routing engine. I think at best two interfaces on the same subnet is going to cause indeterminate behaviour.
>
> b.

We settled on disabling the eth0 interface and hope the 10Gb driver will not give us any more trouble. We don't currently have a dedicated management network; setting one up was passed over for just a single host.
[Lustre-discuss] mixing server versions
We have a filesystem that we can't take down for a while to upgrade the OSSes; they are running 1.6.x. We do have a need to quickly add some storage to it, and thus the new server would run 1.8.x. Are there any problems with this? I know 1.6.x isn't supported anymore, and we would like to move everything to 1.8 soon, but we are in a bind for the moment. Is our only (safe) option to load 1.6.x on the new server as well and wait until we can shut down the filesystem?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
[Lustre-discuss] repquota for lustre
I see the bug in bugzilla from version 1.4 that was put on hold; I just want to bump interest for such a tool. If anyone has made something that does quota reports for lustre, I would be interested.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] repquota for lustre
Thanks, I am checking it out.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Oct 23, 2009, at 3:38 PM, Jim Garlick wrote:
> I wrote a 'repquota' tool that groks lustre: http://sourceforge.net/projects/rquota/
>
> I think LBL has a lustre quota reporting tool as well.
>
> Jim
>
> On Fri, Oct 23, 2009 at 02:31:49PM -0400, Brock Palen wrote:
>> I see the bug in bugzilla from version 1.4 that was put on hold; I just want to bump interest for such a tool. If anyone has made something that does quota reports for lustre, I would be interested.
Re: [Lustre-discuss] recover borked mds
Some additional details: I mounted the mds as ldiskfs and deleted the files in OBJECTS/* and CATALOGS, then remounted as lustre; same issue. I also did a writeconf and restarted all the servers, and saw messages on the MGS that new config logs were being created, but still the same error on the mds trying to start up.

Is there a way to get lustre to stop trying to open 0xf150010:80d24629: ? And not go through recovery? If not, can I format a new mds, and just untar ROOTS/ and apply the extended attributes to ROOTS from the old mds filesystem?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Aug 19, 2009, at 12:57 PM, Brock Palen wrote:

After a network event (switches bouncing), it looks like our mds got borked somewhere during all the random failovers (the switches came up and down rapidly over a few hours). Now we can not mount the mds; when we do, we get the following errors:

Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@01037c9db600 x85226/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: 7456:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@0104163a6000 x47117/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:39 mds2 kernel: LustreError: Skipped 11 previous similar messages
Aug 19 12:37:39 mds2 kernel: LustreError: 7468:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@010350a4d200 x81788/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699959 ref 1 fl
Interpret:/0/0 rc -19/0
Aug 19 12:37:39 mds2 kernel: LustreError: 7468:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 11 previous similar messages
Aug 19 12:37:40 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:40 mds2 kernel: LustreError: Skipped 18 previous similar messages
Aug 19 12:37:40 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@010414dc1850 x81855/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699960 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:40 mds2 kernel: LustreError: 7455:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 18 previous similar messages
Aug 19 12:37:42 mds2 kernel: LustreError: 137-5: UUID 'nobackup-MDT_UUID' is not available for connect (no target)
Aug 19 12:37:42 mds2 kernel: LustreError: Skipped 42 previous similar messages
Aug 19 12:37:42 mds2 kernel: LustreError: 7466:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19) r...@01037c9db600 x77144/t0 o38-?@?:0/0 lens 304/0 e 0 to 0 dl 1250699962 ref 1 fl Interpret:/0/0 rc -19/0
Aug 19 12:37:42 mds2 kernel: LustreError: 7466:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 42 previous similar messages
Aug 19 12:37:43 mds2 kernel: Lustre: Request x3 sent from mgc10.164.3@tcp to NID 10.164.3@tcp 5s ago has timed out (limit 5s).
Aug 19 12:37:43 mds2 kernel: Lustre: Changing connection for mgc10.164.3@tcp to mgc10.164.3@tcp_1/0...@lo
Aug 19 12:37:43 mds2 kernel: Lustre: Enabling user_xattr
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(mds_fs.c:493:mds_init_server_data()) RECOVERY: service nobackup-MDT, 439 recoverable clients, last_transno 3647966566
Aug 19 12:37:43 mds2 kernel: Lustre: MDT nobackup-MDT now serving dev (nobackup-MDT/57dddb69-2475-b551-4100-e045f91ce38c), but will be in recovery for at least 5:00, or until 439 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/nobackup-MDT/recovery_status.
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(lproc_mds.c:273:lprocfs_wr_group_upcall()) nobackup-MDT: group upcall set to /usr/sbin/l_getgroups
Aug 19 12:37:43 mds2 kernel: Lustre: nobackup-MDT.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Aug 19 12:37:43 mds2 kernel: Lustre: 7524:0:(mds_lov.c:1070:mds_notify()) MDS nobackup-MDT: in recovery, not resetting orphans on nobackup-OST_UUID
Aug 19 12:37:43 mds2 kernel: Lustre: nobackup-MDT: temporarily refusing client connection from 10.164.1@tcp
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0xf150010:0x80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_cat.c:176:llog_cat_id2handle
[Lustre-discuss] recover borked mds
:0x9642a0ac
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(lov_log.c:230:lov_llog_init()) error osc_llog_init idx 0 osc 'nobackup-OST-osc' tgt 'nobackup-MDT' (rc=-2)
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(mds_log.c:220:mds_llog_init()) lov_llog_init err -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(llog_obd.c:417:llog_cat_initialize()) rc: -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(lov_obd.c:727:lov_add_target()) add failed (-2), deleting nobackup-OST_UUID
Aug 19 12:37:43 mds2 kernel: LustreError: 7524:0:(obd_config.c:1093:class_config_llog_handler()) Err -2 on cfg command:
Aug 19 12:37:43 mds2 kernel: Lustre:cmd=cf00d 0:nobackup-mdtlov 1:nobackup-OST_UUID 2:0 3:1
Aug 19 12:37:43 mds2 kernel: LustreError: 15c-8: mgc10.164.3@tcp: The configuration from log 'nobackup-MDT' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 19 12:37:43 mds2 kernel: LustreError: 7438:0:(obd_mount.c:1113:server_start_targets()) failed to start server nobackup-MDT: -2
Aug 19 12:37:44 mds2 kernel: LustreError: 7438:0:(obd_mount.c:1623:server_fill_super()) Unable to start targets: -2
Aug 19 12:37:44 mds2 kernel: Lustre: Failing over nobackup-MDT
Aug 19 12:37:44 mds2 kernel: Lustre: *** setting obd nobackup-MDT device 'unknown-block(8,16)' read-only ***

We have run e2fsck on the volume, which found a few errors that we corrected, but the problem persists. We also tried mounting with -o abort_recov; this resulted in an assertion (LBUG) and does not work. Any thoughts?
The lines:

Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0xf150010:0x80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id 0xf150010:80d24629: rc -2
Aug 19 12:37:43 mds2 kernel: LustreError: 7525:0:(llog_obd.c:262:cat_cancel_cb()) Cannot find handle for log 0xf150010

caught my attention.

Thanks; we are running 1.6.6.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
[Lustre-discuss] Lustre featured on podcast (HT: Andreas Dilger)
Thanks to Andreas for taking an hour out to talk with Jeff Squyres and me (Brock Palen) about the Lustre cluster filesystem on our podcast, www.rce-cast.com. You can find the whole show at:

http://www.rce-cast.com/index.php/Podcast/rce-14-lustre-cluster-filesystem.html

Thanks again! If any of you have topics you would like to hear, please let me know!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
Re: [Lustre-discuss] Lustre featured on podcast (HT: Andreas Dilger)
http://en.wikipedia.org/wiki/Nagle%27s_algorithm

It looks like you intentionally hold up data to try to make fatter payloads in packets, so they are not 99% header/CRC data. Sounds like a way to make latency bad.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Aug 3, 2009, at 8:20 PM, Mag Gam wrote:
> Very nice. 15:54, what is Nagle? He didn't say anything about SNS, but changelogs seem very promising!
>
> On Mon, Aug 3, 2009 at 8:55 AM, Brock Palen <bro...@umich.edu> wrote:
>> Thanks to Andreas for taking an hour out to talk with Jeff Squyres and me (Brock Palen) about the Lustre cluster filesystem on our podcast, www.rce-cast.com. You can find the whole show at: http://www.rce-cast.com/index.php/Podcast/rce-14-lustre-cluster-filesystem.html
>> Thanks again! If any of you have topics you would like to hear, please let me know!
Re: [Lustre-discuss] x4540 (thor) panic
On Jun 15, 2009, at 11:44 AM, Nirmal Seenu wrote: We have been running the Lustre servers on a machine with an Nvidia chipset (nVidia Corporation MCP55 Ethernet (rev a3)) for well over a year now; the following two options seem to work the best on these servers: options forcedeth max_interrupt_work=50 optimization_mode=1 Thanks, we put those in place, and disabled bonding for now (running on one overtaxed gig-e port). We also tried noapic because of some notes online for the crashes we were seeing, but that does not let the MPT disk controllers in the machine start up (it sets all drives offline). Thanks for the note, Brock. optimization_mode enables interrupt coalescing. Nirmal
[Lustre-discuss] Lustre on Podcast?
I host an HPC podcast along with Jeff Squyres at www.rce-cast.com We would like to invite Lustre to be the next guest on the show. Please contact me on or off list if you would like to do this, and if so who should be the point of contact from the Lustre group. Thanks! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] OpenMX
I had the dev of OpenMX on my podcast (www.rce-cast.com); this got me thinking: has anyone ever tried OpenMX with Lustre? In theory it should work, but that wasn't the case with some other tools when I asked around. Note we have not tried OpenMX yet, but will evaluate it soon. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] checking lustre health
I am writing a small script to monitor the health of the lustre servers by reading /proc/fs/lustre/health_check Is the regex ^healthy$ enough to make sure that I am notified if it ever changes? Should there be any other locations I should check for lustre errors that should be acted on? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
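A minimal sketch of such a check (Python; the exact text written on an unhealthy system, e.g. "NOT HEALTHY" or "LBUG", is an assumption here, so the safe approach is to alert on anything other than the single word "healthy"):

```python
import re

HEALTH_FILE = "/proc/fs/lustre/health_check"

def is_healthy(contents):
    # Treat anything other than the single word "healthy" as an alert;
    # anchoring with ^healthy$ (fullmatch) means assumed failure texts
    # like "NOT HEALTHY" or "LBUG" can never slip through.
    return re.fullmatch(r"healthy", contents.strip()) is not None

def check():
    try:
        with open(HEALTH_FILE) as f:
            return is_healthy(f.read())
    except OSError:
        # an unreadable health file is itself worth an alert
        return False
```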
[Lustre-discuss] SELinux and lustre clients
It has been stated on the list before that the lustre servers are not compatible with SELinux, but what about clients? We have some post-processing desktops that are clients of our lustre system. We don't have control over this load, and they are dedicated to using SELinux. Redhat says it is a lustre problem, after working on it a few months with them: https://bugzilla.redhat.com/show_bug.cgi?id=489583 Is this the case? Has anyone managed to run lustre clients on systems with SELinux enabled? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] RHEL4 build of lustre patched e2fsprogs
I am trying to install a newer version of e2fsprogs to see if the e2scan in newer versions was built with sqlite (the rpm I got when I built the cluster was not). The new rpm, e2fsprogs-1.40.11.sun1-0redhat.x86_64.rpm, appears to be built against a version of berkdb (not sqlite now?) that is newer than what is part of RHEL4. error: Failed dependencies: libc.so.6(GLIBC_2.4)(64bit) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 libdb-4.3.so()(64bit) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 rtld(GNU_HASH) is needed by e2fsprogs-1.40.11.sun1-0redhat.x86_64 up2date says that db4-4.2 is all that is available for rhel4 stock from redhat with updates. Not 4.3, which is not good. The libc error is funny also, because as far as I can tell /lib64/libc.so.6 is just that... In any case, I could only install if I said nodeps, but that is getting just silly. Does anyone have a working patched e2fsprogs for rhel4? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] e2scan for cleaning scratch space
e2scan will show me all the files that have changed since a date, but I want to know all the files that have not changed since some date. The goal is to make a system for purging scratch spaces that is fast, with minimal wear on the filesystem. How are groups doing this now? Are you using e2scan? Is there a way to have e2scan not only list the file but also the mtime/ctime in the log file, so that we can sort oldest to newest? Thank you! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
Re: [Lustre-discuss] e2scan for cleaning scratch space
The e2scan shipped in Sun's rpms does not support sqlite3 out of the box: rpm -qf /usr/sbin/e2scan e2fsprogs-1.40.7.sun3-0redhat e2scan: sqlite3 was not detected on configure, database creation is not supported Should I just rebuild only e2scan? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Mar 4, 2009, at 2:19 PM, Daire Byrne wrote: Brock, - Brock Palen bro...@umich.edu wrote: e2scan will show me all the files that have changed since a date, but I want to know all the files that have not changed since some date. The goal is to make a system for purging scratch spaces that is fast, with minimal wear on the filesystem. How are groups doing this now? Are you using e2scan? Is there a way to have e2scan not only list the file but also the mtime/ctime in the log file, so that we can sort oldest to newest? e2scan can dump its findings to a sqlite DB which has the ctime/mtime info in it. But you'll need to write some logic to construct the filepaths, because everything is stored with the inode number as the index. There is code in e2scan that can probably be recycled for that purpose though. So I suppose you would get e2scan to create the DB and then a custom app would search by ctime/mtime and spit out the full file path. Daire
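To sketch Daire's suggestion: once e2scan has produced the DB, a small query by mtime/ctime yields the purge candidates. The table and column names below are invented for illustration only (check the real schema with `.schema` in the sqlite3 shell); per the above, the rows are keyed by inode number, so reconstructing full paths is a separate step.

```python
import sqlite3

# Hypothetical schema for illustration: a "files" table with columns
# ino, mtime, ctime. e2scan's actual DB layout must be verified; it is
# indexed by inode number, so path reconstruction happens afterwards.
def inodes_older_than(db_path, cutoff_epoch):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT ino, mtime, ctime FROM files "
        "WHERE mtime < ? AND ctime < ? ORDER BY mtime",
        (cutoff_epoch, cutoff_epoch),
    ).fetchall()
    con.close()
    return rows  # purge candidates, oldest first
```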
Re: [Lustre-discuss] rdac configuration, please help
Actually I did this again recently and the directions still work. Here is what I have on our internal wiki here at Michigan. begin

The Sun array does not work with Red Hat's DM-Multipath rpm. Download the linuxrdac source from Sun and the Lustre kernel source code. Install the kernel source and link it to /usr/src/linux:

rpm -ivh kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
ln -s linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 linux
cd /usr/src/linux
make mrproper
cp /boot/config-`uname -r` .config
make oldconfig
make dep
make modules

There will be two directories, one ending in -obj and one not. The directories scripts/mod and scripts/genksyms need to be copied to the one ending in -obj/scripts. Once done, untar the linuxrdac package and copy in a working Makefile (provided). This Makefile will not install right; you will need to comment out the install of mppiscsi_umountall, it is not needed. (Makefile_linuxrdac-09.01.B2.74, rdac-LINUX-09.01.B2.40-source.tar.gz)

make clean
make
make uninstall
make
make install

Edit grub.conf to use the mpp initrd over the standard one, from:

initrd /initrd-2.6.9-67.0.7.EL_lustre.1.6.5.1smp.img

to:

initrd /mpp-2.6.9-67.0.7.EL_lustre.1.6.5.1smp.img

LUNs are accessible from a single SCSI block device; failover happens in a few seconds, but not right away. CAM should notify you.

[r...@mds1 scripts]# /opt/mpp/lsvdev
Array Name  Lun  sd device
mds-raid     0 - /dev/sdb
mds-raid     1 - /dev/sdc
mds-raid     2 - /dev/sdd

I hope that is enough detail for you. /end

Again, Sun sold us this array, but the Sun packaged kernels didn't come with support for it, which is annoying. Maybe Sun will in the future push their stuff into DM-Multipath, or just package it with Lustre.
Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Feb 27, 2009, at 6:34 PM, Adint, Eric (CIV) wrote: Ok, at this point I'm desperate. I have a Rocks cluster with the Sun 4Gb FC cards based on qla22xx drivers with a StorageTek 6140. I am trying to build the following rdac against the lustre kernel 2.6.18-92.1.10.el5_lustre.1.6.6smp using the following source: linuxrdac-09.02.C2.13. The error I get is:

[r...@nas-0-0 linuxrdac-09.02.C2.13]# make
make V=0 -C /lib/modules/2.6.18-92.1.10.el5_lustre.1.6.6smp/build M=/root/linuxrdac-09.02.C2.13 MODVERDIR=/lib/modules/2.6.18-92.1.10.el5_lustre.1.6.6smp/build/.tmp_versions SUBDIRS=/root/linuxrdac-09.02.C2.13 modules
make[1]: Entering directory `/usr/src/linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp'
make -C ../../../linux-2.6.18-92.1.10.el5_lustre.1.6.6 O=../linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp modules
ERROR: Kernel configuration is invalid. include/linux/autoconf.h or include/config/auto.conf are missing. Run 'make oldconfig && make prepare' on kernel src to fix it.
CC [M] /root/linuxrdac-09.02.C2.13/mppLnx26p_upper.o
/bin/sh: scripts/genksyms/genksyms: No such file or directory
make[4]: *** [/root/linuxrdac-09.02.C2.13/mppLnx26p_upper.o] Error 1
make[3]: *** [_module_/root/linuxrdac-09.02.C2.13] Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.18-92.1.10.el5_lustre.1.6.6-obj/x86_64/smp'
make: *** [mppUpper] Error 2

I have tried the following suggestion from lustre: http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg01682.html I may not have changed the information enough. Does anyone know if it is necessary to recompile the rdac, and if so, is there a comprehensive lustre howto on how to compile kernel modules?
I thank you in advance for any help Eric Adint ehad...@nps.edu High Performance Computing Specialist Naval Postgraduate School 833 Dyer Road Bldg 232 Room 139a Monterey Ca 93943 831-656-3440
Re: [Lustre-discuss] Recovery without end
We used to do something similar, and still had issues. Upgrading all servers (2 OSSs, 7 OSTs each) and clients (800) to 1.6.6 fixed all our issues; we run default timeouts and default everything really, no issues. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Feb 25, 2009, at 11:22 AM, Charles Taylor wrote: I'm going to pipe in here. We too use a very large (1000) timeout value. We have two separate Lustre file systems; one of them consists of two rather beefy OSSs with 12 OSTs each (FalconIII FC-SATA RAID). The other consists of 8 OSSs with 3 OSTs each (Xyratex 4900FC). We have about 500 clients and support both tcp and o2ib NIDs. We run Lustre 1.6.4.2 on a patched 2.6.18-8.1.14 CentOS/RH kernel. It has worked *very* well for us for over a year now - very few problems with very good performance under very heavy loads. We've tried setting our timeout to lower values but settled on the 1000 value (despite the long recovery periods) because if we don't, our lustre connectivity starts to break down and our mounts come and go with errors like transport endpoint failure or transport endpoint not connected or some such (it's been a while now). File system access comes and goes randomly on nodes. We tried many tunings and looked for other sources of problems (underlying network issues). Ultimately, the only thing we found that fixed this was to extend the timeout value. I know you will be tempted to tell us that our network must be flaky but it simply is not. We'd love to understand why we need such a large timeout value and why, if we don't use a large value, we see these transport endpoint failures. However, after spending several days trying to understand and resolve the issue, we finally just accepted the long timeout as a suitable workaround. I wonder if there are others who have silently done the same.
We'll be upgrading to 1.6.6 or 1.6.7 in the not-too-distant future. Maybe then we'll be able to do away with the long timeout value, but until then, we need it. :( Just my two cents, Charlie Taylor UF HPC Center On Feb 25, 2009, at 11:03 AM, Brian J. Murrell wrote: On Wed, 2009-02-25 at 16:09 +0100, Thomas Roth wrote: Our /proc/sys/lustre/timeout is 1000 That's way too high. Long recoveries are exactly the reason you don't want this number to be huge. - there has been some debate on this large value here, but most other installations will not run in a network environment with a setup as crazy as ours. What's so crazy about your setup? Unless your network is very flaky and/or you have not tuned your OSSes properly, there should be no need for such a high timeout, and if there is you need to address the problems requiring it. Putting the timeout at 100 immediately results in Transport endpoint errors; impossible to run Lustre like this. 300 is the max that we recommend, and we have very large production clusters that use such values successfully. Since this is a 1.6.5.1 system, I activated the adaptive timeouts - and put them to equally large values: /sys/module/ptlrpc/parameters/at_max = 6000 /sys/module/ptlrpc/parameters/at_history = 6000 /sys/module/ptlrpc/parameters/at_early_margin = 50 /sys/module/ptlrpc/parameters/at_extra = 30 This is likely not good as well. I will let somebody more knowledgeable about AT comment in detail though. It's a new feature and not getting wide use at all yet, so the real-world experience is still low. b.
Re: [Lustre-discuss] Lustre NOT HEALTHY
Ok thanks. It happened again last night, sooner than normal. I will send a new message with the details. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jan 13, 2009, at 11:09 PM, Cliff White wrote: Brock Palen wrote: How common is it for servers to go NOT HEALTHY? I feel it is happening much more often than it should be with us. A few times a month. It should not happen at all, in the normal case. It indicates a problem. If this happens, we reboot the servers. Should we do something else? Maybe it depends on what the problem was? Well, determining the actual problem that caused the NOT HEALTHY would be quite useful, yes. I would not just reboot. - Examine consoles of _all_ servers for any error indications - Examine syslogs of _all_ servers for any LustreErrors or LBUG - Check network and hardware health. Are your disks happy? Is your network dropping packets? Try to figure out what was happening on the cluster. Does this relate to a specific user workload or system load condition? Can you reproduce the situation? Does it happen at a specific time of day, time of month? If we should not be getting NOT HEALTHY that often, what information should I collect to report to CFS? The lustre-diagnostics package is a good start for general system config. Beyond that, most of what we would need is listed above. cliffw Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] LBUG ASSERTION(lock->l_resource != NULL) failed
I am having servers LBUG on a regular basis. Clients are running 1.6.6 patchless on RHEL4; servers are running RHEL4 with 1.6.5.1 RPMs from the download page. All connection is over Ethernet, and the servers are x4600s. The OSS that LBUG'd has in its log:

Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
Jan 13 16:35:39 oss2 kernel: LustreError: 10243:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Jan 13 16:35:39 oss2 kernel: Lustre: 10243:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 10243
Jan 13 16:35:39 oss2 kernel: ldlm_cn_08 R running task 0 10243 1 10244 7776 (L-TLB)
Jan 13 16:35:39 oss2 kernel: a0414629 0103d83c7e00
Jan 13 16:35:39 oss2 kernel: 0101f8c88d40 a021445e 0103e315dd98 0001
Jan 13 16:35:39 oss2 kernel: 0101f3993ea0
Jan 13 16:35:39 oss2 kernel: Call Trace: a0414629 {:ptlrpc:ptlrpc_server_handle_request+2457}
Jan 13 16:35:39 oss2 kernel: a021445e {:libcfs:lcw_update_time+30} 80133855 {__wake_up_common+67}
Jan 13 16:35:39 oss2 kernel: a0416d05 {:ptlrpc:ptlrpc_main+3989} a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0}
Jan 13 16:35:39 oss2 kernel: a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0} a0415270 {:ptlrpc:ptlrpc_retry_rqbds+0}
Jan 13 16:35:39 oss2 kernel: 80110de3 {child_rip+8} a0415d70 {:ptlrpc:ptlrpc_main+0}
Jan 13 16:35:39 oss2 kernel: 80110ddb {child_rip+0}
Jan 13 16:35:40 oss2 kernel: LustreError: dumping log to /tmp/lustre-log.1231882539.10243

At the same time a client (nyx346) lost contact with that OSS, and is never allowed to reconnect.
Client /var/log/messages:

Jan 13 16:37:20 nyx346 kernel: Lustre: nobackup-OST000d-osc-01022c2a7800: Connection to service nobackup-OST000d via nid 10.164.3@tcp was lost; in progress operations using this service will wait for recovery to complete.
Jan 13 16:37:20 nyx346 kernel: Lustre: Skipped 6 previous similar messages
Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Jan 13 16:37:20 nyx346 kernel: LustreError: 3889:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Jan 13 16:37:20 nyx346 kernel: LustreError: 11-0: an error occurred while communicating with 10.164.3@tcp. The ost_connect operation failed with -16
Jan 13 16:37:20 nyx346 kernel: LustreError: Skipped 10 previous similar messages
Jan 13 16:37:45 nyx346 kernel: Lustre: 3849:0:(import.c:410:import_select_connection()) nobackup-OST000d-osc-01022c2a7800: tried all connections, increasing latency to 7s

Even now the server (OSS) is refusing connection to OST000d, with the message:

Lustre: 9631:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST000d: refuse reconnection from 145a1ec5-07ef-f7eb-0ca9-2a2b6503e...@10.164.1.90@tcp to 0x0103d5ce7000; still busy with 2 active RPCs

If I reboot the OSS, the OSTs on it go through recovery like normal, and then the client is fine. Network looks clean; I found one machine with lots of dropped packets between the servers, but that is not the client in question. Thank you! If it happens again, and I find any other data, I will let you know. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] Lustre NOT HEALTHY
How common is it for servers to go NOT HEALTHY? I feel it is happening much more often than it should be with us. A few times a month. If this happens, we reboot the servers. Should we do something else? Maybe it depends on what the problem was? If we should not be getting NOT HEALTHY that often, what information should I collect to report to CFS? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985
[Lustre-discuss] Lustre Intelligence?
So, question: I had a user who thought his problem was the disk system; it turns out his machine was just OOM. His IO code looked like this:

for (i = 0; i < kloc; i++) {
    sprintf(&(buffer[39]), "%d", i + (int)floor(rank*(double)k/(double)numprocs) + 1);
    f1 = fopen(buffer, "w");
    for (j = 0; j < N; j++) {
        fprintf(f1, "%e\n", u[i][j]);
    }
    fclose(f1);
}

So as I read this, every processor (every processor calls this function and writes to its own set of files) is writing one double at a time to its files. IO performance though was still quite good. I enabled extents_stats on rank 0 of this job and ran it. Here is what I ended up with (stats were zeroed, and this was the only job running on the client):

extents            calls  %  cum% | calls  %  cum%
0K   -    4K :        12  4     4 |     0  0     0
4K   -    8K :         0  0     4 |     0  0     0
8K   -   16K :         0  0     4 |     0  0     0
16K  -   32K :         0  0     4 |     0  0     0
32K  -   64K :         0  0     4 |     0  0     0
64K  -  128K :         0  0     4 |     0  0     0
128K -  256K :         0  0     4 |     0  0     0
256K -  512K :         0  0     4 |     0  0     0
512K - 1024K :         0  0     4 |     4  1     1
1M   -    2M :       136 47    51 |   220 98   100
2M   -    4M :       140 48   100 |     0  0   100

So 98% of writes and reads (the read code is similar and reads in about 2GB this way) were all 1-4MB. Is this lustre showing its preference for 1MB IO ops? Even though the code wanted to do 8 bytes at a time, lustre cleaned it up? Or did Linux do this some place? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
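On the "who cleaned it up" question: stdio's fprintf() buffers in userspace (typically a few KB), and the page cache plus the client's writeback aggregate those into large writes, so tiny logical writes can surface as 1-4MB extents. Here is a rough userspace analogue of that aggregation (Python, purely illustrative, not Lustre itself): a 1MB buffer in front of a counting "device" collapses two hundred thousand 13-byte records into a handful of device writes.

```python
import io

class CountingRaw(io.RawIOBase):
    """Fake 'device' that counts how many write calls actually reach it."""
    def __init__(self):
        super().__init__()
        self.writes = 0
        self.bytes = 0
    def writable(self):
        return True
    def write(self, b):
        self.writes += 1
        self.bytes += len(b)
        return len(b)

raw = CountingRaw()
buf = io.BufferedWriter(raw, buffer_size=1024 * 1024)  # 1MB buffer
for i in range(200000):
    buf.write(b"%e\n" % 1.0)  # 13-byte records, like fprintf of one double
buf.flush()
# ~2.6MB of tiny writes reach the "device" as a few ~1MB writes
print(raw.writes, raw.bytes)
```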
[Lustre-discuss] lustre/abaqus tweaks for lustre?
I have seen a few papers around, but does anyone have comments on how to optimize either lustre or abaqus to use lustre for scratch? I see reads coming in at only 20MB/s, and IO wait gets quite high on the client. I know this is probably not enough information, but are there any knobs people have twisted on their own systems for this that I can be informed of? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Clients fail every now and again,
Thanks, Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 18, 2008, at 4:47 PM, Andreas Dilger wrote: On Nov 18, 2008 12:14 -0500, Brock Palen wrote: if that is the bug causing this, is the fix, till we upgrade to the newer lustre, to set statahead_max=0 again? Yes, this is another statahead bug. I see this same behavior this morning on a compute node. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 16, 2008, at 10:49 PM, Yong Fan wrote: Brock Palen wrote: We consistently see random occurrences of a client being kicked out, and while lustre says it tries to reconnect, it almost never can without a reboot: Maybe you can check: https://bugzilla.lustre.org/show_bug.cgi?id=15927 Regards! -- Fan Yong Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:226:ptlrpc_invalidate_import()) nobackup-MDT_UUID: rc = -110 waiting for callback (3 != 0) Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:230:ptlrpc_invalidate_import()) @@@ still on sending list [EMAIL PROTECTED] x979024/t0 o101-nobackup-[EMAIL PROTECTED]@tcp:12/10 lens 448/1184 e 0 to 100 dl 1226700928 ref 1 fl Rpc:RES/0/0 rc -4/0 Nov 14 18:28:18 nyx-login1 kernel: LustreError: 14130:0:(import.c:230:ptlrpc_invalidate_import()) Skipped 1 previous similar message Nov 14 18:28:18 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0100f7ef0400: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED] Nov 14 18:30:32 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_statfs operation failed with -107 Nov 14 18:30:32 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0100f7ef0400: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Nov 14 18:30:32 nyx-login1 kernel: LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail. Nov 14 18:30:32 nyx-login1 kernel: LustreError: 16523:0: (llite_lib.c: 1549:ll_statfs_internal()) mdc_statfs fails: rc = -5 Nov 14 18:30:35 nyx-login1 kernel: LustreError: 16525:0:(client.c: 716:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x983192/t0 o41-nobackup- [EMAIL PROTECTED]@tcp:12/10 lens 128/400 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0 Nov 14 18:30:35 nyx-login1 kernel: LustreError: 16525:0: (llite_lib.c: 1549:ll_statfs_internal()) mdc_statfs fails: rc = -108 Is there any way to make lustre more robust against these types of failures? According to the manual (and many times in practice, like rebooting a MDS) the filesystem will just block and comeback. This almost never comes back, after a while it will say reconnected, but will fail again right away. On the MDS I see: Nov 14 18:30:20 mds1 kernel: Lustre: nobackup-MDT: haven't heard from client 1284bfca-91bd-03f6-649c-f591e5d807d5 (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it. Nov 14 18:30:28 mds1 kernel: LustreError: 11463:0:(handler.c: 1515:mds_handle()) operation 41 on unconnected MDS from [EMAIL PROTECTED] Nov 14 18:30:28 mds1 kernel: LustreError: 11463:0:(ldlm_lib.c: 1536:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x983190/t0 o41-?@?:0/0 lens 128/0 e 0 to 0 dl 1226705528 ref 1 fl Interpret:/0/0 rc -107/0 Nov 14 18:34:15 mds1 kernel: Lustre: nobackup-MDT: haven't heard from client 1284bfca-91bd-03f6-649c-f591e5d807d5 (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it. Just keeps kicking it out, /proc/fs/lustre/health_check on client, and servers are healthy. 
Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Re: [Lustre-discuss] Is patchless ok for EL4 now?
We have been running this for a while. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 6, 2008, at 10:54 AM, Peter Kjellstrom wrote: After reading http://wiki.lustre.org/index.php?title=Patchless_Client it is my understanding that it is now (2.6.9-78.0.5.EL and lustre-1.6.6) ok to run a patchless client on EL4 (64-bit). This is based on the fact that the problems described on the wiki page were fixed in versions older than those mentioned above (the last bug/comment was for -55). Is this accurate, or is the wiki missing information here? (Brian wrote in July that EL4 simply was too old...) Anybody running this already? Tia, Peter
Re: [Lustre-discuss] Is patchless ok for EL4 now?
2.6.9-78.0.1.ELsmp Lustre-1.6.5.1 Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Nov 6, 2008, at 11:18 AM, Peter Kjellstrom wrote: On Thursday 06 November 2008, Brock Palen wrote: We have been running this for a while. Brock Palen Thanks for the data point. What are the exact kernel and lustre versions you've been running (presumably) ok? /Peter
[Lustre-discuss] unexpectedly long timeout
New error I have never seen before; googling didn't find much other than an error involving IB. This node has IB, but lustre runs over TCP.

Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 01041802f600 [EMAIL PROTECTED] x1071812/t0 o4-nobackup-[EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0
Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) Skipped 1 previous similar message
Nov 5 02:29:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 01041802f600 [EMAIL PROTECTED] x1071812/t0 o4-nobackup-[EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0

On the OSS that provides OST000c, the only errors I see from that node are the usual 'can't hear from node':

Nov 4 18:46:02 oss2 kernel: Lustre: 6426:0:(ost_handler.c:1270:ost_brw_write()) nobackup-OST000c: ignoring bulk IO comm error with [EMAIL PROTECTED] id [EMAIL PROTECTED] - client will retry
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000c: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it.
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000d: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at [EMAIL PROTECTED]) in 227 seconds. I think it's dead, and I am evicting it.

Any thoughts? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I know you say the only addition was the RDAC, for the MDSs I assume (we use it also, just fine). When I ran faultmond from Sun's dcmu rpm (RHEL 4 here) the x4500s would crash like clockwork at ~48 hours. For a very simple bit of code I was surprised that once, when I forgot to turn it on when working on the load, this would happen. Just FYI, it was unrelated to lustre (using provided rpm's, no kernel build); this solved my problem on the x4500. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote: The X4200m2 MDS systems and the X4500 OSS were rebuilt using the stock Lustre packages (Kernel + modules + userspace). With the exception of the RDAC kernel module, no additional software was applied to the systems. We recreated our volumes and ran the servers over the weekend. However, the OSS crashed about 8 hours in. The syslog output is attached to this message. Looks like it could be similar to bug #16404, which means patching and rebuilding the kernel. Given my lack of success at trying to build from source, I am again asking for some guidance on how to do this. I sent out the steps I used to try and build from source on the 7th because I was encountering problems and was unable to get a working set of packages. Included in that message was output from quilt that implies that the kernel patching process was not working properly. Regards, Malcolm. -- Malcolm Cowe Solutions Integration Engineer Sun Microsystems, Inc. Blackness Road Linlithgow, West Lothian EH49 7LR UK Phone: x73602 / +44 1506 673 602 Email: [EMAIL PROTECTED] Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 06:53:42 oss-1 kernel: kjournald starting.
Commit interval 5 seconds Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 06:57:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode. Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive fault Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver, [EMAIL PROTECTED] Oct 10 07:56:23 oss-LDISKFS-fs: file extents enabled1 kernel: Lustre VersionLDISKFS-fs: mballoc enabled : 1.6.5.1 Oct 10 07:56:23 oss-1 kernel: Build Version: 1.6.5.1-1969123119-PRISTINE-.cache.OLDRPMS.20080618230526.linux- smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre. 1.6.5.1smp Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/64] Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System; [EMAIL PROTECTED] Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode. Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode. Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s). Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s). 
LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x3/t0 o250- [EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0 Lustre: Request x3 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 0s ago has timed out (limit 5s). LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OST: -5 LustreError: 15f-b: Communication error with the MGS. Is the MGS running? LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5 LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OST LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OST not registered LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost LDISKFS-fs: mballoc: 0 generated and it took 0 LDISKFS-fs: mballoc: 0 preallocated, 0 discarded Oct 10 07:56:50 oss-1
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I never uninstalled it (I still use some of the tools in it). Faultmond is a service -- just chkconfig it off.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

> Brock Palen wrote:
>> I know you say the only addition was the RDAC -- for the MDS's, I
>> assume (we use it too, just fine).
>
> Yes, the MDS's share a STK 6140.
>
>> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here), the X4500s
>> would crash like clockwork every ~48 hours. [...] Just FYI: it was
>> unrelated to Lustre (we used the provided RPMs, no kernel build), and
>> disabling it solved my problem on the X4500.
>
> The DCMU RPM is installed. I didn't explicitly install it, so it must
> have been bundled in with the SIA CD... I'll try removing the RPM to
> see what happens. Thanks for the heads up.
>
> Regards,
> Malcolm.
>
> [...]
Re: [Lustre-discuss] Getting random No space left on device (28)
On any client, 'lfs df -h' shows the usage of all your OSTs in one command.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Oct 12, 2008, at 3:24 PM, Kevin Van Maren wrote:

> Sounds like one (or more) of your existing OSTs is out of space. OSTs
> are assigned at file creation time, and Lustre will return an error if
> it cannot allocate space on the OST for a file you are writing. Do a
> df on your OSS nodes.
>
> Lustre does not re-stripe files; you may have to manually move (cp/rm)
> some files to the new OST to rebalance the file system. It is a manual
> process, but you can use lfs setstripe to force a specific OST, and
> lfs getstripe to see where a file's storage is allocated.
>
> Kevin
>
> Mag Gam wrote:
>> We recently added another 1TB to a filesystem: we added a new OST and
>> mounted it. On the clients, lfs df -h shows that the new space has
>> been acquired, and lfs df -i shows enough inodes. However, we
>> randomly see "No space left on device (28)" when we run our jobs, but
>> if we resubmit the jobs they work again. Is there anything special we
>> need to do after we mount a new OST? TIA

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
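Kevin's manual rebalance can be sketched as a handful of commands. This is illustrative only: the paths are made up, OST index 14 is an assumed index for the newly added OST, and the exact lfs flag spellings vary between 1.6.x releases (check 'lfs help' on your version):

```shell
lfs df -h                                    # confirm which OSTs are full
lfs getstripe /lustre/data/big.out           # see which OSTs hold the file's objects

# Recreate the file on the new, empty OST, then swap it into place:
lfs setstripe -i 14 /lustre/data/big.out.new # pin the new copy to OST index 14
cp /lustre/data/big.out /lustre/data/big.out.new
mv /lustre/data/big.out.new /lustre/data/big.out
```

The cp/mv pair is the point: Lustre assigns objects only at file creation, so copying into a freshly created file is the only way to move data onto the new OST.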
Re: [Lustre-discuss] Adding IB to tcp only cluster
On Oct 10, 2008, at 2:45 PM, Brian J. Murrell wrote:

> On Fri, 2008-10-10 at 11:08 -0400, Brock Palen wrote:
>> We have added a few IB nodes to our cluster (about 70 out of 600
>> nodes). What would it take to have Lustre go over IB as well as tcp
>> for the rest of the hosts?
>
> So I'm assuming that at least some of these IB nodes are servers
> (i.e. OSS) then.

Not right now; the question was because we were thinking about it. Would only the OSS need HCAs, or does the MDS need HCAs also?

> No. There is no requirement that the MDS use IB just because (some)
> OSSes use it.

Really? So given that LNET does the best path and is not part of Lustre itself: if we only hook some of the OSSes up via IB, is there a way to have a user (who is on IB) prefer the IB-connected OSSes for IO? If that is not possible now, I think some of the patches announced for 1.8 or 2.0 had the ability to select an OSS for only given users. Am I correct? It would be nice to have MDS traffic over TCP (fast enough for this user) and IO over IB.

> Fair enough.

How does Lustre figure out the preferred path?

> An LNET node with multiple paths to another LNET node chooses the
> best path. How that decision is made, I'm not so sure, but I tend to
> think that o2iblnd will be preferred over socklnd.

How can we have the nodes figure out: if I have IB, talk to the OSSes over IB, else use TCP?

> Assuming you get the configuration right on the nodes, LNET will just
> do that using its best-path algorithm.
>
> b.
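The mixed setup Brian describes is normally expressed per node in the LNET module options; nodes that list both networks will prefer the IB path to peers that share it, and fall back to tcp otherwise. A sketch of the relevant modprobe.conf fragments -- the interface names (ib0, eth0) are assumptions for this example:

```shell
# /etc/modprobe.conf fragment on an IB-equipped client or OSS:
options lnet networks="o2ib0(ib0),tcp0(eth0)"

# TCP-only nodes list just the tcp network:
options lnet networks="tcp0(eth0)"
```

The servers then advertise NIDs on both networks, and clients with only tcp0 still reach them over Ethernet.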
Re: [Lustre-discuss] lustre-ldiskfs
I ran into this problem myself when Sun's convoluted download system took over hosting the Lustre packages. When I tried to wget the package, I forgot that Sun makes you log in, so you download an HTML error page in place of the RPM. You will need to download to your own machine and then upload to the cluster; no command-line download was possible. If anyone knows how to get around this, let me know.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 26, 2008, at 6:39 AM, Andreas Dilger wrote:

> On Sep 26, 2008 10:26 +0530, Chirag Raval wrote:
>> When I am installing lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.i686.rpm
>> I get the following error. Can someone please help me figure out what
>> is wrong? I am installing it on CentOS 4.5.
>>
>> # rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.i686.rpm
>> error: open of <HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY> failed: No such file or directory
>> error: open of An failed: No such file or directory
>> error: open of error failed: No such file or directory
>> error: open of occurred failed: No such file or directory
>> error: open of while failed: No such file or directory
>> error: open of processing failed: No such file or directory
>> error: open of your failed: No such file or directory
>> error: open of request.p failed: No such file or directory
>> error: open of Reference failed: No such file or directory
>> error: open of </BODY></HTML> failed: No such file or directory
>
> You downloaded and are trying to install a web page (which itself
> appears to report that you had an error downloading the RPM).
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
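A quick pre-flight check catches the failure Andreas diagnosed before rpm ever sees the file: peek at the first bytes of the download. A minimal, self-contained sketch (the file here is a stand-in created just for the demonstration):

```shell
# Simulate the failure mode: the downloaded "RPM" is really an HTML
# error page from the login-gated download site.
printf '<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>An error occurred</BODY></HTML>\n' > bad-download.rpm

# A real RPM begins with the 4-byte magic ed ab ee db; an HTML page
# does not, so a peek at the first bytes catches the problem early.
if head -c 5 bad-download.rpm | grep -q '<HTML'; then
    result="html-page"
else
    result="maybe-rpm"
fi
echo "bad-download.rpm looks like: $result"
```

On a real download, 'file package.rpm' gives the same answer with less typing.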
[Lustre-discuss] l_getgroups: no such user
We are getting a bunch of:

l_getgroups: no such user ##

in our log files on the MDS. We keep our /etc/passwd and /etc/group in sync with the clusters that mount it. Only one visualization workstation has users who are not in its list. The problem is I don't see any files owned by those users on the filesystem:

find . -uid #

finds nothing. Does Lustre check when a user just cd's to that directory? Or is it for any user that logs in? Is it safe to ignore these messages for non-cluster users?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
I had to reboot the MDS to get the problem to go away. I will watch and see if it reappears. I screwed up and deleted the wrong /var/log/messages, so I don't have the messages. I am watching this issue.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 5, 2008, at 10:01 AM, Brian J. Murrell wrote:

> On Fri, 2008-09-05 at 00:15 -0400, Brock Palen wrote:
>> Looks like that didn't fix it. One of the login nodes repeated the
>> behavior.
>
> So what are the messages the client logged when the problem occurred?
> And what, if anything, was logged on the MDS at the same time?
>
> b.
[Lustre-discuss] Lustre clients failing, and cant reconnect
I am having clients lose their connection to the MDS. Messages on the clients look like this:

Sep 4 19:51:30 nyx-login2 kernel: Lustre: nobackup-MDT-mdc-0101fc44e800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Sep 4 19:51:30 nyx-login2 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16

They keep trying to connect, spitting out "mds_connect failed -16", and the clients never recover. On the MDS all I see is:

Lustre: 7653:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-MDT: refuse reconnection from 618cf36e-a7a6-[EMAIL PROTECTED]@tcp to 0x01037c109000; still busy with 3 active RPCs

I get this RPC message for many hosts. Clients and servers are all using TCP. Is this enough information?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
Looks like that didn't fix it. One of the login nodes repeated the behavior. The strange thing is that the MDS does not show anything about the NID of the client: the client just says it lost the connection, but the MDS never says it has not heard from the client and is kicking it out. Very strange.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Sep 4, 2008, at 11:34 PM, Brock Palen wrote:

>> Is this enough information?
>
> Probably. If you are running 1.6.5, try disabling statahead on all of
> your clients:
>
> # echo 0 > /proc/fs/lustre/.../statahead_max

I thought statahead was fixed in 1.6.5? That was the main reason we upgraded. The login nodes are already showing that behavior again; I will try it out.

> Of course, this setting goes back to its default of 32 on a reboot.
>
> b.
Re: [Lustre-discuss] lru_size very small
Great! So I read this as: lru_size no longer needs to be manually adjusted. That's great! Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 23, 2008, at 7:22 AM, Andreas Dilger wrote:

> On Aug 22, 2008 15:39 -0400, Brock Palen wrote:
>> It looks like lru_size is not a static parameter. While on most of
>> our hosts it starts as zero, once the file system is accessed the
>> values start to rise. The values get highest for the MDS:
>>
>> cat nobackup-MDT-mdc-01022c433800/lru_size
>> 3877
>
> Yes, in 1.6.5 instead of having a static LRU size it is dynamic based
> on load. This optimizes the number of locks available to nodes that
> have very different workloads than others (e.g. login/build nodes vs.
> compute nodes vs. backup nodes).

So in 1.6.5.1, are locks dynamically adjusted based on the RAM available on the MDS/OSS's? Notice how the value above is _much_ higher than the default "100" in the manual.

> The total number of locks available is now a function of the RAM on
> the server. I think the maximum is 50 locks/MB, but this is hooked
> into the kernel VM so that in case of too much memory pressure the
> LRU size is shrunk.

I should point out this value was 0 until I did a 'find . | wc -l' in a directory. The same goes for regular access: users on nodes that access Lustre have locks, while nodes that have not had Lustre access yet are still 0 (by access I mean an application that uses our Lustre mount vs. our NFS mount).

>> Any feedback on the nature of locks and lru_size? We are looking to
>> do what the manual says about upping the number on the login nodes.
>
> Yes, the manual needs an update.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
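Andreas's figure above can be turned into a rough capacity estimate for the servers in this thread. A sketch, assuming the quoted (and explicitly approximate) maximum of 50 locks per MB of server RAM:

```shell
# Rough upper bound on server-side lock count, using the ~50 locks/MB
# figure quoted above (an approximation, per Andreas).
ram_gb=16                                  # MDS RAM in the setup discussed
locks_per_mb=50
max_locks=$((ram_gb * 1024 * locks_per_mb))
echo "approx. lock ceiling for a ${ram_gb}GB server: $max_locks"
```

For the 16GB MDS here that works out to roughly 800K locks cluster-wide, which puts the observed per-client lru_size of a few thousand in perspective.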
Re: [Lustre-discuss] HLRN lustre breakdown
On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:

> This is a big nasty issue, particularly for HPC applications where
> performance is a big issue. How does one even begin to benchmark the
> performance overhead of a parallel filesystem with checksumming? I am
> having nightmares over the ways vendors will try to play games with
> performance numbers.

True.

> My suspicion is that whenever a parallel filesystem with checksumming
> is available and works, all the end-users will just turn it off anyway
> because the applications will run twice as fast without it, regardless
> of what the benchmarks say... leaving us back at the same problem.

I don't think this will be a problem. On current systems it may be the case that the checksummed filesystem becomes CPU bound, but I think the OSTs will be bailed out by CPU speeds going up faster than disk speeds; you just need to limit the number of OSTs per OSS. Where I could see it being a problem is on the client side, where writes and reads compete with the application for cycles. So far on our clusters I see applications do either compute or IO on a thread/rank, not both, freeing up allocated CPUs for IO. Then again, maybe I should ask our users why they don't do any async IO. It probably depends. My 2 cents.

> On Wed, Aug 20, 2008 at 07:12:10PM +0200, Bernd Schubert wrote:
>> Oh damn, I'm always afraid of silent data corruption due to bad
>> harddisks. We also already had this issue; fortunately we found the
>> disk before taking the system into production. Will lustre-2.0 use
>> the ZFS checksum feature?
>>
>> Thanks, Bernd
>> --
>> Bernd Schubert
>> Q-Leap Networks GmbH
>>
>> On Wednesday 20 August 2008 19:08:34 Peter Jones wrote:
>>> Hi there. I got the following background information from Juergen
>>> Kreuels at SGI. It turned out that a bad disk (which did NOT report
>>> itself as being bad) killed the Lustre filesystem, leading to data
>>> corruption due to inode areas on that disk. It was finally decided
>>> to remake the whole FS, and only during that action did we finally
>>> (after nearly 48 h) find the bad drive. It had nothing to do with
>>> the Lustre FS itself. Lustre had been the victim of a HW failure on
>>> a RAID6 lun. I hope that this helps.
>>> PJones
>>>
>>> Heiko Schroeter wrote:
>>>> Hello list, does anyone have more background info on what happened
>>>> there? Regards, Heiko
>>>>
>>>> HLRN News: Since Mon Aug 18, 2008 12:00 the HLRN-II complex Berlin
>>>> is open for users again. During the maintenance it turned out that
>>>> the Lustre file system holding the users' $WORK and $TMPDIR was
>>>> damaged completely. The file system had to be reconstructed from
>>>> scratch. All user data in $WORK are lost. We hope that this event
>>>> remains an exception. SGI apologizes for this event.
>
> --
> Troy Benjegerdes 'da hozer' [EMAIL PROTECTED]
>
> Someone asked me why I work on this free
> (http://www.gnu.org/philosophy/) software stuff and not get a real
> job. Charles Schulz had the best answer: "Why do musicians compose
> symphonies and poets write poems? They do it because life wouldn't
> have any meaning for them if they didn't. That's why I draw cartoons.
> It's my life." -- Charles Schulz
Re: [Lustre-discuss] HLRN lustre breakdown
Really? You sure? I just set up a new 1.6.5.1 filesystem this week:

[EMAIL PROTECTED] ~]# cat /proc/fs/lustre/llite/nobackup-010037e27c00/checksum_pages
0

I am curious to test whether they were on. My MPI_File_write() of a large file was slower than I expected, but it looked like the OSTs were CPU bound (two X4500s).

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 21, 2008, at 2:59 PM, Andreas Dilger wrote:

> On Aug 21, 2008 10:55 -0400, Brock Palen wrote:
>> On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:
>>> This is a big nasty issue, particularly for HPC applications where
>>> performance is a big issue. How does one even begin to benchmark
>>> the performance overhead of a parallel filesystem with
>>> checksumming? I am having nightmares over the ways vendors will
>>> try to play games with performance numbers.
>>
>> True.
>
> Actually, Lustre 1.6.5 does checksumming by default, and that is how
> we do our benchmarking. Some customers will turn it off because the
> overhead hurts them. New customers may not even notice it... Also,
> for many workloads data integrity is much more important than speed.
>
>>> My suspicion is that whenever a parallel filesystem with
>>> checksumming is available and works, all the end-users will just
>>> turn it off anyway because the applications will run twice as fast
>>> without it, regardless of what the benchmarks say... leaving us
>>> back at the same problem.
>>
>> I don't think this will be a problem. On current systems it may be
>> the case that the checksummed filesystem becomes CPU bound. I think
>> the OSTs will be bailed out by CPU speeds going up faster than disk
>> speeds; you just need to limit the number of OSTs per OSS.
>
> I agree that CPU speeds will almost certainly cover this in the
> future.
>
>> Where I could see it being a problem is on the client side, where
>> writes and reads compete with the application for cycles. So far on
>> our clusters I see applications do either compute or IO on a
>> thread/rank, not both, freeing up allocated CPUs for IO.
>
> Yes, that is our experience also.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
[Lustre-discuss] New lustre message
I don't know if this is a bad thing. I was doing a stress test of our new Lustre install and managed to have a client kicked out, with the following message on the OST that kicked it out:

Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST: refuse reconnection from 749b3c01-4ac0-[EMAIL PROTECTED]@tcp to 0x0102f7cdc000; still busy with 6 active RPCs

Was this just a result of hammering the filesystem really hard? Both OSSes became CPU bound, so I would not be surprised if it was simply too much. Any other common causes of this message (I never saw it with our old setup) would be good to know.

Thanks -- the new install is working great. Nice product.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] New lustre message
On Aug 21, 2008, at 11:17 PM, Brian J. Murrell wrote:

> On Thu, 2008-08-21 at 22:23 -0400, Brock Palen wrote:
>> I don't know if this is a bad thing. I was doing a stress test of
>> our new Lustre install and managed to have a client kicked out, with
>> the following message on the OST that kicked it out:
>
> To be clear, the message below is not a client being evicted, but
> rather a client trying to reconnect after it has been evicted.

Thanks -- yes, this message appeared after the eviction notice.

>> Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST:
>> refuse reconnection from 749b3c01-4ac0-[EMAIL PROTECTED]@tcp to
>> 0x0102f7cdc000; still busy with 6 active RPCs
>
> The OSS is refusing to allow the client to reconnect because it is
> still trying to finish the transactions the client had in progress
> when it was evicted.

Good to know that it's just for 'that' client.

>> Was this just a result of hammering the filesystem really hard?
>
> Could be, if the load was atypical and you have tuned your obd_timeout
> for a more typical load. Typically, until AT is in full swing, you
> need to tune for your worst-case scenario.
>
> b.
[Lustre-discuss] lru_size very small
Sorry for throwing so many quick questions at the list in a short time.

Looking at the manual's section on locking, it states: "The default value of LRU size is 100." I looked on our login nodes intending to increase the value; currently Lustre has set lru_size to 32 for the MDS, and on the OSTs: 1 for nine of them, 3 for one, 4 for one, and 0 for three. I should note that all 14 OSTs are spread across two OSSes, both with 16GB of RAM (X4500s). Compared to what the manual says, this sounds really small. Would this be a sign that we don't have enough memory in our OSS/MDS's for our number of clients?

I looked on a few of our clients; many have only 1 for lru_size on the MDS and 0 for all the OSTs. Am I reading something wrong? Or do we have to set this at startup rather than letting Lustre figure it out from clients/RAM, as stated in the manual? This state worries me because it gives me the feeling the cache will not function at all for lack of available locks, and I don't want to end up on the wrong end of "can speed up Lustre dramatically".

Thanks. 633 clients, 16GB MDS/MGS, 2x 16GB OSSes.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] It gives error no space left while lustre still have spaces left.
If I understand right, when you use 'setstripe -c -1' Lustre will try to spread a file's data evenly over all OSTs. Because one of yours gets full, the file can no longer be added to. Lustre does not fall back to using fewer stripes -- most users say "use more stripes" for a reason, and Lustre should not ignore that (and doesn't). I don't know how you would work around this; a "use every stripe you can until it's out of space" mode doesn't exist, as far as I know.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 21, 2008, at 12:13 AM, Chris wrote:

> Hi all, I hit a problem while testing lustre-1.6.5.1 on CentOS-5.2. I
> have four machines (PCs): an MGS co-located with the MDT, OSS-1,
> OSS-2, and CLT. OSS-1 has two disks, formatted as ost01 (40GB) and
> ost02 (15GB). OSS-2 has two disks, formatted as ost03 (23GB) and
> ost04 (5GB).
>
> First, I reformatted the MGS/MDT and mounted it:
>
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --mgs --mdt /dev/hdb
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdb /mnt/mgs
>
> Second, I reformatted the OSTs and mounted them:
>
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdc
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdd
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdc /mnt/ost01
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdd /mnt/ost02
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdc
> [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname=testfs --ost --[EMAIL PROTECTED] /dev/hdd
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdc /mnt/ost03
> [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdd /mnt/ost04
>
> Third, I mounted the Lustre file system on CLT:
>
> [EMAIL PROTECTED] ~]# mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/lfs
> [EMAIL PROTECTED] mnt]# df -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00  4.3G  1.9G  2.2G  46% /
> /dev/hda1                         99M   67M   28M  72% /boot
> tmpfs                            252M     0  252M   0% /dev/shm
> [EMAIL PROTECTED]:/testfs         82G  1.6G   77G   2% /mnt/lfs
>
> Fourth, I used lfs to set the stripe parameters on CLT:
>
> [EMAIL PROTECTED] mnt]# lfs setstripe lfs -s 8m -c -1
>
> Fifth, I used dd to test the file system. It gives "no space left"
> once just ost04 (5GB) gets full:
>
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile001 bs=128M count=24
> 24+0 records in
> 24+0 records out
> 3221225472 bytes (3.2 GB) copied, 164.585 seconds, 19.6 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile002 bs=128M count=24
> 24+0 records in
> 24+0 records out
> 3221225472 bytes (3.2 GB) copied, 164.836 seconds, 19.5 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile003 bs=128M count=48
> 48+0 records in
> 48+0 records out
> 6442450944 bytes (6.4 GB) copied, 383.2 seconds, 16.8 MB/s
> [EMAIL PROTECTED] lfs]# dd if=/dev/zero of=testfile004 bs=128M count=48
> dd: write error: 'testfile004': No space left on device
> 47+0 records in
> 46+0 records out
> 6301048832 bytes (6.3 GB) copied, 418.321 seconds, 15.1 MB/s
>
> [EMAIL PROTECTED] lfs]# df -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00  4.3G  1.9G  2.2G  46% /
> /dev/hda1                         99M   67M   28M  72% /boot
> tmpfs                            252M     0  252M   0% /dev/shm
> [EMAIL PROTECTED]:/testfs         82G   20G   59G  25% /mnt/lfs
>
> [EMAIL PROTECTED] lfs]# lfs df
> UUID                 1K-blocks      Used  Available  Use%  Mounted on
> testfs-MDT_UUID        2752272    127844    2467144    4%  /mnt/lfs [MDT:0]
> testfs-OST_UUID       41284928   5145080   34042632   12%  /mnt/lfs [OST:0]
> testfs-OST0001_UUID   15481840   5134432    9560912   33%  /mnt/lfs [OST:1]
> testfs-OST0002_UUID   23738812   5141040   17391848   21%  /mnt/lfs [OST:2]
> testfs-OST0003_UUID    5160576   4898364          4   94%  /mnt/lfs [OST:3]
> filesystem summary:   85666156  20318916   60995396   23%  /mnt/lfs
>
> I have no idea about this error. Could anyone tell me how to
> configure Lustre to avoid it? Shouldn't Lustre put files onto the
> OSTs which still have free space instead of the full ones?
>
> Regards, Chris
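The arithmetic behind Chris's ENOSPC matches his lfs df output: with '-c -1' every file spreads evenly across all four OSTs, so writable capacity is capped by the smallest OST, not by the raw total. A quick sketch with the capacities from the post above:

```shell
# OST sizes (GB) from the setup above: ost01..ost04.
capacities="40 15 23 5"
stripe_count=4                  # -c -1 stripes over all four OSTs
smallest=$(printf '%s\n' $capacities | sort -n | head -1)

# Every file spreads evenly, so writes start failing once the smallest
# OST fills: usable capacity is stripe_count * smallest OST.
usable=$((stripe_count * smallest))
echo "usable with -c -1: ${usable} GB (raw total: 83 GB)"
```

That cap, 4 x 5GB = 20GB, is exactly the ~20G shown as used in his df when ost04 hit 94%.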
Re: [Lustre-discuss] Bug 15912
Hi, I never did get a reply on this. We are facing planned production on Monday, so we could really use some guidance. What is the quickest workaround for bug 15912? There are patches now, but they are for unreleased versions; we need a solution for the current 1.6.5.1. Can I change the MGS spec for the OSTs after the fact? Will that work, and how would it be done? Thanks ahead of time.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985

On Aug 14, 2008, at 11:15 AM, Brock Palen wrote:

> I see it is fixed now, as we were being bitten by this. We would like
> to put the new filesystem into use on Monday, so we are trying to get
> this resolved. Question: because it is just a parsing problem in
> mkfs, can the problem be corrected after the filesystem is created?
> If not, how can we work around it? Do I just need to build the mkfs
> out of CVS for 1.6.6?
Re: [Lustre-discuss] Bug 15912
Ignore this. After days of banging my head against the wall, and trying tunefs.lustre (which appears to suffer from the same bug), I found that specifying --mgsnode= more than once is valid. Combined with the wonderful --print option to mkfs.lustre, I think I have my workaround: mkfs.lustre --reformat --ost --fsname=nobackup --mgsnode=mds1 --mgsnode=mds2 --mkfsoptions="-j -J device=/dev/md27" /dev/md17 Thanks. Though I am scared about the behavior of tunefs.lustre if we ever need to re-IP the nodes; reformatting is not really an option. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Aug 18, 2008, at 8:16 PM, Brock Palen wrote: Hi, I never did get a reply for this. We are faced with planned production on Monday, so we could really use some guidance. What is the quickest way to work around bug 15912? There are patches now, but they are for unreleased versions. We need a solution for the current 1.6.5.1. Can I change the MGSSPEC for the OSTs after the fact? And will that work? How would this be done? Thanks ahead of time. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
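For the record, the --print option mentioned above makes this kind of experiment safer, since it shows what would be written without formatting anything. A sketch using the poster's devices (the quoting around --mkfsoptions is my assumption; unquoted, the shell would split the journal arguments):

```shell
# Dry run: print the config that would be written, format nothing
mkfs.lustre --print --reformat --ost --fsname=nobackup \
    --mgsnode=mds1 --mgsnode=mds2 \
    --mkfsoptions="-j -J device=/dev/md27" /dev/md17

# Re-run without --print to actually format the OST
```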
[Lustre-discuss] Bug 15912
I see it is fixed now, as we were being bit by this. We would like to put the new filesystem into use on Monday, thus we are trying to get this resolved. Question: because it is just a parsing problem in mkfs, can the problem be corrected after the filesystem is created? If not, how can we work around this? Do I just need to build the mkfs out of CVS for 1.6.6? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] mv_sata patch
Is the cache patch for mv_sata noted in the Sun paper on the x4500 available? Or has it been rolled into the source distributed by Sun? Trying to avoid data loss. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] stata_mv mv_stata which is better?
Thanks, I might look into it. Right now the performance of the stock driver that comes with the kernel is more than the 4 1-gig connections we will be using. I am having other issues now with the new filesystem that I did not have with our old one; that will be a new question though. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Aug 7, 2008, at 2:11 PM, Mike Berg wrote: Brock, It is recommended that mv_sata is used on the x4500. It has been a while since I have built this up myself, and a few Lustre releases back, but I do understand the pain. I hope that with Lustre 1.6.5.1 on RHEL 4.5 you can just build mv_sata against the provided Lustre kernel, alias it accordingly in modprobe.conf, create a new initrd, then update grub. I don't have gear handy to give it a try, unfortunately. Please let me know your experiences with this if you pursue it. Enclosed is a somewhat dated document on what we have found to be the best configuration of the x4500 for use with Lustre. Ignore the N1SM parts. We optimized for performance and RAS with some sacrifices on capacity. Hopefully this is a useful reference. Regards, Mike Berg Sr. Lustre Solutions Engineer Sun Microsystems, Inc. Office/Fax: (303) 547-3491 E-mail: [EMAIL PROTECTED] X4500-preparation.pdf On Aug 6, 2008, at 1:48 PM, Brock Palen wrote: Is it still worth the effort to try and build mv_sata when working with an x4500? sata_mv from RHEL4 does not appear to show some of the stability problems discussed online before. I am curious because the build system Sun provides with the driver does not play nicely with the Lustre kernel source packaging. If it is worth all the pain, have others already figured it out? Any help would be appreciated.
Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] operation 400 on unconnected MGS
The problem I was referring to: with the new filesystem we just created I am getting the following problem. Clients lose connection to the MGS and the MGS says it evicted them; the machines are on the same network and there are no errors on the interfaces. The MGS says: Lustre: MGS: haven't heard from client e8eb1779-5cea-9cc7-b5ae-4c5ccf54f5ca (at [EMAIL PROTECTED]) in 240 seconds. I think it's dead, and I am evicting it. LustreError: 9103:0:(mgs_handler.c:538:mgs_handle()) lustre_mgs: operation 400 on unconnected MGS LustreError: 9103:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x24929/t0 o400-?@?: 0/0 lens 128/0 e 0 to 0 dl 1218142953 ref 1 fl Interpret:/0/0 rc -107/0 The "operation 400 on unconnected MGS" is the only new message I am not familiar with. Once a client loses its connection to the MGS, I will see the OSTs start booting the client also. Servers are 1.6.5.1; clients are patchless 1.6.4.1 on RHEL4. Any insight would be great. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] stata_mv mv_stata which is better?
Is it still worth the effort to try and build mv_sata when working with an x4500? sata_mv from RHEL4 does not appear to show some of the stability problems discussed online before. I am curious because the build system Sun provides with the driver does not play nicely with the Lustre kernel source packaging. If it is worth all the pain, have others already figured it out? Any help would be appreciated. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Luster recovery when clients go away
One of our OSS's died with a panic last night. Between when it was found (no failover) and restarted, two clients had died also (nodes crashed by user OOM). Because of this the OSTs now are looking for 626 clients to recover when only 624 are up. The 624 recover in about 15 minutes, but the OSTs on that OSS hang waiting for the last two, which are dead and not coming back. Note the MDS reports only 624 clients. Is there a way to tell the OSTs to go ahead and evict those two clients and finish recovering? Also, "time remaining" has been 0 since it was booted. How long will the OSTs wait before letting operations continue? Is there any rule for speeding up recovery? The OSS that crashed sees very little CPU/disk/network traffic while recovery is going on, so any way to speed it up, even if it results in a higher load, would be great to know.

status: RECOVERING
recovery_start: 1217509142
time remaining: 0
connected_clients: 624/626
completed_clients: 624/626
replayed_requests: 0/??
queued_requests: 0
next_transno: 175342162

status: RECOVERING
recovery_start: 1217509144
time remaining: 0
connected_clients: 624/626
completed_clients: 624/626
replayed_requests: 0/??
queued_requests: 0
next_transno: 193097794

Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
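For what it's worth, recovery can be cut short by hand on the OSS; a hedged sketch (device names are illustrative, and the proc paths moved around between 1.6 releases):

```shell
# List local Lustre devices to find the obdfilter (OST) device names
lctl dl

# Abort recovery on a stuck OST: clients that never reconnect are
# evicted and the OST leaves RECOVERING without waiting out the timer
lctl --device nobackup-OST0000 abort_recovery

# Watch the connected/completed client counts
cat /proc/fs/lustre/obdfilter/nobackup-OST0000/recovery_status
```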
[Lustre-discuss] lustre 1.6.5.1 panic on failover
I have two machines I am setting up as my first MDS failover pair. The two Sun x4100's are connected to an FC disk array. I have set up heartbeat with IPMI for STONITH. The problem is when I run a test on the host that currently has the mds/mgs mounted ('killall -9 heartbeat'), I see the IPMI shutdown, and when the second 4100 tries to mount the filesystem it kernel panics. Has anyone else seen this behavior? Is there something I am running into? If I do an 'hb_takeover' or shut down heartbeat cleanly, all is well. Only if I simulate heartbeat failing does this happen. Note I have not tried yanking power yet, but I want to simulate an MDS in a semi-dead state and ran into this. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 What's a good tool to grab this? It's more than one page long, and the machine does not have serial ports. Links are ok. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 31, 2008, at 5:14 PM, Brian J. Murrell wrote: On Thu, 2008-07-31 at 16:57 -0400, Brock Palen wrote: Problem is when I run a test on the host that currently has the mds/mgs mounted 'killall -9 heartbeat' I see the IPMI shutdown and when the second 4100 tries to mount the filesystem it does a kernel panic. We'd need to see the *full* panic info to do any amount of diagnostics. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIkldGMFCQB4Bvz5QRAjEqAJ99IN1m0/JJcqyh/Dm7WF0w5nd2eQCfT9IT w39dxPiWCdXKzpLEo4WxBSU= =Gnsm -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS failover
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Thank you. I was only looking at mkfs.lustre; I didn't realize it's the hosts that need to look there, not the MGS filesystem itself. Does mgsspec also work for --mgsnode= when creating a file system? mkfs.lustre [EMAIL PROTECTED]:[EMAIL PROTECTED] Would that be valid? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 30, 2008, at 10:29 AM, Brian J. Murrell wrote: On Wed, 2008-07-30 at 09:48 -0400, Brock Palen wrote: The manual does not make much sense when it comes to MGS failover. Manual: Note - The MGS does not use the --failnode option. This is true. You need to set the command on all other nodes of the filesystem (servers and clients), about the failover options for the MGS. This is true also. Use the --mgsnode parameter on servers and mount address for clients. Also true. The servers need to contact the MGS for configuration information; Also true. they cannot query the MGS about the failover partner. This part is either unclear or wrong. I guess the question is what the writer was referring to as "they." You could file a bug about that. This does not make any sense at all, other than you can't use --failnode On the MGS, yes. On servers, yes you can use that option. and that clients can't check with two different hosts for MGS data. They sure can! That's what the mgsspec:=mgsnode[:mgsnode] syntax in the mount.lustre manpage is all about. Our MGS will be on its own LUN setup with heartbeat between two nodes that are also working as an MDS pair. Good. While Heartbeat takes care of mounting the MGS file system, how can we tell clients if mds1 is down use mds2 for MGS data $ man mount.lustre Check out mgsspec in the OPTIONS section. b. 
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIkH5JMFCQB4Bvz5QRArwBAJ0TBBFVIBWiLQIt1e6kbG/n6Ufn5wCcCx1L /KJFr81OkKkuTTW0N4LtcUk= =hfmn -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
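The mgsspec syntax Brian points at looks like this in practice (host names here are placeholders):

```shell
# Client mount: ask mds1 for the MGS, fall back to mds2 if it is down
mount -t lustre mds1@tcp0:mds2@tcp0:/testfs /mnt/testfs

# Servers get equivalent redundancy by repeating --mgsnode at format time
mkfs.lustre --ost --fsname=testfs \
    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 /dev/sdb
```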
[Lustre-discuss] rpm kernel-devel package
On the download site, I am trying to figure out which RPM I need to download that would match the 'kernel-devel' equivalent for Lustre. I need to build the Sun multipath driver against that kernel for our new MDS machines, but it is not very obvious whether I need: lustre-source-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm or: kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm Is there a reason why there is not just a normal kernel-lustre-smp-devel, just like RedHat/SLES provides? Thanks! Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Can't build sun rdac driver against lustre source.
Hi, I ran into two problems. The first was easy to resolve: /bin/sh: scripts/genksyms/genksyms: No such file or directory /bin/sh: scripts/mod/modpost: No such file or directory I just had to copy genksyms and mod from linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 to linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj I figured you should be aware of this, in case it's a problem with Sun's build system for their multipath driver or the Lustre source package. This is on RHEL4, using the Lustre RPMs from Sun's website. The next problem I am stuck on is: In file included from mppLnx26_spinlock_size.c:51: /usr/include/linux/autoconf.h:1:2: #error Invalid kernel header included in userspace mppLnx26_spinlock_size.c: In function `main': mppLnx26_spinlock_size.c:102: error: `spinlock_t' undeclared (first use in this function) mppLnx26_spinlock_size.c:102: error: (Each undeclared identifier is reported only once mppLnx26_spinlock_size.c:102: error: for each function it appears in.) make: *** [mppLnx_Spinlock_Size] Error 1 I guess what I should really ask is: has anyone ever made multipath work with a Sun 2540 array for use as the MDS/MGS file system? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Yes, that worked! Thank you very much. A hint to Sun: the 2540 is a very nice array for Lustre; it would be good if all the tools that come with it were checked to work out of the box with Lustre. Just 2 cents. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 25, 2008, at 2:19 PM, Stuart Marshall wrote: Hi, I have compiled and used the Sun rdac driver and my modified makefile is attached. The sequence I've used (perhaps not the best) is:
- cd /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/source/
- cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp .config
- make clean
- make mrproper
- make prepare-all
- cd /tmp
- tar xf path_to_rdac_tarfile/rdac-LINUX-09.01.B2.74-source.tar
- cd linuxrdac-09.01.B2.74/
- cp path_to_my_makefile/Makefile_linuxrdac-09.01.B2.74 Makefile
- make clean
- make uninstall
- make
- make install
- vim /boot/grub/menu.lst (initrd - mpp)
- reboot
The changes in the Makefile may fix your problem. I'm using 6140 Sun arrays and also plan to use a 2540 as the MDT soon. Stuart Makefile_linuxrdac-09.01.B2.74 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Stuart, It looks like you have a newer rdac package than Sun has on their website. So while your Makefile builds everything, it tries to install a bit of code that does not exist. FYI. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jul 25, 2008, at 2:30 PM, Brock Palen wrote: Yes, that worked! Thank you very much. A hint to Sun: the 2540 is a very nice array for Lustre; it would be good if all the tools that come with it were checked to work out of the box with Lustre. Just 2 cents. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre locking up on login/interactive nodes
Every so often Lustre locks up. It will recover eventually. The processes show up in 'D' uninterruptible I/O wait; this time it was 'ar' making an archive. Dmesg then shows:
Lustre: nobackup-MDT-mdc-0101fc467800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail.
LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912452/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 488/768 ref 1 fl Rpc:P/0/0 rc 0/0
LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912464/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 440/768 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12653753 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12195682 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped 46 previous similar messages
Lustre: nobackup-MDT-mdc-0101fc467800: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED]
LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode 11441446 mdc close failed: rc = -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped 113 previous similar messages
Are there special options that should be set on interactive/login nodes? 
I remember something about how much memory should be available on login vs batch nodes. But I don't know how to change that, I just assumed lustre would use it. Login nodes have 8GB. __ www.palen.serveftp.net Center for Advanced Computing http://cac.engin.umich.edu [EMAIL PROTECTED] ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre locking up on login/interactive nodes
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 21, 2008, at 11:51 AM, Brian J. Murrell wrote: On Mon, 2008-07-21 at 11:43 -0400, Brock Palen wrote: Every so often Lustre locks up. It will recover eventually. The processes show up in 'D' uninterruptible I/O wait; this time it was 'ar' making an archive. Dmesg then shows: Syslog is usually a better place to get messages from as it gives some context as to the time of the messages. Ok, will keep in mind. It looks the same though. It's odd: if I log in to the same machine I can move to that directory, list the files, read files on those OSTs, and so on. And notice this was an eviction by the MDS; I see no lost network connections or network errors. Strange; not good, not good at all. The syslog data is the same, it's below: Brock
Jul 21 11:38:39 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0101fc467800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 167-0: This client was evicted by nobackup-MDT; in progress operations using this service will fail.
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912452/t0 o101-[EMAIL PROTECTED]@tcp:12 lens 488/768 ref 1 fl Rpc:P/0/0 rc 0/0
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x912464/t0 o101-nobackup-[EMAIL PROTECTED]@tcp:12 lens 440/768 ref 1 fl Rpc:/0/0 rc 0/0
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue()) ldlm_cli_enqueue: -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12653753 mdc close failed: rc = -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode 12195682 mdc close failed: rc = -108
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped 46 previous similar messages
Jul 21 11:38:39 nyx-login1 kernel: Lustre: nobackup-MDT-mdc-0101fc467800: Connection restored to service nobackup-MDT using nid [EMAIL PROTECTED]
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode 11441446 mdc close failed: rc = -116
Jul 21 11:38:39 nyx-login1 kernel: LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped 113 previous similar messages
-BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIhLOqMFCQB4Bvz5QRAgWvAJ9HhQAo9JZdcS2iyMFb19HzcgkwcQCdGosB sHaligENGxnJHdMu5116D5U= =GOlg -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] OSS load in the roof
Our OSS went crazy today. It is attached to two OSTs. The load is normally around 2-4; right now it is 123. I noticed this to be the cause: root 6748 0.0 0.0 0 0 ? D May27 8:57 [ll_ost_io_123] All of them are stuck in uninterruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? I ask about disk failure because I also see this message: mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) and: Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Thanks Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OSS load in the roof
On Jun 27, 2008, at 1:39 PM, Bernd Schubert wrote: On Fri, Jun 27, 2008 at 01:07:32PM -0400, Brian J. Murrell wrote: On Fri, 2008-06-27 at 12:44 -0400, Brock Palen wrote: All of them are stuck in uninterruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? Well, they are certainly stuck because of some blocking I/O. That could be disk failure, indeed. mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) That does not look like a picture of happiness, indeed, no. You have SCSI commands aborting. Well, these messages are not nice of course, since the mpt error handler got activated, but in principle a SCSI device can recover then. Unfortunately, the verbosity level of SCSI makes it impossible to figure out what the problem actually was. Since we suffered from severe SCSI problems, I wrote quite a number of patches to improve the situation. We now at least can understand where the problem came from and also have slightly improved error handling. These are presently for 2.6.22 only, but my plan is to send them upstream for 2.6.28. Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Those are just fallout from the above disk situation. Probably the device was offlined, and actually this also should have been printed in the logs. Brock, can you check the device status (cat /sys/block/sdX/device/state)? I/O is still flowing from both OSTs on that OSS: [EMAIL PROTECTED] ~]# cat /sys/block/sd*/device/state running running Sigh, it only needs to live till August when we install our x4500's. I think it's safe to send a notice to users that they may want to copy their data. 
Cheers, Bernd ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OSS load in the roof
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jun 27, 2008, at 1:07 PM, Brian J. Murrell wrote: On Fri, 2008-06-27 at 12:44 -0400, Brock Palen wrote: All of them are stuck in un-interruptible sleep. Has anyone seen this happen before? Is this caused by a pending disk failure? Well, they are certainly stuck because of some blocking I/O. That could be disk failure, indeed. mptscsi: ioc1: attempting task abort! (sc=010038904c40) scsi1 : destination target 0, lun 0 command = Read (10) 00 75 94 40 00 00 10 00 00 mptscsi: ioc1: task abort: SUCCESS (sc=010038904c40) That does not look like a picture of happiness, indeed, no. You have SCSI commands aborting. While the array was reporting no problems one of the disk was really lagging the others. We have swapped it out. Thanks for the feedback everyone. Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup- OST0001: slow setattr 100s Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s Those are just fallout from the above disk situation. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (Darwin) iD8DBQFIZUq/MFCQB4Bvz5QRAvacAJ9jkhi+2KgfbJ7bUI/KfHJ0Hnq1wQCeNgHO d6+tzscwCqwYtuHXmzT2kFI= =5p1N -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre delete efficency
On Jun 26, 2008, at 1:57 PM, Stew Paddaso wrote: We are considering using Lustre as our backend file platform. The specific application involves storing a high volume of sequential data writes, with a moderate amount of reads (mostly sequential, with some random seeks). Our concern is with reclaiming space. As the file system fills, we need to be able to quickly delete the oldest files. Does Lustre have an efficient file delete? I'm not expecting specific metrics (though they would be nice if available), just some general info about the Lustre delete process (does it immediately reclaim the space, or do it 'lazily' in the background? etc.). I don't know specifics about whether space reclaiming is 'lazy' or not, but from what I have seen, compared to regular ext3, deleting large files on Lustre was very fast. I expect this is because ldiskfs is extent-based and regular ext3 is not. If I am wrong on this, someone please correct me; I really would like to know this also. For me, deleting a large number of files _feels_ very quick compared to our NFS bobcat from Onstor also. Even an operation like the following was much quicker (I wish there was a better way to do this): du -h --max-depth=1 Thanks. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] lustre and multi path
Our new lustre hardware arrived from Sun today. Looking at the dual MDS and FC disk array for it, we will need multipath. Has anyone ever used multipath with lustre? Are there any issues? If we set up regular multipath via LVM, lustre won't care, as far as I can tell from browsing the archives. What about multipath without LVM? Our StorageTek array has dual controllers with dual ports going to dual-port FC cards in the MDS's. Each MDS has a connection to both controllers, so we will need multipath to get any advantage from this. Comments? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
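For the non-LVM case, the usual approach is device-mapper-multipath: give each LUN's WWID an alias and point mkfs.lustre/mount at the /dev/mapper name, so the path failover is invisible to lustre. A minimal sketch (the WWID, alias, and blacklist entry below are made-up examples; use the device {} stanza your array vendor recommends):

```
# /etc/multipath.conf -- minimal sketch, values are illustrative
defaults {
    user_friendly_names yes
}
blacklist {
    devnode "^sda$"        # local system disk, adjust to your hosts
}
multipaths {
    multipath {
        wwid  3600a0b800012345600000001aabbccdd   # example WWID from `multipath -ll`
        alias mdt0
    }
}
```

Then format and mount the stable name, e.g. mkfs.lustre ... /dev/mapper/mdt0, rather than any individual /dev/sdX path.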
[Lustre-discuss] external journals
What's a good way to find out if your (our) workload would benefit from external journals? Our OST's are x4500's and I get little if any activity on the journals from my regular benchmarks. What are the benefits? Does ldiskfs do anything intelligent with external journals? What should we see the most help with? Or should we just devote these disks to being another OST? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
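For reference, an external journal is set up at format time: the journal disk is formatted as a journal device, and the OST is created pointing at it. A sketch, assuming fresh (reformattable) targets; /dev/sdx and /dev/sdy are placeholders and the fsname/mgsnode values are examples:

```shell
# Format the dedicated disk as an ext3/ldiskfs external journal device:
mke2fs -O journal_dev -b 4096 /dev/sdx

# Create the OST telling ldiskfs to use that device for its journal:
mkfs.lustre --ost --fsname=nobackup --mgsnode=mds@tcp0 \
    --mkfsoptions="-J device=/dev/sdx" /dev/sdy
```

The benefit, when there is one, is that journal commits (small synchronous writes) stop competing with data writes for the same spindles; a streaming benchmark that barely touches the journal won't show it, so a metadata- or small-file-heavy test is a fairer trial.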
[Lustre-discuss] Lustre access locking up login nodes
I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: LustreError: 3976:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 01012ce12000 LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_statfs operation failed with -107 LustreError: Skipped 1 previous similar message Lustre: nobackup-MDT-mdc-0100e9e9ac00: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete. There are no network connection issues between the login nodes. When this happens the client does not recover until we reboot the node. This does happen at times on the compute nodes, but I see it most on login hosts. If I just go to the lustre mount and try to ls it, it will hang forever. Many times when lustre screws up it recovers, but more and more it does not, and we see these bulk errors followed by mds errors. We are using lustre 1.6.x Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] Lustre access locking up login nodes
Ahh, didn't realize this was related to that. Good to know a fix is in the works (2 x4500's are on the way, so we have made a commitment to lustre). How would I make this option the default on boot? There isn't an llite module I see on the clients. I can pdsh to all the clients, but machines do get rebooted sometimes. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On May 16, 2008, at 4:13 PM, Brian J. Murrell wrote: On Fri, 2008-05-16 at 15:48 -0400, Brock Palen wrote: I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: Review the list archives for statahead problems. b.
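Since the statahead tunable on 1.6 clients lives under /proc rather than as a module option, one common workaround is to set it from a boot script after the mount comes up. A sketch (the statahead_max path matches 1.6-era clients, but verify it against your version; the glob covers all mounts):

```shell
# e.g. appended to /etc/rc.local (or an init script ordered after the
# lustre mount) on each client:
for f in /proc/fs/lustre/llite/*/statahead_max; do
    [ -e "$f" ] && echo 0 > "$f"    # 0 disables statahead entirely
done
```

That avoids having to pdsh the setting out again after every reboot.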
[Lustre-discuss] MDS Fail-Over planning.
I know some users talked about DRBD for the shared disk on the MDS. What was the conclusion of this? Bad idea? I do some highly available NFS using this exact same setup: DRBD provides shared storage, Heartbeat is used to monitor hosts, and IPMI is used by Heartbeat to power down hosts that are to be killed. The plan on our table right now is two thumpers as the OSS's, then two x4100s or x4200s with mirrored SAS drives, shared across with DRBD and Heartbeat. Any comments? Any issues to be aware of? Anyone running something similar? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
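For anyone sketching this out: the one setting that matters most for an MDT on DRBD is protocol C (fully synchronous replication), otherwise a failover can land on a peer missing the last committed metadata transactions. A minimal resource stanza, with hostnames, disks, and IPs as placeholder examples:

```
# /etc/drbd.conf sketch -- names and addresses are illustrative
resource mdt {
    protocol C;              # synchronous: write completes on both nodes
    on mds1 {
        device    /dev/drbd0;
        disk      /dev/sda3;       # the mirrored SAS volume
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on mds2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

mkfs.lustre then targets /dev/drbd0 on whichever node is primary, and Heartbeat's resource script promotes DRBD and mounts the MDT on failover, with IPMI STONITH to guarantee the old primary is really down.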
Re: [Lustre-discuss] state of sun x4500 drivers
That's disappointing, thanks for the input though. The paper at: http://wiki.lustre.org/images/7/79/Thumper-BP-6.pdf points out how to patch it to enable that functionality, but we want to keep with the CFS stock kernel. 30MB/s might be fine for us, we only plan on bonding the 4 thumper 1Gig-e interfaces. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 23, 2008, at 1:00 PM, Brian Behlendorf wrote: Recently I have also been doing some linux work with the x4500 and I have been using the sata_mv driver (v0.81). The driver will properly detect all the drives and you may access them safely. However, from what I've seen the driver needs some further development work to actually perform well. I see only 30 MB/s write rates to a single disk using a simple streaming dd test. Much of this bad performance may simply be due to the fact that the driver cannot enable the disk write-back cache, forcing you to use write-thru mode. So currently the bottom line is linux will work on the x4500. But to get it working well someone is going to need to invest some development effort to improve the linux driver. Good luck, Brian There was some discussion about the driver/module for the SATA controllers in the thumper (x4500) in the linux kernel. My question is if we bought one of these, would the CFS kernel have everything needed to use the thumper in a safe way. Thank You.
Re: [Lustre-discuss] lfs setstripe
On Apr 17, 2008, at 10:48 PM, Kaizaad Bilimorya wrote: On Thu, 17 Apr 2008, Brock Palen wrote: I don't think you need to do this. If I understand right, you can set the stripe size of the mount, and everything inside that directory inherits it, unless they themselves were explicitly set. Hi Brock, thanks for the reply. I have set the stripe count on the lustre mount using lfs setstripe, but the problem is that any subdirectories that already existed under this mount will have the default filesystem stripe count and not the new one I set, so any new files created under these existing subdirectories will inherit their parent directory stripe count and not the newly set one from the lustre mount. Ahh, I see. I really don't know; I would try walking the system and changing all the old directories. I have not had to do this myself. eg: /lustremount - lfs setstripe /lustremount 0 -1 2 /lustremount/existing_dir - has filesystem default stripe count (1 in this case) /lustremount/new_dir - gets stripe count of parent (2 in this case) /lustremount/existing_dir/newfile - has filesystem default stripe count of 1 So that is why I have to do either option 1 (change default) or 2 (traverse and set explicitly for all existing dirs) that I specified, but I would like to know if there are any performance or other reasons not to do option 2. thanks -k Also files that already are created will keep the stripe settings they were created with. You would need to copy them, and move over the old one, to change to the new stripe settings. Check the lustre manual, they have something about this. On Apr 17, 2008, at 10:34 AM, Kaizaad Bilimorya wrote: Hello, I would like to adjust the stripe count for our lustre filesystem. Would it be better to: 1) Kill all jobs, unmount the lustre filesystem from all clients, and then adjust the default stripe count for the lustre filesystem on the MDS using lctl. 
or 2) Use find and the lfs setstripe command to traverse and set the stripe count for all directories in a currently mounted lustre filesystem. Besides the traversal cost of the filesystem, are there other disadvantages, performance reasons, or other reasons not to use option 2? thanks -k
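Option 2 from the thread can be sketched in two commands, using the same positional setstripe syntax the thread itself uses (size 0 = default, offset -1 = any OST, count 2; newer lfs versions use -s/-i/-c flags instead):

```shell
# Set the new default on the mount point, so new top-level dirs inherit it:
lfs setstripe /lustremount 0 -1 2

# Walk the existing directories so files created under them inherit it too
# (files that already exist keep their old layout until copied):
find /lustremount -type d -exec lfs setstripe {} 0 -1 2 \;
```

Other than the traversal cost, the directory stripe setting is just an EA on each directory, so there should be no ongoing performance penalty from having set it explicitly everywhere.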
Re: [Lustre-discuss] lfs setstripe
I don't think you need to do this. If I understand right, you can set the stripe size of the mount, and everything inside that directory inherits it, unless they themselves were explicitly set. Also, files that already are created will keep the stripe settings they were created with. You would need to copy them, and move over the old one, to change to the new stripe settings. Check the lustre manual, they have something about this. You can use 'getstripe' to see what a file/directory uses for its settings. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 17, 2008, at 10:34 AM, Kaizaad Bilimorya wrote: Hello, I would like to adjust the stripe count for our lustre filesystem. Would it be better to: 1) Kill all jobs, unmount the lustre filesystem from all clients, and then adjust the default stripe count for the lustre filesystem on the MDS using lctl. or 2) Use find and the lfs setstripe command to traverse and set the stripe count for all directories in a currently mounted lustre filesystem. Besides the traversal cost of the filesystem, are there other disadvantages, performance reasons, or other reasons not to use option 2? thanks -k
Re: [Lustre-discuss] MGS and loop devices
I don't know if this still applies, but back when I was doing some work with the Xen hypervisor, loopback devices did not provide a safe place to put files in a power failure. Loopback did not make sure that things in memory were flushed to the file and synced to the disk, leaving dirty data in memory. Might want to verify this; just don't get caught with stuff in RAM. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Apr 14, 2008, at 3:12 PM, Jakob Goldbach wrote: On Mon, 2008-04-14 at 17:40 +0200, Fereyre Jerome wrote: Has anybody used loop devices for MGT? Since there's not so much information stored in this Target, it can be a good alternative to disk partitions... You could place it on the same partition/volume as the MDT, but I believe you get less noise in dmesg during start/stop if you have the MGT and MDT separate, as this allows you to start the MDT after your OSSs. I'm using an LVM volume for my MGS (and MDT - wanted to try the fast scanner for backup, which requires snapshotting). My MGS size is 64MB - about 10% is used in a two OSS + 3 clients setup. I'm also interested in knowing about how much space the MGT uses for a many-node system. /Jakob
[Lustre-discuss] filesystem UID' GID's
Is an /etc/passwd with all the filesystem users' UIDs required only on the MDS? Or do the OST's need it also? Testing for me shows only the MDS, but I could be wrong. We don't use LDAP or anything like that at the moment for UID/GID mapping. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] more problems with lustre,
I found today that a very large number of nodes that are lustre clients show ptlrpcd taking 100% cpu, and the lustre mount is completely unavailable! I have attached the output of 'lctl dk /tmp/data' to this message. Any insight would be helpful. I am afraid, though, that this, along with the problem of clients being evicted all the time, means our evaluation of lustre will end, and we will not be using it in the future :-( When it works it works great, but our group cannot deal with how unstable it is. We will try lustre again when it hits version 2.0 (running 1.6.4.1 right now with patchless clients). Thanks for all the help you have given us while we have been evaluating it! [attachment: data (binary)] Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
Re: [Lustre-discuss] lustre dstat plugin
On Mar 9, 2008, at 10:03 PM, Aaron Knister wrote: Just wondering if either of you have used collectl; if so, which do you prefer- dstat or collectl? Never used it; looks like they solve the same problem. I like dstat for the simple plugins (if you're a better Python programmer than me), and how you can pull out results. For example, I use the following on our lustre OSS with two OST's, sda and sdb: dstat -D sda,sdb,total That gives me per-disk stats and a total. Similar tools could be made for collectl, I'm sure. Brock -Aaron On Mar 7, 2008, at 7:03 PM, Brock Palen wrote: On Mar 7, 2008, at 6:58 PM, Kilian CAVALOTTI wrote: Hi Brock, On Wednesday 05 March 2008 05:21:51 pm Brock Palen wrote: I have written a lustre dstat plugin. You can find it on my blog: That's cool! Very useful for my daily work, thanks! Thanks! It's the first Python I ever wrote. It only works on clients, and has not been tested on multiple mounts. It's very simple; it just reads /proc/ It indeed doesn't read stats for multiple mounts. I slightly modified it so it can display read/write numbers for all the mounts it finds (see the attached patch). This is a great idea Here's a typical output for an rsync transfer from scratch to home: -- 8 --- $ dstat -M lustre Module dstat_lustre is still experimental. --scratch---home--- read write: read write 110M 0 : 0 110M 183M 0 : 0 183M 184M 0 : 0 184M -- 8 --- Maybe it could be useful to also add the other metrics from the stat file, but I'm not sure which ones would be the most relevant. And it would probably be wise to do that in a separate module, like lustre_stats, to avoid clutter. Yes, dstat comes with plugins for nfsv3 and has two modules, dstat_nfs3 and dstat_nfs3op, which has extended details. So I think this would be a good idea to follow that model. Anyway, great job, and thanks for sharing it! Thanks again. 
Cheers, -- Kilian [attachment: dstat_lustre.diff] Aaron Knister Associate Systems Analyst Center for Ocean-Land-Atmosphere Studies (301) 595-7000 [EMAIL PROTECTED]
Re: [Lustre-discuss] yet another lustre error
On Mar 9, 2008, at 10:01 PM, Aaron Knister wrote: Hi! I have a few questions for you- 1. How many nodes was his job running on? Around 64 serial jobs accessing the same directory (not the same files). 2. What version of lustre and linux kernel are you running on your servers/clients? Lustre servers: 2.6.9-55.0.9.EL_lustre.1.6.4.1smp Clients: 2.6.9-67.0.1.ELsmp 3. What ethernet module are you using on the servers/clients? Most use the tg3, some use e1000. I honestly am not sure what the RPC errors mean, but I've had similar issues caused by ethernet-level errors. Over the weekend the MDS/MGS went into an unhealthy state, forcing a reboot+fsck, and when it came back up the directory was accessible again and jobs started working again. -Aaron On Mar 7, 2008, at 6:45 PM, Brock Palen wrote: On a file system that's been up for only 57 days, I have 505 lustre-log. dumps. The problem at hand is a user whose many jobs are now hung trying to create a directory from his pbs script. On the clients I see: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16 LustreError: Skipped 2 previous similar messages On every client his jobs are on. In the most recent /tmp/lustre-log. on the MDS/MGS I see this message: @@@ processing error (-16) [EMAIL PROTECTED] x12808293/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 ldlm_lib.c target_handle_reconnect nobackup-MDT: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting ldlm_lib.c target_handle_connect nobackup-MDT: refuse reconnection from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x0100069a7000; still busy with 2 active RPCs ldlm_lib.c target_send_reply_msg @@@ processing error (-16) [EMAIL PROTECTED] x11199816/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 I also see messages about active RPCs in other logs. What would this mean? Is something stuck someplace? 
Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 Aaron Knister Associate Systems Analyst Center for Ocean-Land-Atmosphere Studies (301) 595-7000 [EMAIL PROTECTED]
Re: [Lustre-discuss] socknal_sd00 100% lower?
On Mar 7, 2008, at 8:51 AM, Maxim V. Patlasov wrote: Brock, If our IO servers are seeing extended periods of socknal_sd00 at 100% cpu, would this cause a bottleneck? Yes, I think so. If so, it's a single-homed host; would adding another interface to the host help? Probably not. It could only help in the case where you have several CPUs but something prevents ksocklnd from spreading the load over them. The servers are dual-cpu systems, but I only see a single socknal_sd thread. Is there threading anyplace? Yes, ksocklnd spawns a separate socknal_sd thread for each CPU/core that you have. There are two algorithms for spreading the load - you can play with the enable_irq_affinity modparam flag. I see some things in logs about setting cpu affinity; I'll check out the manual some more. Or is a faster cpu the only way out? I believe you either need a faster CPU or a faster system bus. If a slow system bus isn't your case, increasing the number of CPUs will also do. OK Sincerely, Maxim
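The enable_irq_affinity knob mentioned above is a ksocklnd module parameter, so it goes in the module options rather than /proc. A sketch of how it would be set (flag name per 1.6-era ksocklnd; check your version's manual for which value selects which load-spreading algorithm before relying on it):

```shell
# /etc/modprobe.conf fragment on the servers; takes effect when the
# ksocklnd module is next loaded (i.e. after unmount/remount or reboot):
options ksocklnd enable_irq_affinity=0
```

With affinity disabled, ksocklnd should let its socknal_sd threads float across CPUs instead of pinning work where the NIC interrupts land, which is the case where a second core can actually absorb the load.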
[Lustre-discuss] yet another lustre error
On a file system that's been up for only 57 days, I have 505 lustre-log. dumps. The problem at hand is a user whose many jobs are now hung trying to create a directory from his pbs script. On the clients I see: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16 LustreError: Skipped 2 previous similar messages On every client his jobs are on. In the most recent /tmp/lustre-log. on the MDS/MGS I see this message: @@@ processing error (-16) [EMAIL PROTECTED] x12808293/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 ldlm_lib.c target_handle_reconnect nobackup-MDT: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting ldlm_lib.c target_handle_connect nobackup-MDT: refuse reconnection from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x0100069a7000; still busy with 2 active RPCs ldlm_lib.c target_send_reply_msg @@@ processing error (-16) [EMAIL PROTECTED] x11199816/t0 o38- [EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0 I also see messages about active RPCs in other logs. What would this mean? Is something stuck someplace? Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] socknal_sd00 100% lower?
If our IO servers are seeing extended periods of socknal_sd00 at 100% cpu, would this cause a bottleneck? If so, it's a single-homed host; would adding another interface to the host help? Is there threading anyplace? Or is a faster cpu the only way out? Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
[Lustre-discuss] lustre dstat plugin
I have written a lustre dstat plugin. You can find it on my blog: http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,31/ It only works on clients, and has not been tested on multiple mounts. It's very simple; it just reads /proc/ Example: dstat -a -M lustre total-cpu-usage -dsk/total- -net/total- ---paging-- ---system-- lustre-1.6- usr sys idl wai hiq siq| read writ| recv send| in out | int csw | read writ 23 53 1 21 0 0| 0 0 |3340k 4383k| 0 0 | 3476 198 | 16M 22M 13 69 16 2 0 1| 0 0 |1586k 16M| 0 0 | 3523 424 | 24M 14M 69 30 0 0 0 1| 0 8192B|1029k 18M| 0 0 | 3029 88 | 0 0 Patches/comments welcome. Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985
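For the curious, the heart of such a plugin is just parsing the cumulative byte counters out of /proc/fs/lustre/llite/*/stats. A self-contained sketch against a made-up stats snapshot (the field layout - count, min, max, sum after the [bytes] tag - matches 1.6-era llite stats, but verify on your version; the numbers are invented):

```shell
# Sample of what a client's llite stats file looks like (values invented):
stats='snapshot_time         1204917.123 secs.usecs
read_bytes            1024 samples [bytes] 0 1048576 16777216
write_bytes           512 samples [bytes] 0 1048576 23068672'

# Field 7 is the running sum of bytes; the plugin diffs successive
# snapshots of this value to get a per-interval rate.
echo "$stats" | awk '$1 == "read_bytes"  { print "read:",  $7 }
                     $1 == "write_bytes" { print "write:", $7 }'
# -> read: 16777216
#    write: 23068672
```

On a real client you would read each /proc/fs/lustre/llite/*/stats file per interval instead of a literal string, which is also the natural place to add per-mount columns.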
Re: [Lustre-discuss] Luster clients getting evicted
If a client gets an eviction from the server, it might be triggered by: 1) the server did not get the client's pinger message for a long time. 2) the client is too busy to handle the server's lock cancel request. Clients show a load of 4.2 (4 cores total, 1 process per core). 3) the client cancelled the lock, but the network dropped the cancel reply to the server. I see a very small number (6339) of dropped packets on the interfaces of the OSS. Links between the switches show no errors. 4) the server is too busy to handle the lock cancel reply from the client, or is blocked somewhere. I started paying attention to the OSS more once you said this; sometimes I see the cpu use of socknal_sd00 get to 100%. Now, is this process used to keep all the obd_pings going? Both the OSS and the MDS/MGS are SMP systems and run single interfaces. If I dual-homed the servers, would that create another socknal process for lnet? It seems there are a lot of metadata operations in your job. I guess your eviction might be caused by the latter 2 reasons. If you could provide the process stack trace on the MDS when the job died, it might help us figure out what is going on there. WangDi Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Feb 4, 2008, at 2:47 PM, Brock Palen wrote: Which version of lustre do you use? Server and clients same version and same os? Which one? lustre-1.6.4.1 The servers (oss and mds/mgs) use the RHEL4 rpm from lustre.org: 2.6.9-55.0.9.EL_lustre.1.6.4.1smp The clients run patchless RHEL4 2.6.9-67.0.1.ELsmp One set of clients are on a 10.x network while the servers and the other half of the clients are on a 141. network; because we are using the tcp network type, we have not set up any lnet routes. I don't think that should cause a problem, but I include the information for clarity. We do route 10.x on campus. Harald On Monday 04 February 2008 04:11 pm, Brock Palen wrote: on our cluster that has been running lustre for about 1 month. I have 1 MDT/MGS and 1 OSS with 2 OST's. 
Our cluster uses all GigE and has about 608 nodes / 1854 cores. We have a lot of jobs that die and/or go into high IO wait; strace shows processes stuck in fstat(). The big problem (I think), and I would like some feedback on it, is that of these 608 nodes, 209 of them have in dmesg the string "This client was evicted by". Is it normal for clients to be dropped like this? Is there some tuning that needs to be done on the server to carry this many nodes out of the box? We are using a default lustre install with GigE. Brock Palen Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 -- Harald van Pee Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn