[Lustre-discuss] Lustre RAID1 SNS

2010-05-06 Thread Mag Gam
I was looking through this:
http://wiki.lustre.org/images/f/ff/OST_Migration_RAID1_SNS.pdf

Is this actual work in progress or just a proposal? This is perhaps the
feature of the decade for Lustre :-)


[Lustre-discuss] MDS high cpu usage

2010-05-06 Thread Frans-Paul Ruzius
Hello,

On the Lustre system I have set up, I use one MDS and 5 OSTs.
The MDS shows almost 100% CPU usage when I check top,
and sometimes the filesystem responds slowly.
Two processes are running with over 35% CPU usage each:

socknal_sd00 and socknal_sd01

Are these the main processes of the Lustre file system?
Would it be better to move the MDS to a faster node (with more CPU power)?
The MDS currently has a 2GHz dual-core AMD Athlon processor.

There are around 30 clients accessing the filesystem, sometimes all at once.
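
One way to check how much CPU those threads use individually is to look at per-thread statistics (a rough sketch using standard Linux tools; the socknal_sd* names suggest they belong to Lustre's TCP networking module, ksocklnd, but that is an inference, not something stated here):

  # per-thread view, filtered to the socknal threads
  top -H -b -n 1 | grep socknal

  # or list all threads sorted by CPU share
  ps -eLo pid,tid,comm,pcpu --sort=-pcpu | head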

Regards,

Frans Ruzius


Re: [Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

2010-05-06 Thread Andreas Dilger
On 2010-05-06, at 11:57, Frederik Ferner wrote:
 On our Lustre system we are seeing the following error fairly regularly; 
 so far we have not had complaints from users and have not noticed any 
 negative effects, but it would still be nice to understand the errors 
 better. The systems reporting these errors are NFS exporters for 
 subtrees of the Lustre file system.
 
 On the Lustre client/NFS server:
 
 May  6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error 
 occurred while communicating with 172.23.6...@tcp. The mds_getattr_lock 
 operation failed with -13

-13 is -EACCES (per /usr/include/asm-generic/errno-base.h) or equivalent
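
For example, the mapping can be checked directly against that header (a quick sketch; it assumes the kernel headers are installed at the usual path, and note the -13 in the log is just the negated errno value):

  $ grep -w 13 /usr/include/asm-generic/errno-base.h
  #define EACCES          13      /* Permission denied */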

That just means that someone tried to access a file they don't have permission 
to access.  Why this is being printed on the console is a bit of a 
mystery, since I haven't seen anything similar.  I wonder if NFS is going down 
some obscure code path that returns the error to the RPC handler instead 
of stashing this normal error code inside the reply.
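
Seen from a client, this is just the ordinary POSIX permission failure; a trivial non-Lustre illustration (assuming a typical system where /root is mode 0700):

  $ cat /root/.bashrc        # as an unprivileged user
  cat: /root/.bashrc: Permission denied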

In any case it is harmless and expected (sigh).  I would hope it has been 
removed in newer versions, but I don't know for sure.

 Does anyone know if we should worry about these messages or if we can 
 safely ignore them? Or should we assume that some of our users might 
 have a problem accessing data that they simply have not reported? I 
 find that unlikely, though.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
