[Lustre-discuss] (unknown subject)

2008-10-13 Thread Deval kulshrestha

Hi Kevin,

Thanks for your reply. Now I can set it up.

Regards
Deval 

Message: 9
Date: Thu, 09 Oct 2008 05:46:27 -0600
From: Kevin Van Maren [EMAIL PROTECTED]
Subject: Re: [Lustre-discuss] Test setup configuration
To: Deval kulshrestha [EMAIL PROTECTED]
Cc: lustre-discuss@lists.lustre.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Deval kulshrestha wrote:

 Hi

  

 I am a new Lustre user, trying to evaluate Lustre with a few 
 configurations. I am going through the Lustre 1.6 Operations Manual, but I 
 am not able to understand which packages should be installed on the MDS, 
 OSS, and client.

 Should I install all the packages on all three types of nodes?

  

 Please explain

  

 Best Regards

 Deval K


Lustre servers (MDS/OSS):
  kernel-lustre-smp // patched server kernel
  lustre-modules // Lustre kernel modules
  lustre // user space tools (server)
  lustre-ldiskfs // ldiskfs
  e2fsprogs // filesystem tools

You can install all those RPMs on the client as well, but it is not 
necessary.
Lustre clients (assuming you have the matching vendor kernel for the 
lustre-modules installed):
  lustre-client-modules // kernel modules for client
  lustre-client // user space (client)
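
For reference, the install step would look roughly like this (exact RPM file 
names depend on the Lustre release and kernel version, so treat them as 
placeholders):

  # on the MDS and OSS nodes
  rpm -ivh kernel-lustre-smp-<ver>.rpm \
           lustre-modules-<ver>.rpm \
           lustre-<ver>.rpm \
           lustre-ldiskfs-<ver>.rpm \
           e2fsprogs-<ver>.rpm
  # then reboot into the patched kernel

  # on the clients (vendor kernel matching the lustre-client-modules build)
  rpm -ivh lustre-client-modules-<ver>.rpm lustre-client-<ver>.rpm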

Kevin










[Lustre-discuss] kerberos with lustre

2008-10-13 Thread mohammed gaafar
Hi all,

I've been trying to kerberize Lustre, so I followed the instructions in the
Lustre manual. Unfortunately, the step that requires mounting the MDT and OST
with the command "mount -t lustre -o sec=plain /dev/sda8 /mnt/data/mdt" did
not work; it fails with the error "Unrecognized mount option sec=plain or
missing value".

Any help, please?

Thank You

-- 
Mohammed Abd El-Monem Gaafar
Software Engineer
ICT Sector
Bibliotheca Alexandrina
P.O.Box 138, Chatby,
Alexandria 21526, Egypt
Tel:  +20 3 483
Fax: +20 3 4820405
Ext.: 1417
Website: www.bibalex.org
Email: [EMAIL PROTECTED]


Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

2008-10-13 Thread Brock Palen
I know you say the only addition was the RDAC module, for the MDSs I assume 
(we use it too, just fine).

When I ran faultmond from Sun's dcmu RPM (RHEL 4 here), the X4500s would 
crash like clockwork every ~48 hours. For such a simple bit of code I was 
surprised that this would happen; I noticed it once when I forgot to turn it 
on while working on the load. Just FYI, it was unrelated to Lustre (we use 
the provided RPMs, no kernel build); turning faultmond off solved my problem 
on the X4500.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985



On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:

 The X4200m2 MDS systems and the X4500 OSS were rebuilt using the  
 stock Lustre packages (Kernel + modules + userspace). With the  
 exception of the RDAC kernel module, no additional software was  
 applied to the systems. We recreated our volumes and ran the  
 servers over the weekend. However, the OSS crashed about 8 hours  
 in. The syslog output is attached to this message.

 Looks like it could be similar to bug #16404, which means patching and 
 rebuilding the kernel. Given my lack of success at building from source, I 
 am again asking for some guidance on how to do this. I sent out the steps I 
 used to try to build from source on the 7th because I was encountering 
 problems and was unable to get a working set of packages. Included in that 
 message was output from quilt implying that the kernel patching process was 
 not working properly.
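
For what it's worth, applying the kernel patch series by hand usually goes 
roughly like this (paths and the series file name here are assumptions; check 
lustre/kernel_patches/series in your source tree for the one matching your 
kernel):

  cd /usr/src/linux-2.6.9-67.0.7.EL
  ln -s /path/to/lustre/lustre/kernel_patches/series/2.6-rhel4.series series
  ln -s /path/to/lustre/lustre/kernel_patches/patches patches
  quilt push -av    # failures here mean the series does not match this tree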


 Regards,

 Malcolm.

 -- 
 Malcolm Cowe
 Solutions Integration Engineer

 Sun Microsystems, Inc.
 Blackness Road
 Linlithgow, West Lothian EH49 7LR UK
 Phone: x73602 / +44 1506 673 602
 Email: [EMAIL PROTECTED]
 Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
 Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
 ordered data mode.
 Oct 10 06:53:42 oss-1 kernel: kjournald starting.  Commit interval  
 5 seconds
 Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
 Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
 ordered data mode.
 Oct 10 06:57:49 oss-1 kernel: kjournald starting.  Commit interval  
 5 seconds
 Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
 Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
 ordered data mode.
 Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive  
 fault
 Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
 Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver,  
 [EMAIL PROTECTED]
 Oct 10 07:56:23 oss-1 kernel: Lustre Version: 1.6.5.1
 Oct 10 07:56:23 oss-1 kernel: LDISKFS-fs: file extents enabled
 Oct 10 07:56:23 oss-1 kernel: LDISKFS-fs: mballoc enabled
 Oct 10 07:56:23 oss-1 kernel: Build Version:  
 1.6.5.1-1969123119-PRISTINE-.cache.OLDRPMS.20080618230526.linux- 
 smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre. 
 1.6.5.1smp
 Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI [EMAIL PROTECTED]  
 [8/64]
 Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System;  
 [EMAIL PROTECTED]
 Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
 5 seconds
 Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
 on md21
 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
 journal data mode.
 Oct 10 07:56:24 oss-1 kernel: kjournald starting.  Commit interval  
 5 seconds
 Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal  
 on md21
 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with  
 journal data mode.
 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
 Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
 Lustre: Request x1 sent from [EMAIL PROTECTED] to NID  
 [EMAIL PROTECTED] 5s ago has timed out (limit 5s).
 Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from  
 [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed  
 out (limit 5s).
 LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type  
 4, status -113  [EMAIL PROTECTED] x3/t0 o250- 
 [EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl  
 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
 Lustre: Request x3 sent from [EMAIL PROTECTED] to NID  
 [EMAIL PROTECTED] 0s ago has timed out (limit 5s).
 LustreError: 18125:0:(obd_mount.c:1062:server_start_targets())  
 Required registration failed for lfs01-OST: -5
 LustreError: 15f-b: Communication error with the MGS.  Is the MGS  
 running?
 LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable  
 to start targets: -5
 LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd  
 lfs01-OST
 LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount())  
 lfs01-OST not registered
 LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
 LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0  
 breaks, 0 lost
 LDISKFS-fs: mballoc: 0 generated and it took 0
 LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
 Oct 10 07:56:50 oss-1 

Re: [Lustre-discuss] lustre/drbd/heartbeat setup [was: drbd async mode]

2008-10-13 Thread Thomas Roth
Hi,

I read your instructions - that's pretty much the setup we are using, too, 
and it works very well, drbd 0.8 notwithstanding, though on a hardware RAID.
I do not quite understand your remark about not using an extra net for drbd - 
have you tried putting the name that's in your drbd.conf, together with the 
other IP, into /etc/hosts? My guess is that the performance of your MDS pair 
is influenced by drbd doing its job - I would keep that traffic separate from 
the Lustre data stream.
The machines we are planning to use in our next cluster are actually 
equipped with four network interfaces - two (bonded) for Lustre, one for 
drbd and one for heartbeat - those serial cables only give me error 
messages and headaches.
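
To make that concrete (hostnames, addresses and devices below are made up), 
the idea is to point the drbd peers at each other over the dedicated link 
rather than over a Lustre-facing interface:

  # /etc/hosts on both MDS nodes - names for the dedicated replication link
  192.168.10.1   mds1-drbd
  192.168.10.2   mds2-drbd

  # /etc/drbd.conf, resource section (drbd 8.x syntax)
  resource mdt {
    protocol C;
    on mds1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.10.1:7789;   # IP of the dedicated drbd NIC
      meta-disk internal;
    }
    on mds2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.10.2:7789;
      meta-disk internal;
    }
  }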

We have separate partitions for MGS and MDT - on one machine. I didn't 
understand why this would not be the Lustre way? This way at least one 
doesn't have to worry about a super-fast connection between MGS and MDT.

Is there a particular reason for not managing the IP via heartbeat? At least 
it's easier to set up than the drbddisk and Filesystem resources.

Regards,
Thomas

Heiko Schroeter wrote:
 Hello,
 
 at last a first version of our setup scenario is ready.
 
 Please consider this as a general guideline. It may contain errors.
 We know that some things are done differently in the Lustre community, i.e. 
 placing MDS and MDT on separate machines.
 
 Please let me know if you find bugs or if things can be improved.
 
 There is more than one way.
 
 Regards
 Heiko
 
 
 
 


[Lustre-discuss] Multiple interfaces in LNet

2008-10-13 Thread Danny Sternkopf
Hi,

I am learning about LNet and I have also read one of the recent posts with
the subject 'Adding IB to tcp only cluster'.

What interests me is how to use multiple interfaces on the same server in
Lustre/LNet. My understanding is that TCP (ksocklnd) can manage multiple
physical interfaces as one LNet interface with one unique NID. Is that
still correct and recommended? Or is it better to set up Ethernet bonding
(under Linux) and bind that bonded interface to LNet?

Besides TCP, is o2ib the only other way to use multiple interfaces on the
same node? With ko2iblnd one can set up a separate Lustre network for each
IB interface. In fact, you must set up several Lustre networks, otherwise
only the first IB interface is used, correct?
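
Just to make the two variants concrete (interface names and network numbers
below are only examples):

  # /etc/modprobe.conf - pick one of the two forms
  # (a) one tcp LNet network spanning two NICs via ksocklnd:
  options lnet networks="tcp0(eth0,eth1)"
  # (b) one Lustre network per IB interface with ko2iblnd:
  options lnet networks="o2ib0(ib0),o2ib1(ib1)"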

It is not clear to me how the MGS, MDS, OSS and client choose a NID for
communication. I know that LNet chooses the best one, but who provides the
list of all available NIDs for a server? Or does it work differently?

I am aware of 'lctl list_nids' and 'lctl which_nid <list of NIDs>'. That is
fine; I would like to know what a Lustre server or a client does internally.

Furthermore, I don't understand the MGS target information kept on the MDT
and OST devices. What is it used for, and how? What happens during a mount
of an MDT or OST - who talks to whom, and how?

I am looking forward to gaining a better understanding. Thank you and
Best regards,

Danny
-- 
Danny Sternkopf http://www.nec.de/hpc   [EMAIL PROTECTED]
HPCE Division  Germany phone: +49-711-68770-35 fax: +49-711-6877145
~~~
NEC Deutschland GmbH, Hansaallee 101, 40549 Düsseldorf
Geschäftsführer Yuya Momose
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743



[Lustre-discuss] e2fsprogs in Debian/Lenny

2008-10-13 Thread Jakob Goldbach
Hi,

Debian Lenny has e2fsprogs 1.41.2, with lots of ext4/extents info in its
changelog. Could this be used when fsck'ing an OST, instead of Sun's
e2fsprogs?

/Jakob



Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

2008-10-13 Thread Brock Palen
I never uninstalled it (I still use some of the tools in it). Faultmond is a 
service, so just chkconfig it off.
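
I.e. something like this (assuming the init script carries the same name as 
the daemon):

  chkconfig faultmond off    # don't start it at boot
  service faultmond stop     # stop the running instance now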

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985



On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

 Brock Palen wrote:

 I know you say the only addition was the RDAC module, for the MDSs I 
 assume (we use it too, just fine).
 Yes, the MDS's share a STK 6140.
 When I ran faultmond from Sun's dcmu RPM (RHEL 4 here), the X4500s would 
 crash like clockwork every ~48 hours. For such a simple bit of code I was 
 surprised that this would happen; I noticed it once when I forgot to turn 
 it on while working on the load. Just FYI, it was unrelated to Lustre 
 (using the provided RPMs, no kernel build); turning faultmond off solved 
 my problem on the X4500.
 The DCMU RPM is installed. I didn't explicitly install this, so it must 
 have been bundled in with the SIA CD... I'll try removing the RPM to see 
 what happens. Thanks for the heads-up.

 Regards,

 Malcolm.


Re: [Lustre-discuss] e2fsprogs in Debian/Lenny

2008-10-13 Thread Guy Coates
Jakob Goldbach wrote:
 Hi,
 
 Debian lenny has e2fsprogs 1.41.2, with lots of ext4/extents info in its
 changelog. Could this be used when fsck'ing an OST instead of SUNs
 e2fsprogs ?
No, it is still missing some of the Lustre-specific bits. If you want a .deb 
of the Sun e2fsprogs, I have one.

Cheers,

Guy

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802




Re: [Lustre-discuss] e2fsprogs in Debian/Lenny

2008-10-13 Thread Jakob Goldbach
On Mon, 2008-10-13 at 16:37 +0100, Guy Coates wrote:

 No, it is still missing some of the lustre specific bits. If you want a deb of
 the Sun e2fsprogs, I have one.
 

Yes please. 

/Jakob



Re: [Lustre-discuss] e2fsprogs in Debian/Lenny

2008-10-13 Thread Guy Coates
Jakob Goldbach wrote:
 On Mon, 2008-10-13 at 16:37 +0100, Guy Coates wrote:
 
 No, it is still missing some of the lustre specific bits. If you want a deb 
 of the Sun e2fsprogs, I have one.

 
 Yes please. 
 
 /Jakob
 
 
 
You can grab the debs+source from:

ftp://ftp.sanger.ac.uk/pub/gmpc/

If you want to rebuild the packages for another architecture, you will need 
to change the configure line in the debian/rules file to point to a copy of 
the Lustre source tree; I did not get around to integrating the e2fsprogs 
package with the rest of the Debian Lustre packages.
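
The rebuild itself is the usual Debian routine, roughly as follows (file 
names are placeholders; the debian/rules edit is the step described above):

  dpkg-source -x e2fsprogs_*.dsc
  cd e2fsprogs-*/
  $EDITOR debian/rules            # point the configure line at your Lustre tree
  dpkg-buildpackage -rfakeroot -b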

Cheers,

Guy

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802




[Lustre-discuss] LBUG mds_reint.c, questions about recovery time

2008-10-13 Thread Thomas Roth
Hi all,

I just ran into an LBUG on an MDS still running Lustre version 1.6.3 with 
kernel 2.6.18, Debian Etch; kern.log excerpt below. You will probably tell 
me that this is a known bug, already fixed or to be fixed (I'm unsure how to 
search for such a thing in Bugzilla).
But my main question concerns the subsequent recovery. It seems to have 
worked fine; however, it took 2 hours. What influences the recovery time?
During this period, I was watching 
/proc/fs/lustre/mds/lustre-MDT/recovery_status. It continually showed a 
remaining time of around 2100 sec, fluctuating between 2400 and 1900, until 
the last 10 minutes or so, when the time really went down. So is this just 
Lustre's rough guess as to what the remaining recovery time might be?
recovery_status also showed 346 connected clients, of which 146 had long 
since finished while the others obviously had not. I wanted to be clever 
and manually unmounted Lustre on a number of our batch nodes that were not 
using Lustre at the time. This neither changed the reported number of 
connected clients nor had any perceptible effect on recovery.
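
For reference, watching it amounted to something like this (the wildcard 
stands in for the actual MDT name):

  watch -n 30 cat /proc/fs/lustre/mds/*/recovery_status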

---
Oct 13 17:10:58  kernel: LustreError: 
9132:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink 
== 1) failed: dir nlink == 0
Oct 13 17:10:58  kernel: LustreError: 
9132:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Oct 13 17:10:58  kernel: Lustre: 
9132:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for 
process 9132
Oct 13 17:10:58  kernel: ll_mdt_77 R running 0  9132  1 
  9133  9131 (L-TLB)
Oct 13 17:10:58  kernel: e14eb98c 0046 55c3b8b1 16ab 006e 
000a c084b550 e1abeaa0
Oct 13 17:10:58  kernel: 8d1b6f09 001aedee c81c 0001 c0116bb3 
dffcc000 ea78f060 c02cbab0
Oct 13 17:10:58  kernel: dffcc000 0082 c0117c15 0013fa7b  
0001 3638acd3 3931
Oct 13 17:10:58  kernel: Call Trace:
Oct 13 17:10:58  kernel: [c0116bb3] task_rq_lock+0x31/0x58
Oct 13 17:10:58  kernel: [c0116bb3] task_rq_lock+0x31/0x58
Oct 13 17:10:58  kernel: [c0116bb3] task_rq_lock+0x31/0x58
Oct 13 17:10:58  kernel: [c011de22] printk+0x14/0x18
Oct 13 17:10:58  kernel: [c0136851] __print_symbol+0x9f/0xa8
Oct 13 17:10:58  kernel: [c0116bb3] task_rq_lock+0x31/0x58
Oct 13 17:10:58  kernel: [c0117c15] try_to_wake_up+0x355/0x35f
Oct 13 17:10:58  kernel: [c01166f5] __wake_up_common+0x2f/0x53
Oct 13 17:10:58  kernel: [c0116b46] __wake_up+0x2a/0x3d
Oct 13 17:10:58  kernel: [c011d854] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58  kernel: [c011d854] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58  kernel: [c011d854] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58  kernel: [c012c6d8] __kernel_text_address+0x18/0x23
Oct 13 17:10:58  kernel: [c0103b62] show_trace_log_lvl+0x47/0x6a
Oct 13 17:10:58  kernel: [c0103c13] show_stack_log_lvl+0x8e/0x96
Oct 13 17:10:58  kernel: [c0104107] show_stack+0x20/0x25
Oct 13 17:10:58  kernel: [fa1bef79] lbug_with_loc+0x69/0xc0 [libcfs]
Oct 13 17:10:58  kernel: [fa689448] mds_orphan_add_link+0xcb8/0xd20 [mds]
Oct 13 17:10:58  kernel: [fa69c87a] mds_reint_unlink+0x292a/0x3fd0 [mds]
Oct 13 17:10:58  kernel: [fa3ac990] lustre_swab_ldlm_request+0x0/0x20 
[ptlrpc]
Oct 13 17:10:58  kernel: [fa688495] mds_reint_rec+0xf5/0x3f0 [mds]
Oct 13 17:10:58  kernel: [fa39f788] ptl_send_buf+0x1b8/0xb00 [ptlrpc]
Oct 13 17:10:58  kernel: [fa66bfeb] mds_reint+0xcb/0x8a0 [mds]
Oct 13 17:10:58  kernel: [fa67f998] mds_handle+0x3048/0xb9df [mds]
Oct 13 17:10:58  kernel: [fa4ac402] LNetMEAttach+0x142/0x4a0 [lnet]
Oct 13 17:10:58  kernel: [fa2dcd91] class_handle_free_cb+0x21/0x190 
[obdclass]
Oct 13 17:10:58  kernel: [c0124d83] do_gettimeofday+0x31/0xce
Oct 13 17:10:58  kernel: [fa2dc06b] class_handle2object+0xbb/0x2a0 
[obdclass]
Oct 13 17:10:58  kernel: [fa3aca00] lustre_swab_ptlrpc_body+0x0/0xc0 
[ptlrpc]
Oct 13 17:10:58  kernel: [fa3a9b5a] lustre_swab_buf+0xfa/0x180 [ptlrpc]
Oct 13 17:10:58  kernel: [c0125aac] lock_timer_base+0x15/0x2f
Oct 13 17:10:59  kernel: [c0125bbd] __mod_timer+0x99/0xa3
Oct 13 17:10:59  kernel: [fa3a6efe] lustre_msg_get_conn_cnt+0xce/0x220 
[ptlrpc]
Oct 13 17:10:59  kernel: [fa3b8e56] ptlrpc_main+0x2016/0x2f40 [ptlrpc]
Oct 13 17:10:59  kernel: [c01b6dc0] __next_cpu+0x12/0x21
Oct 13 17:10:59  kernel: [c012053d] do_exit+0x711/0x71b
Oct 13 17:10:59  kernel: [c0117c1f] default_wake_function+0x0/0xc
Oct 13 17:10:59  kernel: [fa3b6e40] ptlrpc_main+0x0/0x2f40 [ptlrpc]
Oct 13 17:10:59  kernel: [c0101005] kernel_thread_helper+0x5/0xb

Oct 13 17:12:38  kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog 
triggered for pid 9132: it was inactive for 100s


Cheers,
Thomas




[Lustre-discuss] lfs df vs. df -k

2008-10-13 Thread Hendelman, Rob
So I've read through the mailing lists, FAQ, etc. and have not come across
this.

Total space available: df -k and lfs df show exactly the same number of 1k
blocks. So far so good. However, df -k shows 4783238028 1k blocks in use,
while lfs df shows 5399853568 1k blocks in use. Interestingly, lfs df -i and
df -i both show the same number of total inodes. They also both show the
same number of used inodes, and the free inode counts match as well. There,
everything agrees.

Does anyone know why the used space doesn't match between lfs df -k and
df -k?
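
For reference, this is what I am comparing (the mount point is just an
example):

  df -k /mnt/lustre        # client-side statfs view
  lfs df /mnt/lustre       # per-MDT/OST usage summed up by Lustre
  df -i /mnt/lustre        # inode counts - these agree
  lfs df -i /mnt/lustre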

Client is 2.6.22 patchless kernel/client on Ubuntu.  The server is
centos using the packages provided for download, including kernel:

-bash-3.1$ rpm -qa | sort | grep lust
kernel-lustre-smp-2.6.18-53.1.13.el5_lustre.1.6.4.3
lustre-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp
lustre-ldiskfs-3.0.4-2.6.18_53.1.13.el5_lustre.1.6.4.3smp
lustre-modules-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp

Thanks,

Robert



Re: [Lustre-discuss] LBUG mds_reint.c, questions about recovery time

2008-10-13 Thread Wojciech Turek
The Lustre recovery window is 2.5 x the timeout.
You can find the timeout by running this command on the MDS:
cat /proc/sys/lustre/timeout
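
For example (the value of 100 s is only illustrative):

  cat /proc/sys/lustre/timeout
  # if this prints 100, the recovery window is roughly 2.5 * 100 = 250 seconds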

Thomas Roth wrote:
 But my main question concerns the subsequent recovery. It seems to have 
 worked fine; however, it took 2 hours. What influences the recovery time?

Re: [Lustre-discuss] kerberos with lustre

2008-10-13 Thread Andreas Dilger
On Oct 13, 2008  14:20 +0200, mohammed gaafar wrote:
 I've been trying to kerberize Lustre, so I followed the instructions in the
 Lustre manual. Unfortunately, the step that requires mounting the MDT and
 OST with the command "mount -t lustre -o sec=plain /dev/sda8 /mnt/data/mdt"
 did not work; it fails with the error "Unrecognized mount option sec=plain
 or missing value".

Kerberos support is not in any released version of Lustre.  If you want to
test with a pre-release version of Lustre, you need to check out Lustre HEAD
from CVS.
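
Something along these lines (substitute the anonymous CVS root published on
the Lustre site; it is shown here only as a placeholder):

  export CVSROOT=':pserver:anonymous@<cvs-server>:/<repository>'
  cvs login
  cvs checkout lustre      # a plain checkout gives you HEAD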


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
