[Lustre-discuss] (unknown subject)
Hi Kevin,

Thanks for your reply. Now I can set it up.

Regards,
Deval

Message: 9
Date: Thu, 09 Oct 2008 05:46:27 -0600
From: Kevin Van Maren [EMAIL PROTECTED]
Subject: Re: [Lustre-discuss] Test setup configuration
To: Deval kulshrestha [EMAIL PROTECTED]
Cc: lustre-discuss@lists.lustre.org

Deval kulshrestha wrote:

Hi, I am a new Lustre user, trying to evaluate Lustre with a few configurations. I am going through the Lustre 1.6 Operations Manual, but I am not able to understand which packages should be installed on the MDS, OSS, and client. Should I install all the packages on all three types of nodes? Please explain.

Best Regards,
Deval K

Lustre servers (MDS/OSS):

  kernel-lustre-smp   // patched server kernel
  lustre-modules      // Lustre kernel modules
  lustre              // user space tools (server)
  lustre-ldiskfs      // ldiskfs
  e2fsprogs           // filesystem tools

You can install all those RPMs on the client as well, but it is not necessary.

Lustre clients (assuming you have the matching vendor kernel for the lustre-modules installed):

  lustre-client-modules   // kernel modules for client
  lustre-client           // user space (client)

Kevin
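A minimal sketch of the two install sets as shell commands; the wildcarded file names are illustrative, and the exact RPM versions depend on the release you download:

  # Lustre servers (MDS/OSS): patched kernel, modules, server tools
  rpm -ivh kernel-lustre-smp-*.rpm lustre-modules-*.rpm \
           lustre-1.*.rpm lustre-ldiskfs-*.rpm e2fsprogs-*.rpm

  # Lustre clients (matching vendor kernel already installed)
  rpm -ivh lustre-client-modules-*.rpm lustre-client-*.rpm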
[Lustre-discuss] kerberos with lustre
Hi all,

I've been trying to kerberize Lustre, so I followed the instructions in the Lustre manual. Unfortunately, the step which requires mounting the MDT and OST using the command

  mount -t lustre -o sec=plain /dev/sda8 /mnt/data/mdt

didn't work, raising an error which says:

  Unrecognized mount option sec=plain or missing value

Any help, please?

Thank you
--
Mohammed Abd El-Monem Gaafar
Software Engineer, ICT Sector
Bibliotheca Alexandrina
P.O.Box 138, Chatby, Alexandria 21526, Egypt
Tel: +20 3 483 Fax: +20 3 4820405 Ext.: 1417
Website: www.bibalex.org
Email: [EMAIL PROTECTED]
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I know you say the only addition was the RDAC, for the MDSs I assume (we use it also, just fine).

When I ran faultmond from Sun's dcmu RPM (RHEL 4 here), the x4500s would crash like clockwork, roughly every 48 hours. For such a simple bit of code I was surprised; the one time I forgot to turn it back on while working on the load, the crash did not happen. Just FYI, it was unrelated to Lustre (we use the provided RPMs, no kernel build); disabling it solved my problem on the x4500.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:

The X4200m2 MDS systems and the X4500 OSS were rebuilt using the stock Lustre packages (kernel + modules + userspace). With the exception of the RDAC kernel module, no additional software was applied to the systems. We recreated our volumes and ran the servers over the weekend; however, the OSS crashed about 8 hours in. The syslog output is attached to this message.

It looks like it could be similar to bug #16404, which would mean patching and rebuilding the kernel. Given my lack of success at building from source, I am again asking for some guidance on how to do this. I sent out the steps I used to try to build from source on the 7th because I was encountering problems and was unable to get a working set of packages. Included in that message was output from quilt implying that the kernel patching process was not working properly.

Regards,
Malcolm.

--
Malcolm Cowe
Solutions Integration Engineer
Sun Microsystems, Inc.
Blackness Road, Linlithgow, West Lothian EH49 7LR, UK
Phone: x73602 / +44 1506 673 602
Email: [EMAIL PROTECTED]

Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Oct 10 06:53:42 oss-1 kernel: kjournald starting. Commit interval 5 seconds
Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Oct 10 06:57:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive fault
Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver, [EMAIL PROTECTED]
Oct 10 07:56:23 oss-1 kernel: Lustre Version: 1.6.5.1
Oct 10 07:56:23 oss-1 kernel: LDISKFS-fs: file extents enabled
Oct 10 07:56:23 oss-1 kernel: LDISKFS-fs: mballoc enabled
Oct 10 07:56:23 oss-1 kernel: Build Version: 1.6.5.1-1969123119-PRISTINE-.cache.OLDRPMS.20080618230526.linux-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre.1.6.5.1smp
Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/64]
Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System; [EMAIL PROTECTED]
Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s).
Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s).
LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x3/t0 o250-> [EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: Request x3 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 0s ago has timed out (limit 5s).
LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OST: -5
LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OST
LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OST not registered
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Oct 10 07:56:50 oss-1
Re: [Lustre-discuss] lustre/drbd/heartbeat setup [was: drbd async mode]
Hi,

I read your instructions; that's pretty much the setup we are using, too, and it works very well, drbd 0.8 notwithstanding, but on a hardware RAID.

I do not quite understand your remark about not using an extra net for drbd. Have you tried putting the name that's in your drbd.conf, together with the other IP, into /etc/hosts? My guess is that the performance of your MDS pair is influenced by drbd doing its job; I would keep that separate from the Lustre data stream. The machines we are planning to use in our next cluster are actually equipped with four network interfaces: two (bonded) for Lustre, one for drbd and one for heartbeat. Those serial cables only give me error messages and headaches.

We have separate partitions for MGS and MDT, on one machine. I didn't understand why this would not be the Lustre way. This way, at least, one doesn't have to worry about a super-fast connection between MGS and MDT.

Is there a particular reason for not managing the IP via heartbeat? At least it's easier to set up than the drbddisk and Filesystem resources.

Regards,
Thomas

Heiko Schroeter wrote:

Hello,

at last, a first version of our setup scenario is ready. Please consider this as a general guideline; it may contain errors. We know that some things are done differently in the Lustre community, i.e. placing MDS and MDT on separate machines. Please let me know if you find bugs or if things can be improved. There is more than one way.

Regards,
Heiko
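A minimal sketch of the /etc/hosts suggestion above, assuming a dedicated replication link; the hostnames and addresses are hypothetical, not taken from the original setup:

  # /etc/hosts on both MDS nodes: resolve the peer names used in
  # drbd.conf to the dedicated replication interface
  192.168.100.1   mds1
  192.168.100.2   mds2

  # drbd.conf, excerpt from the resource section: with the names
  # above, replication traffic stays off the Lustre-facing network
  on mds1 { address 192.168.100.1:7788; }
  on mds2 { address 192.168.100.2:7788; }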
[Lustre-discuss] Multiple interfaces in LNet
Hi,

I am learning about LNet and have also read one of the recent posts, with the subject 'Adding IB to tcp only cluster'. What interests me is how to use multiple interfaces on the same server in Lustre/LNet.

My understanding is that TCP (ksocklnd) can manage multiple physical interfaces as one LNet interface with one unique NID. Is that still correct and recommended? Or is it better to set up Ethernet bonding (under Linux) and bind that bonding interface to LNet?

Besides TCP, it is only possible to use multiple interfaces on the same node with o2ib, right? With ko2iblnd one can set up several Lustre networks, one per IB interface. In fact, you must set up several Lustre networks, otherwise only the first IB interface is used, correct?

It is not clear to me how the MGS, MDS, OSS and client choose a NID for communication. I know that LNet chooses the best one, but who provides the list of all available NIDs for a server? Or does it work somehow differently? I am aware of 'lctl list_nids' and 'lctl which_nid <list of nids>'. That is fine, but I would like to know what a Lustre server or a client actually does.

Furthermore, I don't understand the MGS target information kept in MDT and OST devices. What is it used for, and how? What happens during a mount of an MDT or OST? Who talks to whom, and how?

I'm looking forward to getting a better understanding.

Thank you and best regards,
Danny

--
Danny Sternkopf http://www.nec.de/hpc [EMAIL PROTECTED]
HPCE Division Germany phone: +49-711-68770-35 fax: +49-711-6877145
~~~ NEC Deutschland GmbH, Hansaallee 101, 40549 Düsseldorf
Geschäftsführer Yuya Momose
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743
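For reference, a sketch of how several interfaces are usually declared to LNet through the lnet module options; the interface names and network numbers here are illustrative:

  # /etc/modprobe.conf: one Lustre network per IB port, plus a tcp
  # network bound to a bonded Ethernet interface
  options lnet networks="o2ib0(ib0),o2ib1(ib1),tcp0(bond0)"

  # after loading the modules, verify the NIDs the node advertises
  lctl list_nids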
[Lustre-discuss] e2fsprogs in Debian/Lenny
Hi,

Debian Lenny has e2fsprogs 1.41.2, with lots of ext4/extents info in its changelog. Could this be used when fsck'ing an OST, instead of Sun's e2fsprogs?

/Jakob
Re: [Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues
I never uninstalled it (I still use some of the tools in it). Faultmond is a service; just chkconfig it off.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

Brock Palen wrote:
I know you say the only addition was the RDAC, for the MDSs I assume (we use it also, just fine).

Yes, the MDSs share a STK 6140.

Brock Palen wrote:
When I ran faultmond from Sun's dcmu RPM (RHEL 4 here), the x4500s would crash like clockwork, roughly every 48 hours. [...] Just FYI, it was unrelated to Lustre (provided RPMs, no kernel build); this solved my problem on the x4500.

The DCMU RPM is installed. I didn't explicitly install it, so it must have been bundled in with the SIA CD... I'll try removing the RPM to see what happens. Thanks for the heads up.

Regards,
Malcolm.
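A sketch of the suggested fix, assuming the service name is faultmond as shipped in the dcmu RPM:

  # RHEL 4: stop the running daemon and keep it from starting at boot
  service faultmond stop
  chkconfig faultmond off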
Re: [Lustre-discuss] e2fsprogs in Debian/Lenny
Jakob Goldbach wrote:
Hi, Debian Lenny has e2fsprogs 1.41.2, with lots of ext4/extents info in its changelog. Could this be used when fsck'ing an OST, instead of Sun's e2fsprogs?

No, it is still missing some of the Lustre-specific bits. If you want a deb of the Sun e2fsprogs, I have one.

Cheers,
Guy

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802
Re: [Lustre-discuss] e2fsprogs in Debian/Lenny
On Mon, 2008-10-13 at 16:37 +0100, Guy Coates wrote:
No, it is still missing some of the lustre specific bits. If you want a deb of the Sun e2fsprogs, I have one.

Yes please.

/Jakob
Re: [Lustre-discuss] e2fsprogs in Debian/Lenny
Jakob Goldbach wrote:
On Mon, 2008-10-13 at 16:37 +0100, Guy Coates wrote:
No, it is still missing some of the lustre specific bits. If you want a deb of the Sun e2fsprogs, I have one.

Yes please.

/Jakob

You can grab the debs+source from:

  ftp://ftp.sanger.ac.uk/pub/gmpc/

If you want to rebuild the packages for another architecture, you will need to change the configure line in the debian/rules file to point to a copy of the Lustre source tree; I did not get around to integrating the e2fsprogs package with the rest of the Debian Lustre packages.

Cheers,
Guy

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802
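A sketch of the usual Debian rebuild flow for another architecture; the file names are illustrative, and the debian/rules edit is the step Guy describes (its exact configure option depends on the packaging):

  # unpack the source package fetched from the FTP site above
  dpkg-source -x e2fsprogs_*.dsc
  cd e2fsprogs-*/
  # edit debian/rules: point the configure line at your Lustre source tree
  dpkg-buildpackage -us -uc -b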
[Lustre-discuss] LBUG mds_reint.c, questions about recovery time
Hi all,

I just ran into an LBUG on an MDS still running Lustre version 1.6.3 with kernel 2.6.18, Debian Etch; kern.log excerpt below. You will probably tell me that this is a known bug, already fixed or to be fixed (I'm unsure how to search for such a thing in Bugzilla).

But my main question concerns the subsequent recovery. It seems to have worked fine; however, it took 2 hours. I would like to know what influences the recovery time. During this period, I was watching /proc/fs/lustre/mds/lustre-MDT/recovery_status. It continually showed a remaining time of about 2100 sec, fluctuating between 2400 and 1900, until the last 10 min or so, when the time really went down. So is this just a rough guess by Lustre as to what the remaining recovery time might be?

recovery_status also showed 346 connected clients, of which 146 had finished for a long time, the others obviously not. I wanted to be very clever and manually unmounted Lustre on a number of our batch nodes which were not using Lustre at that time. This neither influenced the given number of connected clients nor had any perceptible effect on the recovery.

---
Oct 13 17:10:58 kernel: LustreError: 9132:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed: dir nlink == 0
Oct 13 17:10:58 kernel: LustreError: 9132:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Oct 13 17:10:58 kernel: Lustre: 9132:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 9132
Oct 13 17:10:58 kernel: ll_mdt_77 R running 0 9132 1 9133 9131 (L-TLB)
Oct 13 17:10:58 kernel: e14eb98c 0046 55c3b8b1 16ab 006e 000a c084b550 e1abeaa0
Oct 13 17:10:58 kernel: 8d1b6f09 001aedee c81c 0001 c0116bb3 dffcc000 ea78f060 c02cbab0
Oct 13 17:10:58 kernel: dffcc000 0082 c0117c15 0013fa7b 0001 3638acd3 3931
Oct 13 17:10:58 kernel: Call Trace:
Oct 13 17:10:58 kernel: [<c0116bb3>] task_rq_lock+0x31/0x58
Oct 13 17:10:58 kernel: [<c0116bb3>] task_rq_lock+0x31/0x58
Oct 13 17:10:58 kernel: [<c0116bb3>] task_rq_lock+0x31/0x58
Oct 13 17:10:58 kernel: [<c011de22>] printk+0x14/0x18
Oct 13 17:10:58 kernel: [<c0136851>] __print_symbol+0x9f/0xa8
Oct 13 17:10:58 kernel: [<c0116bb3>] task_rq_lock+0x31/0x58
Oct 13 17:10:58 kernel: [<c0117c15>] try_to_wake_up+0x355/0x35f
Oct 13 17:10:58 kernel: [<c01166f5>] __wake_up_common+0x2f/0x53
Oct 13 17:10:58 kernel: [<c0116b46>] __wake_up+0x2a/0x3d
Oct 13 17:10:58 kernel: [<c011d854>] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58 kernel: [<c011d854>] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58 kernel: [<c011d854>] release_console_sem+0x1b4/0x1bc
Oct 13 17:10:58 kernel: [<c012c6d8>] __kernel_text_address+0x18/0x23
Oct 13 17:10:58 kernel: [<c0103b62>] show_trace_log_lvl+0x47/0x6a
Oct 13 17:10:58 kernel: [<c0103c13>] show_stack_log_lvl+0x8e/0x96
Oct 13 17:10:58 kernel: [<c0104107>] show_stack+0x20/0x25
Oct 13 17:10:58 kernel: [<fa1bef79>] lbug_with_loc+0x69/0xc0 [libcfs]
Oct 13 17:10:58 kernel: [<fa689448>] mds_orphan_add_link+0xcb8/0xd20 [mds]
Oct 13 17:10:58 kernel: [<fa69c87a>] mds_reint_unlink+0x292a/0x3fd0 [mds]
Oct 13 17:10:58 kernel: [<fa3ac990>] lustre_swab_ldlm_request+0x0/0x20 [ptlrpc]
Oct 13 17:10:58 kernel: [<fa688495>] mds_reint_rec+0xf5/0x3f0 [mds]
Oct 13 17:10:58 kernel: [<fa39f788>] ptl_send_buf+0x1b8/0xb00 [ptlrpc]
Oct 13 17:10:58 kernel: [<fa66bfeb>] mds_reint+0xcb/0x8a0 [mds]
Oct 13 17:10:58 kernel: [<fa67f998>] mds_handle+0x3048/0xb9df [mds]
Oct 13 17:10:58 kernel: [<fa4ac402>] LNetMEAttach+0x142/0x4a0 [lnet]
Oct 13 17:10:58 kernel: [<fa2dcd91>] class_handle_free_cb+0x21/0x190 [obdclass]
Oct 13 17:10:58 kernel: [<c0124d83>] do_gettimeofday+0x31/0xce
Oct 13 17:10:58 kernel: [<fa2dc06b>] class_handle2object+0xbb/0x2a0 [obdclass]
Oct 13 17:10:58 kernel: [<fa3aca00>] lustre_swab_ptlrpc_body+0x0/0xc0 [ptlrpc]
Oct 13 17:10:58 kernel: [<fa3a9b5a>] lustre_swab_buf+0xfa/0x180 [ptlrpc]
Oct 13 17:10:58 kernel: [<c0125aac>] lock_timer_base+0x15/0x2f
Oct 13 17:10:59 kernel: [<c0125bbd>] __mod_timer+0x99/0xa3
Oct 13 17:10:59 kernel: [<fa3a6efe>] lustre_msg_get_conn_cnt+0xce/0x220 [ptlrpc]
Oct 13 17:10:59 kernel: [<fa3b8e56>] ptlrpc_main+0x2016/0x2f40 [ptlrpc]
Oct 13 17:10:59 kernel: [<c01b6dc0>] __next_cpu+0x12/0x21
Oct 13 17:10:59 kernel: [<c012053d>] do_exit+0x711/0x71b
Oct 13 17:10:59 kernel: [<c0117c1f>] default_wake_function+0x0/0xc
Oct 13 17:10:59 kernel: [<fa3b6e40>] ptlrpc_main+0x0/0x2f40 [ptlrpc]
Oct 13 17:10:59 kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Oct 13 17:12:38 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 9132: it was inactive for 100s

Cheers,
Thomas
[Lustre-discuss] lfs df vs. df -k
So I've read through the mailing lists, FAQ, etc., and have not come across this.

Total space available: df -k and lfs df show the exact same number of 1K blocks. So far so good. However, df -k shows 4783238028 1K blocks in use, while lfs df shows 5399853568 1K blocks in use.

Interestingly, lfs df -i and df -i both show the same total number of inodes. They also both show the same used inodes, and the free inodes are the same as well. Everything matches. Anyone know why the used space doesn't match between lfs df -k and df -k?

The client is a 2.6.22 patchless kernel/client on Ubuntu. The server is CentOS using the packages provided for download, including the kernel:

  -bash-3.1$ rpm -qa | sort | grep lust
  kernel-lustre-smp-2.6.18-53.1.13.el5_lustre.1.6.4.3
  lustre-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp
  lustre-ldiskfs-3.0.4-2.6.18_53.1.13.el5_lustre.1.6.4.3smp
  lustre-modules-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp

Thanks,
Robert
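For anyone wanting to reproduce the comparison, a minimal sketch; the mount point is illustrative:

  # used 1K blocks as seen by the VFS vs. by Lustre itself
  df -k /mnt/lustre
  lfs df /mnt/lustre

  # inode view, which the poster reports as matching
  df -i /mnt/lustre
  lfs df -i /mnt/lustre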
Re: [Lustre-discuss] LBUG mds_reint.c, questions about recovery time
Lustre recovery time is 2.5 x timeout. You can find the timeout by running this command on the MDS:

  cat /proc/sys/lustre/timeout

Thomas Roth wrote:
Hi all, I just ran into an LBUG on an MDS still running Lustre version 1.6.3 with kernel 2.6.18, Debian Etch. [...] But my main question concerns the subsequent recovery. It seems to have worked fine; however, it took 2 hours. I would like to know what influences the recovery time. [...]
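A quick sketch of the arithmetic implied above; the 100-second value is only an example:

  # on the MDS
  cat /proc/sys/lustre/timeout    # e.g. prints 100
  # expected recovery window: 2.5 x 100s = 250s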
Re: [Lustre-discuss] kerberos with lustre
On Oct 13, 2008 14:20 +0200, mohammed gaafar wrote:
I've been trying to kerberize Lustre, so I followed the instructions mentioned in the Lustre manual, but unfortunately the step which requires mounting the MDT and OST using the command "mount -t lustre -o sec=plain /dev/sda8 /mnt/data/mdt" didn't work, raising an error which says "Unrecognized mount option sec=plain or missing value".

Kerberos support is not in any released version of Lustre. If you want to test with a pre-release version of Lustre, you need to check out Lustre HEAD from CVS.

Cheers,
Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.