[Lustre-discuss] lustre 1.8.1.1 support matrix
Hi, when will we see the updated Lustre 1.8.1.1 support matrix? http://wiki.lustre.org/index.php/Lustre_Support_Matrix only lists 1.8.1.

Another question: does Lustre support servers running the latest RHEL 5.3 with 1.8.1.1 while the clients run only RHEL 5.1? Can the clients also run 1.8.1.1? Can this configuration run OFED 1.4.1 or 1.4.2?

TIA

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
Hello,

I'm proud to report a working patched-client 1.8.1 setup: Lustre 1.8.1 + kernel 2.6.27.23 + DRBD 8.3.4 on Debian GNU/Linux (x86) sid/experimental. The test install was made with two VMware-based virtual machines, with the base system (also Debian GNU/Linux) as a Lustre patched client. Note the following:

* First I tried with really small partitions, just a few MB; mkfs.lustre refused because the file system was too small for a journal, quite reasonably though.
* A test install on VirtualBox did not succeed because of host-only network bugs/limitations in VirtualBox.
* Confirmed that one can use LVM PVs or LVs as Lustre block devices.
* Confirmed working with MGS/MDT/OSTs (actually two OSTs for now) AND a client on the very same (fully virtual) machine, for testing purposes; no problems with that so far.

Now, I did a simple MDT size calculation as described in the Lustre 1.8.1 manual, and set up the MDT as recommended. The question is: whether or not my count was right, what actually happens if the MDT partition runs out of space? Is there any chance to dump the whole combined MGS+MDT file system, supply a bigger block device, or extend the partition with some e2fsprogs/tune2fs trick? This assumes that no matter how big the MDT is, it will be exhausted someday. One possible solution is simply to add/create another file system, with another MGS/MDT. But the question persists :)

And one more thing: I use a combined MGS/MDT. What about MGS size? I mean, if I use a separate MGS and MDT, what size should the MGS have, and how does the management service use its block-device storage?

Regards,
Piotr Wadas
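On the e2fsprogs question: the MDT's backing store is ldiskfs (ext3-based), so mechanically an unmounted MDT on a grown LV can be extended with plain resize2fs, the same way as any ext3 image. Whether that is a supported operation on a real Lustre 1.8 MDT is a question for the list, and you would want backups first. A minimal offline sketch on a throwaway loop image (the image path and sizes here are invented for the demo):

```shell
#!/bin/sh
# Demo only: grow an ext3 image offline with e2fsprogs, the same tool
# chain that would apply to an unmounted ldiskfs MDT after lvextend.
# The image path and sizes are made up for this sketch.
set -e
IMG=/tmp/mdt-grow-demo.img

dd if=/dev/zero of=$IMG bs=1M count=0 seek=64 2>/dev/null   # 64 MB sparse "device"
mke2fs -q -F -j -b 4096 $IMG                                # ext3, as ldiskfs is
dd if=/dev/zero of=$IMG bs=1M count=0 seek=128 2>/dev/null  # "lvextend" to 128 MB
e2fsck -f -p $IMG                                           # must be clean before resizing
resize2fs $IMG                                              # grow fs to fill the device
dumpe2fs -h $IMG 2>/dev/null | grep -E '^Block (count|size):'
```

The resize2fs step with no explicit size grows the file system to fill whatever the underlying device now reports; the e2fsck run beforehand is mandatory, as resize2fs refuses an unclean file system.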
Re: [Lustre-discuss] lustre 1.8.1.1 support matrix
The support matrix has been updated for 1.8.1.1: http://wiki.lustre.org/index.php/Lustre_Release_Information

Dr. Hung-Sheng Tsao wrote:
> Hi, when will we see the updated Lustre 1.8.1.1 support matrix? http://wiki.lustre.org/index.php/Lustre_Support_Matrix only lists 1.8.1.
> Another question: does Lustre support servers running the latest RHEL 5.3 with 1.8.1.1 while the clients run only RHEL 5.1? Can the clients also run 1.8.1.1? Can this configuration run OFED 1.4.1 or 1.4.2?
> TIA
Re: [Lustre-discuss] soft lockups on NFS server/Lustre client
On Mon, Oct 12, 2009 at 05:06:28PM +0100, Frederik Ferner wrote:
> Hi List,
>
> on our NFS server exporting our Lustre file system to a number of NFS clients, we've recently started to see "kernel: BUG: soft lockup" messages. As the locked processes include nfsd, our users are obviously not happy. Around the time the soft lockups occur we also see a lot of "kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags()" messages, but I don't know if this is related.

probably not related. we were seeing this too (no NFS involved at all): https://bugzilla.lustre.org/show_bug.cgi?id=20904 and the upshot is that I'm pretty sure it's harmless and a RHEL bug. I filed https://bugzilla.redhat.com/show_bug.cgi?id=526853 but it's probably being ignored. if you have a RHEL support contract maybe you can kick it along a bit...

dunno about your soft lockups. as I understand it, soft lockups themselves aren't harmful as long as they progress eventually. Lustre 1.6.6 isn't exactly recent; have you tried 1.6.7.2 on your NFS exporter? presumably soft lockups could also be saying your re-exporter or OSSes are overloaded, or that you have a slow disk or 3 in a RAID... without NFS involved, are all your OSTs up to speed? do you still get problems after

  echo 60 > /proc/sys/kernel/softlockup_thresh

cheers,
robin

> We are using Lustre 1.6.6 on all machines (MDS, OSS, clients). The NFS server/Lustre client with the lockups is running RHEL 5.4 with an unpatched Red Hat kernel (kernel-2.6.18-92.1.10.el5) and the Lustre modules from Sun. See below for sample logs from the Lustre client/NFS server; I can provide more logs if required. I'm not sure if this is a Lustre issue but would appreciate it if someone could help. We've not seen it on any other NFS server so far, and there seems to be at least some Lustre-related stuff in the stack traces. Is this a known issue, and how can we avoid it? I have not found anything using Google or the search on bugzilla.lustre.org.
> At least the BUG warning seems to be a known issue on this kernel. I hope the logs below are readable enough; I tried to find entries where the stack traces don't overlap, but this seems to be the best I can find.
>
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags() (Tainted: G     )
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: Call Trace:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed7d1] set_dentry_child_flags+0xef/0x14d
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed867] remove_watch_no_event+0x38/0x47
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed88e] inotify_remove_watch_locked+0x18/0x3b
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed97c] inotify_rm_wd+0x7e/0xa1
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ede6e] sys_inotify_rm_watch+0x46/0x63
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [8005d28d] tracesys+0xd5/0xe0
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags() (Tainted: G     )
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: Call Trace:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed7d1] set_dentry_child_flags+0xef/0x14d
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed867] remove_watch_no_event+0x38/0x47
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed88e] inotify_remove_watch_locked+0x18/0x3b
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed97c] inotify_rm_wd+0x7e/0xa1
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ede6e] sys_inotify_rm_watch+0x46/0x63
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: soft lockup - CPU#5 stuck for 10s! [nfsd:1]
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: CPU 5:
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: Modules linked in: vfat fat usb_storage dell_rbu mptctl ipmi_devintf ipmi_si ipmi_msghandler nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api mlx4_en(U) dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom mlx4_core(U) bnx2 serio_raw pcspkr sg dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: Pid: 1, comm: nfsd Tainted: G      2.6.18-92.1.10.el5 #1
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RIP: 0010:[80064ba7]  [80064ba7] .text.lock.spinlock+0x5/0x30
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RSP: 0018:810044241ac8  EFLAGS: 0286
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RAX: 81006cb6a1a8 RBX: 81006cb6a178 RCX: 810044241b50
> Oct  9 15:21:28 cs04r-sc-serv-07
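For what it's worth, a quick way to see whether the inotify warnings and the soft lockups actually track each other is to count them per CPU across the syslog. A minimal sketch (a hypothetical helper, not anything shipped with Lustre or RHEL; the regexes just match the message shapes quoted above):

```python
import re
from collections import Counter

# Hypothetical helper: tally the inotify warning and soft-lockup messages
# in a syslog excerpt, plus which CPUs the lockups hit, so you can check
# whether the two correlate on the NFS exporter.
INOTIFY_RE = re.compile(r"BUG: warning at fs/inotify\.c:\d+/set_dentry_child_flags")
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s")

def tally(lines):
    """Return (message counts, per-CPU soft-lockup counts) for syslog lines."""
    counts = Counter()
    lockup_cpus = Counter()
    for line in lines:
        if INOTIFY_RE.search(line):
            counts["inotify_warning"] += 1
        m = LOCKUP_RE.search(line)
        if m:
            counts["soft_lockup"] += 1
            lockup_cpus["CPU#" + m.group(1)] += 1
    return counts, lockup_cpus

# Demo on two lines shaped like the logs in this thread:
sample = [
    "Oct  9 15:21:27 host kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags() (Tainted: G     )",
    "Oct  9 15:21:27 host kernel: BUG: soft lockup - CPU#5 stuck for 10s! [nfsd:1]",
]
counts, cpus = tally(sample)
print(dict(counts), dict(cpus))  # {'inotify_warning': 1, 'soft_lockup': 1} {'CPU#5': 1}
```

If the lockups land on the same CPU around the same timestamps as bursts of the inotify warning, that would argue against the "probably not related" theory; scattered lockups with no warning nearby would support it.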