[Lustre-discuss] lustre 1.8.1.1 support matrix

2009-10-18 Thread Dr. Hung-Sheng Tsao

hi
when are we going to see the updated Lustre 1.8.1.1 support matrix?
http://wiki.lustre.org/index.php/Lustre_Support_Matrix
still only lists 1.8.1.

Another question:
does Lustre support servers running the latest RHEL 5.3 with 1.8.1.1
while the clients only run RHEL 5.1?
Can the clients also run 1.8.1.1?
Can this configuration run OFED 1.4.1 or 1.4.2?
TIA


[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size

2009-10-18 Thread Piotr Wadas

Hello,
I can proudly report a working setup:

server + patched client, Lustre 1.8.1 + kernel 2.6.27.23 + DRBD 8.3.4
on Debian GNU/Linux (x86) sid/experimental.

The test install was made with two VMware-based virtual machines, with
the base system (also Debian GNU/Linux) as the patched Lustre client.

Note the following:

* First I tried with really small partitions, just a few MB, and
mkfs.lustre refused because the file system was too small for a
journal, which is quite reasonable (see the sketch after this list).

* A test install on VirtualBox did not succeed, because of host-only
network bugs/limitations in VirtualBox.

* Confirmed that one can use LVM PVs or LVs as Lustre block devices.

* Confirmed working with MGS/MDT/OSTs (actually two OSTs for now) AND
the client on the very same (fully virtual) machine, for testing
purposes; no problems with that so far.
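
For concreteness, a minimal sketch of the block-device setup described
above, on LVM; the volume group (vg0), fsname (testfs), device paths,
and MGS NID (mds@tcp0) are all placeholder names:

  # carve LVs out of an existing volume group
  lvcreate -L 1G -n mdt0 vg0
  lvcreate -L 2G -n ost0 vg0
  # format the combined MGS/MDT; a device of only a few MB fails here,
  # since there is no room left for the ldiskfs journal
  mkfs.lustre --fsname=testfs --mgs --mdt /dev/vg0/mdt0
  # format an OST, pointing it at the MGS node
  mkfs.lustre --fsname=testfs --ost --mgsnode=mds@tcp0 /dev/vg0/ost0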

Now, I did a simple calculation of the MDT size as described in the
Lustre 1.8.1 manual, and set up the MDT as recommended. The question is,
whether my calculation was right or not: what actually happens if the
MDT partition runs out of space? Is there any chance to dump the whole
combined MGS+MDT file system, supply a bigger block device, or extend
the partition with some e2fsprogs/tune2fs trick? This assumes that no
matter how big the MDT is, it will be exhausted someday.
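
Since the MDT here already sits on LVM, one conceivable escape hatch is
an offline grow. This is only a sketch, assuming that resizing ldiskfs
with stock e2fsprogs behaves like resizing ext3; it is not a procedure
the 1.8 manual documents, so treat it as untested:

  umount /mnt/mdt                  # the MDT must be stopped first
  lvextend -L +2G /dev/vg0/mdt0    # grow the underlying LV
  e2fsck -f /dev/vg0/mdt0          # resize2fs requires a clean fsck
  resize2fs /dev/vg0/mdt0          # grow the fs to fill the LV
  mount -t lustre /dev/vg0/mdt0 /mnt/mdt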

One possible solution is simply to create another file system, with
another MGS/MDT. But the question persists :)

And one more thing - I use a combined MGS/MDT. What about the MGS size?
I mean, if I use a separate MGS and MDT, what size should the MGS have,
and how does the management service work with regard to its
block-device storage?
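
For reference, a sketch of creating a standalone MGS (device and mount
point are placeholders); the MGS holds only configuration logs, so even
a small device should be plenty:

  mkfs.lustre --mgs /dev/vg0/mgs0
  mount -t lustre /dev/vg0/mgs0 /mnt/mgs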

Regards,
Piotr Wadas


Re: [Lustre-discuss] lustre 1.8.1.1 support matrix

2009-10-18 Thread Sheila Barthel
The support matrix has been updated for 1.8.1.1:

http://wiki.lustre.org/index.php/Lustre_Release_Information

Dr. Hung-Sheng Tsao wrote:
> hi
> when are we going to see the updated Lustre 1.8.1.1 support matrix?
> http://wiki.lustre.org/index.php/Lustre_Support_Matrix
> still only lists 1.8.1.
>
> Another question:
> does Lustre support servers running the latest RHEL 5.3 with 1.8.1.1
> while the clients only run RHEL 5.1?
> Can the clients also run 1.8.1.1?
> Can this configuration run OFED 1.4.1 or 1.4.2?
> TIA


Re: [Lustre-discuss] soft lockups on NFS server/Lustre client

2009-10-18 Thread Robin Humble
On Mon, Oct 12, 2009 at 05:06:28PM +0100, Frederik Ferner wrote:
> Hi List,
>
> on our NFS server exporting our Lustre file system to a number of NFS
> clients, we've recently started to see "kernel: BUG: soft lockup"
> messages. As the locked processes include nfsd, our users are
> obviously not happy.
>
> Around the time when the soft lockup occurs we also see a lot of
> "kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags()"
> messages, but I don't know if this is related.

probably not related. we were seeing this too (no NFS involved at all)
  https://bugzilla.lustre.org/show_bug.cgi?id=20904
and the upshot is that I'm pretty sure it's harmless and a RHEL bug.
I filed
  https://bugzilla.redhat.com/show_bug.cgi?id=526853
but it's probably being ignored. if you have a RHEL support contract
maybe you can kick it along a bit...

dunno about your soft lockups. as I understand it soft lockups
themselves aren't harmful as long as they progress eventually.

Lustre 1.6.6 isn't exactly recent. have you tried 1.6.7.2 on your NFS
exporter?

presumably soft lockups could also be saying your re-exporter or OSSes
are overloaded, or that you have a slow disk or 3 in a RAID... without
NFS involved, are all your OSTs up to speed?
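
one quick way to check each OST from a client (file names and sizes
here are just examples, and the option-style lfs syntax assumes a
reasonably recent lfs) is to pin a file to one OST index at a time and
time a streaming write:

  # pin a test file to OST index 0, then time a 1GB write
  lfs setstripe -c 1 -i 0 /lustre/ost0_test
  dd if=/dev/zero of=/lustre/ost0_test bs=1M count=1024
  # repeat with -i 1, -i 2, ... and compare rates across OSTs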

do you still get problems after
  echo 60 > /proc/sys/kernel/softlockup_thresh

cheers,
robin


> We are using Lustre 1.6.6 on all machines (MDS, OSS, clients). The
> NFS server/Lustre client with the lockups is running RHEL 5.4 with an
> unpatched Red Hat kernel (kernel-2.6.18-92.1.10.el5) with the Lustre
> modules from Sun.
>
> See below for sample logs from the Lustre client/NFS server. I can
> provide more logs if required.
>
> I'm not sure if this is a Lustre issue but would appreciate it if
> someone could help. We've not seen it on any other NFS server so far
> and there seems to be at least some Lustre-related stuff in the stack
> trace.
>
> Is this a known issue, and how can we avoid it? I have not found
> anything using Google or the search on bugzilla.lustre.org. At least
> the BUG warning seems to be a known issue on this kernel.
>
> I hope the logs below are readable enough; I tried to find entries
> where the stack traces don't overlap, but this seems to be the best I
> can find.
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: warning at
> fs/inotify.c:181/set_dentry_child_flags() (Tainted: G )
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: Call Trace:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed7d1]
> set_dentry_child_flags+0xef/0x14d
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed867]
> remove_watch_no_event+0x38/0x47
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed88e]
> inotify_remove_watch_locked+0x18/0x3b
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed97c]
> inotify_rm_wd+0x7e/0xa1
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ede6e]
> sys_inotify_rm_watch+0x46/0x63
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [8005d28d]
> tracesys+0xd5/0xe0
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: warning at
> fs/inotify.c:181/set_dentry_child_flags() (Tainted: G )
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: Call Trace:
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed7d1]
> set_dentry_child_flags+0xef/0x14d
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed867]
> remove_watch_no_event+0x38/0x47
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed88e]
> inotify_remove_watch_locked+0x18/0x3b
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ed97c]
> inotify_rm_wd+0x7e/0xa1
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel:  [800ede6e]
> sys_inotify_rm_watch+0x46/0x63
> Oct  9 15:21:27 cs04r-sc-serv-07 kernel: BUG: soft lockup - CPU#5
> stuck for 10s! [nfsd:1]
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: CPU 5:
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: Modules linked in: vfat fat
> usb_storage dell_rbu mptctl ipmi_devintf ipmi_si ipmi_msghandler nfs
> fscache nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp mgc(U)
> lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U)
> obdclass(U) lnet(U) lvfs(U) libcfs(U) rfcomm l2cap bluetooth sunrpc
> ipv6 xfrm_nalgo crypto_api mlx4_en(U) dm_multipath video sbs backlight
> i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc
> lp parport joydev sr_mod cdrom mlx4_core(U) bnx2 serio_raw pcspkr sg
> dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata shpchp mptsas
> mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd
> ohci_hcd ehci_hcd
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: Pid: 1, comm: nfsd Tainted:
> G  2.6.18-92.1.10.el5 #1
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RIP: 0010:[80064ba7]
>   [80064ba7] .text.lock.spinlock+0x5/0x30
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RSP: 0018:810044241ac8
> EFLAGS: 0286
> Oct  9 15:21:28 cs04r-sc-serv-07 kernel: RAX: 81006cb6a1a8 RBX:
> 81006cb6a178 RCX: 810044241b50
> Oct  9 15:21:28 cs04r-sc-serv-07