Re: [Lustre-discuss] LBUG encountered in 1.8.0

2009-08-05 Thread Isaac Huang
On Fri, Jul 31, 2009 at 10:52:46AM -0600, Daniel Kulinski wrote: Unmounting lustre when our heartbeat software was misconfigured (IPMI password changed). tx1oss3-clusternet kernel: LustreError: 19350:0:(quota_context.c:1369:lqs_exit()) ASSERTION(atomic_read(&q->lqs_refcount)

Re: [Lustre-discuss] Problems upgrading from 1.6 to 1.8

2009-08-05 Thread Mag Gam
Were you able to fix this? On Fri, Jul 17, 2009 at 11:38 AM, Christopher J.Walker c.j.wal...@qmul.ac.uk wrote: In order to avoid occasional crashes on our 1.6.4.3 OSSs, we have just upgraded our MDS and OSSs from 1.6.4.3 to Lustre 1.8.0.1. Unfortunately, we are having problems writing files

[Lustre-discuss] size of OST

2009-08-05 Thread Mag Gam
I know the largest possible OST is 8TB, but is that a recommended size? I want to avoid maintaining many objects, therefore I was thinking of creating 10x8TB OSTs on 10 OSS. Was wondering what kind of problems can arise. TIA

Re: [Lustre-discuss] size of OST

2009-08-05 Thread Brian J. Murrell
On Wed, 2009-08-05 at 07:20 -0400, Mag Gam wrote: I know the largest possible OST is 8TB, but is that a recommended size? Sure, if it meets your needs. I want to avoid maintaining many objects, therefore I was thinking of creating 10x8TB OSTs on 10 OSS. Was wondering what kind of problems can

Re: [Lustre-discuss] How to estimate the time for e2fsck on OST

2009-08-05 Thread Andreas Dilger
On Aug 04, 2009 22:47 +, Peter Grandi wrote: Andreas Dilger wrote: Putting 4 OSTs on a single disk doesn't make sense. A single OST can be up to 8TB, and if you have multiple OSTs on the same disk(s) it will cause terrible performance problems due to

Re: [Lustre-discuss] Lustre v1.8.0.1 slower than expected large-file, sequential-buffered-file-read speed

2009-08-05 Thread Rick Rothstein
Hi Andreas - Thanks for the advice. I will gather additional CPU stats and see what shows up. However, CPU does not seem to be a factor in the slower-than-expected large-file buffered I/O reads. My machines have dual quad 2.66 GHz processors, and gross CPU usage hovers around 50% when I'm

Re: [Lustre-discuss] Problems upgrading from 1.6 to 1.8

2009-08-05 Thread Christopher J.Walker
Mag Gam wrote: Were you able to fix this? The MGS errors, we did fix. What we did was follow the procedure for a minor upgrade in the Lustre manual - unmount all the clients, unmount all the OSTs, then tunefs.lustre --writeconf on the MGS/MDT, remount that, then tunefs.lustre --writeconf on

[Lustre-discuss] Inode errors at time of job failure

2009-08-05 Thread Daniel Kulinski
What would cause the following error to appear? LustreError: 10991:0:(file.c:2930:ll_inode_revalidate_fini()) failure -2 inode 14520180 This happened at the same time a job failed. Error number 2 is ENOENT which means that this inode does not exist? Is there a way to query the MDS to
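As a small illustration of the errno reading above, here is a sketch that turns a kernel-style negative return code such as the "failure -2" in the log line into its symbolic name. The helper name describe_kernel_rc is hypothetical and is not part of Lustre or of any Lustre tool; it only uses the Python standard library.

```python
import errno
import os


def describe_kernel_rc(rc: int) -> str:
    """Map a kernel-style return code (e.g. -2) to 'NAME: message'.

    Hypothetical helper for illustration only; kernel code reports errors
    as negative errno values, so the sign is dropped before the lookup.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The value reported by ll_inode_revalidate_fini() in the message above.
    print(describe_kernel_rc(-2))
```

This only decodes the number; it says nothing about why the inode lookup failed on the MDS.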

Re: [Lustre-discuss] Problems upgrading from 1.6 to 1.8

2009-08-05 Thread Andreas Dilger
On Aug 05, 2009 18:45 +0100, Christopher J.Walker wrote: Aug 5 13:53:01 se02 kernel: LustreError: 2668:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-10.1.4@tcp, match 1449 length 832 too big: 816 left, 816 allowed This looks like bug 20020, fixed in the 1.8.1

Re: [Lustre-discuss] Moving away from bugzilla

2009-08-05 Thread Christopher J. Morrone
Mag Gam wrote: Are there any plans to move away from Bugzilla for issue tracking? I have been lurking around https://bugzilla.lustre.org for several months now and I still find it very hard to use. Do others have the same feeling? Or is there a setting or a preferred filter to see all the

Re: [Lustre-discuss] Inode errors at time of job failure

2009-08-05 Thread Oleg Drokin
Hello! On Aug 5, 2009, at 3:12 PM, Daniel Kulinski wrote: What would cause the following error to appear? Typically this is some sort of a race where you presume an inode exists (because you have some traces of it in memory), but it does not anymore (on the MDS, anyway). So when client comes to

Re: [Lustre-discuss] Lustre v1.8.0.1 slower than expected large-file, sequential-buffered-file-read speed

2009-08-05 Thread Andreas Dilger
On Aug 05, 2009 13:30 -0400, Rick Rothstein wrote: My machines have dual quad 2.66 GHz processors, and gross CPU usage hovers around 50% when I'm running 16 dd read jobs. Be cautious of nice round numbers for CPU usage. Sometimes this means that 1 CPU is 100% busy, and another is 0% busy.
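To make the point about averaged CPU figures concrete, here is a minimal sketch with made-up numbers. The function name summarize_cpu, the 95% saturation threshold, and the sample values are all assumptions for illustration; they are not measurements from this thread.

```python
def summarize_cpu(per_cpu_busy, saturated_at=95.0):
    """Return (average busy %, indices of CPUs at or above the threshold).

    per_cpu_busy is a list of per-CPU busy percentages, e.g. as read
    from a per-CPU monitoring tool. The threshold is an arbitrary choice.
    """
    average = sum(per_cpu_busy) / len(per_cpu_busy)
    saturated = [i for i, busy in enumerate(per_cpu_busy) if busy >= saturated_at]
    return average, saturated


if __name__ == "__main__":
    # Hypothetical two-CPU sample: one CPU pegged, the other idle.
    average, saturated = summarize_cpu([100.0, 0.0])
    print(f"average busy: {average:.1f}%, saturated CPUs: {saturated}")
```

Both this sample and an evenly loaded pair at 50% each report the same average, which is why the per-CPU breakdown is worth checking before ruling out a CPU bottleneck.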