Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Brian J. Murrell
On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
 I think I tried that before and it didn't help, but I will try it
 again. Thanks for the suggestion.

Just so you guys know, 1000 seconds for the obd_timeout is very, very
large!  As you could probably guess, we have some very, very big Lustre
installations and to the best of my knowledge none of them are using
anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
experience to some of these very large clusters might correct me) the
largest value that the largest clusters are using is in the
neighbourhood of 300s.  There has to be some other problem at play here
that you need 1000s.
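
For anyone who wants to experiment, the obd timeout can be checked and
changed per node without a remount.  A minimal sketch, assuming the usual
1.6 /proc layout (the 300 is only an illustrative value, not a
recommendation for any particular site):

cat /proc/sys/lustre/timeout          # obd timeout currently in effect on this node, in seconds
echo 300 > /proc/sys/lustre/timeout   # temporary, per-node change; lost on reboot/remount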

Can you both please report your Lustre and kernel versions?  I know you
said "latest", Aaron, but some actual version numbers would be more solid
to go on.

b.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ko2iblnd panics in kiblnd_map_tx_descs

2008-03-05 Thread Liang Zhen
Hi Chris,
To resolve your problem, please do the following (a rough sketch of the
full build sequence follows the list):
1. apply this patch to your lnet:
https://bugzilla.lustre.org/attachment.cgi?id=15733
2. make sure you use this option when running configure:
--with-o2ib=/path/to/ofed
3. Copy /path/to/ofed/Module.symvers to your $LUSTRE before building
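
In case it helps, a rough sketch of the whole sequence (the patch level,
paths, and make target are assumptions; adjust them to your source tree
and OFED install):

cd /path/to/lustre-1.6.4.2                  # your lustre source tree ($LUSTRE)
patch -p0 < attachment-15733.patch          # the lnet patch from the bugzilla link above; -p level may differ
cp /path/to/ofed/Module.symvers .           # step 3: OFED symbol versions, so o2iblnd links against your OFED
./configure --with-linux=/path/to/kernel-source --with-o2ib=/path/to/ofed
make                                        # or "make rpms", depending on how you install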

Regards
Liang

Chris Worley wrote:
 I'm trying to port Lustre 1.6.4.2 to OFED 1.2.5.5 with the RHEL kernel
 2.6.9-67.0.4.

 ksocklnd-based mounts work fine, but when I try to mount over IB, I
 get a panic in ko2iblnd in the transmit descriptor mapping routine:

 general protection fault:  [1] SMP
 CPU 1
 Modules linked in: ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U)
 libcfs(U) nfs(U) lockd(U) nfs_acl(U) sunrpc(U) rdma_ucm(U) ib_sdp(U)
 rdma_cm(U) iw_cm(U) ib_addr(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U)
 dm_mod(U) ib_ipoib(U) md5(U) ipv6(U) ib_umad(U) ib_ucm(U) ib_uverbs(U)
 ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) aic79xx(U) e1000(U) ext3(U)
 jbd(U) raid0(U) mptscsih(U) mptsas(U) mptspi(U) mptscsi(U) mptbase(U)
 sd_mod(U) ata_piix(U) libata(U) scsi_mod(U)
 Pid: 5141, comm: modprobe Not tainted 2.6.9-67.0.4.EL-Lustre-1.6.4.2
 RIP: 0010:[a04659d1]
 a04659d1{:ko2iblnd:kiblnd_map_tx_descs+225}
 RSP: :0102105d7cd8  EFLAGS: 00010286
 RAX: a01e6b4e RBX: ff0010028000 RCX: 0001
 RDX: 1000 RSI: 01020e705000 RDI: 0102154e2000
 RBP: 0102102c4200 R08:  R09: 
 R10:  R11:  R12: 
 R13:  R14:  R15: 0102102c4228
 FS:  002a958a0b00() GS:8046ac00() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 002a9598200f CR3: 9fa08000 CR4: 06e0
 Process modprobe (pid: 5141, threadinfo 0102105d6000, task 
 0102175e0030)
 Stack:  0102102c4080 0102102c4100 0102102c4200
0102179c2b86 0102177df400 010215548ac0 a0466fdf
0102179c2b85 
 Call Trace:a0466fdf{:ko2iblnd:kiblnd_startup+2239}
 a03043dc{:lnet:lnet_startup_lndnis+332}
a02d2f38{:libcfs:cfs_alloc+40}
 a0305206{:lnet:LNetNIInit+278}
a03fcb0a{:ptlrpc:ptlrpc_ni_init+106}
 8012f9cd{default_wake_function+0}
a03fcbfa{:ptlrpc:ptlrpc_init_portals+10}
8012f9cd{default_wake_function+0}
 a045f22b{:ptlrpc:init_module+267}
8014bc0a{sys_init_module+278}
 8010f23e{system_call+126}


 Code: ff 50 08 eb 12 48 8b 3f b9 01 00 00 00 ba 00 10 00 00 e8 30
 RIP a04659d1{:ko2iblnd:kiblnd_map_tx_descs+225} RSP 
 0102105d7cd8

 Does this ring any bells?  Otherwise, any debugging tips?

 Shane said that they get an oops if they compile with the version
 specific OFA tree.  Is this the Oops?

 Thanks,

 Chris

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Charles Taylor

Sure, we will provide you with more details of our installation but  
let me first say that, if recollection serves, we did not pull that  
number out of a hat.   I believe that there is a formula in one of  
the lustre tuning manuals for calculating the recommended timeout  
value.   I'll have to take a moment to go back and find it.   Anyway,  
if you use that formula for our cluster, the recommended timeout  
value, I think, comes out to be *much* larger than 1000.

Later this morning, we will go back and find that formula and share  
with the list how we came up w/ our timeout.   Perhaps you can show  
us where we are going wrong.

One more comment: we just brought up our second large lustre file
system.  It is 80+ TB served by 24 OSTs on two (pretty beefy)
OSSs.  We just achieved over 2GB/sec of sustained (large block,
sequential) I/O from an aggregate of 20 clients.  Our design target
was 1.0 GB/sec/OSS and we hit that pretty comfortably.  That said,
when we first mounted the new (1.6.4.2) file system across all 400  
nodes in our cluster, we immediately started getting transport  
endpoint failures and evictions.   We looked rather intensively for  
network/fabric problems (we have both o2ib and tcp nids) and could  
find none.   All of our MPI apps are/were running just fine.   The  
only way we could get rid of the evictions and transport endpoint  
failures was by increasing the timeout.   Also, we knew to do this  
based on our experience with our first lustre file system (1.6.3 +  
patches) where we had to do the same thing.

Like I said, a little bit later, Craig or I will post more details  
about our implementation.   If we are doing something wrong with  
regard to this timeout business, I would love to know what it is.

Thanks,

Charlie Taylor
UF HPC Center

On Mar 4, 2008, at 4:04 PM, Brian J. Murrell wrote:

 On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
 I think I tried that before and it didn't help, but I will try it
 again. Thanks for the suggestion.

 Just so you guys know, 1000 seconds for the obd_timeout is very, very
 large!  As you could probably guess, we have some very, very big  
 Lustre
 installations and to the best of my knowledge none of them are using
 anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
 experience to some of these very large clusters might correct me) the
 largest value that the largest clusters are using is in the
 neighbourhood of 300s.  There has to be some other problem at play  
 here
 that you need 1000s.

 Can you both please report your lustre and kernel versions?  I know  
 you
 said latest Aaron, but some version numbers might be more solid  
 to go
 on.

 b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Charles Taylor
Well, go figure.    We are running...

Lustre: 1.6.4.2 on clients and servers
Kernel: 2.6.18-8.1.14.el5Lustre (clients and servers)
Platform: X86_64 (Opteron 275s, mostly)
Interconnect: IB, Ethernet
IB Stack: OFED 1.2

We already posted our procedure for patching the kernel, building
OFED, and building lustre, so I don't think I'll go into that
again.  Like I said, we just brought a new file system online.
Everything looked fine at first with just a few clients mounted.
Once we mounted all 408 (or so), we started getting all kinds of
transport endpoint failures, and the MGSs and OSTs were evicting
clients left and right.  We looked for network problems and could
not find any of any substance.  Once we increased the obd/lustre/system
timeout setting as previously discussed, the errors vanished.  This was
consistent with our experience with 1.6.3 as well.  That file system
has been online since early December.
Both file systems appear to be working well.
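
For anyone following along, a rough sketch of how that kind of timeout
change is usually applied on 1.6: the filesystem name "hpcfs" is just a
placeholder, the conf_param is run once on the MGS and persists, and the
/proc file shows what any given node is currently using.

lctl conf_param hpcfs.sys.timeout=1000   # on the MGS: persistent obd timeout for every node of this filesystem
cat /proc/sys/lustre/timeout             # on any client or server: timeout currently in effect, in seconds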

I'm not sure what to make of it.  Perhaps we are just masking
another problem.  Perhaps there are some other, related values
that need to be tuned.  We've done the best we could, but I'm sure
there is still much about Lustre we don't know.  We'll try to get
someone out to the next class, but until then we're on our own, so to
speak.

Charlie Taylor
UF HPC Center


 Just so you guys know, 1000 seconds for the obd_timeout is very, very
 large!  As you could probably guess, we have some very, very big  
 Lustre
 installations and to the best of my knowledge none of them are using
 anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
 experience to some of these very large clusters might correct me) the
 largest value that the largest clusters are using is in the
 neighbourhood of 300s.  There has to be some other problem at play  
 here
 that you need 1000s.

 I can confirm that at a recent large installation with several  
 thousand
 clients, the default of 100 is in effect.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre MPI-IO performance on CNL

2008-03-05 Thread Weikuan Yu
Hi,

The I/O performance of CNL (as measured with IOR) seems quite different
for a shared file compared to separate files (one file per process).

Here are some numbers from a smaller file system on an XT system at ORNL.
All files are striped across 72 OSTs. I deliberately use a block size of
8512m.
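
For context, a sketch of how such a layout is typically set on the output
directory with 1.6-era lfs.  The directory path and the 4m stripe size
below are only placeholders (not necessarily what was used here), and
some older lfs versions want the positional form
"lfs setstripe <dir> <size> <index> <count>" instead of the flags:

lfs setstripe -s 4m -i -1 -c 72 /lustre/scratch/iordir   # 72-OST striping; -1 lets the MDS pick the starting OST
lfs getstripe /lustre/scratch/iordir                     # confirm the layout new files will inherit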

1. sample tests with separate files
# aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
64m -d 1 -i 2 -w -r -g -F -o iortes
Max Write: 9978.18 MiB/sec (10462.88 MB/sec)
Max Read:  5612.78 MiB/sec (5885.43 MB/sec)

2. sample shared-file performance
# aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
64m -d 1 -i 2 -w -r -g -o iortes
Max Write: 6817.31 MiB/sec (7148.47 MB/sec)
Max Read:  5591.98 MiB/sec (5863.62 MB/sec)

In addition, using my experimental MPI-IO library, I noticed that 
enabling direct I/O can have various effects for I/O on CNL.

3. sample separate files with direct I/O
export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 32 
-N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 -i 
2 -w -r -g -F -k -o lustre:iortest
Max Write: 9353.66 MiB/sec (9808.03 MB/sec)
Max Read:  8269.28 MiB/sec (8670.97 MB/sec)

4. sample shared-file performance with direct I/O
# export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 
32 -N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 
-i 2 -w -r -g -k -o lustre:iortes
Max Write: 9484.11 MiB/sec (9944.81 MB/sec)
Max Read:  7929.63 MiB/sec (8314.81 MB/sec)

It seems direct I/O helps the performance of parallel reads quite a bit,
but not writes. The shared-file mode appears to benefit more from direct
writes.

While it is understandable that the client cache can play a big role
here, I am not sure how it could help the shared-file mode that much
more. Can anybody offer some explanations of the comparison between reads
and writes, and likewise between shared-file and separate-file modes?

Also let me know if I am not clear in my descriptions.

-- 
Weikuan Yu + 1-865-574-7990
http://ft.ornl.gov/~wyu/

P.S.:
The numbers shown are the best from several runs, so you may consider
them consistent results.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre official flock support

2008-03-05 Thread Oleg Drokin
Hello!

On Mar 5, 2008, at 11:33 AM, Joe Barjo wrote:
 While making my tests, I saw that the flock system call was not  
 working.
 Googling around I found the flock option in the mount command, and it
 seems to work just fine.
 However, I've read in the documentation that flock will only be
 supported in version 1.8 of Lustre.
 What is the current status of this?
 Is flock usable in production for 1.6.4.2?

flock has a major flaw in the sense that it is not fd-attached: once you
open a file, take a flock lock, fork, and try to release the lock from
the child, the lock won't actually go away.
POSIX locking (through fcntl), on the other hand, should work just fine.
Note that right now there are some assertions in the code that will kill
the client if you issue a locking call with unknown parameters (such as
an unknown command); I think Samba does that.
That code needs to be changed to just return an error (in ll_file_flock,
2 occurrences).  There is a separate patch somewhere in bugzilla, but I
cannot find it immediately, and it will be included with some changes I
am preparing anyway.
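
For reference, enabling coherent flock looks roughly like this on the
client side (the MGS nid, filesystem name, and mount point below are just
placeholders):

mount -t lustre -o flock mgsnode@tcp0:/testfs /mnt/testfs   # flock is off by default; -o flock turns it on cluster-wide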

Bye,
 Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Installing Lustre on PowerPC (IBM pSeries)

2008-03-05 Thread Oleg Drokin
Hello!

On Mar 4, 2008, at 4:44 AM, gas5x1 wrote:
 Could you please advice me, how, if at all passible, is to install
 Lustre on IBM PPC64? I have already Lustre 1.6 installation working
 for Intel i386 and AMD Opteron nodes, and now would like to acess it
 from IBM clients.

You can just compile as normal and it should technically work.
For the missing segment.h problem you saw earlier, please apply the patch
from
https://bugzilla.lustre.org/show_bug.cgi?id=14844
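
A rough sketch of applying it and doing a client-only build on the PPC64
nodes (the patch file name, patch level, and kernel source path are
assumptions; take the actual attachment from the bug above):

cd /path/to/lustre-1.6-source
patch -p0 < bug14844-segment-h.patch        # attachment from bug 14844; -p level may differ
./configure --with-linux=/path/to/ppc64-kernel-source --disable-server
make && make install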

Bye,
 Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre MPI-IO performance on CNL

2008-03-05 Thread Weikuan Yu
Marty Barnaby wrote:
 My, perhaps, misunderstanding was that a Lustre FS had a maximum lfs 
 stripe-count of 160. Is this not a constant set  in the LFS, but just 
 some local configuration? Could you be more specific about the actual 
 lfs stripe-count of the file or files you wrote?

You're right about the maximum stripe count; 72 was just my local choice
for this testing. The stripe count can have an effect, but probably only
a small one, on the relative comparisons between runs with and without
direct I/O.

--Weikuan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre MPI-IO performance on CNL

2008-03-05 Thread Weikuan Yu
 What is the stripe_size of this test? 4M? If it is 4M, then the
 transfer_size (64M) is a little bigger. We have seen this situation
 before; in the end it seemed to be because each client holds too many
 locks per write (because of Lustre's down-forward extent lock policy),
 which can block other clients' writes and so hurts the parallelism of
 the whole system. Maybe you could try decreasing the transfer size to
 the stripe_size, or increasing the stripe_size to 64M, and see how it
 goes?

Yes, the difference between a shared file and separate files has been
seen before, but I have never seen an explanation specific to CNL. BTW,
this performance difference between shared and separate files stays the
same regardless of the transfer size.

Does anybody want to post a reason regarding direct I/O too?

--Weikuan

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre MPI-IO performance on CNL

2008-03-05 Thread Tom.Wang
Hi,
Weikuan Yu wrote:
 Hi,

 The I/O performance of CNL (as measured with IOR) seems quite different
 for a shared file, compared to the same with separated files.

 Here are some numbers on a smaller file system on XT system at ORNL. All 
 files are striped to 72OSTs. I deliberately use a block size 8512m.

 1. sample tests with separate files
 # aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
 64m -d 1 -i 2 -w -r -g -F -o iortes
 Max Write: 9978.18 MiB/sec (10462.88 MB/sec)
 Max Read:  5612.78 MiB/sec (5885.43 MB/sec)

 2. sample share file performance
 # aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
 64m -d 1 -i 2 -w -r -g -o iortes
 Max Write: 6817.31 MiB/sec (7148.47 MB/sec)
 Max Read:  5591.98 MiB/sec (5863.62 MB/sec)

 In addition, using my experimental MPI-IO library, I noticed that 
 enabling direct I/O can have various effects for I/O on CNL.
   
What is the stripe_size of this test? 4M? If it is 4M, then the
transfer_size (64M) is a little bigger. We have seen this situation
before; in the end it seemed to be because each client holds too many
locks per write (because of Lustre's down-forward extent lock policy),
which can block other clients' writes and so hurts the parallelism of
the whole system. Maybe you could try decreasing the transfer size to
the stripe_size, or increasing the stripe_size to 64M, and see how it
goes?
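
For example, something along these lines (the directory path and the
1.6-era lfs flags are assumptions, and the 4m transfer size assumes the
current stripe_size really is 4M):

lfs setstripe -s 64m -i -1 -c 72 /lustre/scratch/iordir    # option A: widen the stripes to match the 64m transfers
aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m \
      -t 4m -d 1 -i 2 -w -r -g -o iortest                  # option B: shrink the transfer size to the stripe size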

Thanks
WangDi
 3. sample separate files with direct I/O
 export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 32 
 -N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 -i 
 2 -w -r -g -F -k -o lustre:iortest
 Max Write: 9353.66 MiB/sec (9808.03 MB/sec)
 Max Read:  8269.28 MiB/sec (8670.97 MB/sec)

 4. sample share file performance with direct IO
 # export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 
 32 -N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 
 -i 2 -w -r -g -k -o lustre:iortes
 Max Write: 9484.11 MiB/sec (9944.81 MB/sec)
 Max Read:  7929.63 MiB/sec (8314.81 MB/sec)

 It seems direct I/O helps quite a bit on the performance of parallel 
 reads, but not on writes. The shared file mode appears to benefit more 
 from direct write.

 While it is understandable that the client cache can play a big role 
 here,  I am not sure how it could help the share-file mode much better. 
 Anybody can help with some explanations on the comparison between reads 
 and writes and the same for shared-file and separated-files?

 Also let me know if I am not clear in my descriptions.

   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Frank Leers
On Wed, 2008-03-05 at 13:37 -0500, Aaron Knister wrote:
 Could you tell me what version of OFED was being used? Was it the  
 version that ships with the kernel?

OFED version is 1.2.5.4

 
 -Aaron
 
 On Mar 5, 2008, at 11:33 AM, Frank Leers wrote:
 
  On Wed, 2008-03-05 at 11:08 -0500, Aaron Knister wrote:
  That's very strange. What interconnect is that site using?
 
 
  Not really strange, but -
 
  SDR IB/OFED
 
  lustre 1.6.4.2
  2.6.18.8 clients
  2.6.9-55.0.9 servers
 
  My versions are -
 
  Lustre  - 1.6.4.2
  Kernel (servers) - 2.6.18-8.1.14.el5_lustre.1.6.4.2smp
  Kernel (clients) - 2.6.18-53.1.13.el5
 
 
 
  On Mar 5, 2008, at 11:03 AM, Frank Leers wrote:
 
  On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
  On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
  I think I tried that before and it didn't help, but I will try it
  again. Thanks for the suggestion.
 
  Just so you guys know, 1000 seconds for the obd_timeout is very,  
  very
  large!  As you could probably guess, we have some very, very big
  Lustre
  installations and to the best of my knowledge none of them are  
  using
  anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
  experience to some of these very large clusters might correct me)  
  the
  largest value that the largest clusters are using is in the
  neighbourhood of 300s.  There has to be some other problem at play
  here
  that you need 1000s.
 
  I can confirm that at a recent large installation with several
  thousand
  clients, the default of 100 is in effect.
 
 
  Can you both please report your lustre and kernel versions?  I know
  you
  said latest Aaron, but some version numbers might be more solid
  to go
  on.
 
  b.
 
 
 
  Aaron Knister
  Associate Systems Analyst
  Center for Ocean-Land-Atmosphere Studies
 
  (301) 595-7000
  [EMAIL PROTECTED]
 
 
 
 
 
 
 Aaron Knister
 Associate Systems Analyst
 Center for Ocean-Land-Atmosphere Studies
 
 (301) 595-7000
 [EMAIL PROTECTED]
 
 
 
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Aaron Knister
Are the clients SuSE, Red Hat, or another distro? I can't get OFED
1.2.5.4 to build with RHEL5, but I'm working on that.

On Mar 5, 2008, at 2:03 PM, Frank Leers wrote:

 On Wed, 2008-03-05 at 13:37 -0500, Aaron Knister wrote:
 Could you tell me what version of OFED was being used? Was it the
 version that ships with the kernel?

 OFED version is 1.2.5.4


 -Aaron

 On Mar 5, 2008, at 11:33 AM, Frank Leers wrote:

 On Wed, 2008-03-05 at 11:08 -0500, Aaron Knister wrote:
 That's very strange. What interconnect is that site using?


 Not really strange, but -

 SDR IB/OFED

 lustre 1.6.4.2
 2.6.18.8 clients
 2.6.9-55.0.9 servers

 My versions are -

 Lustre  - 1.6.4.2
 Kernel (servers) - 2.6.18-8.1.14.el5_lustre.1.6.4.2smp
 Kernel (clients) - 2.6.18-53.1.13.el5



 On Mar 5, 2008, at 11:03 AM, Frank Leers wrote:

 On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
 On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
 I think I tried that before and it didn't help, but I will try  
 it
 again. Thanks for the suggestion.

 Just so you guys know, 1000 seconds for the obd_timeout is very,
 very
 large!  As you could probably guess, we have some very, very big
 Lustre
 installations and to the best of my knowledge none of them are
 using
 anywhere near that.  AFAIK (and perhaps a Sun engineer with  
 closer
 experience to some of these very large clusters might correct me)
 the
 largest value that the largest clusters are using is in the
 neighbourhood of 300s.  There has to be some other problem at  
 play
 here
 that you need 1000s.

 I can confirm that at a recent large installation with several
 thousand
 clients, the default of 100 is in effect.


 Can you both please report your lustre and kernel versions?  I  
 know
 you
 said latest Aaron, but some version numbers might be more solid
 to go
 on.

 b.



 Aaron Knister
 Associate Systems Analyst
 Center for Ocean-Land-Atmosphere Studies

 (301) 595-7000
 [EMAIL PROTECTED]






 Aaron Knister
 Associate Systems Analyst
 Center for Ocean-Land-Atmosphere Studies

 (301) 595-7000
 [EMAIL PROTECTED]






Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
[EMAIL PROTECTED]




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lustre dstat plugin

2008-03-05 Thread Brock Palen
I have written a Lustre dstat plugin.  You can find it on my blog:

http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,31/

It only works on clients, and it has not been tested with multiple
mounts.  It's very simple; it just reads /proc/.

Example:

dstat -a -M lustre

total-cpu-usage -dsk/total- -net/total- ---paging-- ---system-- lustre-1.6-
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | read  writ
 23  53   1  21   0   0|   0     0 |3340k 4383k|   0     0 |3476   198 |  16M   22M
 13  69  16   2   0   1|   0     0 |1586k   16M|   0     0 |3523   424 |  24M   14M
 69  30   0   0   0   1|   0  8192B|1029k   18M|   0     0 |3029    88 |   0     0
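
For anyone curious what the plugin is reading, the per-mount client
counters live under /proc on 1.6 clients.  A rough by-hand equivalent,
assuming the usual llite stats format where the last field of each
read_bytes/write_bytes line is the cumulative byte count:

grep -E 'read_bytes|write_bytes' /proc/fs/lustre/llite/*/stats
awk '/read_bytes|write_bytes/ { tot[$1] += $NF }
     END { for (c in tot) print c, tot[c] }' /proc/fs/lustre/llite/*/stats   # total bytes per counter, summed over mounts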

Patches/comments welcome,

Brock Palen
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss