Re: [Lustre-discuss] socknal_sd00 100% lower?

2008-03-10 Thread Isaac Huang
On Fri, Mar 07, 2008 at 03:26:23PM -0700, Andreas Dilger wrote:
 
 Maxim, Isaac, what are your thoughts about disabling IRQ affinity
 by default? In the past this was important for maximizing performance
 with N CPUs and N Ethernet NICs, but CPUs have gotten much faster
 and gained more cores, and I believe other customers have found better
 performance with irq_affinity disabled.
 

Agreed. With the ksocklnd bonding feature deprecated, it's now more
common to configure LNet with a single NIC. I've committed the change
and filed a documentation bug to update the manuals accordingly.

Thanks,
Isaac
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre dstat plugin

2008-03-10 Thread Brock Palen
On Mar 9, 2008, at 10:03 PM, Aaron Knister wrote:

 Just wondering if either of you has used collectl and, if so, which
 you prefer: dstat or collectl.

I've never used it; it looks like they solve the same problem. I like
dstat for its simple plugins (if you're a better Python programmer than
me), and for how you can pull out results. For example, I use the
following on our Lustre OSS with two OSTs, sda and sdb:

dstat -D sda,sdb,total

That gives me per disk stats and a total.
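As a rough illustration of what dstat -D sda,sdb,total computes, here is a minimal Python sketch (not dstat's own code) that turns two /proc/diskstats snapshots into per-disk read/write rates plus a total. It assumes the standard /proc/diskstats layout, where the 6th and 10th fields are sectors read and sectors written, counted in 512-byte units. The "total" here sums only the disks passed in, which may not match how dstat computes its own total.

```python
# Sketch only: per-disk and total throughput from two /proc/diskstats
# snapshots. Assumes the standard layout (0-based field 5 = sectors read,
# field 9 = sectors written) and 512-byte sector units.

SECTOR_BYTES = 512


def parse_diskstats(text):
    """Return {device: (bytes_read, bytes_written)} parsed from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[5]) * SECTOR_BYTES,
                            int(fields[9]) * SECTOR_BYTES)
    return stats


def disk_rates(before, after, disks, interval):
    """Return {disk: (read B/s, write B/s)} for `disks` plus a "total" entry.

    The total sums only the disks passed in.
    """
    old, new = parse_diskstats(before), parse_diskstats(after)
    rates = {}
    for disk in disks:
        rates[disk] = ((new[disk][0] - old[disk][0]) / interval,
                       (new[disk][1] - old[disk][1]) / interval)
    rates["total"] = (sum(r for r, _ in rates.values()),
                      sum(w for _, w in rates.values()))
    return rates
```

To use it, read /proc/diskstats twice, one second apart, and call disk_rates(first, second, ["sda", "sdb"], 1.0).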

I'm sure similar tools could be made for collectl.

Brock


 -Aaron

 On Mar 7, 2008, at 7:03 PM, Brock Palen wrote:

 On Mar 7, 2008, at 6:58 PM, Kilian CAVALOTTI wrote:

 Hi Brock,

 On Wednesday 05 March 2008 05:21:51 pm Brock Palen wrote:
 I have written a Lustre dstat plugin. You can find it on my blog:

 That's cool! Very useful for my daily work, thanks!

 Thanks! It's the first Python I ever wrote.


 It only works on clients and has not been tested on multiple
 mounts. It's very simple; it just reads /proc/

 It indeed doesn't read stats for multiple mounts. I slightly modified
 it so it can display read/write numbers for all the mounts it finds
 (see the attached patch).
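For anyone extending the plugin to several mounts, here is a minimal Python sketch of the underlying arithmetic, independent of dstat's plugin API and not Kilian's patch: read every /proc/fs/lustre/llite/*/stats file, keep the read_bytes and write_bytes totals per mount, and divide the difference between two snapshots by the interval. It assumes those counter lines end in min, max and sum fields, with the cumulative byte count as the seventh field; the mount labels are simply the llite directory names.

```python
import glob
import os

# Sketch only: per-mount read/write rates from Lustre client llite stats.
# ASSUMPTION: byte counters look like
#   read_bytes   <count> samples [bytes] <min> <max> <sum>
# with the cumulative byte total in the 7th whitespace-separated field.
COUNTERS = ("read_bytes", "write_bytes")


def snapshot(pattern="/proc/fs/lustre/llite/*/stats"):
    """Return {llite directory name: stats file text} for every match."""
    snap = {}
    for path in glob.glob(pattern):
        label = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            snap[label] = f.read()
    return snap


def byte_totals(stats_text):
    """Return {counter: cumulative bytes}; counters absent from the text stay 0."""
    totals = dict.fromkeys(COUNTERS, 0)
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] in totals:
            totals[fields[0]] = int(fields[6])
    return totals


def mount_rates(before, after, interval):
    """Return {mount: (read B/s, write B/s)} between two snapshots."""
    rates = {}
    for mount, text in after.items():
        old = byte_totals(before.get(mount, ""))
        new = byte_totals(text)
        rates[mount] = tuple((new[c] - old[c]) / interval for c in COUNTERS)
    return rates
```

Calling snapshot() twice, a second apart, and passing both results to mount_rates(..., 1.0) gives per-mount numbers similar in spirit to the plugin's read/write columns.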

 This is a great idea.


 Here's typical output for an rsync transfer from scratch to home:

 -- 8< --
 $ dstat -M lustre

 Module dstat_lustre is still experimental.
 --scratch---home---
 read  write: read  write
 110M     0 :    0  110M
 183M     0 :    0  183M
 184M     0 :    0  184M
 -- 8< --

 Maybe it could be useful to also add the other metrics from the stats
 file, but I'm not sure which ones would be most relevant. And it
 would probably be wise to do that in a separate module, like
 lustre_stats, to avoid clutter.

 Yes, dstat comes with plugins for NFSv3 in two modules, dstat_nfs3
 and dstat_nfs3op, the latter of which has extended details. So I think
 it would be a good idea to follow that model.


 Anyway, great job, and thanks for sharing it!

 Thanks again.

 Cheers,
 -- 
 Kilian
 Attachment: dstat_lustre.diff


 Aaron Knister
 Associate Systems Analyst
 Center for Ocean-Land-Atmosphere Studies

 (301) 595-7000
 [EMAIL PROTECTED]



Re: [Lustre-discuss] yet another lustre error

2008-03-10 Thread Brock Palen
On Mar 9, 2008, at 10:01 PM, Aaron Knister wrote:

 Hi! I have a few questions for you-

 1. How many nodes was his job running on?

Around 64 serial jobs accessing the same directory (not the same files).

 2. What version of lustre and linux kernel are you running on your  
 servers/clients?

Lustre servers:
2.6.9-55.0.9.EL_lustre.1.6.4.1smp

Clients:
2.6.9-67.0.1.ELsmp


 3. What ethernet module are you using on the servers/clients?

Most use the tg3, some use e1000.


 I honestly am not sure what the RPC errors mean but I've had  
 similar issues caused by ethernet-level errors.

Over the weekend the MDS/MGS went into an unhealthy state that forced a
reboot and fsck. When it came back up, the directory was accessible
again and jobs started working again.


 -Aaron

 On Mar 7, 2008, at 6:45 PM, Brock Palen wrote:

 On a file system that's been up for only 57 days, I have:

 505 lustre-log.   dumps.

 The problem at hand is that a user has many jobs that are now hung
 trying to create a directory from his PBS script. On the clients I
 see:

 LustreError: 11-0: an error occurred while communicating with
 [EMAIL PROTECTED] The mds_connect operation failed with -16
 LustreError: Skipped 2 previous similar messages

 This appears on every client his jobs are on.

 In the most recent /tmp/lustre-log.  on the MDS/MGS I see this  
 message:

 @@@ processing error (-16)  [EMAIL PROTECTED] x12808293/t0 o38-
 [EMAIL PROTECTED]:-1
 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
 ldlm_lib.c
 target_handle_reconnect
 nobackup-MDT: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting
 ldlm_lib.c
 target_handle_connect
 nobackup-MDT: refuse reconnection from 34b4fbea-200b-1f7c-
 [EMAIL PROTECTED]@tcp to 0x0100069a7000; still busy
 with 2 active RPCs
 ldlm_lib.c
 target_send_reply_msg
 @@@ processing error (-16)  [EMAIL PROTECTED] x11199816/t0 o38-
 [EMAIL PROTECTED]:-1
 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0


 I see messages about active RPCs in other logs too. What would
 this mean? Is something stuck someplace?
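For reference, rc -16 in these messages is the negated Linux errno EBUSY ("Device or resource busy"), which matches the "still busy with 2 active RPCs" text. Below is a small Python sketch that decodes such a return code and pulls out which client was refused and how many RPCs were still active. It assumes the log has first been unwrapped to one message per line; the client address in the usage example is made up.

```python
import errno
import os
import re


def describe_rc(rc):
    """Lustre logs negated errno values; map e.g. -16 to its name and text."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# ASSUMPTION: one unwrapped log message per line, in the form
#   <target>: refuse reconnection from <client> to <export>; still busy with N active RPCs
BUSY_RE = re.compile(
    r"(\S+): refuse reconnection from (\S+) to (\S+); "
    r"still busy with (\d+) active RPCs"
)


def busy_reconnects(log_text):
    """Return (target, client, active_rpcs) for each refused reconnection."""
    return [(m.group(1), m.group(2), int(m.group(4)))
            for m in BUSY_RE.finditer(log_text)]
```

For example, describe_rc(-16) gives ("EBUSY", "Device or resource busy") on Linux, and running busy_reconnects over a whole log shows whether the same clients keep being refused.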



 Brock Palen
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985





Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread mitcheloc
Isaac,

Thanks for the quick response. A quick Google search didn't tell me how
to check the module parameters. What command or file should I check for
this?

And as you requested:

[EMAIL PROTECTED] ~]# ls /lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre
ksocklnd.ko  libcfs.ko  lnet.ko  lnet_selftest.ko

[EMAIL PROTECTED] ~]# rpm -ql lustre-modules
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/llite_lloop.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/lov.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/lquota.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/lustre.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/lvfs.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/mdc.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/mgc.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/obdclass.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/obdecho.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/osc.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre/ptlrpc.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre/ksocklnd.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre/libcfs.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre/lnet.ko
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre/lnet_selftest.ko
/usr/share/doc/lustre-modules-1.6.4.3
/usr/share/doc/lustre-modules-1.6.4.3/COPYING

Thank you!
On Mon, Mar 10, 2008 at 11:02 AM, Isaac Huang [EMAIL PROTECTED] wrote:

 On Mon, Mar 10, 2008 at 10:04:50AM -0500, mitcheloc wrote:
 
 [EMAIL PROTECTED] ~]# dmesg
 Lustre: OBD class driver, [EMAIL PROTECTED]
 Lustre Version: 1.6.4.3
 Build Version:
 1.6.4.3-1969123116-PRISTINE-.usr.src.linux-2.6.18-53.1.14.el5.lustre
 Lustre: Added LNI [EMAIL PROTECTED] [8/256]
 LustreError: 2359:0:(api-ni.c:1025:lnet_startup_lndnis()) Can't load
 LND elan, module kqswlnd, rc=256

 LNet couldn't load the driver module (kqswlnd) for elan. What are your
 lnet module parameters?

 Please also run:
 ls /lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre
 rpm -ql lustre-modules

 Thanks,
 Isaac

 Lustre: Removed LNI [EMAIL PROTECTED]
 LustreError: 2359:0:(events.c:654:ptlrpc_init_portals()) network initialisation failed



Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread Isaac Huang
On Mon, Mar 10, 2008 at 11:19:54AM -0500, mitcheloc wrote:
Isaac,
 
Thanks for the quick response. A quick google search didn't tell me how
I can check the module parameters. What command or file should I check
for this?
 

It should be in /etc/modprobe.conf or some file under /etc/modprobe.d;
the exact location depends on your distribution. Look for a line that
starts with "options lnet".
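To find that line without guessing the distribution's layout, a minimal Python sketch can scan /etc/modprobe.conf plus every file under /etc/modprobe.d and report each "options lnet" line. It only matches lines whose first two words are exactly "options lnet", so commented-out lines are ignored, and it does not attempt to handle any other modprobe configuration syntax.

```python
import glob


def parse_lnet_options(text):
    """Return the arguments of every 'options lnet ...' line in config text."""
    found = []
    for line in text.splitlines():
        words = line.split()
        # Exact first-two-word match, so '# options lnet ...' comments are skipped.
        if words[:2] == ["options", "lnet"]:
            found.append(" ".join(words[2:]))
    return found


def lnet_options(paths=None):
    """Map each readable config file to its 'options lnet' arguments."""
    if paths is None:
        paths = ["/etc/modprobe.conf"] + sorted(glob.glob("/etc/modprobe.d/*"))
    result = {}
    for path in paths:
        try:
            with open(path) as f:
                opts = parse_lnet_options(f.read())
        except (OSError, UnicodeDecodeError):
            continue  # missing, unreadable, binary, or a directory
        if opts:
            result[path] = opts
    return result
```

Calling lnet_options() with no arguments checks the default locations.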

 
And as you requested:
 
[EMAIL PROTECTED] ~]# ls
/lib/modules/2.6.18-53.1.14.el5.lustre/kernel/net/lustre
ksocklnd.ko  libcfs.ko  lnet.ko  lnet_selftest.ko
 

The kqswlnd.ko is missing.
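The failure follows a simple rule: every network type named in the lnet networks= option needs its LND module, and elan needs kqswlnd, which this install does not have. Here is a hedged Python sketch of that check. It uses the tcp to ksocklnd and elan to kqswlnd pairs from this thread; o2ib to ko2iblnd is included only as an extra example, and the table would need extending for other network types.

```python
import os
import re

# Network type -> LND module. tcp and elan come from this thread; o2ib is
# an extra, commonly used example. Extend the table for other networks.
LND_MODULES = {"tcp": "ksocklnd", "elan": "kqswlnd", "o2ib": "ko2iblnd"}


def required_lnds(networks):
    """'tcp0(eth0),elan0' -> ['ksocklnd', 'kqswlnd']; unknown types are skipped."""
    mods = []
    for entry in networks.split(","):
        net = entry.strip().split("(")[0]      # drop any interface list
        net_type = re.sub(r"\d+$", "", net)    # drop the network number
        mod = LND_MODULES.get(net_type)
        if mod and mod not in mods:
            mods.append(mod)
    return mods


def missing_lnds(networks, installed_files):
    """LND modules required by `networks` with no matching .ko file installed."""
    present = {os.path.basename(name)[:-3]
               for name in installed_files if name.endswith(".ko")}
    return [m for m in required_lnds(networks) if m not in present]
```

With networks=tcp0,elan0 and the four modules in the ls output above, missing_lnds reports only kqswlnd.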

Isaac


Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread mitcheloc
From modprobe.conf:

options lnet networks=tcp0,elan0

Where should kqswlnd.ko be coming from?



Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread Isaac Huang
On Mon, Mar 10, 2008 at 11:38:33AM -0500, mitcheloc wrote:
From modprobe.conf:
 
options lnet networks=tcp0,elan0

If you don't have Quadrics Elan hardware, you can change it to:
options lnet networks=tcp0

Otherwise,

Where should kqswlnd.ko be coming from?

you need to compile lustre with proper QsNet support.

Isaac


Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread mitcheloc
Isaac,

I checked my ethernet card and it didn't look like Quadrics hardware.

 [EMAIL PROTECTED] ~]# lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network
Connection (rev 02)

So I removed the parameter, rebooted, and it worked like a charm! I wonder
how that setting got into my modprobe.conf file. I checked another CentOS
system I set up and it is not there. It was probably inserted by some other
DFS I was trying out.

After changing modprobe.conf and rebooting:

[EMAIL PROTECTED] ~]# modprobe lustre
[EMAIL PROTECTED] ~]# dmesg
Lustre: OBD class driver, [EMAIL PROTECTED]
Lustre Version: 1.6.4.3
Build Version:
1.6.4.3-1969123116-PRISTINE-.usr.src.linux-2.6.18-53.1.14.el5.lustre
Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; [EMAIL PROTECTED]

Thanks, and hopefully I don't run into any other issues.

Cheers,
Mitchel



Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread mitcheloc
Hmm. I did run into this while trying llmount.sh.
[EMAIL PROTECTED] tests]# pwd
/usr/src/lustre-1.6.4.3/lustre/tests
[EMAIL PROTECTED] tests]# sh llmount.sh
Loading modules from /usr/src/lustre-1.6.4.3/lustre/tests/..
lnet options: 'networks=tcp0'
FATAL: Module mgs not found.
[EMAIL PROTECTED] tests]# dmesg -c
[EMAIL PROTECTED] tests]#
Does this mean I should add ",mgs" to networks=tcp0?


Re: [Lustre-discuss] Modprobe lustre fails

2008-03-10 Thread Jack Chen

 Hmm. I did run into this while trying llmount.sh.
 [EMAIL PROTECTED] tests]# pwd
 /usr/src/lustre-1.6.4.3/lustre/tests
 [EMAIL PROTECTED] tests]# sh llmount.sh
 Loading modules from /usr/src/lustre-1.6.4.3/lustre/tests/..
 lnet options: 'networks=tcp0'
 FATAL: Module mgs not found.
 [EMAIL PROTECTED] tests]# dmesg -c
 [EMAIL PROTECTED] tests]#
 Does this mean I should add a ,mgs to networks=tcp0?
Can you verify whether the mgs module exists?

Run the commands Isaac mentioned:

ls /lib/modules/2.6.18-53.1.14.el5.lustre/kernel/fs/lustre
rpm -ql lustre-modules

If it does, please try to run modprobe mgs manually to see whether any
messages are displayed.
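Checking for a module in a file listing is easy to script. Here is a minimal Python sketch that takes the output of ls or rpm -ql lustre-modules and returns the paths of a given <module>.ko. Applied to the rpm -ql lustre-modules output earlier in this thread, it finds mgc.ko but no mgs.ko.

```python
import os
import subprocess


def find_module(listing, module):
    """Paths in an `ls` or `rpm -ql` listing whose file name is <module>.ko."""
    target = module + ".ko"
    return [p for p in listing.split() if os.path.basename(p) == target]


def rpm_files(package):
    """File list of an installed RPM package; needs the rpm binary."""
    return subprocess.run(["rpm", "-ql", package], capture_output=True,
                          text=True, check=True).stdout
```

Usage: find_module(rpm_files("lustre-modules"), "mgs").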

Jack
 



Re: [Lustre-discuss] Lustre SNMP module

2008-03-10 Thread Kilian CAVALOTTI
Hi Klaus,

On Friday 07 March 2008 05:52:51 pm Klaus Steden wrote:
 I was asking that same question a few months ago.

Yes, I remember you haven't been overwhelmed by answers. :\

 I can send you my 
 1.6.2 spec file for reference ... That version also did not bundle
 the SNMP library, so I ended up building it by recompiling the whole
 set of Lustre RPMs to get what I needed, and then just dropped the
 DSO in place.

That's exactly what I did, finally.

 I'm curious as to what metrics you see to be useful -- I wasn't sure
 what to look for, so while I installed the module, I haven't yet
 thought of good things to ask of it.

So, from what I've seen in the MIB, the current SNMP module mainly
reports version numbers and free space information.

I think it would also be useful to get activity metrics, the same kind 
of information which is in /proc/fs/lustre/llite/*/stats on clients (so 
we can see reads/writes and fs operations rates), 
in /proc/fs/lustre/obdfilter/*/stats on OSSes and 
in /proc/fs/lustre/mds/*/stats on MDSes.

Actually, all the /proc/fs/lustre/*/**/stats files could be useful, but I
guess which metric is most useful depends heavily on what
you want to see. :)
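As a starting point for such activity metrics, here is a minimal Python sketch, independent of the SNMP module, that reads the stats files for a given node role from the paths above and returns each counter's sample count. It assumes lines of the form "<name> <count> samples [<unit>] ..." and skips anything else, such as the snapshot_time line. The role names client, oss and mds are just labels chosen for this sketch.

```python
import glob

# Stats file locations per node role, as listed in this message.
STATS_GLOBS = {
    "client": "/proc/fs/lustre/llite/*/stats",
    "oss": "/proc/fs/lustre/obdfilter/*/stats",
    "mds": "/proc/fs/lustre/mds/*/stats",
}


def parse_counters(text):
    """Map counter name -> sample count from '<name> <count> samples ...' lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "samples":
            counters[fields[0]] = int(fields[1])
    return counters


def collect(role):
    """Read every stats file for `role`; returns {path: {counter: count}}."""
    result = {}
    for path in glob.glob(STATS_GLOBS[role]):
        with open(path) as f:
            result[path] = parse_counters(f.read())
    return result
```

Sampling collect("oss") periodically and differencing the counts would give operation rates along the lines described above.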

Cheers,
-- 
Kilian