[Lustre-discuss] Lustre inode cache tunables

2009-01-15 Thread Jakob Goldbach
Hi,

Every day I run 'find /lustre' on a filesystem with many files. This
consumes a lot of memory, and /proc/slabinfo reveals that
lustre_inode_cache has ~900 objects.

I have sometimes seen the system swapping, causing slow responses and
evictions. Are there any tunables for reclaiming pages from the
lustre_inode_cache slab?

/Jakob




Re: [Lustre-discuss] Lustre MDS Errors 1-7 and operation 101

2009-01-15 Thread Thomas Roth
Thank you for this clarification on the operation X message!
Running Lustre without an MGS or even an MDT is something I have already
tested - involuntarily ;-)
But I was confused because in this case there were new mounts coming in
all the time, so the MGS was there and answering, and at the same time
Lustre was talking about an unconnected MGS.

Thomas

Cliff White wrote:
 Thomas Roth wrote:
 Hi all,

 on our production cluster we have had, for a surprisingly long time (> 1 day),
 only the following two error messages (and no visible problems),
 although the system is under heavy load right now:

 Jan 14 10:44:33 server1 kernel: LustreError:
 5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
 (-107)  r...@8107fd6c4c50 x2077599/t0 o101-?@?:0/0 lens 232/0 e
 0 to 0 dl 1231927273 ref 1 fl Interpret:/0/0 rc -107/0

 and:

 Jan 14 10:46:42 server1 kernel: LustreError:
 6766:0:(mgs_handler.c:557:mgs_handle()) lustre_mgs: operation 101 on
 unconnected MGS


 error (-107) is /* Transport endpoint is not connected */  -   I have
 seen this before on clients which had lost the connection to the
 cluster. But this is on the MGS/MDS - one server with one partition for
 the MGS and one for the MDT.
 
 Remember, this is a distributed client/server system. When any node
 needs to connect to a service, there will be a client process.
 So, an OSS (which needs to talk to the MDS) will have a metadata client
 (mdc) running on it.
 
 The second error suggests of course that the MGS is actually not
 connected - but how can a Lustre system run when its MGS isn't there?
 Makes no sense, does it?
 
 Ah, that's the beauty of Lustre. The MGS is needed for two things:
 - New clients get the mount from the MGS
 - Configuration changes are propagated from the MGS.
 So, if you are not actively mounting clients, and not changing the
 configuration, in fact Lustre can run just fine without the MGS.
 Filesystem users will not even notice it's gone, unless they are
 attempting a mount.
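 For illustration, a minimal Lustre 1.6 client mount names only the MGS NID
 and the filesystem; the NID and fsname here are made-up examples:

 # The MGS is contacted only at mount time, to hand out the configuration log;
 # after that the client talks to the MDS and OSTs directly.
 mount -t lustre mgs01@tcp0:/lustre /mnt/lustre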
 
 Likewise, the MDS is used for metadata transactions. If a client is not
 actively touching metadata (for example, a client that already has an open
 file and is doing I/O only), you can fail the MDS without the clients
 noticing.
 
 Those two errors are quite harmless in this case - 'operation x on
 unconnected MGS' means a client was evicted and is attempting to
 replay an RPC, but the server has destroyed the import (due to the
 eviction) and it has not been re-established.
 
 cliffw
 
 

 O.k., the cluster is running Debian Etch 64bit, kernel 2.6.22, Lustre
 1.6.5.1.  The operation 101 thing is supposed to have been solved in
 the 1.6.4 - 1.6.5 upgrade, according to the change logs. Either it
 hasn't, or I have a real problem where this error message really applies.

 It is also remarkable that nobody seems to know the meaning of
 operation X on unconnected MGS - via Google one finds many questions
 but no answers - at least that's my impression (and I didn't search
 Bugzilla).

 Many thanks,
 Thomas




 

-- 

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528

Geschäftsführer: Professor Dr. Horst Stöcker

Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt


[Lustre-discuss] o2ib can't ping/mount InfiniBand NID

2009-01-15 Thread subbu kl
The problem is similar to
http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007498.html
but from that thread I could not really work out the solution to the
problem.

I have two RHEL5 Linux servers installed with the following packages:

kernel-lustre-smp-2.6.18-53.1.14.el5_lustre.1.6.5.1
kernel-ib-1.3-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
lustre-ldiskfs-3.0.4-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
lustre-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
lustre-modules-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
e2fsprogs-1.40.7.sun3-0redhat


machine 1: with ib0 IP address : 172.24.198.111
machine 2: with ib0 IP address : 172.24.198.112

/etc/modprobe.conf contains
options lnet networks=o2ib

TCP networking worked fine, but now that I am trying the InfiniBand network
I am having difficulty communicating with the IB nodes; the mount attempt
throws the following error:

[r...@p186 ~]# mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
mount.lustre: mount /dev/loop0 at /mnt/ost1 failed: Input/output error
Is the MGS running?

/var/log/messages :
Jan 15 16:55:25 p186 kernel: kjournald starting.  Commit interval 5 seconds
Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Jan 15 16:55:25 p186 kernel: kjournald starting.  Commit interval 5 seconds
Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with ordered
data mode.
Jan 15 16:55:25 p186 kernel: LDISKFS-fs: file extents enabled
Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mballoc enabled
Jan 15 16:55:30 p186 kernel: Lustre: Request x7 sent from
mgc172.24.198@o2ib to NID 172.24.198@o2ib 5s ago has timed out
(limit 5s).
Jan 15 16:55:30 p186 kernel: LustreError:
7193:0:(obd_mount.c:1062:server_start_targets()) Required registration
failed for lustre-OST: -5
Jan 15 16:55:30 p186 kernel: LustreError: 15f-b: Communication error with
the MGS.  Is the MGS running?
Jan 15 16:55:30 p186 kernel: LustreError:
7193:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
Jan 15 16:55:30 p186 kernel: LustreError:
7193:0:(obd_mount.c:1382:server_put_super()) no obd lustre-OST
Jan 15 16:55:30 p186 kernel: LustreError:
7193:0:(obd_mount.c:119:server_deregister_mount()) lustre-OST not
registered
Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0
success)
Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal
hits, 0 2^N hits, 0 breaks, 0 lost
Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0
discarded
Jan 15 16:55:30 p186 kernel: Lustre: server umount lustre-OST complete
Jan 15 16:55:30 p186 kernel: LustreError:
7193:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-5)

All attempts to ping the IB NIDs, local or remote, also failed.
I can ping the IP addresses:
[r...@p186 ~]# ping 172.24.198.112
PING 172.24.198.112 (172.24.198.112) 56(84) bytes of data.
64 bytes from 172.24.198.112: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 172.24.198.112: icmp_seq=2 ttl=64 time=0.024 ms

--- 172.24.198.112 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.024/0.038/0.052/0.014 ms
[r...@p186 ~]# ping 172.24.198.111
PING 172.24.198.111 (172.24.198.111) 56(84) bytes of data.
64 bytes from 172.24.198.111: icmp_seq=1 ttl=64 time=2.16 ms
64 bytes from 172.24.198.111: icmp_seq=2 ttl=64 time=0.296 ms

--- 172.24.198.111 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.296/1.231/2.166/0.935 ms

but I can't ping the NIDs:
[r...@p186 ~]# lctl ping 172.24.198@o2ib
failed to ping 172.24.198@o2ib: Input/output error
[r...@p186 ~]# lctl ping 172.24.198@o2ib
failed to ping 172.24.198@o2ib: Input/output error

Any idea why LNet can't ping the NIDs?

some more configurations:
[r...@p186 ~]# ibstat
CA 'mthca0'
CA type: MT23108
Number of ports: 2
Firmware version: 3.5.0
Hardware version: a1
Node GUID: 0x0002c9020021550c

Machines are connected via IB switch.

Looking forward to your help.

~subbu


Re: [Lustre-discuss] About MDS failover

2009-01-15 Thread Cliff White
Jeffrey Alan Bennett wrote:
 Hi,
 
 What software are people using for MDS failover? 
 
 I have been using Heartbeat from Linux-HA but I am not absolutely happy with 
 its performance.
 
 Is there anything better out there?

Are you using heartbeat V1 or V2?

I would like to hear more about the issues you are experiencing.
We have had some people use the Red Hat cluster tools.

cliffw

 
 Thanks,
 
 Jeffrey Bennett
 HPC Data Engineer
 San Diego Supercomputer Center
 858.822.0936 http://users.sdsc.edu/~jab



Re: [Lustre-discuss] o2ib can't ping/mount InfiniBand NID

2009-01-15 Thread Liang Zhen
Subbu,

I'd suggest:
1) Make sure ko2iblnd has been brought up (please check whether there were
any error messages when ko2iblnd started up).
2) echo +neterror > /proc/sys/lnet/printk, then try lctl ping again; if
it still doesn't work, please post the error messages. A quick sketch of
both checks follows below.
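A minimal sketch of those two checks, assuming a standard Lustre 1.6 install
(the peer NID is only an example, based on the other node's ib0 address):

# 1) confirm the o2ib LND module actually loaded, and look for startup errors
lsmod | grep ko2iblnd
dmesg | grep -i o2ib

# 2) enable network error console messages, then retry the LNet ping
echo +neterror > /proc/sys/lnet/printk
lctl ping 172.24.198.111@o2ib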

Regards
Liang

subbu kl:
 Problem is similer to 
 http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007498.html
 But by looking at the thread could not really get the solution for the 
 problem.

 I have two RHEL5 Linux servers installed with following packages -

 kernel-lustre-smp-2.6.18-53.1.14.el5_lustre.1.6.5.1
 kernel-ib-1.3-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-ldiskfs-3.0.4-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-modules-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 e2fsprogs-1.40.7.sun3-0redhat


 machine 1: with ib0 IP address : 172.24.198.111
 machine 2: with ib0 IP address : 172.24.198.112

 /etc/modprobe.conf contains
 options lnet networks=o2ib

 TCP networking worked fine and now I am trying with Infiniband network 
 finding it difficult in communicating with IB nodes mounting effort 
 throghs me the following error

 [r...@p186 ~]# mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
 mount.lustre: mount /dev/loop0 at /mnt/ost1 failed: Input/output error
 Is the MGS running?

 /var/log/messages :
 Jan 15 16:55:25 p186 kernel: kjournald starting.  Commit interval 5 
 seconds
 Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
 Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with 
 ordered data mode.
 Jan 15 16:55:25 p186 kernel: kjournald starting.  Commit interval 5 
 seconds
 Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
 Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with 
 ordered data mode.
 Jan 15 16:55:25 p186 kernel: LDISKFS-fs: file extents enabled
 Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mballoc enabled
 Jan 15 16:55:30 p186 kernel: Lustre: Request x7 sent from 
 mgc172.24.198@o2ib to NID 172.24.198@o2ib 5s ago has timed out 
 (limit 5s).
 Jan 15 16:55:30 p186 kernel: LustreError: 
 7193:0:(obd_mount.c:1062:server_start_targets()) Required registration 
 failed for lustre-OST: -5
 Jan 15 16:55:30 p186 kernel: LustreError: 15f-b: Communication error 
 with the MGS.  Is the MGS running?
 Jan 15 16:55:30 p186 kernel: LustreError: 
 7193:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
 Jan 15 16:55:30 p186 kernel: LustreError: 
 7193:0:(obd_mount.c:1382:server_put_super()) no obd lustre-OST
 Jan 15 16:55:30 p186 kernel: LustreError: 
 7193:0:(obd_mount.c:119:server_deregister_mount()) lustre-OST not 
 registered
 Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 
 success)
 Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 
 goal hits, 0 2^N hits, 0 breaks, 0 lost
 Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 generated and it 
 took 0
 Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 
 discarded
 Jan 15 16:55:30 p186 kernel: Lustre: server umount lustre-OST complete
 Jan 15 16:55:30 p186 kernel: LustreError: 
 7193:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-5)

 All pinging efforts also failed to the IB NIDS local/remote
 can ping the ip address :
 [r...@p186 ~]# ping 172.24.198.112
 PING 172.24.198.112 (172.24.198.112) 56(84) bytes of data.
 64 bytes from 172.24.198.112: icmp_seq=1 ttl=64 time=0.052 ms
 64 bytes from 172.24.198.112: icmp_seq=2 ttl=64 time=0.024 ms

 --- 172.24.198.112 ping statistics ---
 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
 rtt min/avg/max/mdev = 0.024/0.038/0.052/0.014 ms
 [r...@p186 ~]# ping 172.24.198.111
 PING 172.24.198.111 (172.24.198.111) 56(84) bytes of data.
 64 bytes from 172.24.198.111: icmp_seq=1 ttl=64 time=2.16 ms
 64 bytes from 172.24.198.111: icmp_seq=2 ttl=64 time=0.296 ms

 --- 172.24.198.111 ping statistics ---
 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
 rtt min/avg/max/mdev = 0.296/1.231/2.166/0.935 ms

 but cant ping the NIDS :
 [r...@p186 ~]# lctl ping 172.24.198@o2ib
 failed to ping 172.24.198@o2ib: Input/output error
 [r...@p186 ~]# lctl ping 172.24.198@o2ib
 failed to ping 172.24.198@o2ib: Input/output error

 Any idea why lnet cant ping NIDS ?

 some more configurations:
 [r...@p186 ~]# ibstat
 CA 'mthca0'
 CA type: MT23108
 Number of ports: 2
 Firmware version: 3.5.0
 Hardware version: a1
 Node GUID: 0x0002c9020021550c

 Machines are connected via IB switch.

 Looking forward for help.

 ~subbu
 



Re: [Lustre-discuss] Lustre MDS Errors 1-7 and operation 101

2009-01-15 Thread Andreas Dilger
On Jan 14, 2009  11:34 +0100, Thomas Roth wrote:
 Jan 14 10:44:33 server1 kernel: LustreError:
 5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
 (-107)  r...@8107fd6c4c50 x2077599/t0 o101-?@?:0/0 lens 232/0 e
 0 to 0 dl 1231927273 ref 1 fl Interpret:/0/0 rc -107/0
 
 and:
 
 Jan 14 10:46:42 server1 kernel: LustreError:
 6766:0:(mgs_handler.c:557:mgs_handle()) lustre_mgs: operation 101 on
 unconnected MGS
 
 
 error (-107) is /* Transport endpoint is not connected */  -   I have
 seen this before on clients which had lost the connection to the
 cluster. But this is on the MGS/MDS - one server with one partition for
 the MGS and one for the MDT.
 The second error suggests of course that the MGS is actually not
 connected - but how can a Lustre system run when its MGS isn't there?
 Makes no sense, does it?

It means some client is trying to perform operations on the MGS before
it is connected.

 O.k., the cluster is running Debian Etch 64bit, Kernel 2.6.22, Lustre
 1.6.5.1.  The operation 101 thing is supposed to have been solved in
 the 1.6.4 - 1.6.5 upgrade, according to the change logs.

There are a million things that might cause operation 101 problems.
101 = LDLM_ENQUEUE, so this is just a lock enqueue.
 
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [Lustre-discuss] Lustre MDS Errors 1-7 and operation 101

2009-01-15 Thread Thomas Roth
Thanks, Andreas.

Andreas Dilger wrote:


 error (-107) is /* Transport endpoint is not connected */  -   I have
 seen this before on clients which had lost the connection to the
 cluster. But this is on the MGS/MDS - one server with one partition for
 the MGS and one for the MDT.
 The second error suggests of course that the MGS is actually not
 connected - but how can a Lustre system run when its MGS isn't there?
 Makes no sense, does it?
 
 It means some client is trying to perform operations on the MGS before
 it is connected.

Who? Before the client is connected, or before the MGS is connected?
Of course the client can't do anything before it is connected. But the
MGS is connected in the sense that it is mounted and responsive - I can
do a fresh client mount of this system at any time.
Maybe that's more a question of semantics than of Lustre. In any case, I am
reassured by your comments, in particular since the cluster is doing fine
in this situation.

Regards,
Thomas

 O.k., the cluster is running Debian Etch 64bit, Kernel 2.6.22, Lustre
 1.6.5.1.  The operation 101 thing is supposed to have been solved in
 the 1.6.4 - 1.6.5 upgrade, according to the change logs.
 
 There are a million things that might cause operation 101 problems.
 101 = LDLM_ENQUEUE, so this is just a lock enqueue.
  
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

-- 

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528

Geschäftsführer: Professor Dr. Horst Stöcker

Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt


Re: [Lustre-discuss] About MDS failover

2009-01-15 Thread Jeffrey Alan Bennett
Hi Cliff,

 
 Are you using heartbeat V1 or V2?
 
I am using heartbeat V2. It works as expected - I just had to tune some
timeouts - but it still takes around 3 minutes to completely move the MGS/MDS
services to the other system. I guess having the MGS and MDS on separate
systems would help reduce this time. MMP also adds to this time somehow, but
MMP is necessary for failover.

My biggest concern is that I can't handle the situation in which the HBA
connectivity to the storage system is lost, i.e. if I pull the cables from the
HBAs on the MGS/MDS, nothing happens: the MDS and MGS services keep running,
they are still mounted, and therefore heartbeat does nothing. From the heartbeat
documentation it does not seem that this can be done, at least not easily. I
read something about HBA ping, but it seems to require HBAAPI, which does not
work with Brocade HBAs...

Any help will be greatly appreciated.

 I would like to hear more about the issues you are experiencing.
 We have had some people use the Red Hat cluster tools.
 
I will try Red Hat cluster tools.

Thanks,

Jeff


Re: [Lustre-discuss] Query

2009-01-15 Thread Ricardo M. Correia
Hi Ravi,

On Tue, 2009-01-13 at 11:59 +0530, Ravi Rattihalli wrote:
  Here are my questions:

  1. Integrating ZFS with Lustre: Does this mean that some features
 of ZFS are going to be integrated with Lustre (like the
 optimized checksums of ZFS, etc.)?

Yes, we plan to integrate some ZFS features with Lustre.

One such feature is checksumming, as you mentioned. We are planning
to have the Lustre clients compute the checksums and provide them
over the wire to the servers, which will use them as the block checksums in
ZFS. That achieves two goals: 1) offloading checksum computation to the
clients, which in aggregate have more CPU available than the servers, and
2) true end-to-end data integrity.

There may also be some integration in terms of quotas, or some other
features.

We will also be developing some features in ZFS, to achieve either
better performance in some cases (e.g., a zero-copy API), or to achieve
new functionality (e.g., multi-mount protection, but this is not our
highest priority right now).
 
  1. Which version of Lustre will have end-to-end data integrity, and
 which checksum algorithm will be used (if not CRC32)?
 
 (I read one document written by Peter Bojanic which said Lustre+ZFS
 = End-to-End Data Integrity.) So is it ver. 3.0 and above?

I believe so.
 
  I read in Wikipedia, under ZFS integration, that “Lustre 3.0 will allow
 users to choose between ZFS and ldiskfs as back-end storage”.
 
 Why are ZFS and ldiskfs treated separately here even after the
 integration? Once integrated, it is just Lustre 3.0, isn't it?

Yes, but I'm not sure what your confusion is here.

With Lustre 3.0, you will be free to choose whether you wish to create
ldiskfs or ZFS-formatted backend devices - both options should be
available.
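Purely as a sketch of what that choice might look like on the command line -
the zfs value here is an assumption, since the Lustre 3.0 interface was not
fixed at the time of this thread - mkfs.lustre already takes a --backfstype
option for the ldiskfs case:

# existing ldiskfs backend (device and NID are example values)
mkfs.lustre --fsname=lustre --ost --mgsnode=mgs01@tcp0 --backfstype=ldiskfs /dev/sdb

# hypothetical ZFS-formatted backend, illustrative only
mkfs.lustre --fsname=lustre --ost --mgsnode=mgs01@tcp0 --backfstype=zfs ...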
 
  I would be glad to hear your answers, which may resolve my
 queries and confusion :)

I hope my answers clarify things a bit.

Cheers,
Ricardo




Re: [Lustre-discuss] About MDS failover

2009-01-15 Thread Andreas Dilger
On Jan 15, 2009  11:38 -0800, Jeffrey Alan Bennett wrote:
 I am using heartbeat V2. It works as expected, I just had to tune some
 time outs, but it still takes around 3 minutes to totally move the MGS/MDS
 services to the other system.

This is largely an issue of the Lustre failover itself, and not the HA
software.  The problem today is that under heavy load the clients may
have to wait a long time for any requests sent to the server to complete
(100s of seconds in some cases), so it is difficult for the clients to
distinguish between server death (unlikely) and heavy server load (common).

In the case where a server dies and fails over, the clients have to wait
for their requests to time out, then they resend and wait again (in the
common case the server is just overloaded), then finally they try to contact
any other server listed as failover for that node.

What we are looking to do for improving failover speed is to have the
backup server broadcast to the clients that it has taken over the OST/MDT
when it has started.  Then the clients will be able to do failover to
the new server as soon as it is ready, instead of waiting for the original
requests to time out.
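For reference, failover locations in Lustre 1.6 are declared up front so that
clients know where to retry; the node names below are examples only:

# format the target with a failover partner NID
mkfs.lustre --fsname=lustre --mdt --mgs --failnode=mds02@tcp0 /dev/sda1

# clients can likewise list both MGS nodes at mount time
mount -t lustre mds01@tcp0:mds02@tcp0:/lustre /mnt/lustre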

 My biggest concern is that I can't control the situation in which
 the HBA connectivity with the storage system is damaged, ie: I pull the
 cables from the HBAs on the MGS/MDS and nothing happens, the MDS and MGS
 services keep running, they are still mounted and therefore heartbeat
 does nothing. From the heartbeat documentation it does not seem that
 this can be done, at least easily?. I read something about HBA ping and
 it seems it requires HBAAPI which does not work with Brocade HBAs...

You can use HBA multi-pathing to avoid this problem, if your hardware
supports it.  You can also use /proc/fs/lustre/health_check to check
if the filesystems have encountered errors and are marked unhealthy.
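A trivial probe along those lines - only a sketch, using the path above and
the healthy string that 1.6 prints when all targets are fine:

# non-zero exit if the node reports anything other than healthy,
# usable as an extra test in a heartbeat/cron resource script
grep -q '^healthy' /proc/fs/lustre/health_check || exit 1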

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



[Lustre-discuss] Log creation/deletion of files?

2009-01-15 Thread Lundgren, Andrew
Is there a way to enable logging of UID and host for creation/deletion of 
files/directories within the cluster?

--
Andrew


Re: [Lustre-discuss] About MDS failover

2009-01-15 Thread Jeffrey Alan Bennett
Thanks Andreas,

I understand that this is a common issue with failover, as mentioned in the 
Lustre documentation.

 
 You can use HBA multi-pathing to avoid this problem, if your 
 hardware supports it.  You can also use 
 /proc/fs/lustre/health_check to check if the filesystems have 
 encountered errors and are marked unhealthy.
 

We use multipath in all our configurations. However, will Lustre be able to
detect that connectivity to the storage has been totally lost (i.e. no
available path) and report it accordingly in /proc/fs/lustre/health_check?

Thanks,

Jeff


Re: [Lustre-discuss] Log creation/deletion of files?

2009-01-15 Thread Brian J. Murrell
On Thu, 2009-01-15 at 18:41 -0700, Lundgren, Andrew wrote:
 Is there a way to enable logging of UID and host for creation/deletion
 of files/directories within the cluster?

The feature you are looking for is called audit logs.  It used to
exist on a code branch for the Hendrix project, but I don't see it on our
current roadmap.  Given the age of the Hendrix code, it would likely take
significant work to port it forward to current Lustre.

That said, we do have a server changelogs feature coming in 2.0, and
while that will likely log the filesystem changes, I'm not sure whether
it will log the UID responsible for the change.

b.





Re: [Lustre-discuss] Log creation/deletion of files?

2009-01-15 Thread Lundgren, Andrew
Thank you.

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Brian J. Murrell
 Sent: Thursday, January 15, 2009 7:06 PM
 To: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] Log creation/deletion of files?

 On Thu, 2009-01-15 at 18:41 -0700, Lundgren, Andrew wrote:
  Is there a way to enable logging of UID and host for
 creation/deletion
  of files/directories within the cluster?

 The feature you are looking for is called audit logs.  It used to
 exist on code branch for the Hendrix project but I don't see it on our
 current roadmap.  Likely, given the age of the Hendrix code, it would
 take some non-insignificant work to port forward to current Lustre.

 That said, we do have a server changelogs feature coming in 2.0 and
 while that will likely log the filesystem changes, I'm not sure if/that
 it will log the uid responsible for the change.

 b.





Re: [Lustre-discuss] o2ib can't ping/mount InfiniBand NID

2009-01-15 Thread subbu kl
Liang,
after executing the following echo:
echo +neterror > /proc/sys/lnet/printk

lctl ping now shows the following error:

# lctl ping 172.24.198@o2ib
failed to ping 172.24.198@o2ib: Input/output error

Jan 16 10:24:14 p128 kernel: Lustre:
2750:0:(o2iblnd_cb.c:2687:kiblnd_cm_callback()) 172.24.198@o2ib: ROUTE
ERROR -22
Jan 16 10:24:14 p128 kernel: Lustre:
2750:0:(o2iblnd_cb.c:2101:kiblnd_peer_connect_failed()) Deleting messages
for 172.24.198@o2ib: connection failed

Looks like some problem with the IB connection manager!

1. Do we have any help docs on setting up IPoIB with Lustre? The Lustre
operations manual has very minimal info about this. I think I am missing
some part of the IPoIB setup here.
2. Or is the manual assignment of IP addresses to ib0 creating some
problem?


Some more supporting info:
A subnet manager is also running: OpenSM 3.1.8

Initially I got this error for the MDS mount:

Jan 16 09:45:20 p128 kernel: LustreError:
4991:0:(linux-tcpip.c:124:libcfs_ipif_query()) Can't get IP address for
interface ib0
Jan 16 09:45:20 p128 kernel: LustreError:
4991:0:(o2iblnd.c:1563:kiblnd_startup()) Can't query IPoIB interface ib0:
-99
Jan 16 09:45:21 p128 kernel: LustreError: 105-4: Error -100 starting up LNI
o2ib
Jan 16 09:45:21 p128 kernel: LustreError:
4991:0:(events.c:707:ptlrpc_init_portals()) network initialisation failed
Jan 16 09:45:21 p128 modprobe: WARNING: Error inserting ptlrpc
(/lib/modules/2.6.18-53.1.14.el5_lustre.1.6.5.1smp/kernel/fs/lustre/ptlrpc.ko):
Input/output error
Jan 16 09:45:21 p128 modprobe: WARNING: Error inserting osc
(/lib/modules/2.6.18-53.1.14.el5_lustre.1.6.5.1smp/kernel/fs/lustre/osc.ko):
Unknown symbol in module, or unknown parameter (see dmesg)
Jan 16 09:45:21 p128 kernel: osc: Unknown symbol ldlm_prep_enqueue_req
Jan 16 09:45:21 p128 kernel: osc: Unknown symbol ldlm_resource_get
Jan 16 09:45:21 p128 kernel: osc: Unknown symbol ptlrpc_lprocfs_register_obd
.
.
.

Then I manually set the IP address for ib0 as follows:
ifconfig ib0 172.24.198.111

[r...@p186 ~]# ifconfig ib0
ib0   Link encap:InfiniBand  HWaddr
80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet addr:172.24.198.112  Bcast:172.24.255.255  Mask:255.255.0.0
  UP BROADCAST MULTICAST  MTU:65520  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Then it mounted successfully:

Jan 16 09:47:09 p128 kernel: Lustre: Added LNI 172.24.198@o2ib [8/64]
Jan 16 09:47:09 p128 kernel: Lustre: MGS MGS started
Jan 16 09:47:09 p128 kernel: Lustre: Setting parameter
lustre-MDT.mdt.group_upcall in log lustre-MDT
Jan 16 09:47:09 p128 kernel: Lustre: Enabling user_xattr
Jan 16 09:47:09 p128 kernel: Lustre: lustre-MDT: new disk, initializing
Jan 16 09:47:09 p128 kernel: Lustre: MDT lustre-MDT now serving dev
(lustre-MDT/64db1fc7-03ba-9803-4d20-ab0d2aa66116) with recovery enabled
Jan 16 09:47:09 p128 kernel: Lustre:
5274:0:(lproc_mds.c:262:lprocfs_wr_group_upcall()) lustre-MDT: group
upcall set to /usr/sbin/l_getgroups
Jan 16 09:47:09 p128 kernel: Lustre: lustre-MDT.mdt: set parameter
group_upcall=/usr/sbin/l_getgroups
Jan 16 09:47:09 p128 kernel: Lustre: Server lustre-MDT on device
/dev/loop0 has started
.
.
.
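As an aside, the ifconfig assignment above does not survive a reboot; on RHEL5
the persistent IPoIB address normally lives in an ifcfg file, roughly like the
sketch below (values copied from the addresses above, adjust as needed):

# /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
IPADDR=172.24.198.111
NETMASK=255.255.0.0
ONBOOT=yes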


~subbu


On Thu, Jan 15, 2009 at 8:37 PM, Liang Zhen zhen.li...@sun.com wrote:

 Subbu,

 I'd suggest:
 1) make sure ko2iblnd has been brought up (please check if there is any
 error message when startup ko2iblnd)
 2) echo +neterror > /proc/sys/lnet/printk, then try with lctl ping, if it
 still can't work please post error messages

 Regards
 Liang

 subbu kl:

 Problem is similer to
 http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007498.html
 But by looking at the thread could not really get the solution for the
 problem.

 I have two RHEL5 Linux servers installed with following packages -

 kernel-lustre-smp-2.6.18-53.1.14.el5_lustre.1.6.5.1
 kernel-ib-1.3-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-ldiskfs-3.0.4-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 lustre-modules-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
 e2fsprogs-1.40.7.sun3-0redhat


 machine 1: with ib0 IP address : 172.24.198.111
 machine 2: with ib0 IP address : 172.24.198.112

 /etc/modprobe.conf contains
 options lnet networks=o2ib

 TCP networking worked fine and now I am trying with Infiniband network
 finding it difficult in communicating with IB nodes mounting effort throghs
 me the following error

 [r...@p186 ~]# mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
 mount.lustre: mount /dev/loop0 at /mnt/ost1 failed: Input/output error
 Is the MGS running?

 /var/log/messages :
 Jan 15 16:55:25 p186 kernel: kjournald starting.  Commit interval 5
 seconds
 Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
 Jan 15 16:55:25 

[Lustre-discuss] Lustre locking

2009-01-15 Thread Mag Gam
At our university many of our students and professors use SQLite and
Berkeley DB for their projects - probably BDB more than SQLite. Would
we need to have Lustre mounted a certain way to avoid corruption
via file locking? Any thoughts on this?

TIA


Re: [Lustre-discuss] Lustre inode cache tunables

2009-01-15 Thread Andreas Dilger
On Jan 15, 2009  11:27 +0100, Jakob Goldbach wrote:
 I daily run 'find /lustre' on a filesystem with many files. This
 consumes a lot of memory and /proc/slabinfo reveals that
 lustre_inode_cache has ~900 objects.
 
 I've seen the system swapping sometimes, causing slow responses and
 evictions. Any tunables for reclaming pages from the lustre_inode_cache
 slab? 

This is a problem with the Linux VFS more than Lustre itself.  A find
even on a local filesystem would generate this many inodes.
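The generic Linux knobs for this are VFS cache tunables rather than anything
Lustre-specific, so treat these as suggestions to experiment with rather than
a guaranteed fix:

# make the kernel more eager to reclaim dentry/inode slab objects
echo 200 > /proc/sys/vm/vfs_cache_pressure

# one-off after the find run: drop clean dentries and inodes (2.6.16+ kernels)
echo 2 > /proc/sys/vm/drop_caches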

Depending on what you are doing with find you could instead use the
lfs find command.  This avoids instantiating inodes or requesting
any data from the OSTs unless it is absolutely required.  In many cases
lfs find can do its work with only information from the MDS, and it
does not need to instantiate the inode.
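For instance, to list files not modified in the last 30 days without touching
the OSTs (option spelling varies slightly between releases - see 'lfs help
find'):

lfs find /lustre --mtime +30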

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
