Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Götz Waschk
On Tue, Feb 10, 2009 at 1:44 AM, Isaac Huang he.hu...@sun.com wrote:
> On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
>> My client has this in modprobe.conf:
>> options lnet networks=o2ib,tcp
>> I'm trying to mount the remote network with
>> mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
>> and the command just hangs, the error is this:
>> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
>> status -113  ...@0100dfc2ac00 x7/t0

Hi Isaac,

> The outgoing message failed with -113 (EHOSTUNREACH). What does lctl
> list_nids say on the client?
On that client, the output is:
192.168.224...@o2ib
141.34.216...@tcp

> Also, please:
> echo +neterror > /proc/sys/lnet/printk
> So that more network errors would go onto console.
OK, after the next mount attempt I have this in the log now:

Lustre: OBD class driver, http://www.lustre.org/
Lustre Version: 1.6.6
Build Version:
1.6.6-1970010101-PRISTINE-.usr.src.redhat.BUILD.lustre-1.6.6.kernel-2.6.9-78.0.13.ELsmp
Lustre: Added LNI 192.168.224...@o2ib [8/64]
Lustre: Added LNI 141.34.216...@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback())
192.168.22...@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed())
Deleting messages for 192.168.22...@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
status -113  r...@010037eeac00 x7/t0
o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
dl 1234255029 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network
error (sent at 1234255024, 0s ago)  r...@010037eeac00 x7/t0
o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
dl 1234255029 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x7 sent from atlas-MDT-mdc-0107fc2ee400 to NID
192.168.22...@o2ib 0s ago has timed out (limit 5s).
Lustre: 9264:0:(import.c:410:import_select_connection())
atlas-MDT-mdc-0107fc2ee400: tried all connections, increasing
latency to 5s
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback())
192.168.22...@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed())
Deleting messages for 192.168.22...@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
status -113  r...@01080325a400 x10/t0
o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
dl 1234255054 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network
error (sent at 1234255049, 0s ago)  r...@01080325a400 x10/t0
o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
dl 1234255054 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x10 sent from atlas-MDT-mdc-0107fc2ee400 to
NID 192.168.22...@o2ib 0s ago has timed out (limit 5s).


Regards, Götz

-- 
AL I:40: Do what thou wilt shall be the whole of the Law.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Isaac Huang
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
> Hello everyone,
> .
> My client has this in modprobe.conf:
> options lnet networks=o2ib,tcp
> I'm trying to mount the remote network with
> mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
> and the command just hangs, the error is this:
> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
> status -113  r...@0100dfc2ac00 x7/t0

The outgoing message failed with -113 (EHOSTUNREACH). What does lctl
list_nids say on the client?
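
For anyone decoding these numbers by hand: the status in the log is a negated Linux errno value. Below is a minimal sketch that maps the two statuses appearing in this thread (-113, and the -22 in the o2iblnd ROUTE ERROR lines) to their names. The helper name decode_status and the small lookup table are illustrative assumptions, not part of Lustre or lctl.

```python
import errno

# Linux errno numbers for the two statuses seen in this thread.
# (Hard-coded because errno numbering differs between platforms.)
LINUX_ERRNO = {22: "EINVAL", 113: "EHOSTUNREACH"}


def decode_status(status: int) -> str:
    """Map a negative kernel-style status (e.g. -113) to an errno name."""
    code = abs(status)
    # Prefer the fixed Linux table; otherwise fall back to the local platform.
    name = LINUX_ERRNO.get(code)
    if name is None:
        name = errno.errorcode.get(code, f"unknown errno {code}")
    return name


for status in (-113, -22):
    print(status, decode_status(status))
```

The table is deliberately tiny; on a Linux host the errno module alone should give the same names.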

Also, please:
echo +neterror > /proc/sys/lnet/printk

So that more network errors would go onto console.

Isaac


Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Götz Waschk
On Tue, Feb 10, 2009 at 1:45 PM, Johann Lombardi joh...@sun.com wrote:
> 38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22...@o2ib,
> whereas I think it should use tcp to access the Lustre filesystem of the
> remote cluster. Is my understanding of your configuration correct?
That's right, it should use 141.34.228...@tcp0 instead.
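
The reading above (opcode 38, sent to an o2ib NID) can be pulled out of the request dump mechanically. Here is a rough sketch, assuming the "o<opcode>-<target>@<address>@<network>:" layout seen in the log lines of this thread; the function name parse_request and the regular expression are my own, and the opcode table holds only the one mapping stated in this thread.

```python
import re

# Opcode names: only the mapping mentioned in this thread is listed.
OPCODES = {38: "MDS_CONNECT"}

# Matches "o<opcode>-<target>@<ipv4>@<lnet network>:" inside a request dump.
REQ_RE = re.compile(r"\bo(\d+)-\S*?@(\d{1,3}(?:\.\d{1,3}){3})@([a-z0-9]+):")


def parse_request(line: str):
    """Return (opcode name, peer address, LNet network), or None if absent."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1))
    return OPCODES.get(opcode, f"opcode {opcode}"), m.group(2), m.group(3)


# Log fragment copied from this thread (the target name is truncated there).
line = "o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5"
print(parse_request(line))
```

If the network field comes back as o2ib for a filesystem that was mounted through a tcp MGS NID, that is the mismatch discussed here.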

Regards, Götz


-- 
AL I:40: Do what thou wilt shall be the whole of the Law.


[Lustre-discuss] problem mounting two different lustre instances

2009-02-09 Thread Götz Waschk
Hello everyone,

I have a problem mounting two different lustre instances on one
client. Both lustre instances are configured with o2ib networking for
the local clients and tcp for remote clients.

So I have two MGS instances, 141.34.228...@tcp0 is the remote lustre,
192.168.22...@o2ib0 is the local one.
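
A note on notation, since both spellings appear in this thread: as far as I know, LNet treats a network name without a number as network 0, so @tcp and @tcp0 (or @o2ib and @o2ib0) name the same network. The sketch below normalises NIDs on that assumption so they can be compared; normalise_nid and same_network are hypothetical helper names, and 10.0.0.1 is a made-up address used only for illustration.

```python
import re

# LND type (ends in a letter, may contain digits as in "o2ib") + optional number.
NET_RE = re.compile(r"([a-z0-9]*[a-z])(\d*)")


def normalise_nid(nid: str) -> str:
    """Rewrite 'addr@net' so the network always carries an explicit number."""
    addr, sep, net = nid.rpartition("@")
    m = NET_RE.fullmatch(net)
    if not sep or not addr or m is None:
        raise ValueError(f"not a NID: {nid!r}")
    lnd, number = m.groups()
    # Assumption: a missing network number means network 0.
    return f"{addr}@{lnd}{number or '0'}"


def same_network(nid_a: str, nid_b: str) -> bool:
    """True if both NIDs are on the same LNet network."""
    net_a = normalise_nid(nid_a).rpartition("@")[2]
    net_b = normalise_nid(nid_b).rpartition("@")[2]
    return net_a == net_b


print(normalise_nid("192.168.22.32@o2ib"))              # address from the log
print(same_network("10.0.0.1@tcp", "10.0.0.1@tcp0"))     # hypothetical address
```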


My client has this in modprobe.conf:
options lnet networks=o2ib,tcp
I'm trying to mount the remote network with
mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
and the command just hangs, the error is this:
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
status -113  r...@0100dfc2ac00 x7/t0
o38-atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
dl 1234194365 ref 2 fl Rpc:/0/0 rc 0/0

I can mount the local lustre just fine:
mount -t lustre 192.168.22...@o2ib0:/lhcb /lustre/lhcb/

On the other client I have reversed the network list in modprobe.conf:
options lnet networks=tcp,o2ib
Now I can mount both lustre instances, but both seem to use the tcp
network, even the one that is local and should use o2ib.

On the local MGS:
lctl list_nids
192.168.22...@o2ib
141.34.21...@tcp
On my client:
lctl which_nid 192.168.22...@o2ib 141.34.21...@tcp
141.34.21...@tcp


What can I do?

Regards, Götz Waschk