[Lustre-discuss] Recovery Problem

2010-05-20 Thread Stefano Elmopi



Hi Andreas

My version of Lustre is 1.8.3.
Sorry for my bad English; I used the wrong word, "crash" is not the
right one.

Let me try to explain better: I start copying a large file onto the file
system and, while the copy is still running, I reboot the OSS server;
the copy process then enters a "stalled" state.
I expected that once the server was back online the copy would resume
normally and complete, but instead the copy process fails.
Apart from the failed copy, Lustre itself continues to work fine.

Is the failure of the copy process a timeout issue?
How can I change the timeout?
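
If it does turn out to be a timeout issue, here is a minimal sketch of how
the global Lustre timeout can be inspected and raised on 1.8; the value is
only illustrative (the fsname "lustre01" is taken from the logs below), so
check the manual before changing it on a production system:

# show the current obd timeout (seconds) on any node
lctl get_param timeout

# raise it temporarily on that node
lctl set_param timeout=300

# or set it persistently for the whole filesystem, from the MGS
lctl conf_param lustre01.sys.timeout=300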

Thanks !!!


Cheers, Stefano


Ing. Stefano Elmopi
Gruppo Darco - Resp. ICT Sistemi
Via Ostiense 131/L Corpo B, 00154 Roma

cell. 3466147165
tel.  0657060500
email:stefano.elm...@sociale.it

Pursuant to the law on the protection of personal data (Legislative
Decree no. 196/2003), this e-mail is intended solely for the persons
indicated above and the information it contains is to be considered
strictly confidential. Reading, copying, using or disclosing the content
of this e-mail without authorization is prohibited. If you have received
this message in error, please return it to the sender. Thank you.

Il giorno 19/mag/10, alle ore 17:07, Andreas Dilger ha scritto:

More important is to include the crash message from the client and  
the version of Lustre you are using.


Cheers, Andreas

On 2010-05-19, at 6:34, Stefano Elmopi stefano.elm...@sociale.it  
wrote:





Hi,

I have a small problem but it certainly is the fault of the little  
knowledge I have by the argument.
I have a Lustre file system with a node MGS/MDS, two nodes OSS and  
one Client.

I launch a copy of a large file on Lustre and while the copy goes on,
I restart the node OSS that is handling the writing on the File  
System.
The copy process is put in the state -stalled- and when the node  
OSS is back on,

I expected the copy process to resume normally, but instead crashes.
This is a log on the node MGS:

May 19 13:43:43 mdt01prdpom kernel: Lustre: 3827:0:(client.c: 
1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230433  
sent from lustre01-OST-osc to NID 172.16.100@tcp 17s ago  
has timed out (17s prior to deadline).
May 19 13:43:43 mdt01prdpom kernel:   r...@81012e11e400  
x1336168048230433/t0 o400-lustre01-ost_u...@172.16.100.121@tcp: 
28/4 lens 192/384 e 0 to 1 dl 1274269423 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:43:43 mdt01prdpom kernel: Lustre: lustre01-OST-osc:  
Connection to service lustre01-OST via nid 172.16.100@tcp  
was lost; in progress operations using this service will wait for  
recovery to complete.
May 19 13:44:09 mdt01prdpom kernel: Lustre: 3828:0:(client.c: 
1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230435  
sent from lustre01-OST-osc to NID 172.16.100@tcp 26s ago  
has timed out (26s prior to deadline).
May 19 13:44:09 mdt01prdpom kernel:   r...@81012e5f2000  
x1336168048230435/t0 o8-lustre01-ost_u...@172.16.100.121@tcp: 
28/4 lens 368/584 e 0 to 1 dl 1274269449 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:44:37 mdt01prdpom kernel: Lustre: 3829:0:(import.c: 
517:import_select_connection()) lustre01-OST-osc: tried all  
connections, increasing latency to 2s
May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c: 
2441:LNetPut()) Error sending PUT to 12345-172.16.100@tcp: -113
May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(events.c: 
66:request_out_callback()) @@@ type 4, status -113   
r...@81012d3e5800 x1336168048230437/t0 o8-lustre01-ost_u...@172.16.100.121 
@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 2 fl Rpc:N/0/0 rc  
0/0
May 19 13:44:37 mdt01prdpom kernel: Lustre: 3828:0:(client.c: 
1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230437  
sent from lustre01-OST-osc to NID 172.16.100@tcp 0s ago has  
failed due to network error (27s prior to deadline).
May 19 13:44:37 mdt01prdpom kernel:   r...@81012d3e5800  
x1336168048230437/t0 o8-lustre01-ost_u...@172.16.100.121@tcp: 
28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:45:33 mdt01prdpom kernel: Lustre: 3829:0:(import.c: 
517:import_select_connection()) lustre01-OST-osc: tried all  
connections, increasing latency to 3s
May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c: 
2441:LNetPut()) Error sending PUT to 12345-172.16.100@tcp: -113
May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(events.c: 
66:request_out_callback()) @@@ type 4, status -113   
r...@81012e11e400 x1336168048230441/t0 o8-lustre01-ost_u...@172.16.100.121 
@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269561 ref 2 fl Rpc:N/0/0 rc  
0/0
May 19 13:45:33 mdt01prdpom kernel: Lustre: 3828:0:(client.c: 
1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230441  
sent from lustre01-OST-osc to NID 172.16.100@tcp 0s ago has  
failed due to 

[Lustre-discuss] MGS Nids

2010-05-20 Thread leen smit
Dear All,

I'm in the middle of creating a new Lustre setup, as a replacement for
our current one.
The current one is a single machine with MGS/MDT/OST all living on this
one box.

In the new setup I have 4 machines, two MDTs and two OSTs.
We want to use keepalived as a failover mechanism between the two MDTs.
To keep the MDTs in sync, I'm using a DRBD disk between the two.

Keepalived uses a VIP in an active/passive state. In a failover situation
the VIP gets transferred to the passive node.

The problem I'm experiencing is that I can't seem to get the VIP listed
as a NID, so the OSS can only connect on the real IP, which is
unwanted in this situation. Is there an easy way to change the NID on
the MGS machine to the VIP?

See below for the setup details. The last output, from lctl list_nids, is
the problem area: where is that NID coming from?

I hope someone can shed some light on this...

Cheers,

Leen


Hosts:
192.168.21.32   fs-mgs-001
192.168.21.33   fs-mgs-002
192.168.21.34   fs-ost-001
192.168.21.35   fs-ost-002
192.168.21.40   fs-mgs-vip


mkfs.lustre --reformat --fsname datafs --mgs --mgsnode=fs-mgs-...@tcp 
/dev/VG1/mgs
mkfs.lustre --reformat --fsname datafs --mdt --mgsnode=fs-mgs-...@tcp 
/dev/drbd1

mount -t lustre /dev/VG1/mgs mgs/
mount -t lustre /dev/drbd1 /mnt/mdt/


fs-mgs-001:/mnt# lctl dl
   0 UP mgs MGS MGS 9
   1 UP mgc mgc192.168.21...@tcp 8f8dfecc-44bd-caae-3ed4-cd23168d59ab 5
   2 UP mdt MDS MDS_uuid 3
   3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
   4 UP mds datafs-MDT datafs-MDT_UUID 3


fs-mgs-001:/mnt# lctl list_nids
192.168.21...@tcp



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Recovery Problem

2010-05-20 Thread Johann Lombardi
On Thu, May 20, 2010 at 12:29:41PM +0200, Stefano Elmopi wrote:
 Hi Andreas
 My version of Lustre 1.8.3
 Sorry for my bad English but I used the wrong word, crash is not the
 right word.
 I try to explain better, I start copying a large file on the file system
 and while the copy process continues, I reboot the server OSS,
 and the copy process enters state - stalled -.
 I expected that once the server back online, the copy process to resume
 normal
 and complete copy of the file, instead the copy process fault.
 Therefore the copy process that goes wrong, Lustre continues to perform
 good.

May 19 13:46:31 mdt01prdpom kernel: LustreError: 167-0: This client was
evicted by lustre01-OST; in progress operations using this service
will fail.

The cp process failed because the client got evicted by the OSS.
We need to look at the OSS logs to figure out the root cause of
the eviction.
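
To narrow it down, a minimal sketch of what to collect on the OSS
(assuming standard syslog and the Lustre debug log; the file names are
only illustrative):

# on the OSS, after the client has been evicted
grep -i evict /var/log/messages

# dump the Lustre kernel debug buffer for more context
lctl dk /tmp/lustre-debug.log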

Johann
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread Johann Lombardi
On Thu, May 20, 2010 at 12:46:42PM +0200, leen smit wrote:
 In the new setup I have 4 machines, two MDT's and  two OST's
 We want to use keepalived as a failover mechanism between the two MDT's.
 To keep the MDT's in sync, I'm using a DRBD disk between the two.
 
 Keepalive uses a VIP in a active/passive state. In a failover situation 
 the VIP gets transferred to the passive one.

Lustre uses stateful client/server connections. You don't need to (and cannot)
use a virtual IP. The Lustre protocol already takes care of reconnection and
recovery.

 The problem I'm experiencing, is that I can't seem to get the VIP listed 
 as a NID, thus the OSS can only connect on the real IP, which is 
 unwanted in this situation. Is there an easy way to change the nid on 
 the MGS machine to the VIP?

No, you have to list the NIDs of all the MGS nodes at mkfs time (i.e.
--mgsnode=192.168.21...@tcp --mgsnode=192.168.21...@tcp in your case).
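
As a sketch for this particular setup, assuming fs-mgs-001 (192.168.21.32)
and fs-mgs-002 (192.168.21.33) are the two MGS/MDS nodes and the other
options stay as in the original commands (the choice of .33 as the failover
node is an assumption):

mkfs.lustre --reformat --fsname datafs --mgs /dev/VG1/mgs
mkfs.lustre --reformat --fsname datafs --mdt \
    --mgsnode=192.168.21.32@tcp --mgsnode=192.168.21.33@tcp \
    --failnode=192.168.21.33@tcp /dev/drbd1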

Johann
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread Brian J. Murrell
On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: 
 Keepalive uses a VIP in a active/passive state. In a failover situation 
 the VIP gets transferred to the passive one.

Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
failover nodes that have different IP addresses and using a virtual,
floating IP address will just confuse it.

b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread leen smit
Ok, no VIPs then.. But how does failover work in Lustre then?
If I set up everything using the real IP and then mount from a client and
bring down the active MGS, the client will just sit there until it comes
back up again.
As in, there is no failover to the second node.  So how does this
internal Lustre failover mechanism work?

I've been going through the docs, and I must say there is very little on
the failover mechanism, apart from mentions that a separate app should
take care of that. That's the reason I'm implementing keepalived..

At this stage I really am clueless, and can only think of creating a TUN
interface, which will have the VIP address (thus, it becomes a real IP,
not just a VIP).
But I've got a feeling that ain't the right approach either...
Are there any docs available where an active/passive MGS setup is described?
Is it sufficient to define a --failnode=nid,... at creation time?

Any help would be greatly appreciated!

Leen


On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
 On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:

 Keepalive uses a VIP in a active/passive state. In a failover situation
 the VIP gets transferred to the passive one.
  
 Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
 failover nodes that have different IP addresses and using a virtual,
 floating IP address will just confuse it.

 b.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread Kevin Van Maren
leen smit wrote:
 Ok, no VIP's then.. But how does failover work in lustre then?
 If I setup everything using the real IP and then mount from a client and 
 bring down the active MGS, the client will just sit there until it comes 
 back up again.
 As in, there is no failover to the second node.  So how does this 
 internal lustre failover mechanism work?

 I've been going trought the docs, and I must say there is very little on 
 the failover mechanism, apart from mentions that a seperate app should 
 care of that. Thats the reason I'm implementing keepalived..
   
Right: the external service needs to keep the mount active/healthy on
one of the servers. Lustre handles reconnecting clients/servers as long
as the volume is mounted where it expects (i.e. the mkfs node or the
--failover node).
 At this stage I really am clueless, and can only think of creating a TUN 
 interface, which will have the VIP address (thus, it becomes a real IP, 
 not just a VIP).
 But I got a feeling that ain't the right approach either...
 Is there any docs available where a active/passive MGS setup is described?
 Is it sufficient to define a --failnode=nid,...  at creation time?
   
Yep.  See Johann's email on the MGS, but for the MDTs and OSTs that's
all you have to do (besides listing both MGS NIDs at mkfs time).

For the clients, you specify both MGS NIDs at mount time, so they can
mount regardless of which node has the active MGS.
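
For this particular setup, a sketch of the client mount line, assuming
192.168.21.32 and 192.168.21.33 are the two MGS/MDS nodes, the filesystem
keeps the name datafs, and /mnt/datafs is the client mount point:

mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /mnt/datafs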

Kevin

 Any help would be greatly appreciated!

 Leen


 On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
   
 On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:

 
 Keepalive uses a VIP in a active/passive state. In a failover situation
 the VIP gets transferred to the passive one.
  
   
 Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
 failover nodes that have different IP addresses and using a virtual,
 floating IP address will just confuse it.

 b.


 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread Gabriele Paciucci
For clarification, in a two-server configuration:

server1 - 192.168.2.20 MGS+MDT+OST0
server2 - 192.168.2.22 OST1
/dev/sdb is a lun shared between server1 and server 2

from server1: mkfs.lustre --mgs --failnode=192.168.2.22 --reformat /dev/sdb1
from server1: mkfs.lustre  --reformat --mdt --mgsnode=192.168.2.20 
--fsname=prova --failover=192.168.2.22 /dev/sdb4
from server1: mkfs.lustre  --reformat --ost --mgsnode=192.168.2.20 
--failover=192.168.2.22 --fsname=prova /dev/sdb2
from server2: mkfs.lustre  --reformat --ost --mgsnode=192.168.2.20 
--failover=192.168.2.20 --fsname=prova /dev/sdb3


from server1: mount -t lustre /dev/sdb1 /lustre/mgs_prova
from server1: mount -t lustre /dev/sdb4 /lustre/mdt_prova
from server1: mount -t lustre /dev/sdb2 /lustre/ost0_prova
from server2: mount -t lustre /dev/sdb3 /lustre/ost1_prova


from client:
modprobe lustre
mount -t lustre 192.168.2...@tcp:192.168.2...@tcp:/prova /prova

Now halt server1 and mount the MGS, MDT and OST0 on server2; the client
should continue its activity without problems.
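
A sketch of that failover step, reusing the devices and mount points from
the commands above (this assumes /dev/sdb is visible from server2 under
the same names):

from server2: mount -t lustre /dev/sdb1 /lustre/mgs_prova
from server2: mount -t lustre /dev/sdb4 /lustre/mdt_prova
from server2: mount -t lustre /dev/sdb2 /lustre/ost0_prova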



On 05/20/2010 02:55 PM, Kevin Van Maren wrote:
 leen smit wrote:

 Ok, no VIP's then.. But how does failover work in lustre then?
 If I setup everything using the real IP and then mount from a client and
 bring down the active MGS, the client will just sit there until it comes
 back up again.
 As in, there is no failover to the second node.  So how does this
 internal lustre failover mechanism work?

 I've been going trought the docs, and I must say there is very little on
 the failover mechanism, apart from mentions that a seperate app should
 care of that. Thats the reason I'm implementing keepalived..

  
 Right: the external service needs to keep the mount active/healthy on
 one of the servers.
 Lustre handles reconnecting clients/servers as long as the volume is
 mounted where it expects
 (ie, the mkfs node or the --failover node).

 At this stage I really am clueless, and can only think of creating a TUN
 interface, which will have the VIP address (thus, it becomes a real IP,
 not just a VIP).
 But I got a feeling that ain't the right approach either...
 Is there any docs available where a active/passive MGS setup is described?
 Is it sufficient to define a --failnode=nid,...  at creation time?

  
 Yep.  See Johann's email on the MGS, but for the MDTs and OSTs that's
 all you have to do
 (besides listing both MGS NIDs at mkfs time).

 For the clients, you specify both MGS NIDs at mount time, so it can
 mount regardless of which
 node has the active MGS.

 Kevin


 Any help would be greatly appreciated!

 Leen


 On 05/20/2010 01:45 PM, Brian J. Murrell wrote:

  
 On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:



 Keepalive uses a VIP in a active/passive state. In a failover situation
 the VIP gets transferred to the passive one.


  
 Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
 failover nodes that have different IP addresses and using a virtual,
 floating IP address will just confuse it.

 b.




 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

  
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




-- 
_Gabriele Paciucci_ http://www.linkedin.com/in/paciucci

Pursuant to legislative Decree n. 196/03 you are hereby informed that this 
email contains confidential information intended only for use of addressee. If 
you are not the addressee and have received this email by mistake, please send 
this email to the sender. You may not copy or disseminate this message to 
anyone. Thank You.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Olivier Hargoaa
Dear All,

We have a cluster with critical data on Lustre. On this cluster there are
three networks on each Lustre server and client: one ethernet network
for administration (eth0), and two other ethernet networks configured in
bonding (bond0: eth1 & eth2). On Lustre we get poor read performance
and good write performance, so we decided to modify the Lustre network in
order to see if the problem comes from the network layer.

Currently the Lustre network is bond0. We want to set it to eth0, then eth1,
then eth2, and finally back to bond0 in order to compare performance.

Therefore, we'll perform the following steps: we will unmount the
filesystem, reformat the MGS, change the lnet options in the modprobe file,
start the new MGS server, and finally modify our OSTs and MDT with
tunefs.lustre, setting the failover and new MGS NIDs using the
--erase-params and --writeconf options.
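
As a hedged sketch of that last step (the NIDs and device names below are
placeholders, and the exact option set should be checked against the 1.8
manual before touching a production filesystem):

# with the filesystem unmounted everywhere, on the MDS:
tunefs.lustre --erase-params --writeconf \
    --mgsnode=NEW_MGS_NID@tcp --failnode=FAILOVER_NID@tcp /dev/mdtdev

# on each OSS, for every OST device:
tunefs.lustre --erase-params --writeconf \
    --mgsnode=NEW_MGS_NID@tcp --failnode=FAILOVER_NID@tcp /dev/ostdev

# then remount in order: MGS, MDT, OSTs, clients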

We tested it successfully on a test filesystem, but we read in the manual
that this can be really dangerous. Do you agree with this procedure? Do
you have any advice or best practices for this kind of request? What's the
danger?

Regards.

-- 
Olivier, Hargoaa
Phone: + 33 4 76 29 76 25
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Brian J. Murrell
On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: 
 
 On Lustre we get poor read performances 
 and good write performances so we decide to modify Lustre network in 
 order to see if problems comes from network layer.

Without any information other than your statement that performance is
good in one direction but not the other, I wonder why you consider the
network the most likely culprit for this problem.  I haven't come across
very many networks that (weren't designed to be and yet) are fast in one
direction and slow in the other.

 Therefore, we'll perform the following steps: we will umount the 
 filesystem, reformat the mgs, change lnet options in modprobe file, 
 start new mgs server, and finally modify our ost and mdt with 
 tunefs.lustre with failover and mgs new nids using --erase-params and 
 --writeconf options.

Sounds like a lot of rigmarole to test something that I would consider
to be of low probability (given the brief amount of information you have
provided).  But even if I did suspect the network were slow in only one
direction, before I started mucking with reconfiguring Lustre for
different networks, I would do some basic network throughput testing to
verify my hypothesis and adjust the probability of the network being the
problem accordingly.

Did you do any hardware profiling (i.e. using the lustre-iokit) before
deploying Lustre on this hardware?  We always recommend profiling the
hardware for exactly this reason: explaining performance problems.

Unfortunately, now that you have data on the hardware, it's much more
difficult to profile the hardware because to do it properly, you need to
be able to write to the disks.

b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Nate Pearlstein
Which bonding method are you using?  Has the performance always been
this way?  Depending on which bonding type you are using and the network
hardware involved you might see the behavior you are describing.


On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote:
 Dear All,
 
 We have a cluster with lustre critical data. On this cluster there are 
 three networks on each Lustre server and client : one ethernet network 
 for administration (eth0), and two other ethernet networks configured in 
 bonding (bond0: eth1  eth2). On Lustre we get poor read performances 
 and good write performances so we decide to modify Lustre network in 
 order to see if problems comes from network layer.
 
 Currently Lustre network is bond0. We want to set it as eth0, then eth1, 
 then eth2 and finally back to bond0 in order to compare performances.
 
 Therefore, we'll perform the following steps: we will umount the 
 filesystem, reformat the mgs, change lnet options in modprobe file, 
 start new mgs server, and finally modify our ost and mdt with 
 tunefs.lustre with failover and mgs new nids using --erase-params and 
 --writeconf options.
 
 We tested it successfully on a test filesystem but we read in the manual 
 that this can be really dangerous. Do you agree with this procedure? Do 
 you have some advice or practice on this kind of requests? What's the 
 danger?
 
 Regards.
 

-- 
Sent from my wired giant hulking workstation

Nate Pearlstein - npe...@sgi.com - Product Support Engineer


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS Nids

2010-05-20 Thread Shuichi Ihara

You need to list both MGS nodes in the 'mount' command on the clients,
e.g.) mount -t lustre 192.168.1...@tcp:192.168.1...@tcp:/lustre /lustre

The client will attempt to connect to the secondary MGS once the primary is
not available.

Thanks
Ihara

(5/20/10 9:22 PM), leen smit wrote:
 Ok, no VIP's then.. But how does failover work in lustre then?
 If I setup everything using the real IP and then mount from a client and
 bring down the active MGS, the client will just sit there until it comes
 back up again.
 As in, there is no failover to the second node.  So how does this
 internal lustre failover mechanism work?

 I've been going trought the docs, and I must say there is very little on
 the failover mechanism, apart from mentions that a seperate app should
 care of that. Thats the reason I'm implementing keepalived..

 At this stage I really am clueless, and can only think of creating a TUN
 interface, which will have the VIP address (thus, it becomes a real IP,
 not just a VIP).
 But I got a feeling that ain't the right approach either...
 Is there any docs available where a active/passive MGS setup is described?
 Is it sufficient to define a --failnode=nid,...  at creation time?

 Any help would be greatly appreciated!

 Leen


 On 05/20/2010 01:45 PM, Brian J. Murrell wrote:
 On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote:

 Keepalive uses a VIP in a active/passive state. In a failover situation
 the VIP gets transferred to the passive one.

 Don't use virtual IPs with Lustre.  Lustre clients know how to deal with
 failover nodes that have different IP addresses and using a virtual,
 floating IP address will just confuse it.

 b.


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Johann Lombardi
On Thu, May 20, 2010 at 10:43:58AM -0400, Brian J. Murrell wrote:
 On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: 
  
  On Lustre we get poor read performances 
  and good write performances so we decide to modify Lustre network in 
  order to see if problems comes from network layer.
 
 Without having any other information other than your statement that
 performance is good in one direction but not the other I wonder why
 you consider the network as being the most likely candidate as a culprit
 for this problem.

Maybe you should start with running LNET self-test to compare read & write
performance?

http://wiki.lustre.org/manual/LustreManual18_HTML/LustreIOKit.html#50598014_pgfId-1290255
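
For reference, a minimal LNET self-test session sketch; the NIDs are
placeholders and the lnet_selftest module must be loaded on the nodes
involved (see the manual page linked above for the full syntax):

export LST_SESSION=$$
lst new_session rw_test
lst add_group clients CLIENT_NID@tcp
lst add_group servers SERVER_NID@tcp
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw read check=simple size=1M
lst run bulk
lst stat clients & sleep 30; kill $!   # watch the [R]/[W] bandwidth lines
lst end_session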

Johann
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Problems with MDS Crashing

2010-05-20 Thread Andrew Godziuk
We have had another hang, but this time we had KVM access to the
machine (and the screen blanker wasn't on). I took some screenshots,
the first one is an error I got after reboot, the BMP one is what I
saw when I first logged in to KVM, and the other ones are what I saw
when trying to type 'root' - it started printing traces.

http://amber.leeware.com/wi/lustre-death/

After the reboot there was a command timeout message from the RAID card.
When it hung, it looked like there were too few hardware resources.

-- 
Andrew
http://CloudAccess.net/
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Olivier Hargoaa
Nate Pearlstein wrote:
 Which bonding method are you using?  Has the performance always been
 this way?  Depending on which bonding type you are using and the network
 hardware involved you might see the behavior you are describing.
 

Hi,

Here is our bonding configuration:

On the Linux side:

mode=4                      - use 802.3ad
miimon=100                  - link check interval (ms)
xmit_hash_policy=layer2+3   - XOR hashing method
lacp_rate=fast              - LACPDU tx rate to request (slow=20s, fast=1s)

On the ethernet switch side, load balancing is configured as:
# port-channel load-balance src-dst-mac
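
For completeness, a sketch of how this would typically be expressed in the
module options, assuming bond0 is the LNET interface (file names and exact
syntax can differ per distro, so treat this as illustrative only):

# /etc/modprobe.d/bonding.conf
options bonding mode=4 miimon=100 xmit_hash_policy=layer2+3 lacp_rate=fast

# /etc/modprobe.d/lustre.conf - run LNET tcp over the bond
options lnet networks=tcp0(bond0)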

thanks

 
 On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote:
 Dear All,

 We have a cluster with lustre critical data. On this cluster there are 
 three networks on each Lustre server and client : one ethernet network 
 for administration (eth0), and two other ethernet networks configured in 
 bonding (bond0: eth1  eth2). On Lustre we get poor read performances 
 and good write performances so we decide to modify Lustre network in 
 order to see if problems comes from network layer.

 Currently Lustre network is bond0. We want to set it as eth0, then eth1, 
 then eth2 and finally back to bond0 in order to compare performances.

 Therefore, we'll perform the following steps: we will umount the 
 filesystem, reformat the mgs, change lnet options in modprobe file, 
 start new mgs server, and finally modify our ost and mdt with 
 tunefs.lustre with failover and mgs new nids using --erase-params and 
 --writeconf options.

 We tested it successfully on a test filesystem but we read in the manual 
 that this can be really dangerous. Do you agree with this procedure? Do 
 you have some advice or practice on this kind of requests? What's the 
 danger?

 Regards.

 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread James Robnett

 I can't really help with your larger question, but I had a similar
experience with network-appropriate write rates and slower reads.

You might check that you have enabled TCP selective acknowledgments:
echo 1 > /proc/sys/net/ipv4/tcp_sack
or
net.ipv4.tcp_sack = 1

 This can help in cases where your OSSs have larger pipes than
your clients and your files are striped across multiple OSSs.
When multiple OSSs are transmitting to a single client they can
overrun the switch buffers and drop packets.   This is particularly
noticeable when doing IOzone-type benchmarking from a single client
with a wide lfs stripe setting.

 With selective ACKs enabled the client will request a more minimal
set of packets be retransmitted ... or at least that's what I finally
deduced when I ran into it.

James Robnett
NRAO/NM
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Ramiro Alba Queipo
Hi all,

On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote:

 That is what I would recommend and what several groups do (usually with 
 Debian, though). 
 
   not running the sles/rhel distro ? I know a lot of things can happen but
   are these rhel/sles patches do brake some key features of the kernel which
   would  only work under that specific distro ? I've positivley tested a
   lustre client with a sles patched kernel on a gentoo distro. But i'am a
   bit nervous about testing it on our live lustre server system.
 
 The only thing that really might cause trouble is udev, since sysfs 
 maintainers like to break old udev versions. I think upcoming Debian Squeeze 
 requires 2.6.27 at a minimum.

Argg!!!, I think I am stuck in this situation. I compiled
linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian
Squeeze) using make-kpkg and, after dealing with postinst.d hooks to
create the initrd image (not created by default!!!), I tried to boot and
then got the following before dropping into an initramfs shell:

---
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ...
Done.
libudev: udev_monitor_new_from_netlink: error getting socket: Invalid
argument wait-for-root[431] : segfault
Gave up waiting form root device. Common problems:
  - Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Check root= (did the system wait for the right device?)
  -Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not
exist. Dropping to a shell!


BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask)
Enter 'Help' for a list of built-in commands.

(initramfs)
--

Is this the problem you are talking about? Could you give me some ideas
on how it can be solved?

Regards


-- 
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Andreas Dilger
The SLES11 kernel is at 2.6.27, so it could be usable for this. Also, I
thought that there were Debian packages for Lustre; why not use those?

Cheers, Andreas

On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote:

 Hi all,

 On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote:

 That is what I would recommend and what several groups do (usually  
 with
 Debian, though).

 not running the sles/rhel distro ? I know a lot of things can  
 happen but
 are these rhel/sles patches do brake some key features of the  
 kernel which
 would  only work under that specific distro ? I've positivley  
 tested a
 lustre client with a sles patched kernel on a gentoo distro. But  
 i'am a
 bit nervous about testing it on our live lustre server system.

 The only thing that really might cause trouble is udev, since sysfs
 maintainers like to break old udev versions. I think upcoming  
 Debian Squeeze
 requires 2.6.27 at a minimum.

 Argg!!!, I think I am stacked with this situation. I compiled
 linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian
 Squeeze) using make-kpkg and after dealing with postinst.d hooks to
 create initrd image (not created by default!!!), I tried to boot and
 then I got the following before dropping into a initramfs shell:

 --- 
 --- 
 -
 Begin: Mounting root file system... ...
 Begin: Running /scripts/local-top ...
 Done.
 libudev: udev_monitor_new_from_netlink: error getting socket: Invalid
 argument wait-for-root[431] : segfault
 Gave up waiting form root device. Common problems:
  - Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Check root= (did the system wait for the right device?)
  -Missing modules (cat /proc/modules; ls /dev)
 ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not
 exist. Dropping to a shell!


 BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask)
 Enter 'Help' for a list of built-in commands.

 (initramfs)
 --- 
 --- 
 --- 
 -

 Is this the problem you are talking about? Could you give me some  
 ideas
 of how can be solved?

 Regards


 -- 
 Ramiro Alba

 Centre Tecnològic de Tranferència de Calor
 http://www.cttc.upc.edu


 Escola Tècnica Superior d'Enginyeries
 Industrial i Aeronàutica de Terrassa
 Colom 11, E-08222, Terrassa, Barcelona, Spain
 Tel: (+34) 93 739 86 46



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Olivier Hargoaa
Hi Brian and all others,

I'm sorry for not giving you all the details. Here is all the
information I have.

Regarding our configuration:
Lustre IO nodes are linked with two bonded 10Gb links.
Compute nodes are linked with two bonded 1Gb links.

Raw performance on the servers is fine for both write and read on each OST.

First we ran iperf (several times), and we obtained the expected read and
write rates. The results are symmetric (read and write) with any number of
threads.

Then we tested with LNET self-test.
Here is our lst command for the write test:
lst add_test --batch bulkr --from c --to s brw write check=simple size=1M
and the results are:
[LNet Rates of c]
[R] Avg: 110  RPC/s Min: 110  RPC/s Max: 110  RPC/s
[W] Avg: 219  RPC/s Min: 219  RPC/s Max: 219  RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02 MB/s  Min: 0.02 MB/s  Max: 0.02 MB/s
[W] Avg: 109.20   MB/s  Min: 109.20   MB/s  Max: 109.20   MB/s
[LNet Rates of c]
[R] Avg: 109  RPC/s Min: 109  RPC/s Max: 109  RPC/s
[W] Avg: 217  RPC/s Min: 217  RPC/s Max: 217  RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02 MB/s  Min: 0.02 MB/s  Max: 0.02 MB/s
[W] Avg: 108.40   MB/s  Min: 108.40   MB/s  Max: 108.40   MB/s
[LNet Rates of c]
[R] Avg: 109  RPC/s Min: 109  RPC/s Max: 109  RPC/s
[W] Avg: 217  RPC/s Min: 217  RPC/s Max: 217  RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02 MB/s  Min: 0.02 MB/s  Max: 0.02 MB/s
[W] Avg: 108.40   MB/s  Min: 108.40   MB/s  Max: 108.40   MB/s

and now for read :
[LNet Rates of c]
[R] Avg: 10   RPC/s Min: 10   RPC/s Max: 10   RPC/s
[W] Avg: 5RPC/s Min: 5RPC/s Max: 5RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.59 MB/s  Min: 4.59 MB/s  Max: 4.59 MB/s
[W] Avg: 0.00 MB/s  Min: 0.00 MB/s  Max: 0.00 MB/s
[LNet Rates of c]
[R] Avg: 10   RPC/s Min: 10   RPC/s Max: 10   RPC/s
[W] Avg: 5RPC/s Min: 5RPC/s Max: 5RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.79 MB/s  Min: 4.79 MB/s  Max: 4.79 MB/s
[W] Avg: 0.00 MB/s  Min: 0.00 MB/s  Max: 0.00 MB/s
[LNet Rates of c]
[R] Avg: 10   RPC/s Min: 10   RPC/s Max: 10   RPC/s
[W] Avg: 5RPC/s Min: 5RPC/s Max: 5RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.79 MB/s  Min: 4.79 MB/s  Max: 4.79 MB/s
[W] Avg: 0.00 MB/s  Min: 0.00 MB/s  Max: 0.00 MB/s

IOzone presents the same asymmetric results as LNET.

With just one OST:
For writes, we get 233 MB/sec; taking into account that the theoretical
maximum is 250 MB/sec, this is a very good result: it works fine.

For reads, the maximum we get is 149 MB/sec with three threads
(processes: -t 3). If we configure four threads (-t 4) we get 50 MB/sec.

We also verified in the brw_stats file that we use a 1MB block size (both
reads and writes).

So we only have problems with iozone/Lustre and LNET self-test.

Thanks to all.


Brian J. Murrell wrote:
 On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: 
 On Lustre we get poor read performances 
 and good write performances so we decide to modify Lustre network in 
 order to see if problems comes from network layer.
 
 Without having any other information other than your statement that
 performance is good in one direction but not the other I wonder why
 you consider the network as being the most likely candidate as a culprit
 for this problem.  I haven't come across very many networks that
 (weren't designed to be and yet) are fast in one direction and slow in
 the other.
 
 Therefore, we'll perform the following steps: we will umount the 
 filesystem, reformat the mgs, change lnet options in modprobe file, 
 start new mgs server, and finally modify our ost and mdt with 
 tunefs.lustre with failover and mgs new nids using --erase-params and 
 --writeconf options.
 
 Sounds like a lot of rigmarole to test something that I would consider
 to be of low probability (given the brief amount of information you have
 provided).  But even if I did suspect the network were slow in only one
 direction, before I started mucking with reconfiguring Lustre for
 different networks, I would do some basic network throughput testing to
 verify my hypothesis and adjust the probability of the network being the
 problem accordingly.
 
 Did you do any hardware profiling (i.e. using the lustre-iokit) before
 deploying Lustre on this hardware?  We always recommend profiling the
 hardware for exactly this reason: explaining performance problems.
 
 Unfortunately, now that you have data on the hardware, it's much more
 difficult to profile the hardware because to do it properly, you need to
 be able to write to the disks.
 
 b.
 
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-20 Thread Olivier Hargoaa
Thanks Johann,

You couldn't have known, but we had already run LNET self-test, without
success. I posted the results in my answer to Brian.
What I don't know is whether the LNET test is good or not with bonding
deactivated. I will ask the administrators to test it.

Regards.

Johann Lombardi wrote:
 On Thu, May 20, 2010 at 10:43:58AM -0400, Brian J. Murrell wrote:
 On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: 
 On Lustre we get poor read performances 
 and good write performances so we decide to modify Lustre network in 
 order to see if problems comes from network layer.
 Without having any other information other than your statement that
 performance is good in one direction but not the other I wonder why
 you consider the network as being the most likely candidate as a culprit
 for this problem.
 
 Maybe you should start with running lnet self test to compare read  write
 performance?
 
 http://wiki.lustre.org/manual/LustreManual18_HTML/LustreIOKit.html#50598014_pgfId-1290255
 
 Johann
 
 


-- 
Olivier, Hargoaa
Phone: + 33 4 76 29 76 25
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Ramiro Alba Queipo
On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote:
 The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I 

Ok, I am getting
http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2

but please, where can I get a suitable config file to use for both
servers and clients?

 thought that there were Debian packages for Lustre, why not use those?

Yes, I used them at the beginning, but after Lustre 1.8.0.1 vanilla
kernels are no longer supported for servers, so I started
to use RHEL5 kernels with lustre-1.8.1.1, both for clients and servers. In
my case, with Ubuntu 10.04's kernel 2.6.32 NOT supported as a Lustre
patchless client, I prefer using a single kernel for the whole cluster,
don't I?

Thanks for your answer.

Regards

 
 Cheers, Andreas
 
 On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote:
 
  Hi all,
 
  On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote:
 
  That is what I would recommend and what several groups do (usually  
  with
  Debian, though).
 
  not running the sles/rhel distro ? I know a lot of things can  
  happen but
  are these rhel/sles patches do brake some key features of the  
  kernel which
  would  only work under that specific distro ? I've positivley  
  tested a
  lustre client with a sles patched kernel on a gentoo distro. But  
  i'am a
  bit nervous about testing it on our live lustre server system.
 
  The only thing that really might cause trouble is udev, since sysfs
  maintainers like to break old udev versions. I think upcoming  
  Debian Squeeze
  requires 2.6.27 at a minimum.
 
  Argg!!!, I think I am stacked with this situation. I compiled
  linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian
  Squeeze) using make-kpkg and after dealing with postinst.d hooks to
  create initrd image (not created by default!!!), I tried to boot and
  then I got the following before dropping into a initramfs shell:
 
  --- 
  --- 
  -
  Begin: Mounting root file system... ...
  Begin: Running /scripts/local-top ...
  Done.
  libudev: udev_monitor_new_from_netlink: error getting socket: Invalid
  argument wait-for-root[431] : segfault
  Gave up waiting form root device. Common problems:
   - Boot args (cat /proc/cmdline)
 - Check rootdelay= (did the system wait long enough?)
 - Check root= (did the system wait for the right device?)
   -Missing modules (cat /proc/modules; ls /dev)
  ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not
  exist. Dropping to a shell!
 
 
  BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask)
  Enter 'Help' for a list of built-in commands.
 
  (initramfs)
  --- 
  --- 
  --- 
  -
 
  Is this the problem you are talking about? Could you give me some  
  ideas
  of how can be solved?
 
  Regards
 
 
  -- 
  Ramiro Alba
 
  Centre Tecnològic de Tranferència de Calor
  http://www.cttc.upc.edu
 
 
  Escola Tècnica Superior d'Enginyeries
  Industrial i Aeronàutica de Terrassa
  Colom 11, E-08222, Terrassa, Barcelona, Spain
  Tel: (+34) 93 739 86 46
 
 
 
  ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
-- 
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Andreas Dilger


On 2010-05-20, at 11:33, Ramiro Alba Queipo r...@cttc.upc.edu wrote:

 On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote:
 The SLES11 kernel is at 2.6.27 so it could be usable for this.  
 Also, I

 Ok, I am getting
 http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2

 but, please. Where can I get a suitable config file to apply both for
 servers and clients?

The config files are in the Lustre source tree in
lustre/kernel_patches/kernel_configs/

You can just use the .config file from your client kernel (i.e.
patchless) since you don't need the patched server kernel on the
clients.

 thought that there were Debian packages for Lustre, why not use  
 those?

 Yes. I used them at the beginning, but after lustre 1.8.0.1 vanilla
 servers are not any more supported to be used as servers and I started
 to use RH5 kernels with lustre-1.8.1.1, both for clients and servers.

That's too bad that the Debian maintainers have stopped making
updates. Doubly so because we worked to add the fixes needed to build
on Debian into the Lustre sources, so it is possible to run "make debs"
to get Debian packages using all of the standard Debian packaging tools.
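
As a sketch, building those packages from a Lustre 1.8 source tree might
look like the following (the kernel source path is a placeholder):

sh ./autogen.sh
./configure --with-linux=/path/to/patched-kernel-source
make debs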

 In my case using Ubuntu 10.04 with kernel 2.6.32 NOT suported as a  
 lustre  patchless client, I prefer using an only kernel for the  
 whole cluster,
 do I?

That is by no means a requirement.

 Thanks for your answer.

 Regards


 Cheers, Andreas

 On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote:

 Hi all,

 On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote:

 That is what I would recommend and what several groups do (usually
 with
 Debian, though).

 not running the sles/rhel distro ? I know a lot of things can
 happen but
 are these rhel/sles patches do brake some key features of the
 kernel which
 would  only work under that specific distro ? I've positivley
 tested a
 lustre client with a sles patched kernel on a gentoo distro. But
 i'am a
 bit nervous about testing it on our live lustre server system.

 The only thing that really might cause trouble is udev, since sysfs
 maintainers like to break old udev versions. I think upcoming
 Debian Squeeze
 requires 2.6.27 at a minimum.

 Argg!!!, I think I am stacked with this situation. I compiled
 linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian
 Squeeze) using make-kpkg and after dealing with postinst.d hooks to
 create initrd image (not created by default!!!), I tried to boot and
 then I got the following before dropping into a initramfs shell:

 ---
 ---
 --- 
 --
 Begin: Mounting root file system... ...
 Begin: Running /scripts/local-top ...
 Done.
 libudev: udev_monitor_new_from_netlink: error getting socket:  
 Invalid
 argument wait-for-root[431] : segfault
 Gave up waiting form root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 -Missing modules (cat /proc/modules; ls /dev)
 ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does  
 not
 exist. Dropping to a shell!


 BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask)
 Enter 'Help' for a list of built-in commands.

 (initramfs)
 ---
 ---
 ---
 --- 
 --

 Is this the problem you are talking about? Could you give me some
 ideas
 of how can be solved?

 Regards


 -- 
 Ramiro Alba

 Centre Tecnològic de Tranferència de Calor
 http://www.cttc.upc.edu


 Escola Tècnica Superior d'Enginyeries
 Industrial i Aeronàutica de Terrassa
 Colom 11, E-08222, Terrassa, Barcelona, Spain
 Tel: (+34) 93 739 86 46



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

 -- 
 Ramiro Alba

 Centre Tecnològic de Tranferència de Calor
 http://www.cttc.upc.edu


 Escola Tècnica Superior d'Enginyeries
 Industrial i Aeronàutica de Terrassa
 Colom 11, E-08222, Terrassa, Barcelona, Spain
 Tel: (+34) 93 739 86 46



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Group descriptors corrupted

2010-05-20 Thread Lundgren, Andrew
I just went through something similar. When your fsck completes you may be left
with things moved to your lost+found.  If that happens, you can mount the file
system using -t ldiskfs and run ll_recover_lost_found_objs against the
lost+found directory.
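
A sketch of that recovery step (the device name and mount point are
placeholders; run it only while the OST is not mounted as Lustre):

mount -t ldiskfs /dev/ostdev /mnt/ost
ll_recover_lost_found_objs -d /mnt/ost/lost+found
umount /mnt/ost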

--
Andrew

-Original Message-
From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Andreas Dilger
Sent: Wednesday, May 19, 2010 6:02 PM
To: John White
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Group descriptors corrupted

On 2010-05-19, at 17:37, John White wrote:
 A little help here?  One OST fails to mount with the following:
 
 LDISKFS-fs: group descriptors corrupted!
 LustreError: 8364:0:(obd_mount.c:1278:server_kernel_mount()) premount 
 /dev/mpath/lun_13:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19.  Is the 
 ldiskfs module available?
 LustreError: 8364:0:(obd_mount.c:1592:server_fill_super()) Unable to mount 
 device /dev/mpath/lun_13: -22
 LustreError: 8364:0:(obd_mount.c:1997:lustre_fill_super()) Unable to mount  
 (-22)
 LDISKFS-fs error (device dm-13): ldiskfs_check_descriptors: Checksum for 
 group 2560 failed (12546!=45229)
 
 I assume running an e2fsck -fy against the OST is the prefered solution, I 
 just want to confirm.

That would be the normal course of action.  Actually, e2fsck -fp is slightly
better than e2fsck -fy, since it chooses prudent answers to the questions
instead of always answering yes, and aborts if there isn't a safe/obvious
choice.
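
A sketch of that sequence, using the device name from the original report:

# read-only pass first; keep the output for reference
e2fsck -fn /dev/mpath/lun_13 2>&1 | tee /tmp/ost_fsck_preview.log

# then the actual repair with prudent automatic answers
e2fsck -fp /dev/mpath/lun_13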


  'e2fsck -fn [dev]' gives:
 Group descriptor 2560 checksum is invalid.  Fix? no
 [...]
 Group descriptor 2687 checksum is invalid.  Fix? no

Best to save the full e2fsck -fn output for future reference.  If this is the 
only problem, then no worries, but the checksums may also be invalid because 
there is other corruption, and this is only the first sign of trouble

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Andreas Dilger
On 2010-05-19, at 19:04, Dardo D Kleiner - CONTRACTOR wrote:
 I'm both personally and professionally encouraged to hear ClusterStor stand
 up and publicly state intent to support Lustre on SLES.  The million dollar
 question is in regards to *server* support - in particular wrt the 2.x
 series.  As a token of my interest, as well as a testament to my limited
 ability to maintain this in the long run, I submit the attached patches to
 the recent 1.10.0.40 beta release that enable building (and confirmed to be
 runnable) the server on current SLES11 kernel (2.6.27.45-0.1-default).  One
 caveat is that quota support is not compilable and appears to be a bit more
 difficult job than I can probably manage.  And I most certainly didn't run
 a full regression suite, but a straightforward single stream read/write
 appears to work fine.

In fact, it is best to check bugzilla first and/or post to the list.  We update 
Lustre to match the latest RHEL/SLES kernels for each release.  Work is 
already underway to address the significant changes in SLES11 SP1, since this 
moves the kernel from 2.6.27 to 2.6.32.

 Up to .38 it was mostly monkey work - but .40 introduced additional patches
 to the RHEL ext4 implementation that has more substantially diverged from
 the one in current SLES11.  Perhaps SLES11SP1 will converge better...

That is my hope.  More below...

 There's perhaps a Bugzilla report where this is better posted, and tomorrow
 I'll look around a bit more for that, but I felt like getting it out there
 asap.  This has been a topic of much interest in my community and I'm
 starting to feel a bit alone in my desire to keep SLE across the board in
 my environment.  I've invested quite a bit of time and effort there and
 though many are fine with black box appliances, in our research environment
 I prefer to have more transparency.

To correct any confusion that is out there: there is no statement or intention
to stop supporting SLES10/11 on the 1.8 release.  Nor is there any statement
or intention to stop supporting SLES on the client, even in the 2.0
release.  There will be no official support from Oracle for SLES11 on the 2.0
servers.  Since the SLES11 SP1 kernel is moving to the same base 2.6.32 kernel 
as RHEL6 (which IS going to be supported in Lustre 2.0) it is my hope that the 
kernel patches for these two distros will be the same.

I applaud that others are stepping forward to start contributing to Lustre, but 
let's not waste efforts doing the same work multiple times.  I think the only 
way to avoid this is if we continue to communicate in channels such as this 
list and bugzilla what is being worked on.

 On 5/19/2010 11:21 AM, Kevin Canady wrote:
 Quick Public Service Announcement
 
 ClusterStor is and will be providing support services for SLES on both 1.8x 
 and 2.x releases.  If anyone would like to receive additional information 
 please contact me at kevin.can...@clusterstor.com  or 415.505.7701
 
 Best regards,
 Kevin
 
 P. Kevin Canady
 Vice President,
 ClusterStor Inc.
 415.505.7701
 kevin.can...@clusterstor.com
 
 On May 19, 2010, at 8:01 AM, Andreas Dilger wrote:
 
 I've used a SLES kernel on an FC install for a long time on my home
 system. With newer distros there are also fewer changes to the base
 kernel, so there shouldn't be as much trouble to use e.g. the SLES 11
 SP1 kernel (2.6.32) when it is released.
 
 Cheers, Andreas
 
  On 2010-05-19, at 6:01, Heiko Schröter schro...@iup.physik.uni-bremen.de wrote:
 
 Am Mittwoch 19 Mai 2010, um 10:33:04 schrieben Sie:
 On 2010-05-19, at 01:40, Heiko Schröter wrote:
 we would like to know which way lustre is heading.
 
 From the s/w repository we see that only redhat and suse ditros
 seems to be supported.
 
 Is this the official policy of the lustre development to stick to
 (only) these two distros ?
 
 On the client side, we will support the main distros that our
 customers are using, namely RHEL/OEL/CentOS 5.x (and 6.x after
 release), and SLES 10/11.  We make a best-effort attempt to have
 the client work with all client kernels, but since our resources
 are limited we cannot test kernels other than the supported ones.
 I don't see any huge demand for e.g. an officially-supported Ubuntu
 client kernel, but there has long been an unofficial Debian lustre
 package.
 
 On the server side, we will continue to support RHEL5.x and
 SLES10/11 for the Lustre 1.8 release, and RHEL 5.x (6.x is being
 worked on) for the Lustre 2.x release.  Since maintaining kernel
 patches for other kernels is a lot of work, we do not attempt to
 provide patches for other than official kernels.  However, there
 have in the past been ports of the kernel patches to other kernels
 by external contributors (e.g. FC11, FC12, etc) and this will
 hopefully continue in the future.
 
 The server side is the more critical part as we are using gentoo
 +lustre running a vanilla kernel 2.6.22.19 with the lustre patches
 version 1.6.6.
 As far as we are concerned it 

Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-20 Thread Brian J. Murrell
On Thu, 2010-05-20 at 14:45 -0600, Andreas Dilger wrote: 
 
 On 2010-05-20, at 11:33, Ramiro Alba Queipo r...@cttc.upc.edu wrote:
  Yes. I used them at the beginning, but after lustre 1.8.0.1 vanilla
  servers are not any more supported to be used as servers and I started
  to use RH5 kernels with lustre-1.8.1.1, both for clients and servers.
 
 That's too bad that the Debiana maintainers have stopped making  
 updates.

I don't know that they did.  I have to admit that I don't read these too
closely, but it seems that 1.8.3 did get accepted into some Debian
release (process):

http://lists.alioth.debian.org/pipermail/pkg-lustre-maintainers/2010-May/000317.html

Maybe, being unfamiliar with the Debian processes, I am
misunderstanding what that message is saying.  There were others at the
same time.  The archive for May is at
http://lists.alioth.debian.org/pipermail/pkg-lustre-maintainers/2010-May/thread.html

 Doubly so because we worked to add the fixes needed to build  
 on Debian into the Lustre sources, so it is possible to do make debs  
 to get Debian packages using all of the standard Debian packaging tools.

I could be wrong (as I have not inspected the actual Debian packages
lately) but I don't think they are using our work there (yet?)
unfortunately.  :-(

I would hope that the diff between us and them would be small, and if
not that they would report bugs and submit patches.  But perhaps also,
part of the problem is that our ability to focus time and effort on that
stuff is limited by the allocation of resources to supported aspects of
the project.  i.e. we can only provide best effort time and effort.

Cheers,
b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Best way to recover an OST

2010-05-20 Thread Mervini, Joseph A
Hi,

We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. 2 
drives failed in the array within the space of a couple of hours and were 
replaced. It is questionable whether both drives are actually bad because we 
are seeing the same behavior in a test environment where a bad drive is 
actually causing a good drive to be kicked out of an array.

 Unfortunately another of the drives encountered IO errors during the resync
process and failed, causing the array to go out to lunch. The resync process
was attempted two times with the same result. Fortunately I am able (at least
for now) to assemble the array with the existing 8/10 drives, and am able to
fsck, mount via ldiskfs and Lustre, and am in the process of copying files
from the vulnerable OST to a backup location using
lfs find --obd <target> /scratch | cpio -puvdm ...

My question is: What is the best way to restore the OST? Obviously I will need 
to somehow restore the array to its full 8+2 configuration. Whether we need to 
start from scratch or use some other means, that is our first priority. But I 
would like to make the recovery as transparent to the users as possible. 

One possible option that we are considering is simply removing the OST from 
Lustre, fixing the array and copying the recovered files to a newly created OST 
(not desirable). Another is to fix the OST (not remove it from Lustre), delete 
the files that exist  and then copy the recovered files back. The problem that 
comes to mind in either scenario is what happens if a file is part of a striped 
file? Does it lose its affinity with the rest of the stripe?

Another scenario that we are wondering about: if we mount the OST via ldiskfs
and copy everything on the file system to a backup location, fix the array
while maintaining the same tunefs.lustre configuration, then move everything
back the same way it was backed up, will the files be presented to Lustre
(MDS and clients) just as they were before, when mounted as a Lustre file
system?

Thanks in advance for your advice and help.

Joe Mervini
Sandia National Laboratories
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Virtualization and Lustre

2010-05-20 Thread Tyler Hawes
Has there been any testing or conclusions regarding the use of virtualization 
and Lustre, or is this even possible considering how Lustre is coded? I've 
gotten used to the idea of virtualization for all our other servers, where it 
is great to know we can mount the image on another host very quickly if a 
hardware problem brings down a machine, and it seems the same would be nice 
with Lustre...

Tyler Hawes
Lit Post
www.litpost.com



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss