[Lustre-discuss] Recovery Problem
Hi Andreas,

My version of Lustre is 1.8.3. Sorry for my bad English - "crash" was the wrong word. Let me explain better: I start copying a large file onto the file system and, while the copy is still running, I reboot the OSS server. The copy process goes into a "stalled" state. I expected that once the server was back online the copy would resume and complete normally; instead the copy process fails. So it is the copy process that fails, while Lustre itself keeps working fine. Is the failure of the copy process a timeout issue? How can I change the timeout?

Thanks!!!

Cheers,
Stefano

Ing. Stefano Elmopi
Gruppo Darco - Resp. ICT Sistemi
Via Ostiense 131/L Corpo B, 00154 Roma
cell. 3466147165 tel. 0657060500
email: stefano.elm...@sociale.it

Pursuant to the personal data protection law (Legislative Decree no. 196/2003), this e-mail is intended solely for the persons named above and the information it contains is to be considered strictly confidential. Reading, copying, using or disclosing the contents of this e-mail without authorisation is prohibited. If you have received this message in error, please return it to the sender. Thank you.

On 19 May 2010, at 17:07, Andreas Dilger wrote:

More important is to include the crash message from the client and the version of Lustre you are using.

Cheers, Andreas

On 2010-05-19, at 6:34, Stefano Elmopi stefano.elm...@sociale.it wrote:

Hi, I have a small problem, but it is certainly due to my limited knowledge of the subject. I have a Lustre file system with one MGS/MDS node, two OSS nodes and one client. I launch a copy of a large file onto Lustre and, while the copy is running, I restart the OSS node that is handling the writes to the file system. The copy process is put into the "stalled" state and, when the OSS node comes back up, I expected the copy process to resume normally, but instead it fails. This is a log from the MGS node:

May 19 13:43:43 mdt01prdpom kernel: Lustre: 3827:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230433 sent from lustre01-OST-osc to NID 172.16.100@tcp 17s ago has timed out (17s prior to deadline).
May 19 13:43:43 mdt01prdpom kernel: r...@81012e11e400 x1336168048230433/t0 o400-lustre01-ost_u...@172.16.100.121@tcp:28/4 lens 192/384 e 0 to 1 dl 1274269423 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:43:43 mdt01prdpom kernel: Lustre: lustre01-OST-osc: Connection to service lustre01-OST via nid 172.16.100@tcp was lost; in progress operations using this service will wait for recovery to complete.
May 19 13:44:09 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230435 sent from lustre01-OST-osc to NID 172.16.100@tcp 26s ago has timed out (26s prior to deadline).
May 19 13:44:09 mdt01prdpom kernel: r...@81012e5f2000 x1336168048230435/t0 o8-lustre01-ost_u...@172.16.100.121@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269449 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:44:37 mdt01prdpom kernel: Lustre: 3829:0:(import.c:517:import_select_connection()) lustre01-OST-osc: tried all connections, increasing latency to 2s
May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-172.16.100@tcp: -113
May 19 13:44:37 mdt01prdpom kernel: LustreError: 3828:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@81012d3e5800 x1336168048230437/t0 o8-lustre01-ost_u...@172.16.100.121@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 2 fl Rpc:N/0/0 rc 0/0
May 19 13:44:37 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230437 sent from lustre01-OST-osc to NID 172.16.100@tcp 0s ago has failed due to network error (27s prior to deadline).
May 19 13:44:37 mdt01prdpom kernel: r...@81012d3e5800 x1336168048230437/t0 o8-lustre01-ost_u...@172.16.100.121@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269504 ref 1 fl Rpc:N/0/0 rc 0/0
May 19 13:45:33 mdt01prdpom kernel: Lustre: 3829:0:(import.c:517:import_select_connection()) lustre01-OST-osc: tried all connections, increasing latency to 3s
May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-172.16.100@tcp: -113
May 19 13:45:33 mdt01prdpom kernel: LustreError: 3828:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@81012e11e400 x1336168048230441/t0 o8-lustre01-ost_u...@172.16.100.121@tcp:28/4 lens 368/584 e 0 to 1 dl 1274269561 ref 2 fl Rpc:N/0/0 rc 0/0
May 19 13:45:33 mdt01prdpom kernel: Lustre: 3828:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1336168048230441 sent from lustre01-OST-osc to NID 172.16.100@tcp 0s ago has failed due to
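The question above asks how to change the timeout; the relevant knob in 1.8 is the global RPC timeout (obd_timeout). A rough sketch of inspecting and changing it - the value 300 is an arbitrary example, "lustre01" is the filesystem name taken from the log lines above, and raising the timeout is not necessarily the right fix:

  lctl get_param timeout                      # current obd_timeout on a client or server
  lctl conf_param lustre01.sys.timeout=300    # run on the MGS; sets it permanently for the whole filesystem

Lustre 1.8.3 also enables adaptive timeouts by default, so the static value is only part of the picture.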
[Lustre-discuss] MGS Nids
Dear All,

I'm in the middle of creating a new Lustre setup as a replacement for our current one. The current one is a single machine with MGS/MDT/OST all living on the one box. In the new setup I have 4 machines: two MDT nodes and two OST nodes. We want to use keepalived as the failover mechanism between the two MDT nodes. To keep the MDTs in sync, I'm using a DRBD disk between the two. Keepalived uses a VIP in an active/passive state; in a failover situation the VIP gets transferred to the passive node.

The problem I'm experiencing is that I can't seem to get the VIP listed as a NID, so the OSS can only connect on the real IP, which is unwanted in this situation. Is there an easy way to change the NID on the MGS machine to the VIP? See below for setup details. The last output, from lctl list_nids, is the problem area - where is that NID coming from? I hope someone can shed some light on this...

Cheers,
Leen

Hosts:
  192.168.21.32 fs-mgs-001
  192.168.21.33 fs-mgs-002
  192.168.21.34 fs-ost-001
  192.168.21.35 fs-ost-002
  192.168.21.40 fs-mgs-vip

  mkfs.lustre --reformat --fsname datafs --mgs --mgsnode=fs-mgs-...@tcp /dev/VG1/mgs
  mkfs.lustre --reformat --fsname datafs --mdt --mgsnode=fs-mgs-...@tcp /dev/drbd1
  mount -t lustre /dev/VG1/mgs mgs/
  mount -t lustre /dev/drbd1 /mnt/mdt/

fs-mgs-001:/mnt# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc mgc192.168.21...@tcp 8f8dfecc-44bd-caae-3ed4-cd23168d59ab 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
  4 UP mds datafs-MDT datafs-MDT_UUID 3

fs-mgs-001:/mnt# lctl list_nids
192.168.21...@tcp

___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Recovery Problem
On Thu, May 20, 2010 at 12:29:41PM +0200, Stefano Elmopi wrote:

Hi Andreas. My version of Lustre is 1.8.3. Sorry for my bad English - "crash" was the wrong word. Let me explain better: I start copying a large file onto the file system and, while the copy is still running, I reboot the OSS server, and the copy process goes into a "stalled" state. I expected that once the server was back online the copy would resume and complete normally; instead the copy process fails. So it is the copy process that fails, while Lustre itself keeps working fine.

May 19 13:46:31 mdt01prdpom kernel: LustreError: 167-0: This client was evicted by lustre01-OST; in progress operations using this service will fail.

The cp process failed because the client got evicted by the OSS. We need to look at the OSS logs to figure out the root cause of the eviction.

Johann
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
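Since the root cause has to come from the OSS side, a rough sketch of how one might gather the relevant logs on the OSS around the time of the eviction (the output file name is just an example):

  grep -i lustre /var/log/messages     # syslog entries on the OSS around 13:46
  lctl dk /tmp/lustre-oss-debug.log    # dump the Lustre kernel debug buffer to a file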
Re: [Lustre-discuss] MGS Nids
On Thu, May 20, 2010 at 12:46:42PM +0200, leen smit wrote:

In the new setup I have 4 machines: two MDT nodes and two OST nodes. We want to use keepalived as the failover mechanism between the two MDT nodes. To keep the MDTs in sync, I'm using a DRBD disk between the two. Keepalived uses a VIP in an active/passive state; in a failover situation the VIP gets transferred to the passive node.

Lustre uses stateful client/server connections. You don't need to - and cannot - use a virtual IP. The Lustre protocol already takes care of reconnection and recovery.

The problem I'm experiencing is that I can't seem to get the VIP listed as a NID, so the OSS can only connect on the real IP, which is unwanted in this situation. Is there an easy way to change the NID on the MGS machine to the VIP?

No, you have to list the NIDs of all the MGS nodes at mkfs time (i.e. --mgsnode=192.168.21...@tcp --mgsnode=192.168.21...@tcp in your case).

Johann
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
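Put concretely, using the real node addresses from the original post (192.168.21.32 and 192.168.21.33) instead of the VIP, the commands might look roughly like this; the OST device and the client mount point are placeholders:

  mkfs.lustre --reformat --fsname datafs --mdt --mgsnode=192.168.21.32@tcp --mgsnode=192.168.21.33@tcp /dev/drbd1
  mkfs.lustre --reformat --fsname datafs --ost --mgsnode=192.168.21.32@tcp --mgsnode=192.168.21.33@tcp /dev/<ostdev>
  mount -t lustre 192.168.21.32@tcp:192.168.21.33@tcp:/datafs /mnt/datafs    # on the clients

Every target and client then knows both possible MGS locations and simply tries the other NID when one node is down.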
Re: [Lustre-discuss] MGS Nids
On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: Keepalive uses a VIP in a active/passive state. In a failover situation the VIP gets transferred to the passive one. Don't use virtual IPs with Lustre. Lustre clients know how to deal with failover nodes that have different IP addresses and using a virtual, floating IP address will just confuse it. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS Nids
Ok, no VIPs then.. But how does failover work in Lustre then? If I set up everything using the real IP and then mount from a client and bring down the active MGS, the client will just sit there until it comes back up again. As in, there is no failover to the second node. So how does this internal Lustre failover mechanism work? I've been going through the docs, and I must say there is very little on the failover mechanism, apart from mentions that a separate app should take care of that. That's the reason I'm implementing keepalived.. At this stage I really am clueless, and can only think of creating a TUN interface, which will have the VIP address (thus, it becomes a real IP, not just a VIP). But I got a feeling that ain't the right approach either... Are there any docs available where an active/passive MGS setup is described? Is it sufficient to define a --failnode=nid,... at creation time? Any help would be greatly appreciated! Leen On 05/20/2010 01:45 PM, Brian J. Murrell wrote: On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: Keepalive uses a VIP in a active/passive state. In a failover situation the VIP gets transferred to the passive one. Don't use virtual IPs with Lustre. Lustre clients know how to deal with failover nodes that have different IP addresses and using a virtual, floating IP address will just confuse it. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS Nids
leen smit wrote: Ok, no VIP's then.. But how does failover work in lustre then? If I setup everything using the real IP and then mount from a client and bring down the active MGS, the client will just sit there until it comes back up again. As in, there is no failover to the second node. So how does this internal lustre failover mechanism work? I've been going trought the docs, and I must say there is very little on the failover mechanism, apart from mentions that a seperate app should care of that. Thats the reason I'm implementing keepalived.. Right: the external service needs to keep the mount active/healthy on one of the servers. Lustre handles reconnecting clients/servers as long as the volume is mounted where it expects (ie, the mkfs node or the --failover node). At this stage I really am clueless, and can only think of creating a TUN interface, which will have the VIP address (thus, it becomes a real IP, not just a VIP). But I got a feeling that ain't the right approach either... Is there any docs available where a active/passive MGS setup is described? Is it sufficient to define a --failnode=nid,... at creation time? Yep. See Johann's email on the MGS, but for the MDTs and OSTs that's all you have to do (besides listing both MGS NIDs at mkfs time). For the clients, you specify both MGS NIDs at mount time, so it can mount regardless of which node has the active MGS. Kevin Any help would be greatly appreciated! Leen On 05/20/2010 01:45 PM, Brian J. Murrell wrote: On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: Keepalive uses a VIP in a active/passive state. In a failover situation the VIP gets transferred to the passive one. Don't use virtual IPs with Lustre. Lustre clients know how to deal with failover nodes that have different IP addresses and using a virtual, floating IP address will just confuse it. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS Nids
For clarification, in a two-server configuration:

  server1 - 192.168.2.20 MGS+MDT+OST0
  server2 - 192.168.2.22 OST1
  /dev/sdb is a LUN shared between server1 and server2

from server1: mkfs.lustre --mgs --failnode=192.168.2.22 --reformat /dev/sdb1
from server1: mkfs.lustre --reformat --mdt --mgsnode=192.168.2.20 --fsname=prova --failnode=192.168.2.22 /dev/sdb4
from server1: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20 --failnode=192.168.2.22 --fsname=prova /dev/sdb2
from server2: mkfs.lustre --reformat --ost --mgsnode=192.168.2.20 --failnode=192.168.2.20 --fsname=prova /dev/sdb3

from server1: mount -t lustre /dev/sdb1 /lustre/mgs_prova
from server1: mount -t lustre /dev/sdb4 /lustre/mdt_prova
from server1: mount -t lustre /dev/sdb2 /lustre/ost0_prova
from server2: mount -t lustre /dev/sdb3 /lustre/ost1_prova

from client: modprobe lustre
from client: mount -t lustre 192.168.2...@tcp:192.168.2...@tcp:/prova /prova

Now halt server1 and mount the MGS, MDT and OST0 on server2; the client should continue its activity without problems.

On 05/20/2010 02:55 PM, Kevin Van Maren wrote: leen smit wrote: Ok, no VIP's then.. But how does failover work in lustre then? If I setup everything using the real IP and then mount from a client and bring down the active MGS, the client will just sit there until it comes back up again. As in, there is no failover to the second node. So how does this internal lustre failover mechanism work? I've been going trought the docs, and I must say there is very little on the failover mechanism, apart from mentions that a seperate app should care of that. Thats the reason I'm implementing keepalived.. Right: the external service needs to keep the mount active/healthy on one of the servers. Lustre handles reconnecting clients/servers as long as the volume is mounted where it expects (ie, the mkfs node or the --failover node). At this stage I really am clueless, and can only think of creating a TUN interface, which will have the VIP address (thus, it becomes a real IP, not just a VIP). But I got a feeling that ain't the right approach either... Is there any docs available where a active/passive MGS setup is described? Is it sufficient to define a --failnode=nid,... at creation time? Yep. See Johann's email on the MGS, but for the MDTs and OSTs that's all you have to do (besides listing both MGS NIDs at mkfs time). For the clients, you specify both MGS NIDs at mount time, so it can mount regardless of which node has the active MGS. Kevin Any help would be greatly appreciated! Leen On 05/20/2010 01:45 PM, Brian J. Murrell wrote: On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: Keepalive uses a VIP in a active/passive state. In a failover situation the VIP gets transferred to the passive one. Don't use virtual IPs with Lustre. Lustre clients know how to deal with failover nodes that have different IP addresses and using a virtual, floating IP address will just confuse it. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- _Gabriele Paciucci_ http://www.linkedin.com/in/paciucci Pursuant to legislative Decree n. 196/03 you are hereby informed that this email contains confidential information intended only for use of addressee. If you are not the addressee and have received this email by mistake, please send this email to the sender. You may not copy or disseminate this message to anyone. Thank You.
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Modifying Lustre network (good practices)
Dear All,

We have a cluster with critical data on Lustre. On this cluster there are three networks on each Lustre server and client: one ethernet network for administration (eth0), and two other ethernet networks configured in bonding (bond0: eth1 + eth2). On Lustre we get poor read performance and good write performance, so we decided to modify the Lustre network in order to see whether the problem comes from the network layer. Currently the Lustre network is bond0. We want to set it to eth0, then eth1, then eth2 and finally back to bond0 in order to compare performance.

Therefore, we'll perform the following steps: we will unmount the filesystem, reformat the MGS, change the lnet options in the modprobe file, start the new MGS server, and finally modify our OSTs and MDT with tunefs.lustre, setting the failover and new MGS NIDs using the --erase-params and --writeconf options. We tested this successfully on a test filesystem, but we read in the manual that it can be really dangerous. Do you agree with this procedure? Do you have any advice or best practice for this kind of request? What's the danger?

Regards.

-- Olivier Hargoaa Phone: + 33 4 76 29 76 25 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
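For concreteness, the procedure described above might look roughly like the sketch below; the device names and NIDs are placeholders, and the whole filesystem must be stopped while the --writeconf is done:

  # in /etc/modprobe.conf (or modprobe.d) on every server and client, e.g. to move LNET to eth0:
  options lnet networks=tcp0(eth0)

  # recreate the MGS and rewrite the configuration logs of the other targets:
  mkfs.lustre --reformat --mgs /dev/<mgsdev>
  tunefs.lustre --erase-params --mgsnode=<new-mgs-nid>@tcp --failnode=<failover-nid>@tcp --writeconf /dev/<mdtdev>
  tunefs.lustre --erase-params --mgsnode=<new-mgs-nid>@tcp --failnode=<failover-nid>@tcp --writeconf /dev/<ostdev>

The configuration logs are regenerated the next time the targets are mounted (MGS first, then MDT, then OSTs).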
Re: [Lustre-discuss] Modifying Lustre network (good practices)
On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Without having any other information other than your statement that performance is good in one direction but not the other I wonder why you consider the network as being the most likely candidate as a culprit for this problem. I haven't come across very many networks that (weren't designed to be and yet) are fast in one direction and slow in the other. Therefore, we'll perform the following steps: we will umount the filesystem, reformat the mgs, change lnet options in modprobe file, start new mgs server, and finally modify our ost and mdt with tunefs.lustre with failover and mgs new nids using --erase-params and --writeconf options. Sounds like a lot of rigmarole to test something that I would consider to be of low probability (given the brief amount of information you have provided). But even if I did suspect the network were slow in only one direction, before I started mucking with reconfiguring Lustre for different networks, I would do some basic network throughput testing to verify my hypothesis and adjust the probability of the network being the problem accordingly. Did you do any hardware profiling (i.e. using the lustre-iokit) before deploying Lustre on this hardware? We always recommend profiling the hardware for exactly this reason: explaining performance problems. Unfortunately, now that you have data on the hardware, it's much more difficult to profile the hardware because to do it properly, you need to be able to write to the disks. b. signature.asc Description: This is a digitally signed message part ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Modifying Lustre network (good practices)
Which bonding method are you using? Has the performance always been this way? Depending on which bonding type you are using and the network hardware involved you might see the behavior you are describing. On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: Dear All, We have a cluster with lustre critical data. On this cluster there are three networks on each Lustre server and client : one ethernet network for administration (eth0), and two other ethernet networks configured in bonding (bond0: eth1 eth2). On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Currently Lustre network is bond0. We want to set it as eth0, then eth1, then eth2 and finally back to bond0 in order to compare performances. Therefore, we'll perform the following steps: we will umount the filesystem, reformat the mgs, change lnet options in modprobe file, start new mgs server, and finally modify our ost and mdt with tunefs.lustre with failover and mgs new nids using --erase-params and --writeconf options. We tested it successfully on a test filesystem but we read in the manual that this can be really dangerous. Do you agree with this procedure? Do you have some advice or practice on this kind of requests? What's the danger? Regards. -- Sent from my wired giant hulking workstation Nate Pearlstein - npe...@sgi.com - Product Support Engineer ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] MGS Nids
You need two MGS nodes for 'mount' commnand on the clients. e.g) mount -t lustre 192.168.1...@tcp:192.168.1...@tcp:/lustre /lustre client will attempt to connect to secondary MGS once primary is not available. Thanks Ihara (5/20/10 9:22 PM), leen smit wrote: Ok, no VIP's then.. But how does failover work in lustre then? If I setup everything using the real IP and then mount from a client and bring down the active MGS, the client will just sit there until it comes back up again. As in, there is no failover to the second node. So how does this internal lustre failover mechanism work? I've been going trought the docs, and I must say there is very little on the failover mechanism, apart from mentions that a seperate app should care of that. Thats the reason I'm implementing keepalived.. At this stage I really am clueless, and can only think of creating a TUN interface, which will have the VIP address (thus, it becomes a real IP, not just a VIP). But I got a feeling that ain't the right approach either... Is there any docs available where a active/passive MGS setup is described? Is it sufficient to define a --failnode=nid,... at creation time? Any help would be greatly appreciated! Leen On 05/20/2010 01:45 PM, Brian J. Murrell wrote: On Thu, 2010-05-20 at 12:46 +0200, leen smit wrote: Keepalive uses a VIP in a active/passive state. In a failover situation the VIP gets transferred to the passive one. Don't use virtual IPs with Lustre. Lustre clients know how to deal with failover nodes that have different IP addresses and using a virtual, floating IP address will just confuse it. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Modifying Lustre network (good practices)
On Thu, May 20, 2010 at 10:43:58AM -0400, Brian J. Murrell wrote: On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Without having any other information other than your statement that performance is good in one direction but not the other I wonder why you consider the network as being the most likely candidate as a culprit for this problem. Maybe you should start with running lnet self test to compare read write performance? http://wiki.lustre.org/manual/LustreManual18_HTML/LustreIOKit.html#50598014_pgfId-1290255 Johann ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
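For anyone who wants to try it, a rough outline of an lnet_selftest session comparing both directions might look like this; the two NIDs are placeholders for one client and one OSS, and the lnet_selftest module must be loaded on the console node and on every node being tested:

  modprobe lnet_selftest
  export LST_SESSION=$$
  lst new_session rw_test
  lst add_group clients 192.168.0.10@tcp
  lst add_group servers 192.168.0.20@tcp
  lst add_batch bulk_rw
  lst add_test --batch bulk_rw --from clients --to servers brw write check=simple size=1M
  lst add_test --batch bulk_rw --from clients --to servers brw read check=simple size=1M
  lst run bulk_rw
  lst stat clients servers        # watch the rates for a while, then Ctrl-C
  lst stop bulk_rw
  lst end_session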
Re: [Lustre-discuss] Problems with MDS Crashing
We have had another hang, but this time we had KVM access to the machine (and the screen blanker wasn't on). I took some screenshots, the first one is an error I got after reboot, the BMP one is what I saw when I first logged in to KVM, and the other ones are what I saw when trying to type 'root' - it started printing traces. http://amber.leeware.com/wi/lustre-death/ After reboot there was a command timeout message from RAID card. When hanged - too little hardware resources. -- Andrew http://CloudAccess.net/ ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Modifying Lustre network (good practices)
Nate Pearlstein wrote:

Which bonding method are you using? Has the performance always been this way? Depending on which bonding type you are using and the network hardware involved you might see the behavior you are describing.

Hi,

Here is our bonding configuration. On the Linux side:

  mode=4                       - to use 802.3ad
  miimon=100                   - to set the link check interval (ms)
  xmit_hash_policy=layer2+3    - to set the XOR hashing method
  lacp_rate=fast               - to set the LACPDU tx rate to request (slow=20s, fast=1s)

On the ethernet switch side, load balancing is configured as:

  # port-channel load-balance src-dst-mac

thanks

On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: Dear All, We have a cluster with lustre critical data. On this cluster there are three networks on each Lustre server and client : one ethernet network for administration (eth0), and two other ethernet networks configured in bonding (bond0: eth1 eth2). On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Currently Lustre network is bond0. We want to set it as eth0, then eth1, then eth2 and finally back to bond0 in order to compare performances. Therefore, we'll perform the following steps: we will umount the filesystem, reformat the mgs, change lnet options in modprobe file, start new mgs server, and finally modify our ost and mdt with tunefs.lustre with failover and mgs new nids using --erase-params and --writeconf options. We tested it successfully on a test filesystem but we read in the manual that this can be really dangerous. Do you agree with this procedure? Do you have some advice or practice on this kind of requests? What's the danger? Regards. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
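Expressed as an actual configuration, the parameters above would typically live in the module options on the servers and clients - a rough sketch, with bond0 as the LNET interface (the RHEL5-style modprobe.conf syntax is an assumption; adjust to your distribution):

  alias bond0 bonding
  options bonding mode=4 miimon=100 xmit_hash_policy=layer2+3 lacp_rate=fast
  options lnet networks=tcp0(bond0)

One thing worth keeping in mind with 802.3ad and a per-flow hash policy is that any single TCP connection only ever uses one slave link, so a single client-to-OSS stream cannot exceed the speed of one physical interface.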
Re: [Lustre-discuss] Modifying Lustre network (good practices)
Can't really help with your larger question, but I had a similar experience with network-appropriate write rates and slower reads. You might check that you have enabled TCP selective acknowledgments:

  echo 1 > /proc/sys/net/ipv4/tcp_sack

or net.ipv4.tcp_sack = 1 in /etc/sysctl.conf. This can help in cases where your OSSs have larger pipes than your clients and your files are striped across multiple OSSs. When multiple OSSs are transmitting to a single client they can overrun the switch buffers and drop packets. This is particularly noticeable when doing IOzone-type benchmarking from a single client with a wide lfs stripe setting. With selective ACKs enabled the client will request that a more minimal set of packets be retransmitted ... or at least that's what I finally deduced when I ran into it.

James Robnett NRAO/NM ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
Hi all,

On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote: That is what I would recommend and what several groups do (usually with Debian, though). not running the sles/rhel distro? I know a lot of things can happen, but do these rhel/sles patches break some key features of the kernel which would only work under that specific distro? I've positively tested a lustre client with a sles patched kernel on a gentoo distro. But I'm a bit nervous about testing it on our live lustre server system. The only thing that really might cause trouble is udev, since sysfs maintainers like to break old udev versions. I think upcoming Debian Squeeze requires 2.6.27 at a minimum.

Argh!!! I think I am stuck with this situation. I compiled linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian Squeeze) using make-kpkg and, after dealing with postinst.d hooks to create the initrd image (not created by default!!!), I tried to boot and then I got the following before being dropped into an initramfs shell:

---
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ... Done.
libudev: udev_monitor_new_from_netlink: error getting socket: Invalid argument
wait-for-root[431]: segfault
Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
  - Check rootdelay= (did the system wait long enough?)
  - Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not exist. Dropping to a shell!

BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)
---

Is this the problem you are talking about? Could you give me some ideas of how it can be solved?

Regards

-- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
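For reference, the make-kpkg build described above usually looks something like the sketch below; the config file path is a placeholder (reference configs ship with the Lustre source under lustre/kernel_patches/kernel_configs/, as noted later in this thread), and this sketch does not by itself address the libudev/initramfs problem shown in the console output:

  cd linux-2.6.18-164.11.1.el5
  cp /path/to/lustre-source/lustre/kernel_patches/kernel_configs/<matching-config> .config
  fakeroot make-kpkg --initrd --append-to-version=-lustre kernel_image kernel_headers
  dpkg -i ../linux-image-2.6.18*-lustre*.deb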
Re: [Lustre-discuss] Future of lustre 1.8.3+
The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I thought that there were Debian packages for Lustre, why not use those? Cheers, Andreas On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote: Hi all, On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote: That is what I would recommend and what several groups do (usually with Debian, though). not running the sles/rhel distro ? I know a lot of things can happen but are these rhel/sles patches do brake some key features of the kernel which would only work under that specific distro ? I've positivley tested a lustre client with a sles patched kernel on a gentoo distro. But i'am a bit nervous about testing it on our live lustre server system. The only thing that really might cause trouble is udev, since sysfs maintainers like to break old udev versions. I think upcoming Debian Squeeze requires 2.6.27 at a minimum. Argg!!!, I think I am stacked with this situation. I compiled linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian Squeeze) using make-kpkg and after dealing with postinst.d hooks to create initrd image (not created by default!!!), I tried to boot and then I got the following before dropping into a initramfs shell: --- --- - Begin: Mounting root file system... ... Begin: Running /scripts/local-top ... Done. libudev: udev_monitor_new_from_netlink: error getting socket: Invalid argument wait-for-root[431] : segfault Gave up waiting form root device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Check root= (did the system wait for the right device?) -Missing modules (cat /proc/modules; ls /dev) ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not exist. Dropping to a shell! BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask) Enter 'Help' for a list of built-in commands. (initramfs) --- --- --- - Is this the problem you are talking about? Could you give me some ideas of how can be solved? Regards -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Modifying Lustre network (good practices)
Hi Brian and all others,

I'm sorry for not giving you all the details. Here I will send you all the information I have.

Regarding our configuration: Lustre IO nodes are linked with two 10GB bonded links. Compute nodes are linked with two 1GB bonded links. Raw performance on the servers is fine for both write and read on each OST.

Firstly we ran iperf (several times), and we obtained the expected read and write rates. The results are symmetric (read and write) with any number of threads.

Then we tested with LNET self-test. Here is our lst command for the write test:

  lst add_test --batch bulkr --from c --to s brw write check=simple size=1M

and the results are:

  [LNet Rates of c]
  [R] Avg: 110 RPC/s Min: 110 RPC/s Max: 110 RPC/s
  [W] Avg: 219 RPC/s Min: 219 RPC/s Max: 219 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 0.02 MB/s Min: 0.02 MB/s Max: 0.02 MB/s
  [W] Avg: 109.20 MB/s Min: 109.20 MB/s Max: 109.20 MB/s
  [LNet Rates of c]
  [R] Avg: 109 RPC/s Min: 109 RPC/s Max: 109 RPC/s
  [W] Avg: 217 RPC/s Min: 217 RPC/s Max: 217 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 0.02 MB/s Min: 0.02 MB/s Max: 0.02 MB/s
  [W] Avg: 108.40 MB/s Min: 108.40 MB/s Max: 108.40 MB/s
  [LNet Rates of c]
  [R] Avg: 109 RPC/s Min: 109 RPC/s Max: 109 RPC/s
  [W] Avg: 217 RPC/s Min: 217 RPC/s Max: 217 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 0.02 MB/s Min: 0.02 MB/s Max: 0.02 MB/s
  [W] Avg: 108.40 MB/s Min: 108.40 MB/s Max: 108.40 MB/s

and now for read:

  [LNet Rates of c]
  [R] Avg: 10 RPC/s Min: 10 RPC/s Max: 10 RPC/s
  [W] Avg: 5 RPC/s Min: 5 RPC/s Max: 5 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 4.59 MB/s Min: 4.59 MB/s Max: 4.59 MB/s
  [W] Avg: 0.00 MB/s Min: 0.00 MB/s Max: 0.00 MB/s
  [LNet Rates of c]
  [R] Avg: 10 RPC/s Min: 10 RPC/s Max: 10 RPC/s
  [W] Avg: 5 RPC/s Min: 5 RPC/s Max: 5 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 4.79 MB/s Min: 4.79 MB/s Max: 4.79 MB/s
  [W] Avg: 0.00 MB/s Min: 0.00 MB/s Max: 0.00 MB/s
  [LNet Rates of c]
  [R] Avg: 10 RPC/s Min: 10 RPC/s Max: 10 RPC/s
  [W] Avg: 5 RPC/s Min: 5 RPC/s Max: 5 RPC/s
  [LNet Bandwidth of c]
  [R] Avg: 4.79 MB/s Min: 4.79 MB/s Max: 4.79 MB/s
  [W] Avg: 0.00 MB/s Min: 0.00 MB/s Max: 0.00 MB/s

IOzone shows the same asymmetric results as LNET. With just one OST: in the WRITE direction we get 233 MB/sec, which, given that the theoretical maximum is 250 MB/sec, is a very good result - it works fine. In the READ direction, the maximum we get is 149 MB/sec with three threads (processes: -t 3). If we configure four threads (-t 4) we get 50 MB/sec.

We also verified in the brw_stats file that we use a 1MB block size (both read and write).

So we only have problems with iozone/lustre and lnet selftest.

Thanks to all.

Brian J. Murrell wrote: On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Without having any other information other than your statement that performance is good in one direction but not the other I wonder why you consider the network as being the most likely candidate as a culprit for this problem. I haven't come across very many networks that (weren't designed to be and yet) are fast in one direction and slow in the other. Therefore, we'll perform the following steps: we will umount the filesystem, reformat the mgs, change lnet options in modprobe file, start new mgs server, and finally modify our ost and mdt with tunefs.lustre with failover and mgs new nids using --erase-params and --writeconf options. Sounds like a lot of rigmarole to test something that I would consider to be of low probability (given the brief amount of information you have provided). But even if I did suspect the network were slow in only one direction, before I started mucking with reconfiguring Lustre for different networks, I would do some basic network throughput testing to verify my hypothesis and adjust the probability of the network being the problem accordingly. Did you do any hardware profiling (i.e. using the lustre-iokit) before deploying Lustre on this hardware? We always recommend profiling the hardware for exactly this reason: explaining performance problems. Unfortunately, now that you have data on the hardware, it's much more difficult to profile the hardware because to do it properly, you need to be able to write to the disks. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Modifying Lustre network (good practices)
Thanks Johann,

You couldn't have known it, but we had already run the lnet self test and it was not successful - I posted the results in my answer to Brian. What I do not know is whether the lnet test result is good or not with bonding deactivated. I will ask the administrators to test it.

Regards.

Johann Lombardi wrote: On Thu, May 20, 2010 at 10:43:58AM -0400, Brian J. Murrell wrote: On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: On Lustre we get poor read performances and good write performances so we decide to modify Lustre network in order to see if problems comes from network layer. Without having any other information other than your statement that performance is good in one direction but not the other I wonder why you consider the network as being the most likely candidate as a culprit for this problem. Maybe you should start with running lnet self test to compare read write performance? http://wiki.lustre.org/manual/LustreManual18_HTML/LustreIOKit.html#50598014_pgfId-1290255 Johann -- Olivier Hargoaa Phone: + 33 4 76 29 76 25 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote: The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I Ok, I am getting http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 but, please. Where can I get a suitable config file to apply both for servers and clients? thought that there were Debian packages for Lustre, why not use those? Yes. I used them at the beginning, but after lustre 1.8.0.1 vanilla servers are not any more supported to be used as servers and I started to use RH5 kernels with lustre-1.8.1.1, both for clients and servers. In my case using Ubuntu 10.04 with kernel 2.6.32 NOT suported as a lustre patchless client, I prefer using an only kernel for the whole cluster, do I? Thanks for your answer. Regards Cheers, Andreas On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote: Hi all, On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote: That is what I would recommend and what several groups do (usually with Debian, though). not running the sles/rhel distro ? I know a lot of things can happen but are these rhel/sles patches do brake some key features of the kernel which would only work under that specific distro ? I've positivley tested a lustre client with a sles patched kernel on a gentoo distro. But i'am a bit nervous about testing it on our live lustre server system. The only thing that really might cause trouble is udev, since sysfs maintainers like to break old udev versions. I think upcoming Debian Squeeze requires 2.6.27 at a minimum. Argg!!!, I think I am stacked with this situation. I compiled linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian Squeeze) using make-kpkg and after dealing with postinst.d hooks to create initrd image (not created by default!!!), I tried to boot and then I got the following before dropping into a initramfs shell: --- --- - Begin: Mounting root file system... ... Begin: Running /scripts/local-top ... Done. libudev: udev_monitor_new_from_netlink: error getting socket: Invalid argument wait-for-root[431] : segfault Gave up waiting form root device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Check root= (did the system wait for the right device?) -Missing modules (cat /proc/modules; ls /dev) ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not exist. Dropping to a shell! BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask) Enter 'Help' for a list of built-in commands. (initramfs) --- --- --- - Is this the problem you are talking about? Could you give me some ideas of how can be solved? Regards -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. 
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 2010-05-20, at 11:33, Ramiro Alba Queipo r...@cttc.upc.edu wrote: On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote: The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I Ok, I am getting http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 but, please. Where can I get a suitable config file to apply both for servers and clients? The config files are in the Lustre source tree in lustre/ kernel_patches/kernel_configs/ You can just use the .config file from your client kernel (ie patchless) since you don't need the patched server kernel on the clients. thought that there were Debian packages for Lustre, why not use those? Yes. I used them at the beginning, but after lustre 1.8.0.1 vanilla servers are not any more supported to be used as servers and I started to use RH5 kernels with lustre-1.8.1.1, both for clients and servers. That's too bad that the Debiana maintainers have stopped making updates. Doubly so because we worked to add the fixes needed to build on Debian into the Lustre sources, so it is possible to do make debs to get Debian packages using all of the standard Debian packaging tools. In my case using Ubuntu 10.04 with kernel 2.6.32 NOT suported as a lustre patchless client, I prefer using an only kernel for the whole cluster, do I? That is by no means a requirement. Thanks for your answer. Regards Cheers, Andreas On 2010-05-20, at 9:48, Ramiro Alba Queipo r...@cttc.upc.edu wrote: Hi all, On Wed, 2010-05-19 at 14:43 +0200, Bernd Schubert wrote: That is what I would recommend and what several groups do (usually with Debian, though). not running the sles/rhel distro ? I know a lot of things can happen but are these rhel/sles patches do brake some key features of the kernel which would only work under that specific distro ? I've positivley tested a lustre client with a sles patched kernel on a gentoo distro. But i'am a bit nervous about testing it on our live lustre server system. The only thing that really might cause trouble is udev, since sysfs maintainers like to break old udev versions. I think upcoming Debian Squeeze requires 2.6.27 at a minimum. Argg!!!, I think I am stacked with this situation. I compiled linux-2.6.18-164.11.1.el5.tar.bz2 under Ubuntu 10.04 (from Debian Squeeze) using make-kpkg and after dealing with postinst.d hooks to create initrd image (not created by default!!!), I tried to boot and then I got the following before dropping into a initramfs shell: --- --- --- -- Begin: Mounting root file system... ... Begin: Running /scripts/local-top ... Done. libudev: udev_monitor_new_from_netlink: error getting socket: Invalid argument wait-for-root[431] : segfault Gave up waiting form root device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Check root= (did the system wait for the right device?) -Missing modules (cat /proc/modules; ls /dev) ALERT! /dev/disk/by-uuid/0cff75ed-ba7f-4799-b596-cf2b214b9768 does not exist. Dropping to a shell! BusyBox v1.13.3 (Ubuntu 1:1.13.3-1ubuntu11) buit-in shell (ask) Enter 'Help' for a list of built-in commands. (initramfs) --- --- --- --- -- Is this the problem you are talking about? Could you give me some ideas of how can be solved? 
Regards -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
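For what it's worth, the "make debs" route mentioned above might look roughly like this from a Lustre 1.8.3 source tree; the --with-linux path is only an example and would point at a patched kernel tree for a server build (a patchless client build can omit it):

  cd lustre-1.8.3
  sh autogen.sh
  ./configure --with-linux=/usr/src/linux-2.6.18-lustre
  make debs

The resulting .deb packages can then be installed with dpkg like any other Debian package.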
Re: [Lustre-discuss] Group descriptors corrupted
I just went through something similar. When your fsck completes you may be left with things moved to your lost+found. If that happens, you can mount the file system using -t ldiskfs and run ll_recover_lost_found_objs against the lost+found directory.

-- Andrew

-Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Andreas Dilger Sent: Wednesday, May 19, 2010 6:02 PM To: John White Cc: lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] Group descriptors corrupted

On 2010-05-19, at 17:37, John White wrote: A little help here? One OST fails to mount with the following:

  LDISKFS-fs: group descriptors corrupted!
  LustreError: 8364:0:(obd_mount.c:1278:server_kernel_mount()) premount /dev/mpath/lun_13:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
  LustreError: 8364:0:(obd_mount.c:1592:server_fill_super()) Unable to mount device /dev/mpath/lun_13: -22
  LustreError: 8364:0:(obd_mount.c:1997:lustre_fill_super()) Unable to mount (-22)
  LDISKFS-fs error (device dm-13): ldiskfs_check_descriptors: Checksum for group 2560 failed (12546!=45229)

I assume running an e2fsck -fy against the OST is the preferred solution, I just want to confirm.

That would be the normal course of action. Actually, e2fsck -fp is slightly better than e2fsck -fy, since it chooses prudent answers to the questions, instead of yes always, and aborts if there isn't a safe/obvious choice.

'e2fsck -fn [dev]' gives:

  Group descriptor 2560 checksum is invalid. Fix? no
  [...]
  Group descriptor 2687 checksum is invalid. Fix? no

Best to save the full e2fsck -fn output for future reference. If this is the only problem, then no worries, but the checksums may also be invalid because there is other corruption, and this is only the first sign of trouble.

Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
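A minimal sketch of the lost+found recovery step described above, assuming the OST device is the /dev/mpath/lun_13 from the error messages and that a spare mount point exists; the OST must not be mounted as type lustre while this is done:

  mount -t ldiskfs /dev/mpath/lun_13 /mnt/ost
  ll_recover_lost_found_objs -d /mnt/ost/lost+found
  umount /mnt/ost

The tool moves orphaned objects found in lost+found back into their proper object directories so the OST can then be mounted normally.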
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 2010-05-19, at 19:04, Dardo D Kleiner - CONTRACTOR wrote: I'm both personally and professionally encouraged to hear ClusterStor stand up and publicly state intent to support Lustre on SLES. The million dollar question is in regards to *server* support - in particular wrt the 2.x series. As a token of my interest, as well as a testament to my limited ability to maintain this in the long run, I submit the attached patches to the recent 1.10.0.40 beta release that enable building (and confirmed to be runnable) the server on current SLES11 kernel (2.6.27.45-0.1-default). One caveat is that quota support is not compilable and appears to be a bit more difficult job than I can probably manage. And I most certainly didn't run a full regression suite, but a straightforward single stream read/write appears to work fine. In fact, it is best to check bugzilla first and/or post to the list. We update Lustre to match the latest RHEL/SLES kernels for each releases. Work is already underway to address the significant changes in SLES11 SP1, since this moves the kernel from 2.6.27 to 2.6.32. Up to .38 it was mostly monkey work - but .40 introduced additional patches to the RHEL ext4 implementation that has more substantially diverged from the one in current SLES11. Perhaps SLES11SP1 will converge better... That is my hope. More below... There's perhaps a Bugzilla report where this is better posted, and tomorrow I'll look around a bit more for that, but I felt like getting it out there asap. This has been a topic of much interest in my community and I'm starting to feel a bit alone in my desire to keep SLE across the board in my environment. I've invested quite a bit of time and effort there and though many are fine with black box appliances, in our research environment I prefer to have more transparency. To correct any confusion that is out there - there is no statement or intention to stop supporting SLES10/11 on the 1.8 release. There is neither any statement or intention to stop supporting SLES on the client, even in the 2.0 release. There will be no official support from Oracle for SLES11 on the 2.0 servers. Since the SLES11 SP1 kernel is moving to the same base 2.6.32 kernel as RHEL6 (which IS going to be supported in Lustre 2.0) it is my hope that the kernel patches for these two distros will be the same. I applaud that others are stepping forward to start contributing to Lustre, but let's not waste efforts doing the same work multiple times. I think the only way to avoid this is if we continue to communicate in channels such as this list and bugzilla what is being worked on. On 5/19/2010 11:21 AM, Kevin Canady wrote: Quick Public Service Announcement ClusterStor is and will be providing support services for SLES on both 1.8x and 2.x releases. If anyone would like to receive additional information please contact me at kevin.can...@clusterstor.com or 415.505.7701 Best regards, Kevin P. Kevin Canady Vice President, ClusterStor Inc. 415.505.7701 kevin.can...@clusterstor.com On May 19, 2010, at 8:01 AM, Andreas Dilger wrote: I've used a SLES kernel on an FC install for a long time on my home system. With newer distros there are also fewer changes to the base kernel, so there shouldn't be as much trouble to use e.g. the SLES 11 SP1 kernel (2.6.32) when it is released. 
Cheers, Andreas On 2010-05-19, at 6:01, Heiko Schröterschro...@iup.physik.uni-bremen.d e wrote: Am Mittwoch 19 Mai 2010, um 10:33:04 schrieben Sie: On 2010-05-19, at 01:40, Heiko Schröter wrote: we would like to know which way lustre is heading. From the s/w repository we see that only redhat and suse ditros seems to be supported. Is this the official policy of the lustre development to stick to (only) these two distros ? On the client side, we will support the main distros that our customers are using, namely RHEL/OEL/CentOS 5.x (and 6.x after release), and SLES 10/11. We make a best-effort attempt to have the client work with all client kernels, but since our resources are limited we cannot test kernels other than the supported ones. I don't see any huge demand for e.g. an officially-supported Ubuntu client kernel, but there has long been an unofficial Debian lustre package. On the server side, we will continue to support RHEL5.x and SLES10/11 for the Lustre 1.8 release, and RHEL 5.x (6.x is being worked on) for the Lustre 2.x release. Since maintaining kernel patches for other kernels is a lot of work, we do not attempt to provide patches for other than official kernels. However, there have in the past been ports of the kernel patches to other kernels by external contributors (e.g. FC11, FC12, etc) and this will hopefully continue in the future. The server side is the more critical part as we are using gentoo +lustre running a vanilla kernel 2.6.22.19 with the lustre patches version 1.6.6. As far as we are concerned it
Re: [Lustre-discuss] Future of lustre 1.8.3+
On Thu, 2010-05-20 at 14:45 -0600, Andreas Dilger wrote: On 2010-05-20, at 11:33, Ramiro Alba Queipo r...@cttc.upc.edu wrote: Yes. I used them at the beginning, but after lustre 1.8.0.1 vanilla servers are not any more supported to be used as servers and I started to use RH5 kernels with lustre-1.8.1.1, both for clients and servers. That's too bad that the Debiana maintainers have stopped making updates. I don't know that they did. I have to admit that I don't read these too closely, but it seems that 1.8.3 did get accepted into some Debian release (process): http://lists.alioth.debian.org/pipermail/pkg-lustre-maintainers/2010-May/000317.html Maybe, being unfamiliar with the Debian processes, I am mis-understanding what that message is saying. There were others at the same time. The archive for May is at http://lists.alioth.debian.org/pipermail/pkg-lustre-maintainers/2010-May/thread.html Doubly so because we worked to add the fixes needed to build on Debian into the Lustre sources, so it is possible to do make debs to get Debian packages using all of the standard Debian packaging tools. I could be wrong (as I have not inspected the actual Debian packages lately) but I don't think they are using our work there (yet?) unfortunately. :-( I would hope that the diff between us and them would be small, and if not that they would report bugs and submit patches. But perhaps also, part of the problem is that our ability to focus time and effort on that stuff is limited by the allocation of resources to supported aspects of the project. i.e. we can only provide best effort time and effort. Cheers, b. signature.asc Description: This is a digitally signed message part ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Best way to recover an OST
Hi,

We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. Two drives failed in the array within the space of a couple of hours and were replaced. It is questionable whether both drives are actually bad, because we are seeing the same behavior in a test environment where a bad drive is actually causing a good drive to be kicked out of an array. Unfortunately another of the drives encountered IO errors during the resync process and failed, causing the array to go out to lunch. The resync process was attempted two times with the same result.

Fortunately I am able (at least for now) to assemble the array with the existing 8/10 drives, and am able to fsck, mount via ldiskfs and lustre, and am in the process of copying files from the vulnerable OST to a backup location using lfs find --obd target /scratch|cpio -puvdm ...

My question is: what is the best way to restore the OST? Obviously I will need to somehow restore the array to its full 8+2 configuration. Whether we need to start from scratch or use some other means, that is our first priority. But I would like to make the recovery as transparent to the users as possible.

One possible option that we are considering is simply removing the OST from Lustre, fixing the array and copying the recovered files to a newly created OST (not desirable). Another is to fix the OST (not remove it from Lustre), delete the files that exist and then copy the recovered files back. The problem that comes to mind in either scenario is: what happens if a file is part of a striped file? Does it lose its affinity with the rest of the stripe?

Another scenario that we are wondering about: if we mount the OST via ldiskfs and copy everything on the file system to a backup location, fix the array while keeping the same tunefs.lustre configuration, and then move everything back using the same method it was backed up with, will the files be presented to Lustre (MDS and clients) just as they were before, once it is mounted as a Lustre file system again?

Thanks in advance for your advice and help.

Joe Mervini Sandia National Laboratories ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
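To make the copy step above concrete, a hedged sketch of pulling out everything that has objects on one OST; the OST UUID and paths are placeholders (the real UUIDs can be seen with lfs df or lctl dl), and the striping of any individual file can be inspected with lfs getstripe:

  lfs find --obd scratch-OST0002_UUID /scratch > /tmp/ost0002_files
  cd /scratch
  cat /tmp/ost0002_files | cpio -pumd /backup/scratch_ost0002
  lfs getstripe /scratch/some/file      # shows the stripe layout and which OSTs hold the objects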
[Lustre-discuss] Virtualization and Lustre
Has there been any testing or conclusions regarding the use of virtualization and Lustre, or is this even possible considering how Lustre is coded? I've gotten used to the idea of virtualization for all our other servers, where it is great to know we can mount the image on another host very quickly if a hardware problem brings down a machine, and it seems the same would be nice with Lustre... Tyler Hawes Lit Post www.litpost.com ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss