Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Hello!

On Feb 10, 2009, at 12:46 PM, Simon Kelley wrote:
> If, by "the complete event" you mean the "received cancel for
> unknown cookie", there's not much more to tell. Grepping through the
> last month's server logs shows that there are bursts of typically
> between 3 and 7 messages, at the same time and from the same client.
> After a gap, the same thing but from a different client. The number
> can be as low as one, and up to ten. They look to be related to
> client workload, at a guess.

Ok, so you do not see a pattern of this unknown-cookie message followed by an eviction some time later, say 100 seconds? That's what my question was about.

Bye,
    Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
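Oleg's question (does an unknown-cookie message precede an eviction by roughly 100 seconds?) can be checked mechanically against the server log. A rough sketch, assuming the syslog timestamps have already been converted to epoch seconds in the first field — the sample lines below are illustrative stand-ins, not real log output:

```shell
# Build a tiny sample "server log" with epoch timestamps in field 1
# (illustrative lines; real syslog timestamps would need converting first).
sample=$(mktemp)
cat <<'EOF' > "$sample"
1234188000 LustreError: (ldlm_lockd.c:1078:ldlm_handle_cancel()) received cancel for unknown lock cookie
1234188060 LustreError: (ldlm_lockd.c:227:waiting_locks_callback()) ### lock callback timer expired: evicting client
1234189000 LustreError: (ldlm_lockd.c:1078:ldlm_handle_cancel()) received cancel for unknown lock cookie
EOF

# Report any eviction that happens within 100s of an unknown-cookie message.
result=$(awk '
/unknown lock cookie/ { last = $1 }
/evicting client/ && last != "" && ($1 - last) <= 100 {
    printf "eviction at %s follows unknown-cookie at %s (%ds)\n", $1, last, $1 - last
    last = ""
}' "$sample")
echo "$result"
rm -f "$sample"
```

Run against a month of real logs, an empty result would answer Oleg's question in the negative.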
Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Oleg Drokin wrote:
> Hello!
>
> On Feb 10, 2009, at 12:11 PM, Simon Kelley wrote:
>>>> We are also seeing some userspace file operations fail with the error
>>>> "No locks available". These don't generate any logging on the client
>>>> so I don't have exact timing. It's possible that they are associated
>>>> with further "### lock callback timer expired" server logs.
>>>
>>> This error code typically means an application attempting to do some
>>> I/O and Lustre has no lock for the I/O area for some reason anymore
>>> (it is normally obtained once the read or write path is entered), and
>>> that could be related to evictions too (locks are revoked at eviction
>>> time).
>>
>> I should have mentioned that we are also seeing many errors of the
>> form "LustreError: 19842:0:(ldlm_lockd.c:1078:ldlm_handle_cancel())
>> received cancel for unknown lock cookie." Checking back, these would
>> seem to pre-date the introduction of 1.6 clients and even after we
>> upgraded clients I can see them associated with both 1.4 and 1.6
>> clients. They may indicate something else relevant about the
>> filesystems or workload.
>
> Hm, that means clients hold some locks that the server does not believe
> they have, which is pretty strange. Or it just does not recognize the
> lock released by the client and later evicts the client. If you have a
> complete kernel log of the event, that might be useful to see the
> sequence of events.

If, by "the complete event" you mean the "received cancel for unknown cookie", there's not much more to tell. Grepping through the last month's server logs shows that there are bursts of typically between 3 and 7 messages, at the same time and from the same client. After a gap, the same thing but from a different client. The number can be as low as one, and up to ten. They look to be related to client workload, at a guess. Picking a few events and looking at the client logfile for the same time gives absolutely nothing at all.

> I assume you do not have a flaky network and your clients do not
> reconnect all the time to the servers with messages in logs like
> 'changed handle from X to Y; copying, but this may foreshadow
> disaster', which would be a different bug no longer present in 1.6.6,
> too.

None of that applies.

Simon.
Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Hello!

On Feb 10, 2009, at 12:11 PM, Simon Kelley wrote:
>>> We are also seeing some userspace file operations fail with the error
>>> "No locks available". These don't generate any logging on the client
>>> so I don't have exact timing. It's possible that they are associated
>>> with further "### lock callback timer expired" server logs.
>>
>> This error code typically means an application attempting to do some
>> I/O and Lustre has no lock for the I/O area for some reason anymore
>> (it is normally obtained once the read or write path is entered), and
>> that could be related to evictions too (locks are revoked at eviction
>> time).
>
> I should have mentioned that we are also seeing many errors of the
> form "LustreError: 19842:0:(ldlm_lockd.c:1078:ldlm_handle_cancel())
> received cancel for unknown lock cookie." Checking back, these would
> seem to pre-date the introduction of 1.6 clients and even after we
> upgraded clients I can see them associated with both 1.4 and 1.6
> clients. They may indicate something else relevant about the
> filesystems or workload.

Hm, that means clients hold some locks that the server does not believe they have, which is pretty strange. Or it just does not recognize the lock released by the client and later evicts the client. If you have a complete kernel log of the event, that might be useful to see the sequence of events.

I assume you do not have a flaky network and your clients do not reconnect all the time to the servers with messages in logs like 'changed handle from X to Y; copying, but this may foreshadow disaster', which would be a different bug, no longer present in 1.6.6, too.

Bye,
    Oleg
Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Oleg Drokin wrote:
> What would be useful here is if you can enable dlm tracing (echo
> +dlm_trace >/proc/sys/lnet/debug) on some of those 1.6 nodes (also, if
> you are running with no debug enabled at all, also enable rpc_trace and
> info levels) and also enable the "dump on eviction" feature
> (echo 1 >/proc/sys/lustre/dump_on_eviction).
> Then when the next eviction happens, there would be some useful debug
> data dumped on the client, that you can attach to a bugzilla bug along
> with the server-side eviction message (processed with the "lctl dl"
> command first).

OK, will do. The main problem is reproducing the error: our users have unreasonably insisted that we run their jobs using known-good 1.4 clients, and even if I grab their code to run on isolated test nodes _most_ runs are fine.

>> We are also seeing some userspace file operations fail with the error
>> "No locks available". These don't generate any logging on the client
>> so I don't have exact timing. It's possible that they are associated
>> with further "### lock callback timer expired" server logs.
>
> This error code typically means an application attempting to do some
> I/O and Lustre has no lock for the I/O area for some reason anymore (it
> is normally obtained once the read or write path is entered), and that
> could be related to evictions too (locks are revoked at eviction time).

I should have mentioned that we are also seeing many errors of the form "LustreError: 19842:0:(ldlm_lockd.c:1078:ldlm_handle_cancel()) received cancel for unknown lock cookie." Checking back, these would seem to pre-date the introduction of 1.6 clients, and even after we upgraded clients I can see them associated with both 1.4 and 1.6 clients. They may indicate something else relevant about the filesystems or workload.

Cheers,
Simon.
Re: [Lustre-discuss] OST went down running Lustre 1.6.6
On Mon, 2009-02-09 at 13:39 -0500, Brian Stone wrote:
> Some clients are still evicted, but not all. Why would the recovery
> complete if all clients did not reconnect?

Recovery can't wait indefinitely for all clients to connect; if it did, you could wind up with a recovery that never completes. After a timeout, if all clients haven't connected, recovery is aborted and the target proceeds to the completion state so that it can respond to new requests.

b.
Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Hello!

On Feb 10, 2009, at 5:17 AM, Simon Kelley wrote:
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 11-0: an error occurred
> while communicating with 172.31.96...@tcp. The obd_ping operation
> failed with -107
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: Skipped 12 previous
> similar messages
> Feb 9 14:05:30 sf-2-3-10 kernel: Lustre:
> OSC_sf2-sfs2.internal.sanger.ac.uk_sf2-sfs-ost495_MNT_client_tcp-81021f897800:
> Connection to service sf2-sfs-ost495 via nid 172.31.96...@tcp was lost;
> in progress operations using this service will wait for recovery to
> complete.
> Feb 9 14:05:30 sf-2-3-10 kernel: Lustre: Skipped 4 previous similar
> messages
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 167-0: This client was
> evicted by sf2-sfs-ost495; in progress operations using this service
> will fail.

What would be useful here is if you can enable dlm tracing (echo +dlm_trace >/proc/sys/lnet/debug) on some of those 1.6 nodes (also, if you are running with no debug enabled at all, also enable rpc_trace and info levels), and also enable the "dump on eviction" feature (echo 1 >/proc/sys/lustre/dump_on_eviction). Then when the next eviction happens, there will be some useful debug data dumped on the client, which you can attach to a bugzilla bug along with the server-side eviction message (processed with the "lctl dl" command first).

> We are also seeing some userspace file operations fail with the error
> "No locks available". These don't generate any logging on the client
> so I don't have exact timing. It's possible that they are associated
> with further "### lock callback timer expired" server logs.

This error code typically means an application attempting to do some I/O when Lustre no longer has a lock for the I/O area for some reason (it is normally obtained once the read or write path is entered), and that could be related to evictions too (locks are revoked at eviction time).

Bye,
    Oleg
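The steps Oleg describes can be collected into one small script. The /proc paths are the 1.6-era locations quoted in his message, and the writes are guarded so the sketch is a harmless no-op on nodes without the Lustre /proc tree — a sketch of the advice above, not a tested procedure:

```shell
# Enable extra Lustre debug data ahead of reproducing an eviction,
# per the advice above (1.6-era /proc paths). Guarded so this is a
# no-op where Lustre isn't loaded.
enable_lustre_debug() {
    debug=/proc/sys/lnet/debug
    dump=/proc/sys/lustre/dump_on_eviction
    if [ -w "$debug" ]; then
        echo +dlm_trace > "$debug"   # DLM lock tracing
        echo +rpc_trace > "$debug"   # RPC tracing (if debug was fully off)
        echo +info      > "$debug"   # info level
    fi
    if [ -w "$dump" ]; then
        echo 1 > "$dump"             # dump a debug log on eviction
    fi
    return 0
}
enable_lustre_debug
```

Run it on the 1.6 client nodes before trying to reproduce the eviction; the resulting client-side dump is what gets attached to the bugzilla bug.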
Re: [Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 1:45 PM, Johann Lombardi wrote:
> 38 = MDS_CONNECT. The client tries to reach the MDT via
> 192.168.22...@o2ib, whereas I think it should use tcp to access the
> lustre filesystem of the remote cluster. Is my understanding of your
> configuration correct?

That's right, it should use 141.34.228...@tcp0 instead.

Regards, Götz
--
AL I:40: Do what thou wilt shall be the whole of the Law.
Re: [Lustre-discuss] problem mounting two different lustre instances
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
> Hello everyone,
> .
> My client has this in modprobe.conf:
> options lnet networks=o2ib,tcp
> I'm trying to mount the remote network with
> mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
> and the command just hangs, the error is this:
> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
> status -113 r...@0100dfc2ac00 x7/t0

The outgoing message failed with -113 (EHOSTUNREACH). What does "lctl list_nids" say on the client?

Also, please:

echo +neterror > /proc/sys/lnet/printk

so that more network errors go onto the console.

Isaac
Re: [Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 09:38:13AM +0100, Götz Waschk wrote:
> Lustre: Added LNI 192.168.224...@o2ib [8/64]
> Lustre: Added LNI 141.34.216...@tcp [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; http://www.lustre.org/
> Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback())
> 192.168.22...@o2ib: ROUTE ERROR -22
> Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed())
> Deleting messages for 192.168.22...@o2ib: connection failed
> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
> status -113 r...@010037eeac00 x7/t0
> o38->atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
> dl 1234255029 ref 2 fl Rpc:/0/0 rc 0/0

38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22...@o2ib, whereas I think it should use tcp to access the lustre filesystem of the remote cluster. Is my understanding of your configuration correct?

Johann
Re: [Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 1:44 AM, Isaac Huang wrote:
> On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
>> My client has this in modprobe.conf:
>> options lnet networks=o2ib,tcp
>> I'm trying to mount the remote network with
>> mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
>> and the command just hangs, the error is this:
>> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
>> status -113 ...@0100dfc2ac00 x7/t0

Hi Isaac,

> The outgoing message failed with -113 (EHOSTUNREACH). What does "lctl
> list_nids" say on the client?

On that client, the output is:

192.168.224...@o2ib
141.34.216...@tcp

> Also, please:
> echo +neterror > /proc/sys/lnet/printk
> So that more network errors would go onto console.

OK, after the next mount attempt I have this in the log now:

Lustre: OBD class driver, http://www.lustre.org/
Lustre Version: 1.6.6
Build Version: 1.6.6-1970010101-PRISTINE-.usr.src.redhat.BUILD.lustre-1.6.6.kernel-2.6.9-78.0.13.ELsmp
Lustre: Added LNI 192.168.224...@o2ib [8/64]
Lustre: Added LNI 141.34.216...@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 192.168.22...@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed()) Deleting messages for 192.168.22...@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@010037eeac00 x7/t0 o38->atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255029 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network error (sent at 1234255024, 0s ago) r...@010037eeac00 x7/t0 o38->atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255029 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x7 sent from atlas-MDT-mdc-0107fc2ee400 to NID 192.168.22...@o2ib 0s ago has timed out (limit 5s).
Lustre: 9264:0:(import.c:410:import_select_connection()) atlas-MDT-mdc-0107fc2ee400: tried all connections, increasing latency to 5s
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 192.168.22...@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed()) Deleting messages for 192.168.22...@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 r...@01080325a400 x10/t0 o38->atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255054 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network error (sent at 1234255049, 0s ago) r...@01080325a400 x10/t0 o38->atlas-mdt_u...@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255054 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x10 sent from atlas-MDT-mdc-0107fc2ee400 to NID 192.168.22...@o2ib 0s ago has timed out (limit 5s).

Regards, Götz
--
AL I:40: Do what thou wilt shall be the whole of the Law.
Re: [Lustre-discuss] speedy server shutdown
On Feb 08, 2009 23:28 -0500, Robin Humble wrote:
> when shutting down our OSS's and then MDS's we often wait 330s for each
> set of umount's to finish eg.
> Feb 2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs,
> waiting for 330 secs...
> Feb 2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs,
> waiting for 325 secs...
> ...
> is there a way to speed this up?

Please search bugzilla for this; I think there was a bug fixed in more recent versions.

> we're interested in the (perhaps unusual) case where all clients are gone
> because the power has failed, and the Lustre servers are running on UPS
> and need to be shut down ASAP.
>
> the tangible reward for a quick shutdown is that we can buy a lower
> capacity (cheaper) UPS if we can reliably and cleanly shutdown all the
> Lustre servers in <10mins, and preferably <3 minutes. if we're tweaking
> timeouts to do this then hopefully we can tweak them just before the
> shutdown and avoid running short timeouts in normal operation.
>
> I'm probably missing something obvious, but I have looked through a
> bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the
> Operations Manual and I can't actually see where the default 330s comes
> from... ???
> it seems to be quite repeatable for both OSS's and MDS's.
>
> we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on
> clients with x86_64 RHEL 5.2 everywhere.
> thanks for any help!
>
> cheers,
> robin

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
[Lustre-discuss] OST went down running Lustre 1.6.6
Hi,

We are running Lustre 1.6.6 on SLES 10 SP2, and an OST crashed due to a SCSI failure. The recovery completed, but not all of the clients reconnected, i.e.:

status: COMPLETE
recovery_start: 1233712363
recovery_duration: 320
completed_clients: 2765/5892
replayed_requests: 0
last_transno: 45914950

Some clients are still evicted, but not all. Why would the recovery complete if all clients did not reconnect?

Thanks,
Brian Stone
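A quick way to flag this situation from a monitoring script is to parse the recovery_status text (on a 1.6 system it typically lives under /proc/fs/lustre for each target) and report the shortfall. A small sketch using the values quoted above; the awk field splitting assumes the layout shown:

```shell
# Parse a recovery_status excerpt (values quoted from the message above)
# and report how many clients missed the recovery window.
status='status: COMPLETE
recovery_start: 1233712363
recovery_duration: 320
completed_clients: 2765/5892
replayed_requests: 0
last_transno: 45914950'

completed=$(printf '%s\n' "$status" | awk -F'[:/ ]+' '/^completed_clients/ {print $2}')
expected=$(printf '%s\n' "$status" | awk -F'[:/ ]+' '/^completed_clients/ {print $3}')
missed=$((expected - completed))
echo "$missed of $expected clients did not reconnect before recovery ended"
```

Any nonzero shortfall means some clients will be evicted when they next contact the target, which matches the behaviour described in the reply.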
[Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
We're seeing problems which seem to be lock-related and result in data loss. This is a fairly low-probability event: it's happening on a ~1000-core compute farm which is heavily loaded, and the frequency is of the order of tens of failures an hour.

The clients are running kernel 2.6.22 and Lustre 1.6.6 from Sun. The servers are HP SFS, running Lustre 1.4.11. The mixed versions are deliberate: we're in the process of adding servers running 1.6.6, but none are currently in use: all servers are 1.4.11. 1.4.11 clients don't see problems, and there are some old 1.4.11 clients left still using these filesystems at the same time as the new 1.6.6 ones.

On the server (OST), an event typically looks like this:

2009/02/09 14:05:10 kerni sf2-sfs33: LustreError: 0:0:(ldlm_lockd.c:227:waiting_locks_callback()) ### lock callback timer expired: evicting client 4cdd29e6-8b9e-7e81-5d72-2a055f0e2...@net_0x2ac1f6eee_uuid nid 172.31.110@tcp ns: filter-sf2-sfs-ost495_UUID lock: 010048ae0100/0x5fd2cf129ee3e5bb lrc: 1/0 , 0 mode: PW/PW res: 1542868/0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0xc8fcb3ab0d8d expref: 386 pid: 16626
2009/02/09 14:05:30 kerni sf2-sfs33: LustreError: 16577:0:(ldlm_lib.c:1364:target_send_reply_msg()) @@@ processing error (-107) r...@010073bbce00 x2138818/t0 o400->@:-1 lens 64/0 ref 0 fl Interpret:/0/0 rc -107/0

and on the client at the same time:

Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 11-0: an error occurred while communicating with 172.31.96...@tcp. The obd_ping operation failed with -107
Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: Skipped 12 previous similar messages
Feb 9 14:05:30 sf-2-3-10 kernel: Lustre: OSC_sf2-sfs2.internal.sanger.ac.uk_sf2-sfs-ost495_MNT_client_tcp-81021f897800: Connection to service sf2-sfs-ost495 via nid 172.31.96...@tcp was lost; in progress operations using this service will wait for recovery to complete.
Feb 9 14:05:30 sf-2-3-10 kernel: Lustre: Skipped 4 previous similar messages
Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 167-0: This client was evicted by sf2-sfs-ost495; in progress operations using this service will fail.

It gets worse: the same client, a few minutes later, saw this on another OST in the same filesystem:

2009/02/09 14:08:30 kerni sf2-sfs34: LustreError: 0:0:(ldlm_lockd.c:227:waiting_locks_callback()) ### lock callback timer expired: evicting client 4cdd29e6-8b9e-7e81-5d72-2a055f0e2...@net_0x2ac1f6eee_uuid nid 172.31.110@tcp ns: filter-sf2-sfs-ost496_UUID lock: 01006b8bfa40/0x900da6275f064529 lrc: 1/0 , 0 mode: PW/PW res: 1542328/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0xc8fcb3ab1478 expref: 172 pid: 16089
2009/02/09 14:08:30 kerni sf2-sfs33: LustreError: 0:0:(ldlm_lockd.c:227:waiting_locks_callback()) ### lock callback timer expired: evicting client f3275d1e-1360-e139-e71c-44b283d7e...@net_0x2ac1f665d_uuid nid 172.31.102...@tcp ns: filter-sf2-sfs-ost495_UUID lock: 01005a810040/0x5fd2cf129ee3e9f1 lrc: 1/0 , 0 mode: PW/PW res: 1542868/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0xa899b6e20a06a83e expref: 316 pid: 16454
2009/02/09 14:08:30 kerni sf2-sfs33: LustreError: 16335:0:(ldlm_lockd.c:606:ldlm_server_completion_ast()) ### enqueue wait took 399491082us from 1234188111 ns: filter-sf2-sfs-ost495_UUID lock: 010064a638c0/0x5fd2cf129ee3ea68 lrc: 2/0 , 0 mode: PW/PW res: 1542868/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0xc8423e042b195e36 expref: 501 pid: 16493
2009/02/09 14:08:32 storage w sf2-sfs1 : Warning: event log (/var/hpls/evlog/eventlog) size is 77760824
2009/02/09 14:08:50 kerni sf2-sfs34: LustreError: 16253:0:(ldlm_lib.c:1364:target_send_reply_msg()) @@@ processing error (-107) r...@0100b3ea4a00 x2150055/t0 o400->@:-1 lens 64/0 ref 0 fl Interpret:/0/0 rc -107/0
2009/02/09 14:09:02 kerni sf2-sfs33: LustreError: 16566:0:(ldlm_lib.c:1364:target_send_reply_msg()) @@@ processing error (-107) r...@01006b7b3200 x5180156/t0 o400->@:-1 lens 64/0 ref 0 fl Interpret:/0/0 rc -107/0

whilst the client suffered an assertion failure and LBUG:

Feb 9 14:08:50 sf-2-3-10 kernel: LustreError: 11-0: an error occurred while communicating with 172.31.96...@tcp. The obd_ping operation failed with -107
Feb 9 14:08:50 sf-2-3-10 kernel: Lustre: OSC_sf2-sfs2.internal.sanger.ac.uk_sf2-sfs-ost496_MNT_client_tcp-81021f897800: Connection to service sf2-sfs-ost496 via nid 172.31.96...@tcp was lost; in progress operations using this service will wait for recovery to complete.
Feb 9 14:08:50 sf-2-3-10 kernel: LustreError: 167-0: This client was evicted by sf2-sfs-ost496; in progress operations using this service will fail.
Feb 9 14:08:50 sf-2-3-10 kernel: LustreError: 29564:0:(import.c:241:ptlrpc_invalidate_import()) sf2-sfs-ost495_UUID: rc = -110 wai
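For triage it can help to pull the evicted client's UUID and NID out of these server-side lines with standard tools, so evictions can be tallied per client. A minimal sketch against the first excerpt above; the sed patterns assume the 1.4.11 message format as quoted and may need adjusting:

```shell
# Extract the evicted client UUID and NID from a server-side
# "lock callback timer expired" line (format as quoted above).
line='LustreError: 0:0:(ldlm_lockd.c:227:waiting_locks_callback()) ### lock callback timer expired: evicting client 4cdd29e6-8b9e-7e81-5d72-2a055f0e2...@net_0x2ac1f6eee_uuid nid 172.31.110@tcp ns: filter-sf2-sfs-ost495_UUID'
client=$(printf '%s\n' "$line" | sed -n 's/.*evicting client \([^ ]*\) nid .*/\1/p')
nid=$(printf '%s\n' "$line" | sed -n 's/.* nid \([^ ]*\) ns:.*/\1/p')
echo "client=$client nid=$nid"
```

Piped over a day's logs, this makes it easy to see whether the evictions cluster on a few nodes or are spread across the farm.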