Re: [Lustre-discuss] MDS can't recover OSTs
On Thu, Apr 28, 2011 at 12:47:02PM -0400, Charles Taylor wrote:
> Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on 10.13.24.92@o2ib failed (-16)
> Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on 10.13.24.91@o2ib failed (-16)

Both OST0007 and OST0013 return EBUSY (-16). Are there any messages or watchdogs in the OSS logs (i.e. 10.13.24.9{1,2}@o2ib)?

Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] MDS can't recover OSTs
We had a RAID array barf this morning, resulting in some OST corruption which appeared to be successfully repaired with a combination of fsck and ll_recover_lost_found_objs. The OSTs mounted OK, but the MDS can't seem to recover its connection to two of the OSTs: we are seeing a continuing stream of the following in the MDS syslog.

Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(recover.c: 67:ptlrpc_initiate_recovery()) crn-OST0013_UUID: starting recovery
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 608:ptlrpc_connect_import()) 810117426000 crn-OST0013_UUID: changing import state from DISCONN to CONNECTING
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 470:import_select_connection()) crn-OST0013-osc: connect to NID 10.13.24.92@o2ib last attempt 22689204132
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 544:import_select_connection()) crn-OST0013-osc: import 810117426000 using connection 10.13.24.92@o2ib/10.13.24.92@o2ib
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1091:ptlrpc_connect_interpret()) 810117426000 crn-OST0013_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on 10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1091:ptlrpc_connect_interpret()) 81012e50d000 crn-OST0007_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on 10.13.24.91@o2ib failed (-16)

We never see an 'oscc recovery finished' message on crnmds for OST0007 or OST0013. We have not seen this problem before, so we are trying to figure out how to get the MDT reconnected to these two OSTs. Has anyone else been through this before?

Thanks,

Charlie Taylor
UF HPC Center
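
For anyone scanning a large MDS syslog for the same symptom, here is a minimal sketch of how the "recovery of ... failed (-16)" lines quoted in this thread could be picked out and the return code decoded. It assumes the message wording matches the lines above and that the number in parentheses is a negated Linux errno; the function name `failed_recoveries` and the regular expression are illustrative and not part of Lustre.

```python
import errno
import re

# Pattern for the "recovery of <target> on <NID> failed (<rc>)" messages
# quoted in this thread. The exact wording is assumed from those lines.
RECOVERY_FAILED = re.compile(
    r"recovery of (?P<target>\S+) on (?P<nid>\S+) failed \((?P<rc>-?\d+)\)"
)


def failed_recoveries(log_text):
    """Return (target, nid, errno_name) for each failed-recovery message."""
    results = []
    for match in RECOVERY_FAILED.finditer(log_text):
        rc = int(match.group("rc"))
        # Kernel code reports errors as negative errno values, so take the
        # absolute value before looking up the symbolic name.
        name = errno.errorcode.get(abs(rc), "UNKNOWN")
        results.append((match.group("target"), match.group("nid"), name))
    return results


# Two lines copied from the MDS syslog in the original post.
sample = (
    "Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:"
    "ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on "
    "10.13.24.92@o2ib failed (-16)\n"
    "Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 1137:"
    "ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on "
    "10.13.24.91@o2ib failed (-16)\n"
)

for target, nid, name in failed_recoveries(sample):
    print(target, nid, name)
```

The errno names come from the Python interpreter's own platform table, so this is best run on a Linux host, where 16 maps to EBUSY as Johann notes. The NIDs it reports point at the OSS nodes whose logs are worth checking next.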