Re: [Lustre-discuss] MDS can't recover OSTs

2011-05-02 Thread Johann Lombardi
On Thu, Apr 28, 2011 at 12:47:02PM -0400, Charles Taylor wrote:
> Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on 10.13.24.92@o2ib failed (-16)
> Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on 10.13.24.91@o2ib failed (-16)

Both OST0007 and OST0013 return EBUSY (-16). Are there any messages or
watchdogs in the OSS logs (i.e. on 10.13.24.9{1,2}@o2ib)?
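
For reference, the "(-16)" in these messages is a negated Linux errno value. A minimal sketch for decoding such codes with Python's standard library follows; the helper name decode_rc is hypothetical and is not part of Lustre.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Translate a kernel-style negative return code into its errno symbol.

    decode_rc is an illustrative helper, not a Lustre or kernel API.
    """
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(decode_rc(-16))
```

On a Linux host this should report the EBUSY symbol together with the platform's description string.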

Johann

-- 
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDS can't recover OSTs

2011-04-28 Thread Charles Taylor

We had a RAID array barf this morning, resulting in some OST corruption
which appeared to be successfully repaired with a combination of fsck
and ll_recover_lost_found_objs. The OSTs mounted OK, but the MDS
can't seem to recover its connection to two of the OSTs: we are
seeing a continuing stream of the following in the MDS syslog.

Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(recover.c:67:ptlrpc_initiate_recovery()) crn-OST0013_UUID: starting recovery
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:608:ptlrpc_connect_import()) 810117426000 crn-OST0013_UUID: changing import state from DISCONN to CONNECTING
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:470:import_select_connection()) crn-OST0013-osc: connect to NID 10.13.24.92@o2ib last attempt 22689204132
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:544:import_select_connection()) crn-OST0013-osc: import 810117426000 using connection 10.13.24.92@o2ib/10.13.24.92@o2ib
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1091:ptlrpc_connect_interpret()) 810117426000 crn-OST0013_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on 10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1091:ptlrpc_connect_interpret()) 81012e50d000 crn-OST0007_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on 10.13.24.91@o2ib failed (-16)

It seems that we never see an 'oscc recovery finished' message on
crnmds for OST0007 or OST0013.
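
To see at a glance which targets keep failing, the syslog can be tallied per target. The following is only a rough sketch: the regular expression is derived from the "recovery of ... failed" lines quoted above, and the names FAILED_RE, summarize_failures and SAMPLE are made up for illustration. Whitespace is collapsed first so that wrapped log lines still match.

```python
import re
from collections import Counter

# Pattern inferred from the MDS syslog lines in this thread (an assumption,
# not an official Lustre log grammar).
FAILED_RE = re.compile(
    r"recovery of (?P<target>\S+) on (?P<nid>\S+) failed \((?P<rc>-?\d+)\)"
)


def summarize_failures(log_text: str) -> Counter:
    """Count failed recovery attempts per (target, NID, return code)."""
    # Join wrapped lines and squeeze repeated spaces before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (m["target"], m["nid"], int(m["rc"])) for m in FAILED_RE.finditer(flat)
    )


# Two entries copied from the log above, wrapped as they appeared in the mail.
SAMPLE = """\
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:
1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on
10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:
1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on
10.13.24.91@o2ib failed (-16)
"""

if __name__ == "__main__":
    for (target, nid, rc), count in summarize_failures(SAMPLE).items():
        print(f"{target} via {nid}: rc={rc}, {count} failure(s)")
```

Feeding the whole MDS syslog through summarize_failures instead of SAMPLE would show whether any targets other than these two are affected.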

We have not seen this problem before so we are trying to figure out  
how to get the MDT reconnected to these two OSTs.

Has anyone else been through this before?

Thanks,

Charlie Taylor
UF HPC Center


