Please file a bug on oss.oracle.com/bugzilla. We've made many fixes in mastery/recovery since 1.2.0. We can add a test to check this issue too.
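
When writing up the report, it may help to note that the status values in the quoted console output below look like negated Linux errno codes. A minimal sketch for decoding them follows; it assumes the generic Linux errno numbering (as on x86), and the table and the `decode_status` helper are illustrative only, not part of any OCFS2 tool.

```python
# Generic Linux errno numbering (assumption: x86-style asm-generic values).
# Only the two codes that appear in the quoted log are listed here.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}


def decode_status(status):
    """Map a negated kernel status such as -107 to (name, description)."""
    return LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))


for status in (-112, -107):
    name, text = decode_status(status)
    print(f"status = {status}: {name} ({text})")
```

Read this way, the first error reports the peer as down and the repeated one reports that the connection to it is gone, which fits the node having been powered off.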

Jonathan Steinert wrote:
> Anyone aware of this problem, and if so is there a fix available?
>
> I have two nodes, alice and bob. On both I have a shared ocfs2 mount at
> /ocfs2. The FS appears to mount and work perfectly fine.
>
> Now on alice I take out an exclusive lock on /dlm/foo/bar and block the
> process forever. Next I start a loop on bob that tries to take out the
> same lock (trylock exclusive mode) once each second, which fails properly.
>
> Now, I unplug alice completely... machine is off. The trylock process on
> bob now hangs permanently, ten seconds pass. The following appears on my
> console for bob:
>
> (0,0):o2net_idle_timer:1293 connection to node kano (num 0) at
> 10.10.0.2:7777 has been idle for 10 seconds, shutting it down.
> (0,0):o2net_idle_timer:1304 here are some times that might help debug
> the situation: (tmr 1144294337.323052 now 1144294347.317365 dr
> 1144294337.323045 adv 1144294337.323053:1144294337.323053 func
> (7b10fddd:505) 1144294324.934836:1144294324.934838)
> (2179,0):o2net_set_nn_state:409 no longer connected to node kano (num 0)
> at 10.10.0.2:7777
> (2492,0):dlm_send_remote_lock_request:264 ERROR: status = -112
> (2492,0):dlm_send_remote_lock_request:264 ERROR: status = -107
> (2492,0):dlm_send_remote_lock_request:264 ERROR: status = -107
>
> The status = -107 message prints approx once every 100ms now forever,
> and a few seconds after this all starts scrolling I get:
>
> (2493,0):ocfs2_replay_journal:1180 Recovering node 0 from slot 0 on
> device (8,2)
> (2492,0):dlm_send_remote_lock_request:264 ERROR: status = -107
> kjournald starting. Commit interval 5 seconds
>
> In the middle of all the scrolling. The trylock process on bob is
> permanently hung and the -107 message continues to scroll.
>
> I have tried using the subversion ocfs2/trunk modules under 2.6.16
> (changed to use mutexes), the modules that come with mainline 2.6.16 and
> the mainline 2.6.16.1. All of these seem to act the same.
>
> OCFS2 Node Manager, DLM, DLMFS all v 1.3.3
> OCFS2-Tools v 1.2.0
>
> The bugreports I've found related to this problem say I need to upgrade
> to -Tools ver 1.0.3, which I think I'm a little past. (Could be wrong)
>
> Thanks,
> Jonathan Steinert
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
