On 04/08/2011 08:01 AM, Werner Flamme wrote:
> Joel, I think I encountered otherwise :-(
>
> In our OCFS2 cluster there are up to 15 active nodes. 7 of them were
> running yesterday (2 with Oracle Linux, 5 with SLES 11 SP1 + SLE HAE
> SP1). When applying the last patches of the SLE HAE, the patched nodes
> talked dlm 1.1 and did not fall back. They silently unmounted three
> volumes, the fourth volume stayed connected (there was no data on it).
> In the logs, I found entries like
>
> dlm_query_join_proto_check:734 Node 5 wanted to join with DLM locking
> protocol 1.0, but we have 1.1, disallowing
> o2net: connection to node vocnod9 (num 5) at 141.65.171.51:7777
> shutdown, state 8
> o2net: no longer connected to node vocnod9 (num 5) at 141.65.171.51:7777
>
> and node 5 lost access to at least one volume straight after this.
>
> Maybe a specialty of SUSE, I don't know. I could not get the nodes to
> communicate with the rest of the cluster, I had to undo all of the
> patches provided from HAE SP1-update repo and to reboot before it worked
> again. Maybe the libdlm3 package was the culprit. I opened a support
> case at Novell and will report back what they say...

Yes, this is a bug and we have a fix for this headed to mainline.

http://oss.oracle.com/pipermail/ocfs2-devel/2011-April/007996.html

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users