Hi,

I'm happily progressing toward a working two-node Samba cluster: cman, qdisk, clvm, gfs2, ctdb, samba, winbind, AD.
And now, I'm in testing phase.

When my cluster is up and running, I can move each IP address from one node to the other seamlessly.
The nodes can fence each other.

But I still have one big issue: although the two nodes were set up as clones, they don't behave identically. When I shut down node 1, node 0 takes over every part of the ctdb setup (IPs, recmaster, services). But when I stop the ctdb daemon on node 1, ctdb on node 0 correctly stops its child daemons (nmbd, smbd and winbind) and kills itself, yet node 1 claims:

ctdb_recovery_lock: Failed to get recovery lock on '/ctdb/.ctdb.lock'

(This path is on a shared clvm + gfs2 filesystem, writable and correctly accessible from both nodes.)

This leads to node 1 getting banned.
Then (I guess) when it is unbanned, a re-election occurs, but I get:

Recmaster node 1 no longer available. Force reelection

I suppose node 1 can't become recmaster because it cannot get the recovery lock. But I don't see any reason why taking this lock should fail.

I don't know if this may help, but:
- I removed the lock file, and restarting ctdb recreated it correctly
- Every process runs as root, which can obviously write to this directory
- I don't know whether this is correct, but the file is zero bytes in size

While waiting for your advice, I'm going to read the source code, in the hope of understanding what's wrong.

--
Nicolas Ecarnot
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
