On 2015/11/25 13:57, Joe Smith wrote:
In retrospect I should've (might still be able to one of these days)
open sourced the tool we used to migrate mesos masters. That said,
overall the process suggested so far is correct.

To validate that the new host has joined, you can tail the master log file for
"Successfully joined the Paxos group"
<https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/src/log/recover.cpp#L578>
to confirm that replicated log recovery has completed for that machine.
Once that happens, feel free to move on to adding the next host.
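
As a rough sketch of that log check, a small shell helper could look like the following. The function name and the log path in the commented loop are assumptions for illustration, not something Mesos ships; point it at wherever your master actually writes its log. If the log file survives from an earlier run, an old occurrence of the message would also match.

```shell
# Hypothetical helper: succeeds (exit status 0) once the given master log
# contains the recovery message quoted above.
has_joined_paxos() {
    grep -q "Successfully joined the Paxos group" "$1"
}

# Illustrative polling loop; the path and interval are assumptions:
# until has_joined_paxos /var/log/mesos/mesos-master.INFO; do sleep 5; done
```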

Why does the upgraded master have to recover after a reboot?
As I understand it, after a reboot the master first *detects* who the elected leader is; if there is a valid one, it then *contends* by grabbing a new ID ("Joining the ZK group").

https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/src/zookeeper/contender.cpp#L147


On Tue, Nov 24, 2015 at 9:40 PM, Du, Fan <fan...@intel.com> wrote:



    On 2015/11/24 9:47, Chengwei Yang wrote:

        Hi all,

        We're using Mesos in production on CentOS 6 and plan to upgrade
        to CentOS 7.1. To avoid affecting any tasks running on Mesos,
        we're about to replace all mesos-masters on the fly.

        The procedure is listed below:

        0. 3 mesos-masters running on CentOS 6
        1. shut down 1 mesos-master (CentOS 6) and bring up 1
        mesos-master (CentOS 7);
             wait some time for the new master to sync (is there any
        simple way to know when it is done?)


    Log in to the web page of the upgraded master; it will redirect you to
    the leading master
    if the upgraded master has joined the cluster successfully.

    Or, for a script-friendly way, you can ask the ZooKeeper server on
    the upgraded
    master host for its role with:

    echo stat | nc UPGRADED_MASTER_ZOOKEEPER_IP 2181

    it will report something like this:

    Latency min/avg/max: 0/0/6
    Received: 105
    Sent: 105
    Connections: 2
    Outstanding: 0
    Zxid: 0x30000001c
    Mode: *follower*
    Node count: 13
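
For scripting, the "Mode:" line can be pulled out of that output. This is a minimal sketch; `zk_mode` is a hypothetical helper name, and it only parses text on stdin, so it reports the role of the ZooKeeper server that was queried rather than anything Mesos-specific.

```shell
# Hypothetical helper: prints the value of the "Mode:" line from ZooKeeper
# `stat` output read on stdin (for example "follower" or "leader").
zk_mode() {
    sed -n 's/^ *Mode: *//p'
}

# Illustrative use, with the same placeholder host as above:
# echo stat | nc UPGRADED_MASTER_ZOOKEEPER_IP 2181 | zk_mode
```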



        2. repeat step 1

        NOTE: we plan to shut down the non-leaders first, and shut down the
        leader (CentOS 6)
        last.
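
The ordering in that note (non-leaders first, leader last) can be sketched as a tiny helper. The function name and the host names in the example are made up for illustration; how you discover the current leader is left out.

```shell
# Hypothetical helper: given the current leader followed by the list of all
# masters, prints the hosts in replacement order, one per line, leader last.
replacement_order() {
    leader=$1
    shift
    for host in "$@"; do
        # Skip the leader here so it is only emitted at the end.
        [ "$host" = "$leader" ] || echo "$host"
    done
    echo "$leader"
}

# Illustrative use with made-up host names:
# replacement_order master2 master1 master2 master3
```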

        Can we do it this way? Or are there any better suggestions?

