Re: After restarting cluster task disappeared

craig w Fri, 09 Oct 2015 05:31:11 -0700

I'm not sure about #3. I have seen things go awry when restarting the whole
cluster. When doing an upgrade from mesos 0.23.0 to 0.24.1, I restarted all
of the mesos-masters. Waited a few moments for a leader to be elected, then
restarted the slaves. When I went back to look at Marathon all of the tasks
were being redeployed, as though they had all been killed off for some
reason. That wasn't what I expected to happen since the upgrade was supposed
to be as simple as install and restart. Perhaps you're experiencing a
similar issue?
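
The "wait for a leader to be elected" step in the restart sequence above can be scripted instead of timed by hand. What follows is only a minimal Python sketch. It assumes the master's state endpoint (`/master/state.json` on these Mesos versions) returns JSON with a `leader` field that is empty or missing until an election completes. The function names, the port 5050 and the URL in the usage comment are illustrative assumptions, not part of Mesos.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_master_state(url: str) -> dict:
    """Fetch and decode the JSON state document from a master (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def leader_from_state(state: dict) -> Optional[str]:
    """Return the value of the assumed `leader` field, or None if absent or empty."""
    leader = state.get("leader")
    return leader if leader else None


def wait_for_leader(fetch_state: Callable[[], dict],
                    timeout_s: float = 60.0,
                    interval_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_state until a leader is reported; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            leader = leader_from_state(fetch_state())
        except (OSError, ValueError):
            # Master still restarting (connection error) or returned invalid JSON.
            leader = None
        if leader:
            return leader
        if time.monotonic() >= deadline:
            raise TimeoutError("no Mesos leader elected within timeout")
        sleep(interval_s)


# Hypothetical usage, run before restarting the slaves:
#   url = "http://10.1.0.72:5050/master/state.json"
#   print(wait_for_leader(lambda: fetch_master_state(url)))
```

Passing the fetch function in as a parameter keeps the polling loop testable without a live cluster. This only confirms that a leader exists. It does not explain why tasks would be redeployed after the slaves restart.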


On Fri, Oct 9, 2015 at 8:25 AM, Badal Naik <[email protected]> wrote:

> Any idea about #1?
>
> Has anyone experienced #3?
>
> On 09-Oct-2015, at 5:53 pm, craig w <[email protected]> wrote:
>
> With regard to item #2, I saw the same issue. It's been fixed in Mesos
> 0.25 (release candidates are out now), see
> https://issues.apache.org/jira/browse/MESOS-3282.
>
> On Fri, Oct 9, 2015 at 8:16 AM, Badal Naik <[email protected]> wrote:
>
>> Hello Mesos-Users,
>>
>> I have set up a 3-node Mesos cluster on Ubuntu 14.04. I have started
>> ZooKeeper, Mesos and Marathon. Everything is working fine except three things.
>>
>> 1) When I restart the whole cluster, Mesos does not show completed tasks.
>> Is this expected behaviour? If not, what should I do?
>>
>> 2) In the Mesos web UI I'm not able to see the
>> staged/started/finished/killed/failed/lost task counts, even when tasks are
>> running.
>>
>> 3) Every zookeeper instance throws this exception regularly:
>>
>> 2015-10-09 17:27:26,302 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>> java.lang.InterruptedException
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>>
>>
>>
>>
>> *Here is my Mesos-master configuration:*
>>
>> mesos master --ip=10.1.0.72 --work_dir=/var/lib/mesos-master
>> --zk=file:///etc/mesos/conf/zk --quorum=file:///etc/mesos/conf/quorum
>> Where zk=zk://zoo.service.consul:2181/mesos
>>       quorum=2
>>
>>
>>
>> *Mesos-Slave Configuration:*
>>
>> mesos slave --work_dir=/var/lib/mesos-slave --ip=10.1.0.72
>> --hostname=10.1.0.72 --strict=false --master=file:///etc/mesos/conf/master
>> FrameworkInfo.checkpoint=True
>>
>>
>>
>> *Marathon Configuration:*
>>
>> java -jar /opt/marathon.jar  --master zk://zoo.service.consul:2181/mesos
>> --zk zk://zoo.service.consul:2181/marathon  --ha --hostname 10.1.0.72
>> --checkpoint
>>
>>
>>
>>
>> *ZooKeeper configs with java version "1.8.0_45":*
>>
>>
>>
>> dataDir=/var/lib/zookeeper
>> clientPort=2181
>> tickTime=2000
>> initLimit=10
>> syncLimit=20
>>
>>
>> autopurge.purgeInterval=0
>>
>>
>> zookeeper.connection.timeout.ms=6000
>> server.1=10.1.0.70:2888:3888
>> server.2=10.1.0.71:2888:3888
>> server.3=10.1.0.72:2888:3888
>>
>> Each server has been given a different *myid*.
>>
>>
>> Can Anyone Help!!!
>>
>>
>>
>
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>
>


