RE: After restarting cluster task disappeared

Badal Naik Fri, 09 Oct 2015 09:20:47 -0700

Got it. I built Mesos from source, so I needed to initialize the log-dir 
manually once.
Now things are working. Thank you.

-----Original Message-----
From: "Badal Naik" <[email protected]>
Sent: 09/10/2015 06:38 PM
To: "[email protected]" <[email protected]>
Subject: Re: After restarting cluster task disappeared

No, I have not deleted anything.
I am just restarting the physical nodes, and afterwards I am not able to see 
the old completed tasks. Also, after many restarts Marathon and Mesos stay out 
of sync; when I restart the Marathon service, both get synced again.

On 09-Oct-2015, at 6:32 pm, haosdent <[email protected]> wrote:

For #1, did you delete anything in your work_dir or ZooKeeper?
For #3, is this ZooKeeper issue related to yours?
http://stackoverflow.com/questions/15842553/zookeeper-network-ensemble-does-not-start-appropiately

On Fri, Oct 9, 2015 at 8:30 PM, craig w <[email protected]> wrote:

I'm not sure about #3. I have seen things go awry when restarting the whole 
cluster. When upgrading from Mesos 0.23.0 to 0.24.1, I restarted all of the 
mesos-masters, waited a few moments for a leader to be elected, then restarted 
the slaves. When I went back to look at Marathon, all of the tasks were being 
redeployed, as though they had all been killed off for some reason. That wasn't 
what I expected to happen, since the upgrade was supposed to be as simple as 
install and restart. Perhaps you're experiencing a similar issue?

On Fri, Oct 9, 2015 at 8:25 AM, Badal Naik <[email protected]> wrote:

Any idea about #1?

Has anyone experienced #3?

On 09-Oct-2015, at 5:53 pm, craig w <[email protected]> wrote:

With regard to item #2, I saw the same issue. It has been fixed in Mesos 0.25 
(release candidates are out now); see 
https://issues.apache.org/jira/browse/MESOS-3282.
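
To tell whether a given master already runs a release with that fix, the 
version it reports can be read from its state endpoint. A minimal sketch, 
assuming the master listens on the default port 5050, that /master/state.json 
is served by the release in use, and that curl is installed. The helper name 
mesos_version is hypothetical, and the address is the one from the original 
post in this thread:

```shell
# Hypothetical helper: pull the first "version" field out of the JSON on stdin.
# Plain grep/cut is used so no JSON tool is required; this is a rough text
# match, not a real JSON parser.
mesos_version() {
  grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | cut -d'"' -f4
}

# Example (assumed master address and default port):
#   curl -s http://10.1.0.72:5050/master/state.json | mesos_version
```

The helper only reads stdin, so it can be tried against a saved copy of the 
state output as well as a live master.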

On Fri, Oct 9, 2015 at 8:16 AM, Badal Naik <[email protected]> wrote:

Hello Mesos-Users,

I have set up a 3-node Mesos cluster on Ubuntu 14.04 and started ZooKeeper, 
Mesos and Marathon. Everything is working fine except three things.

1) When I restart the whole cluster, Mesos does not show completed tasks. Is 
this expected behaviour? If not, what should I do?

2) In the Mesos web UI I'm not able to see the 
staged/started/finished/killed/failed/lost task counts, even when tasks are 
running.

3) Every ZooKeeper instance throws this exception regularly:

2015-10-09 17:27:26,302 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)

Here is my Mesos-master configuration:

mesos master --ip=10.1.0.72  --work_dir=/var/lib/mesos-master 
--zk=file:///etc/mesos/conf/zk --quorum=file:///etc/mesos/conf/quorum
Where zk=zk://zoo.service.consul:2181/mesos
              quorum=2
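
For reference, the master --quorum value is meant to be a strict majority of 
the number of masters, so 2 is consistent with a cluster of 3 masters 
(assuming all three nodes run a master, which the post does not state 
explicitly). A minimal sketch of that arithmetic; the function name quorum_for 
is hypothetical:

```shell
# Hypothetical helper: smallest strict majority for a given number of masters.
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

quorum_for 3   # 3 masters
quorum_for 5   # 5 masters
```

With integer division, 3 masters give 2 and 5 masters give 3.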

Mesos-Slave Configuration:

mesos slave --work_dir=/var/lib/mesos-slave --ip=10.1.0.72 --hostname=10.1.0.72 
--strict=false  --master=file:///etc/mesos/conf/master 
FrameworkInfo.checkpoint=True

Marathon Configuration:

java -jar /opt/marathon.jar  --master zk://zoo.service.consul:2181/mesos  --zk 
zk://zoo.service.consul:2181/marathon  --ha --hostname 10.1.0.72  --checkpoint

ZooKeeper configuration (Java version "1.8.0_45"):

dataDir=/var/lib/zookeeper
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=20

autopurge.purgeInterval=0

zookeeper.connection.timeout.ms=6000
server.1=10.1.0.70:2888:3888
server.2=10.1.0.71:2888:3888
server.3=10.1.0.72:2888:3888

Each server has been given a different myid.
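
One way to check whether the ensemble described by this configuration has 
actually formed is to ask each server for its role with the ZooKeeper srvr 
four-letter command. A rough sketch, assuming nc (netcat) is installed, that 
four-letter commands are enabled on the servers, and that the client port is 
2181 as configured above. The function names zk_hosts and zk_mode and the 
zoo.cfg path in the example are hypothetical:

```shell
# Hypothetical helper: read a zoo.cfg on stdin and print the host part of each
# server.N=host:peerPort:electionPort line, one host per line.
zk_hosts() {
  sed -n 's/^server\.[0-9][0-9]*=\([^:]*\):.*/\1/p'
}

# Hypothetical helper: ask one server for its role (leader, follower or
# standalone) using the "srvr" command; -w 2 gives up after 2 seconds.
zk_mode() {
  echo srvr | nc -w 2 "$1" "${2:-2181}" | grep '^Mode:'
}

# Example (assumed config path):
#   zk_hosts < /etc/zookeeper/conf/zoo.cfg | while read -r host; do
#     printf '%s ' "$host"; zk_mode "$host"
#   done
```

In a healthy 3-server ensemble one would expect one server to report itself as 
leader and the other two as followers; a server that prints nothing is not 
answering on its client port.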

Can anyone help?

-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links

-- 

Best Regards,

Haosdent Huang
