No, I have not deleted anything. I am just restarting the physical nodes, and afterwards I am not able to see old completed tasks. Also, after restarting, many times Marathon and Mesos just stay out of sync; when I restart the Marathon service, both get synced.
> On 09-Oct-2015, at 6:32 pm, haosdent <[email protected]> wrote:
>
> For #1, do you delete something in your work_dir or zookeeper?
> For #3, is this zookeeper issue related to yours?
> http://stackoverflow.com/questions/15842553/zookeeper-network-ensemble-does-not-start-appropiately
>
> On Fri, Oct 9, 2015 at 8:30 PM, craig w <[email protected]> wrote:
> I'm not sure about #3. I have seen things go awry when restarting the whole
> cluster. When doing an upgrade from mesos 0.23.0 to 0.24.1, I restarted all
> of the mesos-masters. Waited a few moments for a leader to be elected, then
> restarted the slaves. When I went back to look at Marathon all of the tasks
> were being redeployed, as though they had all been killed off for some
> reason. That wasn't what I expected to happen since the upgrade was supposed
> to be as simple as install and restart. Perhaps you're experiencing a similar
> issue?
>
> On Fri, Oct 9, 2015 at 8:25 AM, Badal Naik <[email protected]> wrote:
> Any idea about #1?
>
> Has anyone experienced #3?
>
>> On 09-Oct-2015, at 5:53 pm, craig w <[email protected]> wrote:
>>
>> With regards to item #2, I saw the same issue. It's been fixed in mesos 0.25
>> (release candidates are out now), see
>> https://issues.apache.org/jira/browse/MESOS-3282.
>>
>> On Fri, Oct 9, 2015 at 8:16 AM, Badal Naik <[email protected]> wrote:
>> Hello Mesos-Users,
>>
>> I have set up a 3 node mesos cluster with ubuntu 14.04. I have started
>> zookeeper, Mesos and marathon. Everything is working fine except three things.
>>
>> 1) When I restart the whole cluster, mesos does not show completed tasks. Is
>> it expected behaviour? If not, what should I do?
>>
>> 2) In the mesos web ui I'm not able to see
>> staged/started/finished/killed/failed/lost task numbers even when tasks are
>> running.
>>
>> 3) Every zookeeper instance throws this exception regularly:
>>
>> 2015-10-09 17:27:26,302 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>> java.lang.InterruptedException
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>>
>> Here is my Mesos-master configuration:
>>
>> mesos master --ip=10.1.0.72 --work_dir=/var/lib/mesos-master --zk=file:///etc/mesos/conf/zk --quorum=file:///etc/mesos/conf/quorum
>>
>> Where zk=zk://zoo.service.consul:2181/mesos
>> quorum=2
>>
>> Mesos-Slave Configuration:
>>
>> mesos slave --work_dir=/var/lib/mesos-slave --ip=10.1.0.72 --hostname=10.1.0.72 --strict=false --master=file:///etc/mesos/conf/master FrameworkInfo.checkpoint=True
>>
>> Marathon Configuration:
>>
>> java -jar /opt/marathon.jar --master zk://zoo.service.consul:2181/mesos --zk zk://zoo.service.consul:2181/marathon --ha --hostname 10.1.0.72 --checkpoint
>>
>> Zookeeper configs with java version "1.8.0_45":
>>
>> dataDir=/var/lib/zookeeper
>> clientPort=2181
>> tickTime=2000
>> initLimit=10
>> syncLimit=20
>> autopurge.purgeInterval=0
>> zookeeper.connection.timeout.ms=6000
>> server.1=10.1.0.70:2888:3888
>> server.2=10.1.0.71:2888:3888
>> server.3=10.1.0.72:2888:3888
>>
>> And a different myid has been given to each.
>>
>> Can anyone help?
>>
>> --
>> https://github.com/mindscratch
>> https://www.google.com/+CraigWickesser
>> https://twitter.com/mind_scratch
>> https://twitter.com/craig_links
>
> --
> Best Regards,
> Haosdent Huang

