Re: After restarting cluster task disappeared

Badal Naik Fri, 09 Oct 2015 06:08:44 -0700

No, I have not deleted anything.
I am only restarting the physical nodes, and afterwards I can no longer see the old 
completed tasks. Also, after many restarts Marathon and Mesos stay out 
of sync; when I restart the Marathon service, both get back in sync.
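
One way to make the "out of sync" state concrete is to compare the task IDs each side reports. The following is only a rough sketch: it assumes the two JSON documents have already been fetched (the Mesos master state and Marathon's task listing) and that they have the layouts noted in the comments. The function names, the field names, and the default framework name "marathon" are assumptions for illustration and may differ between versions.

```python
from typing import Any, Dict, List, Set


def mesos_task_ids(state: Dict[str, Any], framework_name: str = "marathon") -> Set[str]:
    """Collect active task IDs for one framework from a parsed Mesos state document.

    Assumed layout: {"frameworks": [{"name": ..., "tasks": [{"id": ...}]}]}.
    """
    return {
        task["id"]
        for framework in state.get("frameworks", [])
        if framework.get("name") == framework_name
        for task in framework.get("tasks", [])
    }


def marathon_task_ids(tasks_doc: Dict[str, Any]) -> Set[str]:
    """Collect task IDs from a parsed Marathon task listing.

    Assumed layout: {"tasks": [{"id": ...}]}.
    """
    return {task["id"] for task in tasks_doc.get("tasks", [])}


def sync_report(state: Dict[str, Any], tasks_doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return the task IDs known to only one of the two systems."""
    mesos_ids = mesos_task_ids(state)
    marathon_ids = marathon_task_ids(tasks_doc)
    return {
        "only_in_mesos": sorted(mesos_ids - marathon_ids),
        "only_in_marathon": sorted(marathon_ids - mesos_ids),
    }
```

If both lists come back empty, the two views agree; anything else shows which tasks one side has lost track of after the restart.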



> On 09-Oct-2015, at 6:32 pm, haosdent <[email protected]> wrote:
> 
> For #1, did you delete something in your work_dir or ZooKeeper?
> For #3, is this ZooKeeper issue related to yours?
> http://stackoverflow.com/questions/15842553/zookeeper-network-ensemble-does-not-start-appropiately
> 
> On Fri, Oct 9, 2015 at 8:30 PM, craig w <[email protected]> wrote:
> I'm not sure about #3. I have seen things go awry when restarting the whole 
> cluster. When doing an upgrade from mesos 0.23.0 to 0.24.1, I restarted all 
> of the mesos-masters. Waited a few moments for a leader to be elected, then 
> restarted the slaves. When I went back to look at Marathon all of the tasks 
> were being redeployed, as though they had all been killed off for some 
> reason. That wasn't what I expected to happen since the upgrade was supposed 
> to be as simple as install and restart. Perhaps you're experiencing a similar 
> issue?
> 
> On Fri, Oct 9, 2015 at 8:25 AM, Badal Naik <[email protected]> wrote:
> Any idea about #1?
> 
> Has anyone experienced #3?
> 
>> On 09-Oct-2015, at 5:53 pm, craig w <[email protected]> wrote:
>> 
>> With regard to item #2, I saw the same issue. It has been fixed in Mesos 0.25 
>> (release candidates are out now), see 
>> https://issues.apache.org/jira/browse/MESOS-3282.
>> 
>> On Fri, Oct 9, 2015 at 8:16 AM, Badal Naik <[email protected]> wrote:
>> Hello Mesos-Users,
>> 
>> I have set up a 3-node Mesos cluster on Ubuntu 14.04 and started 
>> ZooKeeper, Mesos and Marathon. Everything is working fine except for three things.
>> 
>> 1) When I restart the whole cluster, Mesos does not show completed tasks. Is 
>> this expected behaviour? If not, what should I do?
>> 
>> 2) In the Mesos web UI I am not able to see the 
>> staged/started/finished/killed/failed/lost task counts, even when tasks are 
>> running.
>> 
>> 3) Every ZooKeeper instance throws this exception regularly:
>> 
>>  2015-10-09 17:27:26,302 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>> java.lang.InterruptedException
>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>>      at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>>      at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>>      at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>>      at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>> 
>> 
>> 
>> 
>> Here is my Mesos-master configuration:
>> 
>> mesos master --ip=10.1.0.72  --work_dir=/var/lib/mesos-master 
>> --zk=file:///etc/mesos/conf/zk --quorum=file:///etc/mesos/conf/quorum
>>      
>>      Where zk=zk://zoo.service.consul:2181/mesos
>>               quorum=2
>>               
>> 
>> 
>> Mesos-Slave Configuration:
>> 
>> mesos slave --work_dir=/var/lib/mesos-slave --ip=10.1.0.72 
>> --hostname=10.1.0.72 --strict=false  --master=file:///etc/mesos/conf/master 
>> FrameworkInfo.checkpoint=True
>>       
>> 
>> 
>> Marathon Configuration:
>> 
>> java -jar /opt/marathon.jar  --master zk://zoo.service.consul:2181/mesos  
>> --zk zk://zoo.service.consul:2181/marathon  --ha --hostname 10.1.0.72  
>> --checkpoint
>> 
>> 
>> 
>> 
>> Zookeeper configs with java version "1.8.0_45":
>> 
>> 
>> 
>> dataDir=/var/lib/zookeeper
>> clientPort=2181
>> tickTime=2000
>> initLimit=10
>> syncLimit=20
>> 
>> 
>> autopurge.purgeInterval=0
>> 
>> 
>> zookeeper.connection.timeout.ms=6000
>> server.1=10.1.0.70:2888:3888
>> server.2=10.1.0.71:2888:3888
>> server.3=10.1.0.72:2888:3888
>> 
>> Each server has been given a different myid.
>> 
>> 
>> Can anyone help?
>> 
>> 
>> 
>> 
>> 
>> -- 
>> https://github.com/mindscratch <https://github.com/mindscratch>
>> https://www.google.com/+CraigWickesser 
>> <https://www.google.com/+CraigWickesser>
>> https://twitter.com/mind_scratch <https://twitter.com/mind_scratch>
>> https://twitter.com/craig_links <https://twitter.com/craig_links>
> 
> 
> 
> 
> 
> 
> 
> -- 
> Best Regards,
> Haosdent Huang
