[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613513#comment-15613513 ]

Gilbert Song commented on MESOS-6400:
-

[~mithril], thanks for recording the logs. We will address all related tech debt in Mesos. BTW, you can resolve the orphan task issue by tearing down the unregistered Marathon framework using the workaround in the following doc:
https://gist.github.com/bernadinm/41bca6058f9137cd21f4fb562fd20d50

> Not able to remove Orphan Tasks
> ---
>
> Key: MESOS-6400
> URL: https://issues.apache.org/jira/browse/MESOS-6400
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.0.1
> Environment: centos 7 x64
> Reporter: kasim
> Priority: Critical
>
> The problem may be caused by Mesos and Marathon being out of sync:
> https://github.com/mesosphere/marathon/issues/616
> When I found that Orphan Tasks had appeared, I:
> 1. Restarted Marathon.
> 2. Marathon did not sync the Orphan Tasks, but started new tasks.
> 3. The Orphan Tasks still held their resources, so I had to delete them.
> 4. I found that all Orphan Tasks are under framework
> `ef169d8a-24fc-41d1-8b0d-c67718937a48-`;
> `curl -XGET http://c196:5050/master/frameworks` shows that this framework is listed under
> `unregistered_frameworks`:
> {code}
> {
>   "frameworks": [
>     .
>   ],
>   "completed_frameworks": [ ],
>   "unregistered_frameworks": [
>     "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
>     "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
>     "ef169d8a-24fc-41d1-8b0d-c67718937a48-"
>   ]
> }
> {code}
> 5. Tried
> {code}curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-'{code}
> but got `No framework found with specified ID`.
> So I have no idea how to delete the Orphan Tasks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
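Steps 4 and 5 of the report can be scripted. The following is a minimal sketch that uses only the Python standard library. It assumes the response shape of `GET /master/frameworks` shown in the report, and the helper names `unregistered_framework_ids` and `teardown_request` are hypothetical. The sketch only builds the teardown request; it does not send it.

```python
import json
import urllib.parse
import urllib.request


def unregistered_framework_ids(frameworks_json: str) -> list[str]:
    """Return the distinct IDs listed under "unregistered_frameworks".

    Assumes the JSON body of GET /master/frameworks, as quoted in the report.
    """
    payload = json.loads(frameworks_json)
    ids = payload.get("unregistered_frameworks", [])
    # The same ID can appear several times (as in the report), so
    # de-duplicate while keeping the original order.
    return list(dict.fromkeys(ids))


def teardown_request(master: str, framework_id: str) -> urllib.request.Request:
    """Build the same POST as step 5's `curl -XPOST .../master/teardown -d 'frameworkId=...'`."""
    body = urllib.parse.urlencode({"frameworkId": framework_id}).encode()
    # Passing data= makes urllib send it form-encoded, like curl -d.
    return urllib.request.Request(f"{master}/master/teardown", data=body, method="POST")
```

To use it against a live master, read the body of `GET /master/frameworks` with `urllib.request.urlopen` and pass it to `unregistered_framework_ids`. Then send each `teardown_request(...)` with `urllib.request.urlopen`. On Mesos 1.0.1 this teardown is expected to fail for unregistered frameworks with `No framework found with specified ID`, exactly as step 5 reports.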
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593665#comment-15593665 ]

kasim commented on MESOS-6400:
--

Here are the logs from two machines, covering 2016.9.26 to 2016.10.21:
https://drive.google.com/open?id=0B1ULA2gXggVwOWQwLUxCYmdoWnc
https://drive.google.com/open?id=0B1ULA2gXggVwRmJCRmI1Qm1OOEU

1. Mesos and Marathon got out of sync on some day between 2016.10.1 and 2016.10.7.
2. I restarted Marathon on 2016.10.8.
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592403#comment-15592403 ]

Gilbert Song commented on MESOS-6400:
-

[~mithril], do you still have the Mesos master log and the Marathon log? We would like to investigate why a new framework ID was registered with the Mesos master.
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589950#comment-15589950 ]

Gilbert Song commented on MESOS-6400:
-

[~mithril], it seems there are two separate issues in your description:

1. After a network partition or reboot, Marathon should not register with the Mesos master using a new FrameworkID. If it does, the old framework ID is treated as an unregistered framework. Its orphan tasks then keep occupying resources, so the new tasks cannot get enough resources to launch. (We should contact the Marathon team to figure out why a new FrameworkID is used.)
2. The 'master/teardown' endpoint should support tearing down an unregistered framework. I created MESOS-6419 to track this issue.

Side note: Currently, the Mesos master does not persist any state about registered frameworks. As a result, when a framework tries to register, the master cannot tell whether its framework ID has existed before. Ideally, the Mesos master should persist all framework information on disk (as it currently does with agent information). There should be an earlier JIRA describing this issue; I will link it once I find it.
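A toy model helps make the two issues concrete. This is a hypothetical sketch, not Mesos source: `ToyMaster`, its fields, and the `allow_unregistered` switch are invented for illustration. In the model, the master tracks registered frameworks by ID and learns about any other framework only from the tasks its agents report. A teardown that looks only at registered frameworks therefore returns the error from the report. The `allow_unregistered=True` path approximates the behavior proposed in MESOS-6419.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical in-memory model of the master state described above (not Mesos code)."""

    registered: set[str] = field(default_factory=set)
    # agent ID -> framework ID -> task IDs, as reported by agents
    agent_tasks: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def unregistered(self) -> set[str]:
        # Frameworks known only because agents still run their tasks.
        return {
            fid
            for tasks in self.agent_tasks.values()
            for fid in tasks
            if fid not in self.registered
        }

    def teardown(self, fid: str, allow_unregistered: bool = False) -> str:
        if fid in self.registered:
            self.registered.discard(fid)
        elif not (allow_unregistered and fid in self.unregistered()):
            # Only registered frameworks are looked up: the behavior in step 5.
            return "No framework found with specified ID"
        # "Kill" (here: forget) every task of that framework on every agent.
        for tasks in self.agent_tasks.values():
            tasks.pop(fid, None)
        return "OK"
```

In this model, Marathon re-registering as `"new"` leaves `"old"` unregistered and still holding tasks on the agents. Only the extended teardown frees them.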
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589327#comment-15589327 ]

Gilbert Song commented on MESOS-6400:
-

Did you try tearing down the old framework (whose tasks occupied your resources)?
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587231#comment-15587231 ]

kasim commented on MESOS-6400:
--

I am using Marathon 1.3.0-1.0.506.el7. Yes, after I restarted Marathon, it got a new framework ID and started some tasks (all duplicates of the Orphan Tasks). Due to a lack of resources, it cannot start all of the tasks, so I'd like to remove the Orphan Tasks immediately. Is there any way to do that?
[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks
[ https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586677#comment-15586677 ]

Gilbert Song commented on MESOS-6400:
-

[~mithril], we may need more information to help you debug this. What are your Mesos and Marathon versions? Most likely your Marathon received a new framework ID after the restart. Can you find it in the master/frameworks endpoint? If so, the master is supposed to remove your old framework after a configurable timeout (e.g., 7 days by default). After that, the tasks from that unregistered framework should be cleaned up.
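The timeout described above is simple arithmetic: 7 days is 7 × 24 × 60 × 60 = 604800 seconds. The following is a minimal sketch for estimating when the cleanup should happen. It assumes the timeout is expressed in seconds (as Mesos' `FrameworkInfo.failover_timeout` is) and that the timer starts when the old framework disconnected. The helper `cleanup_deadline` and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the timeout is given in seconds; 604800 s is the "7 days" mentioned above.
SEVEN_DAYS_S = 7 * 24 * 60 * 60


def cleanup_deadline(disconnected_at: datetime, failover_timeout_s: float = SEVEN_DAYS_S) -> datetime:
    """Earliest time the master would be expected to remove a framework that never re-registered."""
    return disconnected_at + timedelta(seconds=failover_timeout_s)


# Hypothetical example: a framework that disconnected at midnight UTC on 2016-10-08.
example = cleanup_deadline(datetime(2016, 10, 8, tzinfo=timezone.utc))
```

With the default, `example` is midnight UTC on 2016-10-15. A shorter configured timeout moves the deadline forward accordingly.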