[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-27 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613513#comment-15613513
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], thanks for recording the logs. We will address all related tech 
debt in Mesos. BTW, you can resolve the orphan task issue by tearing down the 
unregistered Marathon framework using the workaround in the following doc:

https://gist.github.com/bernadinm/41bca6058f9137cd21f4fb562fd20d50

> Not able to remove Orphan Tasks
> -------------------------------
>
>                 Key: MESOS-6400
>                 URL: https://issues.apache.org/jira/browse/MESOS-6400
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>         Environment: centos 7 x64
>            Reporter: kasim
>            Priority: Critical
>
> The problem may be caused by Mesos and Marathon being out of sync:
> https://github.com/mesosphere/marathon/issues/616
> When I found that Orphan Tasks had appeared, I:
> 1. Restarted Marathon.
> 2. Marathon did not sync the Orphan Tasks, but started new tasks.
> 3. The Orphan Tasks still took up resources, so I have to delete them.
> 4. I found that all Orphan Tasks are under framework 
> `ef169d8a-24fc-41d1-8b0d-c67718937a48-`;
> `curl -XGET http://c196:5050/master/frameworks` shows that this framework is 
> listed under `unregistered_frameworks`:
> {code}
> {
> "frameworks": [
> .
> ],
> "completed_frameworks": [ ],
> "unregistered_frameworks": [
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-"
> ]
> }
> {code}
> 5. I tried
> {code}
> curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-'
> {code}
> but got `No framework found with specified ID`.
> So I do not know how to delete the Orphan Tasks.
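
Steps 4 and 5 of the report can be scripted. The following is a minimal Python sketch, not part of the original report: it assumes the `/master/frameworks` response has the shape shown above, with `unregistered_frameworks` as a list of ID strings. The function names and the sample IDs are hypothetical. It only lists the distinct unregistered IDs and builds the form body used in step 5; it does not contact a master, and on the reported version the teardown call itself is still rejected for these IDs.

```python
import json
from typing import List
from urllib.parse import urlencode


def unregistered_framework_ids(frameworks_json: str) -> List[str]:
    """Return the distinct IDs under "unregistered_frameworks", in order.

    Assumes the response shape quoted in the report; a missing key
    yields an empty list.
    """
    payload = json.loads(frameworks_json)
    distinct: List[str] = []
    for framework_id in payload.get("unregistered_frameworks", []):
        if framework_id not in distinct:
            distinct.append(framework_id)
    return distinct


def teardown_form_body(framework_id: str) -> str:
    """Build the form body that step 5 posts to /master/teardown."""
    return urlencode({"frameworkId": framework_id})


if __name__ == "__main__":
    # Hypothetical sample; real IDs come from the master's response.
    sample = json.dumps({
        "frameworks": [],
        "completed_frameworks": [],
        "unregistered_frameworks": [
            "example-framework-0001",
            "example-framework-0001",
            "example-framework-0002",
        ],
    })
    for fid in unregistered_framework_ids(sample):
        print(teardown_form_body(fid))
```

The same ID can appear several times in the list, as in the output above, so the sketch deduplicates before building requests.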



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-20 Thread kasim (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593665#comment-15593665
 ] 

kasim commented on MESOS-6400:
--

Here are the logs from two machines, covering 2016.9.26 to 2016.10.21:

https://drive.google.com/open?id=0B1ULA2gXggVwOWQwLUxCYmdoWnc
https://drive.google.com/open?id=0B1ULA2gXggVwRmJCRmI1Qm1OOEU

1. Mesos and Marathon went out of sync some day between 2016.10.1 and 2016.10.7.
2. I restarted Marathon on 2016.10.8.




[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-20 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592403#comment-15592403
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], do you still have the Mesos master log and Marathon log? We would 
like to investigate why a new framework id got registered with Mesos master.



[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-19 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589950#comment-15589950
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], it seems there are two separate issues in your description:

1. After a network partition or reboot, Marathon should not register with the 
Mesos master using a new FrameworkID. Doing so causes the old FrameworkID to 
be treated as an unregistered framework, and its orphan tasks keep occupying 
resources, so the new tasks cannot receive enough resources to launch. (We 
should contact the Marathon team to figure out why a new FrameworkID is used.)

2. The 'master/teardown' endpoint should support tearing down an unregistered 
framework. I created MESOS-6419 to track this issue.
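
To gauge how much the orphan tasks from point 1 are holding back, one could total their resources per framework. The sketch below is illustrative only and rests on assumptions not stated in this thread: that the master's state output carries an `orphan_tasks` list whose entries have a `framework_id` string and a `resources` object with numeric `cpus` and `mem` fields. The function name and sample values are hypothetical, and every listed entry is counted regardless of task state.

```python
from collections import defaultdict
from typing import Dict


def orphan_resources_by_framework(state: dict) -> Dict[str, Dict[str, float]]:
    """Sum cpus and mem of orphan tasks, grouped by framework ID.

    Assumed input shape (not confirmed by this thread):
    {"orphan_tasks": [{"framework_id": str,
                       "resources": {"cpus": number, "mem": number}}]}
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"cpus": 0.0, "mem": 0.0}
    )
    for task in state.get("orphan_tasks", []):
        resources = task.get("resources", {})
        bucket = totals[task.get("framework_id", "unknown")]
        bucket["cpus"] += float(resources.get("cpus", 0))
        bucket["mem"] += float(resources.get("mem", 0))
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    sample_state = {
        "orphan_tasks": [
            {"framework_id": "old-framework", "resources": {"cpus": 0.5, "mem": 128}},
            {"framework_id": "old-framework", "resources": {"cpus": 1.5, "mem": 256}},
        ]
    }
    for framework_id, used in orphan_resources_by_framework(sample_state).items():
        print(framework_id, used)
```

Comparing these totals with the resources the newly registered framework is waiting for shows whether the orphan tasks alone explain the launch failures.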

Side Note:
Currently, the Mesos master does not persist any state about registered 
frameworks, so when a framework ID tries to register, the master cannot tell 
whether that ID existed before. Ideally, the Mesos master should persist all 
framework information on disk (as it currently does with the agent 
information). There should be an earlier JIRA describing this issue. I will 
link it once I find it.



[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-19 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589327#comment-15589327
 ] 

Gilbert Song commented on MESOS-6400:
-

Did you try tearing down the old framework (whose tasks occupied your 
resources)?



[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-18 Thread kasim (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587231#comment-15587231
 ] 

kasim commented on MESOS-6400:
--

I am using Marathon 1.3.0-1.0.506.el7.

Yes, after I restarted Marathon it got a new framework id and started some 
tasks (all duplicates of the Orphan Tasks). Due to lack of resources, it cannot 
start all tasks, so I'd like to remove the Orphan Tasks immediately. Is there 
any way to do that?



[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586677#comment-15586677
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], we may need more information to help you with debugging. What is 
your Mesos version? Marathon version?

Most likely your Marathon will receive a new framework id after the restart. 
Can you find it from the master/frameworks endpoint? If yes, the master is 
supposed to remove your old framework after a configurable timeout (e.g., 7 
days by default), and the tasks from that unregistered framework should then 
be cleaned up.
