[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-10-12 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954102#comment-14954102
 ] 

Jie Yu commented on MESOS-2035:
---

commit 3c96155a4618000a0896bd42f7ca1e2a363b48fd
Author: Jie Yu 
Date:   Thu Sep 24 18:42:34 2015 -0700

Added TaskStatus::Reason to containerizer Termination message.

Review: https://reviews.apache.org/r/38746

> Add reason to containerizer proto Termination
> -
>
> Key: MESOS-2035
> URL: https://issues.apache.org/jira/browse/MESOS-2035
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 0.21.0
>Reporter: Dominic Hamon
>Assignee: Jie Yu
>  Labels: twitter
> Fix For: 0.26.0
>
>
> When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
> the reason is set to a general one but ideally we would have the termination 
> reason to pass through to the status update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-09-25 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908917#comment-14908917
 ] 

Jie Yu commented on MESOS-2035:
---

I summarized the current semantics and proposed new semantics in the following 
doc:
https://docs.google.com/document/d/1klGDAu5yBVf-CGWLqvELLIfxLfRaisGkhi6Gn7952-4/edit?usp=sharing

> Add reason to containerizer proto Termination
> -
>
> Key: MESOS-2035
> URL: https://issues.apache.org/jira/browse/MESOS-2035
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 0.21.0
>Reporter: Dominic Hamon
>Assignee: Jie Yu
>  Labels: mesosphere
>
> When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
> the reason is set to a general one but ideally we would have the termination 
> reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-09-18 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876421#comment-14876421
 ] 

Vinod Kone commented on MESOS-2035:
---

[~js84] Ping! Are you working on this? If not, I would like someone else to 
take over.

> Add reason to containerizer proto Termination
> -
>
> Key: MESOS-2035
> URL: https://issues.apache.org/jira/browse/MESOS-2035
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 0.21.0
>Reporter: Dominic Hamon
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
> the reason is set to a general one but ideally we would have the termination 
> reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-09-14 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743985#comment-14743985
 ] 

Vinod Kone commented on MESOS-2035:
---

We have been seeing this in production since we enabled disk isolation. Let's 
get this fixed ASAP. cc [~jieyu]

> Add reason to containerizer proto Termination
> -
>
> Key: MESOS-2035
> URL: https://issues.apache.org/jira/browse/MESOS-2035
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 0.21.0
>Reporter: Dominic Hamon
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
> the reason is set to a general one but ideally we would have the termination 
> reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-07-17 Thread Mike Michel (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631199#comment-14631199
 ] 

Mike Michel commented on MESOS-2035:


It would be very helpful if this information were also written to the sandbox. 
The stderr log already receives Mesos info when a container is started:

I0717 12:30:01.219111 54012 exec.cpp:132] Version: 0.22.1
Starting task mike-website_frontend_apache.bd016cba-2c6e-11e5-bc1c-02016fccc167
I0717 12:30:01.226969 54028 exec.cpp:206] Executor registered on slave 
20150626-195146-1694738624-5050-2

Is it possible to extend this with the information for a failed start? The 
slave already has this info in its own logfile:

failed to start: Failed to 'docker pull mikemichel/notexist': exit status = 
exited with status 1 stderr = time=2015-07-15T01:48:57+02:00 level=fatal 
msg=Error pulling image (latest) from mikemichel/notexist, HTTP code 400

This way the info would be available in the Mesos UI.


 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
  Labels: mesosphere

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-26 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603143#comment-14603143
 ] 

Joerg Schad commented on MESOS-2035:


https://reviews.apache.org/r/35927/

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
  Labels: mesosphere

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-25 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601841#comment-14601841
 ] 

Joerg Schad commented on MESOS-2035:


Review Chain
Contributor: Joerg Schad
Reviewer: AlexR
Contributor: Jie Yu

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical
  Labels: mesosphere

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-23 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598060#comment-14598060
 ] 

Niklas Quarfot Nielsen commented on MESOS-2035:
---

[~js84] Do you still want to be on this ticket?

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical
  Labels: mesosphere

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-11 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582561#comment-14582561
 ] 

Niklas Quarfot Nielsen commented on MESOS-2035:
---

[~js84] How far did you get with this? We need this for the QoS Controller 
implementation in the slave :)

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-09 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579680#comment-14579680
 ] 

Jie Yu commented on MESOS-2035:
---

[~nnielsen] Can you help with the implementation? You may also want to sync 
with [~js84] since this ticket is currently assigned to him. I can shepherd 
this and do reviews.

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-09 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579658#comment-14579658
 ] 

Niklas Quarfot Nielsen commented on MESOS-2035:
---

[~vinodkone] You are listed as shepherd on this; what is the current state of 
this effort?

[~jieyu] SGTM - I can help (implementation, reviews) if we go this route.

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-09 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579844#comment-14579844
 ] 

Timothy Chen commented on MESOS-2035:
-

Hi Jie, the approach of adding the field SGTM. There will be several distinct 
reasons why a containerizer can fail to launch 
(REASON_DOCKER_PULL_FAILED, REASON_FETCH_FAILED), but I think the message part 
can also help add more detail.
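
As a rough illustration of pairing a coarse reason with a detailed message, here is a minimal sketch in Python rather than Mesos's C++. The two enum values are the names suggested above, not existing Mesos constants, and `LaunchFailure` and `describe` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Reason(Enum):
    # Names suggested in this comment; assumed here, not existing Mesos constants.
    REASON_DOCKER_PULL_FAILED = auto()
    REASON_FETCH_FAILED = auto()


@dataclass(frozen=True)
class LaunchFailure:
    """Hypothetical record for a failed container launch."""
    reason: Reason  # coarse, machine-readable category
    message: str    # free-form detail for operators


def describe(failure: LaunchFailure) -> str:
    """Render the failure as one line suitable for a log or a UI."""
    return f"{failure.reason.name}: {failure.message}"


failure = LaunchFailure(
    Reason.REASON_DOCKER_PULL_FAILED,
    "Failed to 'docker pull mikemichel/notexist'",
)
print(describe(failure))
```

The reason lets a scheduler branch on the failure category, while the message carries the specifics that do not fit in an enum.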

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578025#comment-14578025
 ] 

Jie Yu commented on MESOS-2035:
---

This problem came up again while we were implementing oversubscription. See 
MESOS-2653 and https://reviews.apache.org/r/34720/ for details.

Here is my proposal for solving this issue:

1) We add a TaskStatus::Reason field in containerizer::Termination protobuf 
(and deprecate the 'killed' field)
2) In slave's per executor data structure (struct Executor), we maintain an 
optional 'reason' field. When the slave destroys a container (e.g., due to 
registration timeout, failed to set resource limits, failed to launch 
container, qos controller kill, etc.), it will save the 'reason' field in 
struct Executor.
3) Containerizer is responsible for setting the 'reason' field inside 
containerizer::Termination (e.g., REASON_MEMORY_LIMIT, REASON_DISK_LIMIT, etc.)
4) In sendExecutorTerminatedStatusUpdate, we look at both reasons (one from 
slave's executor data structure and one from Termination protobuf). The current 
proposal is to prefer the reason from Termination protobuf. But in the future, 
when we allow multiple reasons to be sent (MESOS-2657), we can send both to the 
scheduler.
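
The precedence rule in step 4 could be sketched as follows. This is a minimal illustration in Python rather than Mesos's C++: `Reason`, `Termination`, `Executor`, and `pick_reason` are hypothetical stand-ins, the enum lists only a few example values, and the fallback to a generic reason when neither side sets one is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Reason(Enum):
    # Illustrative subset of TaskStatus::Reason; names are assumptions.
    REASON_EXECUTOR_TERMINATED = auto()   # generic fallback
    REASON_REGISTRATION_TIMEOUT = auto()  # set by the slave
    REASON_MEMORY_LIMIT = auto()          # set by the containerizer
    REASON_DISK_LIMIT = auto()            # set by the containerizer


@dataclass
class Termination:
    """Stand-in for containerizer::Termination (step 1)."""
    message: str
    reason: Optional[Reason] = None  # filled in by the containerizer (step 3)


@dataclass
class Executor:
    """Stand-in for the slave's per-executor state (step 2)."""
    reason: Optional[Reason] = None  # saved when the slave destroys the container


def pick_reason(executor: Executor, termination: Termination) -> Reason:
    """Step 4: prefer the containerizer's reason over the slave's."""
    if termination.reason is not None:
        return termination.reason
    if executor.reason is not None:
        return executor.reason
    # Assumption: use a generic reason when neither side recorded one.
    return Reason.REASON_EXECUTOR_TERMINATED
```

For example, if the slave recorded a registration timeout but the containerizer reports a memory limit, `pick_reason` returns the memory limit; once multiple reasons are allowed (MESOS-2657), the function could return both instead.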

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Minor

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-06-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578027#comment-14578027
 ] 

Jie Yu commented on MESOS-2035:
---

cc [~tnachen] [~idownes]

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Joerg Schad
Priority: Critical

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-05-07 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533682#comment-14533682
 ] 

Jay Buffington commented on MESOS-2035:
---

Part of this should be to also remove the Termination.killed field since Reason 
is a more useful/generic version of what that was intended to accomplish.

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Priority: Minor

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

2015-05-07 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532750#comment-14532750
 ] 

Jay Buffington commented on MESOS-2035:
---

Review for fix is at https://reviews.apache.org/r/33249/


 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Jay Buffington
Priority: Minor

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.


