[jira] [Assigned] (MESOS-3078) Recovered resources are not re-allocated until the next allocation delay.

2016-08-22 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-3078:
--

Assignee: Guangya Liu

> Recovered resources are not re-allocated until the next allocation delay.
> -
>
> Key: MESOS-3078
> URL: https://issues.apache.org/jira/browse/MESOS-3078
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>
> Currently, when resources are recovered, we do not perform an allocation for 
> that slave. Rather, we wait until the next allocation interval.
> For small-task, high-throughput frameworks, this can have a significant 
> impact on overall throughput; see the following thread:
> http://markmail.org/thread/y6mzfwzlurv6nik3
> We should consider immediately performing a re-allocation for the slave upon 
> resource recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3078) Recovered resources are not re-allocated until the next allocation delay.

2016-08-22 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432190#comment-15432190
 ] 

Guangya Liu commented on MESOS-3078:


The review posted by [~jjanco] at https://reviews.apache.org/r/51027/ can help 
with this; we can use logic similar to that in {{addSlave}} to handle it:

{code}
allocationCandidates.insert(slaveId);
if (!allocationPending) {
  allocationPending = true;
  dispatch(self(), &Self::allocate);
}
{code}
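For context, here is a self-contained sketch of the batching pattern that snippet relies on; the {{Allocator}} class, {{trigger()}}, and the integer return of {{allocate()}} are illustrative stand-ins, not the real allocator, which defers the pass via a libprocess dispatch:

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative sketch: events mark slaves as allocation candidates and
// schedule at most one batched allocation pass, instead of one full pass
// per event.
class Allocator {
public:
  // Called on events such as addSlave() or recoverResources().
  void trigger(const std::string& slaveId) {
    allocationCandidates.insert(slaveId);
    if (!allocationPending) {
      // Coalesce: only one deferred pass is queued at a time. In Mesos,
      // this is where dispatch(self(), &Self::allocate) would go.
      allocationPending = true;
    }
  }

  // The deferred pass: allocate only for the dirtied slaves, then reset.
  // Returns how many slaves were considered (for illustration only).
  int allocate() {
    int considered = static_cast<int>(allocationCandidates.size());
    allocationCandidates.clear();
    allocationPending = false;
    return considered;
  }

  bool pending() const { return allocationPending; }

private:
  std::set<std::string> allocationCandidates;
  bool allocationPending = false;
};
```

Repeated {{trigger()}} calls before the pass runs collapse into a single {{allocate()}}, which is what spares small-task, high-throughput workloads from waiting out the allocation interval on every recovery.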

> Recovered resources are not re-allocated until the next allocation delay.
> -
>
> Key: MESOS-3078
> URL: https://issues.apache.org/jira/browse/MESOS-3078
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>
> Currently, when resources are recovered, we do not perform an allocation for 
> that slave. Rather, we wait until the next allocation interval.
> For small-task, high-throughput frameworks, this can have a significant 
> impact on overall throughput; see the following thread:
> http://markmail.org/thread/y6mzfwzlurv6nik3
> We should consider immediately performing a re-allocation for the slave upon 
> resource recovery.





[jira] [Updated] (MESOS-6045) Implement LAUNCH_GROUP operation in master.

2016-08-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6045:
---
Sprint: Mesosphere Sprint 41

> Implement LAUNCH_GROUP operation in master.
> ---
>
> Key: MESOS-6045
> URL: https://issues.apache.org/jira/browse/MESOS-6045
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> The master needs to handle the new {{LAUNCH_GROUP}} operation. This is a bit 
> different from the {{LAUNCH}} operation in that we need to ensure that we do 
> not deliver the task group if any of the tasks fail authorization, are 
> invalid, or are killed while authorization is in progress.
> The entire task group must be delivered in a single message to the agent.
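A minimal sketch of that all-or-nothing check; {{TaskResult}} and {{shouldDeliverGroup()}} are hypothetical names for illustration, not the master's actual code:

```cpp
#include <string>
#include <vector>

// Hypothetical per-task outcome after authorization and validation.
struct TaskResult {
  std::string taskId;
  bool authorized;  // passed authorization
  bool valid;       // passed validation
  bool killed;      // killed while authorization was in progress
};

// Deliver the group only if every task passed: a single failing task drops
// the whole group, since it must reach the agent in one message or not at all.
bool shouldDeliverGroup(const std::vector<TaskResult>& tasks) {
  for (const TaskResult& task : tasks) {
    if (!task.authorized || !task.valid || task.killed) {
      return false;
    }
  }
  return !tasks.empty();
}
```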





[jira] [Updated] (MESOS-6071) Validate that an explicitly specified DEFAULT executor has disk resources.

2016-08-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6071:
---
Description: 
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
disk resources. Currently, we validate that explicitly specified (DEFAULT or 
CUSTOM) executors only contain cpus and mem.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.

  was:
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
disk resources. Currently, we validate that executors only contain cpus and mem.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.


> Validate that an explicitly specified DEFAULT executor has disk resources.
> --
>
> Key: MESOS-6071
> URL: https://issues.apache.org/jira/browse/MESOS-6071
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>
> When the framework is explicitly specifying the DEFAULT executor (currently 
> only supported for task groups), we should consider validating that it 
> contains disk resources. Currently, we validate that explicitly specified 
> (DEFAULT or CUSTOM) executors only contain cpus and mem.
> We should also consider supporting the omission of DEFAULT executor resources 
> and injecting a default amount of resources. However, the difficulty here is 
> that the framework must know about these amounts since they need to be 
> available in the offer. We could expose these to the framework during 
> framework registration.





[jira] [Updated] (MESOS-6071) Validate that an explicitly specified DEFAULT executor has disk resources.

2016-08-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6071:
---
Summary: Validate that an explicitly specified DEFAULT executor has disk 
resources.  (was: Validate that an explicitly specified DEFAULT executor has 
resources.)

> Validate that an explicitly specified DEFAULT executor has disk resources.
> --
>
> Key: MESOS-6071
> URL: https://issues.apache.org/jira/browse/MESOS-6071
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>
> When the framework is explicitly specifying the DEFAULT executor (currently 
> only supported for task groups), we should consider validating that it 
> contains disk resources.
> We should also consider supporting the omission of DEFAULT executor resources 
> and injecting a default amount of resources. However, the difficulty here is 
> that the framework must know about these amounts since they need to be 
> available in the offer. We could expose these to the framework during 
> framework registration.





[jira] [Updated] (MESOS-6071) Validate that an explicitly specified DEFAULT executor has resources.

2016-08-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6071:
---
Description: 
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
disk resources.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.

  was:
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
cpus, mem, and disk resources.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.


> Validate that an explicitly specified DEFAULT executor has resources.
> -
>
> Key: MESOS-6071
> URL: https://issues.apache.org/jira/browse/MESOS-6071
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>
> When the framework is explicitly specifying the DEFAULT executor (currently 
> only supported for task groups), we should consider validating that it 
> contains disk resources.
> We should also consider supporting the omission of DEFAULT executor resources 
> and injecting a default amount of resources. However, the difficulty here is 
> that the framework must know about these amounts since they need to be 
> available in the offer. We could expose these to the framework during 
> framework registration.





[jira] [Updated] (MESOS-6071) Validate that an explicitly specified DEFAULT executor has disk resources.

2016-08-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6071:
---
Description: 
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
disk resources. Currently, we validate that executors only contain cpus and mem.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.

  was:
When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
disk resources.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.


> Validate that an explicitly specified DEFAULT executor has disk resources.
> --
>
> Key: MESOS-6071
> URL: https://issues.apache.org/jira/browse/MESOS-6071
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>
> When the framework is explicitly specifying the DEFAULT executor (currently 
> only supported for task groups), we should consider validating that it 
> contains disk resources. Currently, we validate that executors only contain 
> cpus and mem.
> We should also consider supporting the omission of DEFAULT executor resources 
> and injecting a default amount of resources. However, the difficulty here is 
> that the framework must know about these amounts since they need to be 
> available in the offer. We could expose these to the framework during 
> framework registration.





[jira] [Commented] (MESOS-6056) add NOOP Container Logger for mesos

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432113#comment-15432113
 ] 

ASF GitHub Bot commented on MESOS-6056:
---

Github user IvanJobs closed the pull request at:

https://github.com/apache/mesos/pull/159


> add NOOP Container Logger for mesos
> ---
>
> Key: MESOS-6056
> URL: https://issues.apache.org/jira/browse/MESOS-6056
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, slave
>Affects Versions: 1.0.0
> Environment: mesos 1.0.0, docker
>Reporter: IvanJobs
>Priority: Trivial
>  Labels: easyfix, features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Mesos has two container loggers in its source tree.
> One is built into mesos-agent: the sandbox container logger, which simply 
> redirects stderr/stdout to the sandbox and can fill up the disk.
> The other is the LogrotateContainerLogger module lib, which is good: it keeps 
> stdout/stderr in the sandbox at a bounded size.
> But there is a common need: don't write stdout/stderr into the sandbox at 
> all. Unfortunately, we don't have any flag for turning that off.
> A workaround is to develop a new ContainerLogger module lib that does 
> nothing (redirects stdout/stderr to /dev/null).
> That's it: we need a NOOP ContainerLogger. Note that we can also retrieve 
> stderr/stdout from the docker daemon.





[jira] [Commented] (MESOS-6056) add NOOP Container Logger for mesos

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432112#comment-15432112
 ] 

ASF GitHub Bot commented on MESOS-6056:
---

Github user IvanJobs commented on the issue:

https://github.com/apache/mesos/pull/159
  
Well, actually, after discussing with Joseph Wu, I think this NOOP Container 
Logger is not a common enough need to be accepted by the Mesos community, so 
let's just forget about it. But if you have a special use case and want to use 
it, I'm happy about that.


> add NOOP Container Logger for mesos
> ---
>
> Key: MESOS-6056
> URL: https://issues.apache.org/jira/browse/MESOS-6056
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, slave
>Affects Versions: 1.0.0
> Environment: mesos 1.0.0, docker
>Reporter: IvanJobs
>Priority: Trivial
>  Labels: easyfix, features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Mesos has two container loggers in its source tree.
> One is built into mesos-agent: the sandbox container logger, which simply 
> redirects stderr/stdout to the sandbox and can fill up the disk.
> The other is the LogrotateContainerLogger module lib, which is good: it keeps 
> stdout/stderr in the sandbox at a bounded size.
> But there is a common need: don't write stdout/stderr into the sandbox at 
> all. Unfortunately, we don't have any flag for turning that off.
> A workaround is to develop a new ContainerLogger module lib that does 
> nothing (redirects stdout/stderr to /dev/null).
> That's it: we need a NOOP ContainerLogger. Note that we can also retrieve 
> stderr/stdout from the docker daemon.





[jira] [Created] (MESOS-6071) Validate that an explicitly specified DEFAULT executor has resources.

2016-08-22 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6071:
--

 Summary: Validate that an explicitly specified DEFAULT executor 
has resources.
 Key: MESOS-6071
 URL: https://issues.apache.org/jira/browse/MESOS-6071
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Benjamin Mahler


When the framework is explicitly specifying the DEFAULT executor (currently 
only supported for task groups), we should consider validating that it contains 
cpus, mem, and disk resources.

We should also consider supporting the omission of DEFAULT executor resources 
and injecting a default amount of resources. However, the difficulty here is 
that the framework must know about these amounts since they need to be 
available in the offer. We could expose these to the framework during framework 
registration.





[jira] [Commented] (MESOS-6056) add NOOP Container Logger for mesos

2016-08-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432033#comment-15432033
 ] 

Joseph Wu commented on MESOS-6056:
--

By the way, debugging the container logger is a challenge because the logger 
binary itself does not log anything (when something goes wrong, it most likely 
does not have the ability to log). The [latest issue we found in the 
logrotate module|https://issues.apache.org/jira/browse/MESOS-5856] was debugged 
using a mix of {{strace}} and matching specific syscalls to locations in the 
logrotate source code.

> add NOOP Container Logger for mesos
> ---
>
> Key: MESOS-6056
> URL: https://issues.apache.org/jira/browse/MESOS-6056
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, slave
>Affects Versions: 1.0.0
> Environment: mesos 1.0.0, docker
>Reporter: IvanJobs
>Priority: Trivial
>  Labels: easyfix, features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Mesos has two container loggers in its source tree.
> One is built into mesos-agent: the sandbox container logger, which simply 
> redirects stderr/stdout to the sandbox and can fill up the disk.
> The other is the LogrotateContainerLogger module lib, which is good: it keeps 
> stdout/stderr in the sandbox at a bounded size.
> But there is a common need: don't write stdout/stderr into the sandbox at 
> all. Unfortunately, we don't have any flag for turning that off.
> A workaround is to develop a new ContainerLogger module lib that does 
> nothing (redirects stdout/stderr to /dev/null).
> That's it: we need a NOOP ContainerLogger. Note that we can also retrieve 
> stderr/stdout from the docker daemon.





[jira] [Commented] (MESOS-6056) add NOOP Container Logger for mesos

2016-08-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432028#comment-15432028
 ] 

Joseph Wu commented on MESOS-6056:
--

Nowadays, GitHub PRs are used (almost entirely) to modify the 
{{contributors.yaml}} file.  That sentence could be clearer, I suppose :)

> add NOOP Container Logger for mesos
> ---
>
> Key: MESOS-6056
> URL: https://issues.apache.org/jira/browse/MESOS-6056
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, slave
>Affects Versions: 1.0.0
> Environment: mesos 1.0.0, docker
>Reporter: IvanJobs
>Priority: Trivial
>  Labels: easyfix, features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Mesos has two container loggers in its source tree.
> One is built into mesos-agent: the sandbox container logger, which simply 
> redirects stderr/stdout to the sandbox and can fill up the disk.
> The other is the LogrotateContainerLogger module lib, which is good: it keeps 
> stdout/stderr in the sandbox at a bounded size.
> But there is a common need: don't write stdout/stderr into the sandbox at 
> all. Unfortunately, we don't have any flag for turning that off.
> A workaround is to develop a new ContainerLogger module lib that does 
> nothing (redirects stdout/stderr to /dev/null).
> That's it: we need a NOOP ContainerLogger. Note that we can also retrieve 
> stderr/stdout from the docker daemon.





[jira] [Commented] (MESOS-6056) add NOOP Container Logger for mesos

2016-08-22 Thread IvanJobs (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431967#comment-15431967
 ] 

IvanJobs commented on MESOS-6056:
-

Yep, I missed that part of the logs. The logs in the sandbox are not only the 
logs from the docker container; they also include the logs from the executor. 
The docker daemon only maintains the logs from the docker container. If I 
redirect the sandbox's logs to /dev/null, I will lose the executor's logs. 
Thanks for reminding me of that.

As you say, we don't add features in GitHub PRs. But my understanding of the 
referenced link is different.
I picked two sentences out below:
"
You’ve fixed a bug or added a feature and want to contribute it. AWESOME!

Once your JIRA and Review Board accounts are in place please go ahead and 
create a review or GitHub pull request with an entry for yourself in 
contributors.yaml file.
"
Did I miss something? Thanks.




> add NOOP Container Logger for mesos
> ---
>
> Key: MESOS-6056
> URL: https://issues.apache.org/jira/browse/MESOS-6056
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, slave
>Affects Versions: 1.0.0
> Environment: mesos 1.0.0, docker
>Reporter: IvanJobs
>Priority: Trivial
>  Labels: easyfix, features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Mesos has two container loggers in its source tree.
> One is built into mesos-agent: the sandbox container logger, which simply 
> redirects stderr/stdout to the sandbox and can fill up the disk.
> The other is the LogrotateContainerLogger module lib, which is good: it keeps 
> stdout/stderr in the sandbox at a bounded size.
> But there is a common need: don't write stdout/stderr into the sandbox at 
> all. Unfortunately, we don't have any flag for turning that off.
> A workaround is to develop a new ContainerLogger module lib that does 
> nothing (redirects stdout/stderr to /dev/null).
> That's it: we need a NOOP ContainerLogger. Note that we can also retrieve 
> stderr/stdout from the docker daemon.





[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431839#comment-15431839
 ] 

Charles Allen commented on MESOS-6055:
--

I'll close it as {{can't reproduce}} for now. May have just been an oddity of 
the system I was testing on.

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.





[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431840#comment-15431840
 ] 

Charles Allen commented on MESOS-6055:
--

Thanks for checking it out!

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.





[jira] [Updated] (MESOS-6069) Misspelt TASK_KILLED in mesos slave

2016-08-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6069:
-
Priority: Trivial  (was: Major)

> Misspelt TASK_KILLED in mesos slave
> ---
>
> Key: MESOS-6069
> URL: https://issues.apache.org/jira/browse/MESOS-6069
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Cody Maloney
>Priority: Trivial
>  Labels: newbie
>
> https://github.com/apache/mesos/blob/c3228f3c3d1a1b2c145d1377185cfe22da6079eb/src/slave/slave.cpp#L2127





[jira] [Updated] (MESOS-6069) Misspelt TASK_KILLED in mesos slave

2016-08-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6069:
-
Labels: newbie  (was: )

> Misspelt TASK_KILLED in mesos slave
> ---
>
> Key: MESOS-6069
> URL: https://issues.apache.org/jira/browse/MESOS-6069
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Cody Maloney
>  Labels: newbie
>
> https://github.com/apache/mesos/blob/c3228f3c3d1a1b2c145d1377185cfe22da6079eb/src/slave/slave.cpp#L2127





[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431760#comment-15431760
 ] 

Joseph Wu commented on MESOS-6055:
--

Wasn't able to repro from my local build.

Agent launched from non-libtool'd binary:
{code}
sudo -E GLOG_v=1 LD_RUN_PATH=/mesos/build/src/.libs 
LD_LIBRARY_PATH=/mesos/build/src/.libs src/.libs/mesos-agent 
--work_dir=/tmp/agent --master=localhost:5050 --launcher_dir=/mesos/build/src
{code}

Master launched from wherever:
{code}
bin/mesos-master.sh --work_dir=/tmp/master
{code}

See if the fetcher does anything.  The URI itself doesn't matter:
{code}
src/balloon-framework --master=localhost:5050 --task_memory=128MB 
--task_memory_usage_limit=256MB 
--executor_uri="http://dont/really/care/where/this/is"
{code}

Checked the task's stderr and it clearly showed a fetcher error, but not a 
linking error.

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.





[jira] [Created] (MESOS-6070) Renamed containerizer::Termination to ContainerTermination.

2016-08-22 Thread Jie Yu (JIRA)
Jie Yu created MESOS-6070:
-

 Summary: Renamed containerizer::Termination to 
ContainerTermination.
 Key: MESOS-6070
 URL: https://issues.apache.org/jira/browse/MESOS-6070
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu


`containerizer::Termination` is a legacy protobuf for the external 
containerizer. Since we already removed the external containerizer, we should 
rename it to `ContainerTermination` and move the definition to 
`containerizer.proto`. We should also move all definitions in `isolator.proto` 
to `containerizer.proto` to be more consistent.





[jira] [Assigned] (MESOS-6057) docker isolator does not overwrite Dockerfile ENV

2016-08-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6057:
-

Assignee: Jie Yu

> docker isolator does not overwrite Dockerfile ENV
> -
>
> Key: MESOS-6057
> URL: https://issues.apache.org/jira/browse/MESOS-6057
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0, 1.0.1, 1.1.0
>Reporter: Stéphane Cottin
>Assignee: Jie Yu
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> The docker/runtime isolator does not overwrite env values when a default 
> value is present in the Dockerfile.
> Steps to reproduce : 
> {code}
> mesos-execute --master=leader.mesos:5050 --name=test 
> --docker_image=bashell/alpine-bash --env="{\"LC_ALL\": 
> \"fr_FR.UTF-8\",\"LC_TEST\": \"fr_FR.UTF-8\"}" --command="env"
> {code}
> outputs in stdout :
> {code}
> [...]
> LC_ALL=en_US.UTF-8
> LC_TEST=fr_FR.UTF-8
> [...]
> {code}
>  
> {{en_US.UTF-8}} is the default value from the dockerfile, see 
> https://hub.docker.com/r/bashell/alpine-bash/~/dockerfile/
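The expected behavior amounts to a map merge in which framework-supplied variables win over image defaults; a minimal sketch follows ({{mergeEnv}} is a hypothetical helper, not the isolator's actual code):

```cpp
#include <map>
#include <string>

// Merge the image's default ENV with the env the framework passed.
// Framework-supplied values must overwrite Dockerfile defaults.
std::map<std::string, std::string> mergeEnv(
    const std::map<std::string, std::string>& imageDefaults,
    const std::map<std::string, std::string>& taskEnv) {
  std::map<std::string, std::string> merged = imageDefaults;
  for (const auto& entry : taskEnv) {
    merged[entry.first] = entry.second;  // task value wins over any default
  }
  return merged;
}
```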





[jira] [Created] (MESOS-6068) Refactor MesosContainerizer::launch to prepare for nesting support.

2016-08-22 Thread Jie Yu (JIRA)
Jie Yu created MESOS-6068:
-

 Summary: Refactor MesosContainerizer::launch to prepare for 
nesting support.
 Key: MESOS-6068
 URL: https://issues.apache.org/jira/browse/MESOS-6068
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu


The idea is to have a common launch path for both top-level executor containers 
and nested containers. That means the parameters to the launch method should be 
container-agnostic.

Then the original launch can simply call this common launch code. When we add 
nesting support later, the same common launch code will be reused.





[jira] [Created] (MESOS-6069) Misspelt TASK_KILLED in mesos slave

2016-08-22 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-6069:
---

 Summary: Misspelt TASK_KILLED in mesos slave
 Key: MESOS-6069
 URL: https://issues.apache.org/jira/browse/MESOS-6069
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Cody Maloney


https://github.com/apache/mesos/blob/c3228f3c3d1a1b2c145d1377185cfe22da6079eb/src/slave/slave.cpp#L2127





[jira] [Commented] (MESOS-6066) Operator SUBSCRIBE api should include timestamps

2016-08-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431702#comment-15431702
 ] 

Anand Mazumdar commented on MESOS-6066:
---

We intend to expose the {{TaskStatus}} as part of the {{TaskUpdated}} event. 
That would have the timestamp details.
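For illustration, the event might then look like this; the {{status}} field's shape, placement, and timestamp value are assumptions, not the finalized API:

```json
{
  "type": "TASK_UPDATED",
  "task_updated": {
    "agent_id": {"value": "fdbb3ff5-47c2-4b49-a521-b52b9acf74dd-S14"},
    "framework_id": {"value": "Singularity"},
    "state": "TASK_KILLED",
    "status": {
      "state": "TASK_KILLED",
      "timestamp": 1471901722.511
    }
  }
}
```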

> Operator SUBSCRIBE api should include timestamps
> 
>
> Key: MESOS-6066
> URL: https://issues.apache.org/jira/browse/MESOS-6066
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, json api
>Affects Versions: 1.0.0
>Reporter: Steven Schlansker
>
> Events coming from the Mesos master are delivered asynchronously.  While 
> usually they are processed in a timely fashion, it really scares me that 
> updates do not have a timestamp:
> {code}
> 301
> {
>   "task_updated": {
> "agent_id": {
>   "value": "fdbb3ff5-47c2-4b49-a521-b52b9acf74dd-S14"
> },
> "framework_id": {
>   "value": "Singularity"
> },
> "state": "TASK_KILLED",
> "task_id": {
>   "value": 
> "pp-demoservice-steven.2016.07.05T17.00.06-1471901722511-1-mesos_slave17_qa_uswest2.qasql.opentable.com-us_west_2b"
> }
>   },
>   "type": "TASK_UPDATED"
> }
> {code}
> Events should have a timestamp that indicates the time at which they 
> happened; otherwise, your timestamps include delivery and processing delays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-313) Report executor terminations to framework schedulers.

2016-08-22 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431688#comment-15431688
 ] 

Stephan Erb commented on MESOS-313:
---

Now that this patch has landed, even a clean shutdown of an executor (with 
status code 0) is reported to the framework via the {{executorLost}} message. 
Is this a bug or intentional?

Example log output:
{code}
I0616 13:55:16.580080 16915 master.cpp:4891] Executor 
'thermos-role-env-job-0-d94972f8-760e-4bb0-beef-654e2df1f5e0' of framework 
20151001-085346-58917130-5050-37976- on slave 
d4218d85-e294-4405-af4c-80fc7a66f1a4
-S0 at slave(1)@:5051 (): exited with status 0
I0616 13:55:16.580286 16915 master.cpp:6540] Removing executor 
'thermos-role-env-job-0-d94972f8-760e-4bb0-beef-654e2df1f5e0' with resources 
cpus(*):0.01; mem(*):128 of framework 20151001-085346-58917130-5050-37976- 
on slave d4218d85-e294-4405-af4c-80fc7a66f1a4-S0 at slave(1)@:5051 ()
{code}




> Report executor terminations to framework schedulers.
> -
>
> Key: MESOS-313
> URL: https://issues.apache.org/jira/browse/MESOS-313
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Charles Reiss
>Assignee: Zhitao Li
>  Labels: mesosphere, newbie
> Fix For: 0.27.0
>
>
> The Scheduler interface has a callback for executorLost, but currently it is 
> never called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6067) Support provisioner to be nested aware for Mesos Pods.

2016-08-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-6067:
---

 Summary: Support provisioner to be nested aware for Mesos Pods.
 Key: MESOS-6067
 URL: https://issues.apache.org/jira/browse/MESOS-6067
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6066) Operator SUBSCRIBE api should include timestamps

2016-08-22 Thread Steven Schlansker (JIRA)
Steven Schlansker created MESOS-6066:


 Summary: Operator SUBSCRIBE api should include timestamps
 Key: MESOS-6066
 URL: https://issues.apache.org/jira/browse/MESOS-6066
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API, json api
Affects Versions: 1.0.0
Reporter: Steven Schlansker


Events coming from the Mesos master are delivered asynchronously.  While 
usually they are processed in a timely fashion, it really scares me that 
updates do not have a timestamp:

{code}
301
{
  "task_updated": {
"agent_id": {
  "value": "fdbb3ff5-47c2-4b49-a521-b52b9acf74dd-S14"
},
"framework_id": {
  "value": "Singularity"
},
"state": "TASK_KILLED",
"task_id": {
  "value": 
"pp-demoservice-steven.2016.07.05T17.00.06-1471901722511-1-mesos_slave17_qa_uswest2.qasql.opentable.com-us_west_2b"
}
  },
  "type": "TASK_UPDATED"
}
{code}

Events should have a timestamp that indicates the time that they happened at, 
otherwise your timestamps include delivery and processing delays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6065) Support provisioning image volumes in an isolator.

2016-08-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-6065:
---

 Summary: Support provisioning image volumes in an isolator.
 Key: MESOS-6065
 URL: https://issues.apache.org/jira/browse/MESOS-6065
 Project: Mesos
  Issue Type: Improvement
  Components: containerization, isolation
Reporter: Gilbert Song
Assignee: Gilbert Song


Currently, image volumes are provisioned in the Mesos containerizer. This makes 
the containerizer logic complicated, and makes it hard to make containerizer 
launch nested-aware.

We should implement a 'volume/image' isolator to move this part of the logic 
out of the Mesos containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6062) mesos-agent should autodetect mount-type volume sizes

2016-08-22 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha reassigned MESOS-6062:


Assignee: Anindya Sinha

> mesos-agent should autodetect mount-type volume sizes
> -
>
> Key: MESOS-6062
> URL: https://issues.apache.org/jira/browse/MESOS-6062
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Yan Xu
>Assignee: Anindya Sinha
>
> When dealing with a large fleet of machines, it can be cumbersome to 
> construct a resources JSON file that varies from host to host. Mesos 
> already auto-detects resources such as cpus, mem and the "root" disk; it 
> should extend this to MOUNT-type disks, as it's pretty clear that the value 
> should be the size of the entire volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6016) Expose the unversioned Call and Event Scheduler/Executor Protobufs.

2016-08-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431478#comment-15431478
 ] 

Anand Mazumdar commented on MESOS-6016:
---

{noformat}
commit e3143e756fafe343a79cadb10b587fad0e5904d5
Author: Anand Mazumdar 
Date:   Mon Aug 22 11:15:26 2016 -0700

Exposed unversioned scheduler/executor protos in Mesos JAR.

This change exposes the unversioned scheduler/executor protos
in the Mesos JAR. We already used to expose the unversioned
Mesos protos. This is useful for migrating schedulers to use
the new v1 API via the scheduler shim. Otherwise, they would
need to create their own copy of these protobufs even for vetting
the new API via the shim. Note that this only partially resolves
MESOS-6016 and that we would need to tackle the unversioned protobuf
deprecation later eventually.

Review: https://reviews.apache.org/r/51130/
{noformat}

> Expose the unversioned Call and Event Scheduler/Executor Protobufs.
> ---
>
> Key: MESOS-6016
> URL: https://issues.apache.org/jira/browse/MESOS-6016
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesos
>
> Currently, we don't expose the un-versioned (v0) {{Call}}/{{Event}} 
> scheduler/executor protobufs externally to framework authors. This is a bit 
> disjoint since we already expose the unversioned Mesos protos. The reasoning 
> for not doing so earlier was that Mesos would use the v0 protobufs internally 
> as an alternative to having separate internal protobufs. 
> However, that is not going to work. Eventually, when we introduce a backward 
> incompatible change in {{v1}} protobufs, we would create new {{v2}} 
> protobufs. But, we would need to ensure that {{v2}} protobufs can somehow be 
> translated to {{v0}} without breaking existing users. That's a pretty hard 
> thing to do! In the interim, to help framework authors migrate their 
> frameworks (they might be storing old protobufs in ZK/other reliable storage) 
> , we should expose the v0 scheduler/executor protobufs too and create another 
> internal translation layer for Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5788) Consider adding a Java Scheduler Shim/Adapter for the new/old API.

2016-08-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431476#comment-15431476
 ] 

Anand Mazumdar commented on MESOS-5788:
---

{noformat}
commit 04b9498bc5b5e080095786f4275c202625b3142b
Author: Anand Mazumdar 
Date:   Mon Aug 22 11:14:51 2016 -0700

Renamed `JNIMesos` to `V1Mesos` for scheduler shim.

This change renames `JNIMesos`, v1 implementation for the scheduler
shim to `V1Mesos`. `JNIMesos` was non-intuitive for users considering
the implementation was already in the native code for `V0Mesos` too.
Also, it was a bit confusing that `JNIMesos` referred to using the v1
API under the hood.

Review: https://reviews.apache.org/r/51129/
{noformat}

> Consider adding a Java Scheduler Shim/Adapter for the new/old API.
> --
>
> Key: MESOS-5788
> URL: https://issues.apache.org/jira/browse/MESOS-5788
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently, for existing Java-based frameworks, moving to try out the new API 
> can be cumbersome. This change intends to introduce a shim/adapter interface 
> that makes this easier by allowing toggling between the old/new API 
> (driver/new scheduler library) implementations via an environment variable. 
> This would allow framework developers to transition their older frameworks to 
> the new API rather seamlessly.
> This would look similar to the work done for the executor shim for C++ 
> (command/docker executor). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431398#comment-15431398
 ] 

Charles Allen commented on MESOS-6055:
--

Have Mesos installed in a way where the main shared library isn't found, e.g. 
launching a slave should fail by default with errors about being unable to 
find/bind the Mesos library.

Change the library path via {{LD_LIBRARY_PATH}} such that the slave succeeds 
in running.

Try to launch something with a URI to be fetched; it will fail in confusing 
ways.
Try to launch something without a URI (like {{echo something}}); it will 
print out {{something}} as expected. 

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the Mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors; I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the Mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in fetcher linking are not obvious, as no 
> logs are produced from the fetcher subprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6064) Add version member field to Docker class to avoid validate docker version every time

2016-08-22 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6064:

Labels: docker  (was: )

> Add version member field to Docker class to avoid validate docker version 
> every time
> 
>
> Key: MESOS-6064
> URL: https://issues.apache.org/jira/browse/MESOS-6064
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: haosdent
>Assignee: haosdent
>  Labels: docker
>
> Currently the minimum Docker version we support is 1.0.0. However, we also 
> support some advanced features introduced after Docker 1.0.0, which require 
> calling {{Docker::validateVersion}} before using them. 
> {{Docker::validateVersion}} is a blocking function that waits for {{docker 
> --version}} to return, so calling it many times brings unnecessary overhead. 
> It would be better to add a member field holding the current Docker version 
> to the {{Docker}} class, to avoid executing {{docker --version}} every time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6064) Add version member field to Docker class to avoid validate docker version every time

2016-08-22 Thread haosdent (JIRA)
haosdent created MESOS-6064:
---

 Summary: Add version member field to Docker class to avoid 
validate docker version every time
 Key: MESOS-6064
 URL: https://issues.apache.org/jira/browse/MESOS-6064
 Project: Mesos
  Issue Type: Improvement
Reporter: haosdent
Assignee: haosdent


Currently the minimum Docker version we support is 1.0.0. However, we also 
support some advanced features introduced after Docker 1.0.0, which require 
calling {{Docker::validateVersion}} before using them. 
{{Docker::validateVersion}} is a blocking function that waits for {{docker 
--version}} to return, so calling it many times brings unnecessary overhead. 
It would be better to add a member field holding the current Docker version to 
the {{Docker}} class, to avoid executing {{docker --version}} every time.
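
The proposed caching could look roughly like the following sketch. This is 
hypothetical code, not the actual Mesos {{Docker}} class; the probe callback 
stands in for shelling out to {{docker --version}}:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch: memoize the version string in a member field so the
// blocking probe (shelling out to `docker --version`) runs at most once.
class Docker {
public:
  explicit Docker(std::function<std::string()> probe)
    : probe_(std::move(probe)) {}

  // Returns the Docker version, probing only on the first call; subsequent
  // calls return the cached value.
  const std::string& version() {
    if (!version_.has_value()) {
      version_ = probe_();
    }
    return *version_;
  }

private:
  std::function<std::string()> probe_;
  std::optional<std::string> version_;
};
```

Feature checks such as {{Docker::validateVersion}} could then compare against 
the cached value instead of re-running the subprocess each time.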



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431325#comment-15431325
 ] 

Joseph Wu commented on MESOS-6055:
--

Can you provide more repro steps?  I haven't observed any fetcher linking 
issues...

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the Mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors; I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the Mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in fetcher linking are not obvious, as no 
> logs are produced from the fetcher subprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6064) Add version member field to Docker class to avoid validate docker version every time

2016-08-22 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6064:

Component/s: docker

> Add version member field to Docker class to avoid validate docker version 
> every time
> 
>
> Key: MESOS-6064
> URL: https://issues.apache.org/jira/browse/MESOS-6064
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: haosdent
>Assignee: haosdent
>  Labels: docker
>
> Currently the minimum Docker version we support is 1.0.0. However, we also 
> support some advanced features introduced after Docker 1.0.0, which require 
> calling {{Docker::validateVersion}} before using them. 
> {{Docker::validateVersion}} is a blocking function that waits for {{docker 
> --version}} to return, so calling it many times brings unnecessary overhead. 
> It would be better to add a member field holding the current Docker version 
> to the {{Docker}} class, to avoid executing {{docker --version}} every time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6035) Add non-recursive version of cgroups::get

2016-08-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431254#comment-15431254
 ] 

haosdent commented on MESOS-6035:
-

+1 for {{recursive=false}} default.

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need the top-level cgroups instead of all cgroups 
> recursively. Adding a non-recursive version could help avoid touching 
> unnecessary paths.
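
A hypothetical sketch of what the non-recursive default could look like, 
modeling the cgroup hierarchy as plain directories (the real 
{{cgroups::get}} in Mesos returns a {{Try}} and differs in detail):

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

namespace cgroups {

// Hypothetical sketch: list cgroups under `hierarchy`. With the proposed
// recursive=false default, only the top-level cgroups are returned;
// callers that really need the full tree pass recursive=true explicitly.
std::vector<std::string> get(const std::string& hierarchy,
                             bool recursive = false) {
  std::vector<std::string> result;
  if (recursive) {
    for (const auto& entry : fs::recursive_directory_iterator(hierarchy)) {
      if (entry.is_directory()) {
        result.push_back(fs::relative(entry.path(), hierarchy).string());
      }
    }
  } else {
    for (const auto& entry : fs::directory_iterator(hierarchy)) {
      if (entry.is_directory()) {
        result.push_back(entry.path().filename().string());
      }
    }
  }
  return result;
}

} // namespace cgroups
```

With this default, callsites that only inspect top-level cgroups stay 
unchanged, while {{cgroups::remove()}}/{{cgroups::destroy()}} would opt in to 
the recursive walk explicitly.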



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6035) Add non-recursive version of cgroups::get

2016-08-22 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425251#comment-15425251
 ] 

Yan Xu edited comment on MESOS-6035 at 8/22/16 5:35 PM:


I commented on the review. 

This is relevant to MESOS-5879 as well. [~swsnider] brought up the point that 
if we default to *not* recursively descending into nested cgroups, a lot of 
problems, including MESOS-5879, go away. I can't think of a reason that we 
should traverse the cgroups recursively today (except {{cgroups::remove()}} and 
{{cgroups::destroy()}}, but they are different in that they are within the 
cgroups util and encapsulate such details). Of course, if we do have cases in 
the future, they can set {{recursive=true}} explicitly.

[~jieyu] [~idownes] what are your thoughts on this?


was (Author: xujyan):
I commented on the review. 

This is relevant to MESOS-5879 as well. [~swsnider] brought up the point that 
if we default to *not* recursively descending into nested cgroups, a lot of 
problems, including MESOS-5879, go away. I can't think of a reason that we 
should traverse the cgroups recursively today. Of course, if we do have cases 
in the future, they can set {{recursive=true}} explicitly.

[~jieyu] [~idownes] what are your thoughts on this?

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need the top-level cgroups instead of all cgroups 
> recursively. Adding a non-recursive version could help avoid touching 
> unnecessary paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6035) Add non-recursive version of cgroups::get

2016-08-22 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425251#comment-15425251
 ] 

Yan Xu edited comment on MESOS-6035 at 8/22/16 5:36 PM:


I commented on the review. 

This is relevant to MESOS-5879 as well. [~swsnider] brought up the point that 
if we default to *not* recursively descending into nested cgroups, a lot of 
problems, including MESOS-5879, go away. I can't think of a reason that we 
should traverse the cgroups recursively today (except for {{cgroups::remove()}} 
and {{cgroups::destroy()}}, but they are different in that they are within the 
cgroups util and encapsulate such details). Of course, if we do have cases in 
the future, they can set {{recursive=true}} explicitly.

[~jieyu] [~idownes] what are your thoughts on this?


was (Author: xujyan):
I commented on the review. 

This is relevant to MESOS-5879 as well. [~swsnider] brought up the point that 
if we default to *not* recursively descending into nested cgroups, a lot of 
problems, including MESOS-5879, go away. I can't think of a reason that we 
should traverse the cgroups recursively today (except {{cgroups::remove()}} and 
{{cgroups::destroy()}}, but they are different in that they are within the 
cgroups util and encapsulate such details). Of course, if we do have cases in 
the future, they can set {{recursive=true}} explicitly.

[~jieyu] [~idownes] what are your thoughts on this?

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need the top-level cgroups instead of all cgroups 
> recursively. Adding a non-recursive version could help avoid touching 
> unnecessary paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6063) Track recovered and prepared subsystems for a container

2016-08-22 Thread haosdent (JIRA)
haosdent created MESOS-6063:
---

 Summary: Track recovered and prepared subsystems for a container
 Key: MESOS-6063
 URL: https://issues.apache.org/jira/browse/MESOS-6063
 Project: Mesos
  Issue Type: Improvement
  Components: cgroups
Reporter: haosdent
Assignee: haosdent


Currently, when we restart the Mesos agent with a different set of cgroups 
subsystems, the existing containers fail to recover on the newly added 
subsystems. In this case, we ignore them and continue to perform `usage`, 
`status` and `cleanup` on them. It would be better to track the recovered and 
prepared subsystems for each container, and then skip performing `update`, 
`wait`, `usage` and `status` on the subsystems that were never recovered or 
prepared.
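
The bookkeeping could be sketched roughly as below. These are hypothetical 
names; the real change would live in the cgroups isolator and key on 
{{ContainerID}}:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch: remember which subsystems were actually recovered
// or prepared for each container, so that `update`, `wait`, `usage` and
// `status` can be skipped for subsystems a container never had.
class SubsystemTracker {
public:
  void track(const std::string& containerId, const std::string& subsystem) {
    prepared_[containerId].insert(subsystem);
  }

  // True iff the subsystem was recovered/prepared for this container,
  // i.e. it is safe to perform operations on it.
  bool isTracked(const std::string& containerId,
                 const std::string& subsystem) const {
    auto it = prepared_.find(containerId);
    return it != prepared_.end() && it->second.count(subsystem) > 0;
  }

private:
  std::map<std::string, std::set<std::string>> prepared_;
};
```

A container recovered before a subsystem was enabled would simply never be 
tracked for it, so later operations on that subsystem are skipped rather than 
failing.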



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6062) mesos-agent should autodetect mount-type volume sizes

2016-08-22 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6062:
-

 Summary: mesos-agent should autodetect mount-type volume sizes
 Key: MESOS-6062
 URL: https://issues.apache.org/jira/browse/MESOS-6062
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Reporter: Yan Xu


When dealing with a large fleet of machines, it can be cumbersome to construct 
a resources JSON file that varies from host to host. Mesos already auto-detects 
resources such as cpus, mem and the "root" disk; it should extend this to 
MOUNT-type disks, as it's pretty clear that the value should be the size of 
the entire volume.
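
A minimal sketch of the detection (hypothetical helper name; the real agent 
code would go through stout and report a proper error):

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical sketch: derive a MOUNT-type disk resource's size from the
// filesystem backing the mount point, analogous to how the agent already
// probes the "root" disk. std::filesystem::space wraps statvfs on POSIX.
double detectVolumeMegabytes(const std::string& mountPoint) {
  std::error_code ec;
  const std::filesystem::space_info info =
      std::filesystem::space(mountPoint, ec);
  if (ec) {
    return -1.0;  // A real implementation would surface the error instead.
  }
  return static_cast<double>(info.capacity) / (1024.0 * 1024.0);
}
```

The agent could fill in a missing disk size for each configured MOUNT volume 
at startup this way, keeping an operator-specified value authoritative if one 
is given.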




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6061) Docker registry puller shows decode error "No response decoded".

2016-08-22 Thread Sunzhe (JIRA)
Sunzhe created MESOS-6061:
-

 Summary: Docker registry puller shows decode error "No response 
decoded".
 Key: MESOS-6061
 URL: https://issues.apache.org/jira/browse/MESOS-6061
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker
Affects Versions: 1.0.0
Reporter: Sunzhe


The {{mesos-agent}} flags:
{code}
 GLOG_v=1 ./bin/mesos-agent.sh \
  --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
  --ip=10.100.3.3  \
  --work_dir=${MESOS_WORK_DIR} \
  
--isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux \
  --enforce_container_disk_quota \
  --containerizers=mesos \
  --image_providers=docker \
  --executor_environment_variables="{}"
{code}
And the {{mesos-execute}} flags:
{code}
 ./src/mesos-execute \
   --master=${MESOS_MASTER_IP}:5050 \
   --name=${INSTANCE_NAME} \
   --docker_image=nvidia/cuda \
   --framework_capabilities=GPU_RESOURCES \
   --resources="cpus:1;mem:128;gpus:1"  \
   --command="nvidia-smi"
{code}
But when {{./src/mesos-execute}}, the errors like below:
{code}
I0822 18:45:55.423899  8821 scheduler.cpp:172] Version: 1.0.1
I0822 18:45:55.426172  8821 scheduler.cpp:461] New master detected at 
master@10.103.0.125:5050
Subscribed with ID '34126b61-9d41-48dd-9c85-b61e4f9ad4c9-0001'
Submitted task 'test' to agent 'b6c1587d-ab88-4734-9cb3-2cb916a73bf8-S1'
Received status update TASK_FAILED for task 'test'
  message: 'Failed to launch container: Failed to decode HTTP responses: No 
response decoded
HTTP/1.1 200 Connection established

HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Bearer 
realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:nvidia/cuda:pull;
Date: Mon, 22 Aug 2016 10:46:25 GMT
Content-Length: 143
Strict-Transport-Security: max-age=31536000

{"errors":[{"code":"UNAUTHORIZED","message":"authentication 
required","detail":[{"Type":"repository","Name":"nvidia/cuda","Action":"pull"}]}]}
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}

Docker itself works well; I can use {{docker pull}} to pull the image. And if 
I set the agent flag {{--docker_registry}} to a local path (e.g. 
{{/tmp/docker/images}}) in which Docker image archives (the result of 
{{docker save}}) are stored, mesos-execute works well.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)