date:20160627

[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352358#comment-15352358
 ] 

Sunzhe commented on MESOS-5717:
---

It works well! Thank you very much.

I will keep following NVIDIA GPU support in Mesos and would be glad to pick up 
some work in this area. Thanks.

> Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and 
> '--nvidia_gpu_devices' flags
> 
>
> Key: MESOS-5717
> URL: https://issues.apache.org/jira/browse/MESOS-5717
> Project: Mesos
>  Issue Type: Bug
> Environment: RHEL 7.2
>Reporter: Sunzhe
>Assignee: Kevin Klues
>  Labels: gpu
> Fix For: 1.0.0
>
>
> Prerequisite: MESOS-5257 states "By default, with no '--nvidia_gpu_devices' 
> flag or `gpus` resources flag, the new auto-discovery will simply enumerate 
> all of the GPUs on the system", and MESOS-5630 "removes this 
> flag (--enable-nvidia-gpu-support) and enables this support for all builds 
> on Linux."
> So I ran '../configure' without any flag and started the agent without 
> '--resources' or '--nvidia_gpu_devices', but it could not discover GPU 
> resources. I also started the agent with '--resources' and 
> '--nvidia_gpu_devices', and that did not work either.
> I'm sure the NVIDIA GPUs on my machines are OK, because they work well when I 
> pass '--enable-nvidia-gpu-support' to './configure' and start the agents with 
> '--resources' and '--nvidia_gpu_devices'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (MESOS-5731) Allow querying with metric types in GetMetrics

2016-06-27 Thread haosdent (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-5731:
---

Assignee: haosdent

> Allow querying with metric types in GetMetrics
> --
>
> Key: MESOS-5731
> URL: https://issues.apache.org/jira/browse/MESOS-5731
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: haosdent
>
> Refer to the email in dev mailing list [ Extend HTTP Endpoint to allow 
> querying for metric types | 
> http://search-hadoop.com/m/0Vlr69jfmN1kHein1=Re+Extend+HTTP+Endpoint+to+allow+querying+for+metric+types
>  ]
> We need to support metric types as a filter in {{GetMetrics}}.
> {code}
> message GetMetrics {
>   repeated Metric gauges = 1;
>   repeated Metric counters = 2;
> }
> {code}
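A minimal C++ sketch of the proposed filter, assuming a simplified in-memory metric record that carries an explicit type tag. The `Metric`, `MetricType`, `GetMetricsResponse`, and `getMetrics` names are hypothetical and are not the actual Mesos types. The sketch groups the requested metric types the same way the proposed message does:

```cpp
#include <string>
#include <vector>

// Hypothetical metric record; the real Mesos metrics representation differs.
enum class MetricType { GAUGE, COUNTER };

struct Metric {
  std::string name;
  double value;
  MetricType type;
};

// Mirrors the proposed GetMetrics message: metrics grouped by type.
struct GetMetricsResponse {
  std::vector<Metric> gauges;
  std::vector<Metric> counters;
};

// Returns only the metric types the caller asked for.
GetMetricsResponse getMetrics(
    const std::vector<Metric>& all,
    bool includeGauges,
    bool includeCounters) {
  GetMetricsResponse response;
  for (const Metric& metric : all) {
    if (metric.type == MetricType::GAUGE && includeGauges) {
      response.gauges.push_back(metric);
    } else if (metric.type == MetricType::COUNTER && includeCounters) {
      response.counters.push_back(metric);
    }
  }
  return response;
}
```

In this sketch, a caller interested only in gauges receives an empty `counters` list. The metrics are not dropped from the registry, only from the response.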


[jira] [Created] (MESOS-5731) Allow querying with metric types in GetMetrics

2016-06-27 Thread haosdent (JIRA)

haosdent created MESOS-5731:
---

 Summary: Allow querying with metric types in GetMetrics
 Key: MESOS-5731
 URL: https://issues.apache.org/jira/browse/MESOS-5731
 Project: Mesos
  Issue Type: Task
Reporter: haosdent


Refer to the email in dev mailing list [ Extend HTTP Endpoint to allow querying 
for metric types | 
http://search-hadoop.com/m/0Vlr69jfmN1kHein1=Re+Extend+HTTP+Endpoint+to+allow+querying+for+metric+types
 ]

We need to support metric types as a filter in {{GetMetrics}}.
{code}
message GetMetrics {
  repeated Metric gauges = 1;
  repeated Metric counters = 2;
}
{code}


[jira] [Assigned] (MESOS-5726) Benchmark the v1 Operator API

2016-06-27 Thread haosdent (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-5726:
---

Assignee: haosdent

> Benchmark the v1 Operator API
> -
>
> Key: MESOS-5726
> URL: https://issues.apache.org/jira/browse/MESOS-5726
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: haosdent
>
> Just like what we did with the v1 framework API, we need to benchmark the 
> performance of the v1 operator API.
> As part of this benchmarking, we should evaluate whether evolving 
> un-versioned protos to versioned protos in some of the API handlers (e.g., 
> getFrameworks) is expensive.


[jira] [Commented] (MESOS-5227) Implement HTTP Docker Executor that uses the Executor Library

2016-06-27 Thread Yong Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352279#comment-15352279
 ] 

Yong Tang commented on MESOS-5227:
--

[~guoger] The review request has been updated. Now there is only one RR related 
to this issue:
https://reviews.apache.org/r/49240/

> Implement HTTP Docker Executor that uses the Executor Library
> -
>
> Key: MESOS-5227
> URL: https://issues.apache.org/jira/browse/MESOS-5227
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Yong Tang
>
> Similar to what we did with the HTTP command executor in MESOS-3558, we should 
> have an HTTP Docker executor that can speak the v1 Executor API.


[jira] [Updated] (MESOS-3019) Automate updates to configuration.md

2016-06-27 Thread Benjamin Mahler (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-3019:
---
Labels: release tech-debt  (was: release)

> Automate updates to configuration.md
> 
>
> Key: MESOS-3019
> URL: https://issues.apache.org/jira/browse/MESOS-3019
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Adam B
>  Labels: release, tech-debt
>
> Each release we add new flags and modify existing flags. Sometimes the 
> configuration.md doc is updated by those making the changes, but the release 
> manager inevitably ends up making a pass to look for new flags that were not 
> added to the doc. It would be great if this were an automated process so the 
> release manager could just run a script to update configuration.md, or at 
> least check for missing flags.
> Note: Flags come from:
> src/logging/flags.cpp
> src/master/flags.cpp
> src/master/main.cpp
> src/slave/flags.cpp
> src/slave/main.cpp
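One way such a check could work, sketched in C++. It assumes that flags are registered with calls of the form `add(&Flags::<name>, ...)` and that the member name matches the flag name. `registeredFlags` and `missingFromDocs` are hypothetical helpers that operate on file contents already read into strings:

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collect flag names registered via `add(&Flags::<name>, ...)`.
// Assumption: the member name is also the command-line flag name.
std::set<std::string> registeredFlags(const std::string& source) {
  std::set<std::string> flags;
  const std::regex pattern(R"(add\(\s*&Flags::(\w+))");
  for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
       it != end; ++it) {
    flags.insert((*it)[1].str());
  }
  return flags;
}

// Return the registered flags that never appear as `--<name>` in the doc.
std::vector<std::string> missingFromDocs(
    const std::string& source, const std::string& doc) {
  std::vector<std::string> missing;
  for (const std::string& flag : registeredFlags(source)) {
    if (doc.find("--" + flag) == std::string::npos) {
      missing.push_back(flag);
    }
  }
  return missing;
}
```

A release script could read each of the files listed above and `configuration.md`, then print whatever `missingFromDocs` returns. The substring match is deliberately simple. For example, a flag `foo` would count as documented if only `--foo_bar` appears in the doc.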


[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Kevin Klues (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352240#comment-15352240
 ] 

Kevin Klues commented on MESOS-5717:


We noticed this error in the documentation and are fixing it now:
{noformat}
  if (devicesIsolator == tokens.end()) {
    return Error("The 'cgroups/devices' isolator must be enabled in"
                 " order to use the gpu/devices isolator");
  }
{noformat}

The error message should read "The 'cgroups/devices' isolator must be enabled 
in order to use the gpu/nvidia isolator".
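The dependency described above can be sketched as a standalone check. This is a hypothetical C++ helper that uses only the standard library. `validateIsolation` and `tokenize` are illustrative names, not Mesos functions. The helper reports the corrected message when `gpu/nvidia` is requested without `cgroups/devices`:

```cpp
#include <algorithm>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated --isolation value into individual isolator names.
std::vector<std::string> tokenize(const std::string& value) {
  std::vector<std::string> tokens;
  std::stringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (!token.empty()) {
      tokens.push_back(token);
    }
  }
  return tokens;
}

// Hypothetical check: returns an error message if 'gpu/nvidia' is requested
// without 'cgroups/devices', or std::nullopt if the combination is valid.
std::optional<std::string> validateIsolation(const std::string& isolation) {
  const std::vector<std::string> tokens = tokenize(isolation);

  auto has = [&tokens](const std::string& name) {
    return std::find(tokens.begin(), tokens.end(), name) != tokens.end();
  };

  if (has("gpu/nvidia") && !has("cgroups/devices")) {
    return std::string(
        "The 'cgroups/devices' isolator must be enabled in"
        " order to use the gpu/nvidia isolator");
  }

  return std::nullopt;
}
```

With this sketch, `cgroups/devices,gpu/nvidia` is accepted and `gpu/nvidia` alone is rejected. Isolation lists that do not mention the GPU isolator are not affected.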

We also noticed an inconsistency in the documentation for the `--isolation` 
flag. This fix has already been committed:
{noformat}
commit a2fd5f38e02c86c1020b0ae243915358e37d2b2b
Author: Benjamin Mahler 
Date:   Mon Jun 27 18:59:21 2016 -0700

Removed stale flag references to the GPU isolator.

{noformat}

Hopefully this clears things up.


[jira] [Comment Edited] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Kevin Klues (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352240#comment-15352240
 ] 

Kevin Klues edited comment on MESOS-5717 at 6/28/16 2:41 AM:
-

We noticed this error in the documentation and are fixing it now:
{noformat}
  if (devicesIsolator == tokens.end()) {
    return Error("The 'cgroups/devices' isolator must be enabled in"
                 " order to use the gpu/devices isolator");
  }
{noformat}

The error message should read "The 'cgroups/devices' isolator must be enabled 
in order to use the gpu/nvidia isolator".

We also noticed an inconsistency in the documentation for the `--isolation` 
flag. This fix has already been committed:
{noformat}
commit a2fd5f38e02c86c1020b0ae243915358e37d2b2b
Author: Benjamin Mahler 
Date:   Mon Jun 27 18:59:21 2016 -0700

Removed stale flag references to the GPU isolator.

{noformat}

Hopefully this clears things up.



[jira] [Updated] (MESOS-5730) Sandbox access authorization should fail for non existing sandboxes.

2016-06-27 Thread Till Toenshoff (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5730:
--
Description: 
The local authorizer currently tries to authorize {{ACCESS_SANDBOX}} even if no 
further object specification (e.g. {{framework_info}} or {{executor_info}}) 
was specified / available at that time.

Given that there is likely no sandbox available if no {{executor_info}} was 
provided, I think we should actually fail instead of allowing or denying (403).

A failure would result in what is, IMHO, a more appropriate ServiceUnavailable 
(503).

See 
https://github.com/apache/mesos/commit/c8d67590064e35566274116cede9c6a733187b48#diff-dd692b1640b2628014feca01a94ba1e1R241


  was:
The local authorizer currently tries to authorize {{ACCESS_SANDBOX}} even if no 
further object specification (e.g. {{framework_info}} or {{executor_info}}) 
was specified / available at that time.

Given that there is likely no sandbox available if no {{executor_info}} was 
provided, I think we should actually fail instead of denying (403).

A failure would result in what is, IMHO, a more appropriate ServiceUnavailable 
(503).

See 
https://github.com/apache/mesos/commit/c8d67590064e35566274116cede9c6a733187b48#diff-dd692b1640b2628014feca01a94ba1e1R241



> Sandbox access authorization should fail for non existing sandboxes.
> 
>
> Key: MESOS-5730
> URL: https://issues.apache.org/jira/browse/MESOS-5730
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> The local authorizer currently tries to authorize {{ACCESS_SANDBOX}} even if 
> no further object specification (e.g. {{framework_info}} or 
> {{executor_info}}) was specified / available at that time.
> Given that there is likely no sandbox available if no {{executor_info}} was 
> provided, I think we should actually fail instead of allowing or denying 
> (403).
> A failure would result in what is, IMHO, a more appropriate 
> ServiceUnavailable (503).
> See 
> https://github.com/apache/mesos/commit/c8d67590064e35566274116cede9c6a733187b48#diff-dd692b1640b2628014feca01a94ba1e1R241
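The proposed behavior can be summarized as a three-way outcome. This is a hypothetical C++ sketch, not the local authorizer's API:

* The enum and both functions are illustrative names.
* `std::optional<std::string>` stands in for the presence of `executor_info`.
* Mapping a permitted request to 200 is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical outcome of an authorization check.
enum class AuthorizationResult { ALLOWED, DENIED, FAILED };

// Sketch of the proposal: without executor_info there is likely no sandbox,
// so the check fails instead of returning allowed or denied.
AuthorizationResult authorizeSandboxAccess(
    const std::optional<std::string>& executorId, bool aclPermits) {
  if (!executorId.has_value()) {
    return AuthorizationResult::FAILED;
  }
  return aclPermits ? AuthorizationResult::ALLOWED
                    : AuthorizationResult::DENIED;
}

// Map each outcome to an HTTP status code (403 and 503 as in the ticket).
int toHttpStatus(AuthorizationResult result) {
  switch (result) {
    case AuthorizationResult::ALLOWED: return 200;
    case AuthorizationResult::DENIED:  return 403;
    case AuthorizationResult::FAILED:  return 503;
  }
  return 500;  // Unreachable; silences missing-return warnings.
}
```

The key design point is that the missing `executor_info` check runs before any ACL evaluation. A request for a non-existent sandbox therefore never reaches the allow/deny decision.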


[jira] [Assigned] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Kevin Klues (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-5717:
--

Assignee: Kevin Klues


[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Kevin Klues (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352224#comment-15352224
 ] 

Kevin Klues commented on MESOS-5717:


The documentation for using GPUs is not committed yet (we will have it 
committed by the time the 1.0 release comes out).  There has been no release 
with GPU support yet, so the documentation is lagging behind a bit.

For what you want to do, the quick answer is:

{noformat}
./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos/agent \
  --isolation=cgroups/devices,gpu/nvidia
{noformat}

If the error message you are seeing is wrong, or the help output describing the 
--isolation flag is wrong, please file a separate ticket to report them. Thanks.

> Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and 
> '--nvidia_gpu_devices' flags
> 
>
> Key: MESOS-5717
> URL: https://issues.apache.org/jira/browse/MESOS-5717
> Project: Mesos
>  Issue Type: Bug
> Environment: RHEL 7.2
>Reporter: Sunzhe
>  Labels: gpu
> Fix For: 1.0.0
>
>
> Prerequisite: MESOS-5257 states "By default, with no '--nvidia_gpu_devices' 
> flag or `gpus` resources flag, the new auto-discovery will simply enumerate 
> all of the GPUs on the system", and MESOS-5630 "removes this 
> flag (--enable-nvidia-gpu-support) and enables this support for all builds 
> on Linux."
> So I ran '../configure' without any flag and started the agent without 
> '--resources' or '--nvidia_gpu_devices', but it could not discover GPU 
> resources. I also started the agent with '--resources' and 
> '--nvidia_gpu_devices', and that did not work either.
> I'm sure the NVIDIA GPUs on my machines are OK, because they work well when I 
> pass '--enable-nvidia-gpu-support' to './configure' and start the agents with 
> '--resources' and '--nvidia_gpu_devices'.


[jira] [Updated] (MESOS-5730) Sandbox access authorization should fail for non existing sandboxes.

2016-06-27 Thread Till Toenshoff (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5730:
--
Affects Version/s: 1.0.0

> Sandbox access authorization should fail for non existing sandboxes.
> 
>
> Key: MESOS-5730
> URL: https://issues.apache.org/jira/browse/MESOS-5730
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> The local authorizer currently tries to authorize {{ACCESS_SANDBOX}} even if 
> no further object specification (e.g. {{framework_info}} or 
> {{executor_info}}) was specified / available at that time.
> Given that there is likely no sandbox available if no {{executor_info}} was 
> provided, I think we should actually fail instead of denying (403).
> A failure would result in what is, IMHO, a more appropriate 
> ServiceUnavailable (503).
> See 
> https://github.com/apache/mesos/commit/c8d67590064e35566274116cede9c6a733187b48#diff-dd692b1640b2628014feca01a94ba1e1R241


[jira] [Created] (MESOS-5730) Sandbox access authorization should fail for non existing sandboxes.

2016-06-27 Thread Till Toenshoff (JIRA)

Till Toenshoff created MESOS-5730:
-

 Summary: Sandbox access authorization should fail for non existing 
sandboxes.
 Key: MESOS-5730
 URL: https://issues.apache.org/jira/browse/MESOS-5730
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff
Priority: Blocker
 Fix For: 1.0.0


The local authorizer currently tries to authorize {{ACCESS_SANDBOX}} even if no 
further object specification (e.g. {{framework_info}} or {{executor_info}}) 
was specified / available at that time.

Given that there is likely no sandbox available if no {{executor_info}} was 
provided, I think we should actually fail instead of denying (403).

A failure would result in what is, IMHO, a more appropriate ServiceUnavailable 
(503).

See 
https://github.com/apache/mesos/commit/c8d67590064e35566274116cede9c6a733187b48#diff-dd692b1640b2628014feca01a94ba1e1R241


[jira] [Comment Edited] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352181#comment-15352181
 ] 

Sunzhe edited comment on MESOS-5717 at 6/28/16 1:38 AM:


Yes. When starting the agent with {{./bin/mesos-agent.sh --master=127.0.0.1:5050 
--work_dir=/var/lib/mesos/agent --isolation=gpu/nvidia}}, 
this error appears: {{Failed to create a containerizer: Could not create 
MesosContainerizer: Failed to create isolator 'gpu/nvidia': The 
'cgroups/devices' isolator must be enabled in order to use the gpu/devices 
isolator}}. 
If I want to enable {{cgroups/devices}}, the explanation of the 
{{--isolation}} flag says I should {{configure with flag: 
`--enable-nvidia-gpu-support` to enable}}, but 
{{--enable-nvidia-gpu-support}} was removed in MESOS-5630, which enables 
this support for all builds on Linux. Is that right?



[jira] [Comment Edited] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352181#comment-15352181
 ] 

Sunzhe edited comment on MESOS-5717 at 6/28/16 1:37 AM:


Yes. When starting the agent with {{./bin/mesos-agent.sh --master=127.0.0.1:5050 
--work_dir=/var/lib/mesos/agent --isolation=gpu/nvidia}}, this error appears: 
{{Failed to create a containerizer: Could not create MesosContainerizer: Failed 
to create isolator 'gpu/nvidia': The 'cgroups/devices' isolator must be enabled 
in order to use the gpu/devices isolator}}. If I want to enable 
{{cgroups/devices}}, the explanation of the {{--isolation}} flag says I should 
{{configure with flag: `--enable-nvidia-gpu-support` to enable}}, but 
{{--enable-nvidia-gpu-support}} was removed in MESOS-5630, which enables 
this support for all builds on Linux. Is that right?



[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352181#comment-15352181
 ] 

Sunzhe commented on MESOS-5717:
---

Yes. When starting the agent with {{./bin/mesos-agent.sh --master=127.0.0.1:5050 
--work_dir=/var/lib/mesos/agent --isolation=gpu/nvidia}}, this error appears: 
{{Failed to create a containerizer: Could not create MesosContainerizer: Failed 
to create isolator 'gpu/nvidia': The 'cgroups/devices' isolator must be enabled 
in order to use the gpu/devices isolator}}. If I want to enable 
{{cgroups/devices}}, the explanation of the {{--isolation}} flag says I should 
{{configure with flag: `--enable-nvidia-gpu-support` to enable}}, but 
{{--enable-nvidia-gpu-support}} was removed in MESOS-5630, which enables 
this support for all builds on Linux. Is that right?


[jira] [Created] (MESOS-5729) Consider allowing the libprocess caller an option to not set CLOEXEC on libprocess sockets

2016-06-27 Thread Joseph Wu (JIRA)

Joseph Wu created MESOS-5729:


 Summary: Consider allowing the libprocess caller an option to not 
set CLOEXEC on libprocess sockets
 Key: MESOS-5729
 URL: https://issues.apache.org/jira/browse/MESOS-5729
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joseph Wu


Both implementations of libprocess's {{Socket}} interface will set the 
{{CLOEXEC}} option on all new sockets (incoming or outgoing).  This assumption 
is pervasive across Mesos, but since libprocess aims to be a general-purpose 
library, the caller should be able to choose *not* to set {{CLOEXEC}} on sockets 
when desired.

See TODOs added here: https://reviews.apache.org/r/49281/
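A sketch of what such an option might look like, assuming Linux or another platform that supports `SOCK_CLOEXEC`. `createSocket` and `isCloexec` are hypothetical names, not part of libprocess's `Socket` interface:

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a TCP socket and let the caller decide whether
// the descriptor is closed on exec. SOCK_CLOEXEC is Linux/BSD specific.
int createSocket(bool cloexec) {
  int type = SOCK_STREAM;
  if (cloexec) {
    type |= SOCK_CLOEXEC;  // Set FD_CLOEXEC atomically at creation time.
  }
  return ::socket(AF_INET, type, 0);  // Returns -1 on failure.
}

// True if the descriptor will be closed in a child process after exec().
bool isCloexec(int fd) {
  int flags = ::fcntl(fd, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Passing `true` keeps the current default. Passing `false` lets a caller deliberately hand the descriptor to an exec'd child.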


[jira] [Assigned] (MESOS-5728) Refactor `mesos-containerizer launch` so that executor can use that to launch user tasks.

2016-06-27 Thread Jie Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-5728:
-

Assignee: Jie Yu  (was: Gilbert Song)

> Refactor `mesos-containerizer launch` so that executor can use that to launch 
> user tasks.
> -
>
> Key: MESOS-5728
> URL: https://issues.apache.org/jira/browse/MESOS-5728
> Project: Mesos
>  Issue Type: Wish
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> This will be useful for both the command executor and custom executors (e.g. 
> Thermos). If you look at the current implementation of 
> `mesos-containerizer launch` and the command executor, they share a lot of 
> logic. It would be ideal to merge them.
> This is related to this review:
> https://reviews.apache.org/r/49273/


[jira] [Updated] (MESOS-5723) SSL-enabled libprocess will leak incoming links to forks

2016-06-27 Thread Joseph Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5723:
-
Shepherd: Joris Van Remoortere

> SSL-enabled libprocess will leak incoming links to forks
> 
>
> Key: MESOS-5723
> URL: https://issues.apache.org/jira/browse/MESOS-5723
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: libprocess, mesosphere, ssl
> Fix For: 1.0.0
>
>
> Encountered two different buggy behaviors that can be tracked down to the 
> same underlying problem.
> Repro #1 (non-crashy):
> (1) Start a master.  Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled.  Downgrade support has the same 
> problem.  The master/agent {{link}} to one another.
> (3) Run a sleep task.  Keep this alive.  If you inspect FDs at this point, 
> you'll notice the task has inherited the {{link}} FD (master -> agent).
> (4) Restart the agent.  Due to (3), the master's {{link}} stays open.
> (5) Check master's logs for the agent's re-registration message.
> (6) Check the agent's logs for re-registration. The message will not appear. 
> The master is actually using the old {{link}}, which is not connected to the 
> agent.
> 
> Repro #2 (crashy):
> (1) Start a master.  Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled.  Downgrade support has the same problem.
> (3) Run ~100 sleep tasks one after the other, and keep them all alive.  Each task 
> links back to the agent.  Due to an FD leak, each task will inherit the 
> incoming links from all other actors...
> (4) At some point, the agent will run out of FDs and kernel panic.
> 
> It appears that the SSL socket {{accept}} call is missing {{os::nonblock}} 
> and {{os::cloexec}} calls:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806
> For reference, here's {{poll}} socket's {{accept}}:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
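The missing step amounts to something like the following C++ sketch. It uses plain `fcntl` in place of the `os::nonblock` and `os::cloexec` helpers named in the ticket. `prepareAcceptedSocket` is a hypothetical name, and the real fix belongs in the SSL socket's accept path:

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Make a freshly accepted descriptor non-blocking and close-on-exec so that
// forked tasks do not inherit it. Returns false if any fcntl call fails.
bool prepareAcceptedSocket(int fd) {
  // File status flags: add O_NONBLOCK.
  int statusFlags = ::fcntl(fd, F_GETFL);
  if (statusFlags == -1 ||
      ::fcntl(fd, F_SETFL, statusFlags | O_NONBLOCK) == -1) {
    return false;
  }

  // File descriptor flags: add FD_CLOEXEC.
  int descriptorFlags = ::fcntl(fd, F_GETFD);
  if (descriptorFlags == -1 ||
      ::fcntl(fd, F_SETFD, descriptorFlags | FD_CLOEXEC) == -1) {
    return false;
  }

  return true;
}
```

Where code calls `accept` directly on Linux, `accept4(fd, addr, len, SOCK_NONBLOCK | SOCK_CLOEXEC)` sets both flags atomically. This also closes the window in which a concurrent `fork` could inherit the descriptor before the flags are applied.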



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (MESOS-5728) Refactor `mesos-containerizer launch` so that executor can use that to launch user tasks.

2016-06-27 Thread Gilbert Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-5728:
---

Assignee: Gilbert Song

> Refactor `mesos-containerizer launch` so that executor can use that to launch 
> user tasks.
> -
>
> Key: MESOS-5728
> URL: https://issues.apache.org/jira/browse/MESOS-5728
> Project: Mesos
>  Issue Type: Wish
>Reporter: Jie Yu
>Assignee: Gilbert Song
>
> This will be useful for both the command executor and custom executors (e.g. 
> thermos). If you look at the current impl of `mesos-containerizer launch` and 
> the command executor, they share a lot of logic. It would be ideal to merge 
> them.
> This is related to this review:
> https://reviews.apache.org/r/49273/




[jira] [Assigned] (MESOS-5727) Command executor health check does not work when the task specifies container image.

2016-06-27 Thread Gilbert Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-5727:
---

Assignee: Gilbert Song

> Command executor health check does not work when the task specifies container 
> image.
> 
>
> Key: MESOS-5727
> URL: https://issues.apache.org/jira/browse/MESOS-5727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.0
>Reporter: Jie Yu
>Assignee: Gilbert Song
> Fix For: 1.0.0
>
>
> Since we launch the task after pivot_root, we no longer have access to the 
> mesos-health-check binary. The solution is to refactor the health check into a 
> library (libprocess) so that it does not depend on the underlying filesystem.
> One note here is that we should strive to keep both the command executor and 
> the task in the same mount namespace so that Mesos CLI tooling does not need 
> to find the mount namespace for the task. It just needs to find the 
> corresponding pid for the executor.




[jira] [Updated] (MESOS-5727) Command executor health check does not work when the task specifies container image.

2016-06-27 Thread Jie Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5727:
--
Fix Version/s: 1.0.0

> Command executor health check does not work when the task specifies container 
> image.
> 
>
> Key: MESOS-5727
> URL: https://issues.apache.org/jira/browse/MESOS-5727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.0
>Reporter: Jie Yu
> Fix For: 1.0.0
>
>
> Since we launch the task after pivot_root, we no longer have access to the 
> mesos-health-check binary. The solution is to refactor the health check into a 
> library (libprocess) so that it does not depend on the underlying filesystem.
> One note here is that we should strive to keep both the command executor and 
> the task in the same mount namespace so that Mesos CLI tooling does not need 
> to find the mount namespace for the task. It just needs to find the 
> corresponding pid for the executor.




[jira] [Updated] (MESOS-5728) Refactor `mesos-containerizer launch` so that executor can use that to launch user tasks.

2016-06-27 Thread Jie Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5728:
--
Description: 
This will be useful for both the command executor and custom executors (e.g. 
thermos). If you look at the current impl of `mesos-containerizer launch` and 
the command executor, they share a lot of logic. It would be ideal to merge 
them.

This is related to this review:
https://reviews.apache.org/r/49273/

  was:This will be useful for both the command executor and custom executors (e.g. 
thermos). If you look at the current impl of `mesos-containerizer launch` and 
the command executor, they share a lot of logic. It would be ideal to merge 
them.


> Refactor `mesos-containerizer launch` so that executor can use that to launch 
> user tasks.
> -
>
> Key: MESOS-5728
> URL: https://issues.apache.org/jira/browse/MESOS-5728
> Project: Mesos
>  Issue Type: Wish
>Reporter: Jie Yu
>
> This will be useful for both the command executor and custom executors (e.g. 
> thermos). If you look at the current impl of `mesos-containerizer launch` and 
> the command executor, they share a lot of logic. It would be ideal to merge 
> them.
> This is related to this review:
> https://reviews.apache.org/r/49273/




[jira] [Updated] (MESOS-5727) Command executor health check does not work when the task specifies container image.

2016-06-27 Thread Jie Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5727:
--
Description: 
Since we launch the task after pivot_root, we no longer have access to the 
mesos-health-check binary. The solution is to refactor the health check into a 
library (libprocess) so that it does not depend on the underlying filesystem.

One note here is that we should strive to keep both the command executor and 
the task in the same mount namespace so that Mesos CLI tooling does not need to 
find the mount namespace for the task. It just needs to find the corresponding 
pid for the executor.

  was:Since we launch the task after pivot_root, we no longer have access to 
the mesos-health-check binary. The solution is to refactor the health check into a 
library (libprocess) so that it does not depend on the underlying filesystem.


> Command executor health check does not work when the task specifies container 
> image.
> 
>
> Key: MESOS-5727
> URL: https://issues.apache.org/jira/browse/MESOS-5727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.0
>Reporter: Jie Yu
>
> Since we launch the task after pivot_root, we no longer have access to the 
> mesos-health-check binary. The solution is to refactor the health check into a 
> library (libprocess) so that it does not depend on the underlying filesystem.
> One note here is that we should strive to keep both the command executor and 
> the task in the same mount namespace so that Mesos CLI tooling does not need 
> to find the mount namespace for the task. It just needs to find the 
> corresponding pid for the executor.




[jira] [Created] (MESOS-5728) Refactor `mesos-containerizer launch` so that executor can use that to launch user tasks.

2016-06-27 Thread Jie Yu (JIRA)

Jie Yu created MESOS-5728:
-

 Summary: Refactor `mesos-containerizer launch` so that executor 
can use that to launch user tasks.
 Key: MESOS-5728
 URL: https://issues.apache.org/jira/browse/MESOS-5728
 Project: Mesos
  Issue Type: Wish
Reporter: Jie Yu


This will be useful for both the command executor and custom executors (e.g. 
thermos). If you look at the current impl of `mesos-containerizer launch` and 
the command executor, they share a lot of logic. It would be ideal to merge 
them.




[jira] [Created] (MESOS-5727) Command executor health check does not work when the task specifies container image.

2016-06-27 Thread Jie Yu (JIRA)

Jie Yu created MESOS-5727:
-

 Summary: Command executor health check does not work when the task 
specifies container image.
 Key: MESOS-5727
 URL: https://issues.apache.org/jira/browse/MESOS-5727
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.2, 1.0.0
Reporter: Jie Yu


Since we launch the task after pivot_root, we no longer have access to the 
mesos-health-check binary. The solution is to refactor the health check into a 
library (libprocess) so that it does not depend on the underlying filesystem.




[jira] [Updated] (MESOS-5232) Add capability information to ContainerInfo protobuf message.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5232:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 33, 
Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37)

> Add capability information to ContainerInfo protobuf message.
> -
>
> Key: MESOS-5232
> URL: https://issues.apache.org/jira/browse/MESOS-5232
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> To enable support for capabilities as a first-class framework entity, we need to 
> add capability-related information to the ContainerInfo protobuf.




[jira] [Updated] (MESOS-5563) Rearrange Nvidia GPU files to cleanup semantics for header inclusion.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5563:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Rearrange Nvidia GPU files to cleanup semantics for header inclusion.
> -
>
> Key: MESOS-5563
> URL: https://issues.apache.org/jira/browse/MESOS-5563
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
>
> Currently, components outside of 
> `src/slave/containerizers/mesos/isolators/gpu` have to protect their 
> #includes for certain Nvidia header files with the ENABLE_NVIDIA_GPU_SUPPORT 
> flag. Other headers strictly *could not* be wrapped in this flag.
> 
> We need to clean up this header madness by creating a common "nvidia.hpp" 
> header that takes care of all the dependencies. All components outside of 
> `src/slave/containerizers/mesos/isolators/gpu` 
> should only need to #include this one header instead of managing everything 
> themselves.
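An umbrella header like the one proposed can hide the `ENABLE_NVIDIA_GPU_SUPPORT` guard from every caller. Below is a minimal sketch of that pattern. The guard name `NVIDIA_UMBRELLA_HPP`, the two include paths, and the `nvidiaGpuSupportEnabled()` helper are hypothetical placeholders, not the actual Mesos header contents.

```cpp
// nvidia.hpp: hypothetical sketch of a single umbrella header.
#ifndef NVIDIA_UMBRELLA_HPP
#define NVIDIA_UMBRELLA_HPP

// Headers that must be visible in every build would be included here
// unconditionally.

#ifdef ENABLE_NVIDIA_GPU_SUPPORT
// Placeholder paths: in a GPU-enabled build these would be the real
// headers under src/slave/containerizers/mesos/isolators/gpu. When the
// flag is not defined, the preprocessor skips these directives entirely.
#include "slave/containerizers/mesos/isolators/gpu/allocator.hpp"
#include "slave/containerizers/mesos/isolators/gpu/isolator.hpp"
#endif // ENABLE_NVIDIA_GPU_SUPPORT

// Always-available query, so callers never need their own #ifdef.
constexpr bool nvidiaGpuSupportEnabled() {
#ifdef ENABLE_NVIDIA_GPU_SUPPORT
  return true;
#else
  return false;
#endif
}

#endif // NVIDIA_UMBRELLA_HPP
```

With this pattern, the build flag is checked in exactly one file. Components outside the `gpu` directory include only this header.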




[jira] [Updated] (MESOS-5558) Update `Containerizer::resources()` to use the `NvidiaGpuAllocator`

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5558:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Update `Containerizer::resources()` to use the `NvidiaGpuAllocator`
> ---
>
> Key: MESOS-5558
> URL: https://issues.apache.org/jira/browse/MESOS-5558
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
>
> With the introduction of the shared `NvidiaGpuAllocator` component, 
> `Containerizer::resources()` should be updated to use it.




[jira] [Updated] (MESOS-5172) Registry puller cannot fetch blobs correctly from some private repos.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5172:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 33, Mesosphere Sprint 37)

> Registry puller cannot fetch blobs correctly from some private repos.
> -
>
> Key: MESOS-5172
> URL: https://issues.apache.org/jira/browse/MESOS-5172
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, mesosphere
>
> When the registry puller is pulling a private repository from some private 
> registry (e.g., quay.io), errors may occur when fetching blobs, even though 
> the repo's manifest has already been fetched correctly. The error 
> message is `Unexpected HTTP response '400 Bad Request' when trying to 
> download the blob`. This may arise from the blob-fetching logic, or from an 
> incorrectly formatted URI when requesting blobs.
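One way to check the "incorrectly formatted URI" hypothesis is to compare the requested URL against the blob layout in the Docker Registry HTTP API V2, `GET /v2/<name>/blobs/<digest>`. Below is a minimal sketch of building such a URL. The function `blobUrl` is hypothetical and is not the Mesos registry puller's code. The `library/` prefix rule applies only to Docker Hub's official images.

```cpp
#include <string>

// Hypothetical helper: builds a blob URL following the Docker Registry
// HTTP API V2 layout GET /v2/<name>/blobs/<digest>.
std::string blobUrl(std::string registry,
                    std::string repository,
                    const std::string& digest) {
  // Drop trailing '/' characters so we never emit "host//v2/...".
  while (!registry.empty() && registry.back() == '/') {
    registry.pop_back();
  }

  // Docker Hub stores single-component official images under "library/".
  if (registry == "registry-1.docker.io" &&
      repository.find('/') == std::string::npos) {
    repository = "library/" + repository;
  }

  return "https://" + registry + "/v2/" + repository + "/blobs/" + digest;
}
```

For example, `blobUrl("quay.io", "coreos/etcd", "sha256:abc")` yields `https://quay.io/v2/coreos/etcd/blobs/sha256:abc`. If the puller's URL matches this layout, the 400 is more likely to come from the fetching logic itself.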




[jira] [Updated] (MESOS-5051) Create helpers for manipulating Linux capabilities.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5051:
-
Sprint: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 37, Mesosphere Sprint 38  (was: 
Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere 
Sprint 35, Mesosphere Sprint 37)

> Create helpers for manipulating Linux capabilities.
> ---
>
> Key: MESOS-5051
> URL: https://issues.apache.org/jira/browse/MESOS-5051
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> These helpers can either be based on an existing library (e.g. libcap) or use 
> system calls directly.
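Whichever backend is chosen, the helpers end up manipulating 64-bit capability masks in which each capability is a bit position. Below is a minimal, dependency-free sketch that parses a mask from a `/proc/<pid>/status` line (such as `CapEff:`) and tests a single bit. The function names are hypothetical and are not Mesos's capabilities API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a capability mask line from
// /proc/<pid>/status, e.g. "CapEff:\t0000003fffffffff".
// Returns false if the line is for a different field or is malformed.
bool parseCapabilityLine(const std::string& line,
                         const std::string& field,
                         uint64_t* mask) {
  const std::string prefix = field + ":";
  if (line.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }

  try {
    // std::stoull skips the leading tab and parses the hex digits.
    unsigned long long value =
        std::stoull(line.substr(prefix.size()), nullptr, 16);
    *mask = static_cast<uint64_t>(value);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  } catch (const std::out_of_range&) {
    return false;
  }
}

// Capability numbers are bit positions (e.g. CAP_NET_ADMIN is 12 in
// <linux/capability.h>); anything outside [0, 64) is treated as unset.
bool hasCapability(uint64_t mask, int capability) {
  if (capability < 0 || capability >= 64) {
    return false;
  }
  return ((mask >> capability) & 1u) != 0;
}
```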




[jira] [Updated] (MESOS-4749) Move HTB out of containers

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4749:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Move HTB out of containers
> --
>
> Key: MESOS-4749
> URL: https://issues.apache.org/jira/browse/MESOS-4749
> Project: Mesos
>  Issue Type: Task
>  Components: network
>Reporter: Cong Wang
>Assignee: Cong Wang
>Priority: Minor
>
> Currently we set a fixed HTB bandwidth in each container, which makes 
> it impossible to share the link when it is idle. As a first step, we should move it 
> out of the containers and into the qdisc hierarchy of the physical interface.




[jira] [Updated] (MESOS-5649) Build an example framework to consume GPUs

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5649:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 
37)

> Build an example framework to consume GPUs
> --
>
> Key: MESOS-5649
> URL: https://issues.apache.org/jira/browse/MESOS-5649
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
> Fix For: 1.0.0
>
>
> This framework should show how to build a GPU-capable framework that can 
> accept offers with GPUs and launch tasks that use them.




[jira] [Updated] (MESOS-5303) Add capabilities support for mesos execute cli.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5303:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37)

> Add capabilities support for mesos execute cli.
> ---
>
> Key: MESOS-5303
> URL: https://issues.apache.org/jira/browse/MESOS-5303
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> Add support for `user` and `capabilities` to the execute CLI. This will help in 
> testing the `capabilities` feature for the unified containerizer.




[jira] [Updated] (MESOS-5562) Add class to share Nvidia-specific components between containerizers

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5562:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Add class to share Nvidia-specific components between containerizers
> 
>
> Key: MESOS-5562
> URL: https://issues.apache.org/jira/browse/MESOS-5562
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
>
> Once we have an `NvidiaGPUAllocator` component, we need some way to share it 
> across multiple containerizers.  Moreover, we anticipate needing other Nvidia 
> components to share across multiple containerizers as well (e.g. an 
> `NvidiaVolumeManager` component). As such, we should add a wrapper class 
> around these components so they can easily be passed to each containerizer 
> without having to continually add a bunch of parameters to the Containerizer 
> interface.
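Below is a minimal sketch of the proposed wrapper, in which one shared object bundles the Nvidia components and is handed to every containerizer. All class names here are hypothetical stand-ins, with `NvidiaGpuAllocator` reduced to a simple free-list. A future `NvidiaVolumeManager` would become one more field in the wrapper rather than a new constructor parameter on each containerizer.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <utility>

// Placeholder for the shared GPU allocator (hypothetical, simplified).
class NvidiaGpuAllocator {
 public:
  explicit NvidiaGpuAllocator(std::set<unsigned> gpus)
    : free_(std::move(gpus)) {}

  // Moves `count` free GPUs into `out`; returns false if too few are free.
  bool allocate(std::size_t count, std::set<unsigned>* out) {
    if (count > free_.size()) {
      return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
      auto it = free_.begin();
      out->insert(*it);
      free_.erase(it);
    }
    return true;
  }

  std::size_t available() const { return free_.size(); }

 private:
  std::set<unsigned> free_;
};

// Wrapper bundling all shared Nvidia components.
struct NvidiaComponents {
  std::shared_ptr<NvidiaGpuAllocator> allocator;
  // std::shared_ptr<NvidiaVolumeManager> volume;  // future component
};

// Stand-in for any containerizer (e.g. Mesos or Docker) that receives
// the same components object.
class ContainerizerStub {
 public:
  explicit ContainerizerStub(std::shared_ptr<NvidiaComponents> nvidia)
    : nvidia_(std::move(nvidia)) {}

  // Asks the shared allocator for GPUs; fails if no components were given.
  bool launchWithGpus(std::size_t count, std::set<unsigned>* gpus) {
    return nvidia_ != nullptr && nvidia_->allocator != nullptr &&
           nvidia_->allocator->allocate(count, gpus);
  }

 private:
  std::shared_ptr<NvidiaComponents> nvidia_;
};
```

Because both containerizers hold the same allocator, a GPU handed out by one is no longer available to the other.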




[jira] [Updated] (MESOS-5275) Add capabilities support for unified containerizer.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5275:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37)

> Add capabilities support for unified containerizer.
> ---
>
> Key: MESOS-5275
> URL: https://issues.apache.org/jira/browse/MESOS-5275
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> Add capabilities support for the unified containerizer.
> Requirements:
> 1. Use the Mesos capabilities API.
> 2. Frameworks should be able to add capability requests for containers.
> 3. Agents should be able to set the maximum allowed capabilities for all 
> containers they launch.
> Design document: 
> https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd
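Requirements 2 and 3 combine into a subset check: a container's requested capabilities must lie within the agent's configured maximum. Below is a minimal sketch of that check using capability names. The function `disallowedCapabilities` is hypothetical; the real design is specified in the linked document.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical check (not Mesos's API): returns the requested
// capabilities that exceed the agent's configured maximum, in sorted
// order. An empty result means the request is within the allowed set.
std::vector<std::string> disallowedCapabilities(
    const std::set<std::string>& requested,
    const std::set<std::string>& allowed) {
  std::vector<std::string> disallowed;
  std::set_difference(requested.begin(), requested.end(),
                      allowed.begin(), allowed.end(),
                      std::back_inserter(disallowed));
  return disallowed;
}
```

Returning the offending names, rather than a bare boolean, lets the agent report exactly which capabilities caused a launch to be rejected.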




[jira] [Updated] (MESOS-5582) Create a `cgroups/devices` isolator.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5582:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Create a `cgroups/devices` isolator.
> 
>
> Key: MESOS-5582
> URL: https://issues.apache.org/jira/browse/MESOS-5582
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, isolator, mesosphere
>
> Currently, all the logic for the `cgroups/devices` isolator is bundled into 
> the Nvidia GPU Isolator. We should abstract it out into its own component 
> and remove the redundant logic from the Nvidia GPU Isolator. Assuming the 
> guaranteed ordering between isolators from MESOS-5581, we can be sure that 
> the dependency order between the `cgroups/devices` and `gpu/nvidia` isolators 
> is met.




[jira] [Updated] (MESOS-5419) Document all known client libraries for the Scheduler/Executor API

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5419:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 
37)

> Document all known client libraries for the Scheduler/Executor API
> --
>
> Key: MESOS-5419
> URL: https://issues.apache.org/jira/browse/MESOS-5419
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: newbie
>
> Previously, during various community syncs, we decided that going forward we 
> would only support the C++ scheduler/executor library in the Mesos code base. 
> We should, however, still document the client libraries available in 
> various languages to drive adoption/have a recommended list for users to look 
> up.
> This can be similar to the already existing frameworks doc: 
> http://mesos.apache.org/documentation/latest/frameworks/
> Other projects also seem to have been following a similar practice:
> https://docs.docker.com/engine/reference/api/remote_api_client_libraries/
> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/client-libraries.md




[jira] [Updated] (MESOS-5228) Add tests for Capability API.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5228:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 33, 
Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37)

> Add tests for Capability API.
> -
>
> Key: MESOS-5228
> URL: https://issues.apache.org/jira/browse/MESOS-5228
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Add basic tests for the capability API.




[jira] [Updated] (MESOS-5392) Design doc for adding resource limits support for Mesos containerizer

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5392:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 35, Mesosphere Sprint 37)

> Design doc for adding resource limits support for Mesos containerizer
> -
>
> Key: MESOS-5392
> URL: https://issues.apache.org/jira/browse/MESOS-5392
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> This will be the design doc for MESOS-5391.




[jira] [Updated] (MESOS-4690) Reorganize 3rdparty directory

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4690:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  (was: 
Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere 
Sprint 36, Mesosphere Sprint 37)

> Reorganize 3rdparty directory
> -
>
> Key: MESOS-4690
> URL: https://issues.apache.org/jira/browse/MESOS-4690
> Project: Mesos
>  Issue Type: Epic
>  Components: build, libprocess, stout
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> This issue is currently being discussed on the dev mailing list:
> http://www.mail-archive.com/dev@mesos.apache.org/msg34349.html




[jira] [Updated] (MESOS-5559) Integrate the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5559:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Integrate the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`
> ---
>
> Key: MESOS-5559
> URL: https://issues.apache.org/jira/browse/MESOS-5559
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
>





[jira] [Updated] (MESOS-4233) Logging is too verbose for sysadmins / syslog

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4233:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28, 
Mesosphere Sprint 29, Mesosphere Sprint 30, Mesosphere Sprint 31, Mesosphere 
Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  (was: 
Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28, Mesosphere 
Sprint 29, Mesosphere Sprint 30, Mesosphere Sprint 31, Mesosphere Sprint 32, 
Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere 
Sprint 36, Mesosphere Sprint 37)

> Logging is too verbose for sysadmins / syslog
> -
>
> Key: MESOS-4233
> URL: https://issues.apache.org/jira/browse/MESOS-4233
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>Assignee: Kapil Arya
>  Labels: mesosphere
> Attachments: giant_port_range_logging
>
>
> Currently mesos logs a lot. When launching a thousand tasks in the space of 
> 10 seconds it will print tens of thousands of log lines, overwhelming syslog 
> (there is a max rate at which a process can send stuff over a unix socket) 
> and not giving useful information to a sysadmin who cares about just the 
> high-level activity and when something goes wrong.
> Note mesos also blocks writing to its log locations, so when writing a lot of 
> log messages, it can fill up the write buffer in the kernel, and be suspended 
> until the syslog agent catches up reading from the socket (GLOG does a 
> blocking fwrite to stderr). GLOG also has a big mutex around logging so only 
> one thing logs at a time.
> While for "internal debugging" it is useful to see things like "message went 
> from internal component x to internal component y", from a sysadmin 
> perspective I only care about the high-level actions taken (launched task for 
> framework x, sent offer to framework y, got task failed from host z). Note 
> those are what I'd expect at the "INFO" level. At the "WARNING" level I'd 
> expect very little to be logged / almost nothing in normal operation, just 
> things like "WARN: Replicated log write took longer than expected". WARN 
> would also get things like backtraces on crashes and abnormal exits / aborts.
> When trying to launch 3k+ tasks inside a second, mesos logging currently 
> overwhelms syslog with 100k+ messages, many of which are thousands of bytes. 
> Sysadmins expect to be able to use syslog to monitor basic events in their 
> system. This is too much.
> We can keep logging the messages to files, but the logging to stderr needs to 
> be reduced significantly (stderr gets picked up and forwarded to syslog / 
> central aggregation).
> What I would like is to be able to set the stderr logging level different / 
> independent from the file logging level (Syslog giving the "sysadmin" 
> aggregated overview, files useful for debugging in depth what happened in a 
> cluster). A lot of what mesos currently logs at info is really debugging info 
> / should show up as debug log level.
> Some samples of mesos logging a lot more than a sysadmin would want / expect 
> are attached, and some are below:
>  - Every task gets printed multiple times for a basic launch:
> {noformat}
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382644  1315 master.cpp:3248] Launching task 
> envy.5b19a713-a37f-11e5-8b3e-0251692d6109 of framework 
> 5178f46d-71d6-422f-922c-5bbe82dff9cc- (marathon)
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382925  1315 master.hpp:176] Adding task 
> envy.5b1958f2-a37f-11e5-8b3e-0251692d6109 with resources cpus(*):0.0001; 
> mem(*):16; ports(*):[14047-14047]
> {noformat}
>  - Every task status update prints many log lines, successful ones are part 
> of normal operation and maybe should be logged at info / debug levels, but 
> not to a sysadmin (Just show when things fail, and maybe aggregate counters 
> to tell of the volume of working)
>  - No log message should be really big / more than 1k characters (this would 
> prevent the giant port list attached, and make such cases easily discoverable / bug 
> filable / fixable)
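The request above boils down to two independent thresholds plus a length cap on the stderr path. Below is a minimal sketch of that routing. `DualThresholdLogger` and `Severity` are hypothetical and are not glog or Mesos types, and the sinks are in-memory vectors standing in for the log file and stderr. The 1024-character cap mirrors the "1k characters" suggestion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Severity { kDebug = 0, kInfo = 1, kWarning = 2, kError = 3 };

// Hypothetical logger: the file sink and the stderr sink each have their
// own threshold, and stderr lines are capped in length so one huge
// message cannot flood syslog. Both sinks are in-memory for the sketch.
class DualThresholdLogger {
 public:
  DualThresholdLogger(Severity fileThreshold,
                      Severity stderrThreshold,
                      std::size_t maxStderrLength)
    : fileThreshold_(fileThreshold),
      stderrThreshold_(stderrThreshold),
      maxStderrLength_(maxStderrLength) {}

  void log(Severity severity, const std::string& message) {
    if (severity >= fileThreshold_) {
      fileLines.push_back(message);  // Full detail for in-depth debugging.
    }
    if (severity >= stderrThreshold_) {
      // Truncated copy for the sysadmin-facing / syslog stream.
      stderrLines.push_back(message.size() > maxStderrLength_
                                ? message.substr(0, maxStderrLength_)
                                : message);
    }
  }

  std::vector<std::string> fileLines;
  std::vector<std::string> stderrLines;

 private:
  Severity fileThreshold_;
  Severity stderrThreshold_;
  std::size_t maxStderrLength_;
};
```

For comparison, glog has a `--stderrthreshold` setting that copies messages at or above a given level to stderr in addition to the log files. Whether that setting covers this use case in a Mesos deployment is not established here.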




[jira] [Updated] (MESOS-5550) Remove Nvidia GPU Isolator's link-time dependence on `libnvidia-ml`

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5550:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38  
(was: Mesosphere Sprint 36, Mesosphere Sprint 37)

> Remove Nvidia GPU Isolator's link-time dependence on `libnvidia-ml`
> ---
>
> Key: MESOS-5550
> URL: https://issues.apache.org/jira/browse/MESOS-5550
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
> Fix For: 1.0.0
>
>
> The current Nvidia GPU isolator has a dependence on `libnvidia-ml`, and as 
> such, pulls a hard dependence on this library into `libmesos`. The 
> consequence of this is that any process that relies on `libmesos` has to have 
> `libnvidia-ml` available as well, even on machines where no GPUs are 
> available.  Since this library is not easily installable through standard 
> package managers, having such a hard dependence can be burdensome.
> This ticket proposes to pull in `libnvidia-ml` as a run-time dependence 
> instead of a link-time dependence. As such, only machines that actually have 
> GPUs installed and would like to rely on this library need to have it 
> installed.
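A run-time dependence usually means opening the library with `dlopen` only when GPU support is actually used, so `libmesos` carries no link-time reference to it. Below is a minimal RAII sketch of that idea. `RuntimeLibrary` is a hypothetical name, not the Mesos implementation. On glibc older than 2.34 the program must be linked with `-ldl`.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical RAII wrapper (not the Mesos implementation): opens a
// shared library at run time, so the binary has no link-time dependency
// on it and still starts on machines where the library is absent.
class RuntimeLibrary {
 public:
  explicit RuntimeLibrary(const std::string& path)
    : handle_(::dlopen(path.c_str(), RTLD_LAZY | RTLD_LOCAL)) {
    if (handle_ == nullptr) {
      const char* message = ::dlerror();
      error_ = message != nullptr ? message : "unknown dlopen error";
    }
  }

  ~RuntimeLibrary() {
    if (handle_ != nullptr) {
      ::dlclose(handle_);
    }
  }

  RuntimeLibrary(const RuntimeLibrary&) = delete;
  RuntimeLibrary& operator=(const RuntimeLibrary&) = delete;

  bool loaded() const { return handle_ != nullptr; }
  const std::string& error() const { return error_; }

  // Returns the address of `name`, or nullptr if the library is not
  // loaded or does not export that symbol.
  void* symbol(const char* name) const {
    if (handle_ == nullptr) {
      return nullptr;
    }
    return ::dlsym(handle_, name);
  }

 private:
  void* handle_;
  std::string error_;
};
```

A caller could try `RuntimeLibrary nvml("libnvidia-ml.so.1");` and, only if `nvml.loaded()` is true, look up an entry point such as `nvmlInit` and cast it to the matching function-pointer type. On machines without GPUs, the load simply fails with a readable error instead of preventing the process from starting.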




[jira] [Updated] (MESOS-5401) Add ability to inject a Volume of Nvidia GPU-related libraries into a docker container.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5401:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37)

> Add ability to inject a Volume of Nvidia GPU-related libraries into a docker 
> container.
> ---
>
> Key: MESOS-5401
> URL: https://issues.apache.org/jira/browse/MESOS-5401
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>
> In order to support Nvidia GPUs with docker containers in Mesos, we need to 
> be able to consolidate all Nvidia libraries into a common volume and inject 
> that volume into the container.
> More info on why this is necessary here: 
> https://github.com/NVIDIA/nvidia-docker/




[jira] [Updated] (MESOS-5570) Improve CHANGELOG and upgrades.md

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5570:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 
37)

> Improve CHANGELOG and upgrades.md
> -
>
> Key: MESOS-5570
> URL: https://issues.apache.org/jira/browse/MESOS-5570
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Currently we have a lot of data duplication between the CHANGELOG and 
> upgrades.md. We should try to improve this and potentially make the CHANGELOG 
> a markdown file as well. For inspiration see the Hadoop changelog: 
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md




[jira] [Updated] (MESOS-4766) Improve allocator performance.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4766:
-
Sprint: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere 
Sprint 38  (was: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 
34, Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37)

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.
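For context on the DRF sorter item, here is a minimal, naive sketch of the quantity a DRF sorter orders clients by. That quantity is each client's dominant share: the largest fraction of any single resource the client holds. This is not the Mesos sorter. `ResourceMap`, `dominantShare`, and `drfOrder` are hypothetical names, and resources are simplified to named scalar quantities.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Resource name (e.g. "cpus", "mem") -> quantity.
typedef std::map<std::string, double> ResourceMap;

// Dominant share: the largest fraction of any single cluster resource
// that a client has been allocated.
double dominantShare(const ResourceMap& allocated, const ResourceMap& total) {
  double share = 0.0;
  for (const auto& entry : allocated) {
    auto it = total.find(entry.first);
    if (it == total.end() || it->second <= 0.0) {
      continue;  // Ignore resources the cluster does not offer.
    }
    share = std::max(share, entry.second / it->second);
  }
  return share;
}

// Naive DRF ordering: recompute every client's dominant share and sort in
// ascending order, so the client with the smallest dominant share comes first.
// Ties are broken by client name to keep the order deterministic.
std::vector<std::string> drfOrder(
    const std::map<std::string, ResourceMap>& clients,
    const ResourceMap& total) {
  std::vector<std::pair<double, std::string>> keyed;
  for (const auto& client : clients) {
    keyed.emplace_back(dominantShare(client.second, total), client.first);
  }
  std::sort(keyed.begin(), keyed.end());

  std::vector<std::string> order;
  for (const auto& entry : keyed) {
    order.push_back(entry.second);
  }
  return order;
}
```

This version recomputes and re-sorts every share on each call, which costs O(clients × resources + n log n). Avoiding that kind of repeated work is presumably the sort of cost this epic targets.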




[jira] [Updated] (MESOS-4626) Support Nvidia GPUs with filesystem isolation enabled.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4626:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37)

> Support Nvidia GPUs with filesystem isolation enabled.
> --
>
> Key: MESOS-4626
> URL: https://issues.apache.org/jira/browse/MESOS-4626
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Kevin Klues
>
> When filesystem isolation is enabled, containers that use Nvidia GPU 
> resources need access to GPU libraries residing on the host.
> We'll need to provide a means for operators to inject the necessary volumes 
> into *all* containers that use "gpus" resources.
> See the nvidia-docker project for more details:
> [nvidia-docker/tools/src/nvidia/volumes.go|https://github.com/NVIDIA/nvidia-docker/blob/fda10b2d27bf5578cc5337c23877f827e4d1ed77/tools/src/nvidia/volumes.go#L50-L103]




[jira] [Updated] (MESOS-5699) Create new documentation for Mesos networking.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5699:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 
37)

> Create new documentation for Mesos networking.
> --
>
> Key: MESOS-5699
> URL: https://issues.apache.org/jira/browse/MESOS-5699
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> With the introduction of CNI and Docker's support for user-defined networks, 
> there are now quite a few IP-per-container options for container networking 
> within Mesos. 
> We therefore need to rewrite the Mesos networking documentation, highlighting 
> all the networking support that Mesos provides for orchestrating containers 
> on IP networks.




[jira] [Updated] (MESOS-5221) Add Documentation for Nvidia GPU support

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5221:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 33, 
Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37)

> Add Documentation for Nvidia GPU support
> 
>
> Key: MESOS-5221
> URL: https://issues.apache.org/jira/browse/MESOS-5221
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>
> https://reviews.apache.org/r/46220/




[jira] [Updated] (MESOS-4099) parallel make tests does not build all test targets

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4099:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37)

> parallel make tests does not build all test targets
> ---
>
> Key: MESOS-4099
> URL: https://issues.apache.org/jira/browse/MESOS-4099
> Project: Mesos
>  Issue Type: Bug
>  Components: build, libprocess
>Affects Versions: 0.26.0
> Environment: Ubuntu 15.04
> clang-3.6 as well as gcc-4.9
>Reporter: Joris Van Remoortere
>Assignee: Kapil Arya
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> When inside 3rdparty/libprocess:
> Running {{make -j8 tests}} from a clean build does not yield the 
> {{libprocess-tests}} binary.
> Running it a subsequent time triggers more compilation and ends up yielding 
> the {{libprocess-tests}} binary.
> This suggests the {{tests}} target is not being built correctly.




[jira] [Updated] (MESOS-5659) Design doc for TASK_UNREACHABLE

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5659:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38  (was: Mesosphere Sprint 
37)

> Design doc for TASK_UNREACHABLE
> ---
>
> Key: MESOS-5659
> URL: https://issues.apache.org/jira/browse/MESOS-5659
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> See MESOS-4049.




[jira] [Updated] (MESOS-5445) Allow libprocess/stout to build without first doing `make` in 3rdparty.

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5445:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37)

> Allow libprocess/stout to build without first doing `make` in 3rdparty.
> ---
>
> Key: MESOS-5445
> URL: https://issues.apache.org/jira/browse/MESOS-5445
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> After the 3rdparty reorg, libprocess/stout are unable to build their 
> dependencies, so one has to run `make` in 3rdparty/ before building 
> libprocess/stout.




[jira] [Updated] (MESOS-2043) Framework auth fail with timeout error and never get authenticated

2016-06-27 Thread Artem Harutyunyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2043:
-
Sprint: Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, 
Mesosphere Sprint 38  (was: Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37)

> Framework auth fail with timeout error and never get authenticated
> --
>
> Key: MESOS-2043
> URL: https://issues.apache.org/jira/browse/MESOS-2043
> Project: Mesos
>  Issue Type: Bug
>  Components: master, scheduler driver, security, slave
>Affects Versions: 0.21.0
>Reporter: Bhuvan Arumugam
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: mesosphere, security
> Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, 
> mesos-master.20141104-1606-1706.log, slave.log
>
>
> I'm facing this issue in master as of 
> https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4
> As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm 
> running 1 master and 1 scheduler (aurora). Framework authentication fails 
> due to a timeout:
> error on mesos master:
> {code}
> I1104 19:37:17.741449  8329 master.cpp:3874] Authenticating 
> scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> I1104 19:37:17.741585  8329 master.cpp:3885] Using default CRAM-MD5 
> authenticator
> I1104 19:37:17.742106  8336 authenticator.hpp:169] Creating new server SASL 
> connection
> W1104 19:37:22.742959  8329 master.cpp:3953] Authentication timed out
> W1104 19:37:22.743548  8329 master.cpp:3930] Failed to authenticate 
> scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: 
> Authentication discarded
> {code}
> scheduler error:
> {code}
> I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master 
> master@MASTER_IP:PORT
> I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL 
> connection
> I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
> I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master 
> master@MASTER_IP:PORT: Authentication discarded
> {code}
> It looks like 2 instances ({{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} and 
> {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}) of the same framework are 
> trying to authenticate and failing:
> {code}
> W1104 19:36:30.769420  8319 master.cpp:3930] Failed to authenticate 
> scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to 
> communicate with authenticatee
> I1104 19:36:42.701441  8328 master.cpp:3860] Queuing up authentication 
> request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 
> because authentication is still in progress
> {code}
> Restarting master and scheduler didn't fix it. 
> This particular issue happens with 1 master and 1 scheduler after MESOS-1866 
> was fixed.




[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Benjamin Mahler (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352033#comment-15352033
 ] 

Benjamin Mahler commented on MESOS-5717:


[~Sunzhe] you now need to pass {{--isolation=gpu/nvidia}} when starting the 
agent for auto-discovery to take place. Are you doing that?

> Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and 
> '--nvidia_gpu_devices' flags
> 
>
> Key: MESOS-5717
> URL: https://issues.apache.org/jira/browse/MESOS-5717
> Project: Mesos
>  Issue Type: Bug
> Environment: RHEL 7.2
>Reporter: Sunzhe
>  Labels: gpu
> Fix For: 1.0.0
>
>
> Prerequisite: MESOS-5257 says "By default, with no '--nvidia_gpu_devices' 
> flag or `gpus` resources flag, the new auto-discovery will simply enumerate 
> all of the GPUs on the system", and MESOS-5630 "removes this 
> flag(--enable-nvidia-gpu-support) and enables this support for all builds 
> on Linux."
> So I ran '../configure' without any flags and started the agent without 
> '--resources' or '--nvidia_gpu_devices', but it could not discover GPU 
> resources. I also started the agent with '--resources' and 
> '--nvidia_gpu_devices', and it still did not work.
> I'm sure the NVIDIA GPUs on my machines are OK, because everything works when 
> I pass '--enable-nvidia-gpu-support' to './configure' and start the agents 
> with '--resources' and '--nvidia_gpu_devices'.




[jira] [Commented] (MESOS-3243) Replace NULL with nullptr

2016-06-27 Thread Michael Park (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351993#comment-15351993
 ] 

Michael Park commented on MESOS-3243:
-

{noformat}
commit 470c6cc19011728b81e133a75957b1d47090504f
Author: Tomasz Janiszewski jani...@gmail.com
Date:   Mon Jun 27 15:24:23 2016 +0200

Updated support/cpplint.patch.

Generated `support/cpplint.patch` using the following commands:

```
PATCH="support/cpplint.patch"
FIRST_COMMIT=`git log --pretty=format:"%h" --diff-filter=A -- $PATCH`
git diff ${FIRST_COMMIT}..master support/cpplint.py > $PATCH
sed -i 's/[ \t]*$//' "$PATCH"
```

Review: https://reviews.apache.org/r/48728/
{noformat}
{noformat}
commit 6a04428f24a07e86dcee8cff66d9536c6d6dfc18
Author: Tomasz Janiszewski jani...@gmail.com
Date:   Mon Jun 27 15:24:34 2016 +0200

Update C++ style checker to prevent `NULL` usage.

Review: https://reviews.apache.org/r/48320/
{noformat}

> Replace NULL with nullptr
> -
>
> Key: MESOS-3243
> URL: https://issues.apache.org/jira/browse/MESOS-3243
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Michael Park
>Assignee: Tomasz Janiszewski
> Fix For: 1.0.0
>
>
> As part of the C++ upgrade, it would be nice to move our use of {{NULL}} over 
> to use {{nullptr}}. I think it would be an interesting exercise to do this 
> with {{clang-modernize}} using the [nullptr 
> transform|http://clang.llvm.org/extra/UseNullptrTransform.html] (although 
> it's probably just as easy to use {{sed}}).
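Here is a small illustration of why the switch matters. It assumes two overloads that differ only in taking an integer versus a pointer parameter; `describe` is a hypothetical name. `nullptr` always selects the pointer overload. `0` selects the integer overload, and `NULL` may select either one or fail to compile, depending on how the implementation defines it.

```cpp
#include <cassert>
#include <string>

// Two overloads that differ only in integer vs. pointer parameter.
std::string describe(int) { return "int"; }
std::string describe(const char*) { return "pointer"; }

// `nullptr` has type std::nullptr_t, which converts to any pointer type but
// never to an integer, so it always selects the pointer overload.
std::string callWithNullptr() { return describe(nullptr); }

// The literal 0 is an integer constant and is an exact match for
// describe(int). With GCC and Clang, describe(NULL) is typically rejected
// as ambiguous instead, which is why it is not called here.
std::string callWithZero() { return describe(0); }
```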




[jira] [Updated] (MESOS-5499) Implement RESERVE_RESOURCES Call in v1 master API.

2016-06-27 Thread Anand Mazumdar (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5499:
--
Shepherd: Anand Mazumdar

> Implement RESERVE_RESOURCES Call in v1 master API.
> --
>
> Key: MESOS-5499
> URL: https://issues.apache.org/jira/browse/MESOS-5499
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Abhishek Dasgupta
> Fix For: 1.0.0
>
>





[jira] [Updated] (MESOS-5500) Implement UNRESERVE_RESOURCES Call in v1 master API.

2016-06-27 Thread Anand Mazumdar (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5500:
--
Shepherd: Anand Mazumdar

> Implement UNRESERVE_RESOURCES Call in v1 master API.
> 
>
> Key: MESOS-5500
> URL: https://issues.apache.org/jira/browse/MESOS-5500
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Abhishek Dasgupta
> Fix For: 1.0.0
>
>





[jira] [Commented] (MESOS-5499) Implement RESERVE_RESOURCES Call in v1 master API.

2016-06-27 Thread Anand Mazumdar (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351932#comment-15351932
 ] 

Anand Mazumdar commented on MESOS-5499:
---

{noformat}
commit 7435e9032bc93f1d2d760d0b412b4b198f00
Author: Abhishek Dasgupta 
Date:   Mon Jun 27 15:01:23 2016 -0700

Implemented RESERVE_RESOURCES Call in v1 master API.

Review: https://reviews.apache.org/r/49225/
{noformat}






[jira] [Created] (MESOS-5726) Benchmark the v1 Operator API

2016-06-27 Thread Vinod Kone (JIRA)

Vinod Kone created MESOS-5726:
-

 Summary: Benchmark the v1 Operator API
 Key: MESOS-5726
 URL: https://issues.apache.org/jira/browse/MESOS-5726
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone


Just as we did with the v1 framework API, we need to benchmark the 
performance of the v1 operator API.

As part of this benchmarking, we should evaluate whether evolving un-versioned 
protos to versioned protos in some of the API handlers (e.g., getFrameworks) is 
expensive.




[jira] [Deleted] (MESOS-5720) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Vinod Kone (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone deleted MESOS-5720:
--


> Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and 
> '--nvidia_gpu_devices' flags
> 
>
> Key: MESOS-5720
> URL: https://issues.apache.org/jira/browse/MESOS-5720
> Project: Mesos
>  Issue Type: Bug
> Environment: RHEL 7.2
>Reporter: Sunzhe
>  Labels: gpu
>
> Prerequisite: MESOS-5257 says "By default, with no '--nvidia_gpu_devices' 
> flag or `gpus` resources flag, the new auto-discovery will simply enumerate 
> all of the GPUs on the system", and MESOS-5630 "removes this 
> flag(--enable-nvidia-gpu-support) and enables this support for all builds 
> on Linux."
> So I ran '../configure' without any flags and started the agent without 
> '--resources' or '--nvidia_gpu_devices', but it could not discover GPU 
> resources. I also started the agent with '--resources' and 
> '--nvidia_gpu_devices', and it still did not work.
> I'm sure the NVIDIA GPUs on my machines are OK, because everything works when 
> I pass '--enable-nvidia-gpu-support' to './configure' and start the agents 
> with '--resources' and '--nvidia_gpu_devices'.




[jira] [Deleted] (MESOS-5719) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Vinod Kone (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone deleted MESOS-5719:
--


> Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and 
> '--nvidia_gpu_devices' flags
> 
>
> Key: MESOS-5719
> URL: https://issues.apache.org/jira/browse/MESOS-5719
> Project: Mesos
>  Issue Type: Bug
> Environment: RHEL 7.2
>Reporter: Sunzhe
>  Labels: GPU
>
> Prerequisite: MESOS-5257 says "By default, with no '--nvidia_gpu_devices' 
> flag or `gpus` resources flag, the new auto-discovery will simply enumerate 
> all of the GPUs on the system", and MESOS-5630 "removes this 
> flag(--enable-nvidia-gpu-support) and enables this support for all builds 
> on Linux."
> So I ran '../configure' without any flags and started the agent without 
> '--resources' or '--nvidia_gpu_devices', but it could not discover GPU 
> resources. I also started the agent with '--resources' and 
> '--nvidia_gpu_devices', and it still did not work.
> I'm sure the NVIDIA GPUs on my machines are OK, because everything works when 
> I pass '--enable-nvidia-gpu-support' to './configure' and start the agents 
> with '--resources' and '--nvidia_gpu_devices'.




[jira] [Deleted] (MESOS-5721) dd

2016-06-27 Thread Vinod Kone (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone deleted MESOS-5721:
--


> dd
> --
>
> Key: MESOS-5721
> URL: https://issues.apache.org/jira/browse/MESOS-5721
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sunzhe
>
> dd




[jira] [Commented] (MESOS-5718) Mesos UI shows "Taks is in RUNNING status" but can't find it in the mesos Agent.

2016-06-27 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351805#comment-15351805
 ] 

Vinod Kone commented on MESOS-5718:
---

Probably related to MESOS-5722

> Mesos UI shows "Taks is in RUNNING status" but can't find it in the mesos 
> Agent.
> 
>
> Key: MESOS-5718
> URL: https://issues.apache.org/jira/browse/MESOS-5718
> Project: Mesos
>  Issue Type: Bug
>Reporter: chenqiang
>
> We found an issue where a task launched by Marathon in a Docker container 
> shows "Task is in RUNNING status" in the Mesos UI, but cannot be found on the 
> Mesos agent host. That is, the Docker container does not exist, but the task 
> is shown as RUNNING in the Mesos UI.
> Part of the log is below:
> ```
> I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of 
> framework 20141201-145651-1900714250-5050-3484-
> W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 
> because the executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to 
> executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183  3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375  3902 docker.cpp:1724] Destroying container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431  3902 docker.cpp:1852] Running docker stop on container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ``` 
> What's the root cause? It seems the task's executor had terminated, but the 
> agent ignored the request to kill the task.
> FIX: After restarting mesos-slave, the RUNNING task changes to FAILED status, 
> is launched again on another agent, and returns to normal.




[jira] [Commented] (MESOS-5718) Mesos UI shows "Taks is in RUNNING status" but can't find it in the mesos Agent.

2016-06-27 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351803#comment-15351803
 ] 

Vinod Kone commented on MESOS-5718:
---

Looks like when the kill task request was sent at 14:31 the executor was in the 
process of termination. Do you know why the executor was terminating? But from 
the looks of it the executor never completely terminated (was it hung?) until 
after the agent was restarted 15 mins later.





[jira] [Created] (MESOS-5725) Move createFrameworkInfo() function definition to tests/mesos.hpp.

2016-06-27 Thread Abhishek Dasgupta (JIRA)

Abhishek Dasgupta created MESOS-5725:


 Summary: Move createFrameworkInfo() function definition to 
tests/mesos.hpp.
 Key: MESOS-5725
 URL: https://issues.apache.org/jira/browse/MESOS-5725
 Project: Mesos
  Issue Type: Bug
Reporter: Abhishek Dasgupta
Assignee: Abhishek Dasgupta
Priority: Minor


createFrameworkInfo() is a function that is used by many tests. Strangely, it 
is not defined in the parent class MesosTest; instead, every derived class defines 
it when needed (e.g., ReservationEndpointsTest in 
src/tests/reservation_endpoints_tests.cpp). We suggest moving this 
function to the parent class MesosTest.
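Here is a minimal sketch of the proposed refactoring: the helper is defined once in a base fixture and reused by derived fixtures. The sketch drops gtest so that it compiles standalone. `FrameworkInfo` here is a hypothetical two-field stand-in for the generated protobuf. `MesosTestBase`, `ReservationFixture`, and the `role` parameter are also assumptions; the real helper in `src/tests/reservation_endpoints_tests.cpp` may take different arguments.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the FrameworkInfo protobuf; the real type is
// generated from mesos.proto and has many more fields.
struct FrameworkInfo {
  std::string name = "default";
  std::string role = "*";
};

// Hypothetical base fixture playing the role of MesosTest: the helper is
// defined once here instead of in every derived fixture.
class MesosTestBase {
 protected:
  FrameworkInfo createFrameworkInfo(const std::string& role = "*") const {
    FrameworkInfo info;
    info.role = role;
    return info;
  }
};

// A derived fixture (compare ReservationEndpointsTest) simply reuses it.
class ReservationFixture : public MesosTestBase {
 public:
  FrameworkInfo frameworkForRole(const std::string& role) const {
    return createFrameworkInfo(role);
  }
};
```

Keeping the helper `protected` keeps it scoped to test fixtures while letting every subclass share a single definition.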




[jira] [Commented] (MESOS-5717) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351791#comment-15351791
 ] 

Vinod Kone commented on MESOS-5717:
---

cc [~klueska] [~bmahler]





[jira] [Commented] (MESOS-5714) Specify soname for libmesos.so to major release

2016-06-27 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351787#comment-15351787
 ] 

Vinod Kone commented on MESOS-5714:
---

Is the Ceph Mesos framework written in Java?

Most JVM-based frameworks I know of depend on /usr/local/lib/libmesos.so, which 
is a symlink to the versioned .so. So when libmesos gets upgraded and the 
scheduler is restarted, it automatically picks up the new .so.

> Specify soname for libmesos.so to major release
> ---
>
> Key: MESOS-5714
> URL: https://issues.apache.org/jira/browse/MESOS-5714
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>  Labels: build
>
> I've installed mesos using the CentOS 7 package, and am building the 
> Ceph-Mesos framework. I've noticed when running {{ldd}} that {{ceph-mesos}} 
> depends on too specific a version of libmesos, which means that the 
> build will be broken on subsequent point releases.
> This seems to be because the {{soname}} for libmesos is set to a very 
> unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI compatible with 
> {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket 
> should be closed summarily, albeit unfortunate.
> Here is the {{readelf}} output for {{libmesos}}
> {code}
> [root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so
> Dynamic section at offset 0x194cd18 contains 43 entries:
>   Tag        Type                         Name/Value
>  0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libexpat.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libdb-5.3.so]
>  0x0000000000000001 (NEEDED)             Shared library: [libsasl2.so.3]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_delta-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_subr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libaprutil-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libapr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
>  0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
>  0x000000000000000e (SONAME)             Library soname: [libmesos-0.28.2.so]
>  0x000000000000000f (RPATH)              Library rpath: [/usr/lib/mesos]
>  0x000000000000000c (INIT)               0x92a1f0
>  0x000000000000000d (FINI)               0x13a8e94
>  0x0000000000000019 (INIT_ARRAY)         0x1ae
>  0x000000000000001b (INIT_ARRAYSZ)       1712 (bytes)
>  0x000000000000001a (FINI_ARRAY)         0x1ae8f38
>  0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
>  0x000000006ffffef5 (GNU_HASH)           0x228
>  0x0000000000000005 (STRTAB)             0x1b0be8
>  0x0000000000000006 (SYMTAB)             0x66a08
>  0x000000000000000a (STRSZ)              6130210 (bytes)
>  0x000000000000000b (SYMENT)             24 (bytes)
>  0x0000000000000003 (PLTGOT)             0x1b66000
>  0x0000000000000002 (PLTRELSZ)           387000 (bytes)
>  0x0000000000000014 (PLTREL)             RELA
>  0x0000000000000017 (JMPREL)             0x8cba38
>  0x0000000000000007 (RELA)               0x7a5018
>  0x0000000000000008 (RELASZ)             1206816 (bytes)
>  0x0000000000000009 (RELAENT)            24 (bytes)
>  0x000000006ffffffe (VERNEED)            0x7a4e38
>  0x000000006fffffff (VERNEEDNUM)         8
>  0x000000006ffffff0 (VERSYM)             0x78960a
>  0x000000006ffffff9 (RELACOUNT)          1357
>  0x0000000000000000 (NULL)               0x0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351722#comment-15351722
 ] 

Till Toenshoff commented on MESOS-5724:
---

It is currently not entirely clear to me whether adding IP validation to 
{{verify}} might open a new intrusion vector; there may be a reason why, e.g., 
Python and some browsers do not fully support that RFC.

See [The Python Standard Library :: 18. Interprocess Communication and Networking :: 18.2.1.4. Certificate handling|https://docs.python.org/3.4/library/ssl.html#ssl.match_hostname]

So it may be a good idea to make such functionality optional, enabled via an 
additional flag, e.g. {{LIBPROCESS_SSL_IP_VERIFY}}.
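To make the RFC 2818 rule concrete, here is a minimal C++ sketch (not libprocess code) of what an opt-in IP check could look like. It assumes the certificate's iPAddress subjectAltName entries have already been extracted. For readability they are passed in as text, whereas a certificate stores them as 4 or 16 raw octets. The helper names `ip_bytes`, `ip_matches_san`, `ip_verify_enabled` and `verify_ip_peer` are hypothetical. Only the {{LIBPROCESS_SSL_IP_VERIFY}} name comes from the comment above, and reading it as an environment variable set to `1` or `true` is an assumption.

```cpp
#include <arpa/inet.h>   // inet_pton
#include <netinet/in.h>  // in_addr, in6_addr
#include <sys/socket.h>  // AF_INET, AF_INET6
#include <cassert>
#include <cstdlib>       // std::getenv
#include <string>
#include <vector>

// Parse a textual IPv4/IPv6 address into raw network-order octets, the form
// an X.509 iPAddress subjectAltName uses (4 octets for IPv4, 16 for IPv6).
// Returns an empty vector if the text is not a valid address.
std::vector<unsigned char> ip_bytes(const std::string& text) {
  in_addr v4{};
  if (inet_pton(AF_INET, text.c_str(), &v4) == 1) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v4);
    return std::vector<unsigned char>(p, p + sizeof(v4));
  }
  in6_addr v6{};
  if (inet_pton(AF_INET6, text.c_str(), &v6) == 1) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v6);
    return std::vector<unsigned char>(p, p + sizeof(v6));
  }
  return {};
}

// RFC 2818: the peer IP must exactly match one iPAddress SAN entry.
// Comparing octets (not strings) treats "::1" and "0:0:0:0:0:0:0:1" alike.
bool ip_matches_san(const std::string& peer_ip,
                    const std::vector<std::string>& san_ips) {
  const std::vector<unsigned char> peer = ip_bytes(peer_ip);
  if (peer.empty()) return false;  // peer is not an IP address at all
  for (const std::string& san : san_ips) {
    if (ip_bytes(san) == peer) return true;  // same length, same octets
  }
  return false;
}

// Hypothetical opt-in switch named after the flag proposed in the comment.
bool ip_verify_enabled() {
  const char* value = std::getenv("LIBPROCESS_SSL_IP_VERIFY");
  if (value == nullptr) return false;
  const std::string flag(value);
  return flag == "1" || flag == "true";
}

// Accept an IP-only peer only if the feature is enabled and the IP matches.
bool verify_ip_peer(const std::string& peer_ip,
                    const std::vector<std::string>& san_ips) {
  return ip_verify_enabled() && ip_matches_san(peer_ip, san_ips);
}
```

Keeping the check behind a flag leaves the existing hostname-only behaviour as the default, which matches the caution expressed above. With OpenSSL 1.0.2 or later, `X509_check_ip_asc()` performs the address comparison against a certificate directly, so a real implementation would not need to parse SAN entries by hand.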

> SSL certificate validation should allow IP only verification.
> -
>
> Key: MESOS-5724
> URL: https://issues.apache.org/jira/browse/MESOS-5724
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.0.0
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: libprocess, mesosphere, security, ssl
>
> Our SSL certificate validation currently assumes that the host (on connect 
> and on accept) does have a valid hostname. This however is not true for all  
> environments.
> {{process::network::openssl::verify}} currently only allows the validation of 
> a certificate against a hostname. 
> See 
> https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546
> RFC2818 however says that it should be perfectly valid to validate a 
> certificate  based on the IP address.
> See https://tools.ietf.org/html/rfc2818
> {noformat}
> In some cases, the URI is specified as an IP address rather than a
> hostname. In this case, the iPAddress subjectAltName must be present
> in the certificate and must exactly match the IP in the URI.
> {noformat}




[jira] [Comment Edited] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351691#comment-15351691
 ] 

Till Toenshoff edited comment on MESOS-5724 at 6/27/16 7:56 PM:


See also

- [How are SSL certificate server names resolved/Can I add alternative names using keytool?|http://stackoverflow.com/questions/8443081/how-are-ssl-certificate-server-names-resolved-can-i-add-alternative-names-using/8444863#8444863]
- [URIs in the subjAltName X.509 extension|http://security.stackexchange.com/questions/14019/uris-in-the-subjaltname-x-509-extension/14021#14021]
- [OpenSSL: x509v3_config|https://www.openssl.org/docs/manmaster/apps/x509v3_config.html]





[jira] [Commented] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351691#comment-15351691
 ] 

Till Toenshoff commented on MESOS-5724:
---

See also

- [How are SSL certificate server names resolved/Can I add alternative names using keytool?|http://stackoverflow.com/questions/8443081/how-are-ssl-certificate-server-names-resolved-can-i-add-alternative-names-using/8444863#8444863]
- [URIs in the subjAltName X.509 extension|http://security.stackexchange.com/questions/14019/uris-in-the-subjaltname-x-509-extension/14021#14021]





[jira] [Updated] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5724:
--
Labels: libprocess mesosphere security ssl  (was: libprocess security ssl)


[jira] [Updated] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5724:
--
Description: 
Our SSL certificate validation currently assumes that the host (on connect and 
on accept) does have a valid hostname. This however is not true for all  
environments.

{{process::network::openssl::verify}} currently only allows the validation of a 
certificate against a hostname. 
See 
https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546

RFC2818 however says that it should be perfectly valid to validate a 
certificate  based on the IP address.
See https://tools.ietf.org/html/rfc2818
{noformat}
In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
{noformat}

  was:
Our SSL certificate validation currently assumes that the host (on connect and 
on accept) does have a valid hostname. This however is not true for all valid 
environments.

{{process::network::openssl::verify}} currently only allows the validation of a 
certificate against a hostname. 
See 
https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546

RFC2818 however says that it should be perfectly valid to validate a 
certificate  based on the IP address.
See https://tools.ietf.org/html/rfc2818
{noformat}
In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
{noformat}


> SSL certificate validation should allow IP only verification.
> -
>
> Key: MESOS-5724
> URL: https://issues.apache.org/jira/browse/MESOS-5724
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.0.0
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: libprocess, security, ssl
>
> Our SSL certificate validation currently assumes that the host (on connect 
> and on accept) does have a valid hostname. This however is not true for all  
> environments.
> {{process::network::openssl::verify}} currently only allows the validation of 
> a certificate against a hostname. 
> See 
> https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546
> RFC2818 however says that it should be perfectly valid to validate a 
> certificate  based on the IP address.
> See https://tools.ietf.org/html/rfc2818
> {noformat}
> In some cases, the URI is specified as an IP address rather than a
> hostname. In this case, the iPAddress subjectAltName must be present
> in the certificate and must exactly match the IP in the URI.
> {noformat}




[jira] [Updated] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5724:
--
Description: 
Our SSL certificate validation currently assumes that the host (on connect and 
on accept) does have a valid hostname. This however is not true for all valid 
environments.

{{process::network::openssl::verify}} currently only allows the validation of a 
certificate against a hostname. 
See 
https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546

RFC2818 however says that it should be perfectly valid to validate a 
certificate  based on the IP address.
See https://tools.ietf.org/html/rfc2818
{noformat}
In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
{noformat}

  was:
Our SSL certificate validation currently assumes that the host (on connect and 
on accept) does have a valid hostname. This however is not true for all valid 
environments.

{{process::network::openssl::verify}} currently only allows the validation of a 
certificate against a hostname. 
See 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L546

RFC2818 however says that it should be perfectly valid to validate a 
certificate  based on the IP address.
See https://tools.ietf.org/html/rfc2818
{noformat}
In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
{noformat}


> SSL certificate validation should allow IP only verification.
> -
>
> Key: MESOS-5724
> URL: https://issues.apache.org/jira/browse/MESOS-5724
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.0.0
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: libprocess, security, ssl
>
> Our SSL certificate validation currently assumes that the host (on connect 
> and on accept) does have a valid hostname. This however is not true for all 
> valid environments.
> {{process::network::openssl::verify}} currently only allows the validation of 
> a certificate against a hostname. 
> See 
> https://github.com/apache/mesos/blob/08866edd8a71d12f87f4f4dbefa292729efbf6ae/3rdparty/libprocess/src/openssl.cpp#L546
> RFC2818 however says that it should be perfectly valid to validate a 
> certificate  based on the IP address.
> See https://tools.ietf.org/html/rfc2818
> {noformat}
> In some cases, the URI is specified as an IP address rather than a
> hostname. In this case, the iPAddress subjectAltName must be present
> in the certificate and must exactly match the IP in the URI.
> {noformat}




[jira] [Created] (MESOS-5724) SSL certificate validation should allow IP only verification.

2016-06-27 Thread Till Toenshoff (JIRA)

Till Toenshoff created MESOS-5724:
-

 Summary: SSL certificate validation should allow IP only 
verification.
 Key: MESOS-5724
 URL: https://issues.apache.org/jira/browse/MESOS-5724
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 1.0.0
Reporter: Till Toenshoff
Priority: Blocker


Our SSL certificate validation currently assumes that the host (on connect and 
on accept) does have a valid hostname. This however is not true for all valid 
environments.

{{process::network::openssl::verify}} currently only allows the validation of a 
certificate against a hostname. 
See 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L546

RFC2818 however says that it should be perfectly valid to validate a 
certificate  based on the IP address.
See https://tools.ietf.org/html/rfc2818
{noformat}
In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
{noformat}




[jira] [Updated] (MESOS-5419) Document all known client libraries for the Scheduler/Executor API

2016-06-27 Thread Anand Mazumdar (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5419:
--
Assignee: Anand Mazumdar
  Sprint: Mesosphere Sprint 37
Story Points: 2

> Document all known client libraries for the Scheduler/Executor API
> --
>
> Key: MESOS-5419
> URL: https://issues.apache.org/jira/browse/MESOS-5419
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: newbie
>
> Previously during various community syncs, we had decided that we would only 
> be supporting the C++ scheduler/executor library in the Mesos code base going 
> forward. We should, however, still document the client libraries available in 
> various languages to drive adoption/have a recommended list for users to look 
> up.
> This can be similar to the already existing frameworks doc: 
> http://mesos.apache.org/documentation/latest/frameworks/
> Other projects also seem to have been following a similar practice:
> https://docs.docker.com/engine/reference/api/remote_api_client_libraries/
> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/client-libraries.md




[jira] [Created] (MESOS-5723) SSL-enabled libprocess will leak incoming links to forks

2016-06-27 Thread Joseph Wu (JIRA)

Joseph Wu created MESOS-5723:


 Summary: SSL-enabled libprocess will leak incoming links to forks
 Key: MESOS-5723
 URL: https://issues.apache.org/jira/browse/MESOS-5723
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Blocker
 Fix For: 1.0.0


Encountered two different buggy behaviors that can be tracked down to the same 
underlying problem.

Repro #1 (non-crashy):
(1) Start a master.  Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled.  Downgrade support has the same problem.  
The master/agent {{link}} to one another.
(3) Run a sleep task.  Keep this alive.  If you inspect FDs at this point, 
you'll notice the task has inherited the {{link}} FD (master -> agent).
(4) Restart the agent.  Due to (3), the master's {{link}} stays open.
(5) Check master's logs for the agent's re-registration message.
(6) Check the agent's logs for re-registration.  The message will not appear.  
The master is actually still using the old {{link}}, which is no longer 
connected to the agent.



Repro #2 (crashy):
(1) Start a master.  Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled.  Downgrade support has the same problem.
(3) Run ~100 sleep tasks one after the other, and keep them all alive.  Each 
task links back to the agent.  Due to the FD leak, each task will inherit the 
incoming links from all other actors...
(4) At some point, the agent will run out of FDs and the kernel will panic.



It appears that the SSL socket {{accept}} call is missing {{os::nonblock}} and 
{{os::cloexec}} calls:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806

For reference, here's {{poll}} socket's {{accept}}:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
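The missing step amounts to two `fcntl` calls on every accepted descriptor. Below is a minimal POSIX sketch that works on plain file descriptors rather than libprocess's `Socket` abstraction. `set_nonblock`, `set_cloexec` and `accept_nonblock_cloexec` are hypothetical stand-ins for {{os::nonblock}}, {{os::cloexec}} and the SSL socket's {{accept}} path, not their actual implementations.

```cpp
#include <fcntl.h>       // fcntl, O_NONBLOCK, FD_CLOEXEC
#include <sys/socket.h>  // accept, socketpair
#include <unistd.h>      // close
#include <cassert>
#include <cerrno>

// Put `fd` into non-blocking mode (a file *status* flag: F_GETFL/F_SETFL).
bool set_nonblock(int fd) {
  const int flags = fcntl(fd, F_GETFL);
  return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Mark `fd` close-on-exec (a file *descriptor* flag: F_GETFD/F_SETFD), so
// processes forked and exec'd later, such as task executors, don't inherit it.
bool set_cloexec(int fd) {
  const int flags = fcntl(fd, F_GETFD);
  return flags != -1 && fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Accept one connection and apply both flags before handing the fd out.
// Returns the new descriptor, or -1 with errno set on failure.
int accept_nonblock_cloexec(int listen_fd) {
  const int fd = accept(listen_fd, nullptr, nullptr);
  if (fd == -1) return -1;
  if (!set_nonblock(fd) || !set_cloexec(fd)) {
    const int saved = errno;  // close() may overwrite errno
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}
```

Setting the flags right after `accept` still leaves a small window in which another thread could fork and exec before `FD_CLOEXEC` is set. On Linux, `accept4(fd, nullptr, nullptr, SOCK_NONBLOCK | SOCK_CLOEXEC)` sets both flags atomically.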





[jira] [Commented] (MESOS-5488) Implement READ_FILE Call in v1 master API.

2016-06-27 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351576#comment-15351576
 ] 

Vinod Kone commented on MESOS-5488:
---

commit b72f7b11f1a38f5959d5948aeeb4f62143adc0bc
Author: haosdent huang 
Date:   Mon Jun 27 11:37:02 2016 -0700

Replaced `GET_FILE_CONTENTS` with `READ_FILE`.

The type of the response is renamed to `READ_FILE` to be consistent
with the call type.

Review: https://reviews.apache.org/r/49277/


> Implement READ_FILE Call in v1 master API.
> --
>
> Key: MESOS-5488
> URL: https://issues.apache.org/jira/browse/MESOS-5488
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Abhishek Dasgupta
>





[jira] [Created] (MESOS-5722) Docker executor should have a workaround for unresponsive `docker stop`.

2016-06-27 Thread Jie Yu (JIRA)

Jie Yu created MESOS-5722:
-

 Summary: Docker executor should have a workaround for unresponsive 
`docker stop`.
 Key: MESOS-5722
 URL: https://issues.apache.org/jira/browse/MESOS-5722
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


This issue is related to MESOS-4673.

When the docker executor receives a KillTask, it calls `docker stop` to stop 
the container. However, we have observed on several occasions that this command 
can get stuck, in which case the container may never be terminated. As a 
result, the framework may keep sending KillTask to the executor.

We should have a workaround for that. For instance, we could issue a killtree as 
we did in https://reviews.apache.org/r/44571/ (which is, of course, best effort).
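Such a workaround boils down to a "terminate, wait, then kill" escalation that does not depend on `docker stop` ever returning. Below is a minimal POSIX sketch of that pattern. It assumes the executor knows a PID whose process group covers the processes to be killed. Signalling the whole group is a simplified stand-in for a killtree, which walks the actual process tree. `has_exited`, `wait_for_exit` and `stop_with_escalation` are hypothetical names and are not the code from the linked review.

```cpp
#include <signal.h>     // kill, signal, SIGTERM, SIGKILL
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WNOHANG
#include <unistd.h>     // fork, setpgid, pause
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>

// True once `pid` is gone. If it is our child, reap it with waitpid;
// otherwise probe it with signal 0, which fails with ESRCH once it is gone.
bool has_exited(pid_t pid) {
  int status = 0;
  const pid_t r = waitpid(pid, &status, WNOHANG);
  if (r == pid) return true;   // our child, now reaped
  if (r == 0) return false;    // our child, still running
  if (r == -1 && errno == ECHILD) {
    return kill(pid, 0) == -1 && errno == ESRCH;  // not our child: probe
  }
  return false;                // e.g. EINTR: check again on the next poll
}

// Poll every 10ms until `pid` exits or `timeout` elapses.
bool wait_for_exit(pid_t pid, std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (has_exited(pid)) return true;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return has_exited(pid);
}

// SIGTERM the process group led by `pgid`, wait up to `grace` for the leader
// to exit, then SIGKILL the group. Returns true if escalation was needed.
bool stop_with_escalation(pid_t pgid, std::chrono::milliseconds grace) {
  kill(-pgid, SIGTERM);                          // polite request first
  if (wait_for_exit(pgid, grace)) return false;
  kill(-pgid, SIGKILL);                          // cannot be caught or ignored
  wait_for_exit(pgid, std::chrono::seconds(5));  // reap/confirm the leader
  return true;
}
```

The grace period is what keeps this best effort rather than abrupt: a container that honours SIGTERM still gets a chance to shut down cleanly, and only an unresponsive one is killed outright.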




[jira] [Commented] (MESOS-5702) CNI documentation example is not explicit enough about external plugins

2016-06-27 Thread Jie Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351317#comment-15351317
 ] 

Jie Yu commented on MESOS-5702:
---

Thanks [~philwinder]! I'd be happy to review the PR.

> CNI documentation example is not explicit enough about external plugins
> ---
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Documentation
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI 
> example stated in the docs and restarted mesos-slave, I received a strange 
> error about not being able to find hadoop.
> I think that it's related to this issue: 
> https://issues.apache.org/jira/browse/MESOS-5669
> I thought I'd log the issue, but if it has been fixed by the issue above, 
> feel free to close.
> The setup, state and logs can be found here: 
> https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330




[jira] [Commented] (MESOS-5646) Build `network/cni` isolator with `libnl` support

2016-06-27 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350955#comment-15350955
 ] 

Qian Zhang commented on MESOS-5646:
---

RR: https://reviews.apache.org/r/49262/

> Build `network/cni` isolator with `libnl` support
> -
>
> Key: MESOS-5646
> URL: https://issues.apache.org/jira/browse/MESOS-5646
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> Currently, the `network/cni` isolator does not have the ability to collect 
> network statistics for containers launched on a CNI network. We need to give 
> the `network/cni` isolator the ability to query interfaces, route tables and 
> statistics in the container's network namespace. To achieve this, the 
> `network/cni` isolator will need to talk `netlink`.
> To use the `netlink` API, the `network/cni` isolator needs to be built 
> with libnl support.




[jira] [Comment Edited] (MESOS-2546) Mesos 0.20.1 causes framework starvation on single node clusters when using Chronos and Marathon

2016-06-27 Thread Hans van den Bogert (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350939#comment-15350939
 ] 

Hans van den Bogert edited comment on MESOS-2546 at 6/27/16 1:04 PM:
-

Sorry for resurrecting such an old issue, but why could this issue occur even 
though https://issues.apache.org/jira/browse/MESOS-1086 had already been 
implemented by the time this issue was created?

/update
Never mind, MESOS-1086 only makes a difference when frameworks have exactly the 
same share.
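For context on the update above: Mesos's default (hierarchical DRF) allocator offers resources first to the framework with the lowest dominant share, so a tie-breaking rule only comes into play when shares are equal. The C++ sketch below illustrates that ordering only. It is not the Mesos allocator and ignores roles, weights and offer filters. The names `dominant_share`, `Framework` and `offer_order` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A framework's dominant share is the largest fraction of any single
// resource type (cpus, mem, ...) that is currently allocated to it.
double dominant_share(const std::map<std::string, double>& allocated,
                      const std::map<std::string, double>& total) {
  double share = 0.0;
  for (const auto& entry : allocated) {
    auto it = total.find(entry.first);
    if (it != total.end() && it->second > 0) {
      share = std::max(share, entry.second / it->second);
    }
  }
  return share;
}

struct Framework {
  std::string name;
  std::map<std::string, double> allocated;
};

// DRF offer order: lowest dominant share first. std::stable_sort keeps the
// input order for equal shares; only in that case does a tie-break rule matter.
std::vector<std::string> offer_order(std::vector<Framework> frameworks,
                                     const std::map<std::string, double>& total) {
  std::stable_sort(frameworks.begin(), frameworks.end(),
                   [&total](const Framework& a, const Framework& b) {
                     return dominant_share(a.allocated, total) <
                            dominant_share(b.allocated, total);
                   });
  std::vector<std::string> order;
  for (const Framework& f : frameworks) order.push_back(f.name);
  return order;
}
```

In this toy setup, with made-up totals of 4 cpus and 8192 MB, a framework that declines every offer and holds nothing keeps a dominant share of 0. It therefore sorts ahead of one running a 1 cpu / 1024 MB task, whose share is 0.25. Whether that ordering, combined with a 0.1 second refusal interval, explains the starvation reported here is an assumption that this ticket does not establish.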



> Mesos 0.20.1 causes framework starvation on single node clusters when using 
> Chronos and Marathon
> 
>
> Key: MESOS-2546
> URL: https://issues.apache.org/jira/browse/MESOS-2546
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Affects Versions: 0.21.1
>Reporter: Sunil Shah
>
> Tracking an issue raised by Chronos users that appears to be a regression in 
> Mesos: https://github.com/mesos/chronos/issues/381#issuecomment-83647539
> 1) Chronos's interval between refusing offers and receiving the next one is 
> set to 0.1 seconds to allow finer-grained scheduling of jobs.
> 2) On single node clusters running Mesos 0.20.1 with both Chronos and 
> Marathon installed, Marathon did not receive any offers. On multi-node 
> clusters, this behaviour was not observed. This behaviour was not observed 
> when using previous versions of Mesos.
> 3) Changing this interval back to the default value (i.e., by not setting it) 
> fixed this problem. (See 
> [commit|https://github.com/mesos/chronos/commit/fb1ab1c42207b12c8663457d07c322fc81a8ec2e].)
> This can be replicated using an installation of playa-mesos and running both 
> the latest Mesosphere packages of Chronos and Marathon.




[jira] [Commented] (MESOS-2546) Mesos 0.20.1 causes framework starvation on single node clusters when using Chronos and Marathon

2016-06-27 Thread Hans van den Bogert (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350939#comment-15350939
 ] 

Hans van den Bogert commented on MESOS-2546:


Sorry for resurrecting such an old issue, but why could this issue occur even 
though https://issues.apache.org/jira/browse/MESOS-1086 had already been 
implemented by the time this issue was created?


[jira] [Comment Edited] (MESOS-5702) CNI documentation example is not explicit enough about external plugins

2016-06-27 Thread Philip Winder (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797
 ] 

Philip Winder edited comment on MESOS-5702 at 6/27/16 11:11 AM:


Confirmed. The issue was that the CNI bridge plugin wasn't installed; the 
documentation isn't explicit enough about this. I'll try to make a PR.

For future reference, I got everything working with the following:

{code}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf


# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin

{code}

Then to create a service to ping, try this:

{code}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' \
  --docker_image=amouat/network-utils --master=$MASTER:5050 --name=pingme \
  --networks=cni-test
# Then log on to the machine on which the task was started. E.g. if it started
# on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.
# In bridge mode, the container connects to an internal network local to that
# host. Hence, the pinger must run on the same machine as the pingme, so
# restart it as many times as necessary to get it running on the same host.
# Get the IP address from the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' \
  --docker_image=amouat/network-utils --master=$MASTER:5050 --name=pinger \
  --networks=cni-test
{code}



> CNI documentation example is not explicit enough about external plugins
> ---
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI.

[jira] [Comment Edited] (MESOS-5702) CNI documentation example is not explicit enough about external plugins

2016-06-27 Thread Philip Winder (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797
 ] 

Philip Winder edited comment on MESOS-5702 at 6/27/16 11:10 AM:


Confirmed. The issue was that the cni bridge plugin wasn't installed. The 
documentation isn't explicit enough. I'll try and make a PR.

For future reference, I got everything working with the following:

{code}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf


# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin

{code}

Then to create a service to ping, try this:

{code}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pingme \
  --networks=cni-test
# Then log on to the machine on which the task was started, e.g. if it
# started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.
# In bridge mode, the container connects to an internal network local to
# that host. Hence, the pinger must run on the same machine as the pingme,
# so restart it as many times as necessary to get it onto the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pinger \
  --networks=cni-test
{code}


was (Author: philwinder):
Confirmed. The issue was that the cni bridge plugin wasn't installed. The 
documentation isn't explicit enough. I'll try and make a PR.

For future reference, I got everything working with the following:

{code:bash}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf


# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin

{code}

Then to create a service to ping, try this:

{code:bash}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pingme \
  --networks=cni-test
# Then log on to the machine on which the task was started, e.g. if it
# started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.
# In bridge mode, the container connects to an internal network local to
# that host. Hence, the pinger must run on the same machine as the pingme,
# so restart it as many times as necessary to get it onto the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pinger \
  --networks=cni-test
{code}

> CNI documentation example is not explicit enough about external plugins
> ---
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions:

[jira] [Commented] (MESOS-5702) CNI documentation example is not explicit enough about external plugins

2016-06-27 Thread Philip Winder (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797
 ] 

Philip Winder commented on MESOS-5702:
--

Confirmed. The issue was that the cni bridge plugin wasn't installed. The 
documentation isn't explicit enough. I'll try and make a PR.

For future reference, I got everything working with the following:

{code:bash}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf


# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin

{code}

Then to create a service to ping, try this:

{code:bash}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pingme \
  --networks=cni-test
# Then log on to the machine on which the task was started, e.g. if it
# started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.
# In bridge mode, the container connects to an internal network local to
# that host. Hence, the pinger must run on the same machine as the pingme,
# so restart it as many times as necessary to get it onto the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' \
  --docker_image=amouat/network-utils \
  --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 \
  --name=pinger \
  --networks=cni-test
{code}

> CNI documentation example is not explicit enough about external plugins
> ---
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI 
> example stated in the docs and restarted mesos-slave, I received a strange 
> error about not being able to find hadoop.
> I think that it's related to this issue: 
> https://issues.apache.org/jira/browse/MESOS-5669
> I thought I'd log the issue, but if it has been fixed by the issue above, 
> feel free to close.
> The setup, state and logs can be found here: 
> https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-5702) CNI documentation example is not explicit enough about external plugins

2016-06-27 Thread Philip Winder (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Winder updated MESOS-5702:
-
Summary: CNI documentation example is not explicit enough about external 
plugins  (was: CNI example doesn't work: hadoop not found)

> CNI documentation example is not explicit enough about external plugins
> ---
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI 
> example stated in the docs and restarted mesos-slave, I received a strange 
> error about not being able to find hadoop.
> I think that it's related to this issue: 
> https://issues.apache.org/jira/browse/MESOS-5669
> I thought I'd log the issue, but if it has been fixed by the issue above, 
> feel free to close.
> The setup, state and logs can be found here: 
> https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330




[jira] [Commented] (MESOS-5707) LocalAuthorizer should error if passed a GET_ENDPOINT ACL with an unhandled path

2016-06-27 Thread Alexander Rojas (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350736#comment-15350736
 ] 

Alexander Rojas commented on MESOS-5707:


[r/49257/|https://reviews.apache.org/r/49257/]: Added documentation on coarse 
grain authorization for endpoints.

> LocalAuthorizer should error if passed a GET_ENDPOINT ACL with an unhandled 
> path
> 
>
> Key: MESOS-5707
> URL: https://issues.apache.org/jira/browse/MESOS-5707
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Adam B
>Assignee: Alexander Rojas
>Priority: Critical
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> Since GET_ENDPOINT_WITH_PATH doesn't (yet) work with any arbitrary path, we 
> should
> a) validate --acls and error if GET_ENDPOINT_WITH_PATH has a path object that 
> doesn't match an endpoint that uses this authz strategy.
> b) document exactly which endpoints support GET_ENDPOINT_WITH_PATH
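Option (a) boils down to an allowlist check on the paths in each GET_ENDPOINT_WITH_PATH ACL. The sketch below only illustrates that logic in shell; it is not the LocalAuthorizer implementation, the function names are hypothetical, and the endpoint list is an assumed example that should be replaced with the list documented under (b).

```shell
# Hypothetical pre-flight check (not a Mesos feature): is this path one of the
# endpoints assumed to use GET_ENDPOINT_WITH_PATH? The list is illustrative.
is_supported_endpoint_path() {
  case "$1" in
    /metrics/snapshot|/monitor/statistics|/containers|/files/debug|/logging/toggle)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Report every unsupported path and fail if there was at least one.
validate_endpoint_paths() {
  local endpoint status=0
  for endpoint in "$@"; do
    if ! is_supported_endpoint_path "$endpoint"; then
      echo "error: GET_ENDPOINT_WITH_PATH does not support path '$endpoint'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Given the paths from an ACL, `validate_endpoint_paths /metrics/snapshot /state` would report `/state` and return 1.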




[jira] [Commented] (MESOS-5721) dd

2016-06-27 Thread Sunzhe (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350691#comment-15350691
 ] 

Sunzhe commented on MESOS-5721:
---

Hi,
Sorry, I created this by accident. How can I close an issue?

Best Regards
Sunzhe




> dd
> --
>
> Key: MESOS-5721
> URL: https://issues.apache.org/jira/browse/MESOS-5721
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sunzhe
>
> dd




[jira] [Comment Edited] (MESOS-5500) Implement UNRESERVE_RESOURCES Call in v1 master API.

2016-06-27 Thread Abhishek Dasgupta (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350684#comment-15350684
 ] 

Abhishek Dasgupta edited comment on MESOS-5500 at 6/27/16 9:16 AM:
---

RR: 
https://reviews.apache.org/r/49226/
https://reviews.apache.org/r/49227/


was (Author: a10gupta):
RR: https://reviews.apache.org/r/49226/

> Implement UNRESERVE_RESOURCES Call in v1 master API.
> 
>
> Key: MESOS-5500
> URL: https://issues.apache.org/jira/browse/MESOS-5500
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Abhishek Dasgupta
>





[jira] [Commented] (MESOS-5500) Implement UNRESERVE_RESOURCES Call in v1 master API.

2016-06-27 Thread Abhishek Dasgupta (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350684#comment-15350684
 ] 

Abhishek Dasgupta commented on MESOS-5500:
--

RR: https://reviews.apache.org/r/49226/

> Implement UNRESERVE_RESOURCES Call in v1 master API.
> 
>
> Key: MESOS-5500
> URL: https://issues.apache.org/jira/browse/MESOS-5500
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Abhishek Dasgupta
>





[jira] [Commented] (MESOS-5721) dd

2016-06-27 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350670#comment-15350670
 ] 

Till Toenshoff commented on MESOS-5721:
---

[~Sunzhe] please fill in the details or simply {{delete}} this ticket via 
{{More}} > {{Delete}} in case this was an accident. 

> dd
> --
>
> Key: MESOS-5721
> URL: https://issues.apache.org/jira/browse/MESOS-5721
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sunzhe
>
> dd




[jira] [Created] (MESOS-5721) dd

2016-06-27 Thread Sunzhe (JIRA)

Sunzhe created MESOS-5721:
-

 Summary: dd
 Key: MESOS-5721
 URL: https://issues.apache.org/jira/browse/MESOS-5721
 Project: Mesos
  Issue Type: Bug
Reporter: Sunzhe


dd




[jira] [Assigned] (MESOS-5709) Authorization for /roles

2016-06-27 Thread Joerg Schad (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad reassigned MESOS-5709:
--

Assignee: Joerg Schad  (was: zhou xing)

> Authorization for /roles
> 
>
> Key: MESOS-5709
> URL: https://issues.apache.org/jira/browse/MESOS-5709
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Adam B
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> The /roles endpoint exposes the list of all roles and their weights, as well 
> as the list of all frameworkIds registered with each role. This is a superset 
> of the information exposed on GET /weights, which we already protect. We 
> should protect the data in /roles the same way.
> - Should we reuse VIEW_FRAMEWORK with role (from /state)?
> - Should we add a new VIEW_ROLE and adapt GET_WEIGHTS to use it?




[jira] [Created] (MESOS-5720) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)

Sunzhe created MESOS-5720:
-

 Summary: Can't autodiscovery GPU resources without 
'--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags
 Key: MESOS-5720
 URL: https://issues.apache.org/jira/browse/MESOS-5720
 Project: Mesos
  Issue Type: Bug
 Environment: RHEL 7.2
Reporter: Sunzhe
 Fix For: 1.0.0


Prerequisite: In MESOS\-5257  "By default, with no '\-\-nvidia_gpu_devices' 
flag or `gpus` resources flag, the new auto-discovery will simply enumerate all 
of the GPUs on the system" and in MESOS\-5630 "removes this 
flag(\-\-enable-nvidia-gpu-support) and enables this support for all builds on 
Linux."

So I ran '../configure' without any flags and started the agent without 
'\-\-resources' or '\-\-nvidia_gpu_devices', but it could not discover any GPU 
resources. I also started the agent with '\-\-resources' and 
'\-\-nvidia_gpu_devices', and that did not work either.

I'm sure the NVIDIA GPUs on my machines are OK, because when I run './configure' 
with '\-\-enable-nvidia-gpu-support' and start the agents with '\-\-resources' 
and '\-\-nvidia_gpu_devices', it works well.




[jira] [Created] (MESOS-5719) Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags

2016-06-27 Thread Sunzhe (JIRA)

Sunzhe created MESOS-5719:
-

 Summary: Can't autodiscovery GPU resources without 
'--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags
 Key: MESOS-5719
 URL: https://issues.apache.org/jira/browse/MESOS-5719
 Project: Mesos
  Issue Type: Bug
 Environment: RHEL 7.2
Reporter: Sunzhe
 Fix For: 1.0.0


Prerequisite: In MESOS\-5257  "By default, with no '\-\-nvidia_gpu_devices' 
flag or `gpus` resources flag, the new auto-discovery will simply enumerate all 
of the GPUs on the system" and in MESOS\-5630 "removes this 
flag(\-\-enable-nvidia-gpu-support) and enables this support for all builds on 
Linux."

So I ran '../configure' without any flags and started the agent without 
'\-\-resources' or '\-\-nvidia_gpu_devices', but it could not discover any GPU 
resources. I also started the agent with '\-\-resources' and 
'\-\-nvidia_gpu_devices', and that did not work either.

I'm sure the NVIDIA GPUs on my machines are OK, because when I run './configure' 
with '\-\-enable-nvidia-gpu-support' and start the agents with '\-\-resources' 
and '\-\-nvidia_gpu_devices', it works well.




[jira] [Assigned] (MESOS-5709) Authorization for /roles

2016-06-27 Thread zhou xing (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhou xing reassigned MESOS-5709:


Assignee: zhou xing

> Authorization for /roles
> 
>
> Key: MESOS-5709
> URL: https://issues.apache.org/jira/browse/MESOS-5709
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Adam B
>Assignee: zhou xing
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> The /roles endpoint exposes the list of all roles and their weights, as well 
> as the list of all frameworkIds registered with each role. This is a superset 
> of the information exposed on GET /weights, which we already protect. We 
> should protect the data in /roles the same way.
> - Should we reuse VIEW_FRAMEWORK with role (from /state)?
> - Should we add a new VIEW_ROLE and adapt GET_WEIGHTS to use it?




[jira] [Comment Edited] (MESOS-5702) CNI example doesn't work: hadoop not found

2016-06-27 Thread Philip Winder (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350649#comment-15350649
 ] 

Philip Winder edited comment on MESOS-5702 at 6/27/16 8:48 AM:
---

Thanks Jie. I turned on the logging and found this:

```
Failed to create a containerizer: Could not create MesosContainerizer: Could 
not create isolator 'network/cni': Failed to find CNI plugin 
'/opt/cni/bin/bridge' used by CNI network configuration file 
'/etc/cni/net.d/bridge.conf'
```

So, it seems that we actually need to install a bridge plugin. The 
documentation wasn't clear on that. I assumed that bridge was some internal 
plugin provided by the kernel or Mesos. I'll try adding that. Again, I'm 
assuming they mean the CNI bridge example.


was (Author: philwinder):
Thanks Jie. I turned on the logging and found this:
```
Failed to create a containerizer: Could not create MesosContainerizer: Could 
not create isolator 'network/cni': Failed to find CNI plugin 
'/opt/cni/bin/bridge' used by CNI network configuration file 
'/etc/cni/net.d/bridge.conf'
```
So, it seems that we actually need to install a bridge plugin. The 
documentation wasn't clear on that. I assumed that bridge was some internal 
plugin provided by the kernel or Mesos. I'll try adding that. Again, I'm 
assuming they mean the CNI bridge example.

> CNI example doesn't work: hadoop not found
> --
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI 
> example stated in the docs and restarted mesos-slave, I received a strange 
> error about not being able to find hadoop.
> I think that it's related to this issue: 
> https://issues.apache.org/jira/browse/MESOS-5669
> I thought I'd log the issue, but if it has been fixed by the issue above, 
> feel free to close.
> The setup, state and logs can be found here: 
> https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330




[jira] [Commented] (MESOS-5702) CNI example doesn't work: hadoop not found

2016-06-27 Thread Philip Winder (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350649#comment-15350649
 ] 

Philip Winder commented on MESOS-5702:
--

Thanks Jie. I turned on the logging and found this:
```
Failed to create a containerizer: Could not create MesosContainerizer: Could 
not create isolator 'network/cni': Failed to find CNI plugin 
'/opt/cni/bin/bridge' used by CNI network configuration file 
'/etc/cni/net.d/bridge.conf'
```
So, it seems that we actually need to install a bridge plugin. The 
documentation wasn't clear on that. I assumed that bridge was some internal 
plugin provided by the kernel or Mesos. I'll try adding that. Again, I'm 
assuming they mean the CNI bridge example.

> CNI example doesn't work: hadoop not found
> --
>
> Key: MESOS-5702
> URL: https://issues.apache.org/jira/browse/MESOS-5702
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Philip Winder
>
> I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI 
> example stated in the docs and restarted mesos-slave, I received a strange 
> error about not being able to find hadoop.
> I think that it's related to this issue: 
> https://issues.apache.org/jira/browse/MESOS-5669
> I thought I'd log the issue, but if it has been fixed by the issue above, 
> feel free to close.
> The setup, state and logs can be found here: 
> https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330




[jira] [Comment Edited] (MESOS-5718) Mesos UI shows "Taks is in RUNNING status" but can't find it in the mesos Agent.

2016-06-27 Thread chenqiang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350648#comment-15350648
 ] 

chenqiang edited comment on MESOS-5718 at 6/27/16 8:45 AM:
---

Yes, it's still an issue, so I changed it to unassigned. I will assign it back to 
myself if I find a solution.


was (Author: chenqiang):
yes, it's still an issue, I changed to unsigned. I will assign it back to me if 
I have the solution.

> Mesos UI shows "Taks is in RUNNING status" but can't find it in the mesos 
> Agent.
> 
>
> Key: MESOS-5718
> URL: https://issues.apache.org/jira/browse/MESOS-5718
> Project: Mesos
>  Issue Type: Bug
>Reporter: chenqiang
>
> We found an issue where a task launched by Marathon in a Docker container 
> shows "Task is in RUNNING status" in the Mesos UI, but we can't find it on the 
> Mesos agent host. Namely, the Docker container doesn't exist, but the task is 
> shown as RUNNING in the Mesos UI. Quite interesting...
> Part of the log is attached below:
> ```
> I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of 
> framework 20141201-145651-1900714250-5050-3484-
> W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 
> because the executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to 
> executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484- at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183  3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375  3902 docker.cpp:1724] Destroying container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431  3902 docker.cpp:1852] Running docker stop on container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ``` 
> What's the root cause? It seems the executor of that task is terminated, but 
> the agent ignores the request to kill the task.
> FIX: After restarting mesos-slave, the RUNNING task goes into FAILED status 
> and we can see it launched again on another agent, so the task returns to 
> normal...


