[jira] [Comment Edited] (MESOS-7003) Introduce the AuthenticationContext

2017-02-17 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864985#comment-15864985
 ] 

Greg Mann edited comment on MESOS-7003 at 2/18/17 3:04 AM:
---

Reviews here:
https://reviews.apache.org/r/56623/
https://reviews.apache.org/r/56617/
https://reviews.apache.org/r/56618/
https://reviews.apache.org/r/56619/
https://reviews.apache.org/r/56812/
https://reviews.apache.org/r/56813/
https://reviews.apache.org/r/56624/
https://reviews.apache.org/r/56621/


was (Author: greggomann):
Reviews here:
https://reviews.apache.org/r/56623/
https://reviews.apache.org/r/56617/
https://reviews.apache.org/r/56618/
https://reviews.apache.org/r/56619/
https://reviews.apache.org/r/56624/
https://reviews.apache.org/r/56621/

> Introduce the AuthenticationContext
> ---
>
> Key: MESOS-7003
> URL: https://issues.apache.org/jira/browse/MESOS-7003
> Project: Mesos
>  Issue Type: Task
>  Components: executor, security
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: executor, security
>
> We will introduce a new type to represent the identity of an authenticated 
> entity in Mesos: the {{AuthenticationContext}}. To accomplish this, the 
> following should be done:
> * Add the new AuthenticationContext type
> * Update the AuthenticationResult type to use the AuthenticationContext
> * Update all authenticated endpoint handlers to handle this new type
> * Update the default authenticator modules to use the new type
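
For illustration only, a minimal sketch of what such a type could look like (the namespace, fields, and use of {{Option}} below are assumptions made for this sketch, not the final design from the reviews above):

{code}
// Hypothetical sketch of an AuthenticationContext; the actual type, its
// namespace, and its fields are defined by the linked reviews and may differ.
#include <map>
#include <string>

#include <stout/option.hpp>  // Stout's Option<T>, used throughout Mesos.

namespace process {
namespace http {
namespace authentication {

// Represents the identity of an authenticated entity. Unlike a bare
// principal string, it can also carry arbitrary claims provided by the
// authenticator module.
struct AuthenticationContext
{
  // Human-readable principal, if the authenticator provides one.
  Option<std::string> principal;

  // Additional key/value claims describing the authenticated entity.
  std::map<std::string, std::string> claims;
};

} // namespace authentication {
} // namespace http {
} // namespace process {
{code}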



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

Shepherd: haosdent  (was: Anand Mazumdar)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Anthony Sottile
>  Labels: protobuf, python
>
> We're working on integrating TensorFlow (https://www.tensorflow.org) with 
> Mesos. Both require {{protobuf}}. The Python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}}. Although protobuf 3.x is not compatible with 
> protobuf 2.x, we modified the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' ],}}
> to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}} and 
> it works fine. Would you please consider supporting protobuf 3.x officially 
> in the next release? Perhaps just removing the {{,<3}} restriction is enough.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7142) Revisit MSVC decltype bug

2017-02-17 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7142:
---

 Summary: Revisit MSVC decltype bug
 Key: MESOS-7142
 URL: https://issues.apache.org/jira/browse/MESOS-7142
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
 Environment: Windows with Visual Studio 2017 RC
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer
Priority: Minor


Review https://reviews.apache.org/r/56781/ works around an existing, 
acknowledged bug in MSVC. From the MSVC team:

{quote}
I received the preprocessed file from Andy yesterday, thank you. I can verify 
that this is a bug in the latest compiler, where it's failing to get the right 
context for the decltype expression in the return type of the function. 
However, MSVC does deduce the right type from the return expression in the 
body. We have a bug logged in the compiler now and it will be tracked for the 
next release.
We'll be adding the Mesos project to our daily RealWorld Testing set from now 
on as continuous validation for the MSVC compiler. Thank you for bringing this 
to our attention.

template <typename F>
  auto then(F&& f) const -> decltype(this->then(std::forward<F>(f), Prefer()))

This should really expand to the following expression, but it's failing.

template <typename F>
  auto then(F&& f) const
    -> decltype(static_cast<const Future*>(this)->then(std::forward<F>(f), Prefer()))

The workaround from Michael to skip the explicit return type for the auto 
function is actually the better source change for the MSVC compiler. For 
completeness' sake, you can also just remove 'this->' from the decltype 
expression to make it work for the MSVC compiler:
-> decltype(then(std::forward<F>(f), Prefer()))

Another thing worth pointing out is that adding 'this->' in the body of the 
function shows that MSVC does correctly deduce the return type.
  template <typename F>
  auto then(F&& f) const
  //  -> decltype(then(std::forward<F>(f), Prefer()))
  // -> decltype(static_cast<const Future*>(this)->then(std::forward<F>(f), Prefer()))
  {
    return this->then(std::forward<F>(f), Prefer());
  }
{quote}

This issue tracks revisiting the work-around when the first patch to VS2017 is 
released, as the compiler bug itself should be fixed then.
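
For reference, the pattern described in the quote can be reproduced in isolation roughly as follows. This is a standalone sketch of the workaround, not the actual libprocess {{Future}} code; the {{Prefer}} tag and the two-argument overload below are simplifications for illustration:

{code}
// Standalone illustration of the MSVC workaround described above; not the
// actual libprocess Future implementation.
#include <utility>

struct Prefer {};

template <typename T>
class Future
{
public:
  // The overload that the forwarding overload dispatches to.
  template <typename F>
  int then(F&& f, const Prefer&) const { return f(T()); }

  // Problematic form for MSVC: 'this->' inside a decltype trailing return
  // type fails to resolve in the right context:
  //
  //   template <typename F>
  //   auto then(F&& f) const
  //     -> decltype(this->then(std::forward<F>(f), Prefer()));
  //
  // Workaround: drop the explicit trailing return type and let the compiler
  // deduce it from the return expression in the body (requires C++14).
  template <typename F>
  auto then(F&& f) const
  {
    return this->then(std::forward<F>(f), Prefer());
  }
};

int main()
{
  Future<int> future;
  return future.then([](int) { return 0; });
}
{code}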



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7066) Allow permissive bit to be set for individual acls (in addition to the global level)

2017-02-17 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872702#comment-15872702
 ] 

Yan Xu commented on MESOS-7066:
---

Oh yes I guess it would.

> Allow permissive bit to be set for individual acls (in addition to the global 
> level)
> 
>
> Key: MESOS-7066
> URL: https://issues.apache.org/jira/browse/MESOS-7066
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Adam B
>Priority: Minor
>  Labels: acl
>
> Currently, while defining ACLs for the master or agents, there is a boolean
> field {{permissive}} that can be set at the global level and applies to all
> ACLs.
> It defines the behavior when no ACL matches the request. If set to true
> (the default), all non-matching requests are allowed; if set to false, all
> non-matching requests are rejected.
> We should consider supporting a local {{permissive}} field specific to each
> ACL which would override the global {{permissive}} field if the local
> {{permissive}} field is set.
> The use case is that if support for a new ACL is added to the master or
> agent, and a cluster has the global {{permissive}} field set to {{false}},
> authorization for the newly added ACL will fail unless the operator adds a
> corresponding entry for it, which leads to an upgrade issue.
> If we have both the global as well as the local {{permissive}} bit, then the
> global {{permissive}} bit can be set to {{true}}, whereas the local
> {{permissive}} bit can be set to true or false based on the use case. With
> this approach, it would not be mandatory to add an entry for the new ACL
> unless the operator chooses to do so.
> That obviously also raises the question of whether we should have the global
> {{permissive}} bit at all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7066) Allow permissive bit to be set for individual acls (in addition to the global level)

2017-02-17 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872691#comment-15872691
 ] 

Greg Mann commented on MESOS-7066:
--

[~xujyan] if I understand the situation you mentioned correctly, the operator 
could do this:
* Set the global permissive bit to true
* In the specification of the single ACL that you want to behave as 
non-permissive, provide {{subject=ANY, object=NONE}} as the _last_ item in its 
list. This will have the effect of {{permissive=false}} for that ACL only.

Would that satisfy the use case you're thinking of?
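
For illustration, the approach above could be expressed against the {{mesos::ACL}} protobuf messages roughly as follows. This is a sketch only: the principal "foo" and role "bar" are placeholders, and operators would normally supply the equivalent JSON via the master's {{--acls}} flag rather than build the messages in code.

{code}
// Sketch of "global permissive=true plus a trailing ANY/NONE entry" for a
// single ACL, built against the mesos::ACL messages from acls.proto.
// The principal "foo" and role "bar" are placeholders.
#include <mesos/authorizer/acls.hpp>

int main()
{
  mesos::ACLs acls;

  // Keep the global permissive bit true, so ACLs that are not specified
  // at all still default to allow.
  acls.set_permissive(true);

  // Explicitly allowed principal/role combination.
  mesos::ACL::RegisterFramework* allow = acls.add_register_frameworks();
  allow->mutable_principals()->add_values("foo");
  allow->mutable_roles()->add_values("bar");

  // Trailing catch-all: ANY principal may register with NONE of the roles,
  // which makes this particular ACL behave as non-permissive.
  mesos::ACL::RegisterFramework* deny = acls.add_register_frameworks();
  deny->mutable_principals()->set_type(mesos::ACL::Entity::ANY);
  deny->mutable_roles()->set_type(mesos::ACL::Entity::NONE);

  return 0;
}
{code}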

> Allow permissive bit to be set for individual acls (in addition to the global 
> level)
> 
>
> Key: MESOS-7066
> URL: https://issues.apache.org/jira/browse/MESOS-7066
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Adam B
>Priority: Minor
>  Labels: acl
>
> Currently, while defining ACLs for the master or agents, there is a boolean
> field {{permissive}} that can be set at the global level and applies to all
> ACLs.
> It defines the behavior when no ACL matches the request. If set to true
> (the default), all non-matching requests are allowed; if set to false, all
> non-matching requests are rejected.
> We should consider supporting a local {{permissive}} field specific to each
> ACL which would override the global {{permissive}} field if the local
> {{permissive}} field is set.
> The use case is that if support for a new ACL is added to the master or
> agent, and a cluster has the global {{permissive}} field set to {{false}},
> authorization for the newly added ACL will fail unless the operator adds a
> corresponding entry for it, which leads to an upgrade issue.
> If we have both the global as well as the local {{permissive}} bit, then the
> global {{permissive}} bit can be set to {{true}}, whereas the local
> {{permissive}} bit can be set to true or false based on the use case. With
> this approach, it would not be mandatory to add an entry for the new ACL
> unless the operator chooses to do so.
> That obviously also raises the question of whether we should have the global
> {{permissive}} bit at all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7141) Support hook scripts to customize actions for container's lifecycle

2017-02-17 Thread Jason Lai (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lai updated MESOS-7141:
-
Summary: Support hook scripts to customize actions for container's 
lifecycle  (was: https://uber.zoom.us/j/153272109)

> Support hook scripts to customize actions for container's lifecycle
> ---
>
> Key: MESOS-7141
> URL: https://issues.apache.org/jira/browse/MESOS-7141
> Project: Mesos
>  Issue Type: Task
>  Components: containerization, isolation
>Reporter: Jason Lai
>Assignee: Jason Lai
>
> Inspired by [hooks | 
> https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks] 
> in [OCI's runtime spec | https://github.com/opencontainers/runtime-spec], it 
> would be great to have scripts hooked into the lifecycle of containers.
> The OCI doc has specified 3 stages for hooking:
> * Prestart
> * Poststart
> * Poststop
> We can consider starting with these 3 stages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7141) https://uber.zoom.us/j/153272109

2017-02-17 Thread Jason Lai (JIRA)
Jason Lai created MESOS-7141:


 Summary: https://uber.zoom.us/j/153272109
 Key: MESOS-7141
 URL: https://issues.apache.org/jira/browse/MESOS-7141
 Project: Mesos
  Issue Type: Task
  Components: containerization, isolation
Reporter: Jason Lai
Assignee: Jason Lai


Inspired by [hooks | 
https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks] in 
[OCI's runtime spec | https://github.com/opencontainers/runtime-spec], it would 
be great to have scripts hooked into the lifecycle of containers.

The OCI doc has specified 3 stages for hooking:
* Prestart
* Poststart
* Poststop

We can consider starting with these 3 stages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7066) Allow permissive bit to be set for individual acls (in addition to the global level)

2017-02-17 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872638#comment-15872638
 ] 

Yan Xu commented on MESOS-7066:
---

I may be misunderstanding something, but I thought this isn't about allowing 
people to use a {{permissive}} bit as syntactic sugar for 
{{subject=ANY,object=ANY}} and {{subject=NONE,object=NONE}}. Rather, if I, as 
an operator, only care about one ACL in a non-permissive form, I shouldn't 
have to fill in all possible ACLs today and watch closely for all future 
additions.

> Allow permissive bit to be set for individual acls (in addition to the global 
> level)
> 
>
> Key: MESOS-7066
> URL: https://issues.apache.org/jira/browse/MESOS-7066
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Adam B
>Priority: Minor
>  Labels: acl
>
> Currently, while defining ACLs for the master or agents, there is a boolean
> field {{permissive}} that can be set at the global level and applies to all
> ACLs.
> It defines the behavior when no ACL matches the request. If set to true
> (the default), all non-matching requests are allowed; if set to false, all
> non-matching requests are rejected.
> We should consider supporting a local {{permissive}} field specific to each
> ACL which would override the global {{permissive}} field if the local
> {{permissive}} field is set.
> The use case is that if support for a new ACL is added to the master or
> agent, and a cluster has the global {{permissive}} field set to {{false}},
> authorization for the newly added ACL will fail unless the operator adds a
> corresponding entry for it, which leads to an upgrade issue.
> If we have both the global as well as the local {{permissive}} bit, then the
> global {{permissive}} bit can be set to {{true}}, whereas the local
> {{permissive}} bit can be set to true or false based on the use case. With
> this approach, it would not be mandatory to add an entry for the new ACL
> unless the operator chooses to do so.
> That obviously also raises the question of whether we should have the global
> {{permissive}} bit at all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-02-17 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872629#comment-15872629
 ] 

Yan Xu commented on MESOS-6988:
---

I see. We do have this problem in our cluster, but we have an internal patch 
that effectively turns off the periodic query: we just modified 
{{updateInterval}} to return a very large number: 
https://github.com/apache/mesos/blob/267d719c7a8308e1a1b98c73f5091dbb7708c444/src/webui/master/static/js/controllers.js#L39

I imagine if we didn't have this patch, it would probably "eventually" obtain 
the correct info.

However, what's baffling me is that our webUI is still able to update the web 
page correctly with data from {{pollState}} when it first loads; only the data 
from {{pollMetrics}} uses the current non-leading node when it first loads.

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7137) Custom executors cannot use any reserved resources.

2017-02-17 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872471#comment-15872471
 ] 

Anand Mazumdar commented on MESOS-7137:
---

{noformat}
commit 267d719c7a8308e1a1b98c73f5091dbb7708c444
Author: Anand Mazumdar 
Date:   Fri Feb 17 12:15:54 2017 -0800

Fixed a bug around executor not able to use reserved resources.

We were not unallocating the resources before checking if the
executor resources were contained in the checkpointed resources
on the agent.

Review: https://reviews.apache.org/r/56778/
{noformat}

> Custom executors cannot use any reserved resources.
> ---
>
> Key: MESOS-7137
> URL: https://issues.apache.org/jira/browse/MESOS-7137
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> A custom executor or the built-in default executor cannot launch a task if 
> it uses reserved resources as part of {{ExecutorInfo}}. This happens mostly 
> because we don't unallocate the {{Resource}} objects when comparing them 
> with the checkpointed resources on the agent:
> {code}
>   Resources checkpointedExecutorResources =
> Resources(executorInfo.resources()).filter(needCheckpointing);
> {code}
> The fix can be as simple as changing this to:
> {code}
>   Resources checkpointedExecutorResources =
> unallocated(executorInfo.resources()).filter(needCheckpointing);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5992) Complete the list of API Calls on the Operator HTTP API Doc

2017-02-17 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5992:
--
Fix Version/s: 1.2.0

Backported to 1.2.x branch for 1.2.0

commit 5b810aaeb6ddcf82481b44548f78a28a6aa75869
Author: Abhishek Dasgupta 
Date:   Fri Feb 17 12:13:55 2017 -0800

Documented all the API calls for Operator HTTP API.

Review: https://reviews.apache.org/r/50974/


> Complete the list of API Calls on the Operator HTTP API Doc
> ---
>
> Key: MESOS-5992
> URL: https://issues.apache.org/jira/browse/MESOS-5992
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Abhishek Dasgupta
>  Labels: documentation, mesosphere
> Fix For: 1.2.0, 1.3.0
>
>
> Currently, the Operator HTTP API doc only has limited information about the 
> different types of calls it supports. It would be a good exercise to complete 
> the doc with a list of all the supported calls for the Master/Agent API with 
> some description about them/relevant code snippets.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872285#comment-15872285
 ] 

James Peach commented on MESOS-7122:


Ah. I'd agree that the reaper is a core part of libprocess and is special :)

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} 
> const&>(process::PID const&, process::Future 
> (process::PID::*)(mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID
>  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} const&, void*), {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, 

[jira] [Comment Edited] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872263#comment-15872263
 ] 

Benjamin Mahler edited comment on MESOS-7122 at 2/17/17 6:31 PM:
-

{quote}
I don't think I understood you correctly here. It sounds like you are saying 
that every API that returns a Future ought to be a separate thread? Since 
basically everything returns a Future, that doesn't seem practical.
{quote}

Right, that was the point :). The reasoning seems flawed given it can be 
applied everywhere but something about the reaper is special.


was (Author: bmahler):
{code}
I don't think I understood you correctly here. It sounds like you are saying 
that every API that returns a Future ought to be a separate thread? Since 
basically everything returns a Future, that doesn't seem practical.
{code}

Right, that was the point :). The reasoning seems flawed given it can be 
applied everywhere but something about the reaper is special.

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> 

[jira] [Updated] (MESOS-5783) Use protobuf arena allocation to improve performance.

2017-02-17 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5783:
---
Issue Type: Epic  (was: Improvement)
   Summary: Use protobuf arena allocation to improve performance.  (was: 
Explore using protobuf arena allocation to improve performance)

> Use protobuf arena allocation to improve performance.
> -
>
> Key: MESOS-5783
> URL: https://issues.apache.org/jira/browse/MESOS-5783
> Project: Mesos
>  Issue Type: Epic
>  Components: general
>Reporter: Neil Conway
>  Labels: mesosphere, performance, protobuf
>
> This has the potential to reduce memory management overhead when manipulating 
> protobuf messages:
> https://developers.google.com/protocol-buffers/docs/reference/arenas
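
For reference, a minimal sketch of the arena API this ticket proposes adopting, using {{mesos::TaskInfo}} purely as an example message type (this is not an actual Mesos change; the full benefit requires protobuf 3.x messages generated with {{cc_enable_arenas}}):

{code}
// Minimal illustration of protobuf arena allocation; not an actual Mesos
// change. TaskInfo is used only as an example message type here.
#include <google/protobuf/arena.h>

#include <mesos/mesos.pb.h>

void example()
{
  google::protobuf::Arena arena;

  // Messages created on the arena are freed in bulk when the arena is
  // destroyed, avoiding per-message heap allocation and deletion. Arena
  // allocation only pays off for messages generated with cc_enable_arenas
  // (protobuf 3.x); otherwise CreateMessage falls back to the heap and the
  // arena merely takes ownership.
  mesos::TaskInfo* task =
    google::protobuf::Arena::CreateMessage<mesos::TaskInfo>(&arena);

  task->set_name("example");
  // ... populate and use 'task'; no explicit delete is needed.
}
{code}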



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5783) Use protobuf arena allocation to improve performance.

2017-02-17 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5783:
---
Epic Name: protobuf arenas

> Use protobuf arena allocation to improve performance.
> -
>
> Key: MESOS-5783
> URL: https://issues.apache.org/jira/browse/MESOS-5783
> Project: Mesos
>  Issue Type: Epic
>  Components: general
>Reporter: Neil Conway
>  Labels: mesosphere, performance, protobuf
>
> This has the potential to reduce memory management overhead when manipulating 
> protobuf messages:
> https://developers.google.com/protocol-buffers/docs/reference/arenas



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872216#comment-15872216
 ] 

James Peach commented on MESOS-7122:


While I agree that blocking should be avoided, the point of this bug is that it 
is possible for the reaper to not reap. The reaper has to be able to reliably 
reap so that forward progress can be made in the unfortunate event of code 
blocking on subprocesses.

Running a separate thread for each {{waitpid}} seems expensive but would work. 
You could probably also implement this by having an event loop in {{kevent}} to 
monitor the PIDs directly, or by using {{signalfd}} on Linux to intercept 
{{SIGCHLD}} and reap any registered PIDs.
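
To make the "dedicated thread" idea concrete, here is a very rough sketch of a reaper that polls registered PIDs on its own thread, so blocked libprocess workers cannot starve it. This is an illustration only, not libprocess's actual {{ReaperProcess}}; the class name and the 100ms poll interval are arbitrary choices for this sketch.

{code}
// Rough sketch of a dedicated reaper thread; not libprocess's ReaperProcess.
#include <sys/types.h>
#include <sys/wait.h>

#include <cerrno>
#include <chrono>
#include <functional>
#include <map>
#include <mutex>
#include <thread>

class DedicatedReaper
{
public:
  // Register a callback to be invoked with the exit status of 'pid'.
  void monitor(pid_t pid, std::function<void(int)> callback)
  {
    std::lock_guard<std::mutex> lock(mutex);
    watched[pid] = std::move(callback);
  }

  // Runs on its own thread, so it keeps reaping even if every libprocess
  // worker thread is blocked in Future::await.
  void run()
  {
    while (true) {
      {
        std::lock_guard<std::mutex> lock(mutex);
        for (auto it = watched.begin(); it != watched.end(); ) {
          int status = 0;
          pid_t result = waitpid(it->first, &status, WNOHANG);
          if (result == it->first || (result == -1 && errno == ECHILD)) {
            it->second(status);  // Deliver the (possibly unknown) status.
            it = watched.erase(it);
          } else {
            ++it;  // Still running (or transient error); check again later.
          }
        }
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  }

private:
  std::mutex mutex;
  std::map<pid_t, std::function<void(int)>> watched;
};
{code}

The thread would be started once at initialization, e.g. {{std::thread(&DedicatedReaper::run, &reaper).detach()}}; a {{signalfd}}/{{SIGCHLD}} or {{kevent}} based loop as mentioned above would avoid the polling entirely.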

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} 
> const&>(process::PID const&, process::Future 
> (process::PID::*)(mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID
>  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, 

[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872132#comment-15872132
 ] 

Benjamin Mahler commented on MESOS-7122:


{quote}
Future::await is still called in many places in the code base and if all active 
actors block like this
{quote}

Have you done a survey? Where are they?

There is also the loss of our Clock-based time control if we were to use 
sleep, which seems undesirable. Another thought is that we don't need an 
explicit thread: we could use an actorless approach, but that means the timer 
thread would be executing the non-blocking waitpid calls (I'm not sure how 
long these can take in the worst case, and we generally don't want the timer 
thread making system calls, so this may not be an option). What worries me 
about this approach is that we're trying to be careful in using actors in 
libprocess for deadlock prevention reasons, which seems to be a symptom of a 
bigger issue.

More generally, if Processes are blocking, then we may deadlock regardless of 
the reaper's involvement. For example, the rate limiter. In the case of the 
rate limiter an actorless implementation seems more feasible, but again this 
seems like an optimization rather than something that has anything to do with 
deadlock prevention.

It's also not guaranteed generally that events coming from io::poll are not 
scheduled through a Process in order to unblock the blocked Process.

To me it seems we should focus our attention on (1) not writing any blocking 
code in Mesos, which means updating the HDFS client and whichever other 
components are blocking; this would allow us to reduce the number of 
libprocess threads needed by default. (2) Exploring general solutions to 
deadlocking (e.g. adding worker threads dynamically as needed, better blocking 
prevention enforcement, making blocking safe, etc.).

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, 

[jira] [Created] (MESOS-7140) Provide a default framework implementation

2017-02-17 Thread haosdent (JIRA)
haosdent created MESOS-7140:
---

 Summary: Provide a default framework implementation
 Key: MESOS-7140
 URL: https://issues.apache.org/jira/browse/MESOS-7140
 Project: Mesos
  Issue Type: Wish
Reporter: haosdent
Priority: Minor


We now have a lot of example frameworks for different features; it would be 
nice if we could provide a simple default framework in Mesos that demonstrates 
all of these features. It could also absorb the functionality of mesos-execute.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7139) Makes it clear that the hostname of an agent is not a unique attribute

2017-02-17 Thread Armand Grillet (JIRA)
Armand Grillet created MESOS-7139:
-

 Summary: Makes it clear that the hostname of an agent is not a 
unique attribute
 Key: MESOS-7139
 URL: https://issues.apache.org/jira/browse/MESOS-7139
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Armand Grillet
Priority: Minor


The hostname, an agent attribute that by default is set to the agent's IP, is 
not a unique attribute. This does not appear to be documented anywhere, so an 
improvement of the documentation concerning this flag would be worthwhile.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7125) ./configure does not run ./config.status

2017-02-17 Thread Will Rouesnel (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Rouesnel reassigned MESOS-7125:


Assignee: Will Rouesnel

> ./configure does not run ./config.status
> 
>
> Key: MESOS-7125
> URL: https://issues.apache.org/jira/browse/MESOS-7125
> Project: Mesos
>  Issue Type: Bug
>  Components: build, general
>Affects Versions: 1.3.0
>Reporter: Will Rouesnel
>Assignee: Will Rouesnel
>Priority: Minor
>
> When building from a fresh checkout, ./bootstrap && ./configure && make 
> will fail because ./config.status is not run by the ./configure script.
> This is not major, but it is surprising and not covered by the current build 
> recipes provided by the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871413#comment-15871413
 ] 

Yan Xu commented on MESOS-7122:
---

True. For the motivating example we can try to remove blocking, i.e., 
{{Future::await()}} when calling the HDFS wrapper.

I think this ticket looks at this from another angle, though.

{{Future::await}} is still called in many places in the code base, and if all 
active actors block like this, we *may* deadlock. However, we only deadlock 
when "all these actors are blocked on futures that require an additional 
thread from the worker thread pool to unblock"; i.e., if these futures can be 
unblocked by threads outside the pool, like ZooKeeper or {{io::poll}}, then we 
won't deadlock. The point of this JIRA is that subprocess reaping may be 
suitable to be handled like the ZooKeeper client or {{io::poll}}: it's used 
widely enough, but its purpose is special enough, that it may warrant a 
special actor (or non-actor).

So to me, yes, libprocess users can be more careful when writing code to 
minimize blocking, but libprocess as a library can also help reduce the 
chances of deadlock for some common cases.

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
>