[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.

2016-05-10 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279545#comment-15279545
 ] 

Zhitao Li commented on MESOS-5155:
--

[~alexr], so this is my understanding of your answer:

1) We will not try to implement quota update in the same release as the 
consolidation of quota authorization. (This is fine with me, just confirming 
the plan);

2) Because the upgrade path is a new binary with an old ACL flag, the new code 
needs to construct both {{UPDATE_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} actions, send *each action* to the (local) 
authorizer separately, and merge the results with a boolean operator (*AND* or 
*OR*). Because one and only one ACL list is empty, one of the results is 
always {{acls.permissive()}}, so we need to use *AND* if 
{{acls.permissive()}} == true, and *OR* if {{acls.permissive()}} == false. 
*Implementing this perfectly probably requires adding more code to the 
{{Authorizer}} interface.*

I put up a test diff for a variant of Option 2, which completely ignores 
{{remove_quotas}} and falls back to {{set_quotas}} if {{update_quotas}} is 
empty. It is pretty easy to implement.
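
For what it's worth, the merge in point 2 could be sketched roughly like this (illustrative only; this is not the actual {{Authorizer}} interface, and the function name is made up):

```cpp
// Illustrative sketch only (not the actual Mesos Authorizer API): merge the
// authorizer's answers for UPDATE_QUOTA_WITH_ROLE and
// DESTROY_QUOTA_WITH_PRINCIPAL. Exactly one of the two ACL lists is empty,
// so that side always answers acls.permissive(); using AND when permissive
// and OR otherwise lets the non-empty list decide the outcome.
bool mergeQuotaAuthorization(
    bool updateWithRole,        // result for UPDATE_QUOTA_WITH_ROLE
    bool destroyWithPrincipal,  // result for DESTROY_QUOTA_WITH_PRINCIPAL
    bool permissive)            // acls.permissive()
{
  return permissive ? (updateWithRole && destroyWithPrincipal)
                    : (updateWithRole || destroyWithPrincipal);
}
```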

> Consolidate authorization actions for quota.
> 
>
> Key: MESOS-5155
> URL: https://issues.apache.org/jira/browse/MESOS-5155
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. It was 
> a mistake in retrospect to introduce multiple actions.
> Actions that are not symmetrical are register/teardown and dynamic 
> reservations. They are implemented this way because the entities that 
> perform one action differ from the entities that perform the other. For 
> example, register framework is issued by a framework, while teardown is 
> issued by an operator. What is a good way to identify a framework? The role 
> it runs in may change between launches and makes no sense in a multi-role 
> frameworks setup; better is a sort of group id, which is its principal. 
> Dynamic reservations and persistent volumes can both be issued by 
> frameworks and operators, hence similar reasoning applies. 
> Now, quota is associated with a role and set only by operators. Do we need to 
> care about principals that set it? Not that much. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5350) Add asynchronous hook for validating docker containerizer tasks

2016-05-10 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279388#comment-15279388
 ] 

Adam B edited comment on MESOS-5350 at 5/11/16 4:27 AM:


Work in progress:

|| Review || Summary ||
| https://reviews.apache.org/r/47149/ | Split {{DockerContainerizerProcess::launch}} |
| https://reviews.apache.org/r/47205/ | {{mesos-docker-executor}} {{--task_environment}} flag |
| https://reviews.apache.org/r/47212/ | Duplicate {{executorEnvironment}} call |
| https://reviews.apache.org/r/47213/ | {{FlagsBase::toVector}} |
| https://reviews.apache.org/r/47214/ | Subprocess cleanup due to above |
| https://reviews.apache.org/r/47215/ | Dockerized {{mesos-docker-executor}} tweak |
| https://reviews.apache.org/r/47150/ | Introduce new hook (partial) |
| https://reviews.apache.org/r/47216/ | Put hook into the DockerContainerizer |


was (Author: kaysoky):
Work in progress:

|| Review || Summary ||
| https://reviews.apache.org/r/47149/ | Split {{DockerContainerizerProcess::launch}} |
| https://reviews.apache.org/r/47205/ | {{mesos-docker-executor}} {{--task_environment}} flag |
| https://reviews.apache.org/r/47212/ | Duplicate {{executorEnvironment}} call |
| https://reviews.apache.org/r/47213/ | {{FlagsBase::toVector}} |
| https://reviews.apache.org/r/47214/ | Subprocess cleanup due to above |
| https://reviews.apache.org/r/47215/ | Dockerized {{mesos-docker-executor}} tweak |
| https://reviews.apache.org/r/47150/ | Introduce new hook (partial) |
| TODO | Put hook into the DockerContainerizer |

> Add asynchronous hook for validating docker containerizer tasks
> ---
>
> Key: MESOS-5350
> URL: https://issues.apache.org/jira/browse/MESOS-5350
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: containerizer, hooks, mesosphere
>
> It is possible to plug in custom validation logic for the MesosContainerizer 
> via an {{Isolator}} module, but the same is not true of the 
> DockerContainerizer.
> Basic logic can be plugged into the DockerContainerizer via {{Hooks}}, but 
> this has some notable differences compared to isolators:
> * Hooks are synchronous.
> * Modifications to tasks via Hooks have lower priority compared to the task 
> itself.  i.e. If both the {{TaskInfo}} and 
> {{slaveExecutorEnvironmentDecorator}} define the same environment variable, 
> the {{TaskInfo}} wins.
> * Hooks have no effect if they fail (short of segfaulting). For example, the 
> {{slavePreLaunchDockerHook}} has a return type of {{Try}}:
> https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/include/mesos/hook.hpp#L90
> But the only effect of returning an {{Error}} is a log message:
> https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/src/hook/manager.cpp#L227-L230
> We should add a hook to the DockerContainerizer to narrow this gap.  This new 
> hook would:
> * Be called at roughly the same place as {{slavePreLaunchDockerHook}}
> https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/src/slave/containerizer/docker.cpp#L1022
> * Return a {{Future}} and require splitting up 
> {{DockerContainerizer::launch}}.
> * Prevent a task from launching if it returns a {{Failure}}.
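
The proposed hook could be sketched roughly as follows (illustrative only; {{std::future}} stands in for libprocess's {{Future}}, {{TaskInfo}} is reduced to a string, and all names are made up):

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Illustrative sketch of the proposed asynchronous validation hook.
// std::future stands in for process::Future, and TaskInfo is reduced to a
// string; a failed (exceptional) future prevents the task from launching.
using TaskInfo = std::string;

std::future<void> validateDockerTask(const TaskInfo& task)
{
  return std::async(std::launch::async, [task]() {
    if (task.empty()) {
      // Equivalent of returning a Failure from the hook.
      throw std::runtime_error("TaskInfo failed validation");
    }
  });
}

// The (split-up) launch path waits on the hook and aborts on failure.
bool launchAllowed(const TaskInfo& task)
{
  std::future<void> validated = validateDockerTask(task);
  try {
    validated.get();
    return true;  // Hook succeeded; proceed with the docker launch.
  } catch (const std::exception&) {
    return false; // Hook failed; prevent the task from launching.
  }
}
```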





[jira] [Issue Comment Deleted] (MESOS-5350) Add asynchronous hook for validating docker containerizer tasks

2016-05-10 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5350:
--
Comment: was deleted

(was: Looks like editing comments was disabled :(

For the TODO above: 
| https://reviews.apache.org/r/47216/ | Put hook into the DockerContainerizer |)

> Add asynchronous hook for validating docker containerizer tasks
> ---
>
> Key: MESOS-5350
> URL: https://issues.apache.org/jira/browse/MESOS-5350
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: containerizer, hooks, mesosphere
>





[jira] [Commented] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-05-10 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279416#comment-15279416
 ] 

Jay Guo commented on MESOS-3784:


reviewable at: https://reviews.apache.org/r/47217/

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>  Components: cli
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>Priority: Trivial
>






[jira] [Updated] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-05-10 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-3784:
---
Shepherd: Vinod Kone

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5350) Add asynchronous hook for validating docker containerizer tasks

2016-05-10 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279403#comment-15279403
 ] 

Joseph Wu commented on MESOS-5350:
--

Looks like editing comments was disabled :(

For the TODO above: 
| https://reviews.apache.org/r/47216/ | Put hook into the DockerContainerizer |

> Add asynchronous hook for validating docker containerizer tasks
> ---
>
> Key: MESOS-5350
> URL: https://issues.apache.org/jira/browse/MESOS-5350
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: containerizer, hooks, mesosphere
>





[jira] [Commented] (MESOS-5350) Add asynchronous hook for validating docker containerizer tasks

2016-05-10 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279388#comment-15279388
 ] 

Joseph Wu commented on MESOS-5350:
--

Work in progress:

|| Review || Summary ||
| https://reviews.apache.org/r/47149/ | Split {{DockerContainerizerProcess::launch}} |
| https://reviews.apache.org/r/47205/ | {{mesos-docker-executor}} {{--task_environment}} flag |
| https://reviews.apache.org/r/47212/ | Duplicate {{executorEnvironment}} call |
| https://reviews.apache.org/r/47213/ | {{FlagsBase::toVector}} |
| https://reviews.apache.org/r/47214/ | Subprocess cleanup due to above |
| https://reviews.apache.org/r/47215/ | Dockerized {{mesos-docker-executor}} tweak |
| https://reviews.apache.org/r/47150/ | Introduce new hook (partial) |
| TODO | Put hook into the DockerContainerizer |

> Add asynchronous hook for validating docker containerizer tasks
> ---
>
> Key: MESOS-5350
> URL: https://issues.apache.org/jira/browse/MESOS-5350
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: containerizer, hooks, mesosphere
>





[jira] [Commented] (MESOS-5360) Set death signal for dvdcli subprocess in docker volume isolator.

2016-05-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279387#comment-15279387
 ] 

Guangya Liu commented on MESOS-5360:


How does the dvdcli subprocess get stuck? I think the dvdcli subprocess can 
get a response as long as the docker volume driver backend is alive and 
responds on time.

> Set death signal for dvdcli subprocess in docker volume isolator.
> -
>
> Key: MESOS-5360
> URL: https://issues.apache.org/jira/browse/MESOS-5360
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> If the slave crashes, we should kill the dvdcli subprocess. Otherwise, if the 
> dvdcli subprocess gets stuck, it'll not be cleaned up.





[jira] [Commented] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279369#comment-15279369
 ] 

Adam B commented on MESOS-5343:
---

Fair point. Let's change the master's behavior to match the agent's.
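
The agent-style check the master could adopt might look roughly like this (illustrative sketch; the function name is made up, and "basic" stands in for the default authenticator name):

```cpp
#include <string>

// Illustrative sketch of the agent-style startup check the master could
// adopt (function name is made up; "basic" stands in for the default
// authenticator). Returns an error message, or "" when the flag
// combination is acceptable.
std::string validateHttpAuthFlags(
    bool authenticateHttp,                  // --authenticate_http
    const std::string& httpAuthenticators)  // --http_authenticators
{
  const std::string kDefault = "basic";

  if (!authenticateHttp &&
      !httpAuthenticators.empty() &&
      httpAuthenticators != kDefault) {
    return "A custom HTTP authenticator was specified with the "
           "'--http_authenticators' flag, but HTTP authentication was "
           "not enabled via '--authenticate_http'";
  }
  return ""; // Consistent behavior for both master and agent.
}
```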

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}} currently agents refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing; we should 
> decide whether we want to accept these settings or not, and make the 
> implementations consistent.





[jira] [Commented] (MESOS-5340) libevent builds may prevent new connections

2016-05-10 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279337#comment-15279337
 ] 

Till Toenshoff commented on MESOS-5340:
---

In parallel, [~alexr] and I came up with a different, but also more intrusive 
approach: https://reviews.apache.org/r/47207/

> libevent builds may prevent new connections
> ---
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.29.0, 0.28.1
>Reporter: Till Toenshoff
>Assignee: Benjamin Mahler
>Priority: Blocker
>  Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
> support, any connection that does not actually transmit data will hang the 
> runnable (e.g. master).
> For reproducing the issue (on any platform)...
> Spin up a master with enabled SSL-downgrading:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load to quickly spot the problem in both 
> the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL 
> Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; 
> echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hanging; the master stops responding to 
> new connections. This persists until some data is transmitted via the above 
> telnet connection or the connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster 
> with an enabled load-balancer (which uses an idle, persistent connection) for 
> the master node. Such a connection naturally does not transmit any data as 
> long as there are no external requests routed via the load-balancer. AWS 
> allows setting up a timeout for those connections; in our test environment 
> this duration was set to 60 seconds, so we saw our master repeatedly become 
> unresponsive for 60 seconds, then get "unstuck" for a brief period until it 
> got stuck again.





[jira] [Comment Edited] (MESOS-5340) libevent builds may prevent new connections

2016-05-10 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279337#comment-15279337
 ] 

Till Toenshoff edited comment on MESOS-5340 at 5/11/16 1:05 AM:


In parallel, [~alexr] and I came up with a different, but also more intrusive 
approach: https://reviews.apache.org/r/47207/


was (Author: tillt):
I parallel, [~alexr] and I came up with a different, but also more intrusive 
approach: https://reviews.apache.org/r/47207/

> libevent builds may prevent new connections
> ---
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.29.0, 0.28.1
>Reporter: Till Toenshoff
>Assignee: Benjamin Mahler
>Priority: Blocker
>  Labels: mesosphere, security, ssl
>





[jira] [Commented] (MESOS-5330) Agent should backoff before connecting to the master

2016-05-10 Thread David Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279336#comment-15279336
 ] 

David Robinson commented on MESOS-5330:
---

Hey Vinod! I spoke to [~bmahler] a short time ago, he's going to take a look 
when he gets a chance.

I've discarded the previous review due to test failures -- applying the backoff 
to authentication has some problems. Tests pass w/ the review below.

https://reviews.apache.org/r/47209/
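
The backoff-before-connect idea could be sketched roughly like this (illustrative only; names and constants are made up, not the agent code):

```cpp
#include <algorithm>
#include <chrono>
#include <random>

// Illustrative sketch (names and constants are made up, not the agent
// code): compute a jittered exponential delay before attempting the TCP
// connection to the master, so a leader change does not turn the whole
// fleet of agents into a SYN flood.
std::chrono::milliseconds backoffBeforeConnect(
    int attempt,                     // 0-based connection attempt
    std::chrono::milliseconds base,  // e.g. 500ms
    std::chrono::milliseconds max,   // cap, e.g. 60s
    std::mt19937& rng)
{
  // Exponential growth, capped to avoid overflow and runaway waits.
  std::chrono::milliseconds delay = base * (1 << std::min(attempt, 20));
  delay = std::min(delay, max);

  // Full jitter: a uniform pick in [0, delay] spreads agents out in time.
  std::uniform_int_distribution<long long> dist(0, delay.count());
  return std::chrono::milliseconds(dist(rng));
}
```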

> Agent should backoff before connecting to the master
> 
>
> Key: MESOS-5330
> URL: https://issues.apache.org/jira/browse/MESOS-5330
> Project: Mesos
>  Issue Type: Bug
>Reporter: David Robinson
>Assignee: David Robinson
>
> When an agent is started it starts a background task (libprocess process?) to 
> detect the leading master. When the leading master is detected (or changes) 
> the [SocketManager's link() method is called and a TCP connection to the 
> master is 
> established|https://github.com/apache/mesos/blob/a138e2246a30c4b5c9bc3f7069ad12204dcaffbc/src/slave/slave.cpp#L954].
>  The agent _then_ backs off before sending a ReRegisterSlave message via the 
> newly established connection. The agent needs to back off _before_ attempting 
> to establish a TCP connection to the master, not before sending the first 
> message over the connection.
> During scale tests at Twitter we discovered that agents can SYN flood the 
> master upon leader changes, then the problem described in MESOS-5200 can 
> occur where ephemeral connections are used, which exacerbates the problem. 
> The end result is a lot of hosts setting up and tearing down TCP connections 
> every slave_ping_timeout seconds (15 by default), connections failing to be 
> established, and hosts being marked as unhealthy and shut down. We observed 
> ~800 passive TCP connections per second on the leading master during scale 
> tests.
> The problem can be somewhat mitigated by tuning the kernel to handle a 
> thundering herd of TCP connections, but ideally there would not be a 
> thundering herd to begin with.





[jira] [Issue Comment Deleted] (MESOS-5330) Agent should backoff before connecting to the master

2016-05-10 Thread David Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Robinson updated MESOS-5330:
--
Comment: was deleted

(was: https://reviews.apache.org/r/47154/)

> Agent should backoff before connecting to the master
> 
>
> Key: MESOS-5330
> URL: https://issues.apache.org/jira/browse/MESOS-5330
> Project: Mesos
>  Issue Type: Bug
>Reporter: David Robinson
>Assignee: David Robinson
>





[jira] [Updated] (MESOS-5302) Consider adding an Executor Shim/Adapter for the new/old API

2016-05-10 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5302:
--
Sprint: Mesosphere Sprint 35

> Consider adding an Executor Shim/Adapter for the new/old API
> 
>
> Key: MESOS-5302
> URL: https://issues.apache.org/jira/browse/MESOS-5302
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, all the business logic for the HTTP-based command executor and 
> the driver-based command executor lives in two different files. As more 
> features are added and bugs are discovered in the executor, they need to be 
> fixed in two places. It would be nice to have some kind of shim/adapter that 
> abstracts away the underlying library details from the executor, so the 
> executor can toggle between the driver and the new API via an environment 
> variable.
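
The environment-variable toggle could be sketched roughly like this (illustrative; the variable name is made up):

```cpp
#include <cstdlib>
#include <string>

// Illustrative sketch of the shim idea (the environment variable name is
// made up): keep one copy of the command executor's business logic and
// select the underlying API, driver-based or HTTP-based, at startup.
enum class ExecutorBackend { DRIVER, HTTP };

ExecutorBackend selectBackend()
{
  const char* value = std::getenv("MESOS_HTTP_COMMAND_EXECUTOR");
  return (value != nullptr && std::string(value) == "1")
      ? ExecutorBackend::HTTP
      : ExecutorBackend::DRIVER;
}
```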





[jira] [Created] (MESOS-5363) Content-Type for HTTP error responses should match Accept-Type in requests

2016-05-10 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-5363:
-

 Summary: Content-Type for HTTP error responses should match 
Accept-Type in requests
 Key: MESOS-5363
 URL: https://issues.apache.org/jira/browse/MESOS-5363
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone


Currently, HTTP error responses (4xx and 5xx) with a body are sent with 
Content-Type `text/plain`. This is done even if the original HTTP request did 
not list `text/plain` among the accepted types in its Accept header. Once we 
use application-level error codes (MESOS-4548) in the body of HTTP responses, 
we should match the Content-Type of the response to the Accept header of the 
request.
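
A minimal sketch of the proposed negotiation (illustrative; not the actual libprocess API):

```cpp
#include <string>

// Illustrative sketch (not the actual libprocess API): pick the error
// response's Content-Type from the request's Accept header. If the client
// accepts JSON, serve the application-level error code as JSON; otherwise
// fall back to the current text/plain behavior.
std::string errorContentType(const std::string& acceptHeader)
{
  // A real implementation would parse media ranges and q-values; a
  // substring check is enough to illustrate the idea.
  if (acceptHeader.find("application/json") != std::string::npos) {
    return "application/json";
  }
  return "text/plain";
}
```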





[jira] [Updated] (MESOS-2201) ReplicaTest.Restore fails with leveldb greater than v1.7.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2201:
---
Summary: ReplicaTest.Restore fails with leveldb greater than v1.7.  (was: 
ReplicaTest_Restore fails with leveldb greater than v1.7)

> ReplicaTest.Restore fails with leveldb greater than v1.7.
> -
>
> Key: MESOS-2201
> URL: https://issues.apache.org/jira/browse/MESOS-2201
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.29.0
> Environment: E.g. Ubuntu 14.04.4 LTS + leveldb 1.10
>Reporter: Kapil Arya
>Assignee: Tomasz Janiszewski
>Priority: Minor
>  Labels: mesosphere
>
> I wanted to configure Mesos with system-provided leveldb libraries when I ran 
> into this issue. Apparently, if one does {{../configure 
> --with-leveldb=/path/to/leveldb}}, compilation succeeds; however, the 
> "ReplicaTest_Restore" test fails with the following back trace:
> {code}
> [ RUN  ] ReplicaTest.Restore
> Using temporary directory '/tmp/ReplicaTest_Restore_IZbbRR'
> I1222 14:16:49.517500  2927 leveldb.cpp:176] Opened db in 10.758917ms
> I1222 14:16:49.526495  2927 leveldb.cpp:183] Compacted db in 8.931146ms
> I1222 14:16:49.526523  2927 leveldb.cpp:198] Created db iterator in 5787ns
> I1222 14:16:49.526531  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 511ns
> I1222 14:16:49.526535  2927 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 197ns
> I1222 14:16:49.526623  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.530972  2945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.084458ms
> I1222 14:16:49.531008  2945 replica.cpp:320] Persisted replica status to 
> VOTING
> I1222 14:16:49.541263  2927 leveldb.cpp:176] Opened db in 9.980586ms
> I1222 14:16:49.551636  2927 leveldb.cpp:183] Compacted db in 10.348096ms
> I1222 14:16:49.551683  2927 leveldb.cpp:198] Created db iterator in 3405ns
> I1222 14:16:49.551693  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 3559ns
> I1222 14:16:49.551728  2927 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 29722ns
> I1222 14:16:49.551751  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.551996  2947 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1222 14:16:49.560921  2947 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.899591ms
> I1222 14:16:49.560940  2947 replica.cpp:342] Persisted promised to 1
> I1222 14:16:49.561338  2943 replica.cpp:508] Replica received write request 
> for position 1
> I1222 14:16:49.568677  2943 leveldb.cpp:343] Persisting action (27 bytes) to 
> leveldb took 7.287155ms
> I1222 14:16:49.568692  2943 replica.cpp:676] Persisted action at 1
> I1222 14:16:49.569042  2942 leveldb.cpp:438] Reading position from leveldb 
> took 26339ns
> F1222 14:16:49.569411  2927 replica.cpp:721] CHECK_SOME(state): IO error: 
> lock /tmp/ReplicaTest_Restore_IZbbRR/.log/LOCK: already held by process 
> Failed to recover the log
> *** Check failure stack trace: ***
> @ 0x7f7f6c53e688  google::LogMessage::Fail()
> @ 0x7f7f6c53e5e7  google::LogMessage::SendToLog()
> @ 0x7f7f6c53dff8  google::LogMessage::Flush()
> @ 0x7f7f6c540d2c  google::LogMessageFatal::~LogMessageFatal()
> @   0x90a520  _CheckFatal::~_CheckFatal()
> @ 0x7f7f6c400f4d  mesos::internal::log::ReplicaProcess::restore()
> @ 0x7f7f6c3fd763  
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0x7f7f6c401271  mesos::internal::log::Replica::Replica()
> @   0xcd7ca3  ReplicaTest_Restore_Test::TestBody()
> @  0x10934b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108e584  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x10768fd  testing::Test::Run()
> @  0x1077020  testing::TestInfo::Run()
> @  0x10775a8  testing::TestCase::Run()
> @  0x107c324  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1094348  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108f2b7  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x107b1d4  testing::UnitTest::Run()
> @   0xd344a9  main
> @ 0x7f7f66fdfb45  __libc_start_main
> @   0x8f3549  (unknown)
> @  (nil)  (unknown)
> [2]2927 abort (core dumped)  GLOG_logtostderr=1 GTEST_v=10 
> ./bin/mesos-tests.sh --verbose
> {code}
> The bundled version of leveldb is v1.4. I tested version 1.5 and that seems 
> to work.  However, v1.6 

[jira] [Commented] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279217#comment-15279217
 ] 

Vinod Kone commented on MESOS-5343:
---

Why do we want to load an authenticator if we have not enabled authN? I would 
like the master to match the agent's behavior here after a deprecation cycle. 
IIUC, the `--http_authenticators` flag on the master followed in the footsteps 
of the `--authenticator` flag, without realizing that the latter needed a 
default flag value for backwards compatibility.

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}} currently agents refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing. We should 
> decide whether to accept these settings or not, and make the 
> implementations consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5362) Add authentication to example frameworks

2016-05-10 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5362:


 Summary: Add authentication to example frameworks
 Key: MESOS-5362
 URL: https://issues.apache.org/jira/browse/MESOS-5362
 Project: Mesos
  Issue Type: Improvement
  Components: security
Reporter: Greg Mann


Some example frameworks do not have the ability to authenticate with the 
master. Adding authentication to the example frameworks that don't already have 
it implemented would allow us to use these frameworks for testing in 
authenticated/authorized scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5340) libevent builds may prevent new connections

2016-05-10 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5340:
---
Shepherd: Joris Van Remoortere
Assignee: Benjamin Mahler

[~jvanremoortere] I took a look and have proposed a fix here: 
https://reviews.apache.org/r/47192/

> libevent builds may prevent new connections
> ---
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.29.0, 0.28.1
>Reporter: Till Toenshoff
>Assignee: Benjamin Mahler
>Priority: Blocker
>  Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
> support, any connection that does not actually transmit data will hang the 
> runnable (e.g. master).
> For reproducing the issue (on any platform)...
> Spin up a master with enabled SSL-downgrading:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load for quickly spotting the problem in 
> both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL 
> Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; 
> echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hanging: the master stops responding to 
> new connections. This persists until either some data is transmitted via 
> the above telnet connection or the connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster 
> with an enabled load-balancer (which uses an idle, persistent connection) for 
> the master node. Such a connection naturally transmits no data as long 
> as there are no external requests routed via the load-balancer. AWS allows 
> setting up a timeout for those connections; in our test environment this 
> duration was set to 60 seconds, and hence we saw our master repeatedly 
> become unresponsive for 60 seconds, then get "unstuck" for a brief 
> period until it got stuck again.
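The idle connection from the repro steps above can also be produced programmatically. The following is a hypothetical stand-in for the telnet step (the host, port, and duration are assumptions for a local test master), not Mesos code:

```python
import socket
import time

def hold_idle_connection(host, port, seconds):
    """Connect to the master and transmit nothing, like the telnet step:
    with the affected libevent builds, the master stops serving new
    connections until data is sent or this socket is closed."""
    s = socket.create_connection((host, port))
    try:
        time.sleep(seconds)  # stay idle; nothing is ever written
    finally:
        s.close()            # closing the socket "unsticks" the master
```

Pointing this at an SSL-downgrading master (e.g. `hold_idle_connection("localhost", 5050, 60)`) should reproduce the same hang the telnet session does.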



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5361) Consider introducing TCP KeepAlive for Libprocess sockets.

2016-05-10 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-5361:
-

 Summary: Consider introducing TCP KeepAlive for Libprocess sockets.
 Key: MESOS-5361
 URL: https://issues.apache.org/jira/browse/MESOS-5361
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Anand Mazumdar


We currently don't use TCP KeepAlives when creating sockets in libprocess. 
Using them might benefit master <-> scheduler and master <-> agent connections, 
i.e. we could detect failures of either endpoint faster.

Currently, if the master process goes down and for some reason the {{RST}} 
sequence did not reach the scheduler, the scheduler only learns about the 
disconnection when it next tries to do a {{send}} itself.

The default TCP keepalive values on Linux are a joke though:
{code}
This means that the keepalive routines wait for two hours (7200 secs) before 
sending the first keepalive probe, and then resend it every 75 seconds. If no 
ACK response is received for nine consecutive times, the connection is marked 
as broken.
{code}

However, for long-running scheduler/agent instances this can still be 
beneficial. Also, operators might start tuning the values for their clusters 
explicitly once we start supporting it.
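A minimal sketch of enabling keepalive on a client socket with tuned timers, using the Linux-specific `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options. libprocess itself is C++, and the parameter values here are illustrative assumptions, not the proposed Mesos defaults:

```python
import socket

def make_keepalive_socket(idle=60, interval=10, count=5):
    """Create a TCP socket that probes an idle peer after `idle` seconds,
    re-probing every `interval` seconds, and marking the connection broken
    after `count` unanswered probes (option names are Linux-specific)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return s
```

With values like these, a dead peer would be detected in roughly `idle + interval * count` seconds instead of the two-hour kernel default quoted above.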



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5361) Consider introducing TCP KeepAlive for Libprocess sockets.

2016-05-10 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5361:
--
Description: 
We currently don't use TCP KeepAlive's when creating sockets in libprocess. 
This might benefit master - scheduler, master - agent connections i.e. we can 
detect if any of them failed faster.

Currently, if the master process goes down. If for some reason the {{RST}} 
sequence did not reach the scheduler, the scheduler can only come to know about 
the disconnection when it tries to do a {{send}} itself. 

The default TCP keep alive values on Linux are a joke though:
{code}
. This means that the keepalive routines wait for two hours (7200 secs) before 
sending the first keepalive probe, and then resend it every 75 seconds. If no 
ACK response is received for nine consecutive times, the connection is marked 
as broken.
{code}

However, for long running instances of scheduler/agent this still can be 
beneficial. Also, operators might start tuning the values for their clusters 
explicitly once we start supporting it.

  was:
We currently don't use TCP KeepAlive's when creating sockets in libprocess. 
This might benefit master <-> scheduler, master <-> agent connections i.e. we 
can detect if any of them failed faster.

Currently, if the master process goes down. If for some reason the {{RST}} 
sequence did not reach the scheduler, the scheduler can only come to know about 
the disconnection when it tries to do a {{send}} itself. 

The default TCP keep alive values on Linux are a joke though:
{code}
. This means that the keepalive routines wait for two hours (7200 secs) before 
sending the first keepalive probe, and then resend it every 75 seconds. If no 
ACK response is received for nine consecutive times, the connection is marked 
as broken.
{code}

However, for long running instances of scheduler/agent this still can be 
beneficial. Also, operators might start tuning the values for their clusters 
explicitly once we start supporting it.


> Consider introducing TCP KeepAlive for Libprocess sockets.
> --
>
> Key: MESOS-5361
> URL: https://issues.apache.org/jira/browse/MESOS-5361
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> We currently don't use TCP KeepAlive's when creating sockets in libprocess. 
> This might benefit master - scheduler, master - agent connections i.e. we can 
> detect if any of them failed faster.
> Currently, if the master process goes down. If for some reason the {{RST}} 
> sequence did not reach the scheduler, the scheduler can only come to know 
> about the disconnection when it tries to do a {{send}} itself. 
> The default TCP keep alive values on Linux are a joke though:
> {code}
> . This means that the keepalive routines wait for two hours (7200 secs) 
> before sending the first keepalive probe, and then resend it every 75 
> seconds. If no ACK response is received for nine consecutive times, the 
> connection is marked as broken.
> {code}
> However, for long running instances of scheduler/agent this still can be 
> beneficial. Also, operators might start tuning the values for their clusters 
> explicitly once we start supporting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5060) Requesting /files/read.json with a negative length value causes subsequent /files requests to 404.

2016-05-10 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278957#comment-15278957
 ] 

Greg Mann commented on MESOS-5060:
--

Hi [~dongdong]! My mistake: your current patch actually breaks the pailer 
({{src/webui/master/static/pailer.html}}, 
{{src/webui/master/static/js/jquery.pailer.js}}), which is used to display 
files in the web UI. We should be sure to test your patch by doing the 
following:
* Start a master
* Start an agent
* Run the test framework ({{build/src/test-framework}})
* Look in the web UI at the {{stdout}} of one of the test framework's tasks and 
confirm that it displays correctly

The problem is that the pailer intentionally sets {{length = -1}} on its first 
request in order to determine the length of the file. I think we should leave 
the pailer as it is for now, and change it, if necessary, when MESOS-5334 is 
worked on.

So for the current patch, perhaps we can allow {{length == -1}} and treat it 
the same as when the length parameter is not provided?
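The proposed rule (treat {{length == -1}}, the pailer's probe, like an omitted length parameter, and reject any other negative value) can be sketched as a hypothetical validation helper. This is an illustration of the rule, not the actual Mesos files-endpoint code:

```python
def read_length(length, file_size):
    """Hypothetical helper mirroring the proposed rule: -1 behaves like
    an omitted length (read to end of file); other negatives are errors
    instead of hanging the request."""
    if length is None or length == -1:
        return file_size                       # no usable length given
    if length < 0:
        raise ValueError("negative length: %d" % length)
    return min(length, file_size)              # clamp to the file size
```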

> Requesting /files/read.json with a negative length value causes subsequent 
> /files requests to 404.
> --
>
> Key: MESOS-5060
> URL: https://issues.apache.org/jira/browse/MESOS-5060
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
> Environment: Mesos 0.23.0 on CentOS 6, also Mesos 0.28.0 on OSX
>Reporter: Tom Petr
>Assignee: zhou xing
>Priority: Minor
> Fix For: 0.29.0
>
>
> I accidentally hit a slave's /files/read.json endpoint with a negative length 
> (ex. http://hostname:5051/files/read.json?path=XXX=0=-100). The 
> HTTP request timed out after 30 seconds with nothing relevant in the slave 
> logs, and subsequent calls to any of the /files endpoints on that slave 
> immediately returned a HTTP 404 response. We ultimately got things working 
> again by restarting the mesos-slave process (checkpointing FTW!), but it'd be 
> wise to guard against negative lengths on the slave's end too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-05-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278921#comment-15278921
 ] 

Vinod Kone commented on MESOS-3784:
---

1. Throwing a deprecation warning when using `slave` subcommand sounds good.

2. Sounds good.

3. I personally would love for Mesos to have a standalone CLI. But it's true 
that it is not actively being maintained/tested. Fixing that is out of scope 
for this ticket. Let's just focus on renaming slave to agent in those scripts 
where possible.

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2201) ReplicaTest_Restore fails with leveldb greater than v1.7

2016-05-10 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278893#comment-15278893
 ] 

Alexander Rukletsov commented on MESOS-2201:


I was able to reproduce the bug on Ubuntu 14.04 with an older version of 
leveldb (1.10) and confirmed that the proposed fix resolves the problem.

> ReplicaTest_Restore fails with leveldb greater than v1.7
> 
>
> Key: MESOS-2201
> URL: https://issues.apache.org/jira/browse/MESOS-2201
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.29.0
> Environment: E.g. Ubuntu 14.04.4 LTS + leveldb 1.10
>Reporter: Kapil Arya
>Assignee: Tomasz Janiszewski
>Priority: Minor
>  Labels: mesosphere
>
> I wanted to configure Mesos with system-provided leveldb libraries when I ran 
> into this issue. Apparently, if one does {{../configure 
> --with-leveldb=/path/to/leveldb}}, compilation succeeds; however, the 
> "ReplicaTest_Restore" test fails with the following back trace:
> {code}
> [ RUN  ] ReplicaTest.Restore
> Using temporary directory '/tmp/ReplicaTest_Restore_IZbbRR'
> I1222 14:16:49.517500  2927 leveldb.cpp:176] Opened db in 10.758917ms
> I1222 14:16:49.526495  2927 leveldb.cpp:183] Compacted db in 8.931146ms
> I1222 14:16:49.526523  2927 leveldb.cpp:198] Created db iterator in 5787ns
> I1222 14:16:49.526531  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 511ns
> I1222 14:16:49.526535  2927 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 197ns
> I1222 14:16:49.526623  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.530972  2945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.084458ms
> I1222 14:16:49.531008  2945 replica.cpp:320] Persisted replica status to 
> VOTING
> I1222 14:16:49.541263  2927 leveldb.cpp:176] Opened db in 9.980586ms
> I1222 14:16:49.551636  2927 leveldb.cpp:183] Compacted db in 10.348096ms
> I1222 14:16:49.551683  2927 leveldb.cpp:198] Created db iterator in 3405ns
> I1222 14:16:49.551693  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 3559ns
> I1222 14:16:49.551728  2927 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 29722ns
> I1222 14:16:49.551751  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.551996  2947 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1222 14:16:49.560921  2947 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.899591ms
> I1222 14:16:49.560940  2947 replica.cpp:342] Persisted promised to 1
> I1222 14:16:49.561338  2943 replica.cpp:508] Replica received write request 
> for position 1
> I1222 14:16:49.568677  2943 leveldb.cpp:343] Persisting action (27 bytes) to 
> leveldb took 7.287155ms
> I1222 14:16:49.568692  2943 replica.cpp:676] Persisted action at 1
> I1222 14:16:49.569042  2942 leveldb.cpp:438] Reading position from leveldb 
> took 26339ns
> F1222 14:16:49.569411  2927 replica.cpp:721] CHECK_SOME(state): IO error: 
> lock /tmp/ReplicaTest_Restore_IZbbRR/.log/LOCK: already held by process 
> Failed to recover the log
> *** Check failure stack trace: ***
> @ 0x7f7f6c53e688  google::LogMessage::Fail()
> @ 0x7f7f6c53e5e7  google::LogMessage::SendToLog()
> @ 0x7f7f6c53dff8  google::LogMessage::Flush()
> @ 0x7f7f6c540d2c  google::LogMessageFatal::~LogMessageFatal()
> @   0x90a520  _CheckFatal::~_CheckFatal()
> @ 0x7f7f6c400f4d  mesos::internal::log::ReplicaProcess::restore()
> @ 0x7f7f6c3fd763  
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0x7f7f6c401271  mesos::internal::log::Replica::Replica()
> @   0xcd7ca3  ReplicaTest_Restore_Test::TestBody()
> @  0x10934b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108e584  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x10768fd  testing::Test::Run()
> @  0x1077020  testing::TestInfo::Run()
> @  0x10775a8  testing::TestCase::Run()
> @  0x107c324  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1094348  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108f2b7  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x107b1d4  testing::UnitTest::Run()
> @   0xd344a9  main
> @ 0x7f7f66fdfb45  __libc_start_main
> @   0x8f3549  (unknown)
> @  (nil)  (unknown)
> [2]2927 abort (core dumped)  GLOG_logtostderr=1 GTEST_v=10 
> ./bin/mesos-tests.sh --verbose
> {code}
> The bundled version of leveldb is v1.4. I tested version 

[jira] [Commented] (MESOS-5288) Update leveldb patch file to support s390x

2016-05-10 Thread Tomasz Janiszewski (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278870#comment-15278870
 ] 

Tomasz Janiszewski commented on MESOS-5288:
---

I've already tried updating leveldb to v1.18 here: 
https://github.com/janisz/mesos/commit/3eb6fb875b93d22d0c4f45b0db8966ba1c6c00d7
Performance tests are mentioned in MESOS-970, but I haven't done them yet. 

> Update leveldb patch file to support s390x
> -
>
> Key: MESOS-5288
> URL: https://issues.apache.org/jira/browse/MESOS-5288
> Project: Mesos
>  Issue Type: Bug
>Reporter: Bing Li
>Assignee: Bing Li
>
> There are 2 issues in leveldb-1.4.
> 1. Leveldb didn't build: MemoryBarrier() has to be defined for s390x.
> I got the patch from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644336 
> .
> 2. A number of unit tests failed because 1.4 doesn't detect endianness 
> properly, and s390x is big-endian.
> We got error messages like "Failed to recover the log: Corruption: checksum 
> mismatch".
> I have a backport patch which is part of the leveldb commit
> https://github.com/google/leveldb/commit/075a35a6d390167b77b687e067dd0ba593e7f624



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5286) Add authorization to libprocess HTTP endpoints

2016-05-10 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5286:
-
Comment: was deleted

(was: Reviews here:

https://reviews.apache.org/r/46866/
https://reviews.apache.org/r/46867/
https://reviews.apache.org/r/46883/
https://reviews.apache.org/r/46869/
https://reviews.apache.org/r/46870/
https://reviews.apache.org/r/46876/
https://reviews.apache.org/r/46881/
https://reviews.apache.org/r/46882/)

> Add authorization to libprocess HTTP endpoints
> --
>
> Key: MESOS-5286
> URL: https://issues.apache.org/jira/browse/MESOS-5286
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Now that the libprocess-level HTTP endpoints have had authentication added to 
> them in MESOS-4902, we can add authorization to them as well. As a first 
> step, we can implement a "coarse-grained" approach, in which a principal is 
> granted or denied access to a given endpoint. We will likely need to register 
> an authorizer with libprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5286) Add authorization to libprocess HTTP endpoints

2016-05-10 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278846#comment-15278846
 ] 

Greg Mann commented on MESOS-5286:
--

Reviews here:

https://reviews.apache.org/r/46866/
https://reviews.apache.org/r/46867/
https://reviews.apache.org/r/46883/
https://reviews.apache.org/r/46869/
https://reviews.apache.org/r/46870/
https://reviews.apache.org/r/46881/
https://reviews.apache.org/r/46882/

> Add authorization to libprocess HTTP endpoints
> --
>
> Key: MESOS-5286
> URL: https://issues.apache.org/jira/browse/MESOS-5286
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Now that the libprocess-level HTTP endpoints have had authentication added to 
> them in MESOS-4902, we can add authorization to them as well. As a first 
> step, we can implement a "coarse-grained" approach, in which a principal is 
> granted or denied access to a given endpoint. We will likely need to register 
> an authorizer with libprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5330) Agent should backoff before connecting to the master

2016-05-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278820#comment-15278820
 ] 

Vinod Kone commented on MESOS-5330:
---

Hey DRob. Thanks for filing the ticket. Have you found and talked to a shepherd 
about the fix? Ideally we expect that to happen before you send a review out.

> Agent should backoff before connecting to the master
> 
>
> Key: MESOS-5330
> URL: https://issues.apache.org/jira/browse/MESOS-5330
> Project: Mesos
>  Issue Type: Bug
>Reporter: David Robinson
>Assignee: David Robinson
>
> When an agent is started it starts a background task (libprocess process?) to 
> detect the leading master. When the leading master is detected (or changes) 
> the [SocketManager's link() method is called and a TCP connection to the 
> master is 
> established|https://github.com/apache/mesos/blob/a138e2246a30c4b5c9bc3f7069ad12204dcaffbc/src/slave/slave.cpp#L954].
>  The agent _then_ backs off before sending a ReRegisterSlave message via the 
> newly established connection. The agent needs to back off _before_ attempting 
> to establish a TCP connection to the master, not before sending the first 
> message over the connection.
> During scale tests at Twitter we discovered that agents can SYN flood the 
> master upon leader changes, then the problem described in MESOS-5200 can 
> occur where ephemeral connections are used, which exacerbates the problem. 
> The end result is a lot of hosts setting up and tearing down TCP connections 
> every slave_ping_timeout seconds (15 by default), connections failing to be 
> established, hosts being marked as unhealthy and being shutdown. We observed 
> ~800 passive TCP connections per second on the leading master during scale 
> tests.
> The problem can be somewhat mitigated by tuning the kernel to handle a 
> thundering herd of TCP connections, but ideally there would not be a 
> thundering herd to begin with.
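The fix described above, backing off *before* establishing the TCP connection, amounts to putting a jittered, exponentially growing delay ahead of connect(). This is an illustrative Python sketch under those assumptions, not the libprocess/agent implementation:

```python
import random
import socket
import time

def connect_with_backoff(host, port, base=1.0, cap=60.0, attempts=5):
    """Sleep a jittered, exponentially growing delay *before* each
    connect() attempt, so a herd of agents reacting to a leader change
    spreads out its SYNs instead of flooding the new master."""
    for attempt in range(attempts):
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        try:
            return socket.create_connection((host, port))
        except OSError:
            continue  # master not reachable yet; back off longer next time
    raise OSError("giving up after %d attempts" % attempts)
```

The jitter (uniform over the growing window) is what de-synchronizes agents that all detected the leader change at the same instant.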



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2201) ReplicaTest_Restore fails with leveldb greater than v1.7

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-2201:
---
 Shepherd: Alexander Rukletsov
   Sprint: Mesosphere Sprint 35
Affects Version/s: 0.29.0
 Story Points: 3
  Environment: E.g. Ubuntu 14.04.4 LTS + leveldb 1.10
   Labels: mesosphere  (was: )
 Priority: Minor  (was: Major)

> ReplicaTest_Restore fails with leveldb greater than v1.7
> 
>
> Key: MESOS-2201
> URL: https://issues.apache.org/jira/browse/MESOS-2201
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.29.0
> Environment: E.g. Ubuntu 14.04.4 LTS + leveldb 1.10
>Reporter: Kapil Arya
>Assignee: Tomasz Janiszewski
>Priority: Minor
>  Labels: mesosphere
>

[jira] [Created] (MESOS-5360) Set death signal for dvdcli subprocess in docker volume isolator.

2016-05-10 Thread Jie Yu (JIRA)
Jie Yu created MESOS-5360:
-

 Summary: Set death signal for dvdcli subprocess in docker volume 
isolator.
 Key: MESOS-5360
 URL: https://issues.apache.org/jira/browse/MESOS-5360
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu


If the slave crashes, we should kill the dvdcli subprocess. Otherwise, if the 
dvdcli subprocess gets stuck, it'll not be cleaned up.
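On Linux, the usual mechanism for this is prctl(PR_SET_PDEATHSIG), set in the child between fork() and exec(). A sketch of that mechanism in Python with ctypes follows; Mesos's subprocess API is C++, so the helper below is a hypothetical illustration, with `sleep 30` standing in for the dvdcli invocation:

```python
import ctypes
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>
libc = ctypes.CDLL("libc.so.6", use_errno=True)

def set_death_signal():
    """Runs in the child between fork() and exec(): ask the kernel to
    deliver SIGKILL to this process when its parent dies."""
    if libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")

# Stand-in for the dvdcli subprocess: if the parent (here, this Python
# process; in Mesos, the slave) crashes, the kernel kills the child
# instead of leaving it orphaned.
proc = subprocess.Popen(["sleep", "30"], preexec_fn=set_death_signal)
```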



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5288) Update leveldb patch file to support s390x

2016-05-10 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278606#comment-15278606
 ] 

Bing Li commented on MESOS-5288:


Great. I guess you meant v1.18. 

I'll update leveldb and try the patch provided in MESOS-2201.
Thanks,

> Update leveldb patch file to support s390x
> -
>
> Key: MESOS-5288
> URL: https://issues.apache.org/jira/browse/MESOS-5288
> Project: Mesos
>  Issue Type: Bug
>Reporter: Bing Li
>Assignee: Bing Li
>
> There are 2 issues in leveldb-1.4.
> 1. Leveldb didn't build: MemoryBarrier() has to be defined for s390x.
> I got the patch from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644336 
> .
> 2. A number of unit tests failed because 1.4 doesn't detect endianness 
> properly, and s390x is big-endian.
> We got error messages like "Failed to recover the log: Corruption: checksum 
> mismatch".
> I have a backport patch which is part of the leveldb commit
> https://github.com/google/leveldb/commit/075a35a6d390167b77b687e067dd0ba593e7f624



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Labels: cgroups cpu cpu-usage gpu isolation isolator mentor perfomance  
(was: )

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, 
> mentor, perfomance
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (SLURM) provide both 
> cgroup isolation and cpu binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores - this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.
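The core-selection step described above might look like the following sketch; this histogram-driven policy is a simplification of the proposal, with all names assumed:

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the proposed optimizer: given a per-core
// utilization histogram, choose the least-loaded core for the next
// container. The real proposal also weighs inter-core/GPU latency.
size_t pickLeastLoadedCore(const std::vector<int>& utilization) {
  size_t best = 0;
  for (size_t i = 1; i < utilization.size(); ++i) {
    if (utilization[i] < utilization[best]) {
      best = i;
    }
  }
  return best;
}
```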



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5358:
-
Labels: cgroups cpu cpu-usage documentation gpu isolation isolator mentor 
newbie performance  (was: documentation mentor newbie performance)

> Design Doc for CPU pinning/binding support (MESOS-5342)
> ---
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: cgroups, cpu, cpu-usage, documentation, gpu, isolation, 
> isolator, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278570#comment-15278570
 ] 

Chris commented on MESOS-5358:
--

Requesting a shepherd!

> Design Doc for CPU pinning/binding support (MESOS-5342)
> ---
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: documentation, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-
Comment: was deleted

(was: Documentation for this ticket is MESOS-5358)

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (SLURM) provide both 
> cgroup isolation and cpu binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores - this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5242) pivot_root is not available on System z (s390x)

2016-05-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278556#comment-15278556
 ] 

haosdent commented on MESOS-5242:
-

Thank you for your help on this issue! I just reply in MESOS-5288. :-)

> pivot_root is not available on System z (s390x)
> ---
>
> Key: MESOS-5242
> URL: https://issues.apache.org/jira/browse/MESOS-5242
> Project: Mesos
>  Issue Type: Bug
> Environment: Hardware: IBM System z
> OS: Linux on z SLES12SP1
>Reporter: Bing Li
>Assignee: Bing Li
>
> Got error "pivot_root is not available", which is similar to MESOS-5121.
> Added the syscall pivot_root definition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5357) Add a function to extract HTTP endpoints from an URL.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5357:
---
Labels: libprocess mesosphere newbie security  (was: libprocess newbie 
security)

> Add a function to extract HTTP endpoints from an URL.
> -
>
> Key: MESOS-5357
> URL: https://issues.apache.org/jira/browse/MESOS-5357
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: libprocess, mesosphere, newbie, security
> Fix For: 0.29.0
>
>
> HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a 
> {{process::http::URL}}. The {{path}} member of the URL instance is of the 
> form {{/master/endpoint}} or {{/slave\(n\)/endpoint}}. We want to implement 
> authorization of endpoints and need to extract the endpoint from that path 
> and that function should be accessible for masters as well as agents.
> This can be done by adding a method to {{process::http::URL}} that implements 
> the extraction logic.
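The extraction described above could be sketched as follows; this is an illustrative free function, not the actual {{process::http::URL}} method being proposed:

```cpp
#include <string>

// Strip the leading "/master" or "/slave(n)" component from a
// libprocess path, leaving just the endpoint (e.g. "/flags").
std::string extractEndpoint(const std::string& path) {
  const std::string::size_type slash = path.find('/', 1);
  if (slash == std::string::npos) {
    return "";  // Path has no endpoint component.
  }
  return path.substr(slash);
}
```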



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2201) ReplicaTest_Restore fails with leveldb greater than v1.7

2016-05-10 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-2201:

Summary: ReplicaTest_Restore fails with leveldb greater than v1.7  (was: 
Make check fails with leveldb > v1.7)

> ReplicaTest_Restore fails with leveldb greater than v1.7
> 
>
> Key: MESOS-2201
> URL: https://issues.apache.org/jira/browse/MESOS-2201
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Kapil Arya
>Assignee: Tomasz Janiszewski
>
> I wanted to configure Mesos with system-provided leveldb libraries when I ran 
> into this issue. Apparently, if one does {{../configure 
> --with-leveldb=/path/to/leveldb}}, compilation succeeds; however, the 
> "ReplicaTest_Restore" test fails with the following back trace:
> {code}
> [ RUN  ] ReplicaTest.Restore
> Using temporary directory '/tmp/ReplicaTest_Restore_IZbbRR'
> I1222 14:16:49.517500  2927 leveldb.cpp:176] Opened db in 10.758917ms
> I1222 14:16:49.526495  2927 leveldb.cpp:183] Compacted db in 8.931146ms
> I1222 14:16:49.526523  2927 leveldb.cpp:198] Created db iterator in 5787ns
> I1222 14:16:49.526531  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 511ns
> I1222 14:16:49.526535  2927 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 197ns
> I1222 14:16:49.526623  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.530972  2945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.084458ms
> I1222 14:16:49.531008  2945 replica.cpp:320] Persisted replica status to 
> VOTING
> I1222 14:16:49.541263  2927 leveldb.cpp:176] Opened db in 9.980586ms
> I1222 14:16:49.551636  2927 leveldb.cpp:183] Compacted db in 10.348096ms
> I1222 14:16:49.551683  2927 leveldb.cpp:198] Created db iterator in 3405ns
> I1222 14:16:49.551693  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 3559ns
> I1222 14:16:49.551728  2927 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 29722ns
> I1222 14:16:49.551751  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.551996  2947 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1222 14:16:49.560921  2947 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.899591ms
> I1222 14:16:49.560940  2947 replica.cpp:342] Persisted promised to 1
> I1222 14:16:49.561338  2943 replica.cpp:508] Replica received write request 
> for position 1
> I1222 14:16:49.568677  2943 leveldb.cpp:343] Persisting action (27 bytes) to 
> leveldb took 7.287155ms
> I1222 14:16:49.568692  2943 replica.cpp:676] Persisted action at 1
> I1222 14:16:49.569042  2942 leveldb.cpp:438] Reading position from leveldb 
> took 26339ns
> F1222 14:16:49.569411  2927 replica.cpp:721] CHECK_SOME(state): IO error: 
> lock /tmp/ReplicaTest_Restore_IZbbRR/.log/LOCK: already held by process 
> Failed to recover the log
> *** Check failure stack trace: ***
> @ 0x7f7f6c53e688  google::LogMessage::Fail()
> @ 0x7f7f6c53e5e7  google::LogMessage::SendToLog()
> @ 0x7f7f6c53dff8  google::LogMessage::Flush()
> @ 0x7f7f6c540d2c  google::LogMessageFatal::~LogMessageFatal()
> @   0x90a520  _CheckFatal::~_CheckFatal()
> @ 0x7f7f6c400f4d  mesos::internal::log::ReplicaProcess::restore()
> @ 0x7f7f6c3fd763  
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0x7f7f6c401271  mesos::internal::log::Replica::Replica()
> @   0xcd7ca3  ReplicaTest_Restore_Test::TestBody()
> @  0x10934b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108e584  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x10768fd  testing::Test::Run()
> @  0x1077020  testing::TestInfo::Run()
> @  0x10775a8  testing::TestCase::Run()
> @  0x107c324  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1094348  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108f2b7  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x107b1d4  testing::UnitTest::Run()
> @   0xd344a9  main
> @ 0x7f7f66fdfb45  __libc_start_main
> @   0x8f3549  (unknown)
> @  (nil)  (unknown)
> [2]2927 abort (core dumped)  GLOG_logtostderr=1 GTEST_v=10 
> ./bin/mesos-tests.sh --verbose
> {code}
> The bundled version of leveldb is v1.4. I tested version 1.5 and that seems 
> to work. However, v1.6 had some build issues and is unusable with Mesos. The 
> next version, v1.7, allows Mesos to compile fine but results in the above 
> error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5288) Update leveldb patch file to support s390x

2016-05-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278554#comment-15278554
 ] 

haosdent commented on MESOS-5288:
-

[~bingli1000], I think you may consider upgrading to leveldb 1.8 first; they 
only patched the endianness detection parts. For MESOS-2201, [~janisz] has 
posted his patch. You could apply it along with your patches to check whether 
all test cases pass.

> Update leveldb patch file to support s390x
> -
>
> Key: MESOS-5288
> URL: https://issues.apache.org/jira/browse/MESOS-5288
> Project: Mesos
>  Issue Type: Bug
>Reporter: Bing Li
>Assignee: Bing Li
>
> There are two issues in leveldb-1.4.
> 1. Leveldb didn't build; we had to define MemoryBarrier() for s390x.
> I got the patch from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=644336.
> 2. A number of unit tests failed because 1.4 doesn't detect endianness 
> properly, and s390x is big-endian.
> Got error messages like "Failed to recover the log: Corruption: checksum 
> mismatch".
> I have a backport patch which is part of the leveldb commit
> https://github.com/google/leveldb/commit/075a35a6d390167b77b687e067dd0ba593e7f624



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5346) Some endpoints do not specify their allowed request methods.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5346:
---
  Labels: http mesosphere security tech-debt  (was: http security tech-debt)
Priority: Major  (was: Minor)

> Some endpoints do not specify their allowed request methods.
> 
>
> Key: MESOS-5346
> URL: https://issues.apache.org/jira/browse/MESOS-5346
> Project: Mesos
>  Issue Type: Bug
>  Components: security, technical debt
>Reporter: Jan Schlicht
>  Labels: http, mesosphere, security, tech-debt
>
> Some HTTP endpoints (for example "/flags" or "/state") create a response 
> regardless of what the request method is. For example an HTTP POST to the 
> "/state" endpoint will create the same response as an HTTP GET.
> While this inconsistency isn't harmful at the moment, it will become problematic 
> when authorization is implemented, using separate ACLs for endpoints that can 
> be GETed and endpoints that can be POSTed to.
> Validation of the request method should be added to all endpoints, e.g. 
> "/state" should return a 405 (Method Not Allowed) when POSTed to.
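The proposed validation could be sketched as a guard at the top of each handler; the handler shape and names here are assumed for illustration:

```cpp
#include <string>

// Illustrative guard for a read-only endpoint such as "/state":
// reject anything but GET with 405 before building the response.
int stateEndpointStatus(const std::string& method) {
  if (method != "GET") {
    return 405;  // Method Not Allowed.
  }
  return 200;  // OK: fall through to the normal response.
}
```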



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5359) The scheduler library should have a delay before initiating a connection with master.

2016-05-10 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-5359:
-

 Summary: The scheduler library should have a delay before 
initiating a connection with master.
 Key: MESOS-5359
 URL: https://issues.apache.org/jira/browse/MESOS-5359
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.29.0
Reporter: Anand Mazumdar


Currently, the scheduler library does not have an artificially induced delay 
when trying to initially establish a connection with the master. In the event 
of a master failover or ZK disconnect, a large number of frameworks can get 
disconnected and thereby overwhelm the master with TCP SYN requests. 

On a large cluster with many agents, the master is already overwhelmed with 
handling connection requests from the agents. This compounds the issue further 
on the master.
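One common way to implement such a delay is randomized jitter before the initial connect, so that reconnecting schedulers spread their TCP SYNs over a window; the sketch below is illustrative, not the library's eventual design:

```cpp
#include <random>

// Pick a connect delay uniformly in [0, maxBackoffMs) so that many
// schedulers reconnecting after a master failover do not all hit the
// master at the same instant.
int connectDelayMs(int maxBackoffMs, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, maxBackoffMs - 1);
  return dist(gen);
}
```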



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4126) Construct the error string in `MethodNotAllowed`.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4126:
---
   Sprint: Mesosphere Sprint 35
 Story Points: 3  (was: 1)
Fix Version/s: 0.29.0

> Construct the error string in `MethodNotAllowed`.
> -
>
> Key: MESOS-4126
> URL: https://issues.apache.org/jira/browse/MESOS-4126
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Jacob Janco
>Priority: Minor
>  Labels: http, mesosphere, newbie++
> Fix For: 0.29.0
>
>
> Consider constructing the error string in {{MethodNotAllowed}} rather than at 
> the invocation site. Currently we want all error messages to follow the same 
> pattern, so instead of writing
> {code}
> return MethodNotAllowed({"POST"}, "Expecting 'POST', received '" + 
> request.method + "'");
> {code}
> we can write something like
> {code}
> MethodNotAllowed({"POST"}, request.method)
> {code}
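The refactoring above can be sketched like this; the field and constructor shapes are assumed, not the actual libprocess definition:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: the response type builds the canonical error string itself
// from the allowed methods and the method actually received.
struct MethodNotAllowed {
  std::string body;
  MethodNotAllowed(const std::vector<std::string>& allowed,
                   const std::string& received) {
    std::string expected;
    for (std::size_t i = 0; i < allowed.size(); ++i) {
      expected += (i == 0 ? "'" : ", '") + allowed[i] + "'";
    }
    body = "Expecting " + expected + ", received '" + received + "'";
  }
};
```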



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4126) Construct the error string in `MethodNotAllowed`.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4126:
---
Description: 
Consider constructing the error string in {{MethodNotAllowed}} rather than at 
the invocation site. Currently we want all error messages to follow the same 
pattern, so instead of writing
{code}
return MethodNotAllowed({"POST"}, "Expecting 'POST', received '" + 
request.method + "'");
{code}
we can write something like
{code}
MethodNotAllowed({"POST"}, request.method)
{code}


  was:
Consider constructing the error string in {{MethodNotAllowed}} rather than at 
the invocation site. Currently we want all error messages follow the same 
pattern, so instead of writing
{code}
return MethodNotAllowed({"POST"}, "Expecting 'POST', received '" + 
request.method + "'");
{code}
we can write something like
{code}
MethodNotAllowed({"POST"}, request.method)`
{code}



> Construct the error string in `MethodNotAllowed`.
> -
>
> Key: MESOS-4126
> URL: https://issues.apache.org/jira/browse/MESOS-4126
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Jacob Janco
>Priority: Minor
>  Labels: http, mesosphere, newbie++
>
> Consider constructing the error string in {{MethodNotAllowed}} rather than at 
> the invocation site. Currently we want all error messages to follow the same 
> pattern, so instead of writing
> {code}
> return MethodNotAllowed({"POST"}, "Expecting 'POST', received '" + 
> request.method + "'");
> {code}
> we can write something like
> {code}
> MethodNotAllowed({"POST"}, request.method)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278416#comment-15278416
 ] 

Chris commented on MESOS-5358:
--

Implementation ticket is MESOS-5342 
(https://issues.apache.org/jira/browse/MESOS-5342)

> Design Doc for CPU pinning/binding support (MESOS-5342)
> ---
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: documentation, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-

Documentation for this ticket is MESOS-5358

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (SLURM) provide both 
> cgroup isolation and cpu binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores - this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278405#comment-15278405
 ] 

Chris commented on MESOS-5342:
--

Note: this is my first design document for Mesos, so it's not perfect.

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (SLURM) provide both 
> cgroup isolation and cpu binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores - this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278400#comment-15278400
 ] 

Chris commented on MESOS-5358:
--

Design document posted here:

https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing

> Design Doc for CPU pinning/binding support (MESOS-5342)
> --
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: documentation, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5358:
-
Comment: was deleted

(was: 
https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing)

> Design Doc for CPU pinning/binding support (MESOS-5342)
> --
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: documentation, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5358) Design Doc for CPU pinning/binding support (MESOS-5342)

2016-05-10 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278399#comment-15278399
 ] 

Chris commented on MESOS-5358:
--

https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/edit?usp=sharing

> Design Doc for CPU pinning/binding support (MESOS-5342)
> --
>
> Key: MESOS-5358
> URL: https://issues.apache.org/jira/browse/MESOS-5358
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: documentation, mentor, newbie, performance
>
> Develop design document for MESOS-5342.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5357) Add a function to extract HTTP endpoints from an URL.

2016-05-10 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-5357:
---

 Summary: Add a function to extract HTTP endpoints from an URL.
 Key: MESOS-5357
 URL: https://issues.apache.org/jira/browse/MESOS-5357
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Jan Schlicht
Assignee: Jan Schlicht
 Fix For: 0.29.0


HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a 
{{process::http::URL}}. The {{path}} member of the URL instance is of the form 
{{/master/endpoint}} or {{/slave(n)/endpoint}}. We want to implement 
authorization of endpoints and need to extract the endpoint from that path and 
that function should be accessible for masters as well as agents.
This can be done by adding a method to {{process::http::URL}} that implements 
the extraction logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5357) Add a function to extract HTTP endpoints from an URL.

2016-05-10 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-5357:

Description: 
HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a 
{{process::http::URL}}. The {{path}} member of the URL instance is of the form 
{{/master/endpoint}} or {{/slave\(n\)/endpoint}}. We want to implement 
authorization of endpoints and need to extract the endpoint from that path and 
that function should be accessible for masters as well as agents.
This can be done by adding a method to {{process::http::URL}} that implements 
the extraction logic.

  was:
HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a 
{{process::http::URL}}. The {{path}} member of the URL instance is of the form 
{{/master/endpoint}} or {{/slave(n)/endpoint}}. We want to implement 
authorization of endpoints and need to extract the endpoint from that path and 
that function should be accessible for masters as well as agents.
This can be done by adding a method to {{process::http::URL}} that implements 
the extraction logic.


> Add a function to extract HTTP endpoints from an URL.
> -
>
> Key: MESOS-5357
> URL: https://issues.apache.org/jira/browse/MESOS-5357
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: libprocess, newbie, security
> Fix For: 0.29.0
>
>
> HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a 
> {{process::http::URL}}. The {{path}} member of the URL instance is of the 
> form {{/master/endpoint}} or {{/slave\(n\)/endpoint}}. We want to implement 
> authorization of endpoints and need to extract the endpoint from that path 
> and that function should be accessible for masters as well as agents.
> This can be done by adding a method to {{process::http::URL}} that implements 
> the extraction logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5356) Add Windows support for StopWatch

2016-05-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5356:
---

 Summary: Add Windows support for StopWatch
 Key: MESOS-5356
 URL: https://issues.apache.org/jira/browse/MESOS-5356
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Alex Clemmer
 Fix For: 0.29.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2016-05-10 Thread igor (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278226#comment-15278226
 ] 

igor commented on MESOS-3793:
-

--no-systemd_enable_support flag helped. 

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---
>
> Key: MESOS-3793
> URL: https://issues.apache.org/jira/browse/MESOS-3793
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Debian GNU/Linux 8 docker machine
>Reporter: Matthias Veit
>Assignee: Jojy Varghese
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> We updated the mesos version to 0.25.0 in our Marathon docker image, which 
> runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 
> 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the 
> db in ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to 
> STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master 
> a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 
> 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
> slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
> authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
> authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
> STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to 
> VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos 
> group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
> master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560   137 registrar.cpp:309] Recovering registrar
> Failed to create a containerizer: Could not create MesosContainerizer: Failed 
> to create 

[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2016-05-10 Thread igor (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278189#comment-15278189
 ] 

igor commented on MESOS-3793:
-

same problem on centos 7.2


# mesos-slave --master=build:5050 --containerizers=docker --no-switch_user 
--launcher=posix
I0510 17:30:01.435762    27 main.cpp:223] Build: 2016-04-14 15:43:08 by root
I0510 17:30:01.436447    27 main.cpp:225] Version: 0.28.1
I0510 17:30:01.436465    27 main.cpp:228] Git tag: 0.28.1
I0510 17:30:01.436475    27 main.cpp:232] Git SHA: 
555db235a34afbb9fb49940376cc33a66f1f85f0
I0510 17:30:01.447417    27 systemd.cpp:236] systemd version `219` detected
I0510 17:30:01.447507    27 main.cpp:240] Inializing systemd state
Failed to initialize systemd: Failed to locate systemd runtime directory: 
/run/systemd/system



[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2016-05-10 Thread igor (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278191#comment-15278191
 ] 

igor commented on MESOS-3793:
-

version 0.28.1


[jira] [Issue Comment Deleted] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2016-05-10 Thread igor (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

igor updated MESOS-3793:

Comment: was deleted

(was: same problem on centos 7.2


# mesos-slave --master=build:5050 --containerizers=docker --no-switch_user 
--launcher=posix
I0510 17:30:01.435762    27 main.cpp:223] Build: 2016-04-14 15:43:08 by root
I0510 17:30:01.436447    27 main.cpp:225] Version: 0.28.1
I0510 17:30:01.436465    27 main.cpp:228] Git tag: 0.28.1
I0510 17:30:01.436475    27 main.cpp:232] Git SHA: 
555db235a34afbb9fb49940376cc33a66f1f85f0
I0510 17:30:01.447417    27 systemd.cpp:236] systemd version `219` detected
I0510 17:30:01.447507    27 main.cpp:240] Inializing systemd state
Failed to initialize systemd: Failed to locate systemd runtime directory: 
/run/systemd/system
)


[jira] [Updated] (MESOS-5240) Command executor may escalate after the task is reaped.

2016-05-10 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5240:
---
Sprint: Mesosphere Sprint 35

> Command executor may escalate after the task is reaped.
> ---
>
> Key: MESOS-5240
> URL: https://issues.apache.org/jira/browse/MESOS-5240
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.2, 0.26.1, 0.28.1
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> In the command executor, {{escalated()}} may be scheduled before the task has 
> been killed, i.e. before {{reaped()}} runs, but invoked after it. In this 
> case {{escalated()}} should be a no-op.
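The intended fix can be sketched as a simple termination flag checked by the escalation callback. This is an illustrative Python model, not the actual Mesos C++ command executor; the class and return strings are hypothetical.

```python
class CommandExecutorSketch:
    """Model of the race: escalated() may already be queued on the event
    loop when reaped() fires, so it must re-check state when it runs."""

    def __init__(self):
        self.terminated = False  # set once the task has been reaped

    def reaped(self):
        # The task exited; any escalation delivered later must do nothing.
        self.terminated = True

    def escalated(self):
        if self.terminated:
            return "no-op: task already reaped"
        return "escalating: sending SIGKILL"
```

Calling `escalated()` before `reaped()` escalates; calling it afterwards is a no-op, which is the behavior the ticket asks for.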



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5355) Health check process may be killed multiple times in executors.

2016-05-10 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-5355:
--

 Summary: Health check process may be killed multiple times in 
executors.
 Key: MESOS-5355
 URL: https://issues.apache.org/jira/browse/MESOS-5355
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.1, 0.27.2
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


Currently, in both command executors (event-API and HTTP-API based) and in the 
docker executor, we issue a kill to the health check process every time 
{{killTask}} is called. We should either guard against repeated kills or kill 
the process only when the task is reaped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5242) pivot_root is not available on System z (s390x)

2016-05-10 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278095#comment-15278095
 ] 

Bing Li commented on MESOS-5242:


Since https://reviews.apache.org/r/46730/ is already in the master branch, I've 
tried it out and it works on s390x. 
Closing this issue.

BTW, [~haosd...@gmail.com], do you mind having a look at the other porting 
issue, MESOS-5288, which is related to leveldb?
Thanks,

> pivot_root is not available on System z (s390x)
> ---
>
> Key: MESOS-5242
> URL: https://issues.apache.org/jira/browse/MESOS-5242
> Project: Mesos
>  Issue Type: Bug
> Environment: Hardward: IBM System z
> OS: Linux on z SLES12SP1
>Reporter: Bing Li
>Assignee: Bing Li
>
> Got error "pivot_root is not available" which is similar to MESOS-5121 .
> Added syscall pivot_root definition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2016-05-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277829#comment-15277829
 ] 

haosdent commented on MESOS-3435:
-

Thanks to [~a10gupta] for emailing me about this today. I have a draft patch 
for this; I will publish it if I am available this weekend.

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> As secure as a hypervisor, as fast and easy to use as Docker: this is Hyper. 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5351) DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes is flaky

2016-05-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277792#comment-15277792
 ] 

Guangya Liu commented on MESOS-5351:


Similar to MESOS-4810, the root cause is that on some Linux distributions, 
'/bin' is not in $PATH when certain shells are used. Since the container image 
'alpine' itself does not specify environment variables, $PATH is inherited 
from the agent. As a result, when we exec, the lookup cannot find 'sh': it 
lives under /bin in alpine, but '/bin' is not in $PATH.
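The failure mode is ordinary PATH-based executable lookup: a binary that exists on disk is still "not found" if its directory is absent from the search path. A small self-contained Python demonstration (using temporary stand-in directories, not the real alpine rootfs):

```python
import os
import shutil
import stat
import tempfile

# Build a fake rootfs layout: <tmp>/bin/sh, marked executable.
tmp = tempfile.mkdtemp()
bindir = os.path.join(tmp, "bin")
os.makedirs(bindir)
sh_path = os.path.join(bindir, "sh")
with open(sh_path, "w") as f:
    f.write("#!/bin/true\n")
os.chmod(sh_path, os.stat(sh_path).st_mode | stat.S_IXUSR)

# Lookup succeeds only when the binary's directory is on the search path;
# with an inherited PATH that omits it, the same file is invisible.
found = shutil.which("sh", path=bindir)
missing = shutil.which("sh", path=os.path.join(tmp, "other"))
```

Here `found` is the full path to the fake `sh`, while `missing` is None even though the file exists, which is exactly why exec fails inside the alpine container when the agent's PATH lacks '/bin'.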

> DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes is 
> flaky
> -
>
> Key: MESOS-5351
> URL: https://issues.apache.org/jira/browse/MESOS-5351
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: GCC 4.9
> CentOS 7 and Fedora 23 (Both SSL or no-SSL)
>Reporter: Joseph Wu
>  Labels: flaky, mesosphere
>
> Consistently fails on Mesosphere internal CI:
> {code}
> [14:38:12] :   [Step 10/10] [ RUN  ] 
> DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.782032  2386 cluster.cpp:149] 
> Creating default 'local' authorizer
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.786592  2386 leveldb.cpp:174] 
> Opened db in 4.462265ms
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.787979  2386 leveldb.cpp:181] 
> Compacted db in 1.368995ms
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788007  2386 leveldb.cpp:196] 
> Created db iterator in 4994ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788014  2386 leveldb.cpp:202] 
> Seeked to beginning of db in 724ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788019  2386 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 388ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788031  2386 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788249  2402 recover.cpp:447] 
> Starting replica recovery
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788316  2402 recover.cpp:473] 
> Replica is in EMPTY status
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788684  2406 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (18057)@172.30.2.145:48816
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788744  2405 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788869  2400 recover.cpp:564] 
> Updating replica status to STARTING
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789206  2406 master.cpp:383] 
> Master 6c04237d-91d6-4a05-849a-8b46fdeafe76 (ip-172-30-2-145.mesosphere.io) 
> started on 172.30.2.145:48816
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789216  2406 master.cpp:385] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/vepf2X/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/vepf2X/master" 
> --zk_session_timeout="10secs"
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789342  2406 master.cpp:434] 
> Master only allowing authenticated frameworks to register
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789348  2406 master.cpp:440] 
> Master only allowing authenticated agents to register
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789351  2406 master.cpp:446] 
> Master only allowing authenticated HTTP frameworks to register
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789355  2406 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/vepf2X/credentials'
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789466  2406 master.cpp:490] Using 
> default 'crammd5' authenticator
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789504  2406 master.cpp:561] Using 
> default 'basic' HTTP authenticator
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.789540  2406 master.cpp:641] Using 
> default 'basic' 

[jira] [Assigned] (MESOS-2201) Make check fails with leveldb > v1.7

2016-05-10 Thread Tomasz Janiszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Janiszewski reassigned MESOS-2201:
-

Assignee: Tomasz Janiszewski

> Make check fails with leveldb > v1.7
> 
>
> Key: MESOS-2201
> URL: https://issues.apache.org/jira/browse/MESOS-2201
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Kapil Arya
>Assignee: Tomasz Janiszewski
>
> I wanted to configure Mesos with system-provided leveldb libraries when I ran 
> into this issue. Apparently, if one does {{../configure 
> --with-leveldb=/path/to/leveldb}}, compilation succeeds; however, the 
> "ReplicaTest_Restore" test fails with the following backtrace:
> {code}
> [ RUN  ] ReplicaTest.Restore
> Using temporary directory '/tmp/ReplicaTest_Restore_IZbbRR'
> I1222 14:16:49.517500  2927 leveldb.cpp:176] Opened db in 10.758917ms
> I1222 14:16:49.526495  2927 leveldb.cpp:183] Compacted db in 8.931146ms
> I1222 14:16:49.526523  2927 leveldb.cpp:198] Created db iterator in 5787ns
> I1222 14:16:49.526531  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 511ns
> I1222 14:16:49.526535  2927 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 197ns
> I1222 14:16:49.526623  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.530972  2945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.084458ms
> I1222 14:16:49.531008  2945 replica.cpp:320] Persisted replica status to 
> VOTING
> I1222 14:16:49.541263  2927 leveldb.cpp:176] Opened db in 9.980586ms
> I1222 14:16:49.551636  2927 leveldb.cpp:183] Compacted db in 10.348096ms
> I1222 14:16:49.551683  2927 leveldb.cpp:198] Created db iterator in 3405ns
> I1222 14:16:49.551693  2927 leveldb.cpp:204] Seeked to beginning of db in 
> 3559ns
> I1222 14:16:49.551728  2927 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 29722ns
> I1222 14:16:49.551751  2927 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1222 14:16:49.551996  2947 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1222 14:16:49.560921  2947 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.899591ms
> I1222 14:16:49.560940  2947 replica.cpp:342] Persisted promised to 1
> I1222 14:16:49.561338  2943 replica.cpp:508] Replica received write request 
> for position 1
> I1222 14:16:49.568677  2943 leveldb.cpp:343] Persisting action (27 bytes) to 
> leveldb took 7.287155ms
> I1222 14:16:49.568692  2943 replica.cpp:676] Persisted action at 1
> I1222 14:16:49.569042  2942 leveldb.cpp:438] Reading position from leveldb 
> took 26339ns
> F1222 14:16:49.569411  2927 replica.cpp:721] CHECK_SOME(state): IO error: 
> lock /tmp/ReplicaTest_Restore_IZbbRR/.log/LOCK: already held by process 
> Failed to recover the log
> *** Check failure stack trace: ***
> @ 0x7f7f6c53e688  google::LogMessage::Fail()
> @ 0x7f7f6c53e5e7  google::LogMessage::SendToLog()
> @ 0x7f7f6c53dff8  google::LogMessage::Flush()
> @ 0x7f7f6c540d2c  google::LogMessageFatal::~LogMessageFatal()
> @   0x90a520  _CheckFatal::~_CheckFatal()
> @ 0x7f7f6c400f4d  mesos::internal::log::ReplicaProcess::restore()
> @ 0x7f7f6c3fd763  
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0x7f7f6c401271  mesos::internal::log::Replica::Replica()
> @   0xcd7ca3  ReplicaTest_Restore_Test::TestBody()
> @  0x10934b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108e584  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x10768fd  testing::Test::Run()
> @  0x1077020  testing::TestInfo::Run()
> @  0x10775a8  testing::TestCase::Run()
> @  0x107c324  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1094348  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108f2b7  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x107b1d4  testing::UnitTest::Run()
> @   0xd344a9  main
> @ 0x7f7f66fdfb45  __libc_start_main
> @   0x8f3549  (unknown)
> @  (nil)  (unknown)
> [2]2927 abort (core dumped)  GLOG_logtostderr=1 GTEST_v=10 
> ./bin/mesos-tests.sh --verbose
> {code}
> The bundled version of leveldb is v1.4. I tested version 1.5 and that seems 
> to work. However, v1.6 had some build issues and is unusable with Mesos. The 
> next version, v1.7, allows Mesos to compile fine but results in the above 
> error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2201) Make check fails with leveldb > v1.7

2016-05-10 Thread Tomasz Janiszewski (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277758#comment-15277758
 ] 

Tomasz Janiszewski commented on MESOS-2201:
---

Review: https://reviews.apache.org/r/47161/

> @   0xcd7ca3  ReplicaTest_Restore_Test::TestBody()
> @  0x10934b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108e584  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x10768fd  testing::Test::Run()
> @  0x1077020  testing::TestInfo::Run()
> @  0x10775a8  testing::TestCase::Run()
> @  0x107c324  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1094348  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x108f2b7  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x107b1d4  testing::UnitTest::Run()
> @   0xd344a9  main
> @ 0x7f7f66fdfb45  __libc_start_main
> @   0x8f3549  (unknown)
> @  (nil)  (unknown)
> [2]2927 abort (core dumped)  GLOG_logtostderr=1 GTEST_v=10 
> ./bin/mesos-tests.sh --verbose
> {code}
> The bundled version of leveldb is v1.4. I tested version 1.5 and that seems 
> to work. However, v1.6 had some build issues and is unusable with Mesos. The 
> next version, v1.7, allows Mesos to compile fine but results in the above 
> error.
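A plausible explanation for the {{CHECK_SOME(state)}} failure above, sketched below (an illustration, not Mesos or leveldb code): POSIX {{fcntl}} record locks are held per-process, so a second lock attempt on the same file from the *same* process silently succeeds. leveldb versions that rely on {{fcntl}} alone therefore allow one process to open the same DB twice (as the test's second {{Replica}} on the same path effectively does), while newer leveldb also tracks held locks in an in-process table and rejects the second open with the "already held by process" error.

```python
import fcntl
import os
import tempfile

# POSIX fcntl locks are per-process: a second exclusive lock on the same
# file, taken by the same process via a different descriptor, does NOT
# conflict -- it silently succeeds.
path = os.path.join(tempfile.mkdtemp(), "LOCK")

fd1 = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
fcntl.lockf(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)   # first "DB open": lock held

fd2 = os.open(path, os.O_RDWR)
fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)   # same process: no error
print("second lock acquired in the same process")
```

This is why an in-process lock table (as in newer leveldb) is needed to detect a double-open within a single process at all.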





[jira] [Commented] (MESOS-5351) DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes is flaky

2016-05-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277753#comment-15277753
 ] 

Guangya Liu commented on MESOS-5351:


Set up an environment on CentOS 7.1 where the test always fails; checking the 
output of {{stderr}} and {{stdout}}, it seems the task command always fails 
because the binary cannot be found.

{code}
[root@mesos-24 latest]# cat stderr 
+ /root/src/mesos/build/src/mesos-containerizer mount --help=false 
--operation=make-rslave --path=/
+ mount -n --rbind 
/tmp/DockerVolumeIsolatorTest_ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes_uf7nqr/slaves/4681e398-5b36-4849-832e-382fc7a373f5-S0/frameworks/4681e398-5b36-4849-832e-382fc7a373f5-/executors/4c69109a-5948-4bf2-a9d9-146ea6208acc/runs/e377e445-1479-4661-852f-9c8f9d7bf036
 
/tmp/DockerVolumeIsolatorTest_ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes_uf7nqr/provisioner/containers/e377e445-1479-4661-852f-9c8f9d7bf036/backends/copy/rootfses/bbb9e1db-abdf-45d0-97fa-7d8bb445533a/mnt/mesos/sandbox
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0510 03:37:24.861348   698 process.cpp:1022] libprocess is initialized on 
9.21.51.124:38163 with 8 worker threads
I0510 03:37:24.861948   698 logging.cpp:195] Logging to STDERR
I0510 03:37:24.863306   698 exec.cpp:150] Version: 0.29.0
I0510 03:37:24.868268   759 exec.cpp:200] Executor started at: 
executor(1)@9.21.51.124:38163 with pid 698
I0510 03:37:24.872148   758 exec.cpp:225] Executor registered on agent 
4681e398-5b36-4849-832e-382fc7a373f5-S0
I0510 03:37:24.873325   758 exec.cpp:237] Executor::registered took 283799ns
I0510 03:37:24.873634   758 exec.cpp:312] Executor asked to run task 
'4c69109a-5948-4bf2-a9d9-146ea6208acc'
I0510 03:37:24.873761   758 exec.cpp:321] Executor::launchTask took 100260ns
I0510 03:37:24.877316   761 exec.cpp:535] Executor sending status update 
TASK_RUNNING (UUID: 7e23419c-41d5-4f58-9343-2db169d5be4c) for task 
4c69109a-5948-4bf2-a9d9-146ea6208acc of framework 
4681e398-5b36-4849-832e-382fc7a373f5-
Failed to exec: No such file or directory
I0510 03:37:24.881904   757 exec.cpp:358] Executor received status update 
acknowledgement 7e23419c-41d5-4f58-9343-2db169d5be4c for task 
4c69109a-5948-4bf2-a9d9-146ea6208acc of framework 
4681e398-5b36-4849-832e-382fc7a373f5-
I0510 03:37:24.978173   756 exec.cpp:535] Executor sending status update 
TASK_FAILED (UUID: 41ed123f-a7ca-4e6e-9221-bbb3f9e74967) for task 
4c69109a-5948-4bf2-a9d9-146ea6208acc of framework 
4681e398-5b36-4849-832e-382fc7a373f5-
I0510 03:37:24.984695   762 exec.cpp:358] Executor received status update 
acknowledgement 41ed123f-a7ca-4e6e-9221-bbb3f9e74967 for task 
4c69109a-5948-4bf2-a9d9-146ea6208acc of framework 
4681e398-5b36-4849-832e-382fc7a373f5-
[root@mesos-24 latest]# cat stdout 
Registered executor on mesos-24.eng.platformlab.ibm.com
Starting task 4c69109a-5948-4bf2-a9d9-146ea6208acc
Forked command at 765
sh -c 'test -f tmp/foo1/file1 && test -f /tmp/M4PhRu/foo2/file2;'
Command terminated with signal Aborted (pid: 765)
{code}
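{{Failed to exec: No such file or directory}} in the {{stderr}} above is {{exec(2)}} reporting {{ENOENT}}: the binary itself (presumably {{sh}} inside the provisioned rootfs) is missing at the expected path, so the task dies before its command ever runs. A minimal illustration (the rootfs path below is hypothetical):

```python
import errno
import os

# exec of a nonexistent binary fails with ENOENT -- the same
# "No such file or directory" the executor reports above.
try:
    os.execv("/nonexistent/rootfs/bin/sh", ["sh", "-c", "true"])
except OSError as e:
    print(e.errno == errno.ENOENT)   # True
```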

> DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes is 
> flaky
> -
>
> Key: MESOS-5351
> URL: https://issues.apache.org/jira/browse/MESOS-5351
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: GCC 4.9
> CentOS 7 and Fedora 23 (Both SSL or no-SSL)
>Reporter: Joseph Wu
>  Labels: flaky, mesosphere
>
> Consistently fails on Mesosphere internal CI:
> {code}
> [14:38:12] :   [Step 10/10] [ RUN  ] 
> DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithVolumes
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.782032  2386 cluster.cpp:149] 
> Creating default 'local' authorizer
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.786592  2386 leveldb.cpp:174] 
> Opened db in 4.462265ms
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.787979  2386 leveldb.cpp:181] 
> Compacted db in 1.368995ms
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788007  2386 leveldb.cpp:196] 
> Created db iterator in 4994ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788014  2386 leveldb.cpp:202] 
> Seeked to beginning of db in 724ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788019  2386 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 388ns
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788031  2386 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788249  2402 recover.cpp:447] 
> Starting replica recovery
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788316  2402 recover.cpp:473] 
> Replica is in EMPTY status
> [14:38:12]W:   [Step 10/10] I0509 14:38:12.788684  2406 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request 

[jira] [Commented] (MESOS-1575) master sets failover timeout to 0 when framework requests a high value

2016-05-10 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277729#comment-15277729
 ] 

Neil Conway commented on MESOS-1575:


lgtm -- [~vinodkone] has been on vacation, but I suspect he'll take a look at 
it when he gets back.

> master sets failover timeout to 0 when framework requests a high value
> --
>
> Key: MESOS-1575
> URL: https://issues.apache.org/jira/browse/MESOS-1575
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Sweeney
>Assignee: José Guilherme Vanz
>  Labels: newbie, twitter
>
> In response to a registered RPC we observed the following behavior:
> {noformat}
> W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for 
> 'failover_timeout' becausethe input value is invalid: Argument out of the 
> range that a Duration can represent due to int64_t's size limit
> I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408] 
> Deactivated framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983013 11400 master.cpp:617] Giving framework 
> 20140709-184342-119646400-5050-11380-0003 0ns to failover
> I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout, 
> removing framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework 
> 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed 
> framework 20140709-184342-119646400-5050-11380-0003
> {noformat}
> This was using the following frameworkInfo.
> {code}
> FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
> .setUser("test")
> .setName("jvm")
> .setFailoverTimeout(Long.MAX_VALUE)
> .build();
> {code}
> Instead of silently defaulting large values to 0, the master should refuse to 
> process the request.
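The log message suggests the root cause is an int64 range check: if {{Duration}} stores nanoseconds in an {{int64_t}} (an assumption consistent with "int64_t's size limit" above), then {{Long.MAX_VALUE}} seconds cannot be represented, the parse fails, and the master falls back to its default of {{0ns}}. A sketch of the arithmetic:

```python
# Assuming Duration holds nanoseconds in a signed 64-bit integer,
# the largest representable timeout is about 2^63 ns (~292 years).
INT64_MAX = 2**63 - 1
NS_PER_SEC = 10**9

requested_secs = INT64_MAX                 # Long.MAX_VALUE from the Java snippet
requested_ns = requested_secs * NS_PER_SEC

print(requested_ns > INT64_MAX)            # True: out of Duration's range
print(INT64_MAX // NS_PER_SEC)             # largest representable whole seconds
```

Rejecting the registration (or clamping to the maximum representable value) would both be less surprising than silently treating the request as a 0ns failover timeout.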





[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5343:
--
Labels: mesosphere security  (was: )

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}}, agents currently refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.





[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5343:
--
Affects Version/s: 0.29.0

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere, security
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}}, agents currently refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.





[jira] [Commented] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277678#comment-15277678
 ] 

Adam B commented on MESOS-5343:
---

Quite an annoying inconsistency. I would think we should allow users to load a 
custom http_authenticator without requiring them to enable authentication; we 
could WARN in that case rather than EXIT.
However, agent HTTP authentication was added later, so I suspect the deviation 
was intentional, with the intent to change the master to match. I'd be fine 
with either, but we should try to be consistent.
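The two behaviors can be sketched as follows (hypothetical code, not the actual Mesos flag handling): the agent's strict path EXITs on this flag combination, while the proposed alternative would only WARN and load the authenticator anyway.

```python
def validate_http_auth_flags(http_authenticators, authenticate_http,
                             strict=True):
    """Return a warning string (or None), or exit if `strict` (agent-style)."""
    if http_authenticators and not authenticate_http:
        msg = ("A custom HTTP authenticator was specified with "
               "'--http_authenticators', but HTTP authentication was not "
               "enabled via '--authenticate_http'")
        if strict:
            raise SystemExit(msg)    # current agent behavior: refuse to start
        return "WARNING: " + msg     # proposed consistent behavior: warn only
    return None

# Master-style (permissive) handling of the same flag combination:
print(validate_http_auth_flags("basic", False, strict=False)[:8])  # WARNING:
```

Whichever behavior is chosen, putting the check in one shared place would keep master and agent from drifting apart again.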

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}}, agents currently refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.





[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5343:
--
Priority: Minor  (was: Major)

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Priority: Minor
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}}, agents currently refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.





[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2016-05-10 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5343:
--
Fix Version/s: 0.29.0

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
> Fix For: 0.29.0
>
>
> When setting a custom authenticator with {{http_authenticators}} and also 
> specifying {{authenticate_http=false}}, agents currently refuse to start with
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.


