[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265108#comment-15265108
 ] 

Guangya Liu commented on MESOS-5310:


Thanks [~avin...@mesosphere.io], got your point: here you only want to reload 
the configurations, not the isolator itself.

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265103#comment-15265103
 ] 

Avinash Sridharan commented on MESOS-5310:
--

I am not sure a single endpoint would serve any purpose here. The endpoints 
need to be canonical, since the operator would want to re-bootstrap each of 
these entities (isolators) individually (there is no dependency between them). 
This implies that if we have a single endpoint, something will have to route 
the requests to the right isolator (probably the `MesosContainerizer`), which 
will happen automatically at the `libprocess` level if we have separate 
endpoints for each isolator.

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5310:
-
Comment: was deleted

(was: I am not sure a single endpoint would serve any purpose here. The 
endpoints need to be canonical, since the operator would want to re-bootstrap 
each of these entities (isolators) individually (there is no dependency between 
them). This implies that if we have a single endpoint, something will have to 
route the requests to the right isolator (probably the `MesosContainerizer`), 
which will happen automatically at the `libprocess` level if we have separate 
endpoints for each isolator.)

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265099#comment-15265099
 ] 

Avinash Sridharan commented on MESOS-5310:
--

I am not sure a single endpoint would serve any purpose here. The endpoints 
need to be canonical, since the operator would want to re-bootstrap each of 
these entities (isolators) individually (there is no dependency between them). 
This implies that if we have a single endpoint, something will have to route 
the requests to the right isolator (probably the `MesosContainerizer`), which 
will happen automatically at the `libprocess` level if we have separate 
endpoints for each isolator.

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5310:
-
Story Points: 1
  Labels: mesosphere  (was: )

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265093#comment-15265093
 ] 

Guangya Liu commented on MESOS-5310:


Does it make sense to make this a common endpoint that can work for different 
isolators, such as network, volume, etc.? I think there will also be the same 
requirement for the docker/volume isolator.

> Enable `network/cni` isolator to load CNI config at runtime. 
> -
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator; when the 
> operator sends a PUT request to this endpoint, the isolator will reload the 
> CNI configs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5193:
---
Attachment: full.log

I've attached an interleaved version of the log where each line is prefixed 
with the node number. You can see the recovery failures of node3 then node1 
then node2 towards the end.

Interestingly, I took a look with [~jieyu] and it appears there may have been 
some message loss or connectivity issues:

(1) When node3 gets elected, node2 appears to be offline; node3 broadcasts an 
implicit promise request to node3 (itself) and node1. *This message is not 
received by node1 for some reason.*

(2) After node3 dies, node1 broadcasts an implicit promise request to node1 
(itself) and node2. *This message is not received by node2 for some reason.*

After this point, only node2 remains, and we do not have quorum.

{quote}
Although, once a master process gets killed the service gets terminated as well.
{quote}

Can you fix that so that the masters are restarted? That is a requirement for 
running HA masters; otherwise we cannot maintain a quorum.
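
To make the requirement concrete, here is a bare-bones sketch of the kind of 
supervision loop meant above (the flags are copied from the description; in 
practice this would live in the init.d/Upstart/systemd service that already 
launches the master, and a ~10 second restart delay keeps the master well 
within the 1-minute registry fetch timeout):

{noformat}
# Illustrative supervision loop only -- not an official Mesos script.
# Restart the master a few seconds after any exit (including the
# "Recovery failed" abort), so the quorum is not left permanently short.
while true; do
  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir \
    --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
  sleep 10
done
{noformat}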

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: full.log, node1.log, node1_after_work_dir.log, 
> node2.log, node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same 
> error. I am not sure about the issue. We are currently using Mesos 0.22 and 
> also tried upgrading to Mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system.
> Thanks,
> Priyanka



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5312) Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5312:
--
Fix Version/s: 0.29.0

> Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.
> --
>
> Key: MESOS-5312
> URL: https://issues.apache.org/jira/browse/MESOS-5312
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> This is in the context of Mesos containerizer (a.k.a., unified containerizer).
> I did a simple test:
> {noformat}
> sudo sbin/mesos-master --work_dir=/tmp/mesos/master
> sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 
> --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ 
> --image_providers=docker --executor_environment_variables="{}"
> sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test 
> --docker_image=alpine --command="env" 
> MESOS_EXECUTOR_ID=test
> SHLVL=1
> MESOS_CHECKPOINT=0
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
> LIBPROCESS_PORT=0
> MESOS_AGENT_ENDPOINT=10.0.2.15:5051
> MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-
> MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0
> MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> PWD=/mnt/mesos/sandbox
> MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051
> {noformat}
> `MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5312) Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5312:
--
Labels: mesosphere  (was: )

> Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.
> --
>
> Key: MESOS-5312
> URL: https://issues.apache.org/jira/browse/MESOS-5312
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> This is in the context of Mesos containerizer (a.k.a., unified containerizer).
> I did a simple test:
> {noformat}
> sudo sbin/mesos-master --work_dir=/tmp/mesos/master
> sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 
> --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ 
> --image_providers=docker --executor_environment_variables="{}"
> sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test 
> --docker_image=alpine --command="env" 
> MESOS_EXECUTOR_ID=test
> SHLVL=1
> MESOS_CHECKPOINT=0
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
> LIBPROCESS_PORT=0
> MESOS_AGENT_ENDPOINT=10.0.2.15:5051
> MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-
> MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0
> MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> PWD=/mnt/mesos/sandbox
> MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051
> {noformat}
> `MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5312) Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5312:
--
Fix Version/s: 0.28.2

> Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.
> --
>
> Key: MESOS-5312
> URL: https://issues.apache.org/jira/browse/MESOS-5312
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> This is in the context of Mesos containerizer (a.k.a., unified containerizer).
> I did a simple test:
> {noformat}
> sudo sbin/mesos-master --work_dir=/tmp/mesos/master
> sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 
> --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ 
> --image_providers=docker --executor_environment_variables="{}"
> sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test 
> --docker_image=alpine --command="env" 
> MESOS_EXECUTOR_ID=test
> SHLVL=1
> MESOS_CHECKPOINT=0
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
> LIBPROCESS_PORT=0
> MESOS_AGENT_ENDPOINT=10.0.2.15:5051
> MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-
> MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0
> MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> PWD=/mnt/mesos/sandbox
> MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051
> {noformat}
> `MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5312) Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5312:
--
Affects Version/s: 0.28.0
   0.28.1

> Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.
> --
>
> Key: MESOS-5312
> URL: https://issues.apache.org/jira/browse/MESOS-5312
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> This is in the context of Mesos containerizer (a.k.a., unified containerizer).
> I did a simple test:
> {noformat}
> sudo sbin/mesos-master --work_dir=/tmp/mesos/master
> sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 
> --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ 
> --image_providers=docker --executor_environment_variables="{}"
> sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test 
> --docker_image=alpine --command="env" 
> MESOS_EXECUTOR_ID=test
> SHLVL=1
> MESOS_CHECKPOINT=0
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
> LIBPROCESS_PORT=0
> MESOS_AGENT_ENDPOINT=10.0.2.15:5051
> MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-
> MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0
> MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
> MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
> PWD=/mnt/mesos/sandbox
> MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051
> {noformat}
> `MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5312) Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.

2016-04-29 Thread Jie Yu (JIRA)
Jie Yu created MESOS-5312:
-

 Summary: Env `MESOS_SANDBOX` is not set properly for command tasks 
that changes rootfs.
 Key: MESOS-5312
 URL: https://issues.apache.org/jira/browse/MESOS-5312
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


This is in the context of Mesos containerizer (a.k.a., unified containerizer).

I did a simple test:
{noformat}
sudo sbin/mesos-master --work_dir=/tmp/mesos/master
sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 
--isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ 
--image_providers=docker --executor_environment_variables="{}"
sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test 
--docker_image=alpine --command="env" 

MESOS_EXECUTOR_ID=test
SHLVL=1
MESOS_CHECKPOINT=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.0.2.15:5051
MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-
MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0
MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so
MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051
{noformat}

`MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.
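
For a quick look at the mismatch from inside the container, a small variation 
of the same repro can print the two values side by side (the task name below 
is made up; everything else matches the commands above):

{noformat}
# Hypothetical spot check: once the rootfs has been changed, MESOS_SANDBOX
# should match the working directory (/mnt/mesos/sandbox), but per the env
# dump above it still points at the host-side sandbox path.
sudo bin/mesos-execute --master=10.0.2.15:5050 --name=sandbox-check \
  --docker_image=alpine --command='echo "MESOS_SANDBOX=$MESOS_SANDBOX pwd=$(pwd)"'
{noformat}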



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5239) Persistent volume DockerContainerizer support assumes proper mount propagation setup on the host.

2016-04-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264874#comment-15264874
 ] 

Jie Yu commented on MESOS-5239:
---

The following patch allows the filesystem/linux isolator to skip the bind mount 
for the agent's work_dir if possible:
https://reviews.apache.org/r/46858/

The above patch will solve this problem on CentOS 7, Ubuntu 16.04, and CoreOS, 
where the default mounts are 'shared'.
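
As a quick way to check whether a host falls into that category (a sketch, not 
part of the patch itself), the propagation mode of the mount containing the 
agent's work_dir can be inspected with util-linux's findmnt; the work_dir path 
below is only an example:

{noformat}
# Example check only -- substitute the real --work_dir value.
# A PROPAGATION value of 'shared' means the mount is in a shared peer group,
# i.e. the extra bind mount can be skipped.
findmnt --target /var/lib/mesos -o TARGET,PROPAGATION
# The same information shows up in /proc/self/mountinfo as a 'shared:N'
# field on the line for the containing mount.
{noformat}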

> Persistent volume DockerContainerizer support assumes proper mount 
> propagation setup on the host.
> -
>
> Key: MESOS-5239
> URL: https://issues.apache.org/jira/browse/MESOS-5239
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> We recently added persistent volume support in DockerContainerizer 
> (MESOS-3413). To understand the problem, we first need to understand how 
> persistent volumes are supported in DockerContainerizer.
> To support persistent volumes in DockerContainerizer, we bind mount 
> persistent volumes under a container's sandbox ('container_path' has to be 
> relative for persistent volumes). When the Docker container is launched, 
> since we always add a volume (-v) for the sandbox, the persistent volumes 
> will be bind mounted into the container as well (since Docker does a 'rbind').
> The assumption behind the above is that the Docker daemon should see those 
> persistent volume mounts that Mesos creates in the host mount table. It's not 
> a problem if the Docker daemon itself is using the host mount namespace. 
> However, on systemd-enabled systems, the Docker daemon runs in a separate 
> mount namespace and all mounts in that mount namespace will be marked as 
> slave mounts due to this 
> [patch|https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a].
> What that means is that, in order for this to work, the parent mount of the 
> agent's work_dir should be a shared mount when the Docker daemon starts. This 
> is typically true on CentOS 7 and CoreOS, as all mounts are shared by default.
> However, this causes an issue with the 'filesystem/linux' isolator. To 
> understand why, first I need to show you a typical problem when dealing with 
> shared mounts. Let me explain that using the following commands on a CentOS7 
> machine:
> {noformat}
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> [root@core-dev run]# mkdir /run/netns
> [root@core-dev run]# mount --bind /run/netns /run/netns
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> [root@core-dev run]# ip netns add test
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> 162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> 163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> {noformat}
> As you can see above, there are two entries (/run/netns/test) in the mount 
> table, which is unexpected and can sometimes confuse other systems. The 
> reason is that when we create a self bind mount (/run/netns -> /run/netns), 
> the mount is put into the same shared mount peer group (shared:22) as its 
> parent (/run). Then, when you create another mount underneath it 
> (/run/netns/test), that mount operation is propagated to all mounts in the 
> same peer group (shared:22), resulting in an unexpected additional mount 
> being created.
> The reason we need to do a self bind mount in Mesos is that sometimes we need 
> to make sure certain mounts are shared so that they do not get copied when a 
> new mount namespace is created. However, on some systems, mounts are private 
> by default (e.g., Ubuntu 14.04). In those cases, since we cannot change the 
> system mounts, we have to do a self bind mount so that we can set the mount 
> propagation to shared. For instance, in the filesystem/linux isolator, we do 
> a self bind mount on the agent's work_dir.
> To avoid the self bind mount pitfall mentioned above, in the filesystem/linux 
> isolator, after we create the mount we do a make-slave + make-shared so that 
> the mount is in its own shared mount peer group. That way, any mounts 
> underneath it will not be propagated back.
> However, that operation will break the assumption that the persistent volume 
> 

[jira] [Updated] (MESOS-5311) Calling `make install` fails if `include/mesos/slave/agent` already exists.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5311:
--
Fix Version/s: 0.29.0

> Calling `make install` fails if `include/mesos/slave/agent` already exists.
> ---
>
> Key: MESOS-5311
> URL: https://issues.apache.org/jira/browse/MESOS-5311
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Jie Yu
> Fix For: 0.29.0
>
>
> People might be calling `make install` multiple times during development.
> The second `make install` will fail:
> {noformat}
> make[4]: Entering directory `/home/jie/workspace/dist/mesos/build/src'
> cp //home/jie/workspace/dist/mesos/etc/mesos/mesos-agent-env.sh.template \
>   //home/jie/workspace/dist/mesos/etc/mesos/mesos-slave-env.sh.template &&\
> ln -s //home/jie/workspace/dist/mesos/include/mesos/agent \
>   //home/jie/workspace/dist/mesos/include/mesos/slave
> ln: failed to create symbolic link 
> ‘//home/jie/workspace/dist/mesos/include/mesos/slave/agent’: File exists
> make[4]: *** [copy-template-and-create-symlink] Error 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5311) Calling `make install` fails if `include/mesos/slave/agent` already exists.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5311:
--
Affects Version/s: 0.29.0

> Calling `make install` fails if `include/mesos/slave/agent` already exists.
> ---
>
> Key: MESOS-5311
> URL: https://issues.apache.org/jira/browse/MESOS-5311
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.29.0
>Reporter: Jie Yu
> Fix For: 0.29.0
>
>
> People might be calling `make install` multiple times during development.
> The second `make install` will fail:
> {noformat}
> make[4]: Entering directory `/home/jie/workspace/dist/mesos/build/src'
> cp //home/jie/workspace/dist/mesos/etc/mesos/mesos-agent-env.sh.template \
>   //home/jie/workspace/dist/mesos/etc/mesos/mesos-slave-env.sh.template &&\
> ln -s //home/jie/workspace/dist/mesos/include/mesos/agent \
>   //home/jie/workspace/dist/mesos/include/mesos/slave
> ln: failed to create symbolic link 
> ‘//home/jie/workspace/dist/mesos/include/mesos/slave/agent’: File exists
> make[4]: *** [copy-template-and-create-symlink] Error 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5311) Calling `make install` fails if `include/mesos/slave/agent` already exists.

2016-04-29 Thread Jie Yu (JIRA)
Jie Yu created MESOS-5311:
-

 Summary: Calling `make install` fails if 
`include/mesos/slave/agent` already exists.
 Key: MESOS-5311
 URL: https://issues.apache.org/jira/browse/MESOS-5311
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


People might be calling `make install` multiple times during development.

The second `make install` will fail:
{noformat}
make[4]: Entering directory `/home/jie/workspace/dist/mesos/build/src'
cp //home/jie/workspace/dist/mesos/etc/mesos/mesos-agent-env.sh.template \
  //home/jie/workspace/dist/mesos/etc/mesos/mesos-slave-env.sh.template &&\
ln -s //home/jie/workspace/dist/mesos/include/mesos/agent \
  //home/jie/workspace/dist/mesos/include/mesos/slave
ln: failed to create symbolic link 
‘//home/jie/workspace/dist/mesos/include/mesos/slave/agent’: File exists
make[4]: *** [copy-template-and-create-symlink] Error 1
{noformat}
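
One way to make that step idempotent (a sketch only, not necessarily the fix 
that landed) is to force-replace the link without dereferencing an existing 
symlink:

{noformat}
# Sketch of an idempotent variant of the symlink step from the log above:
# -f replaces an existing link and -n keeps ln from descending into a
# symlink that already points at a directory.
ln -sfn //home/jie/workspace/dist/mesos/include/mesos/agent \
  //home/jie/workspace/dist/mesos/include/mesos/slave
{noformat}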



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Priyanka Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264707#comment-15264707
 ] 

Priyanka Gupta commented on MESOS-5193:
---

[~bmahler] The ZooKeeper connectivity issues are because we have zk set up on 
the same nodes as the mesos master. So, configuration-wise, we have 3 nodes, 
each running zk, mesos-master, and mesos-slave. As far as restart is concerned, 
we have rhel6 boxes and an init.d service which runs these. However, once a 
master process gets killed, the service gets terminated as well. 

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same 
> error. I am not sure about the issue. We are currently using Mesos 0.22 and 
> also tried upgrading to Mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system.
> Thanks,
> Priyanka



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5310) Enable `network/cni` isolator to load CNI config at runtime.

2016-04-29 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5310:


 Summary: Enable `network/cni` isolator to load CNI config at 
runtime. 
 Key: MESOS-5310
 URL: https://issues.apache.org/jira/browse/MESOS-5310
 Project: Mesos
  Issue Type: Task
  Components: containerization
 Environment: linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Currently the `network/cni` isolator can only load the CNI configs at startup. 
This makes the CNI networks immutable. From an operational standpoint this can 
make deployments painful for operators. 

To make CNI more flexible the `network/cni` isolator should be able to load 
configs at run time. 

The proposal is to add an endpoint to the `network/cni` isolator; when the 
operator sends a PUT request to this endpoint, the isolator will reload the 
CNI configs. 
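
As a rough illustration of the intended workflow (the endpoint path below is 
hypothetical; no such route exists yet), the operator-facing call would look 
something like the following, after which the isolator would re-read its CNI 
configuration directory:

{noformat}
# Hypothetical operator request; the agent host/port and the endpoint path
# are illustrative only, not an existing Mesos agent route.
curl -X PUT http://agent.example.com:5051/network/cni/reload-configs
{noformat}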



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264584#comment-15264584
 ] 

Benjamin Mahler edited comment on MESOS-5193 at 4/29/16 7:19 PM:
-

[~prigupta] Looking at the logs, there was a ~ 3 minute window of time in which 
the masters were experiencing ZooKeeper connectivity issues (from 18:33 - 
18:36). Have you noticed this?

Also, we require that the masters are run under supervision; are you ensuring 
that the masters are being promptly restarted if they terminate? Since the 
recovery timeout is 1 minute by default, I would suggest a supervision restart 
interval that is much smaller, like 10 seconds.

Were the masters restarted after the last recovery failures here?

{noformat}
Master 1:
W0429 18:33:08.726205  2518 logging.cpp:88] RAW: Received signal SIGTERM from 
process 2938 of user 0; exiting
I0429 18:33:28.846740  1083 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:37:26.008154  1134 master.cpp:1723] Elected as the leading master!
F0429 18:38:26.008847  1127 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 2:
W0429 18:36:04.716518  2410 logging.cpp:88] RAW: Received signal SIGTERM from 
process 3029 of user 0; exiting
I0429 18:36:30.429669  1091 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:38:34.699726  1144 master.cpp:1723] Elected as the leading master!
F0429 18:39:34.715205  1139 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 3:
I0429 18:32:12.877344  7962 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:36:16.489387  7963 master.cpp:1723] Elected as the leading master!
F0429 18:37:16.490408  7967 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins
{noformat}

If they were restarted and the ZooKeeper connectivity was resolved, the masters 
should have been able to get back up and running.


was (Author: bmahler):
[~prigupta] Looking at the logs, there was a ~ 3 minute window of time in which 
the masters were experiencing ZooKeeper connectivity issues (from 18:33 - 
18:36). Have you noticed this?

Also, we require that the masters are run under supervision; are you ensuring 
that the masters are being promptly restarted if they terminate? Since the 
recovery timeout is 1 minute by default, I would suggest something much 
smaller, like 10 seconds.

Were the masters restarted after the last recovery failures here?

{noformat}
Master 1:
W0429 18:33:08.726205  2518 logging.cpp:88] RAW: Received signal SIGTERM from 
process 2938 of user 0; exiting
I0429 18:33:28.846740  1083 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:37:26.008154  1134 master.cpp:1723] Elected as the leading master!
F0429 18:38:26.008847  1127 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 2:
W0429 18:36:04.716518  2410 logging.cpp:88] RAW: Received signal SIGTERM from 
process 3029 of user 0; exiting
I0429 18:36:30.429669  1091 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:38:34.699726  1144 master.cpp:1723] Elected as the leading master!
F0429 18:39:34.715205  1139 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 3:
I0429 18:32:12.877344  7962 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:36:16.489387  7963 master.cpp:1723] Elected as the leading master!
F0429 18:37:16.490408  7967 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins
{noformat}

If they were restarted and the ZooKeeper connectivity was resolved, the masters 
should have been able to get back up and running.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same error. 
> I am not sure about the issue. We are currently using 

[jira] [Commented] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264584#comment-15264584
 ] 

Benjamin Mahler commented on MESOS-5193:


[~prigupta] Looking at the logs, there was a ~ 3 minute window of time in which 
the masters were experiencing ZooKeeper connectivity issues (from 18:33 - 
18:36). Have you noticed this?

Also, we require that the masters are run under supervision; are you ensuring 
that the masters are being promptly restarted if they terminate? Since the 
recovery timeout is 1 minute by default, I would suggest something much 
smaller, like 10 seconds.

Were the masters restarted after the last recovery failures here?

{noformat}
Master 1:
W0429 18:33:08.726205  2518 logging.cpp:88] RAW: Received signal SIGTERM from 
process 2938 of user 0; exiting
I0429 18:33:28.846740  1083 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:37:26.008154  1134 master.cpp:1723] Elected as the leading master!
F0429 18:38:26.008847  1127 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 2:
W0429 18:36:04.716518  2410 logging.cpp:88] RAW: Received signal SIGTERM from 
process 3029 of user 0; exiting
I0429 18:36:30.429669  1091 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:38:34.699726  1144 master.cpp:1723] Elected as the leading master!
F0429 18:39:34.715205  1139 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins

Master 3:
I0429 18:32:12.877344  7962 main.cpp:230] Build: 2016-04-13 23:22:05 by screwdrv
I0429 18:36:16.489387  7963 master.cpp:1723] Elected as the leading master!
F0429 18:37:16.490408  7967 master.cpp:1457] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins
{noformat}

If they were restarted and the ZooKeeper connectivity was resolved, the masters 
should have been able to get back up and running.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same 
> error. I am not sure about the issue. We are currently using Mesos 0.22 and 
> also tried upgrading to Mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system.
> Thanks,
> Priyanka



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Priyanka Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264513#comment-15264513
 ] 

Priyanka Gupta commented on MESOS-5193:
---

Hi [~jieyu] 

I tried changing the work dir as you suggested, but with no luck. Attaching the 
logs again. 
Test scenario: Node1 is the leading master. Rebooted node1 -> node2 became 
master; all is fine. Once node1 is back, I rebooted node2 (the current leading 
master); node3 becomes master and then exits, then node1 tries to become master 
and fails, and then node2 also fails.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same 
> error. I am not sure about the issue. We are currently using Mesos 0.22 and 
> also tried upgrading to Mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system.
> Thanks,
> Priyanka



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

2016-04-29 Thread Priyanka Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyanka Gupta updated MESOS-5193:
--
Attachment: node3_after_work_dir.log
node2_after_work_dir.log
node1_after_work_dir.log

Logs of mesos_master after changing the work dir location as suggested.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> --
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.0, 0.27.0
>Reporter: Priyanka Gupta
>  Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are using a 3-node cluster with a Mesos master, a Mesos slave, and 
> ZooKeeper on all of them, with Chronos running on top. The problem is that 
> when we reboot the Mesos master leader, the other nodes try to get elected as 
> leader but fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but again fails with the same 
> error. I am not sure about the issue. We are currently using Mesos 0.22 and 
> also tried upgrading to Mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system.
> Thanks,
> Priyanka



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4973) Duplicates in 'unregistered_frameworks' in /state

2016-04-29 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264295#comment-15264295
 ] 

Yan Xu commented on MESOS-4973:
---

We should deduplicate the list so the entries are unique.

> Duplicates in 'unregistered_frameworks' in /state 
> --
>
> Key: MESOS-4973
> URL: https://issues.apache.org/jira/browse/MESOS-4973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yan Xu
>Priority: Minor
>
> In our clusters where many frameworks run, 'unregistered_frameworks' 
> currently doesn't show what it semantically means, but rather "a list of 
> frameworkIDs for each orphaned task", which means a lot of duplicated 
> frameworkIDs.
> For this field to be useful, we need to deduplicate when outputting the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-29 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264288#comment-15264288
 ] 

Yan Xu commented on MESOS-5308:
---

We could be calling {{usage}} too fast; in another test we use {{while(true)}} 
to keep calling {{usage()}}. I don't think we need to do this in every test, 
and since our focus in this test is not {{usage()}}, we can just remove this 
expectation.

[~jpe...@apache.org] I can take care of this if you don't have time. :)

> ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.
> ---
>
> Key: MESOS-5308
> URL: https://issues.apache.org/jira/browse/MESOS-5308
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: Fedora 23 with/without SSL
>Reporter: Gilbert Song
>  Labels: isolation
>
> Here is the log:
> {code}
> [01:07:51] :   [Step 10/10] [ RUN  ] 
> ROOT_XFS_QuotaTest.NoCheckpointRecovery
> [01:07:51] :   [Step 10/10] meta-data=/dev/loop0 isize=512
> agcount=2, agsize=5120 blks
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> attr=2, projid32bit=1
> [01:07:51] :   [Step 10/10]  =   crc=1
> finobt=1, sparse=0
> [01:07:51] :   [Step 10/10] data =   bsize=4096   
> blocks=10240, imaxpct=25
> [01:07:51] :   [Step 10/10]  =   sunit=0  
> swidth=0 blks
> [01:07:51] :   [Step 10/10] naming   =version 2  bsize=4096   
> ascii-ci=0 ftype=1
> [01:07:51] :   [Step 10/10] log  =internal log   bsize=4096   
> blocks=855, version=2
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> sunit=0 blks, lazy-count=1
> [01:07:51] :   [Step 10/10] realtime =none   extsz=4096   
> blocks=0, rtextents=0
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.690585 17604 cluster.cpp:149] 
> Creating default 'local' authorizer
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.706126 17604 leveldb.cpp:174] 
> Opened db in 15.452988ms
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707135 17604 leveldb.cpp:181] 
> Compacted db in 984939ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707154 17604 leveldb.cpp:196] 
> Created db iterator in 4159ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707159 17604 leveldb.cpp:202] 
> Seeked to beginning of db in 517ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707165 17604 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 305ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707176 17604 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707320 17621 recover.cpp:447] 
> Starting replica recovery
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707381 17621 recover.cpp:473] 
> Replica is in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707638 17619 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (17889)@172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707732 17624 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707885 17624 recover.cpp:564] 
> Updating replica status to STARTING
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708389 17618 master.cpp:382] 
> Master 0c1e0a50-1212-4104-a148-661131b79f27 
> (ip-172-30-2-13.ec2.internal.mesosphere.io) started on 172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708406 17618 master.cpp:384] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/master"
>  --zk_session_timeout="10secs"
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708510 17618 

[jira] [Updated] (MESOS-5239) Persistent volume DockerContainerizer support assumes proper mount propagation setup on the host.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5239:
--
Fix Version/s: 0.28.2
   0.29.0

> Persistent volume DockerContainerizer support assumes proper mount 
> propagation setup on the host.
> -
>
> Key: MESOS-5239
> URL: https://issues.apache.org/jira/browse/MESOS-5239
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> We recently added persistent volume support in DockerContainerizer 
> (MESOS-3413). To understand the problem, we first need to understand how 
> persistent volumes are supported in DockerContainerizer.
> To support persistent volumes in DockerContainerizer, we bind mount 
> persistent volumes under a container's sandbox ('container_path' has to be 
> relative for persistent volumes). When the Docker container is launched, 
> since we always add a volume (-v) for the sandbox, the persistent volumes 
> will be bind mounted into the container as well (since Docker does a 'rbind').
> The assumption behind the above is that the Docker daemon should see those 
> persistent volume mounts that Mesos creates in the host mount table. It's not 
> a problem if the Docker daemon itself is using the host mount namespace. 
> However, on systemd-enabled systems, the Docker daemon runs in a separate 
> mount namespace and all mounts in that mount namespace will be marked as 
> slave mounts due to this 
> [patch|https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a].
> What that means is that, in order for this to work, the parent mount of the 
> agent's work_dir should be a shared mount when the Docker daemon starts. This 
> is typically true on CentOS 7 and CoreOS, as all mounts are shared by default.
> However, this causes an issue with the 'filesystem/linux' isolator. To 
> understand why, first I need to show you a typical problem when dealing with 
> shared mounts. Let me explain that using the following commands on a CentOS7 
> machine:
> {noformat}
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> [root@core-dev run]# mkdir /run/netns
> [root@core-dev run]# mount --bind /run/netns /run/netns
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> [root@core-dev run]# ip netns add test
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> 162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> 163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> {noformat}
> As you can see above, there are two entries (/run/netns/test) in the mount 
> table, which is unexpected and can sometimes confuse other systems. The 
> reason is that when we create a self bind mount (/run/netns -> /run/netns), 
> the mount is put into the same shared mount peer group (shared:22) as its 
> parent (/run). Then, when you create another mount underneath it 
> (/run/netns/test), that mount operation is propagated to all mounts in the 
> same peer group (shared:22), resulting in an unexpected additional mount 
> being created.
> The reason we need to do a self bind mount in Mesos is that sometimes we need 
> to make sure certain mounts are shared so that they do not get copied when a 
> new mount namespace is created. However, on some systems, mounts are private 
> by default (e.g., Ubuntu 14.04). In those cases, since we cannot change the 
> system mounts, we have to do a self bind mount so that we can set the mount 
> propagation to shared. For instance, in the filesystem/linux isolator, we do 
> a self bind mount on the agent's work_dir.
> To avoid the self bind mount pitfall mentioned above, in the filesystem/linux 
> isolator, after we create the mount we do a make-slave + make-shared so that 
> the mount is in its own shared mount peer group. That way, any mounts 
> underneath it will not be propagated back.
> However, that operation breaks the assumption that the persistent volume 
> DockerContainerizer support makes. As a result, we're seeing problems with 
> persistent volumes in the DockerContainerizer when the filesystem/linux 
> isolator is turned on.
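
For reference, the self bind mount plus make-slave/make-shared sequence 
described above corresponds roughly to the following mount(8) calls (the 
work_dir path is only an example, and this is a sketch of the idea rather than 
the isolator's actual code):

{noformat}
# Self bind mount so there is a mount point whose propagation we control.
mount --bind /var/lib/mesos /var/lib/mesos
# Detach it from the parent's shared peer group so sub-mounts created later
# do not propagate back to the parent...
mount --make-slave /var/lib/mesos
# ...then make it shared again, now in a peer group of its own.
mount --make-shared /var/lib/mesos
{noformat}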



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5239) Persistent volume DockerContainerizer support assumes proper mount propagation setup on the host.

2016-04-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-5239:
-

Assignee: Jie Yu

> Persistent volume DockerContainerizer support assumes proper mount 
> propagation setup on the host.
> -
>
> Key: MESOS-5239
> URL: https://issues.apache.org/jira/browse/MESOS-5239
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
>
> We recently added persistent volume support in DockerContainerizer 
> (MESOS-3413). To understand the problem, we first need to understand how 
> persistent volumes are supported in DockerContainerizer.
> To support persistent volumes in DockerContainerizer, we bind mount 
> persistent volumes under a container's sandbox ('container_path' has to be 
> relative for persistent volumes). When the Docker container is launched, 
> since we always add a volume (-v) for the sandbox, the persistent volumes 
> will be bind mounted into the container as well (since Docker does a 'rbind').
> For the above to work, the assumption is that the Docker daemon can see 
> those persistent volume mounts that Mesos creates in the host mount table. 
> This is not a problem if the Docker daemon itself uses the host mount 
> namespace. However, on systemd-enabled systems the Docker daemon runs in a 
> separate mount namespace, and all mounts in that mount namespace are marked 
> as slave mounts due to this 
> [patch|https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a].
> What that means is that, in order for this to work, the parent mount of the 
> agent's work_dir must be a shared mount when the Docker daemon starts. This 
> is typically true on CentOS 7 and CoreOS, where all mounts are shared by 
> default.
> However, this causes an issue with the 'filesystem/linux' isolator. To 
> understand why, first I need to show you a typical problem when dealing with 
> shared mounts. Let me explain that using the following commands on a CentOS7 
> machine:
> {noformat}
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> [root@core-dev run]# mkdir /run/netns
> [root@core-dev run]# mount --bind /run/netns /run/netns
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> [root@core-dev run]# ip netns add test
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs 
> rw,seclabel,mode=755
> 162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> 163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc 
> proc rw
> {noformat}
> As you can see above, there are two entries for /run/netns/test in the mount 
> table, which is unexpected and can confuse some systems. The reason is that 
> when we create a self bind mount (/run/netns -> /run/netns), the mount is 
> put into the same shared mount peer group (shared:22) as its parent (/run). 
> Then, when another mount is created underneath it (/run/netns/test), that 
> mount operation is propagated to all mounts in the same peer group 
> (shared:22), resulting in an unexpected additional mount being created.
> The reason we need to do a self bind mount in Mesos is that sometimes we 
> need to make sure certain mounts are shared so that they do not get copied 
> when a new mount namespace is created. However, on some systems mounts are 
> private by default (e.g., Ubuntu 14.04). In those cases, since we cannot 
> change the system mounts, we have to do a self bind mount so that we can set 
> the mount propagation to shared. For instance, the filesystem/linux isolator 
> does a self bind mount on the agent's work_dir.
> To avoid the self bind mount pitfall mentioned above, the filesystem/linux 
> isolator, after creating the mount, does a make-slave followed by a 
> make-shared so that the mount is in its own shared mount peer group. That 
> way, any mounts underneath it are not propagated back.
> However, that operation breaks the assumption that the persistent volume 
> support in the DockerContainerizer makes. As a result, we're seeing problems 
> with persistent volumes in the DockerContainerizer when the filesystem/linux 
> isolator is turned on.
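
For reference, the make-slave + make-shared sequence described above maps onto 
the Linux mount(2) API. The following is a minimal C++ sketch of that sequence, 
not the actual Mesos fs helper code; the function name is illustrative, and it 
assumes a Linux host and root privileges.

{code}
#include <sys/mount.h>

#include <cstdio>
#include <cstdlib>

// Self bind mount 'path' and move it into its own shared peer group,
// mirroring what the description above says the filesystem/linux
// isolator does for the agent's work_dir. Illustrative sketch only.
int markSelfBindShared(const char* path)
{
  // 1. Self bind mount so there is a mount point we are allowed to modify.
  if (mount(path, path, nullptr, MS_BIND, nullptr) != 0) {
    std::perror("bind mount");
    return -1;
  }

  // 2. make-slave: detach from the parent's peer group so that mounts
  //    created underneath are no longer propagated back to the parent.
  if (mount(nullptr, path, nullptr, MS_SLAVE, nullptr) != 0) {
    std::perror("make-slave");
    return -1;
  }

  // 3. make-shared: re-enable propagation, but now in a fresh peer group
  //    owned by this mount alone.
  if (mount(nullptr, path, nullptr, MS_SHARED, nullptr) != 0) {
    std::perror("make-shared");
    return -1;
  }

  return 0;
}

int main(int argc, char** argv)
{
  if (argc != 2) {
    std::fprintf(stderr, "Usage: %s <path>\n", argv[0]);
    return EXIT_FAILURE;
  }

  return markSelfBindShared(argv[1]) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
{code}

After running this on a directory such as the agent's work_dir, new mounts 
created underneath it stay in that directory's own peer group instead of 
propagating back to the parent mount.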



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-29 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264243#comment-15264243
 ] 

James Peach commented on MESOS-5308:


Looks like the resource statistics are not correct. I don't really see why that 
would happen. Probably the way to debug this is to leave the scratch filesystem 
mounted and poke at it with {{xfs_quota}}.

{code}
[01:07:51]W: [Step 10/10] 1048576 bytes (1.0 MB) copied, 0.00128219 s, 818 
MB/s
[01:07:51] : [Step 10/10] 
../../src/tests/containerizer/xfs_quota_tests.cpp:559: Failure
[01:07:51]W: [Step 10/10] I0429 01:07:51.865185 17604 slave.cpp:825] Agent 
terminating
[01:07:51] : [Step 10/10] Value of: 
usage1->executors(0).statistics().disk_used_bytes()
[01:07:51] : [Step 10/10]   Actual: 196608
{code}

> ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.
> ---
>
> Key: MESOS-5308
> URL: https://issues.apache.org/jira/browse/MESOS-5308
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: Fedora 23 with/without SSL
>Reporter: Gilbert Song
>  Labels: isolation
>
> Here is the log:
> {code}
> [01:07:51] :   [Step 10/10] [ RUN  ] 
> ROOT_XFS_QuotaTest.NoCheckpointRecovery
> [01:07:51] :   [Step 10/10] meta-data=/dev/loop0 isize=512
> agcount=2, agsize=5120 blks
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> attr=2, projid32bit=1
> [01:07:51] :   [Step 10/10]  =   crc=1
> finobt=1, sparse=0
> [01:07:51] :   [Step 10/10] data =   bsize=4096   
> blocks=10240, imaxpct=25
> [01:07:51] :   [Step 10/10]  =   sunit=0  
> swidth=0 blks
> [01:07:51] :   [Step 10/10] naming   =version 2  bsize=4096   
> ascii-ci=0 ftype=1
> [01:07:51] :   [Step 10/10] log  =internal log   bsize=4096   
> blocks=855, version=2
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> sunit=0 blks, lazy-count=1
> [01:07:51] :   [Step 10/10] realtime =none   extsz=4096   
> blocks=0, rtextents=0
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.690585 17604 cluster.cpp:149] 
> Creating default 'local' authorizer
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.706126 17604 leveldb.cpp:174] 
> Opened db in 15.452988ms
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707135 17604 leveldb.cpp:181] 
> Compacted db in 984939ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707154 17604 leveldb.cpp:196] 
> Created db iterator in 4159ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707159 17604 leveldb.cpp:202] 
> Seeked to beginning of db in 517ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707165 17604 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 305ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707176 17604 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707320 17621 recover.cpp:447] 
> Starting replica recovery
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707381 17621 recover.cpp:473] 
> Replica is in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707638 17619 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (17889)@172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707732 17624 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707885 17624 recover.cpp:564] 
> Updating replica status to STARTING
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708389 17618 master.cpp:382] 
> Master 0c1e0a50-1212-4104-a148-661131b79f27 
> (ip-172-30-2-13.ec2.internal.mesosphere.io) started on 172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708406 17618 master.cpp:384] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" 

[jira] [Commented] (MESOS-5304) /metrics/snapshot endpoint help disappeared on agent.

2016-04-29 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264157#comment-15264157
 ] 

Greg Mann commented on MESOS-5304:
--

Thanks [~kaysoky]! :-)

> /metrics/snapshot endpoint help disappeared on agent.
> -
>
> Key: MESOS-5304
> URL: https://issues.apache.org/jira/browse/MESOS-5304
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joerg Schad
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> After 
> https://github.com/apache/mesos/commit/066fc4bd0df6690a5e1a929d3836e307c1e22586
> the help for the /metrics/snapshot endpoint on the agent doesn't appear 
> anymore (Master endpoint help is unchanged).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-04-29 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264130#comment-15264130
 ] 

Alexander Rojas commented on MESOS-3235:


Got this one today on OS X 10.11.4, with clang 3.8 installed through Homebrew:

{noformat}
[ RUN      ] FetcherCacheHttpTest.HttpCachedConcurrent
HTTP/1.1 200 OK
Date: Fri, 29 Apr 2016 14:35:51 GMT
Content-Length: 30

I0429 16:35:51.301870 2070048768 exec.cpp:150] Version: 0.29.0
I0429 16:35:51.398007 2070048768 exec.cpp:150] Version: 0.29.0
I0429 16:35:51.401103 4284416 exec.cpp:225] Executor registered on agent 
22b00c96-3232-4de8-8779-56559b884e16-S0
Registered executor on localhost
Starting task 2
sh -c './mesos-fetcher-test-cmd 2'
Forked command at 83217
I0429 16:35:51.507447 2070048768 exec.cpp:150] Version: 0.29.0
I0429 16:35:51.507869 2070048768 exec.cpp:150] Version: 0.29.0
Command exited with status 0 (pid: 83217)
I0429 16:35:51.515452 2070048768 exec.cpp:150] Version: 0.29.0
I0429 16:35:51.518779 3211264 exec.cpp:225] Executor registered on agent 
22b00c96-3232-4de8-8779-56559b884e16-S0
I0429 16:35:51.519789 1064960 exec.cpp:225] Executor registered on agent 
22b00c96-3232-4de8-8779-56559b884e16-S0
Registered executor on localhost
Starting task 1
Forked command at 83258
sh -c './mesos-fetcher-test-cmd 1'
I0429 16:35:51.530117 528384 exec.cpp:225] Executor registered on agent 
22b00c96-3232-4de8-8779-56559b884e16-S0
Registered executor on localhost
Starting task 3
Forked command at 83259
sh -c './mesos-fetcher-test-cmd 3'
Registered executor on localhost
Starting task 4
sh -c './mesos-fetcher-test-cmd 4'
Forked command at 83261
Command exited with status 0 (pid: 83258)
Command exited with status 0 (pid: 83259)
Command exited with status 0 (pid: 83261)
../../src/tests/fetcher_cache_tests.cpp:1077: Failure
Failed to wait 15secs for awaitFinished(tasks.get())
Begin listing sandboxes
Begin listing sandbox 
`/var/folders/kj/ffppgfr54x95xb9b4nhjqccwgn/T/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz/slaves/22b00c96-3232-4de8-8779-56559b884e16-S0/frameworks/22b00c96-3232-4de8-8779-56559b884e16-/executors/0/runs/latest`:
Begin file contents of `mesos-fetcher-test-cmd`:
touch mesos-fetcher-test-cmd$1
End file
Begin file contents of `stderr`:
I0429 16:35:51.131356 2070048768 fetcher.cpp:457] Fetcher Info: 
{"cache_directory":"\/var\/folders\/kj\/ffppgfr54x95xb9b4nhjqccwgn\/T\/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz\/fetch\/slaves\/22b00c96-3232-4de8-8779-56559b884e16-S0\/alexander","items":[{"action":"DOWNLOAD_AND_CACHE","cache_filename":"c1-mesos-fetc_r-test-cmd","uri":{"cache":true,"executable":true,"extract":true,"value":"http:\/\/127.0.0.1:65307\/(2321)\/mesos-fetcher-test-cmd"}}],"sandbox_directory":"\/var\/folders\/kj\/ffppgfr54x95xb9b4nhjqccwgn\/T\/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz\/slaves\/22b00c96-3232-4de8-8779-56559b884e16-S0\/frameworks\/22b00c96-3232-4de8-8779-56559b884e16-\/executors\/0\/runs\/be8259a3-b4d4-4e99-a1d9-7c425f95a10f","user":"alexander"}
I0429 16:35:51.136849 2070048768 fetcher.cpp:412] Fetching URI 
'http://127.0.0.1:65307/(2321)/mesos-fetcher-test-cmd'
I0429 16:35:51.136881 2070048768 fetcher.cpp:382] Downloading into cache
I0429 16:35:51.137176 2070048768 fetcher.cpp:187] Fetching URI 
'http://127.0.0.1:65307/(2321)/mesos-fetcher-test-cmd'
I0429 16:35:51.137212 2070048768 fetcher.cpp:134] Downloading resource from 
'http://127.0.0.1:65307/(2321)/mesos-fetcher-test-cmd' to 
'/var/folders/kj/ffppgfr54x95xb9b4nhjqccwgn/T/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz/fetch/slaves/22b00c96-3232-4de8-8779-56559b884e16-S0/alexander/c1-mesos-fetc_r-test-cmd'
I0429 16:35:51.138309 2070048768 fetcher.cpp:306] Fetching from cache
I0429 16:35:51.138375 2070048768 fetcher.cpp:167] Copying resource with 
command:cp 
'/var/folders/kj/ffppgfr54x95xb9b4nhjqccwgn/T/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz/fetch/slaves/22b00c96-3232-4de8-8779-56559b884e16-S0/alexander/c1-mesos-fetc_r-test-cmd'
 
'/var/folders/kj/ffppgfr54x95xb9b4nhjqccwgn/T/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz/slaves/22b00c96-3232-4de8-8779-56559b884e16-S0/frameworks/22b00c96-3232-4de8-8779-56559b884e16-/executors/0/runs/be8259a3-b4d4-4e99-a1d9-7c425f95a10f/mesos-fetcher-test-cmd'
I0429 16:35:51.142838 2070048768 fetcher.cpp:489] Fetched 
'http://127.0.0.1:65307/(2321)/mesos-fetcher-test-cmd' to 
'/var/folders/kj/ffppgfr54x95xb9b4nhjqccwgn/T/FetcherCacheHttpTest_HttpCachedConcurrent_dfgjtz/slaves/22b00c96-3232-4de8-8779-56559b884e16-S0/frameworks/22b00c96-3232-4de8-8779-56559b884e16-/executors/0/runs/be8259a3-b4d4-4e99-a1d9-7c425f95a10f/mesos-fetcher-test-cmd'

End file
Begin file contents of `stdout`:

End file
End sandbox
Begin listing sandbox 

[jira] [Updated] (MESOS-5299) Support hierarchy based matching of HTTP endpoint authorization requests.

2016-04-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5299:
--
Summary: Support hierarchy based matching of HTTP endpoint authorization 
requests.  (was: Support hierarchiy based matching of HTTP endpoint 
authorization requests.)

> Support hierarchy based matching of HTTP endpoint authorization requests.
> -
>
> Key: MESOS-5299
> URL: https://issues.apache.org/jira/browse/MESOS-5299
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Jan Schlicht
>Priority: Minor
>  Labels: acl, authorization, mesosphere, security
>
> The current HTTP endpoint authorization (e.g. the GET_ENDPOINT_WITH_PATH 
> action) works by matching the request's object with entries in the ACL. This 
> could be loosened to support hierarchies, for example a principal trying to 
> access "/monitor/statistics" could be authorized to do so if an ACL rule 
> exists that allows this principal to access "/monitor" (and hence all 
> subpaths of it).
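
One way to implement the proposed hierarchy matching is a simple path-prefix 
check. The sketch below is illustrative only (the function names are not 
actual Mesos code) and assumes ACL object values are absolute endpoint paths 
without trailing slashes.

{code}
#include <string>
#include <vector>

// A request for "/monitor/statistics" is authorized if the ACL grants the
// principal that path itself or any ancestor such as "/monitor".
bool matchesHierarchy(const std::string& aclPath,
                      const std::string& requestedPath)
{
  if (aclPath == requestedPath) {
    return true;
  }

  // The ACL path authorizes all of its subpaths: "/monitor" covers
  // "/monitor/statistics" but not "/monitoring".
  return requestedPath.size() > aclPath.size() &&
         requestedPath.compare(0, aclPath.size(), aclPath) == 0 &&
         requestedPath[aclPath.size()] == '/';
}

bool isAuthorizedPath(const std::vector<std::string>& grantedPaths,
                      const std::string& requestedPath)
{
  for (const std::string& aclPath : grantedPaths) {
    if (matchesHierarchy(aclPath, requestedPath)) {
      return true;
    }
  }

  return false;
}
{code}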



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4801) Updated `createFrameworkInfo` for hierarchical_allocator_tests.cpp.

2016-04-29 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-4801:
---
Description: The {{createFrameworkInfo}} function in 
hierarchical_allocator_tests.cpp should be updated so that the caller can set a 
framework capability to create a framework which can use revocable resources.  
(was: The {{createFrameworkInfo}} function in 
hierarchical_allocator_tests.cpp should be updated so that the caller can set a 
bool parameter to create a framework which can use revocable resources.)

> Updated `createFrameworkInfo` for hierarchical_allocator_tests.cpp.
> ---
>
> Key: MESOS-4801
> URL: https://issues.apache.org/jira/browse/MESOS-4801
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The {{createFrameworkInfo}} function in hierarchical_allocator_tests.cpp 
> should be updated so that the caller can set a framework capability to 
> create a framework which can use revocable resources.
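
A minimal sketch of what the updated helper could look like, assuming the 
standard generated protobuf API for {{FrameworkInfo}}; the helper signature 
and defaults here are illustrative, not the actual test code.

{code}
#include <vector>

#include <mesos/mesos.hpp>

// Illustrative sketch: the caller passes the capabilities the framework
// should advertise, e.g.
//   createFrameworkInfo({mesos::FrameworkInfo::Capability::REVOCABLE_RESOURCES});
static mesos::FrameworkInfo createFrameworkInfo(
    const std::vector<mesos::FrameworkInfo::Capability::Type>& capabilities = {})
{
  mesos::FrameworkInfo frameworkInfo;
  frameworkInfo.set_user("user");
  frameworkInfo.set_name("framework");

  // Each requested capability is appended to the FrameworkInfo, so a test
  // can create a framework that may use revocable resources.
  for (const auto& type : capabilities) {
    frameworkInfo.add_capabilities()->set_type(type);
  }

  return frameworkInfo;
}
{code}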



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5305) 'unused parameter' warnings may occur when '-Wextra' is enabled.

2016-04-29 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5305:
---
Issue Type: Bug  (was: Wish)

> 'unused parameter' warnings may occur when '-Wextra' is enabled.
> 
>
> Key: MESOS-5305
> URL: https://issues.apache.org/jira/browse/MESOS-5305
> Project: Mesos
>  Issue Type: Bug
> Environment: gcc, clang
>Reporter: Gilbert Song
>Priority: Trivial
>  Labels: newbie++
>
> Extra warnings can be enabled with `-Wextra` in gcc or clang. This does not 
> affect functionality, but it adds noise when building Mesos and is a style 
> issue in the CMake build.
> E.g., 
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/strings.hpp#L327
> the `separator` parameter name can be either omitted or commented out.
> another example,
> https://github.com/apache/mesos/blob/master/include/mesos/slave/isolator.hpp#L84
> the `containerId` parameter name.
> All similar issues should be fixed together.
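
For illustration, a small self-contained example of the warning and the 
suggested fix (not the actual stout or Mesos code):

{code}
#include <string>
#include <vector>

// Under -Wextra, gcc and clang emit -Wunused-parameter for the named but
// unused 'separator' argument below.
std::string joinBefore(const std::vector<std::string>& tokens,
                       const std::string& separator)
{
  return tokens.empty() ? "" : tokens.front();
}

// Commenting out (or omitting) the parameter name silences the warning
// while keeping the signature self-documenting.
std::string joinAfter(const std::vector<std::string>& tokens,
                      const std::string& /* separator */)
{
  return tokens.empty() ? "" : tokens.front();
}
{code}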



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5299) Support hierarchiy based matching of HTTP endpoint authorization requests.

2016-04-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5299:
--
Priority: Minor  (was: Major)

> Support hierarchiy based matching of HTTP endpoint authorization requests.
> --
>
> Key: MESOS-5299
> URL: https://issues.apache.org/jira/browse/MESOS-5299
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Jan Schlicht
>Priority: Minor
>  Labels: acl, authorization, mesosphere, security
>
> The current HTTP endpoint authorization (e.g. the GET_ENDPOINT_WITH_PATH 
> action) works by matching the request's object with entries in the ACL. This 
> could be loosened to support hierarchies, for example a principal trying to 
> access "/monitor/statistics" could be authorized to do so if an ACL rule 
> exists that allows this principal to access "/monitor" (and hence all 
> subpaths of it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4785) Reorganize ACL subject/object descriptions

2016-04-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4785:
--
Shepherd: Alexander Rukletsov  (was: Adam B)

cc: [~alexr], [~neilc]

> Reorganize ACL subject/object descriptions
> --
>
> Key: MESOS-4785
> URL: https://issues.apache.org/jira/browse/MESOS-4785
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Greg Mann
>Assignee: Alexander Rojas
>  Labels: documentation, mesosphere, security
> Fix For: 0.29.0
>
>
> The authorization documentation would benefit from a reorganization of the 
> ACL subject/object descriptions. Instead of simple lists of the available 
> subjects and objects, it would be nice to see a table showing which subject 
> and object is used with each action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5273:
--
Shepherd: Alexander Rukletsov
Story Points: 2  (was: 1)

cc: [~neilc]

> Document "/flags" endpoint authorization as in MESOS-4785
> -
>
> Key: MESOS-5273
> URL: https://issues.apache.org/jira/browse/MESOS-5273
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: documentation, mesosphere, security
> Fix For: 0.29.0
>
>
> MESOS-4785 reorganizes the documentation of the authorization features that 
> are available in Mesos. The authorization of the "/flags" endpoint, 
> introduced in MESOS-5142 needs to be documented in a similar way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-29 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263714#comment-15263714
 ] 

Adam B commented on MESOS-5273:
---

See Joerg's patch: https://reviews.apache.org/r/46735/

> Document "/flags" endpoint authorization as in MESOS-4785
> -
>
> Key: MESOS-5273
> URL: https://issues.apache.org/jira/browse/MESOS-5273
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: documentation, mesosphere, security
> Fix For: 0.29.0
>
>
> MESOS-4785 reorganizes the documentation of the authorization features that 
> are available in Mesos. The authorization of the "/flags" endpoint, 
> introduced in MESOS-5142 needs to be documented in a similar way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5299) Support hierarchiy based matching of HTTP endpoint authorization requests.

2016-04-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5299:
--
Labels: acl authorization mesosphere security  (was: acl authorization 
security)

> Support hierarchiy based matching of HTTP endpoint authorization requests.
> --
>
> Key: MESOS-5299
> URL: https://issues.apache.org/jira/browse/MESOS-5299
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Jan Schlicht
>  Labels: acl, authorization, mesosphere, security
>
> The current HTTP endpoint authorization (e.g. the GET_ENDPOINT_WITH_PATH 
> action) works by matching the request's object with entries in the ACL. This 
> could be loosened to support hierarchies, for example a principal trying to 
> access "/monitor/statistics" could be authorized to do so if an ACL rule 
> exists that allows this principal to access "/monitor" (and hence all 
> subpaths of it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5299) Support hierarchiy based matching of HTTP endpoint authorization requests.

2016-04-29 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5299:
--
Fix Version/s: (was: 0.29.0)

> Support hierarchiy based matching of HTTP endpoint authorization requests.
> --
>
> Key: MESOS-5299
> URL: https://issues.apache.org/jira/browse/MESOS-5299
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Jan Schlicht
>  Labels: acl, authorization, mesosphere, security
>
> The current HTTP endpoint authorization (e.g. the GET_ENDPOINT_WITH_PATH 
> action) works by matching the request's object with entries in the ACL. This 
> could be loosened to support hierarchies, for example a principal trying to 
> access "/monitor/statistics" could be authorized to do so if an ACL rule 
> exists that allows this principal to access "/monitor" (and hence all 
> subpaths of it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5309) PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP failed.

2016-04-29 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5309:
---

 Summary: PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP 
failed.
 Key: MESOS-5309
 URL: https://issues.apache.org/jira/browse/MESOS-5309
 Project: Mesos
  Issue Type: Bug
  Components: isolation
 Environment: Fedora 23 with network isolator enabled
Reporter: Gilbert Song


Here is the log:
{code}
[01:22:18] : [ RUN  ] 
PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
[01:22:18]W: I0429 01:22:18.416817 24850 port_mapping_tests.cpp:229] Using eth0 
as the public interface
[01:22:18]W: I0429 01:22:18.417135 24850 port_mapping_tests.cpp:237] Using lo 
as the loopback interface
[01:22:18]W: I0429 01:22:18.429095 24850 resources.cpp:572] Parsing resources 
as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[01:22:18]W: Trying semicolon-delimited string format instead
[01:22:18]W: I0429 01:22:18.430194 24850 port_mapping.cpp:1557] Using eth0 as 
the public interface
[01:22:18]W: I0429 01:22:18.430490 24850 port_mapping.cpp:1582] Using lo as the 
loopback interface
[01:22:18]W: I0429 01:22:18.431619 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
[01:22:18]W: I0429 01:22:18.431668 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
[01:22:18]W: I0429 01:22:18.431723 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_wmem = '4096 16384   4194304'
[01:22:18]W: I0429 01:22:18.431761 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_synack_retries = '5'
[01:22:18]W: I0429 01:22:18.431797 24850 port_mapping.cpp:1869] 
/proc/sys/net/core/rmem_max = '212992'
[01:22:18]W: I0429 01:22:18.431830 24850 port_mapping.cpp:1869] 
/proc/sys/net/core/somaxconn = '128'
[01:22:18]W: I0429 01:22:18.431864 24850 port_mapping.cpp:1869] 
/proc/sys/net/core/wmem_max = '212992'
[01:22:18]W: I0429 01:22:18.431900 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_rmem = '4096 87380   6291456'
[01:22:18]W: I0429 01:22:18.431933 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_keepalive_time = '7200'
[01:22:18]W: I0429 01:22:18.431967 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
[01:22:18]W: I0429 01:22:18.432001 24850 port_mapping.cpp:1869] 
/proc/sys/net/core/netdev_max_backlog = '1000'
[01:22:18]W: I0429 01:22:18.432036 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
[01:22:18]W: I0429 01:22:18.432070 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_keepalive_probes = '9'
[01:22:18]W: I0429 01:22:18.432101 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
[01:22:18]W: I0429 01:22:18.432134 24850 port_mapping.cpp:1869] 
/proc/sys/net/ipv4/tcp_retries2 = '15'
[01:22:18]W: F0429 01:22:18.432205 24850 port_mapping_tests.cpp:448] 
CHECK_SOME(isolator): Failed to get realpath for bind mount root 
'/var/run/netns': Not found 
[01:22:18]W: *** Check failure stack trace: ***
[01:22:18]W: @ 0x7fc8dccfc986  google::LogMessage::Fail()
[01:22:18]W: @ 0x7fc8dccfc8df  google::LogMessage::SendToLog()
[01:22:18]W: @ 0x7fc8dccfc2d5  google::LogMessage::Flush()
[01:22:18]W: @ 0x7fc8dccff146  
google::LogMessageFatal::~LogMessageFatal()
[01:22:18]W: @   0xa6031d  _CheckFatal::~_CheckFatal()
[01:22:18]W: @  0x1890d61  
mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody()
[01:22:18]W: @  0x19437a8  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[01:22:18]W: @  0x193e81a  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[01:22:18]W: @  0x191f264  testing::Test::Run()
[01:22:18]W: @  0x191fa1c  testing::TestInfo::Run()
[01:22:18]W: @  0x192006d  testing::TestCase::Run()
[01:22:18]W: @  0x1926bab  
testing::internal::UnitTestImpl::RunAllTests()
[01:22:18]W: @  0x194446f  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[01:22:18]W: @  0x193f35a  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[01:22:18]W: @  0x1925887  testing::UnitTest::Run()
[01:22:18]W: @   0xf9131d  RUN_ALL_TESTS()
[01:22:18]W: @   0xf90f15  main
[01:22:18]W: @ 0x7fc8d68d8580  __libc_start_main
[01:22:18]W: @   0xa5e919  _start
[01:22:19]W: /mnt/teamcity/temp/agentTmp/custom_script1282998915150293546: line 
3: 24850 Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh 
--verbose --gtest_filter="$GTEST_FILTER"
[01:22:19]W: Process exited with code 134
[01:22:19]i: ##teamcity[buildStatisticValue 
key='buildStageDuration:buildStepRUNNER_299' value='815124.0']
[01:22:19]E: Step Run tests (Command Line) failed
[01:22:19]i: ##teamcity[buildStatisticValue 

[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-29 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263621#comment-15263621
 ] 

Gilbert Song commented on MESOS-5308:
-

cc [~jpe...@apache.org], you may have more context on this. Would you mind 
taking a look?

cc [~xujyan], [~dlm]

> ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.
> ---
>
> Key: MESOS-5308
> URL: https://issues.apache.org/jira/browse/MESOS-5308
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: Fedora 23 with/without SSL
>Reporter: Gilbert Song
>  Labels: isolation
>
> Here is the log:
> {code}
> [01:07:51] :   [Step 10/10] [ RUN  ] 
> ROOT_XFS_QuotaTest.NoCheckpointRecovery
> [01:07:51] :   [Step 10/10] meta-data=/dev/loop0 isize=512
> agcount=2, agsize=5120 blks
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> attr=2, projid32bit=1
> [01:07:51] :   [Step 10/10]  =   crc=1
> finobt=1, sparse=0
> [01:07:51] :   [Step 10/10] data =   bsize=4096   
> blocks=10240, imaxpct=25
> [01:07:51] :   [Step 10/10]  =   sunit=0  
> swidth=0 blks
> [01:07:51] :   [Step 10/10] naming   =version 2  bsize=4096   
> ascii-ci=0 ftype=1
> [01:07:51] :   [Step 10/10] log  =internal log   bsize=4096   
> blocks=855, version=2
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> sunit=0 blks, lazy-count=1
> [01:07:51] :   [Step 10/10] realtime =none   extsz=4096   
> blocks=0, rtextents=0
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.690585 17604 cluster.cpp:149] 
> Creating default 'local' authorizer
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.706126 17604 leveldb.cpp:174] 
> Opened db in 15.452988ms
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707135 17604 leveldb.cpp:181] 
> Compacted db in 984939ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707154 17604 leveldb.cpp:196] 
> Created db iterator in 4159ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707159 17604 leveldb.cpp:202] 
> Seeked to beginning of db in 517ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707165 17604 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 305ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707176 17604 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707320 17621 recover.cpp:447] 
> Starting replica recovery
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707381 17621 recover.cpp:473] 
> Replica is in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707638 17619 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (17889)@172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707732 17624 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707885 17624 recover.cpp:564] 
> Updating replica status to STARTING
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708389 17618 master.cpp:382] 
> Master 0c1e0a50-1212-4104-a148-661131b79f27 
> (ip-172-30-2-13.ec2.internal.mesosphere.io) started on 172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708406 17618 master.cpp:384] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/master"
>  --zk_session_timeout="10secs"
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708510 17618 master.cpp:433] 
> Master only allowing authenticated frameworks to register
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708518 17618 master.cpp:439] 
> Master only allowing authenticated agents 

[jira] [Issue Comment Deleted] (MESOS-5185) Accessibility for Mesos Web UI

2016-04-29 Thread Chen Nan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Nan Li updated MESOS-5185:
---
Comment: was deleted

(was: we can use VoiceOver or another screen reader with manual testing to 
verify fixes)

> Accessibility for Mesos Web UI
> --
>
> Key: MESOS-5185
> URL: https://issues.apache.org/jira/browse/MESOS-5185
> Project: Mesos
>  Issue Type: Epic
>  Components: webui
>Reporter: haosdent
>Assignee: Chen Nan Li
>Priority: Minor
>
> Currently, the Mesos Web UI does not support accessibility features for
> disabled people.
> For example:
> The Web UI could support screen readers so that blind users can have page
> content read to them.
> We can fix such issues by making the Mesos Web UI pages support the
> [WAI-ARIA standard|https://www.w3.org/WAI/intro/aria].
> We could update the webui according to the [Accessibility Design Guidelines 
> for the Web|https://msdn.microsoft.com/en-us/library/aa291312(v=vs.71).aspx] 
> and https://www.w3.org/standards/webdesign/accessibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5201) Accessibility Enhancement For Page HTML page

2016-04-29 Thread Chen Nan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Nan Li updated MESOS-5201:
---
Summary: Accessibility Enhancement For Page HTML page  (was: Accessibility 
Enhancement For Page "Mesos")

> Accessibility Enhancement For Page HTML page
> 
>
> Key: MESOS-5201
> URL: https://issues.apache.org/jira/browse/MESOS-5201
> Project: Mesos
>  Issue Type: Task
>Reporter: Chen Nan Li
>Assignee: Chen Nan Li
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)