[jira] [Updated] (MESOS-7701) Support encoded credentials file

2017-06-24 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-7701:

Labels: security  (was: )

> Support encoded credentials file
> 
>
> Key: MESOS-7701
> URL: https://issues.apache.org/jira/browse/MESOS-7701
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>  Labels: security
>
> The Mesos credentials file currently only supports passwords in plain text / 
> JSON files. 
> To support enterprises that have a no-clear-text-passwords policy, credentials 
> file loading should support reading an encoded (e.g. Base64) 
> format.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7701) Support encoded credentials file

2017-06-21 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-7701:
---

 Summary: Support encoded credentials file
 Key: MESOS-7701
 URL: https://issues.apache.org/jira/browse/MESOS-7701
 Project: Mesos
  Issue Type: Improvement
Reporter: Timothy Chen


The Mesos credentials file currently only supports passwords in plain text / 
JSON files. 
To support enterprises that have a no-clear-text-passwords policy, credentials 
file loading should support reading an encoded (e.g. Base64) 
format.
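
A minimal sketch of what such loading could look like, assuming stout's 
os::read, strings::trim, and base64::decode helpers; the format flag is a 
made-up parameter, not an existing Mesos option:

{code}
#include <stout/base64.hpp>
#include <stout/error.hpp>
#include <stout/os/read.hpp>
#include <stout/strings.hpp>
#include <stout/try.hpp>

// Hypothetical helper: read the credentials file and, when the (made-up)
// base64Encoded flag is set, decode it before the usual JSON parsing.
Try<std::string> readCredentials(const std::string& path, bool base64Encoded)
{
  Try<std::string> raw = os::read(path);
  if (raw.isError()) {
    return Error("Failed to read credentials file: " + raw.error());
  }

  if (!base64Encoded) {
    return raw.get();
  }

  // Trim trailing whitespace/newlines before decoding.
  return base64::decode(strings::trim(raw.get()));
}
{code}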



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6458) Add test to check fromString function of stout library

2016-10-27 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-6458:

Shepherd: Timothy Chen

> Add test to check fromString function of stout library
> --
>
> Key: MESOS-6458
> URL: https://issues.apache.org/jira/browse/MESOS-6458
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 1.0.1
>Reporter: Manuwela Kanade
>Assignee: Manuwela Kanade
>Priority: Trivial
>
> For the 3rdparty stout library, there is a test case for checking a malformed 
> UUID. 
> But there is no positive test for the fromString function verifying that it 
> returns the correct UUID when passed a correctly formatted UUID 
> string.
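
A sketch of such a positive test, assuming stout's UUID::fromString returns a 
Try<UUID>:

{code}
#include <gtest/gtest.h>

#include <stout/gtest.hpp>
#include <stout/try.hpp>
#include <stout/uuid.hpp>

// Round-trip a randomly generated UUID through toString/fromString and
// check that parsing succeeds and yields the same value.
TEST(UUIDTest, FromStringRoundTrip)
{
  UUID uuid = UUID::random();

  Try<UUID> parsed = UUID::fromString(uuid.toString());

  ASSERT_SOME(parsed);
  EXPECT_EQ(uuid, parsed.get());
}
{code}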



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2016-05-13 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282470#comment-15282470
 ] 

Timothy Chen commented on MESOS-3435:
-

Sounds like the hyper folks are improving their APIs for easier integration 
with Mesos. [~haosd...@gmail.com], is the containerization module merged now?

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> Hyper is as secure as a hypervisor and as fast and easy to use as Docker: 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2016-03-31 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220889#comment-15220889
 ] 

Timothy Chen commented on MESOS-2706:
-

Btw, we already changed stats reading to read from cgroups instead of 
perf/stats, so this should ideally be fixed now. [~kairu1987], can you help 
verify?
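
For illustration, reading usage straight from the container's cgroup files 
avoids round-trips through the docker daemon; the path below is an assumption, 
not the agent's actual code:

{code}
#include <fstream>
#include <optional>
#include <string>

// Read a single numeric value from a cgroup control file, e.g.
// /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes.
std::optional<long long> readCgroupValue(const std::string& path)
{
  std::ifstream in(path);
  long long value = 0;
  if (in >> value) {
    return value;
  }
  return std::nullopt;  // missing file or unparsable content
}
{code}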

> When the docker-tasks grow, the time spare between Queuing task and Starting 
> container grows
> 
>
> Key: MESOS-2706
> URL: https://issues.apache.org/jira/browse/MESOS-2706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.22.0
> Environment: My Environment info:
> Mesos 0.22.0 & Marathon 0.82-RC1 both running in one host-server.
> Every docker task requires 0.02 CPU and 128MB, and the server has 8 CPUs and 
> 24G memory.
> So Mesos can launch thousands of tasks in theory.
> The docker task is a very lightweight sshd service.
>Reporter: chenqiuhao
>
> At the beginning, Marathon launched docker tasks very fast, but when the 
> number of tasks on the single mesos-slave host reached 50, Marathon seemed 
> to launch docker tasks slowly.
> So I checked the mesos-slave log, and I found that the gap between 
> Queuing task and Starting container grew.
> For example, 
> launching the 1st docker task takes about 0.008s:
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
> task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
> dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
> 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> Launching the 50th docker task takes about 4.9s:
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
> dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
> '20150202-112355-2684495626-5050-26153-
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
> '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
> '20150202-112355-2684495626-5050-26153-'
> And when I launched the 100th docker task, it took about 13s!
> I did the same test on a host with 24 CPUs and 256G memory, and got the 
> same result.
> Has anybody had the same experience, or can someone help run the same load 
> test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2369) Segfault when mesos-slave tries to clean up docker containers on startup

2016-03-31 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220887#comment-15220887
 ] 

Timothy Chen commented on MESOS-2369:
-

This should already be fixed now; please re-open if it occurs again.

> Segfault when mesos-slave tries to clean up docker containers on startup
> 
>
> Key: MESOS-2369
> URL: https://issues.apache.org/jira/browse/MESOS-2369
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.21.1
> Environment: Debian Jessie, mesos package 0.21.1-1.2.debian77 
> docker 1.3.2 build 39fa2fa
>Reporter: Pas
>
> I did a gdb backtrace; it seems like a stack overflow due to a bit too much 
> recursion.
> The interesting aspect is that after running mesos-slave with strace -f -b 
> execve it successfully proceeded with the docker cleanup. However, there were 
> a few strace sessions (on other slaves) where I was able to observe the 
> SIGSEGV, and it was around (or a bit before) the "docker ps -a" call, because 
> docker got a broken pipe shortly after, then got killed by the propagating 
> SIGSEGV signal.
> {code}
> 
> #59296 0x76e7cd98 in process::Future 
> process::Future long>::then(std::tr1::function (unsigned long const&)> const&) const () from 
> /usr/local/lib/libmesos-0.21.1.so
> #59297 0x76e4f5d3 in process::io::internal::_read(int, 
> std::tr1::shared_ptr const&, boost::shared_array const&, 
> unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59298 0x76e5012c in process::io::internal::__read(unsigned long, 
> int, std::tr1::shared_ptr const&, boost::shared_array 
> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59299 0x76e53000 in 
> std::tr1::_Function_handler const&), std::tr1::_Bind (*(std::tr1::_Placeholder<1>, int, std::tr1::shared_ptr, 
> boost::shared_array, unsigned long))(unsigned long, int, 
> std::tr1::shared_ptr const&, boost::shared_array const&, 
> unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long 
> const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59300 0x76e7d23b in void process::internal::thenf std::string>(std::tr1::shared_ptr const&, 
> std::tr1::function 
> const&, process::Future const&) ()
>from /usr/local/lib/libmesos-0.21.1.so
> #59301 0x7689ee60 in process::Future long>::onAny(std::tr1::function const&)> 
> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59302 0x76e7cd98 in process::Future 
> process::Future long>::then(std::tr1::function (unsigned long const&)> const&) const () from 
> /usr/local/lib/libmesos-0.21.1.so
> #59303 0x76e4f5d3 in process::io::internal::_read(int, 
> std::tr1::shared_ptr const&, boost::shared_array const&, 
> unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59304 0x76e5012c in process::io::internal::__read(unsigned long, 
> int, std::tr1::shared_ptr const&, boost::shared_array 
> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59305 0x76e53000 in 
> std::tr1::_Function_handler const&), std::tr1::_Bind (*(std::tr1::_Placeholder<1>, int, std::tr1::shared_ptr, 
> boost::shared_array, unsigned long))(unsigned long, int, 
> std::tr1::shared_ptr const&, boost::shared_array const&, 
> unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long 
> const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59306 0x76e7d23b in void process::internal::thenf std::string>(std::tr1::shared_ptr const&, 
> std::tr1::function 
> const&, process::Future const&) ()
>from /usr/local/lib/libmesos-0.21.1.so
> #59307 0x7689ee60 in process::Future long>::onAny(std::tr1::function const&)> 
> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59308 0x76e7cd98 in process::Future 
> process::Future long>::then(std::tr1::function (unsigned long const&)> const&) const () from 
> /usr/local/lib/libmesos-0.21.1.so
> #59309 0x76e4f5d3 in process::io::internal::_read(int, 
> std::tr1::shared_ptr const&, boost::shared_array const&, 
> unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59310 0x76e5012c in process::io::internal::__read(unsigned long, 
> int, std::tr1::shared_ptr const&, boost::shared_array 
> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59311 0x76e53000 in 
> std::tr1::_Function_handler const&), std::tr1::_Bind (*(std::tr1::_Placeholder<1>, int, std::tr1::shared_ptr, 
> 

[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-10 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190229#comment-15190229
 ] 

Timothy Chen commented on MESOS-4370:
-

commit c65d06791f0f3651f5cd1f1ad9f4955f03677795
Author: Timothy Chen 
Date:   Mon Mar 7 23:40:55 2016 -0800

Fixed parsing network ip address with docker.

Review: https://reviews.apache.org/r/44531

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1, Docker 1.10.x
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>  Labels: Blocker
> Fix For: 0.28.0
>
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result, the Mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.
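
The shape of the needed fallback, sketched with illustrative types rather than 
the actual Docker inspect output or Mesos structures:

{code}
#include <map>
#include <optional>
#include <string>

// Illustrative structures, not the real Docker inspect types.
struct NetworkSettings
{
  std::string ipAddress;                       // deprecated since API v1.21
  std::map<std::string, std::string> networks; // network name -> IP address
};

// Prefer an address from NetworkSettings.Networks; only fall back to the
// deprecated top-level NetworkSettings.IPAddress.
std::optional<std::string> containerIP(const NetworkSettings& settings)
{
  if (!settings.networks.empty()) {
    return settings.networks.begin()->second; // first attached network
  }
  if (!settings.ipAddress.empty()) {
    return settings.ipAddress;                // legacy field
  }
  return std::nullopt;
}
{code}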



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-10 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-4370:
---

Assignee: Timothy Chen  (was: Travis Hegner)

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1, Docker 1.10.x
>Reporter: Clint Armstrong
>Assignee: Timothy Chen
>  Labels: Blocker
> Fix For: 0.28.0
>
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result, the Mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-08 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4370:

Fix Version/s: 0.28.0

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1, Docker 1.10.x
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>  Labels: Blocker
> Fix For: 0.28.0
>
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result, the Mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-08 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4370:

Shepherd: Timothy Chen  (was: Kapil Arya)

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1, Docker 1.10.x
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>  Labels: Blocker
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result, the Mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-03-05 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181914#comment-15181914
 ] 

Timothy Chen commented on MESOS-4869:
-

Btw, what's your slave's memory usage?

> /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
> ---
>
> Key: MESOS-4869
> URL: https://issues.apache.org/jira/browse/MESOS-4869
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.1
>Reporter: Anthony Scalisi
>Priority: Critical
>
> We switched our health checks in Marathon from HTTP to COMMAND:
> {noformat}
> "healthChecks": [
> {
>   "protocol": "COMMAND",
>   "path": "/ops/ping",
>   "command": { "value": "curl --silent -f -X GET 
> http://$HOST:$PORT0/ops/ping > /dev/null" },
>   "gracePeriodSeconds": 90,
>   "intervalSeconds": 2,
>   "portIndex": 0,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 3
> }
>   ]
> {noformat}
> All our applications have the same health check (and /ops/ping endpoint).
> Even though we have the issue on all our Mesos slaves, I'm going to focus on a 
> particular one: *mesos-slave-i-e3a9c724*.
> The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:
> !https://i.imgur.com/gbRf804.png!
> Here is a *docker ps* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker ps
> CONTAINER IDIMAGE   COMMAND  CREATED  
>STATUS  PORTS NAMES
> 4f7c0aa8d03ajava:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago  
>Up 6 hours  0.0.0.0:31926->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
> 66f2fc8f8056java:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago  
>Up 6 hours  0.0.0.0:31939->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
> f7382f241fcejava:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago  
>Up 6 hours  0.0.0.0:31656->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
> 880934c0049ejava:8  "/bin/sh -c 'JAVA_OPT"   24 hours ago 
>Up 24 hours 0.0.0.0:31371->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
> 5eab1f8dac4ajava:8  "/bin/sh -c 'JAVA_OPT"   46 hours ago 
>Up 46 hours 0.0.0.0:31500->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
> b63740fe56e7java:8  "/bin/sh -c 'JAVA_OPT"   46 hours ago 
>Up 46 hours 0.0.0.0:31382->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
> 5c7a9ea77b0ejava:8  "/bin/sh -c 'JAVA_OPT"   2 days ago   
>Up 2 days   0.0.0.0:31186->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
> 53065e7a31adjava:8  "/bin/sh -c 'JAVA_OPT"   2 days ago   
>Up 2 days   0.0.0.0:31839->8080/tcp   
> mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
> {noformat}
> Here is a *docker stats* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724  # docker stats
> CONTAINER   CPU %   MEM USAGE / LIMIT MEM %   
> NET I/O   BLOCK I/O
> 4f7c0aa8d03a2.93%   797.3 MB / 1.611 GB   49.50%  
> 1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
> 53065e7a31ad8.30%   738.9 MB / 1.611 GB   45.88%  
> 419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
> 5c7a9ea77b0e4.91%   1.081 GB / 1.611 GB   67.10%  
> 423 MB / 526.5 MB 3.219 MB / 61.44 kB
> 5eab1f8dac4a3.13%   1.007 GB / 1.611 GB   62.53%  
> 2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
> 66f2fc8f80563.15%   768.1 MB / 1.611 GB   47.69%  
> 258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
> 880934c0049e10.07%  735.1 MB / 1.611 GB   45.64%  
> 1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
> b63740fe56e712.04%  629 MB / 1.611 GB 39.06%  
> 10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
> f7382f241fce6.21%   505 MB / 1.611 GB 31.36%  
> 153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
> {noformat}
> Not much else is running on the slave, yet the used memory doesn't map to the 
> tasks' memory:
> {noformat}
> Mem:16047M used:13340M buffers:1139M cache:776M
> {noformat}
> If I exec into the container (*java:8* image), I can correctly see the shell 
> calls executing the curl 

[jira] [Commented] (MESOS-4862) Setting failover_timeout in FrameworkInfo to Double.MAX_VALUE causes it to be set to zero

2016-03-03 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179057#comment-15179057
 ] 

Timothy Chen commented on MESOS-4862:
-

I think we should make the framework failover timeout an int64 instead of a 
double. I don't think we really need that much precision for a failover 
timeout, and the cost of using a double causes problems like this.

> Setting failover_timeout in FrameworkInfo to Double.MAX_VALUE causes it to be 
> set to zero
> -
>
> Key: MESOS-4862
> URL: https://issues.apache.org/jira/browse/MESOS-4862
> Project: Mesos
>  Issue Type: Bug
>  Components: master, stout
>Reporter: Timothy Chen
>
> Currently we expose the framework failover_timeout as a double in the 
> protobuf, and if users set the failover_timeout to Double.MAX_VALUE, the 
> Master will actually set it to zero, which is the complete opposite of the 
> original intent.
> The problem is that stout/duration.hpp only stores down to nanoseconds in an 
> int64_t, and it gives an error when we pass double.max, as it goes out of 
> int64_t bounds.
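
The failure mode, sketched outside of stout: converting seconds (a double) to 
int64_t nanoseconds overflows long before Double.MAX_VALUE, so a guarded 
conversion should reject such values instead of silently wrapping:

{code}
#include <cstdint>
#include <limits>
#include <optional>

// Convert seconds to whole nanoseconds, rejecting values that do not fit
// into int64_t (Duration's underlying storage) instead of wrapping.
std::optional<int64_t> secondsToNanos(double seconds)
{
  const double nanos = seconds * 1e9;
  if (nanos >= static_cast<double>(std::numeric_limits<int64_t>::max()) ||
      nanos <= static_cast<double>(std::numeric_limits<int64_t>::min())) {
    return std::nullopt; // e.g. Double.MAX_VALUE lands here
  }
  return static_cast<int64_t>(nanos);
}
{code}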



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4862) Setting failover_timeout in FrameworkInfo to Double.MAX_VALUE causes it to be set to zero

2016-03-03 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4862:
---

 Summary: Setting failover_timeout in FrameworkInfo to 
Double.MAX_VALUE causes it to be set to zero
 Key: MESOS-4862
 URL: https://issues.apache.org/jira/browse/MESOS-4862
 Project: Mesos
  Issue Type: Bug
  Components: master, stout
Reporter: Timothy Chen


Currently we expose the framework failover_timeout as a double in the protobuf, 
and if users set the failover_timeout to Double.MAX_VALUE, the Master will 
actually set it to zero, which is the complete opposite of the original intent.

The problem is that stout/duration.hpp only stores down to nanoseconds in an 
int64_t, and it gives an error when we pass double.max, as it goes out of 
int64_t bounds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-314) Support the cgroups 'cpusets' subsystem.

2016-02-19 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154830#comment-15154830
 ] 

Timothy Chen commented on MESOS-314:


Not sure how it's going to be advertised yet, but I imagine a scheduler can 
just pick one if users don't have a preference.

> Support the cgroups 'cpusets' subsystem.
> 
>
> Key: MESOS-314
> URL: https://issues.apache.org/jira/browse/MESOS-314
> Project: Mesos
>  Issue Type: Story
>Reporter: Benjamin Mahler
>  Labels: twitter
>
> We'd like to add support for the cpusets subsystem, in order to support 
> pinning to cpus.
> This has several potential benefits:
> 1. Improved isolation against other tenants, when given exclusive access to 
> cores.
> 2. Improved performance, if pinned to several cores with good locality in the 
> CPU topology.
> 3. An alternative / complement to CFS for applying an upper limit on CPU 
> usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2016-02-18 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153205#comment-15153205
 ] 

Timothy Chen commented on MESOS-3413:
-

commit 541b3d963cccf07e979ce5362cbb6ace0144f31a
Author: Timothy Chen 
Date:   Fri Jan 29 18:09:52 2016 -0500

Fixed persistent volumes with docker tasks.

Review: https://reviews.apache.org/r/43015

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>Assignee: Timothy Chen
>  Labels: docker, mesosphere, persistent-volumes
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly all of it is working, but I am missing a crucial piece at the end: I 
> have successfully created a persistent disk resource and have set the 
> persistence and volume information in the DiskInfo message. However, I do not 
> see any way to find out what directory on the host the mesos slave has 
> reserved for us. I know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, nor symlink 
> it into our sandbox. Therefore, I essentially have no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path of the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.
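
What the reporter is asking for, as a sketch; the function and its parameters 
are made up for illustration, not the containerizer's actual code:

{code}
#include <unistd.h>

#include <string>

// Create sandboxPath/containerPath as a symlink to the host directory
// backing the persistent volume (mirroring what the mesos containerizer
// appears to do), so the framework can reach the volume from its sandbox.
bool linkVolumeIntoSandbox(
    const std::string& hostVolumePath,
    const std::string& sandboxPath,
    const std::string& containerPath)
{
  const std::string link = sandboxPath + "/" + containerPath;
  return ::symlink(hostVolumePath.c_str(), link.c_str()) == 0;
}
{code}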



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4285) Mesos command task doesn't support volumes with image

2016-02-08 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137288#comment-15137288
 ] 

Timothy Chen commented on MESOS-4285:
-

commit 4c9e3f419d9e74dce9a84a8f6f140dd4631bf0c0
Author: Timothy Chen 
Date:   Sun Feb 7 17:42:37 2016 +0800

Fixed volume paths for command tasks with image.

Review: https://reviews.apache.org/r/42278/


> Mesos command task doesn't support volumes with image
> -
>
> Key: MESOS-4285
> URL: https://issues.apache.org/jira/browse/MESOS-4285
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere, unified-containerizer-mvp
>
> Currently, volumes are stripped when an image is specified for a command 
> task run with the Mesos containerizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4478) ReviewBot seemed to be crashing ReviewBoard server when posting large reviews

2016-01-26 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4478:

Fix Version/s: (was: 0.27.0)

> ReviewBot seemed to be crashing ReviewBoard server when posting large reviews
> -
>
> Key: MESOS-4478
> URL: https://issues.apache.org/jira/browse/MESOS-4478
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> The bot is currently tripping on this review: 
> https://reviews.apache.org/r/42506/ (see builds #10973 to #10978).
> [~jfarrell] looked at the server logs and said he saw a 'MySQL going away' 
> message when the mesos bot was making these requests. I think that error is a 
> bit misleading because it happens only for this review (which has a huge 
> error log due to a bad patch). The bot has successfully posted reviews for 
> other review requests which had no error log (good patch).
> One way to fix this would be to just post a tail of the error log (and 
> perhaps link to Jenkins Console or some other service for the longer error 
> text).
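
A sketch of the proposed mitigation: keep only the last n lines of the error 
log before posting it to ReviewBoard:

{code}
#include <deque>
#include <sstream>
#include <string>

// Return only the last `n` lines of `text`, so a huge error log does not
// get posted in full.
std::string tailLines(const std::string& text, size_t n)
{
  std::deque<std::string> kept;
  std::istringstream in(text);
  std::string line;

  while (std::getline(in, line)) {
    kept.push_back(line);
    if (kept.size() > n) {
      kept.pop_front(); // drop the oldest line
    }
  }

  std::string out;
  for (const std::string& l : kept) {
    out += l + "\n";
  }
  return out;
}
{code}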



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4328) Docker container REST API /monitor/statistics.json output have no timestamp field

2016-01-25 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4328:

Fix Version/s: 0.27.0

> Docker container REST API /monitor/statistics.json output have no timestamp 
> field 
> --
>
> Key: MESOS-4328
> URL: https://issues.apache.org/jira/browse/MESOS-4328
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0
> Environment: Linux 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 
> 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: AndyPang
>Assignee: AndyPang
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: docker.patch
>
>
> In Mesos 0.25.0, if the slave starts with --containerizers=docker, the REST 
> API "/monitor/statistics.json" output for a docker container has no 
> timestamp field, while with the mesos containerizer the output does have the 
> timestamp field.
> So with the docker containerizer we cannot calculate CPU utilization based 
> on the timestamp; furthermore, "timestamp" is a required field in the 
> ResourceStatistics message.
> {code:title=statistics.json|borderStyle=solid}
> [
> {
> "executor_id": "sleep.ecf0e700-b8da-11e5-95db-0242872c438f",
> "executor_name": "Command Executor (Task: 
> sleep.ecf0e700-b8da-11e5-95db-0242872c438f) (Command: sh -c 'sleep 3')",
> "framework_id": "cdb28c37-14c6-4877-a591-4eabbc6d84f2-",
> "source": "sleep.ecf0e700-b8da-11e5-95db-0242872c438f",
> "statistics": {
> "cpus_limit": 1.1,
> "cpus_system_time_secs": 0,
> "cpus_user_time_secs": 0.02,
> "mem_limit_bytes": 50331648,
> "mem_rss_bytes": 200704
> }
> }
> ]
> {code}
> Bug fix: as the mesos containerizer does, set the timestamp value in 
> docker.cpp's cgroupsStatistics function (see the attached patch); the result 
> is as follows:
> {code:title=statistics.json|borderStyle=solid}
> [
> {
> "executor_id": "sleep.15dd3644-b902-11e5-ac40-0242872c438f",
> "executor_name": "Command Executor (Task: 
> sleep.15dd3644-b902-11e5-ac40-0242872c438f) (Command: sh -c 'sleep 3')",
> "framework_id": "cdb28c37-14c6-4877-a591-4eabbc6d84f2-",
> "source": "sleep.15dd3644-b902-11e5-ac40-0242872c438f",
> "statistics": {
> "cpus_limit": 1.1,
> "cpus_system_time_secs": 0,
> "cpus_user_time_secs": 0.02,
> "mem_limit_bytes": 50331648,
> "mem_rss_bytes": 192512,
> "timestamp": 1452585472.6926
> }
> }
> ]
> {code}
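
The gist of the fix, sketched rather than quoted from the patch: populate the 
required field from the libprocess clock, as the mesos containerizer does:

{code}
#include <mesos/mesos.pb.h>

#include <process/clock.hpp>

// Stamp the statistics in seconds since the epoch, matching what the
// mesos containerizer reports.
void stampStatistics(mesos::ResourceStatistics* statistics)
{
  statistics->set_timestamp(process::Clock::now().secs());
}
{code}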



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4328) Docker container REST API /monitor/statistics.json output have no timestamp field

2016-01-25 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4328:

Shepherd: Timothy Chen

> Docker container REST API /monitor/statistics.json output have no timestamp 
> field 
> --
>
> Key: MESOS-4328
> URL: https://issues.apache.org/jira/browse/MESOS-4328
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0
> Environment: Linux 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 
> 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: AndyPang
>Assignee: AndyPang
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: docker.patch
>
>
> In Mesos 0.25.0, if the slave starts with --containerizers=docker, the REST 
> API "/monitor/statistics.json" output for a docker container has no 
> timestamp field, while with the mesos containerizer the output does have the 
> timestamp field.
> So with the docker containerizer we cannot calculate CPU utilization based 
> on the timestamp; furthermore, "timestamp" is a required field in the 
> ResourceStatistics message.
> {code:title=statistics.json|borderStyle=solid}
> [
> {
> "executor_id": "sleep.ecf0e700-b8da-11e5-95db-0242872c438f",
> "executor_name": "Command Executor (Task: 
> sleep.ecf0e700-b8da-11e5-95db-0242872c438f) (Command: sh -c 'sleep 3')",
> "framework_id": "cdb28c37-14c6-4877-a591-4eabbc6d84f2-",
> "source": "sleep.ecf0e700-b8da-11e5-95db-0242872c438f",
> "statistics": {
> "cpus_limit": 1.1,
> "cpus_system_time_secs": 0,
> "cpus_user_time_secs": 0.02,
> "mem_limit_bytes": 50331648,
> "mem_rss_bytes": 200704
> }
> }
> ]
> {code}
> Bug fix: as the mesos containerizer does, set the timestamp value in 
> docker.cpp's cgroupsStatistics function (see the attached patch); the result 
> is as follows:
> {code:title=statistics.json|borderStyle=solid}
> [
> {
> "executor_id": "sleep.15dd3644-b902-11e5-ac40-0242872c438f",
> "executor_name": "Command Executor (Task: 
> sleep.15dd3644-b902-11e5-ac40-0242872c438f) (Command: sh -c 'sleep 3')",
> "framework_id": "cdb28c37-14c6-4877-a591-4eabbc6d84f2-",
> "source": "sleep.15dd3644-b902-11e5-ac40-0242872c438f",
> "statistics": {
> "cpus_limit": 1.1,
> "cpus_system_time_secs": 0,
> "cpus_user_time_secs": 0.02,
> "mem_limit_bytes": 50331648,
> "mem_rss_bytes": 192512,
> "timestamp": 1452585472.6926
> }
> }
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4435) Update `Master::Http::stateSummary` to use `jsonify`.

2016-01-22 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4435:

Target Version/s: 0.28.0  (was: 0.27.0)

> Update `Master::Http::stateSummary` to use `jsonify`.
> -
>
> Key: MESOS-4435
> URL: https://issues.apache.org/jira/browse/MESOS-4435
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>
> Update {{state-summary}} to use {{jsonify}}, to stay consistent with the 
> {{state}} HTTP endpoint.
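
For context, a minimal example of the jsonify style (assuming stout's jsonify 
API; the Summary type is made up), where a free json() function streams fields 
instead of materializing an intermediate JSON::Object:

{code}
#include <string>

#include <stout/jsonify.hpp>

struct Summary
{
  std::string cluster;
  size_t activatedSlaves;
};

// jsonify finds this free function and streams the fields directly.
void json(JSON::ObjectWriter* writer, const Summary& summary)
{
  writer->field("cluster", summary.cluster);
  writer->field("activated_slaves", summary.activatedSlaves);
}

// Usage: std::string body = jsonify(summary);
{code}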



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-20 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-3379:
---

Assignee: Timothy Chen  (was: haosdent)

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: Timothy Chen
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> Just a problem found when investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.
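
The fix, sketched standalone rather than as the isolator's actual code: iterate 
the mount-table entries in reverse so nested mounts are unmounted before their 
parents:

{code}
#include <sys/mount.h>

#include <string>
#include <vector>

// Mount-table entries come in mount order, so a nested mount (/tmp/b/c)
// appears after its parent (/tmp/b). Unmount in reverse: children first.
bool cleanupMounts(const std::vector<std::string>& mountsInMountOrder)
{
  bool ok = true;
  for (auto it = mountsInMountOrder.rbegin();
       it != mountsInMountOrder.rend();
       ++it) {
    if (::umount(it->c_str()) != 0) {
      ok = false; // keep going so remaining entries are still attempted
    }
  }
  return ok;
}
{code}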



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-20 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109989#comment-15109989
 ] 

Timothy Chen commented on MESOS-3379:
-

commit 10f8052c763fcb1fe796e66f985271808f903ffe
Author: Timothy Chen 
Date:   Wed Jan 20 19:23:05 2016 -0800

Fixed unmount order in linux filesystem isolator cleanup.

Review: https://reviews.apache.org/r/42389/

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> Just a problem found when investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4410) Introduce protobuf for quota set request.

2016-01-19 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4410:

Priority: Blocker  (was: Major)

> Introduce protobuf for quota set request.
> -
>
> Key: MESOS-4410
> URL: https://issues.apache.org/jira/browse/MESOS-4410
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> To document the quota request JSON schema and simplify request processing, 
> introduce a {{QuotaRequest}} protobuf wrapper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4349) GMock warning in SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor

2016-01-19 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108131#comment-15108131
 ] 

Timothy Chen commented on MESOS-4349:
-

commit d31f9152a7250583c51f2e0568aa0b5a09cc88e9
Author: Neil Conway 
Date:   Tue Jan 19 22:27:14 2016 -0800

Fixed more tests that didn't set a shutdown expect for MockExecutor.

Specifically, the following tests:

MasterTest.OfferNotRescindedOnceUsed
OversubscriptionTest.FetchResourceUsageFromMonitor
OversubscriptionTest.QoSFetchResourceUsageFromMonitor
SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor

Review: https://reviews.apache.org/r/42265/

> GMock warning in SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor
> 
>
> Key: MESOS-4349
> URL: https://issues.apache.org/jira/browse/MESOS-4349
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, tests
> Fix For: 0.27.0
>
>
> {noformat}
> [ RUN  ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7fe189cae850)
> Stack trace:
> [   OK ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor (51 ms)
> {noformat}
> Occurs non-deterministically for me on OSX 10.10, perhaps one run in ten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4391) docker pull a remote image conflict

2016-01-16 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103064#comment-15103064
 ] 

Timothy Chen commented on MESOS-4391:
-

Concurrent pulling of the same image is handled by the docker daemon and doesn't 
require any special synchronization on our side. We only need to handle cases 
that either the docker daemon doesn't handle, or a certain load / pattern that 
will break it. AFAIK it's just a warning that shows up on the daemon side, and 
from the client side everything still works.

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I ran a docker app with 3 tasks, and the docker image did not exist on the 
> slave, so it had to be pulled from docker.io.
> Marathon assigned 2 of the tasks to run on one slave, and the last on another.
> Checking the log via journalctl shows entries like: level=error msg="HTTP 
> Error" err="No such image: solr:latest" statusCode=404.
> There are two processes pulling the image:
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-16 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103069#comment-15103069
 ] 

Timothy Chen commented on MESOS-3379:
-

https://reviews.apache.org/r/42389/

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> Just a problem found when investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3349) Removing mount point fails with EBUSY in LinuxFilesystemIsolator.

2016-01-16 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3349:

Fix Version/s: 0.27.0

> Removing mount point fails with EBUSY in LinuxFilesystemIsolator.
> -
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: Jie Yu
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> When running the tests as root, we found 
> PersistentVolumeTest.AccessPersistentVolume fails consistently on some 
> platforms.
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> Turns out that the 'rmdir' after the 'umount' fails with EBUSY because 
> there are still some references to the mount.
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-16 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3379:

Fix Version/s: 0.27.0

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> Just a problem found when investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4302) Offer filter timeouts are ignored if the allocator is slow or backlogged.

2016-01-15 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4302:

Priority: Blocker  (was: Critical)

> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a 
> filter timeout based on the time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator 
> to perform an allocation for the relevant agent, then the filter is never 
> applied.
> This leads to pathological behavior: if the framework sets a filter duration 
> that is smaller than the wall clock time it takes for us to perform the next 
> allocation, then the filters will have no effect. This can mean that low 
> share frameworks may continue receiving offers that they have no intent to 
> use, without other frameworks ever receiving these offers.
> The workaround for this is for frameworks to set high filter durations, and 
> possibly revive offers when they need more resources; however, we should 
> fix this issue in the allocator (i.e., derive the timeout deadlines and 
> expiry based on allocation times).
> This seems to warrant cherry-picking into bug fix releases.
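
A sketch of the suggested direction, with illustrative types rather than the 
allocator's actual data structures: derive the filter's expiry from the 
allocation time instead of from when the recover call was processed:

{code}
#include <chrono>

using Clock = std::chrono::steady_clock;

// The filter's deadline is anchored to the allocation in which resources
// were recovered, so a slow or backlogged allocator cannot silently let
// the filter lapse before it ever takes effect.
struct OfferFilter
{
  Clock::time_point expiry;

  bool expired(Clock::time_point nextAllocationTime) const
  {
    return nextAllocationTime >= expiry;
  }
};

OfferFilter makeFilter(
    Clock::time_point allocationTime, std::chrono::seconds timeout)
{
  return OfferFilter{allocationTime + timeout};
}
{code}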



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-15 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3379:

Shepherd: Timothy Chen

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> Just a problem found when investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102760#comment-15102760
 ] 

Timothy Chen commented on MESOS-4029:
-

Looks like this is not a blocker for 0.27.0 as it's only local to tests.

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Artem Harutyunyan
>  Labels: flaky, flaky-test, mesosphere
> Fix For: 0.28.0
>
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] 

[jira] [Commented] (MESOS-3578) ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102784#comment-15102784
 ] 

Timothy Chen commented on MESOS-3578:
-

I don't think this is being worked on yet, and it's not a blocker since the 
Provisioner is not yet ready.

> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky
> --
>
> Key: MESOS-3578
> URL: https://issues.apache.org/jira/browse/MESOS-3578
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/881/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> Using temporary directory 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE'
> I0929 02:36:44.066397 30457 local_puller.cpp:127] Untarring image from 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/store/staging/aZND7C'
>  to 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/images/abc:latest.tar'
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:843: Failure
> (layers).failure(): Collect failed: Untar failed with exit code: exited with 
> status 2
> [  FAILED  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization 
> (181 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3807) RegistryClientTest.SimpleGetManifest is flaky

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102780#comment-15102780
 ] 

Timothy Chen commented on MESOS-3807:
-

I think we can close this since we are retiring the registry client.

> RegistryClientTest.SimpleGetManifest is flaky
> -
>
> Key: MESOS-3807
> URL: https://issues.apache.org/jira/browse/MESOS-3807
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/976/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] RegistryClientTest.SimpleGetManifest
> I1026 18:02:45.320374 31975 registry_client.cpp:264] Response status: 401 
> Unauthorized
> I1026 18:02:45.323772 31982 libevent_ssl_socket.cpp:1025] Socket error: 
> Connection reset by peer
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:718: Failure
> (socket).failure(): Failed accept: connection error: Connection reset by peer
> [  FAILED  ] RegistryClientTest.SimpleGetManifest (13 ms)
> {code}
> Logs from a good run:
> {code}
> [ RUN  ] RegistryClientTest.SimpleGetManifest
> I1025 15:35:36.248955 31970 registry_client.cpp:264] Response status: 401 
> Unauthorized
> I1025 15:35:36.267873 31979 registry_client.cpp:264] Response status: 200 OK
> [   OK ] RegistryClientTest.SimpleGetManifest (32 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4348) GMock warning in HookTest.VerifySlaveRunTaskHook, HookTest.VerifySlaveTaskStatusDecorator

2016-01-14 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099125#comment-15099125
 ] 

Timothy Chen commented on MESOS-4348:
-

commit 764e3532f5116305ca0d6dd1053b4a4041a8abbb
Author: Anand Mazumdar 
Date:   Thu Jan 14 14:58:36 2016 -0800

Fixed gmock warnings in hook tests.

We did not have an expectation on the `shutdown` method of the 
`MockExecutor`.
This led to the gmock warning being emitted in some runs.

The tests that are being fixed are:

- `HookTest.VerifyMasterLaunchTaskHook`
- `HookTest.VerifySlaveRunTaskHook`
- `HookTest.VerifySlaveTaskStatusDecorator`

Review: https://reviews.apache.org/r/42216/
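
For context, the fix boils down to declaring an expectation on the mock's
{{shutdown}} method; a minimal sketch of the pattern (the matcher and
cardinality here are illustrative, not necessarily the patch's exact code):

{code}
// Sketch: declare an expectation so gmock stops flagging the call as
// "uninteresting"; allow the shutdown that races with test teardown.
MockExecutor exec(DEFAULT_EXECUTOR_ID);

EXPECT_CALL(exec, shutdown(_))
  .Times(AtMost(1));
{code}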

> GMock warning in HookTest.VerifySlaveRunTaskHook, 
> HookTest.VerifySlaveTaskStatusDecorator
> -
>
> Key: MESOS-4348
> URL: https://issues.apache.org/jira/browse/MESOS-4348
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tests
> Fix For: 0.27.0
>
>
> {noformat}
> [ RUN  ] HookTest.VerifySlaveRunTaskHook
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7ff079cb2420)
> Stack trace:
> [   OK ] HookTest.VerifySlaveRunTaskHook (51 ms)
> [ RUN  ] HookTest.VerifySlaveTaskStatusDecorator
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7ff079cbb790)
> Stack trace:
> [   OK ] HookTest.VerifySlaveTaskStatusDecorator (54 ms)
> {noformat}
> Occurs non-deterministically for me. OSX 10.10.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4348) GMock warning in HookTest.VerifySlaveRunTaskHook, HookTest.VerifySlaveTaskStatusDecorator

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4348:

Fix Version/s: 0.27.0

> GMock warning in HookTest.VerifySlaveRunTaskHook, 
> HookTest.VerifySlaveTaskStatusDecorator
> -
>
> Key: MESOS-4348
> URL: https://issues.apache.org/jira/browse/MESOS-4348
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Assignee: Anand Mazumdar
>Priority: Minor
>  Labels: mesosphere, tests
> Fix For: 0.27.0
>
>
> {noformat}
> [ RUN  ] HookTest.VerifySlaveRunTaskHook
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7ff079cb2420)
> Stack trace:
> [   OK ] HookTest.VerifySlaveRunTaskHook (51 ms)
> [ RUN  ] HookTest.VerifySlaveTaskStatusDecorator
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7ff079cbb790)
> Stack trace:
> [   OK ] HookTest.VerifySlaveTaskStatusDecorator (54 ms)
> {noformat}
> Occurs non-deterministically for me. OSX 10.10.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4207) Add an example bug due to a lack of defer() to the defer() documentation

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4207:

Fix Version/s: 0.27.0

> Add an example bug due to a lack of defer() to the defer() documentation
> 
>
> Key: MESOS-4207
> URL: https://issues.apache.org/jira/browse/MESOS-4207
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Minor
>  Labels: documentation, libprocess, mesosphere
> Fix For: 0.27.0
>
>
> In the past, some bugs have been introduced into the codebase due to a lack 
> of {{defer()}} where it should have been used. It would be useful to add an 
> example of this to the {{defer()}} documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4207) Add an example bug due to a lack of defer() to the defer() documentation

2016-01-14 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099247#comment-15099247
 ] 

Timothy Chen commented on MESOS-4207:
-

commit 81fa980b7c3c8f4a5ee826e1b295380893006909
Author: Greg Mann 
Date:   Thu Jan 14 17:16:41 2016 -0800

Added example of a `defer` bug to libprocess README.

Review: https://reviews.apache.org/r/42030/
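
The class of bug the new example documents is a continuation running off the
owning actor. A minimal sketch, assuming code inside a libprocess {{Process}}
subclass with a {{Future<Nothing> future}} and a member {{counter}}
(illustrative names, not the README's exact example):

{code}
// Buggy: the continuation may run on whichever execution context
// completes the future, racing with this process over `counter`.
future.then([this](const Nothing&) {
  counter++;
  return Nothing();
});

// Fixed: defer(self(), ...) dispatches the continuation back onto this
// process, so access to `counter` stays serialized.
future.then(defer(self(), [this](const Nothing&) {
  counter++;
  return Nothing();
}));
{code}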

> Add an example bug due to a lack of defer() to the defer() documentation
> 
>
> Key: MESOS-4207
> URL: https://issues.apache.org/jira/browse/MESOS-4207
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Minor
>  Labels: documentation, libprocess, mesosphere
> Fix For: 0.27.0
>
>
> In the past, some bugs have been introduced into the codebase due to a lack 
> of {{defer()}} where it should have been used. It would be useful to add an 
> example of this to the {{defer()}} documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4302) Offer filter timeouts are ignored if the allocator is slow or backlogged.

2016-01-14 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099098#comment-15099098
 ] 

Timothy Chen commented on MESOS-4302:
-

Ok, the fix has to land soon, as we're starting to put together a release 
branch. If it doesn't make it over the weekend, I'll update the fix version.

> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a 
> filter timeout based on time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator 
> to perform an allocation for the relevant agent, then the filter is never 
> applied.
> This leads to pathological behavior: if the framework sets a filter duration 
> that is smaller than the wall clock time it takes for us to perform the next 
> allocation, then the filters will have no effect. This can mean that low 
> share frameworks may continue receiving offers that they have no intent to 
> use, without other frameworks ever receiving these offers.
> The workaround for this is for frameworks to set high filter durations, and 
> possibly reviving offers when they need more resources, however, we should 
> fix this issue in the allocator. (i.e. derive the timeout deadlines and 
> expiry based on allocation times).
> This seems to warrant cherry-picking into bug fix releases.
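
A sketch of the proposed direction, in pseudocode with illustrative names
(not the allocator's actual fields):

{code}
// Illustrative pseudocode only.
//
// Current behavior (simplified): the expiry is anchored to when the
// recoverResources call is processed, so a backlogged allocator may run
// the next allocation after the filter has already lapsed:
//
//   expiry = timeOfRecoverCall + filterDuration;
//
// Proposed: anchor the expiry to the allocation the filter is meant to
// affect, so slow allocation cycles cannot skip it:
//
//   expiry = timeOfNextAllocation + filterDuration;
{code}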



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3688) Get Container Name information when launching a container task

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3688:

Target Version/s:   (was: 0.27.0)

> Get Container Name information when launching a container task
> --
>
> Key: MESOS-3688
> URL: https://issues.apache.org/jira/browse/MESOS-3688
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 0.24.1
>Reporter: Raffaele Di Fazio
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> We want to get the Docker Name (or Docker ID, or both) when launching a 
> container task with mesos. The container name is generated by mesos itself 
> (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps") 
> and it would be nice to expose it to frameworks so that it can be used, for 
> example by Marathon to surface it to users via a REST API. 
> To go a bit more in depth with our use case: we have files created by the 
> fluentd log driver that are named with the Docker Name or Docker ID (full or 
> short), and we need a mapping for the users of the REST API; thus the first 
> step is to make this information available from Mesos. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4018) Enhance float-point operation in Mesos

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4018:

Target Version/s: 0.28.0  (was: 0.27.0)

> Enhance float-point operation in Mesos
> --
>
> Key: MESOS-4018
> URL: https://issues.apache.org/jira/browse/MESOS-4018
> Project: Mesos
>  Issue Type: Epic
>  Components: stout
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> Currently, there are several defects around floating-point equality checking. 
> This epic tracks building floating-point operations in {{stout}} for other 
> components. The major operations will be (a sketch of the first follows below):
> 1. {{bool almostEqual(double left, double right)}} for Scalar {{operator==}}
> 2. {{CHECK_NEAR(left, right)}} for assert in components
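
As referenced above, a minimal sketch of the first operation, assuming a plain
relative-epsilon comparison (the epic does not pin down the exact tolerance):

{code}
#include <algorithm>
#include <cmath>

// Hypothetical sketch of `almostEqual`: two doubles compare equal when
// their difference is within a small tolerance scaled by magnitude.
bool almostEqual(double left, double right, double epsilon = 1e-9)
{
  const double diff = std::fabs(left - right);
  const double scale = std::max({1.0, std::fabs(left), std::fabs(right)});
  return diff <= epsilon * scale;
}
{code}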



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4066:

Target Version/s: 0.28.0  (was: 0.27.0)

> Expose when agent is recovering in the agent's /state.json endpoint.
> 
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Benjamin Mahler
>  Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return 
> partial state if the agent has failed over and is recovering. There is 
> currently no clear way to tell if this is the case when looking at a 
> response, so the user may incorrectly interpret the agent as being empty of 
> tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
>   enum State
>   {
> RECOVERING,   // Slave is doing recovery.
> DISCONNECTED, // Slave is not connected to the master.
> RUNNING,  // Slave has (re-)registered.
> TERMINATING,  // Slave is shutting down.
>   } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the 
> endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the 
> agent.
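
A minimal sketch of what exposing the enum could look like in the endpoint
handler, assuming the enum gains a {{stringify}}-compatible {{operator<<}}
(illustrative, not the agent's actual code):

{code}
// Hypothetical sketch: add the agent's state to the /state.json body.
JSON::Object object;
object.values["state"] = stringify(slave->state);  // e.g. "RECOVERING"
{code}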



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4053:

Target Version/s: 0.28.0  (was: 0.27.0)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4261) Remove docker auth server flag

2016-01-14 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4261:

Target Version/s:   (was: 0.27.0)

> Remove docker auth server flag
> --
>
> Key: MESOS-4261
> URL: https://issues.apache.org/jira/browse/MESOS-4261
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Jie Yu
>  Labels: mesosphere, unified-containerizer-mvp
>
> We currently use a configured docker auth server from a slave flag to get 
> token auth for docker registry. However this doesn't work for private 
> registries as docker registry supports sending down the correct auth server 
> to contact.
> We should remove docker auth server flag completely and ask the docker 
> registry for auth server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4347) GMock warning in ReservationTest.ACLMultipleOperations

2016-01-13 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4347:

Fix Version/s: 0.27.0

> GMock warning in ReservationTest.ACLMultipleOperations
> --
>
> Key: MESOS-4347
> URL: https://issues.apache.org/jira/browse/MESOS-4347
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Assignee: Greg Mann
>Priority: Minor
>  Labels: mesosphere, reservations, tests
> Fix For: 0.27.0
>
>
> {noformat}
> [ RUN  ] ReservationTest.ACLMultipleOperations
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7fa2a311b300)
> Stack trace:
> [   OK ] ReservationTest.ACLMultipleOperations (174 ms)
> [--] 1 test from ReservationTest (174 ms total)
> {noformat}
> Seems to occur non-deterministically for me, maybe once per 50 runs or so. 
> OSX 10.10



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4285) Mesos command task doesn't support volumes with image

2016-01-13 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097337#comment-15097337
 ] 

Timothy Chen commented on MESOS-4285:
-

https://reviews.apache.org/r/42278/

> Mesos command task doesn't support volumes with image
> -
>
> Key: MESOS-4285
> URL: https://issues.apache.org/jira/browse/MESOS-4285
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere, unified-containerizer-mvp
>
> Currently volumes are stripped when an image is specified while running a 
> command task with the Mesos containerizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4031) slave crashed in cgroupstatistics()

2016-01-12 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4031:

Labels: mesosphere  (was: )

> slave crashed in cgroupstatistics()
> ---
>
> Key: MESOS-4031
> URL: https://issues.apache.org/jira/browse/MESOS-4031
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, libprocess
>Affects Versions: 0.24.0
> Environment: Debian jessie
>Reporter: Steven
>Assignee: Timothy Chen
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> Hi all, 
> I have built a mesos cluster with three slaves. Any slave may sporadically 
> crash when I get the summary through mesos master ui. Here is the stack 
> trace. 
> {code}
>  slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for 
> /slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; 
> Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
>  docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json"
>  docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json"
>  slave.sh[13336]: ABORT: 
> (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): 
> Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try 
> "date -d @1448942139" if you are using GNU date ***
>  slave.sh[13336]: PC: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 
> 0x7f2948992700) from PID 13337; stack trace: ***
>  slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown)
>  slave.sh[13336]: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: @ 0x7f295218b4e8 (unknown)
>  slave.sh[13336]: @   0x43dc59 _Abort()
>  slave.sh[13336]: @   0x43dc87 _Abort()
>  slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get()
>  slave.sh[13336]: @ 0x7f295637f017 
> mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics()
>  slave.sh[13336]: @ 0x7f295637dfea 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi
>  slave.sh[13336]: @ 0x7f295637e549 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_
>  slave.sh[13336]: @ 0x7f295638453b
> ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv
>  slave.sh[13336]: @ 0x7f295638751d
> FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke
>  slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f29563aa5dc 
> _ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_
>  slave.sh[13336]: @ 0x7f29563bd667 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>  slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit()
>  slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit()
>  slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve()
>  slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume()
>  slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule()
>  slave.sh[13336]: @ 0x7f2956bc0839 
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>  slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()()
>  slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run()
>  slave.sh[13336]: @ 0x7f29527ca970 (unknown)
>  slave.sh[13336]: @ 0x7f2952a270a4 start_thread
>  slave.sh[13336]: @ 0x7f295223b04d (unknown)
> {code}
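
The abort is the stout {{Result::get()}} precondition firing on a NONE value;
a defensive sketch of the pattern, with illustrative names:

{code}
// Hypothetical sketch: check the Result's state before dereferencing
// instead of calling get() unconditionally (which aborts on NONE).
Result<std::string> cgroup = lookupCgroup(containerId);  // illustrative

if (cgroup.isError()) {
  return Failure(cgroup.error());
} else if (cgroup.isNone()) {
  return Failure("Container has no known cgroup");
}

// Safe: the state is SOME here.
const std::string& path = cgroup.get();
{code}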



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4031) slave crashed in cgroupstatistics()

2016-01-12 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4031:

Fix Version/s: 0.27.0

> slave crashed in cgroupstatistics()
> ---
>
> Key: MESOS-4031
> URL: https://issues.apache.org/jira/browse/MESOS-4031
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, libprocess
>Affects Versions: 0.24.0
> Environment: Debian jessie
>Reporter: Steven
>Assignee: Timothy Chen
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> Hi all, 
> I have built a mesos cluster with three slaves. Any slave may sporadically 
> crash when I get the summary through mesos master ui. Here is the stack 
> trace. 
> {code}
>  slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for 
> /slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; 
> Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
>  docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json"
>  docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json"
>  slave.sh[13336]: ABORT: 
> (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): 
> Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try 
> "date -d @1448942139" if you are using GNU date ***
>  slave.sh[13336]: PC: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 
> 0x7f2948992700) from PID 13337; stack trace: ***
>  slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown)
>  slave.sh[13336]: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: @ 0x7f295218b4e8 (unknown)
>  slave.sh[13336]: @   0x43dc59 _Abort()
>  slave.sh[13336]: @   0x43dc87 _Abort()
>  slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get()
>  slave.sh[13336]: @ 0x7f295637f017 
> mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics()
>  slave.sh[13336]: @ 0x7f295637dfea 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi
>  slave.sh[13336]: @ 0x7f295637e549 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_
>  slave.sh[13336]: @ 0x7f295638453b
> ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv
>  slave.sh[13336]: @ 0x7f295638751d
> FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke
>  slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f29563aa5dc 
> _ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_
>  slave.sh[13336]: @ 0x7f29563bd667 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>  slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit()
>  slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit()
>  slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve()
>  slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume()
>  slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule()
>  slave.sh[13336]: @ 0x7f2956bc0839 
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>  slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()()
>  slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run()
>  slave.sh[13336]: @ 0x7f29527ca970 (unknown)
>  slave.sh[13336]: @ 0x7f2952a270a4 start_thread
>  slave.sh[13336]: @ 0x7f295223b04d (unknown)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-01-12 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094621#comment-15094621
 ] 

Timothy Chen commented on MESOS-4029:
-

Anand, are you able to fix this before the end of this week? If not, we need 
to update the target version.

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Artem Harutyunyan
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 

[jira] [Updated] (MESOS-3418) Factor out V1 API test helper functions

2016-01-12 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3418:

Target Version/s:   (was: 0.27.0)

> Factor out V1 API test helper functions
> ---
>
> Key: MESOS-3418
> URL: https://issues.apache.org/jira/browse/MESOS-3418
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joris Van Remoortere
>Assignee: Guangya Liu
>  Labels: beginner, mesosphere, newbie, v1_api
>
> We currently have some helper functionality for V1 API tests. This is copied 
> in a few test files.
> Factor this out into a common place once the API is stabilized.
> {code}
> // Helper class for using EXPECT_CALL since the Mesos scheduler API
>   // is callback based.
>   class Callbacks
>   {
>   public:
> MOCK_METHOD0(connected, void(void));
> MOCK_METHOD0(disconnected, void(void));
> MOCK_METHOD1(received, void(const std::queue<Event>&));
>   };
> {code}
> {code}
> // Enqueues all received events into a libprocess queue.
> // TODO(jmlvanre): Factor this common code out of tests into V1
> // helper.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> We can also update the helpers in {{/tests/mesos.hpp}} to support the V1 API. 
>  This would let us get rid of lines like:
> {code}
> v1::TaskInfo taskInfo = evolve(createTask(devolve(offer), "", 
> DEFAULT_EXECUTOR_ID));
> {code}
> In favor of:
> {code}
> v1::TaskInfo taskInfo = createTask(offer, "", DEFAULT_EXECUTOR_ID);
> {code}
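
One possible shape for such an overload, as a sketch assuming the existing
{{evolve}}/{{devolve}} helpers (the exact signature is up for grabs):

{code}
// Hypothetical sketch: a v1 overload of `createTask` that hides the
// evolve/devolve round trip shown above.
inline v1::TaskInfo createTask(
    const v1::Offer& offer,
    const std::string& command,
    const ExecutorID& executorId)
{
  return evolve(createTask(devolve(offer), command, executorId));
}
{code}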



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4302) Offer filter timeouts are ignored if the allocator is slow or backlogged.

2016-01-12 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094617#comment-15094617
 ] 

Timothy Chen commented on MESOS-4302:
-

Seems like we're still discussing what the right fix is (or even whether we 
want to fix anything), so I doubt we can resolve this before the end of this week.
Can we remove the target version for now?

> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a 
> filter timeout based on time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator 
> to perform an allocation for the relevant agent, then the filter is never 
> applied.
> This leads to pathological behavior: if the framework sets a filter duration 
> that is smaller than the wall clock time it takes for us to perform the next 
> allocation, then the filters will have no effect. This can mean that low 
> share frameworks may continue receiving offers that they have no intent to 
> use, without other frameworks ever receiving these offers.
> The workaround for this is for frameworks to set high filter durations, and 
> possibly reviving offers when they need more resources, however, we should 
> fix this issue in the allocator. (i.e. derive the timeout deadlines and 
> expiry based on allocation times).
> This seems to warrant cherry-picking into bug fix releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-01-12 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094610#comment-15094610
 ] 

Timothy Chen commented on MESOS-4053:
-

Is this going to be merged before the end of this week? If not, let's bump the 
target version.

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3928) ROOT tests fail on Mesos 0.26 on Ubuntu/CentOS

2016-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093439#comment-15093439
 ] 

Timothy Chen commented on MESOS-3928:
-

Bernd, is this going to make 0.27.0? 

> ROOT tests fail on Mesos 0.26 on Ubuntu/CentOS
> --
>
> Key: MESOS-3928
> URL: https://issues.apache.org/jira/browse/MESOS-3928
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Marco Massenzio
>Assignee: Greg Mann
>  Labels: tech-debt, testing
> Attachments: ROOT-tests-centos-7.1.log, ROOT-tests-ubuntu-14.04.log
>
>
> Running {{0.26.0-rc1}} on both CentOS 7.1 and Ubuntu 14.04 with {{sudo}} 
> privileges causes segfaults when running Docker tests.
> Logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3973) Failing 'make distcheck' on Mac OS X 10.10.5, also 10.11.

2016-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093441#comment-15093441
 ] 

Timothy Chen commented on MESOS-3973:
-

Gilbert, are you able to land this before 0.27.0, which closes at the end of this week?

> Failing 'make distcheck' on Mac OS X 10.10.5, also 10.11.
> -
>
> Key: MESOS-3973
> URL: https://issues.apache.org/jira/browse/MESOS-3973
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0, 0.21.2, 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0
> Environment: Mac OS X 10.10.5, Clang 7.0.0.
>Reporter: Bernd Mathiske
>Assignee: Gilbert Song
>  Labels: build, build-failure, mesosphere
> Attachments: dist_check.log
>
>
> Non-root 'make distcheck.
> {noformat}
> ...
> [--] Global test environment tear-down
> [==] 826 tests from 113 test cases ran. (276624 ms total)
> [  PASSED  ] 826 tests.
>   YOU HAVE 6 DISABLED TESTS
> Making install in .
> make[3]: Nothing to be done for `install-exec-am'.
>  ../install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig'
>  /usr/bin/install -c -m 644 mesos.pc 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig'
> Making install in 3rdparty
> /Applications/Xcode.app/Contents/Developer/usr/bin/make  install-recursive
> Making install in libprocess
> Making install in 3rdparty
> /Applications/Xcode.app/Contents/Developer/usr/bin/make  install-recursive
> Making install in stout
> Making install in .
> make[9]: Nothing to be done for `install-exec-am'.
> make[9]: Nothing to be done for `install-data-am'.
> Making install in include
> make[9]: Nothing to be done for `install-exec-am'.
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include'
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout'
>  /usr/bin/install -c -m 644  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/abort.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/attributes.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/base64.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bits.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bytes.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/cache.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/dynamiclibrary.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/error.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/exit.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/foreach.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/format.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/fs.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gtest.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gzip.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashset.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/interval.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/lambda.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/linkedhashmap.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/list.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/mac.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multihashmap.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multimap.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/none.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/nothing.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/numify.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp 
> 

[jira] [Commented] (MESOS-4038) SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6

2016-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093443#comment-15093443
 ] 

Timothy Chen commented on MESOS-4038:
-

Greg, are you going to try to fix this before 0.27.0? If not, let's remove the 
target version.

> SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6
> --
>
> Key: MESOS-4038
> URL: https://issues.apache.org/jira/browse/MESOS-4038
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Jan Schlicht
>  Labels: mesosphere, test-failure
>
> All {{SlaveRecoveryTest.\*}} tests, 
> {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and 
> {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = 
> mesos::internal::slave::MesosContainerizer}}. They all fail with the same 
> error:
> {code}
> [--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
> mesos::internal::slave::MesosContainerizer
> [ RUN  ] SlaveRecoveryTest/0.ReconnectExecutor
> ../../src/tests/mesos.cpp:722: Failure
> cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in 
> the file system
> -
> We cannot run any cgroups tests that require
> a hierarchy with subsystem 'perf_event'
> because we failed to find an existing hierarchy
> or create a new one (tried '/cgroup/perf_event').
> You can either remove all existing
> hierarchies, or disable this test case
> (i.e., --gtest_filter=-SlaveRecoveryTest/0.*).
> -
> ../../src/tests/mesos.cpp:776: Failure
> cgroups: '/cgroup/perf_event' is not a valid hierarchy
> [  FAILED  ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = 
> mesos::internal::slave::MesosContainerizer (8 ms)
> [--] 1 test from SlaveRecoveryTest/0 (9 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (15 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = 
> mesos::internal::slave::MesosContainerizer
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2016-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093432#comment-15093432
 ] 

Timothy Chen commented on MESOS-4255:
-

Anand, is this going to make 0.27.0? Let me know before the end of Wednesday, 
or I'll remove it myself.

> Add mechanism for testing recovery of HTTP based executors
> --
>
> Key: MESOS-4255
> URL: https://issues.apache.org/jira/browse/MESOS-4255
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the slave process generates a new process ID every time it is 
> initialized, via the {{process::ID::generate}} function call. This is a 
> problem for testing HTTP executors: an executor can't retry after a 
> disconnection caused by an agent restart, since the prefix is incremented. 
> {code}
> Agent PID before:
> slave(1)@127.0.0.1:43915
> Agent PID after restart:
> slave(2)@127.0.0.1:43915
> {code}
> There are a couple of ways to fix this:
> - Add a constructor to {{Slave}} exclusively for testing that passes on a 
> fixed {{ID}} instead of relying on {{ID::generate}}.
> - Currently we delegate to slave(1)@ i.e. (1) when nothing is specified as 
> the URL in libprocess i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
> to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to 
> (1), we can default to the last known active ID.
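
A sketch of the first option listed above, with an illustrative class and
constructor (not an actual patch):

{code}
// Hypothetical sketch: pin the process ID instead of calling
// process::ID::generate("slave"), so the executor's subscription URL
// resolves to the same process before and after an agent restart.
class TestSlave : public ProcessBase
{
public:
  explicit TestSlave(const std::string& id)
    : ProcessBase(id) {}  // e.g. id = "slave(1)" across restarts
};
{code}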



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP

2016-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093430#comment-15093430
 ] 

Timothy Chen commented on MESOS-3558:
-

I don't think this is going to make 0.27.0, right? Can we remove the target 
version for now?

> Make the CommandExecutor use the Executor Library speaking HTTP
> ---
>
> Key: MESOS-3558
> URL: https://issues.apache.org/jira/browse/MESOS-3558
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Instead of using the {{MesosExecutorDriver}} , we should make the 
> {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor 
> HTTP Library that we create in {{MESOS-3550}}. 
> This would act as a good validation of the {{HTTP API}} implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3003) Support mounting in default configuration files/volumes into every new container

2016-01-08 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089979#comment-15089979
 ] 

Timothy Chen commented on MESOS-3003:
-

Following what libcontainer/runc does, I think we should create a list of /etc 
files to mount (i.e., /etc/hosts and /etc/resolv.conf) in the container when we 
see that /etc is not already mounted from the host.
For now I think this should suffice; we need to test different containers to 
see whether there are any more configuration files we need to pass in.
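
A minimal sketch of that check, with illustrative helper names (not the
isolator's actual code):

{code}
// Hypothetical sketch: mount a default set of /etc files into the
// container unless the container already mounts /etc from the host.
const std::vector<std::string> defaults = {
  "/etc/hosts",
  "/etc/resolv.conf",
};

if (!mountsEtcFromHost(containerConfig)) {    // illustrative helper
  foreach (const std::string& path, defaults) {
    mounts.push_back(bindMount(path, path));  // illustrative helper
  }
}
{code}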

> Support mounting in default configuration files/volumes into every new 
> container
> 
>
> Key: MESOS-3003
> URL: https://issues.apache.org/jira/browse/MESOS-3003
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>  Labels: mesosphere, unified-containerizer-mvp
>
> Most container images leave out system configuration (e.g., /etc/*) and expect 
> the container runtime to mount specific configuration files, such as 
> /etc/resolv.conf, from the host into the container as needed.
> We need to support mounting in specific configuration files for command 
> executor to work, and also allow the user to optionally define other 
> configuration files to mount in as well via flags.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4290) Reject tasks with images with filesystem/posix isolator

2016-01-06 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4290:

Fix Version/s: 0.27.0

> Reject tasks with images with filesystem/posix isolator
> ---
>
> Key: MESOS-4290
> URL: https://issues.apache.org/jira/browse/MESOS-4290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> Currently the filesystem/posix isolator allows tasks with images to run, but 
> this will cause problems as we don't correctly create a mount namespace. We 
> should reject all tasks with images under the filesystem/posix isolator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4290) Reject tasks with images with filesystem/posix isolator

2016-01-06 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086704#comment-15086704
 ] 

Timothy Chen commented on MESOS-4290:
-

commit 52abf8de380cf7a3c3d8a2e5616b3d34d7b6b277
Author: Timothy Chen 
Date:   Tue Jan 5 17:29:57 2016 -0800

Fixed posix filesystem isolator to not allow executors with image.

Review: https://reviews.apache.org/r/41909/
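
The shape of the check, as a sketch (the authoritative change is in the review
above):

{code}
// Hypothetical sketch: fail fast when an executor carrying an image
// reaches the posix filesystem isolator.
if (executorInfo.has_container() &&
    executorInfo.container().mesos().has_image()) {
  return Failure(
      "The posix filesystem isolator does not support container images");
}
{code}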

> Reject tasks with images with filesystem/posix isolator
> ---
>
> Key: MESOS-4290
> URL: https://issues.apache.org/jira/browse/MESOS-4290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> Currently the filesystem/posix isolator allows tasks with images to run, but 
> this will cause problems as we don't correctly create a mount namespace. We 
> should reject all tasks with images under the filesystem/posix isolator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4285) Mesos command task doesn't support volumes with image

2016-01-04 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4285:
---

 Summary: Mesos command task doesn't support volumes with image
 Key: MESOS-4285
 URL: https://issues.apache.org/jira/browse/MESOS-4285
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen


Currently volumes are stripped when an image is specified while running a 
command task with the Mesos containerizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4285) Mesos command task doesn't support volumes with image

2016-01-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4285:

Labels: mesosphere  (was: )

> Mesos command task doesn't support volumes with image
> -
>
> Key: MESOS-4285
> URL: https://issues.apache.org/jira/browse/MESOS-4285
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Currently volumes are stripped when an image is specified while running a 
> command task with the Mesos containerizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4290) Reject tasks with images with filesystem/posix isolator

2016-01-04 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4290:
---

 Summary: Reject tasks with images with filesystem/posix isolator
 Key: MESOS-4290
 URL: https://issues.apache.org/jira/browse/MESOS-4290
 Project: Mesos
  Issue Type: Bug
Reporter: Timothy Chen
Assignee: Timothy Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4031) slave crashed in cgroupstatistics()

2016-01-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-4031:
---

Assignee: Timothy Chen

> slave crashed in cgroupstatistics()
> ---
>
> Key: MESOS-4031
> URL: https://issues.apache.org/jira/browse/MESOS-4031
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, libprocess
>Affects Versions: 0.24.0
> Environment: Debian jessie
>Reporter: Steven
>Assignee: Timothy Chen
>
> Hi all, 
> I have built a mesos cluster with three slaves. Any slave may sporadically 
> crash when I get the summary through mesos master ui. Here is the stack 
> trace. 
> ```
>  slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk 
> usage 79.71%. Max allowed age: 17.279577136390834hrs
>  slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for 
> /slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; 
> Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
>  docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json"
>  docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET 
> /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json"
>  slave.sh[13336]: ABORT: 
> (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): 
> Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try 
> "date -d @1448942139" if you are using GNU date ***
>  slave.sh[13336]: PC: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 
> 0x7f2948992700) from PID 13337; stack trace: ***
>  slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown)
>  slave.sh[13336]: @ 0x7f295218a107 (unknown)
>  slave.sh[13336]: @ 0x7f295218b4e8 (unknown)
>  slave.sh[13336]: @   0x43dc59 _Abort()
>  slave.sh[13336]: @   0x43dc87 _Abort()
>  slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get()
>  slave.sh[13336]: @ 0x7f295637f017 
> mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics()
>  slave.sh[13336]: @ 0x7f295637dfea 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi
>  slave.sh[13336]: @ 0x7f295637e549 
> _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_
>  slave.sh[13336]: @ 0x7f295638453b
> ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv
>  slave.sh[13336]: @ 0x7f295638751d
> FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke
>  slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f29563aa5dc 
> _ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_
>  slave.sh[13336]: @ 0x7f29563bd667 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>  slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()()
>  slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit()
>  slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit()
>  slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve()
>  slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume()
>  slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule()
>  slave.sh[13336]: @ 0x7f2956bc0839 
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>  slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()()
>  slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run()
>  slave.sh[13336]: @ 0x7f29527ca970 (unknown)
>  slave.sh[13336]: @ 0x7f2952a270a4 start_thread
>  slave.sh[13336]: @ 0x7f295223b04d (unknown)
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4290) Reject tasks with images with filesystem/posix isolator

2016-01-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4290:

Component/s: containerization

> Reject tasks with images with filesystem/posix isolator
> ---
>
> Key: MESOS-4290
> URL: https://issues.apache.org/jira/browse/MESOS-4290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Currently the filesystem/posix isolator allows tasks with images to run, but 
> this will cause problems as we don't correctly create a mount namespace. We 
> should reject all tasks with images when the filesystem/posix isolator is in use.
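
For illustration, the rejection this calls for boils down to a guard like the following (a minimal sketch with an assumed helper name; field access follows the usual {{ContainerInfo}} protobuf, not the actual patch):

{code}
#include <mesos/mesos.hpp>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

// Hypothetical sketch: fail fast when a task carries an image but only
// the filesystem/posix isolator is in use.
Try<Nothing> validateNoImage(const mesos::ContainerInfo& containerInfo)
{
  if (containerInfo.mesos().has_image()) {
    return Error(
        "The filesystem/posix isolator does not support container images; "
        "use the filesystem/linux isolator instead");
  }

  return Nothing();
}
{code}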



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4290) Reject tasks with images with filesystem/posix isolator

2016-01-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4290:

Description: Currently the filesystem/posix isolator allows tasks with 
images to run, but this will cause problems as we don't correctly create a 
mount namespace. We should reject all tasks with images when the 
filesystem/posix isolator is in use.

> Reject tasks with images with filesystem/posix isolator
> ---
>
> Key: MESOS-4290
> URL: https://issues.apache.org/jira/browse/MESOS-4290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Currently the filesystem/posix isolator allows tasks with images to run, but 
> this will cause problems as we don't correctly create a mount namespace. We 
> should reject all tasks with images when the filesystem/posix isolator is in use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4163) SlaveTest.HTTPSchedulerSlaveRestart is slow

2015-12-30 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074770#comment-15074770
 ] 

Timothy Chen commented on MESOS-4163:
-

commit 274c4abc9b7dc2b70bcbb9bd3448c5125b8c52c3
Author: Jian Qiu 
Date:   Tue Dec 29 19:47:44 2015 -0800

Speed up SlaveTest.HTTPSchedulerSlaveRestart

Review: https://reviews.apache.org/r/41675/

> SlaveTest.HTTPSchedulerSlaveRestart is slow
> ---
>
> Key: MESOS-4163
> URL: https://issues.apache.org/jira/browse/MESOS-4163
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jian Qiu
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> The {{SlaveTest.HTTPSchedulerSlaveRestart}} test takes more than {{2s}} to 
> finish on my Mac OS 10.10.4:
> {code}
> SlaveTest.HTTPSchedulerSlaveRestart (2307 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4260) Volumes not handled correctly for command tasks

2015-12-30 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4260:
---

 Summary: Volumes not handled correctly for command tasks
 Key: MESOS-4260
 URL: https://issues.apache.org/jira/browse/MESOS-4260
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen


Volumes are currently not handled correctly, because we strip the volumes from 
the ExecutorInfo when running a command task.
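
A rough sketch of the intended behavior (illustrative helper; not the actual patch): when synthesizing the ExecutorInfo for a command task, carry the task's ContainerInfo over, volumes included, instead of dropping it:

{code}
#include <mesos/mesos.hpp>

// Illustrative sketch: preserve the task's volumes when building the
// command executor's ExecutorInfo, instead of stripping them.
mesos::ExecutorInfo buildCommandExecutor(const mesos::TaskInfo& task)
{
  mesos::ExecutorInfo executor;
  executor.mutable_executor_id()->set_value(task.task_id().value());

  if (task.has_container()) {
    // Copy the whole ContainerInfo so the volumes survive.
    executor.mutable_container()->CopyFrom(task.container());
  }

  return executor;
}
{code}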



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4261) Remove docker auth server flag

2015-12-30 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4261:
---

 Summary: Remove docker auth server flag
 Key: MESOS-4261
 URL: https://issues.apache.org/jira/browse/MESOS-4261
 Project: Mesos
  Issue Type: Improvement
Reporter: Timothy Chen


We currently use a configured docker auth server from a slave flag to get token 
auth for the docker registry. However, this doesn't work for private registries, 
since the docker registry itself tells the client which auth server to contact.

We should remove the docker auth server flag completely and ask the docker registry 
for the auth server.
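
For context, the v2 registry advertises its token server itself in the {{WWW-Authenticate}} challenge, so no preconfigured auth server is needed. A typical exchange (illustrative host names):

{noformat}
GET /v2/ HTTP/1.1
Host: registry.example.com

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="https://auth.example.com/token",service="registry.example.com"
{noformat}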



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4261) Remove docker auth server flag

2015-12-30 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4261:

   Shepherd: Jie Yu
   Assignee: Timothy Chen
 Sprint: Mesosphere Sprint 25
Component/s: containerization

> Remove docker auth server flag
> --
>
> Key: MESOS-4261
> URL: https://issues.apache.org/jira/browse/MESOS-4261
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> We currently use a configured docker auth server from a slave flag to get 
> token auth for the docker registry. However, this doesn't work for private 
> registries, since the docker registry itself tells the client which auth 
> server to contact.
> We should remove the docker auth server flag completely and ask the docker 
> registry for the auth server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4137) Modularize plain-file logging for executor/task logs launched with the Docker Containerizer

2015-12-22 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068744#comment-15068744
 ] 

Timothy Chen edited comment on MESOS-4137 at 12/22/15 9:19 PM:
---

commit d01585f1b312ffbe3c17f69d4c37d8e2375f9409
Author: Joseph Wu 
Date:   Mon Dec 21 16:54:11 2015 -0800

Changed Docker::run to take Subprocess::IO instead of Option.

Review: https://reviews.apache.org/r/41560/


was (Author: tnachen):
commit d01585f1b312ffbe3c17f69d4c37d8e2375f9409
Author: Joseph Wu 
Date:   Mon Dec 21 16:54:11 2015 -0800

Changed Docker::run to take Subprocess::IO instead of Option.

This changes the FDs used by the `mesos-docker-executor` to inherit rather 
than open anew.

In the `mesos-docker-executor`, we originally passed the log file path 
(i.e. `path::join(sandboxDirectory, "stdout")`) as an argument to 
`Docker::run`.  In the executor's context, the log file is already open as 
`STDOUT_FILENO`.

By inheriting the FD, the docker containerizer's logging code path will 
mirror that of the mesos containerizer.

Review: https://reviews.apache.org/r/41560/

> Modularize plain-file logging for executor/task logs launched with the Docker 
> Containerizer
> ---
>
> Key: MESOS-4137
> URL: https://issues.apache.org/jira/browse/MESOS-4137
> Project: Mesos
>  Issue Type: Task
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: logging, mesosphere
>
> Adding a hook inside the Docker containerizer is slightly more involved than 
> the Mesos containerizer.
> Docker executors/tasks perform plain-file logging in different places 
> depending on whether the agent is in a Docker container itself
> || Agent || Code ||
> | Not in container | {{DockerContainerizerProcess::launchExecutorProcess}} |
> | In container | {{Docker::run}} in a {{mesos-docker-executor}} process |
> This means a {{ContainerLogger}} will need to be loaded or hooked into the 
> {{mesos-docker-executor}}.  Or we will need to change how piping is done in 
> {{mesos-docker-executor}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4137) Modularize plain-file logging for executor/task logs launched with the Docker Containerizer

2015-12-22 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068744#comment-15068744
 ] 

Timothy Chen commented on MESOS-4137:
-

commit d01585f1b312ffbe3c17f69d4c37d8e2375f9409
Author: Joseph Wu 
Date:   Mon Dec 21 16:54:11 2015 -0800

Changed Docker::run to take Subprocess::IO instead of Option.

This changes the FDs used by the `mesos-docker-executor` to inherit rather 
than open anew.

In the `mesos-docker-executor`, we originally passed the log file path 
(i.e. `path::join(sandboxDirectory, "stdout")`) as an argument to 
`Docker::run`.  In the executor's context, the log file is already open as 
`STDOUT_FILENO`.

By inheriting the FD, the docker containerizer's logging code path will 
mirror that of the mesos containerizer.

Review: https://reviews.apache.org/r/41560/
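
A minimal sketch of the difference (call shapes are assumed, not the exact diff):

{code}
// Before (illustrative): Docker::run received a log *path*, so the
// sandbox stdout file was opened anew inside the call.
docker->run(containerInfo, commandInfo, containerName, sandboxDirectory,
            path::join(sandboxDirectory, "stdout"));

// After (illustrative): Docker::run receives Subprocess::IO, so the
// executor's already-open STDOUT_FILENO/STDERR_FILENO are inherited.
docker->run(containerInfo, commandInfo, containerName, sandboxDirectory,
            process::Subprocess::FD(STDOUT_FILENO),
            process::Subprocess::FD(STDERR_FILENO));
{code}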

> Modularize plain-file logging for executor/task logs launched with the Docker 
> Containerizer
> ---
>
> Key: MESOS-4137
> URL: https://issues.apache.org/jira/browse/MESOS-4137
> Project: Mesos
>  Issue Type: Task
>  Components: docker, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: logging, mesosphere
>
> Adding a hook inside the Docker containerizer is slightly more involved than 
> the Mesos containerizer.
> Docker executors/tasks perform plain-file logging in different places 
> depending on whether the agent is in a Docker container itself
> || Agent || Code ||
> | Not in container | {{DockerContainerizerProcess::launchExecutorProcess}} |
> | In container | {{Docker::run}} in a {{mesos-docker-executor}} process |
> This means a {{ContainerLogger}} will need to be loaded or hooked into the 
> {{mesos-docker-executor}}.  Or we will need to change how piping is done in 
> {{mesos-docker-executor}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4242) Allow Docker private registry credentials to be passed from framework

2015-12-22 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4242:
---

 Summary: Allow Docker private registry credentials to be passed 
from framework
 Key: MESOS-4242
 URL: https://issues.apache.org/jira/browse/MESOS-4242
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen


We want to support pulling images from a private docker registry in the Mesos 
containerizer, and need to allow users to provide credentials from the 
framework to pull from a private registry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4241) Consolidate docker store slave flags

2015-12-22 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4241:
---

 Summary: Consolidate docker store slave flags
 Key: MESOS-4241
 URL: https://issues.apache.org/jira/browse/MESOS-4241
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen


Currently there are too many slave flags for configuring the docker 
store/puller.
We can remove the following flags:

docker_auth_server_port
docker_local_archives_dir
docker_registry_port
docker_puller

And consolidate them into the existing flags.
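
Illustratively, the consolidation would let the agent express registry host, port, and auth endpoint through the existing URL-style flags rather than separate ones (flag shapes shown are illustrative, not final):

{noformat}
# Before (illustrative): host, port, and auth server split across flags.
--docker_registry=registry.example.com --docker_registry_port=443 --docker_auth_server_port=443

# After (illustrative): one URL-style flag carries scheme, host, and port.
--docker_registry=https://registry.example.com:443
{noformat}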



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4241) Consolidate docker store slave flags

2015-12-22 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4241:

Story Points: 3

> Consolidate docker store slave flags
> 
>
> Key: MESOS-4241
> URL: https://issues.apache.org/jira/browse/MESOS-4241
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Currently there are too many slave flags for configuring the docker 
> store/puller.
> We can remove the following flags:
> docker_auth_server_port
> docker_local_archives_dir
> docker_registry_port
> docker_puller
> And consolidate them into the existing flags.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2969) Support docker images discovery via Docker Registry v2 API

2015-12-21 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2969:

Fix Version/s: 0.26.0

> Support docker images discovery via Docker Registry v2 API
> --
>
> Key: MESOS-2969
> URL: https://issues.apache.org/jira/browse/MESOS-2969
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Jojy Varghese
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> We want to support discovering a docker image's URL and layer information by 
> talking to a configured docker registry that speaks the v2 API.
> This allows the docker store to pull and store the image and all its layers 
> based on the discovered information.
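
For reference, v2 discovery boils down to two request shapes (per the Docker Registry v2 API):

{noformat}
GET /v2/<name>/manifests/<reference>   # fetch the image manifest (lists layer digests)
GET /v2/<name>/blobs/<digest>          # fetch each layer blob by content digest
{noformat}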



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3259) Support health checks in Docker Containerizer

2015-12-21 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3259:

Fix Version/s: 0.26.0

> Support health checks in Docker Containerizer 
> --
>
> Key: MESOS-3259
> URL: https://issues.apache.org/jira/browse/MESOS-3259
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Jojy Varghese
> Fix For: 0.26.0
>
>
> We need to support running docker exec health checks inside the container, 
> driven by the docker executor.
> A health check is defined in a TaskInfo, but it is not yet supported in the 
> Docker Containerizer.
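
In practice this amounts to running the COMMAND check inside the task's container, roughly (illustrative command shape):

{noformat}
# Illustrative: run the TaskInfo's COMMAND health check inside the
# task's container instead of in the executor's own namespace.
docker exec <mesos-container-name> sh -c '<health check command>'
{noformat}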



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3096) Authentication for Communicating with Docker Registry

2015-12-21 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3096:

Fix Version/s: 0.26.0

> Authentication for Communicating with Docker Registry
> -
>
> Key: MESOS-3096
> URL: https://issues.apache.org/jira/browse/MESOS-3096
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Lily Chen
>Assignee: Jojy Varghese
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> In order to pull Docker images from Docker Hub and private Docker registries, 
> the provisioner must support two primary authentication frameworks to 
> authenticate with the registries: basic authentication and the OAuth 2.0 
> authorization framework, as per the docker registry spec. A Docker registry 
> can also operate in standalone mode and may not require authentication.
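
For reference, the Bearer-token leg of that spec looks roughly like this (illustrative endpoints):

{noformat}
# 1. Registry rejects the request and names its token server:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="https://auth.example.com/token",service="registry.example.com"

# 2. Client fetches a token, scoped to the repository:
GET https://auth.example.com/token?service=registry.example.com&scope=repository:library/busybox:pull
# -> {"token": "..."}

# 3. Client retries the registry request with:
Authorization: Bearer <token>
{noformat}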



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4138) Document proposal for exclusive resources in Mesos

2015-12-16 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060368#comment-15060368
 ] 

Timothy Chen commented on MESOS-4138:
-

The idea of exclusive resources is definitely needed! One question after 
reading the proposal: how do we offer exclusive resources like cpusets? 
Clearly frameworks need to know the difference between shares and sets, but 
it's not specified in the proposal.

> Document proposal for exclusive resources in Mesos
> --
>
> Key: MESOS-4138
> URL: https://issues.apache.org/jira/browse/MESOS-4138
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Ian Downes
>
> Propose the concept of exclusivity to resources. An exclusive resource is a) 
> not shared with any other task, b) employs stronger isolation for more 
> predictable performance, and c) is consequently not oversubscribed (if 
> enabled). In contrast to normal resources, exclusive resources have greater 
> resource priority while oversubscribed resources have lower priority. 
> Initial resources that could support the notion of exclusivity include cpu, 
> network egress bandwidth, and IP addresses.
> Please see this 
> [document|https://docs.google.com/document/d/1Aby-U3-MPKE51s4aYd41L4Co2S97eM6LPtyzjyR_ecI/edit?usp=sharing].
>  All comments welcome!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-03 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040436#comment-15040436
 ] 

Timothy Chen commented on MESOS-4024:
-

Ah, the test does take a long time to run, since it waits for the health check 
program's 5-second timeout to fire 3 times :(
Let me fix this to make it shorter.

> HealthCheckTest.CheckCommandTimeout is flaky.
> -
>
> Key: MESOS-4024
> URL: https://issues.apache.org/jira/browse/MESOS-4024
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Attachments: HealthCheckTest_CheckCommandTimeout.log
>
>
> {noformat: title=Failed Run}
> [ RUN  ] HealthCheckTest.CheckCommandTimeout
> I1201 13:03:15.211911 30288 leveldb.cpp:174] Opened db in 126.548747ms
> I1201 13:03:15.254041 30288 leveldb.cpp:181] Compacted db in 42.053948ms
> I1201 13:03:15.254226 30288 leveldb.cpp:196] Created db iterator in 25588ns
> I1201 13:03:15.254281 30288 leveldb.cpp:202] Seeked to beginning of db in 
> 3231ns
> I1201 13:03:15.254294 30288 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 256ns
> I1201 13:03:15.254348 30288 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 13:03:15.255162 30311 recover.cpp:447] Starting replica recovery
> I1201 13:03:15.255502 30311 recover.cpp:473] Replica is in EMPTY status
> I1201 13:03:15.257158 30311 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (1898)@172.17.21.0:52024
> I1201 13:03:15.258224 30318 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1201 13:03:15.259735 30310 recover.cpp:564] Updating replica status to 
> STARTING
> I1201 13:03:15.265080 30322 master.cpp:365] Master 
> dd5bff66-362f-4efc-963a-54756b2edcce (fa812f474cf4) started on 
> 172.17.21.0:52024
> I1201 13:03:15.265121 30322 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/IaRntP/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/IaRntP/master" --zk_session_timeout="10secs"
> I1201 13:03:15.265487 30322 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1201 13:03:15.265504 30322 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1201 13:03:15.265513 30322 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/IaRntP/credentials'
> I1201 13:03:15.265842 30322 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1201 13:03:15.266006 30322 master.cpp:493] Authorization enabled
> I1201 13:03:15.266464 30308 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1201 13:03:15.267225 30321 whitelist_watcher.cpp:77] No whitelist given
> I1201 13:03:15.268847 30322 master.cpp:1637] The newly elected leader is 
> master@172.17.21.0:52024 with id dd5bff66-362f-4efc-963a-54756b2edcce
> I1201 13:03:15.268887 30322 master.cpp:1650] Elected as the leading master!
> I1201 13:03:15.268905 30322 master.cpp:1395] Recovering from registrar
> I1201 13:03:15.270830 30322 registrar.cpp:307] Recovering registrar
> I1201 13:03:15.291272 30318 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 31.410668ms
> I1201 13:03:15.291363 30318 replica.cpp:321] Persisted replica status to 
> STARTING
> I1201 13:03:15.291733 30318 recover.cpp:473] Replica is in STARTING status
> I1201 13:03:15.293392 30318 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (1900)@172.17.21.0:52024
> I1201 13:03:15.294251 30307 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1201 13:03:15.294756 30307 recover.cpp:564] Updating replica status to VOTING
> I1201 13:03:15.338260 30307 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 43.256127ms
> I1201 13:03:15.338348 30307 replica.cpp:321] Persisted replica status to 
> VOTING
> I1201 13:03:15.338601 30307 recover.cpp:578] Successfully joined the Paxos 
> group
> I1201 13:03:15.338803 30307 recover.cpp:462] Recover process terminated
> 

[jira] [Commented] (MESOS-3527) HDFS HA fails outside of docker context

2015-11-30 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032505#comment-15032505
 ] 

Timothy Chen commented on MESOS-3527:
-

Hi Alan, the fetcher in Mesos has always been running outside of the docker 
container, since the fetcher itself depends on libmesos.

Since hdfs needs core-site.xml/hdfs config information, are those 
configurations available in the container you're running, or on the slave? 



> HDFS HA fails outside of docker context
> ---
>
> Key: MESOS-3527
> URL: https://issues.apache.org/jira/browse/MESOS-3527
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alan Braithwaite
>  Labels: fetcher, hdfs, mesosphere, spark
>
> I'm using Spark with the Mesos driver.
> When I pass in a `hdfs:///path` url in for the spark application, 
> the fetcher attempts to download the jar files outside the spark context (the 
> docker container in this case).  The problem is that the core-site.xml and 
> hdfs-site.xml configs exist inside the container.  The host machine does not 
> have the necessary hdfs configuration to connect to the HA cluster.
> Currently, I'm not sure what the alternative ways of accessing a HA hadoop 
> cluster besides through the hadoop client.
> {code}
> I0926 06:34:19.346851 18851 fetcher.cpp:214] Fetching URI 
> 'hdfs://hdfsha/tmp/spark-job.jar'
> I0926 06:34:19.622860 18851 fetcher.cpp:99] Fetching URI 
> 'hdfs://hdfsha/tmp/spark-job.jar' using Hadoop Client
> I0926 06:34:19.622936 18851 fetcher.cpp:109] Downloading resource from 
> 'hdfs://hdfsha/tmp/spark-job.jar' to 
> '/state/var/lib/mesos/slaves/20150602-065056-269165578-5050-17724-S12/frameworks/20150914-102037-285942794-5050-31214-0029/executors/driver-20150926063418-0002/runs/9953ae1b-9387-489f-8645-5472d9c5eacf/spark-job.jar'
> E0926 06:34:20.814858 18851 fetcher.cpp:113] HDFS copyToLocal failed: 
> /usr/local/hadoop/bin/hadoop fs -copyToLocal 
> 'hdfs://hdfsha/tmp/spark-job.jar' 
> '/state/var/lib/mesos/slaves/20150602-065056-269165578-5050-17724-S12/frameworks/20150914-102037-285942794-5050-31214-0029/executors/driver-20150926063418-0002/runs/9953ae1b-9387-489f-8645-5472d9c5eacf/spark-job.jar'
> -copyToLocal: java.net.UnknownHostException: hdfsha
> Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
>  ... 
> Failed to fetch: hdfs://hdfsha/tmp/spark-job.jar
> {code}
> The code in question:
> https://github.com/apache/mesos/blob/fbb12a52969710fe69c309c83db0a5441dbea886/src/launcher/fetcher.cpp#L92-L114



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-25 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026628#comment-15026628
 ] 

Timothy Chen commented on MESOS-3937:
-

I can't reproduce this with the phusion/ubuntu-14.04-amd64 vagrant image.
example_executor.go is open source in the mesos-go repo (mesos/mesos-go).

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 

[jira] [Commented] (MESOS-3966) LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on Centos 7.1

2015-11-24 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025942#comment-15025942
 ] 

Timothy Chen commented on MESOS-3966:
-

Bind doesn't work yet, since we're not able to create a sandbox directory 
after it's bind-mounted.
So we need to modify the code to make it work.

> LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on 
> Centos 7.1
> 
>
> Key: MESOS-3966
> URL: https://issues.apache.org/jira/browse/MESOS-3966
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: centos 7.1, gcc 4.8.3, docker 1.8.2
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem
> I1120 11:39:37.862926 29944 linux.cpp:82] Making 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E'
>  a shared mount
> I1120 11:39:37.876965 29944 linux_launcher.cpp:103] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I1120 11:39:37.930881 29944 systemd.cpp:128] systemd version `208` detected
> W1120 11:39:37.930913 29944 systemd.cpp:136] Required functionality 
> `Delegate` was introduced in Version `218`. Your system may not function 
> properly; however since some distributions have patched systemd packages, 
> your system may still be functional. This is why we keep running. See 
> MESOS-3352 for more information
> I1120 11:39:37.938351 29944 systemd.cpp:210] Started systemd slice 
> `mesos_executors.slice`
> I1120 11:39:37.940218 29962 containerizer.cpp:618] Starting container 
> '1ea741a9-5edf-4910-ae64-f8d53f74e31e' for executor 'test_executor' of 
> framework ''
> I1120 11:39:37.943042 29959 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/7d97f8ac-ee57-4c83-b2d1-4332e25c89ae'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> I1120 11:39:49.571781 29958 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/0256b892-e737-4d3d-89ea-74cf0e96eaf6'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:806: Failure
> Failed to wait 15secs for launch
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem 
> (55076 ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (55076 ms total)
> {noformat}
> The following vagrant generator was used:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo 
> -O /etc/yum.repos.d/epel-apache-maven.repo
>  sudo yum groupinstall -y "Development Tools"
>  sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel 
> zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 
> apr-devel subversion-devel apr-util-devel
>  sudo yum install -y git
>  sudo yum install -y docker
>  sudo service docker start
>  sudo docker info
>  #sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> vagrant up
> vagrant reload
> vagrant ssh -c "
> git clone  https://github.com/apache/mesos.git mesos
> cd mesos
> git checkout -b 0.26.0-rc1 0.26.0-rc1
> ./bootstrap
> mkdir build
> cd build
> ../configure
> make -j4 check
> #make -j4 distcheck
> sudo ./bin/mesos-tests.sh
> #make clean
> #../configure --enable-libevent --enable-ssl
> #GTEST_FILTER="" make check
> #sudo ./bin/mesos-tests.sh
> "
> {noformat}
> Additionally, {{/etc/hosts}} was edited to contain hostname and IP (allowing 
> a pass of the bridged docker executor tests).
> {noformat}
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> 192.168.218.135 centos71
> {noformat}



--
This message was sent by Atlassian JIRA

[jira] [Created] (MESOS-4004) Support default entrypoint and command runtime config in Mesos containerizer

2015-11-24 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4004:
---

 Summary: Support default entrypoint and command runtime config in 
Mesos containerizer
 Key: MESOS-4004
 URL: https://issues.apache.org/jira/browse/MESOS-4004
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen


We need to use the entrypoint and command runtime configuration returned from 
the image in the Mesos containerizer.
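
For context, the usual Docker semantics this implies (illustrative, following the Dockerfile reference): the image's {{Entrypoint}} is the argv prefix, and {{Cmd}} supplies default arguments that a task-provided command overrides:

{noformat}
Entrypoint: ["/bin/server"]   Cmd: ["--port=8080"]

task provides no arguments  ->  /bin/server --port=8080
task provides "--port=9090" ->  /bin/server --port=9090   (Cmd is overridden)
{noformat}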



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4005) Support workdir runtime configuration from image

2015-11-24 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-4005:
---

 Summary: Support workdir runtime configuration from image 
 Key: MESOS-4005
 URL: https://issues.apache.org/jira/browse/MESOS-4005
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen


We need to support the workdir runtime configuration returned from an image, 
such as from a Dockerfile.
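
For context, this is the {{WORKDIR}} directive from the image config; illustratively:

{noformat}
# Dockerfile (illustrative)
WORKDIR /app    # processes in the container should start with cwd=/app
{noformat}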



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3966) LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on Centos 7.1

2015-11-24 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025765#comment-15025765
 ] 

Timothy Chen commented on MESOS-3966:
-

commit c8ff94e557902fcd23b388c83bfe2692a71ae787
Author: haosdent huang 
Date:   Tue Nov 24 14:18:32 2015 -0800

Increased launch and wait timeout in filesystem test.

Review: https://reviews.apache.org/r/40641

> LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on 
> Centos 7.1
> 
>
> Key: MESOS-3966
> URL: https://issues.apache.org/jira/browse/MESOS-3966
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: centos 7.1, gcc 4.8.3, docker 1.8.2
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem
> I1120 11:39:37.862926 29944 linux.cpp:82] Making 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E'
>  a shared mount
> I1120 11:39:37.876965 29944 linux_launcher.cpp:103] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I1120 11:39:37.930881 29944 systemd.cpp:128] systemd version `208` detected
> W1120 11:39:37.930913 29944 systemd.cpp:136] Required functionality 
> `Delegate` was introduced in Version `218`. Your system may not function 
> properly; however since some distributions have patched systemd packages, 
> your system may still be functional. This is why we keep running. See 
> MESOS-3352 for more information
> I1120 11:39:37.938351 29944 systemd.cpp:210] Started systemd slice 
> `mesos_executors.slice`
> I1120 11:39:37.940218 29962 containerizer.cpp:618] Starting container 
> '1ea741a9-5edf-4910-ae64-f8d53f74e31e' for executor 'test_executor' of 
> framework ''
> I1120 11:39:37.943042 29959 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/7d97f8ac-ee57-4c83-b2d1-4332e25c89ae'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> I1120 11:39:49.571781 29958 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/0256b892-e737-4d3d-89ea-74cf0e96eaf6'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:806: Failure
> Failed to wait 15secs for launch
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem 
> (55076 ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (55076 ms total)
> {noformat}
> The following vagrant generator was used:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo 
> -O /etc/yum.repos.d/epel-apache-maven.repo
>  sudo yum groupinstall -y "Development Tools"
>  sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel 
> zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 
> apr-devel subversion-devel apr-util-devel
>  sudo yum install -y git
>  sudo yum install -y docker
>  sudo service docker start
>  sudo docker info
>  #sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> vagrant up
> vagrant reload
> vagrant ssh -c "
> git clone  https://github.com/apache/mesos.git mesos
> cd mesos
> git checkout -b 0.26.0-rc1 0.26.0-rc1
> ./bootstrap
> mkdir build
> cd build
> ../configure
> make -j4 check
> #make -j4 distcheck
> sudo ./bin/mesos-tests.sh
> #make clean
> #../configure --enable-libevent --enable-ssl
> #GTEST_FILTER="" make check
> #sudo ./bin/mesos-tests.sh
> "
> {noformat}
> Additionally, {{/etc/hosts}} was edited to contain hostname and IP (allowing 
> a pass of the bridged docker executor tests).
> {noformat}
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 

[jira] [Commented] (MESOS-3966) LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on Centos 7.1

2015-11-24 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025550#comment-15025550
 ] 

Timothy Chen commented on MESOS-3966:
-

I'll be merging Haosdent's fix, but I think we should see how we can get bind 
to work with our tests so we don't copy /usr all the time.

> LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem fails on 
> Centos 7.1
> 
>
> Key: MESOS-3966
> URL: https://issues.apache.org/jira/browse/MESOS-3966
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: centos 7.1, gcc 4.8.3, docker 1.8.2
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem
> I1120 11:39:37.862926 29944 linux.cpp:82] Making 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E'
>  a shared mount
> I1120 11:39:37.876965 29944 linux_launcher.cpp:103] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I1120 11:39:37.930881 29944 systemd.cpp:128] systemd version `208` detected
> W1120 11:39:37.930913 29944 systemd.cpp:136] Required functionality 
> `Delegate` was introduced in Version `218`. Your system may not function 
> properly; however since some distributions have patched systemd packages, 
> your system may still be functional. This is why we keep running. See 
> MESOS-3352 for more information
> I1120 11:39:37.938351 29944 systemd.cpp:210] Started systemd slice 
> `mesos_executors.slice`
> I1120 11:39:37.940218 29962 containerizer.cpp:618] Starting container 
> '1ea741a9-5edf-4910-ae64-f8d53f74e31e' for executor 'test_executor' of 
> framework ''
> I1120 11:39:37.943042 29959 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/7d97f8ac-ee57-4c83-b2d1-4332e25c89ae'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> I1120 11:39:49.571781 29958 provisioner.cpp:289] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_ZBw23E/provisioner/containers/1ea741a9-5edf-4910-ae64-f8d53f74e31e/backends/copy/rootfses/0256b892-e737-4d3d-89ea-74cf0e96eaf6'
>  for container 1ea741a9-5edf-4910-ae64-f8d53f74e31e
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:806: Failure
> Failed to wait 15secs for launch
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem 
> (55076 ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (55076 ms total)
> {noformat}
> The following vagrant generator was used:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo 
> -O /etc/yum.repos.d/epel-apache-maven.repo
>  sudo yum groupinstall -y "Development Tools"
>  sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel 
> zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 
> apr-devel subversion-devel apr-util-devel
>  sudo yum install -y git
>  sudo yum install -y docker
>  sudo service docker start
>  sudo docker info
>  #sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> vagrant up
> vagrant reload
> vagrant ssh -c "
> git clone  https://github.com/apache/mesos.git mesos
> cd mesos
> git checkout -b 0.26.0-rc1 0.26.0-rc1
> ./bootstrap
> mkdir build
> cd build
> ../configure
> make -j4 check
> #make -j4 distcheck
> sudo ./bin/mesos-tests.sh
> #make clean
> #../configure --enable-libevent --enable-ssl
> #GTEST_FILTER="" make check
> #sudo ./bin/mesos-tests.sh
> "
> {noformat}
> Additionally, {{/etc/hosts}} was edited to contain hostname and IP (allowing 
> a pass of the bridged docker executor tests).
> {noformat}
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> 192.168.218.135 centos71
> {noformat}



--
This message was sent by Atlassian JIRA

[jira] [Updated] (MESOS-2980) Allow runtime configuration to be returned from provisioner

2015-11-24 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2980:

Summary: Allow runtime configuration to be returned from provisioner  (was: 
Support execution configuration to be returned from provisioner)

> Allow runtime configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.
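
A rough sketch of the shape this suggests (hypothetical types; not the actual interface): the provisioner returns the image's runtime config alongside the rootfs path, instead of the path alone:

{code}
#include <map>
#include <string>
#include <vector>

#include <mesos/mesos.hpp>

#include <process/future.hpp>

#include <stout/option.hpp>

// Hypothetical sketch: the runtime configuration carried alongside the
// provisioned rootfs (names and fields are assumptions, not the real API).
struct RuntimeConfig
{
  std::map<std::string, std::string> env;
  Option<std::string> user;
  std::vector<std::string> entrypoint;
};

struct ProvisionInfo
{
  std::string rootfs;
  Option<RuntimeConfig> config;
};

// The provisioner would return both, instead of just the rootfs path.
process::Future<ProvisionInfo> provision(
    const mesos::ContainerID& containerId,
    const mesos::Image& image);
{code}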



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2980) Allow runtime configuration to be returned from provisioner

2015-11-24 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2980:

Assignee: Gilbert Song  (was: Timothy Chen)

> Allow runtime configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3953) DockerTest.ROOT_DOCKER_CheckPortResource fails.

2015-11-23 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022835#comment-15022835
 ] 

Timothy Chen commented on MESOS-3953:
-

I can't reproduce this on a CentOS box either. Till, where are you running 
this test? And is there anything else running on the VM?

> DockerTest.ROOT_DOCKER_CheckPortResource fails.
> ---
>
> Key: MESOS-3953
> URL: https://issues.apache.org/jira/browse/MESOS-3953
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS Linux release 7.1.1503 (Core),
> gcc (GCC) 4.8.3,
> Docker version 1.9.0, build 76d6bc9
>Reporter: Till Toenshoff
>Assignee: Timothy Chen
>
> The following is happening on my CentOS 7 installation (100% reproducible).
> {noformat}
> [ RUN  ] DockerTest.ROOT_DOCKER_CheckPortResource
> I1118 08:18:50.336110 20979 docker.cpp:684] Running docker -H 
> unix:///var/run/docker.sock rm -f -v mesos-docker-port-resource-test
> I1118 08:18:50.413763 20979 resources.cpp:474] Parsing resources as JSON 
> failed: ports:[9998-];ports:[10001-11000]
> Trying semicolon-delimited string format instead
> I1118 08:18:50.414670 20979 resources.cpp:474] Parsing resources as JSON 
> failed: ports:[9998-];ports:[1-11000]
> Trying semicolon-delimited string format instead
> I1118 08:18:50.415073 20979 docker.cpp:564] Running docker -H 
> unix:///var/run/docker.sock run -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-docker-port-resource-test -v 
> /tmp/DockerTest_ROOT_DOCKER_CheckPortResource_4e34OB:/mnt/mesos/sandbox --net 
> bridge -p 1:80 --name mesos-docker-port-resource-test busybox true
> ../../src/tests/containerizer/docker_tests.cpp:338: Failure
> (run).failure(): Container exited on error: exited with status 1
> I1118 08:18:50.717136 20979 docker.cpp:842] Running docker -H 
> unix:///var/run/docker.sock ps -a
> I1118 08:18:50.819042 20999 docker.cpp:723] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-docker-port-resource-test
> I1118 08:18:50.924579 20979 docker.cpp:684] Running docker -H 
> unix:///var/run/docker.sock rm -f -v 
> 67781b79c7641a6450c3ddb4ba13112b6f5a50060eac3f65cac3ad57a2a527ea
> [  FAILED  ] DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2980) Support execution configuration to be returned from provisioner

2015-11-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-2980:
---

Assignee: Timothy Chen

> Support execution configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2980) Support execution configuration to be returned from provisioner

2015-11-23 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023191#comment-15023191
 ] 

Timothy Chen commented on MESOS-2980:
-

But the image information comes from the provisioner, so we need to get it 
back along with the execution config and pass it on to another isolator if we 
want to do that. 

> Support execution configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3999) Add Mesos Provisioner user documentation

2015-11-23 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-3999:
---

 Summary: Add Mesos Provisioner user documentation
 Key: MESOS-3999
 URL: https://issues.apache.org/jira/browse/MESOS-3999
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Timothy Chen


Add user documentation around Mesos provisioner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3479) COMMAND Health Checks are not executed if the timeout is exceeded

2015-11-21 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020616#comment-15020616
 ] 

Timothy Chen edited comment on MESOS-3479 at 11/21/15 6:54 PM:
---

commit 2c51de63eb2b50511afa95e375fbcb2a4ba4bdec
Author: haosdent huang 
Date:   Sat Nov 21 10:50:45 2015 -0800

Fixed health check external command process after timeout.

Review: https://reviews.apache.org/r/38932



was (Author: tnachen):
https://reviews.apache.org/r/38932

> COMMAND Health Checks are not executed if the timeout is exceeded
> -
>
> Key: MESOS-3479
> URL: https://issues.apache.org/jira/browse/MESOS-3479
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Matthias Veit
>Assignee: haosdent
>Priority: Critical
> Fix For: 0.27.0
>
>
> The issue first appeared as Marathon Bug: See here for reference: 
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior: 
> - The mesos health check process gets killed, but the defined command 
> process does not (in the example the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded
> - The health check stops and is not executed any longer
> Expected behavior: 
> - The defined health check command is killed when the timeout is exceeded
> - The check attempt is considered Unhealthy when the timeout is exceeded
> - The health check does not stop 
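
The crux of the expected behavior is that the timeout must kill the checked command's whole process tree, not just the health-check wrapper. A minimal sketch, assuming the command was launched as its own POSIX process group leader:

{code}
#include <signal.h>
#include <unistd.h>

// Minimal sketch: if the health check command was started in its own
// process group (e.g. via setsid()), a negative pid kills the entire
// group, so a long-running child like curl dies with the wrapper.
void killCheckProcessGroup(pid_t leader)
{
  kill(-leader, SIGKILL);
}
{code}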



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3479) COMMAND Health Checks are not executed if the timeout is exceeded

2015-11-21 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3479:

Fix Version/s: 0.27.0

> COMMAND Health Checks are not executed if the timeout is exceeded
> -
>
> Key: MESOS-3479
> URL: https://issues.apache.org/jira/browse/MESOS-3479
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Matthias Veit
>Assignee: haosdent
>Priority: Critical
> Fix For: 0.27.0
>
>
> The issue first appeared as Marathon Bug: See here for reference: 
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior: 
> - The mesos health check process gets killed, but the defined command 
> process does not (in the example the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded
> - The health check stops and is not executed any longer
> Expected behavior: 
> - The defined health check command is killed when the timeout is exceeded
> - The check attempt is considered Unhealthy when the timeout is exceeded
> - The health check does not stop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3599) COMMAND health checks with marathon running in slave context broken

2015-11-21 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020885#comment-15020885
 ] 

Timothy Chen commented on MESOS-3599:
-

Sorry for joining this thread and the related issues pretty late; I didn't see 
all these discussions going on :)

I think my first question is that we didn't use to have any health checks with 
the Docker Containerizer, and when we added them they have always run inside of 
the docker executor (thanks to Haosdent!). So I'm not really sure why moving 
back to 0.22.1 makes it work, since your health check command is pretty much 
ignored with the Docker Containerizer.

Now, the ability to run a health check outside of the container makes sense. 
I'm favoring the first option Haosdent proposed, since the ability to run 
inside of the container is also very important for dependencies and lots of 
other reasons. We can ignore this option for http/tcp checks.

> COMMAND health checks with marathon running in slave context broken
> ---
>
> Key: MESOS-3599
> URL: https://issues.apache.org/jira/browse/MESOS-3599
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Erhan Kesken
>Assignee: haosdent
>Priority: Critical
>
> When deploying Mesos 0.23rc4 with the latest Marathon 0.10.0 RC3, command 
> health checks stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> Containerizer is Docker.
> All packages are from official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3786) Backticks are not mentioned in Mesos C++ Style Guide

2015-11-20 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019239#comment-15019239
 ] 

Timothy Chen commented on MESOS-3786:
-

commit 7e81e50dfac7309b9d75040c353e40ebed2f2ae0
Author: Greg Mann 
Date:   Fri Nov 20 17:04:49 2015 -0800

Added backtick usage in comments to the C++ style guide.

Review: https://reviews.apache.org/r/40367

> Backticks are not mentioned in Mesos C++ Style Guide
> 
>
> Key: MESOS-3786
> URL: https://issues.apache.org/jira/browse/MESOS-3786
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Minor
>  Labels: documentation, mesosphere
>
> As far as I can tell, current practice is to quote code excerpts and object 
> names with backticks when writing comments. For example:
> {code}
> // You know, `sadPanda` seems extra sad lately.
> std::string sadPanda;
> sadPanda = "   :'(   ";
> {code}
> However, I don't see this documented in our C++ style guide at all. It should 
> be added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3123) DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged fails & crashes

2015-11-19 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014985#comment-15014985
 ] 

Timothy Chen commented on MESOS-3123:
-

commit d78db0658f36c0afea8db04c148ea82fdbfc6d7b
Author: Timothy Chen 
Date:   Thu Nov 19 15:59:36 2015 -0800

Disabled docker bridge executor test.

Review: https://reviews.apache.org/r/40511

> DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged fails & crashes
> ---
>
> Key: MESOS-3123
> URL: https://issues.apache.org/jira/browse/MESOS-3123
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, test
>Affects Versions: 0.23.0
> Environment: CentOS 7.1, CentOS 6.6, or Ubuntu 14.04
> Mesos 0.23.0-rc4 or today's master
> Docker 1.9
>Reporter: Adam B
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Fails the test and then crashes while trying to shutdown the slaves.
> {code}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged
> ../../src/tests/docker_containerizer_tests.cpp:618: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_LOST
> Expected: TASK_RUNNING
> ../../src/tests/docker_containerizer_tests.cpp:619: Failure
> Failed to wait 1mins for statusFinished
> ../../src/tests/docker_containerizer_tests.cpp:610: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> F0721 21:59:54.950773 30622 logging.cpp:57] RAW: Pure virtual method called
> @ 0x7f3915347a02  google::LogMessage::Fail()
> @ 0x7f391534cee4  google::RawLog__()
> @ 0x7f3914890312  __cxa_pure_virtual
> @   0x88c3ae  mesos::internal::tests::Cluster::Slaves::shutdown()
> @   0x88c176  mesos::internal::tests::Cluster::Slaves::~Slaves()
> @   0x88dc16  mesos::internal::tests::Cluster::~Cluster()
> @   0x88dc87  mesos::internal::tests::MesosTest::~MesosTest()
> @   0xa529ab  
> mesos::internal::tests::DockerContainerizerTest::~DockerContainerizerTest()
> @   0xa8125f  
> mesos::internal::tests::DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_Bridged_Test::~DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_Bridged_Test()
> @   0xa8128e  
> mesos::internal::tests::DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_Bridged_Test::~DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_Bridged_Test()
> @  0x1218b4e  testing::Test::DeleteSelf_()
> @  0x1221909  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x121cb38  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x1205713  testing::TestInfo::Run()
> @  0x1205c4e  testing::TestCase::Run()
> @  0x120a9ca  testing::internal::UnitTestImpl::RunAllTests()
> @  0x122277b  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x121d81b  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x120987a  testing::UnitTest::Run()
> @   0xcfbf0c  main
> @ 0x7f391097caf5  __libc_start_main
> @   0x882089  (unknown)
> make[3]: *** [check-local] Aborted (core dumped)
> make[3]: Leaving directory `/home/me/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/home/me/mesos/build/src'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory `/home/me/mesos/build/src'
> make: *** [check-recursive] Error 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3941) Using 'async' to invoke blocking functions will block work threads.

2015-11-18 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011596#comment-15011596
 ] 

Timothy Chen commented on MESOS-3941:
-

Also, it's a known issue that overloaded http calls can take over Mesos as 
well; perhaps we should rethink how we schedule our workloads in libprocess.

> Using 'async' to invoke blocking functions will block work threads.
> ---
>
> Key: MESOS-3941
> URL: https://issues.apache.org/jira/browse/MESOS-3941
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>
> Saw a few occurrences in the code base. For instance:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/fetcher.cpp#L382
> The current implementation of 'async' will create a libprocess process. That 
> means if the function is blocking, a worker thread will be blocked. If we 
> have too many such instances, we might end up deadlocking.
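
A language-level illustration of the failure mode (plain C++, not libprocess
code): a blocking call scheduled onto a shared, fixed-size worker pool occupies
a worker for its full duration, so enough of them starve the pool; genuinely
blocking work should instead run on a dedicated thread:

{code}
#include <chrono>
#include <future>
#include <thread>

// A blocking "fetch" standing in for e.g. the fetcher's download call.
int blockingFetch()
{
  std::this_thread::sleep_for(std::chrono::minutes(10)); // blocks a thread
  return 0;
}

int main()
{
  // Anti-pattern: dispatching blockingFetch() onto a shared, fixed-size
  // worker pool ties up one worker for the full ten minutes; with enough
  // of these, every worker is blocked and nothing else can run.

  // Mitigation: give blocking work its own thread so the shared workers
  // stay free for short, non-blocking tasks.
  std::future<int> result = std::async(std::launch::async, blockingFetch);

  // ... do other, non-blocking work here ...

  return result.get();
}
{code}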



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously

2015-11-18 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012470#comment-15012470
 ] 

Timothy Chen commented on MESOS-3736:
-

commit 2a15af346f1ccdb4575f3340d21f06a3d601870b
Author: Gilbert Song 
Date:   Tue Nov 17 23:06:31 2015 +

Added support in docker local store to pull same image simultaneously.

Review: https://reviews.apache.org/r/39331

> Support docker local store pull same image simultaneously 
> --
>
> Key: MESOS-3736
> URL: https://issues.apache.org/jira/browse/MESOS-3736
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> The current local store implements get() using the local puller. When 
> multiple requests pull the same docker image at the same time, the local 
> puller untars the image tarball once per request and copies each result to 
> the same directory, which wastes time and is computationally expensive. The 
> local store/puller should only do this work for the first request; the 
> simultaneous pulling requests should wait on the promised future and get the 
> result once the first pull finishes. 
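
A minimal sketch of that idea in plain C++ (hypothetical, not the actual store
code): keep one shared future per image name, so the first request does the
untar/copy work and every concurrent request for the same image waits on the
same result:

{code}
#include <future>
#include <map>
#include <mutex>
#include <string>

// Deduplicates concurrent pulls: the first get() for an image starts the
// work; later get()s for the same image share the in-flight future.
class ImageStore
{
public:
  std::shared_future<std::string> get(const std::string& image)
  {
    std::lock_guard<std::mutex> lock(mutex);

    auto it = pulls.find(image);
    if (it != pulls.end()) {
      return it->second; // A pull is already in flight (or done): share it.
    }

    std::shared_future<std::string> future =
      std::async(std::launch::async, [image]() {
        // Placeholder for the real work: untar the image tarball and copy
        // it into the store directory, exactly once per image.
        return "/store/" + image;
      }).share();

    pulls.emplace(image, future);
    return future;
  }

private:
  std::mutex mutex;
  std::map<std::string, std::shared_future<std::string>> pulls;
};
{code}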



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3928) ROOT tests fail on Mesos 0.26 on Ubuntu/CentOS

2015-11-17 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009572#comment-15009572
 ] 

Timothy Chen commented on MESOS-3928:
-

The issue I'm seeing so far in the bridge test is that the IP of the Ubuntu 
14.04 host is populated in /etc/hosts as 127.0.1.1, and this is the slave IP 
that the docker container running the custom executor in bridge mode will try 
to reach. I wasn't able to repro on a GCE Ubuntu instance, since it doesn't 
populate /etc/hosts for the hostname, and there the lookup returned the IP of 
the eth0 interface.
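
For reference, a lookup equivalent to what the slave effectively performs (a
sketch, not Mesos code) shows the problem: with the stock Ubuntu 14.04
/etc/hosts entry, the hostname resolves to 127.0.1.1, a loopback address that a
bridged container cannot reach:

{code}
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <unistd.h>

#include <cstdio>

int main()
{
  char hostname[256];
  gethostname(hostname, sizeof(hostname));

  addrinfo hints{};
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* result = nullptr;
  if (getaddrinfo(hostname, nullptr, &hints, &result) != 0) {
    return 1;
  }

  for (addrinfo* ai = result; ai != nullptr; ai = ai->ai_next) {
    char ip[INET_ADDRSTRLEN];
    inet_ntop(
        AF_INET,
        &((sockaddr_in*) ai->ai_addr)->sin_addr,
        ip,
        sizeof(ip));
    // On an affected Ubuntu 14.04 host this prints 127.0.1.1.
    printf("%s resolves to %s\n", hostname, ip);
  }

  freeaddrinfo(result);
  return 0;
}
{code}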

> ROOT tests fail on Mesos 0.26 on Ubuntu/CentOS
> --
>
> Key: MESOS-3928
> URL: https://issues.apache.org/jira/browse/MESOS-3928
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Marco Massenzio
>Assignee: Bernd Mathiske
> Attachments: ROOT-tests-centos-7.1.log, ROOT-tests-ubuntu-14.04.log
>
>
> Running {{0.26.0-rc1}} on both CentOS 7.1 and Ubuntu 14.04 with {{sudo}} 
> privileges causes segfaults when running Docker tests.
> Logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

