[jira] [Commented] (MESOS-7324) Update documentation to reflect the addition of multi-role framework support.

2017-03-29 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948228#comment-15948228
 ] 

Michael Park commented on MESOS-7324:
-

Okay!

> Update documentation to reflect the addition of multi-role framework support.
> -
>
> Key: MESOS-7324
> URL: https://issues.apache.org/jira/browse/MESOS-7324
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> The current documentation assumes single-role frameworks; we need to update 
> the documentation to reflect the support for subscribing to multiple roles.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7324) Update documentation to reflect the addition of multi-role framework support.

2017-03-29 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7324:
--

 Summary: Update documentation to reflect the addition of 
multi-role framework support.
 Key: MESOS-7324
 URL: https://issues.apache.org/jira/browse/MESOS-7324
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


The current documentation assumes single-role frameworks; we need to update the 
documentation to reflect the support for subscribing to multiple roles.





[jira] [Assigned] (MESOS-6762) Update release notes for multi-role changes

2017-03-29 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6762:
--

Assignee: Benjamin Mahler

> Update release notes for multi-role changes
> ---
>
> Key: MESOS-6762
> URL: https://issues.apache.org/jira/browse/MESOS-6762
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Benjamin Mahler
>
> When adding support for multi-role frameworks we should call out a number of 
> potential issues in the changelog/release notes.
> This ticket collects potential pitfalls.
> h6. Changes in master and agent endpoints
> When rendering the {{FrameworkInfo}} of multi-role enabled frameworks in 
> master or agent endpoints the {{role}} field will not be supported anymore; 
> instead the {{roles}} field should be used. Any tooling parsing endpoint 
> information and relying on the {{role}} field needs to be updated before 
> multi-role enabled frameworks can be run in the cluster.
> h6. Changes to the allocator interface / implementation requirements for module implementors
> Implementors of allocator modules have to provide new implementation 
> functionality to satisfy the MULTI_ROLE framework capability. Also, the 
> interface has changed.
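
For tooling that parses endpoint output, the migration described above could look roughly like the following. This is a minimal sketch, not Mesos code: it assumes a framework entry has already been parsed from endpoint JSON into a dict, that multi-role frameworks carry a "roles" list, and that older single-role frameworks carry a "role" string. The helper name framework_roles is hypothetical.

```python
def framework_roles(framework_info):
    """Return the roles of a framework entry parsed from endpoint JSON.

    Assumption: multi-role frameworks expose a "roles" list, while
    single-role frameworks expose a "role" string.
    """
    if "roles" in framework_info:
        return list(framework_info["roles"])
    if "role" in framework_info:
        return [framework_info["role"]]
    # Neither field present: report no roles rather than guessing.
    return []
```

Reading "roles" first and falling back to "role" lets the same tool work against both kinds of frameworks while a cluster is being migrated.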





[jira] [Assigned] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7183:
-

Assignee: (was: Deshi Xiao)

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> The key message is: Failed to enter the net namespace of task (pid: 
> '22392'): Pid 22392 does not exist
> See the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] 

[jira] [Updated] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7183:
--
Affects Version/s: (was: 1.1.0)
   1.1.1


[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948119#comment-15948119
 ] 

Jie Yu commented on MESOS-7183:
---

For posterity, the root cause of this problem is that when the agent is running 
inside a Docker container and the `--docker_mesos_image` flag is specified, the pid 
namespace of the executor container (which initiates the health check) is 
different from the root pid namespace. Therefore, getting the network namespace 
handle using `/proc/<pid>/ns/net` does not work because the 'pid' here is in 
the root pid namespace (reported by the Docker daemon).
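
The failure mode can be illustrated with a small sketch. It is not the health checker's actual code: the helper names and the proc_root parameter are hypothetical, and the sketch only checks whether the namespace handle is visible in a given view of /proc.

```python
import os


def net_ns_path(pid, proc_root="/proc"):
    """Build the path of the network namespace handle for `pid`."""
    return f"{proc_root}/{pid}/ns/net"


def net_ns_handle_visible(pid, proc_root="/proc"):
    """Report whether the handle for `pid` exists in this view of /proc.

    A pid taken from the root pid namespace is generally not meaningful
    inside a different pid namespace, so a lookup like this can fail
    there even though the task itself is running.
    """
    return os.path.exists(net_ns_path(pid, proc_root))
```

With the pid from the report, net_ns_path(22392) yields /proc/22392/ns/net, which is the kind of path that cannot be resolved from inside the executor container's pid namespace.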

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Deshi Xiao
>Assignee: Deshi Xiao
> Attachments: stderr
>
>

[jira] [Commented] (MESOS-6762) Update release notes for multi-role changes

2017-03-29 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947997#comment-15947997
 ] 

Benjamin Mahler commented on MESOS-6762:


CHANGELOG update:

{noformat}
commit 10d7988ee5948bc45518e7c1c339a371c4bf151f
Author: Benjamin Mahler 
Date:   Thu Mar 16 15:33:56 2017 -0700

Added multi-role framework support to the CHANGELOG.

Review: https://reviews.apache.org/r/57707
{noformat}

Will close once the additional documentation described in this ticket is added.

> Update release notes for multi-role changes
> ---
>
> Key: MESOS-6762
> URL: https://issues.apache.org/jira/browse/MESOS-6762
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>
> When adding support for multi-role frameworks we should call out a number of 
> potential issues in the changelog/release notes.
> This ticket collects potential pitfalls.
> h6. Changes in master and agent endpoints
> When rendering the {{FrameworkInfo}} of multi-role enabled frameworks in 
> master or agent endpoints the {{role}} field will not be supported anymore; 
> instead the {{roles}} field should be used. Any tooling parsing endpoint 
> information and relying on the {{role}} field needs to be updated before 
> multi-role enabled frameworks can be run in the cluster.
> h6. Changes to the allocator interface / implementation requirements for module implementors
> Implementors of allocator modules have to provide new implementation 
> functionality to satisfy the MULTI_ROLE framework capability. Also, the 
> interface has changed.





[jira] [Commented] (MESOS-3875) Account dynamic reservations towards quota.

2017-03-29 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947990#comment-15947990
 ] 

Benjamin Mahler commented on MESOS-3875:


I think the situation I'm describing is addressed by this ticket, since I'm 
referring to quota allocation not accounting for reserved resources and hence 
enabling gaming. Unless this is prevented already?

MESOS-3338 is fairly vague. It could use some clarification since it seems to 
be referring only to endpoints (and I'm not sure the suggestion of MESOS-3338 
is the right thing to do as far as the endpoints are concerned). Is it 
addressing the fair sharing side of the reservation gaming?

> Account dynamic reservations towards quota.
> ---
>
> Key: MESOS-3875
> URL: https://issues.apache.org/jira/browse/MESOS-3875
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Dynamic reservations—whether allocated or not—should be accounted towards 
> a role's quota. This requires updates in at least two places:
> * The built-in allocator, which actually satisfies quota;
> * The sanity check in the master.
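
The accounting proposed above can be sketched as simple arithmetic. This is an illustration only, assuming scalar resource amounts; the function and parameter names are hypothetical and do not come from the Mesos allocator.

```python
def quota_headroom(quota, reserved, allocated_unreserved):
    """Return how much of a role's quota is still unsatisfied.

    `reserved` counts all dynamic reservations for the role, whether or
    not they are currently allocated; `allocated_unreserved` counts
    unreserved resources allocated to the role.
    """
    charged = reserved + allocated_unreserved
    # Never report negative headroom when the role is over its quota.
    return max(quota - charged, 0.0)
```

Charging unallocated reservations as well is what closes the gap described in the comment above: a role cannot hold reservations outside its quota and still be handed the full quota on top.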





[jira] [Updated] (MESOS-7323) Framework role tracking in allocator results in framework treated as active incorrectly.

2017-03-29 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7323:
---
Target Version/s: 1.3.0
Priority: Critical  (was: Major)
 Description: When an agent is added to the allocator and there are 
resources allocated to a known framework, where the allocation role is not one 
of the framework's subscribed roles, then the allocator will "track" the role 
(i.e. allocation information) for this framework. However, the current 
implementation results in the framework being treated as an active client of 
the sorter, when it should be an inactive client.

> Framework role tracking in allocator results in framework treated as active 
> incorrectly.
> 
>
> Key: MESOS-7323
> URL: https://issues.apache.org/jira/browse/MESOS-7323
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> When an agent is added to the allocator and there are resources allocated to 
> a known framework, where the allocation role is not one of the framework's 
> subscribed roles, then the allocator will "track" the role (i.e. allocation 
> information) for this framework. However, the current implementation results 
> in the framework being treated as an active client of the sorter, when it 
> should be an inactive client.
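
The intended behavior can be sketched with a toy model. ToySorter below is a hypothetical stand-in for one role's sorter, not Mesos's real sorter class, and track_framework_role is an invented helper; the sketch only shows a tracked framework being registered as inactive when it is not subscribed to the allocation role.

```python
class ToySorter:
    """Toy stand-in for the sorter of a single role (hypothetical)."""

    def __init__(self):
        self._clients = {}  # client name -> active flag

    def add(self, client, active):
        self._clients[client] = active

    def tracked(self):
        """All clients whose allocation information is tracked."""
        return sorted(self._clients)

    def sort(self):
        """Only active clients are candidates for further allocation."""
        return sorted(c for c, active in self._clients.items() if active)


def track_framework_role(sorter, framework_id, role, subscribed_roles):
    """Track (framework, role); the client is active only if subscribed."""
    sorter.add(framework_id, active=role in subscribed_roles)
```

In this model a framework holding resources under a role it is not subscribed to still appears in tracked(), but it is left out of sort().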





[jira] [Created] (MESOS-7323) Framework role tracking in allocator results in framework treated as active incorrectly.

2017-03-29 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7323:
--

 Summary: Framework role tracking in allocator results in framework 
treated as active incorrectly.
 Key: MESOS-7323
 URL: https://issues.apache.org/jira/browse/MESOS-7323
 Project: Mesos
  Issue Type: Bug
  Components: allocation, master
Reporter: Benjamin Mahler
Assignee: Michael Park








[jira] [Commented] (MESOS-7312) Update Resource proto for storage resource providers.

2017-03-29 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947959#comment-15947959
 ] 

Jie Yu commented on MESOS-7312:
---

commit 75c6dfaa9b816c61eb7a3e155f990d96276dcaa3
Author: Benjamin Bannier 
Date:   Wed Mar 29 14:28:38 2017 -0700

Added UNKNOWN DiskInfo.Source type.

We introduce an explicit UNKNOWN enum kind to allow explicit handling
of unknown enum values (e.g., when the sending and receiving end use
different versions of a message using the enum).

This commit also migrates pattern matching of values of this enum from
if statements to switch statements so that compiler diagnostics can be
used to identify unhandled cases when other types are added in the
future.

Review: https://reviews.apache.org/r/57911/
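
The idea of an explicit UNKNOWN kind can be sketched outside of C++ as follows. This is a hedged Python illustration, not the code from the review: the numeric values and the description strings are assumptions, and because Python has no compiler diagnostic for unhandled switch cases, an import-time check over a dispatch table is used as a rough substitute.

```python
from enum import Enum


class SourceType(Enum):
    # Numeric values are assumed for illustration.
    UNKNOWN = 0
    PATH = 1
    MOUNT = 2


def parse_source_type(wire_value):
    """Map a received integer to SourceType, falling back to UNKNOWN."""
    try:
        return SourceType(wire_value)
    except ValueError:
        # The sender may use a newer version of the message.
        return SourceType.UNKNOWN


DESCRIPTIONS = {
    SourceType.UNKNOWN: "unrecognized source type",
    SourceType.PATH: "path-backed disk",
    SourceType.MOUNT: "mount-backed disk",
}

# Fails at import time if a newly added member has no handler.
assert set(DESCRIPTIONS) == set(SourceType)


def describe(wire_value):
    """Describe a received source type, handling unknown values explicitly."""
    return DESCRIPTIONS[parse_source_type(wire_value)]
```

Adding a new member to SourceType without extending DESCRIPTIONS trips the assertion, which plays the role the commit assigns to compiler diagnostics on switch statements.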

> Update Resource proto for storage resource providers.
> -
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> Storage resource provider support requires a number of changes to the 
> {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Commented] (MESOS-6003) Add logging module for logging to an external program

2017-03-29 Thread Joel Wilsson (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947914#comment-15947914
 ] 

Joel Wilsson commented on MESOS-6003:
-

I guess this needs another look from [~kaysoky]? It would be nice to have this 
feature merged.

> Add logging module for logging to an external program
> -
>
> Key: MESOS-6003
> URL: https://issues.apache.org/jira/browse/MESOS-6003
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Will Rouesnel
>Assignee: Will Rouesnel
>Priority: Minor
>
> In the vein of the logrotate module for logging, there should be a similar 
> module which provides support for logging to an arbitrary log handling 
> program, with suitable task metadata provided by environment variables or 
> command line arguments.
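
The proposal can be pictured with a short sketch. It is not the module from this ticket: pipe_to_logger is a hypothetical helper, and the metadata keys passed in the environment (such as TASK_ID in the usage note) are invented names for illustration.

```python
import os
import subprocess


def pipe_to_logger(argv, lines, task_metadata):
    """Feed log lines to an external program's stdin.

    Task metadata is exposed to the program as environment variables;
    the program's standard output is returned for inspection.
    """
    env = dict(os.environ)
    env.update(task_metadata)
    result = subprocess.run(
        argv,
        input="".join(line + "\n" for line in lines),
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as pipe_to_logger(["my-log-handler"], ["started"], {"TASK_ID": "t1"}) would hand the line to a handler program (here a made-up name) that can read TASK_ID from its environment.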





[jira] [Assigned] (MESOS-7311) CopyFetcherPluginTest.FetchExistingFile

2017-03-29 Thread Jeff Coffler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Coffler reassigned MESOS-7311:
---

Assignee: Jeff Coffler

> CopyFetcherPluginTest.FetchExistingFile
> ---
>
> Key: MESOS-7311
> URL: https://issues.apache.org/jira/browse/MESOS-7311
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Assignee: Jeff Coffler
>  Labels: microsoft, windows
>
> The CopyFetcherPluginTest.FetchExistingFile unit test (from mesos-tests) is 
> routinely failing on Windows.





[jira] [Created] (MESOS-7322) Enable precompiled headers (on non-Windows)

2017-03-29 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-7322:


 Summary: Enable precompiled headers (on non-Windows)
 Key: MESOS-7322
 URL: https://issues.apache.org/jira/browse/MESOS-7322
 Project: Mesos
  Issue Type: Improvement
  Components: build, cmake
Reporter: Joseph Wu


Support for precompiled headers was added in [MESOS-7226], but only on Windows. 
POSIX and Linux builds have additional sources with their own set of problems 
(mostly namespace resolution conflicts) that arise from using precompiled 
headers.

This issue tracks progress on enabling precompiled headers on non-Windows 
platforms.





[jira] [Updated] (MESOS-7226) Introduce precompiled headers (on Windows)

2017-03-29 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-7226:
-
Summary: Introduce precompiled headers (on Windows)  (was: Introduce 
precompiled headers)

> Introduce precompiled headers (on Windows)
> --
>
> Key: MESOS-7226
> URL: https://issues.apache.org/jira/browse/MESOS-7226
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Joseph Wu
>Assignee: Jeff Coffler
>  Labels: mesosphere, microsoft
>
> Precompiled headers (PCHs) exist on both Windows and Linux. For Linux, you 
> can refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. 
> Straight from the GNU CC documentation: "The time the compiler takes to 
> process these header files over and over again can account for nearly all of 
> the time required to build the project."
> PCHs are only being proposed for the CMake system.  In theory, we can 
> introduce this change with only a few, non-intrusive code changes.  The 
> feature will primarily be a CMake change.
> See: https://github.com/sakra/cotire





[jira] [Comment Edited] (MESOS-7277) General checker does not support command checks via agent.

2017-03-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940605#comment-15940605
 ] 

Gastón Kleiman edited comment on MESOS-7277 at 3/29/17 4:42 PM:


https://reviews.apache.org/r/58030/


was (Author: gkleiman):
https://reviews.apache.org/r/57912/
https://reviews.apache.org/r/58030/

> General checker does not support command checks via agent.
> --
>
> Key: MESOS-7277
> URL: https://issues.apache.org/jira/browse/MESOS-7277
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: health-check, mesosphere
>
> Command checks via agent are necessary for executors that launch their tasks 
> via the agent, e.g., the default executor. The general checker should support launching 
> commands as nested containers via the agent in order to be used by such executors.





[jira] [Assigned] (MESOS-7092) Health checker duplicates a lot of checker's functionality.

2017-03-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-7092:
-

Assignee: Gastón Kleiman

> Health checker duplicates a lot of checker's functionality.
> ---
>
> Key: MESOS-7092
> URL: https://issues.apache.org/jira/browse/MESOS-7092
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: health-check, mesosphere
>
> With the introduction of a general check (MESOS-6906), health checker should 
> leverage a general check plus add interpretation on top. This will avoid code 
> duplication and increase maintainability.





[jira] [Comment Edited] (MESOS-7277) General checker does not support command checks via agent.

2017-03-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940605#comment-15940605
 ] 

Gastón Kleiman edited comment on MESOS-7277 at 3/29/17 4:40 PM:


https://reviews.apache.org/r/57912/
https://reviews.apache.org/r/58030/


was (Author: gkleiman):
https://reviews.apache.org/r/57912/

> General checker does not support command checks via agent.
> --
>
> Key: MESOS-7277
> URL: https://issues.apache.org/jira/browse/MESOS-7277
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: health-check, mesosphere
>
> Command checks via agent are necessary for executors that launch their tasks 
> via the agent, e.g., the default executor. The general checker should support launching 
> commands as nested containers via the agent in order to be used by such executors.





[jira] [Commented] (MESOS-5544) Support running Mesos agent in a Docker container.

2017-03-29 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947143#comment-15947143
 ] 

Deshi Xiao commented on MESOS-5544:
---

Can anyone summarize this feature's status?

> Support running Mesos agent in a Docker container.
> --
>
> Key: MESOS-5544
> URL: https://issues.apache.org/jira/browse/MESOS-5544
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Currently, this does not work if one tries to use the Mesos containerizer.
> The main problem is that we want to make sure the executor is not killed when 
> the agent crashes. So we have to use --pid=host so that the agent is in the host 
> pid namespace.
> But that is not sufficient: the Docker daemon will put the agent into all cgroups 
> available on the host. We need to make sure we migrate the executor pid out 
> of those cgroups so that when the agent crashes, executors are not killed.
> Also, when starting the agent container, volumes need to be set up properly so 
> that any mounts under the agent's work_dir will be propagated back to the host 
> mount table. This is to make sure we can recover those mounts after the agent 
> restarts. This is also true for those mounts that are needed by some isolators 
> (e.g., the network/cni isolator).
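
For illustration only, a minimal Python sketch of the `docker run` argument list that the description implies. It covers just the two points named in the ticket: the host pid namespace and mount propagation for the agent's work_dir. The function name, the image name in the usage note, and the choice of the `rshared` propagation mode are assumptions, not something the ticket specifies. The cgroup migration step is not addressed here.

```python
def agent_docker_argv(image, work_dir):
    """Build a `docker run` argv for an agent container (hypothetical helper).

    --pid=host keeps the agent in the host pid namespace, as the ticket asks.
    The `rshared` bind propagation mode is an assumption about how mounts
    under work_dir could be propagated back to the host mount table.
    """
    return [
        "docker", "run",
        "--pid=host",
        "-v", f"{work_dir}:{work_dir}:rshared",
        image,
    ]
```

Calling `agent_docker_argv("example/mesos-agent", "/var/lib/mesos")` returns the list to hand to a process launcher; the image name is a placeholder.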





[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2017-03-29 Thread Gustav Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946893#comment-15946893
 ] 

Gustav Paul commented on MESOS-5342:


[~haosdent huang] could you update this ticket with links to the Huawei work?

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization
>Affects Versions: 0.28.1
>Reporter: Chris
>  Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, 
> mentor, perfomance
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (SLURM) provide both 
> cgroup isolation and cpu binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores; this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.
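
A rough Python sketch of the selection step described above, under stated assumptions. The ticket mentions a mathematical optimization technique without naming it, so this swaps in a plain greedy heuristic: prefer the least-utilized cores and break ties by lower latency. The function name and the two dictionaries (a per-core container count and a per-core latency figure) are hypothetical.

```python
def pick_cores(utilization, latency, count):
    """Greedily pick `count` cores (hypothetical helper, not the ticket's method).

    utilization: dict mapping core id -> number of containers on that core.
    latency: dict mapping core id -> latency figure (e.g. to the GPU).
    Least-utilized cores come first; ties are broken by lower latency.
    """
    ranked = sorted(utilization, key=lambda core: (utilization[core], latency[core]))
    chosen = ranked[:count]
    for core in chosen:
        # Record the new assignment in the utilization histogram.
        utilization[core] += 1
    return chosen
```

A real implementation would also need the hardware topology detection the ticket lists, which this sketch takes as given input.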





[jira] [Issue Comment Deleted] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-29 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6184:
--
Comment: was deleted

(was: I have rebased the patch onto the 1.2.0 branch and tested it; it 
always produces a core dump.

```
I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent 
a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file 
/tmp/gvqGyb -v 
/data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
 --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest 
--label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu 
--label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp 
--name 
mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
 nginx
I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health 
check still in grace period
W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:46.386533 49 health_checker.cpp:202] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack 
trace: ***
@ 

[jira] [Assigned] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao reassigned MESOS-7183:
-

Resolution: Won't Fix
  Assignee: Deshi Xiao

This is a specific case: when Mesos runs in Docker, we should add --pid=host so 
that the native health check process can access the host pid namespace. 
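
A small Python sketch of why the check aborts without --pid=host, assuming a Linux `/proc` layout. Entering a task's net namespace goes through `/proc/<pid>/ns/net`, and that path only exists if the task's pid is visible from the checker's own pid namespace. The helper names and the `proc_root` parameter are hypothetical and not part of Mesos.

```python
import os

def net_ns_path(pid, proc_root="/proc"):
    """Return the path of the net namespace handle for `pid`."""
    return os.path.join(proc_root, str(pid), "ns", "net")

def can_enter_net_ns(pid, proc_root="/proc"):
    """True if the handle is visible from the current pid namespace.

    Inside a container started without --pid=host, host pids have no
    entry under /proc, so this returns False for them.
    """
    return os.path.exists(net_ns_path(pid, proc_root))
```

Checking `can_enter_net_ns(pid)` before spawning the check would turn the abort into an ordinary failed check, though that is a design suggestion and not what the patch does.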

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Deshi Xiao
>Assignee: Deshi Xiao
> Attachments: stderr
>
>
> the key message is : Failed to enter the net namespace of task (pid: 
> '22392'): Pid 22392 does not exist
> see the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl 

[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946778#comment-15946778
 ] 

Deshi Xiao commented on MESOS-7183:
---

Adding --pid=host resolves this issue.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> the key message is : Failed to enter the net namespace of task (pid: 
> '22392'): Pid 22392 does not exist

[jira] [Updated] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.

2017-03-29 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7316:

Shepherd: Michael Park

> Upgrading Mesos to 1.2.0 results in some information missing from the 
> `/flags` endpoint.
> 
>
> Key: MESOS-7316
> URL: https://issues.apache.org/jira/browse/MESOS-7316
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 1.2.0
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: mesosphere
>
> From OSS Mesos Slack:
> I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. 
> After doing this, it looks like the {{zk}} field on the {{/master/flags}} 
> endpoint is no longer present. 
> This looks related to the recent {{Flags}} refactoring that was done which 
> resulted in some flags no longer being populated since they were not part of 
> {{master::Flags}} in {{src/master/flags.hpp}}.
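
A minimal Python sketch for checking a `/master/flags` response for the fields one expects after an upgrade. It assumes the response body is a JSON object with a top-level `flags` object, and the helper name is hypothetical. It works on an already-fetched body, so no HTTP client is involved.

```python
import json

def missing_flags(response_body, expected):
    """Return the names in `expected` that are absent from the flags payload.

    Assumes a body shaped like {"flags": {"name": "value", ...}}.
    """
    flags = json.loads(response_body).get("flags", {})
    return [name for name in expected if name not in flags]
```

For the report above, `missing_flags(body, ["zk"])` would return `["zk"]` on an affected 1.2.0 master and an empty list otherwise.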





[jira] [Commented] (MESOS-3875) Account dynamic reservations towards quota.

2017-03-29 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946640#comment-15946640
 ] 

Alexander Rukletsov commented on MESOS-3875:


I think the situation you describe is MESOS-3338. AFAIK, authz partially 
addresses the issue by not allowing greedy frameworks to make as many reservations 
as they possibly can.

> Account dynamic reservations towards quota.
> ---
>
> Key: MESOS-3875
> URL: https://issues.apache.org/jira/browse/MESOS-3875
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Priority: Critical
>  Labels: mesosphere
>
> Dynamic reservations, whether allocated or not, should be accounted towards 
> a role's quota. This requires updates in at least two places:
> * The built-in allocator, which actually satisfies quota;
> * The sanity check in the master.
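
The accounting rule can be sketched in Python as simple arithmetic, under stated assumptions. The function and parameter names are hypothetical; `allocated` is taken to mean everything currently allocated to the role, and `reserved_unallocated` the role's dynamic reservations that are not currently allocated. Scalar quantities are used for brevity.

```python
def quota_headroom(quota, allocated, reserved_unallocated):
    """Remaining quota once unallocated reservations are charged as well.

    Both allocated resources and unallocated dynamic reservations count
    towards the role's quota; the result never goes below zero.
    """
    charged = allocated + reserved_unallocated
    return max(quota - charged, 0.0)
```

With a quota of 10 CPUs, 4 allocated and 3 reserved but idle, the allocator would have 3 CPUs left to satisfy for the role.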


