[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-17 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971939#comment-15971939
 ] 

Adam B commented on MESOS-7210:
---

[~haosd...@gmail.com], could you please backport this to the 1.2.x and 1.1.x 
branches so we can include it in the next patch releases (1.2.1 and 1.1.2)? 
Hoping to cut those this week.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
>              Docker version 1.13.1
>              mesos 1.1.0, runs from container
>              docker containers spawned by marathon 1.4.1
> Reporter: Wojciech Sielski
> Assignee: Deshi Xiao
> Priority: Critical
> Fix For: 1.3.0
>
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from a container that was started with the option "pid: host", like:
> {code}
>   net: host
>   privileged: true
>   pid: host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks, like:
> {code}
> {
>   "id": "python-example-stable",
>   "cmd": "python3 -m http.server 8080",
>   "mem": 16,
>   "cpus": 0.1,
>   "instances": 2,
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "python:alpine",
>       "network": "BRIDGE",
>       "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>       ]
>     }
>   },
>   "env": {
>     "SERVICE_NAME": "python"
>   },
>   "healthChecks": [
>     {
>       "path": "/",
>       "portIndex": 0,
>       "protocol": "MESOS_HTTP",
>       "gracePeriodSeconds": 30,
>       "intervalSeconds": 10,
>       "timeoutSeconds": 30,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos 
> container not to inherit the "pid: host" option that the parent container was 
> started with; instead it gets its own PID namespace. So regardless of whether 
> the parent container was started with "pid: host", the health checker will 
> never be able to find the task's PID.
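
The mismatch described above can be checked outside Mesos. What follows is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `netNamespacePath` and `pidVisible` are hypothetical and are not part of Mesos. It tests whether a PID resolves in the caller's own PID namespace, which has to hold before `/proc/<pid>/ns/net` can be opened at all.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: path of the network-namespace handle for `pid`,
// as seen through the caller's own /proc (its own PID namespace).
std::string netNamespacePath(pid_t pid)
{
  assert(pid > 0);
  return "/proc/" + std::to_string(pid) + "/ns/net";
}

// Hypothetical helper: true if `pid` resolves in the caller's PID
// namespace. A task PID obtained from the host side is a host PID, so
// this is expected to return false when the caller runs in a private
// PID namespace (assumption: /proc reflects that namespace).
bool pidVisible(pid_t pid)
{
  assert(pid > 0);
  const std::string path = "/proc/" + std::to_string(pid);
  return ::access(path.c_str(), F_OK) == 0;
}
```

Running `pidVisible(13527)` from the host and from inside the executor container should give different answers if the two really are in different PID namespaces. A number that matches by coincidence would refer to an unrelated process, so the check is a diagnostic aid only.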



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-13 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967207#comment-15967207
 ] 

Deshi Xiao commented on MESOS-7210:
---

To avoid abusing privileges, just use --cap-add NET_ADMIN to resolve the net 
operation issue:

```
Failed to enter the net namespace of task (pid: '78851'): Operation not 
permitted
```

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-08 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961797#comment-15961797
 ] 

Deshi Xiao commented on MESOS-7210:
---

Second try: I have submitted a new patch to 58200. Let me test it again.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958028#comment-15958028
 ] 

Deshi Xiao commented on MESOS-7210:
---

First test run:
https://gist.github.com/xiaods/c5a11e3ab51e89a9609edc2c477f7ea8


> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956401#comment-15956401
 ] 

Deshi Xiao commented on MESOS-7210:
---

patch: https://reviews.apache.org/r/58200/

Let me test it ASAP.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956356#comment-15956356
 ] 

Deshi Xiao commented on MESOS-7210:
---

Thanks [~haosd...@gmail.com], it works.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955755#comment-15955755
 ] 

Deshi Xiao commented on MESOS-7210:
---

Found https://issues.apache.org/jira/browse/MESOS-6589, but could not find any 
reference to how this docker parameter is used.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955733#comment-15955733
 ] 

Deshi Xiao commented on MESOS-7210:
---

Trying:
dockerInfo.parameters.push_back("--pid=host");

Is this correct?

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955725#comment-15955725
 ] 

Deshi Xiao commented on MESOS-7210:
---

Hi [~alexr] [~haosd...@gmail.com]

I can't find any useful reference on how to use the Docker parameters field in 
the Mesos protobuf:

```
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image(flags.docker_mesos_image.get());
```
I need to add --pid=host to this dockerInfo.parameters. Could you please give 
me a hand? Thanks a lot.
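
On the parameters question: as far as I understand, each entry in the `parameters` field of `DockerInfo` is a key/value pair that the Docker containerizer hands to `docker run` as `--key=value`. With the generated protobuf accessors, adding one would presumably go through `add_parameters()` followed by `set_key("pid")` and `set_value("host")`; please treat that as an assumption to verify against mesos.proto. The mapping itself can be sketched in Python (the function name `docker_run_flags` is hypothetical and only models the translation; it is not Mesos code):

```python
from typing import List, Tuple


def docker_run_flags(parameters: List[Tuple[str, str]]) -> List[str]:
    """Turn (key, value) parameter pairs into `docker run` style flags.

    This only models the assumed `--key=value` translation; it does not
    call Docker or Mesos.
    """
    return [f"--{key}={value}" for key, value in parameters]
```

Under that assumption, a parameter with key `pid` and value `host` would end up on the command line as `--pid=host`.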

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955308#comment-15955308
 ] 

Deshi Xiao commented on MESOS-7210:
---

add 

```
dockerInfo.set_pid("host");
```

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-03 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953242#comment-15953242
 ] 

Alexander Rukletsov commented on MESOS-7210:


[~xds2000], [~haosd...@gmail.com] Let's fix it and backport.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-03 Thread Wojciech Sielski (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953071#comment-15953071
 ] 

Wojciech Sielski commented on MESOS-7210:
-

[~xds2000] exactly, the mesos-slave (container) and the docker executor 
(container) need to run in the same PID namespace (host).
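
One way to check this condition on Linux is to compare the `ns/pid` links of the two processes under /proc. The sketch below assumes that layout and sufficient permissions to read the links; the function name `same_pid_namespace` is hypothetical and is not a Mesos API.

```python
import os


def same_pid_namespace(pid_a: int, pid_b: int, proc_root: str = "/proc") -> bool:
    """Return True if both processes report the same PID namespace link.

    On Linux, /proc/<pid>/ns/pid is a symlink whose target looks like
    'pid:[4026531836]'; equal targets mean the same namespace.
    """
    link_a = os.readlink(os.path.join(proc_root, str(pid_a), "ns", "pid"))
    link_b = os.readlink(os.path.join(proc_root, str(pid_b), "ns", "pid"))
    return link_a == link_b
```

Running it from the host with the PIDs of the mesos-slave and of the executor would show whether both containers were really started with the host PID namespace.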

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-31 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950633#comment-15950633
 ] 

Deshi Xiao commented on MESOS-7210:
---

This is difficult to fix because the Mesos agent is wrapped in a container. 
The only option is to manually add --pid=host to the mesos-agent container, so 
that the PIDs it sees match the PIDs of the processes inside the task 
containers. This is not a Mesos fault. We would rather suggest that users run 
the Mesos agent under systemd instead of in a container, which would benefit 
both developers and users.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-30 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950251#comment-15950251
 ] 

Deshi Xiao commented on MESOS-7210:
---

add me 

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934001#comment-15934001
 ] 

haosdent commented on MESOS-7210:
-

Thanks a lot for the help, [~sielaq] and [~alexr]. Let me try to fix this.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-20 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933319#comment-15933319
 ] 

Alexander Rukletsov commented on MESOS-7210:


Hey [~sielaq], thanks a lot for the report and sorry for the tardy reply. This 
indeed looks like a bug we need to fix. I'll add it to our backlog and will get 
back to you once we have a fix or a time plan for getting the fix out.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Gastón Kleiman
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and example marathon job, that use MESOS_HTTP checks like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "python:alpine",
>  "network": "BRIDGE",
>  "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>  ]
>}
>  },
>  "env": {
>"SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>{
>  "path": "/",
>  "portIndex": 0,
>  "protocol": "MESOS_HTTP",
>  "gracePeriodSeconds": 30,
>  "intervalSeconds": 10,
>  "timeoutSeconds": 30,
>  "maxConsecutiveFailures": 3
>}
>  ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.84429335 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job
> to get its own PID namespace instead of inheriting the "pid: host" setting of
> the parent container. So regardless of whether the parent container was started
> with "pid: host", the health check will never be able to find the task PID.
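
The mismatch described in the report can be illustrated with a small sketch. It assumes that entering a task's network namespace starts by resolving the task PID through /proc in the caller's own PID namespace; the helper names net_ns_path and can_enter_net_ns are hypothetical and are not part of Mesos, whose health checker is written in C++. The sketch only shows why a host PID such as 13527 cannot be resolved from a process that runs in a different PID namespace.

```python
import os


def net_ns_path(pid: int, proc_root: str = "/proc") -> str:
    """Return the network-namespace handle path for `pid` under `proc_root`."""
    return f"{proc_root}/{pid}/ns/net"


def can_enter_net_ns(pid: int, proc_root: str = "/proc") -> bool:
    """Return True if the namespace handle for `pid` is visible from here.

    A process in its own PID namespace sees a /proc that lists only the PIDs
    of that namespace, so a PID reported by the host is simply missing.
    """
    return os.path.exists(net_ns_path(pid, proc_root))


if __name__ == "__main__":
    # 13527 is the task PID from the log in the report (a host PID).
    for pid in (os.getpid(), 13527):
        print(pid, net_ns_path(pid), can_enter_net_ns(pid))
```

Running this inside the executor container and on the host for the same task PID should show whether the two share a PID namespace.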



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-06 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898712#comment-15898712
 ] 

Avinash Sridharan commented on MESOS-7210:
--

[~alexr] ^^

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
>   Docker version 1.13.1
>   mesos 1.1.0, runs from container
>   docker containers spawned by marathon 1.4.1
> Reporter: Wojciech Sielski
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>    "type": "DOCKER",
>    "docker": {
>      "image": "python:alpine",
>      "network": "BRIDGE",
>      "portMappings": [
>        { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>      ]
>    }
>  },
>  "env": {
>    "SERVICE_NAME": "python"
>  },
>  "healthChecks": [
>    {
>      "path": "/",
>      "portIndex": 0,
>      "protocol": "MESOS_HTTP",
>      "gracePeriodSeconds": 30,
>      "intervalSeconds": 10,
>      "timeoutSeconds": 30,
>      "maxConsecutiveFailures": 3
>    }
>  ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293    35 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job
> to get its own PID namespace instead of inheriting the "pid: host" setting of
> the parent container. So regardless of whether the parent container was started
> with "pid: host", the health check will never be able to find the task PID.


