[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046570#comment-15046570 ]

Till Toenshoff commented on MESOS-4065:
---------------------------------------

From your results, this conclusion seems sensible to me. We should file a bug report in the ZooKeeper JIRA so it can be properly handled upstream (https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). Could you please take care of that, [~jdef]?

> slave FD for ZK tcp connection leaked to executor process
> ---------------------------------------------------------
>
>                 Key: MESOS-4065
>                 URL: https://issues.apache.org/jira/browse/MESOS-4065
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.1, 0.25.0
>            Reporter: James DeFelice
>              Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root      1432 99.3  0.0 202420 12928 ?   Rsl  21:32  13:51 ./etcd-mesos-executor -log_dir=./
> root      1450  0.4  0.1  38332 28752 ?   Sl   21:32   0:03 ./etcd --data-dir=etcd_data --name=etcd-1449178273 --listen-peer-urls=http://10.0.0.45:1025 --initial-advertise-peer-urls=http://10.0.0.45:1025 --listen-client-urls=http://10.0.0.45:1026 --advertise-client-urls=http://10.0.0.45:1026 --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025 --initial-cluster-state=existing
> core      1651  0.0  0.0   6740   928 pts/0 S+   21:46   0:00 grep --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root 10u IPv4 21973 0t0 TCP ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root      1124  0.2  0.1 900496 25736 ?   Ssl  21:11   0:04 /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core      1658  0.0  0.0   6740   832 pts/0 S+   21:46   0:00 grep --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root 10u IPv4 21973 0t0 TCP ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
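The leaked descriptor (note the identical lsof inode 21973 in both processes) suggests the ZooKeeper client socket is created without the close-on-exec flag, so it survives the fork()+exec() that launches the executor. As an illustration only, not the ZooKeeper or Mesos source, here is how close-on-exec is set on a descriptor (Python 3.4+ already sets it by default on sockets per PEP 446, so an explicit call is shown for clarity):

```python
import fcntl
import socket

def set_cloexec(fd):
    """Set FD_CLOEXEC so the descriptor is closed automatically
    in any child process that calls exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_cloexec(sock.fileno())

# The flag should now be present on the descriptor.
cloexec_set = bool(fcntl.fcntl(sock.fileno(), fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
print(cloexec_set)
```

With the flag set, an executor forked from the slave would no longer inherit the ZooKeeper connection.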
[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046588#comment-15046588 ]

Till Toenshoff commented on MESOS-4065:
---------------------------------------

A tool that has been rather useful for debugging such issues within Mesos: https://github.com/tillt/mesos/commit/d6982ece26121c599426e6b5c573e8d8afeff837
[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046890#comment-15046890 ]

Klaus Ma commented on MESOS-4075:
---------------------------------

+1. Maybe we can inject a CHECK handler to avoid crashes when testing.

> Continue test suite execution across crashing tests.
> ----------------------------------------------------
>
>                 Key: MESOS-4075
>                 URL: https://issues.apache.org/jira/browse/MESOS-4075
>             Project: Mesos
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 0.26.0
>            Reporter: Bernd Mathiske
>            Assignee: Bernd Mathiske
>              Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient
> when trying to find out all the tests that fail.
> mesos-tests.sh should rate a test that crashes as failed and continue the
> same way as if the test had merely returned a failure result and exited
> properly.
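The requested mesos-tests.sh behavior can be sketched with a small driver that runs each test in a separate process and treats death-by-signal the same as an ordinary failure. This is a hypothetical illustration only; the stand-in test commands below are not real Mesos tests:

```python
import subprocess
import sys

def run_tests(commands):
    """Run each test command in its own process. A negative returncode
    means the process was killed by a signal (a crash); record it as a
    failure and keep going instead of aborting the whole suite."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures.append((cmd, result.returncode))
    return failures

# Stand-in "tests": one passes, one fails, one crashes with SIGSEGV.
tests = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"],
]
failures = run_tests(tests)
print(f"{len(failures)} of {len(tests)} tests failed")
```

Because the crash is isolated in a child process, the remaining tests still run and the crash is reported as just another failure.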
[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047064#comment-15047064 ]

James DeFelice commented on MESOS-4065:
---------------------------------------

https://issues.apache.org/jira/browse/ZOOKEEPER-2338
[jira] [Updated] (MESOS-3925) Add HDFS based URI fetcher plugin.
[ https://issues.apache.org/jira/browse/MESOS-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3925:
--------------------------
    Sprint: Mesosphere Sprint 24

> Add HDFS based URI fetcher plugin.
> ----------------------------------
>
>                 Key: MESOS-3925
>                 URL: https://issues.apache.org/jira/browse/MESOS-3925
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Jie Yu
>              Labels: twitter
>
> This plugin uses the HDFS client to fetch artifacts. It can support schemes
> like hdfs/hftp/s3/s3n.
> It'll shell out to the 'hadoop' command to do the actual fetching.
[jira] [Updated] (MESOS-3951) Make HDFS tool wrappers asynchronous.
[ https://issues.apache.org/jira/browse/MESOS-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3951:
--------------------------
    Sprint: Mesosphere Sprint 24

> Make HDFS tool wrappers asynchronous.
> -------------------------------------
>
>                 Key: MESOS-3951
>                 URL: https://issues.apache.org/jira/browse/MESOS-3951
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>
> The existing HDFS tool wrappers (src/hdfs/hdfs.hpp) are synchronous. They use
> os::shell to shell out the 'hadoop' commands. This makes them very hard to
> reuse elsewhere in the code base.
> The URI fetcher HDFS plugin will try to re-use the existing HDFS tool
> wrappers. In order to do that, we need to make them asynchronous first.
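Mesos would express this change with libprocess futures wrapped around the shell-out. Purely as an analogy (this is not the Mesos API), the same blocking-to-asynchronous shift looks like this in Python's asyncio:

```python
import asyncio

async def shell(command):
    """Asynchronous analogue of a blocking shell-out: start the
    process, yield to the event loop while it runs, then return
    its stdout (raising on a nonzero exit status)."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    out, err = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"'{command}' failed: {err.decode()}")
    return out.decode()

async def main():
    # Because the wrapper no longer blocks the caller, independent
    # commands (e.g. several fetches) can run concurrently.
    return await asyncio.gather(shell("echo one"), shell("echo two"))

results = asyncio.run(main())
print(results)
```

The point mirrors the ticket: once the wrapper returns a future instead of blocking, other code paths (such as the URI fetcher plugin) can compose with it freely.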
[jira] [Commented] (MESOS-4084) mesos-slave assigned marathon task wrongly to chronos framework after task failure
[ https://issues.apache.org/jira/browse/MESOS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047202#comment-15047202 ]

Vinod Kone commented on MESOS-4084:
-----------------------------------

Hmm. This is really bizarre. The framework id for a status update is encoded in the status update message itself, which is sent by the executor (driver). Can you paste the complete slave log lines between 8:58 and 9:03? Also log lines from the stdout/stderr of the executor would be useful.

> mesos-slave assigned marathon task wrongly to chronos framework after task
> failure
> --------------------------------------------------------------------------
>
>                 Key: MESOS-4084
>                 URL: https://issues.apache.org/jira/browse/MESOS-4084
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.22.2
>         Environment: Ubuntu 14.04.2 LTS
>                      Mesos 0.22.2
>                      Marathon 0.11.0
>                      Chronos 2.4.0
>            Reporter: Erhan Kesken
>            Priority: Minor
>
> I don't know how to reproduce the problem; the only thing I can do is share
> my logs: https://gist.github.com/ekesken/f2edfd65cca8638b0136
> These are highlights from my logs:
> mesos-slave logs:
> {noformat}
> Dec 7 08:58:27 mesos-slave-node-012 mesos-slave[56099]: I1207 08:58:27.089156 56130 slave.cpp:2531] Handling status update TASK_FAILED (UUID: 5b335fab-1722-4270-83a6-b4ec843be47f) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 of framework 20151113-112010-100670892-5050-7957-0001 from executor(1)@172.29.1.12:1651
> Dec 7 08:58:27 mesos-slave-node-012 mesos-slave[56099]: E1207 08:58:27.089874 56128 slave.cpp:2662] Failed to update resources for container ed5f4f67-464d-4786-9628-cd48732de6b7 of executor collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 running task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/34074/cgroup: Failed to open file '/proc/34074/cgroup': No such file or directory
> {noformat}
> notice the framework id above, then 5m later we got the following logs:
> {noformat}
> Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 09:03:27.653187 56130 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001 from executor(1)@172.29.1.12:1651
> Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: W1207 09:03:27.653282 56130 slave.cpp:2568] Could not find the executor for status update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001
> Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 09:03:27.653390 56130 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001
> Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 09:03:27.653543 56130 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001 to master@172.29.0.5:5050
> Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 09:03:27.653688 56130 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001 to executor(1)@172.29.1.12:1651
> Dec 7 09:03:37 mesos-slave-node-012 mesos-slave[56099]: W1207 09:03:37.654337 56134 status_update_manager.cpp:472] Resending status update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 in health state unhealthy of framework 20150624-210230-117448108-5050-3678-0001
> {noformat}
> this caused deactivation of chronos immediately, as seen in the mesos-master log:
> {noformat}
> Dec 7 09:03:27 mesos-master-node-001 mesos-master[40898]: I1207 09:03:27.654770 40948 master.cpp:1964] Deactivating framework 20150624-210230-117448108-5050-3678-0001 (chronos-2.4.0) at scheduler-7a4396f7-1f68-4f41-901e-805db5de0432@172.29.0.6:11893
> {noformat}
[jira] [Commented] (MESOS-3828) Strategy for Utilizing Docker 1.9 Multihost Networking
[ https://issues.apache.org/jira/browse/MESOS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047127#comment-15047127 ]

Spike Curtis commented on MESOS-3828:
-------------------------------------

We're already well on the road to getting IP-per-container networking into the MesosContainerizer. I would strongly advocate for a unified networking layer that handles both Docker and Mesos containers. This work is centered at: https://github.com/mesosphere/net-modules

We can straightforwardly enhance this work to extend multihost networking to Docker containers as well, without the need for Docker's libnetwork. That gives us the advantage of all tasks getting a consistent, unified, and easy-to-understand network function, rather than attempting to mate Docker's opinionated network model with Mesos container networking.

> Strategy for Utilizing Docker 1.9 Multihost Networking
> ------------------------------------------------------
>
>                 Key: MESOS-3828
>                 URL: https://issues.apache.org/jira/browse/MESOS-3828
>             Project: Mesos
>          Issue Type: Story
>          Components: isolation
>    Affects Versions: 0.26.0
>            Reporter: John Omernik
>            Assignee: Timothy Chen
>              Labels: Docker, isolation, mesosphere, network, plugins
>
> This is a user story to discuss the strategy for how Mesos should use the new
> Docker 1.9 feature, multihost networking:
> http://blog.docker.com/2015/11/docker-multi-host-networking-ga/
> Basically, we should determine whether this is something we want to work with
> from a standpoint of container isolation, and going forward, how we can best
> integrate.
> The space for networking in Mesos is growing fast, with IP-per-container and
> other networking modules being worked on. Projects like Project Calico offer
> services from outside the Mesos community that plug nicely, or will plug
> nicely, into Mesos.
> So how about multihost networking? An option to work with? With Docker being
> a first-class citizen of Mesos, this is something we should be considering.
[jira] [Updated] (MESOS-3951) Make HDFS tool wrappers asynchronous.
[ https://issues.apache.org/jira/browse/MESOS-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3951:
--------------------------
    Story Points: 5
[jira] [Updated] (MESOS-3925) Add HDFS based URI fetcher plugin.
[ https://issues.apache.org/jira/browse/MESOS-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3925:
--------------------------
    Story Points: 3
[jira] [Updated] (MESOS-3925) Add HDFS based URI fetcher plugin.
[ https://issues.apache.org/jira/browse/MESOS-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3925:
--------------------------
    Labels: mesosphere twitter  (was: twitter)
[jira] [Updated] (MESOS-3951) Make HDFS tool wrappers asynchronous.
[ https://issues.apache.org/jira/browse/MESOS-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-3951:
--------------------------
    Labels: mesosphere twitter  (was: )
[jira] [Assigned] (MESOS-3925) Add HDFS based URI fetcher plugin.
[ https://issues.apache.org/jira/browse/MESOS-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu reassigned MESOS-3925:
-----------------------------
    Assignee: Jie Yu
[jira] [Commented] (MESOS-4084) mesos-slave assigned marathon task wrongly to chronos framework after task failure
[ https://issues.apache.org/jira/browse/MESOS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047311#comment-15047311 ]

Erhan Kesken commented on MESOS-4084:
-------------------------------------

I shared my complete slave log here: https://gist.github.com/ekesken/bed20adfba0995117d74. Unfortunately, the stdout/stderr files are not available.
[jira] [Created] (MESOS-4096) stout tests fail to build with external protobuf version
James Peach created MESOS-4096:
----------------------------------

             Summary: stout tests fail to build with external protobuf version
                 Key: MESOS-4096
                 URL: https://issues.apache.org/jira/browse/MESOS-4096
             Project: Mesos
          Issue Type: Bug
          Components: build, stout
            Reporter: James Peach

Using the following configure options:

{code}
prefix/configure \
  --disable-java \
  --disable-python \
  --enable-silent-rules \
  --enable-debug \
  --with-apr=$(apr-1-config --prefix) \
  --with-protobuf=$(pkg-config --variable=prefix protobuf-lite)
{code}

the stout tests fail to build, because code generated with a different protobuf version is checked in and not regenerated:

{code}
  CXX      stout_tests-protobuf_tests.pb.o
In file included from /Users/jpeach/src/mesos.git/3rdparty/libprocess/3rdparty/stout/tests/protobuf_tests.pb.cc:5:
/Users/jpeach/src/mesos.git/3rdparty/libprocess/3rdparty/stout/tests/protobuf_tests.pb.h:17:2: error: This file was generated by an older version of protoc which is
#error This file was generated by an older version of protoc which is
 ^
/Users/jpeach/src/mesos.git/3rdparty/libprocess/3rdparty/stout/tests/protobuf_tests.pb.h:18:2: error: incompatible with your Protocol Buffer headers. Please
#error incompatible with your Protocol Buffer headers. Please
 ^
/Users/jpeach/src/mesos.git/3rdparty/libprocess/3rdparty/stout/tests/protobuf_tests.pb.h:19:2: error: regenerate this file with a newer version of protoc.
#error regenerate this file with a newer version of protoc.
 ^
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4048) Consider unifying slave timeout behavior between steady state and master failover
[ https://issues.apache.org/jira/browse/MESOS-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047385#comment-15047385 ]

Benjamin Mahler edited comment on MESOS-4048 at 12/8/15 8:06 PM:
-----------------------------------------------------------------

This ticket is independent from MESOS-4049 in that it is discussing the current inconsistent approaches to agent partition detection (cases 1 and 2 above). When we were implementing master recovery, we wanted to use health checking to determine when an agent is unhealthy, but there were some implementation difficulties that led to the addition of {{--slave_reregistration_timer}} instead. This approach is a bit scary because we may remove healthy agents that for some reason (e.g. ZK connectivity issues) could not re-register with the master after master failover. This is why we put in place some safety nets ({{--recovery_slave_removal_limit}}), and we were able to re-use the removal rate limiting.

The point of this ticket is to look into removing {{--slave_reregistration_timer}} entirely and have the master perform the same health-check-based partition detection that it does in the steady state.

So, MESOS-4049 is about what we do *when* an agent is unhealthy. This ticket is about *how* we determine that an agent is unhealthy. Specifically, we want to determine it in a consistent way rather than having one approach in steady state and a different approach after master failover. Make sense?

> Consider unifying slave timeout behavior between steady state and master
> failover
> ------------------------------------------------------------------------
>
>                 Key: MESOS-4048
>                 URL: https://issues.apache.org/jira/browse/MESOS-4048
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master, slave
>            Reporter: Neil Conway
>            Assignee: Anindya Sinha
>            Priority: Minor
>              Labels: mesosphere
>
> Currently, there are two timeouts that control what happens when an agent is
> partitioned from the master:
> 1. {{max_slave_ping_timeouts}} + {{slave_ping_timeout}} controls how long the
> master waits before declaring a slave to be dead in the "steady state".
> 2. {{slave_reregister_timeout}} controls how long the master waits for a
> slave to reregister after master failover.
> It is unclear whether these two cases really merit being treated differently
> -- it might be simpler for operators to configure a single timeout that
> controls how long the master waits before declaring that a slave is dead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-4097) Change /roles endpoint to include quotas, weights, reserved resources?
Neil Conway created MESOS-4097:
----------------------------------

             Summary: Change /roles endpoint to include quotas, weights, reserved resources?
                 Key: MESOS-4097
                 URL: https://issues.apache.org/jira/browse/MESOS-4097
             Project: Mesos
          Issue Type: Improvement
            Reporter: Neil Conway

MESOS-4085 changes the behavior of the {{/roles}} endpoint: rather than listing all the explicitly defined roles, we will now only list those roles that have one or more registered frameworks. As suggested by [~alexr] in code review, this could be improved -- an operator might reasonably expect to see all the roles that have:

* non-default weight
* non-default quota
* non-default ACLs?
* any static or dynamically reserved resources

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-4099) parallel make tests does not build all test targets
Joris Van Remoortere created MESOS-4099:
-------------------------------------------

             Summary: parallel make tests does not build all test targets
                 Key: MESOS-4099
                 URL: https://issues.apache.org/jira/browse/MESOS-4099
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.26.0
         Environment: Ubuntu 15.04
                      clang-3.6 as well as gcc-4.9
            Reporter: Joris Van Remoortere
            Assignee: Kapil Arya

When inside 3rdparty/libprocess, running {{make -j8 tests}} from a clean build does not yield the {{libprocess-tests}} binary. Running it a subsequent time triggers more compilation and ends up yielding the {{libprocess-tests}} binary.

This suggests the {{tests}} target is not being built correctly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-4098) Allow interactive terminal for mesos containerizer
Jojy Varghese created MESOS-4098:
------------------------------------

             Summary: Allow interactive terminal for mesos containerizer
                 Key: MESOS-4098
                 URL: https://issues.apache.org/jira/browse/MESOS-4098
             Project: Mesos
          Issue Type: Improvement
          Components: containerization
         Environment: linux
            Reporter: Jojy Varghese
            Assignee: Jojy Varghese

Today the Mesos containerizer has no way to run tasks that require an interactive session. An example use case is running a task that requires manual password entry from an operator. Another use case could be debugging (e.g. with gdb).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))
[ https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047464#comment-15047464 ] Joris Van Remoortere commented on MESOS-4071: - My main fear here is that this wouldn't catch scenarios where the delta gradually gets larger as operations are performed. [~jamespeach] Would you be up for writing a simple test case where we apply the arithmetic resource operations (e.g. add, then subtract) iteratively to see if there are conditions under which the delta grows? If the delta can grow then an `almostEquals` approach will just make the problem rarer, not solve it. In that case we need to fix the math itself. I want to make sure that we do not "push the problem down the road", especially if there are logical branches dependent on this math. There are likely even more of these in the schedulers that we communicate with, rather than the ones pointed out in the Mesos code base. > Master crash during framework teardown ( Check failed: > total.resources.contains(slaveId)) > - > > Key: MESOS-4071 > URL: https://issues.apache.org/jira/browse/MESOS-4071 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.25.0 >Reporter: Mandeep Chadha > > Stack Trace : > NOTE : Replaced IP address with XX.XX.XX.XX > {code} > I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for > framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at > scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at > schedulerc8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed:
> total.resources.contains(slaveId) > *** Check failure stack trace: *** > @ 0x7f2b3dda53d8 google::LogMessage::Fail() > @ 0x7f2b3dda5327 google::LogMessage::SendToLog() > @ 0x7f2b3dda4d38 google::LogMessage::Flush() > @ 0x7f2b3dda7a6c google::LogMessageFatal::~LogMessageFatal() > @ 0x7f2b3d3351a1 > mesos::internal::master::allocator::DRFSorter::remove() > @ 0x7f2b3d0b8c29 > mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework() > @ 0x7f2b3d0ca823 > _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_ > @ 0x7f2b3d0dc8dc > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2 > _ > @ 0x7f2b3dd2cc35 std::function<>::operator()() > @ 0x7f2b3dd15ae5 process::ProcessBase::visit() > @ 0x7f2b3dd188e2 process::DispatchEvent::visit() > @ 0x472366 process::ProcessBase::serve() > @ 0x7f2b3dd1203f process::ProcessManager::resume() > @ 0x7f2b3dd061b2 process::internal::schedule() > @ 0x7f2b3dd63efd > _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Inde > x_tupleIJXspT_EEE > @ 0x7f2b3dd63e4d std::_Bind_simple<>::operator()() > @ 0x7f2b3dd63de6 std::thread::_Impl<>::_M_run() > @ 0x318c2b6470 (unknown) > @ 0x318b2079d1 (unknown) > @ 0x318aae8b5d (unknown) > @ (nil) (unknown) > Aborted (core dumped) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
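The iterative add-then-subtract experiment suggested in the comment above is easy to sketch outside of Mesos. Scalar resource values are backed by IEEE-754 doubles, where allocating a quantity in several small steps and recovering it in one step need not cancel exactly. The following is a minimal Python illustration of the concern (not the Mesos {{Resources}} API):

```python
import math

# Simulate iteratively applying arithmetic resource operations: allocate
# 0.1 CPU three times, then recover the same 0.3 CPU in a single subtraction.
# With exact arithmetic the value would return to 0.0 after every cycle.
def drift_after(cycles):
    cpus = 0.0
    for _ in range(cycles):
        for _ in range(3):
            cpus += 0.1
        cpus -= 0.3
    return cpus

residue = drift_after(1)
print(residue != 0.0)          # True: exact comparison catches the drift
print(math.isclose(residue, 0.0, abs_tol=1e-9))  # True: an "almostEquals" hides it
```

If repeating the cycle can make the residue grow, an epsilon comparison only postpones the failure rather than preventing it, which is exactly the concern raised in the comment.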
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048240#comment-15048240 ] Mark Hindess commented on MESOS-3738: - Has this fix been backported to a 0.23.x release? I'm using the latest 0.23.1 debian package and it is still broken. In case it helps anyone else upgrade smoothly to a working release, I am using a workaround of creating a mesos-health-check wrapper that execs the real mesos-health-check. That is:
{code}
bash$ cat <<EOF >mesos-health-check
> #!/bin/sh
> exec /usr/libexec/mesos/mesos-health-check "$@"
> EOF
bash$ chmod 0755 mesos-health-check
bash$ fakeroot sh -c "chown root:root mesos-health-check; \
tar cf - mesos-health-check |gzip -9 >mesos-health-check.tar.gz"
bash$ tar tvzf mesos-health-check.tar.gz
-rwxr-xr-x root/root 56 2015-12-09 07:44 mesos-health-check
bash$ # deploy mesos-health-check.tar.gz to your mesos-slaves (I used ansible)
bash$ # if using docker, restart your slaves with mesos-health-check.tar.gz
bash$ # mounted as volume into your mesos-slave container
bash$ # add file:///path/to/mesos-health-check.tar.gz to uris in app json
{code}
> Mesos health check is invoked incorrectly when Mesos slave is within the > docker container > - > > Key: MESOS-3738 > URL: https://issues.apache.org/jira/browse/MESOS-3738 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.25.0 > Environment: Docker 1.8.0: > Client: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Server: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Host: Ubuntu 14.04 > Container: Debian 8.1 + Java-7 >Reporter: Yong Tang >Assignee: haosdent > Fix For: 0.26.0 > > Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, > MESOS-3738-0_25_0.patch > > > When Mesos slave is within the container, the COMMAND
health check from > Marathon is invoked incorrectly. > In such a scenario, the sandbox directory (instead of the > launcher/health-check directory) is used. This result in an error with the > container. > Command to invoke the Mesos slave container: > {noformat} > sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v > /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro > -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos > mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --docker_stop_timeout=10secs > --launcher=posix > {noformat} > Marathon JSON file: > {code} > { > "id": "ubuntu", > "container": > { > "type": "DOCKER", > "docker": > { > "image": "ubuntu", > "network": "BRIDGE", > "parameters": [] > } > }, > "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ], > "uris": [], > "healthChecks": > [ > { > "protocol": "COMMAND", > "command": { "value": "echo Success" }, > "gracePeriodSeconds": 3000, > "intervalSeconds": 5, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 300 > } > ], > "instances": 1 > } > {code} > {noformat} > STDOUT: > root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > 
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > Registered docker executor on b01e2e75afcb > Starting task
[jira] [Commented] (MESOS-3818) Line wrapping for "--help" output
[ https://issues.apache.org/jira/browse/MESOS-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048223#comment-15048223 ] Shuai Lin commented on MESOS-3818: -- Sorry for the confusion. I mean 80 columns might be too small for the output, since the flag name and the empty space already took 43 columns in the example I pasted, so I wonder whether we should stick to 80 columns, or use a larger value like 100 columns? > Line wrapping for "--help" output > - > > Key: MESOS-3818 > URL: https://issues.apache.org/jira/browse/MESOS-3818 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Shuai Lin >Priority: Trivial > Labels: mesosphere, newbie > > The output of `mesos-slave --help`, `mesos-master --help`, and perhaps other > programs has very inconsistent line wrapping: different help text fragments > are wrapped at very different column numbers, which harms readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
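The trade-off being discussed can be sketched with a quick wrapping experiment (a hedged illustration using Python's textwrap, not the Mesos flags implementation): with a 43-column name field, an 80-column total width leaves only 37 columns for help text, while 100 columns would leave 57.

```python
import textwrap

# Render one flag's help entry: the flag name occupies a fixed left column
# and the help text wraps in the space that remains up to total_width.
def format_flag(name, help_text, name_width=43, total_width=80):
    body_width = total_width - name_width
    body = textwrap.wrap(help_text, width=body_width) or [""]
    lines = [name.ljust(name_width) + body[0]]
    lines += [" " * name_width + more for more in body[1:]]
    return "\n".join(lines)

# Example entry (real flag name, illustrative help text).
help_text = ("Amount of time to wait for the executor to register "
             "before destroying it.")
print(format_flag("--executor_registration_timeout=VALUE", help_text))
```

A wider total width produces fewer, longer lines for the same help text, which is the readability question raised above.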
[jira] [Commented] (MESOS-3909) isolator module headers depend on picojson headers
[ https://issues.apache.org/jira/browse/MESOS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047595#comment-15047595 ] James Peach commented on MESOS-3909: There are a number of solutions to this that I can see:
1. Move the picojson dependencies into a .cpp file in stout. Stout is supposed to be a header-only library, and this would undo that. I'm not sure of the history of why stout needs to be header-only, but maybe this restriction can be loosened.
2. Copy the picojson dependencies into a .cpp in libmesos. This works since picojson is just an internal dependency of Mesos. It needs a little ifdef hackery and it might be tricky to avoid copying the relevant picojson code, so maintainability is a question.
3. Install picojson.h. We would need to install picojson.h and adjust Mesos include paths appropriately. Using an unbundled picojson would no longer work (though it probably doesn't work right today).
4. Do nothing. You can't build a Mesos isolator without fishing in the Mesos code for picojson.h. This seems undesirable since it makes release engineering of isolator modules harder.
> isolator module headers depend on picojson headers > -- > > Key: MESOS-3909 > URL: https://issues.apache.org/jira/browse/MESOS-3909 > Project: Mesos > Issue Type: Bug > Components: c++ api, modules >Reporter: James Peach >Assignee: James Peach > > When trying to build an isolator module, stout headers end up depending on > {{picojson.hpp}} which is not installed.
> {code} > In file included from /opt/mesos/include/mesos/module/isolator.hpp:25: > In file included from /opt/mesos/include/mesos/slave/isolator.hpp:30: > In file included from /opt/mesos/include/process/dispatch.hpp:22: > In file included from /opt/mesos/include/process/process.hpp:26: > In file included from /opt/mesos/include/process/event.hpp:21: > In file included from /opt/mesos/include/process/http.hpp:39: > /opt/mesos/include/stout/json.hpp:23:10: fatal error: 'picojson.h' file not > found > #include <picojson.h> > ^ > 8 warnings and 1 error generated. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4084) mesos-slave assigned marathon task wrongly to chronos framework after task failure
[ https://issues.apache.org/jira/browse/MESOS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047845#comment-15047845 ] Vinod Kone commented on MESOS-4084: --- I haven't dug deeply but looks like the second status update (TASK_RUNNING) was being sent by the health check process. [~tnachen] any idea why a health check process launched inside docker executor outlives the container and sends status updates? > mesos-slave assigned marathon task wrongly to chronos framework after task > failure > -- > > Key: MESOS-4084 > URL: https://issues.apache.org/jira/browse/MESOS-4084 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.22.2 > Environment: Ubuntu 14.04.2 LTS > Mesos 0.22.2 > Marathon 0.11.0 > Chronos 2.4.0 >Reporter: Erhan Kesken >Priority: Minor > > I don't know how to reproduce problem, only thing I can do, is sharing my > logs: > https://gist.github.com/ekesken/f2edfd65cca8638b0136 > These are highlights from my logs: > mesos-slave logs: > {noformat} > Dec 7 08:58:27 mesos-slave-node-012 mesos-slave[56099]: I1207 > 08:58:27.089156 56130 slave.cpp:2531] Handling status update TASK_FAILED > (UUID: 5b335fab-1722-4270-83a6-b4ec843be47f) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > of framework 20151113-112010-100670892-5050-7957-0001 from > executor(1)@172.29.1.12:1651 > 08:58:27 mesos-slave-node-012 mesos-slave[56099]: E1207 08:58:27.089874 56128 > slave.cpp:2662] Failed to update resources for container > ed5f4f67-464d-4786-9628-cd48732de6b7 of executor > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > running task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > on status update for terminal task, destroying container: Failed to > determine cgroup for the 'cpu' subsystem: Failed to read /proc/34074/cgroup: > Failed to open file '/proc/34074/cgroup': No such file or directory > {noformat} > notice 
the framework id above, then 5m later we got following logs: > {noformat} > Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 > 09:03:27.653187 56130 slave.cpp:2531] Handling status update TASK_RUNNING > (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 from executor(1)@172.29.1.12:1651 > Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: W1207 > 09:03:27.653282 56130 slave.cpp:2568] Could not find the executor for status > update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 > Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 > 09:03:27.653390 56130 status_update_manager.cpp:317] Received status update > TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 > Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 > 09:03:27.653543 56130 slave.cpp:2776] Forwarding the update TASK_RUNNING > (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 to master@172.29.0.5:5050 > Dec 7 09:03:27 mesos-slave-node-012 mesos-slave[56099]: I1207 > 09:03:27.653688 56130 slave.cpp:2709] Sending acknowledgement for status > update TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 
to executor(1)@172.29.1.12:1651 > Dec 7 09:03:37 mesos-slave-node-012 mesos-slave[56099]: W1207 > 09:03:37.654337 56134 status_update_manager.cpp:472] Resending status update > TASK_RUNNING (UUID: 81aee6b0-2b9d-470a-a543-f14f7cae699b) for task > collector_tr_insurance_ebv_facebookscraper.ab3ddc6b-9cc0-11e5-8f21-0242ec411128 > in health state unhealthy of framework > 20150624-210230-117448108-5050-3678-0001 > {noformat} > this caused deactivation of chronos immediately as seen on mesos-master log: > {noformat} > Dec 7 09:03:27 mesos-master-node-001 mesos-master[40898]: I1207 > 09:03:27.654770 40948 master.cpp:1964] Deactivating framework > 20150624-210230-117448108-5050-3678-0001 (chronos-2.4.0) at >
[jira] [Created] (MESOS-4102) Quota doesn't allocate resources on slave joining
Neil Conway created MESOS-4102: -- Summary: Quota doesn't allocate resources on slave joining Key: MESOS-4102 URL: https://issues.apache.org/jira/browse/MESOS-4102 Project: Mesos Issue Type: Bug Components: allocation Reporter: Neil Conway See attached patch. {{framework1}} is not allocated any resources, despite the fact that the resources on {{agent2}} can safely be allocated to it without risk of violating {{quota1}}. If I understand the intended quota behavior correctly, this doesn't seem intended. Note that if the framework is added _after_ the slaves are added, the resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3962) Add labels to the message Port
[ https://issues.apache.org/jira/browse/MESOS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047850#comment-15047850 ] Avinash Sridharan commented on MESOS-3962: -- MESOS-3401 was incorrectly linked to this issue due to a confusion of what the issue meant by the message "Port". We wrongly assumed that it was related to port resources offered by the slave. The message "Port" being referred to here is the protobuf used in Discovery info. > Add labels to the message Port > -- > > Key: MESOS-3962 > URL: https://issues.apache.org/jira/browse/MESOS-3962 > Project: Mesos > Issue Type: Wish >Reporter: Sargun Dhillon >Assignee: Avinash Sridharan >Priority: Minor > Labels: mesosphere > > I want to add arbitrary labels to the message "Port". I have a few use cases > for this: > 1) I want to use it to drive isolators to install firewall rules associated > with the port > 2) I want to use it to drive third party components to be able to specify > advertising information > 3) I want to be able to able to use this to associate a deterministic virtual > hostname with a given port > Ideally, once the task is launched, these labels would be immutable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3962) Add labels to the message Port
[ https://issues.apache.org/jira/browse/MESOS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047883#comment-15047883 ] Avinash Sridharan commented on MESOS-3962: -- Had a discussion with Adam and Sargun. From mesos perspective a labels field needs to be introduced as an optional field in the message "Port" in include/mesos/mesos.proto . We will also need to update the JSON model object for TaskInfo to reflect these fields in state.json. However, making these changes in itself is not enough since currently Marathon is not populating the DiscoveryInfo field in TaskInfo. This implies that for service discovery to consume this field there are changes that need to be made in Marathon as well. > Add labels to the message Port > -- > > Key: MESOS-3962 > URL: https://issues.apache.org/jira/browse/MESOS-3962 > Project: Mesos > Issue Type: Wish >Reporter: Sargun Dhillon >Assignee: Avinash Sridharan >Priority: Minor > Labels: mesosphere > > I want to add arbitrary labels to the message "Port". I have a few use cases > for this: > 1) I want to use it to drive isolators to install firewall rules associated > with the port > 2) I want to use it to drive third party components to be able to specify > advertising information > 3) I want to be able to able to use this to associate a deterministic virtual > hostname with a given port > Ideally, once the task is launched, these labels would be immutable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3962) Add labels to the message Port
[ https://issues.apache.org/jira/browse/MESOS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-3962: - Shepherd: Adam B External issue URL: https://github.com/mesosphere/marathon/issues/1866 > Add labels to the message Port > -- > > Key: MESOS-3962 > URL: https://issues.apache.org/jira/browse/MESOS-3962 > Project: Mesos > Issue Type: Wish >Reporter: Sargun Dhillon >Assignee: Avinash Sridharan >Priority: Minor > Labels: mesosphere > > I want to add arbitrary labels to the message "Port". I have a few use cases > for this: > 1) I want to use it to drive isolators to install firewall rules associated > with the port > 2) I want to use it to drive third party components to be able to specify > advertising information > 3) I want to be able to able to use this to associate a deterministic virtual > hostname with a given port > Ideally, once the task is launched, these labels would be immutable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4102) Quota doesn't allocate resources on slave joining
[ https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4102: --- Attachment: quota_absent_framework_test-1.patch > Quota doesn't allocate resources on slave joining > - > > Key: MESOS-4102 > URL: https://issues.apache.org/jira/browse/MESOS-4102 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Neil Conway > Labels: mesosphere, quota > Attachments: quota_absent_framework_test-1.patch > > > See attached patch. {{framework1}} is not allocated any resources, despite > the fact that the resources on {{agent2}} can safely be allocated to it > without risk of violating {{quota1}}. If I understand the intended quota > behavior correctly, this doesn't seem intended. > Note that if the framework is added _after_ the slaves are added, the > resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4103) Show disk usage and allocation in WebUI
Vinod Kone created MESOS-4103: - Summary: Show disk usage and allocation in WebUI Key: MESOS-4103 URL: https://issues.apache.org/jira/browse/MESOS-4103 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Several places in the WebUI do not show disk utilization data (they only show cpu and mem). The max share shown in the webui also doesn't account for disk! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4048) Consider unifying slave timeout behavior between steady state and master failover
[ https://issues.apache.org/jira/browse/MESOS-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047709#comment-15047709 ] Klaus Ma commented on MESOS-4048: - Got it, makes sense to me :). > Consider unifying slave timeout behavior between steady state and master > failover > - > > Key: MESOS-4048 > URL: https://issues.apache.org/jira/browse/MESOS-4048 > Project: Mesos > Issue Type: Improvement > Components: master, slave >Reporter: Neil Conway >Assignee: Anindya Sinha >Priority: Minor > Labels: mesosphere > > Currently, there are two timeouts that control what happens when an agent is > partitioned from the master: > 1. {{max_slave_ping_timeouts}} + {{slave_ping_timeout}} controls how long the > master waits before declaring a slave to be dead in the "steady state" > 2. {{slave_reregister_timeout}} controls how long the master waits for a > slave to reregister after master failover. > It is unclear whether these two cases really merit being treated differently > -- it might be simpler for operators to configure a single timeout that > controls how long the master waits before declaring that a slave is dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3909) isolator module headers depend on picojson headers
[ https://issues.apache.org/jira/browse/MESOS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047596#comment-15047596 ] James Peach commented on MESOS-3909: I tried (2) and it was pretty ugly. I think that (3) is the best bet. > isolator module headers depend on picojson headers > -- > > Key: MESOS-3909 > URL: https://issues.apache.org/jira/browse/MESOS-3909 > Project: Mesos > Issue Type: Bug > Components: c++ api, modules >Reporter: James Peach >Assignee: James Peach > > When trying to build an isolator module, stout headers end up depending on > {{picojson.hpp}} which is not installed. > {code} > In file included from /opt/mesos/include/mesos/module/isolator.hpp:25: > In file included from /opt/mesos/include/mesos/slave/isolator.hpp:30: > In file included from /opt/mesos/include/process/dispatch.hpp:22: > In file included from /opt/mesos/include/process/process.hpp:26: > In file included from /opt/mesos/include/process/event.hpp:21: > In file included from /opt/mesos/include/process/http.hpp:39: > /opt/mesos/include/stout/json.hpp:23:10: fatal error: 'picojson.h' file not > found > #include <picojson.h> > ^ > 8 warnings and 1 error generated. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4087) Introduce a module for logging executor/task output
[ https://issues.apache.org/jira/browse/MESOS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045985#comment-15045985 ] Joseph Wu edited comment on MESOS-4087 at 12/8/15 11:23 PM: Reviews:
|| Review || Summary ||
| https://reviews.apache.org/r/41055/ https://reviews.apache.org/r/41057/ | Refactoring |
| https://reviews.apache.org/r/41002/ | Module interface |
| https://reviews.apache.org/r/41003/ | Default module implementation |
| https://reviews.apache.org/r/41004/ | Modularification |
| https://reviews.apache.org/r/41061/ | New agent flags |
| https://reviews.apache.org/r/4/ | Regression test |
was (Author: kaysoky): Reviews (WIP): https://reviews.apache.org/r/41055/ https://reviews.apache.org/r/41057/ https://reviews.apache.org/r/41002/ https://reviews.apache.org/r/41003/ https://reviews.apache.org/r/41004/ > Introduce a module for logging executor/task output > --- > > Key: MESOS-4087 > URL: https://issues.apache.org/jira/browse/MESOS-4087 > Project: Mesos > Issue Type: Task > Components: containerization, modules >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: logging, mesosphere > > Existing executor/task logs are logged to files in their sandbox directory, > with some nuances based on which containerizer is used (see background > section in linked document). > A logger for executor/task logs has the following requirements: > * The logger is given a command to run and must handle the stdout/stderr of > the command. > * The handling of stdout/stderr must be resilient across agent failover. > Logging should not stop if the agent fails. > * Logs should be readable, presumably via the web UI, or via some other > module-specific UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3760) Remove fragile sleep() from ProcessManager::settle()
[ https://issues.apache.org/jira/browse/MESOS-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3760: --- Component/s: test > Remove fragile sleep() from ProcessManager::settle() > > > Key: MESOS-3760 > URL: https://issues.apache.org/jira/browse/MESOS-3760 > Project: Mesos > Issue Type: Bug > Components: libprocess, test >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > From {{ProcessManager::settle()}}: > {code} > // While refactoring in order to isolate libev behind abstractions > // it became evident that this os::sleep is vital for tests to > // pass. In particular, there are certain tests that assume too > // much before they attempt to do a settle. One such example is > // tests doing http::get followed by Clock::settle, where they > // expect the http::get will have properly enqueued a process on > // the run queue but http::get is just sending bytes on a > // socket. Without sleeping at the beginning of this function we > // can get unlucky and appear settled when in actuality the > // kernel just hasn't copied the bytes to a socket or we haven't > // yet read the bytes and enqueued an event on a process (and the > // process on the run queue). > os::sleep(Milliseconds(10)); > {code} > Sleeping for 10 milliseconds doesn't guarantee that the kernel has done > anything at all; any test cases that depend on this behavior should be fixed > to actual perform the necessary synchronization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4101) Consider running most/all tests with the clock paused
Neil Conway created MESOS-4101: -- Summary: Consider running most/all tests with the clock paused Key: MESOS-4101 URL: https://issues.apache.org/jira/browse/MESOS-4101 Project: Mesos Issue Type: Improvement Components: test Reporter: Neil Conway Presently, some tests pause the clock before they do timing-sensitive operations, advancing it only via explicit {{Clock::advance()}} calls to help ensure that dependencies on the clock don't cause the test to be non-deterministic. (Using {{Clock::advance()}} is typically also faster than waiting for the equivalent amount of physical time to elapse.) However, most tests do not pause the clock, which contributes to the ongoing flakiness witnessed in many tests. We should investigate whether it is feasible to pause the clock in all/most tests (e.g., have the clock paused by default), and only enable the clock when the test cannot be implemented with {{Clock::advance()}}, {{Clock::settle()}}, and similar functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
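The pattern being proposed, tests that never wait on wall-clock time, can be illustrated with a toy virtual clock. This is only a hedged sketch of the idea; libprocess's actual {{Clock}} API differs:

```python
import heapq

# A toy paused clock: timers fire only when the test advances virtual time,
# so nothing depends on machine load, scheduling, or wall-clock speed.
class PausedClock:
    def __init__(self):
        self.now = 0.0
        self._timers = []  # min-heap of (deadline, seq, callback)
        self._seq = 0      # tie-breaker so tuples stay comparable

    def call_later(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))

    def advance(self, seconds):
        self.now += seconds
        while self._timers and self._timers[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._timers)
            callback()

fired = []
clock = PausedClock()
clock.call_later(75.0, lambda: fired.append("slave declared lost"))
clock.advance(74.0)   # deterministic: the timeout has not elapsed yet
assert fired == []
clock.advance(1.0)    # the timer fires instantly, with no physical waiting
print(fired)          # ['slave declared lost']
```

A test written against such a clock is both deterministic and fast, which is the motivation given in the ticket for pausing the clock by default.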
[jira] [Commented] (MESOS-3818) Line wrapping for "--help" output
[ https://issues.apache.org/jira/browse/MESOS-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048100#comment-15048100 ] Neil Conway commented on MESOS-3818: Hi [~lins05], I'm not quite sure what you mean. Shouldn't we pick a column number to wrap the text at (say 80), and then use the same line length for all of the help output text? > Line wrapping for "--help" output > - > > Key: MESOS-3818 > URL: https://issues.apache.org/jira/browse/MESOS-3818 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Shuai Lin >Priority: Trivial > Labels: mesosphere, newbie > > The output of `mesos-slave --help`, `mesos-master --help`, and perhaps other > programs has very inconsistent line wrapping: different help text fragments > are wrapped at very different column numbers, which harms readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))
[ https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048023#comment-15048023 ]

James Peach commented on MESOS-4071:
------------------------------------

[~jvanremoortere] it looks like Jie added a simple test in 2e40c67ecf68bd818b789d5dd17baf5e00c43e2b. Is that something like what you were thinking of?

> Master crash during framework teardown (Check failed:
> total.resources.contains(slaveId))
> -----------------------------------------------------
>
>                 Key: MESOS-4071
>                 URL: https://issues.apache.org/jira/browse/MESOS-4071
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.25.0
>            Reporter: Mandeep Chadha
>
> Stack Trace:
> NOTE: Replaced IP address with XX.XX.XX.XX
> {code}
> I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014
> F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed: total.resources.contains(slaveId)
> *** Check failure stack trace: ***
>     @     0x7f2b3dda53d8  google::LogMessage::Fail()
>     @     0x7f2b3dda5327  google::LogMessage::SendToLog()
>     @     0x7f2b3dda4d38  google::LogMessage::Flush()
>     @     0x7f2b3dda7a6c  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f2b3d3351a1  mesos::internal::master::allocator::DRFSorter::remove()
>     @     0x7f2b3d0b8c29  mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
>     @     0x7f2b3d0ca823  _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_
>     @     0x7f2b3d0dc8dc  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f2b3dd2cc35  std::function<>::operator()()
>     @     0x7f2b3dd15ae5  process::ProcessBase::visit()
>     @     0x7f2b3dd188e2  process::DispatchEvent::visit()
>     @           0x472366  process::ProcessBase::serve()
>     @     0x7f2b3dd1203f  process::ProcessManager::resume()
>     @     0x7f2b3dd061b2  process::internal::schedule()
>     @     0x7f2b3dd63efd  _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x7f2b3dd63e4d  std::_Bind_simple<>::operator()()
>     @     0x7f2b3dd63de6  std::thread::_Impl<>::_M_run()
>     @       0x318c2b6470  (unknown)
>     @       0x318b2079d1  (unknown)
>     @       0x318aae8b5d  (unknown)
>     @              (nil)  (unknown)
> Aborted (core dumped)
> {code}
[jira] [Updated] (MESOS-4067) ReservationTest.ACLMultipleOperations is flaky
[ https://issues.apache.org/jira/browse/MESOS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Park updated MESOS-4067:
--------------------------------
    Shepherd: Michael Park

> ReservationTest.ACLMultipleOperations is flaky
> ----------------------------------------------
>
>                 Key: MESOS-4067
>                 URL: https://issues.apache.org/jira/browse/MESOS-4067
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>            Assignee: Greg Mann
>              Labels: flaky, mesosphere
>             Fix For: 0.27.0
>
> Observed from the CI:
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1319/changes
[jira] [Created] (MESOS-4100) Include ContainerID in certain Hook interface calls?
Nicholas Parker created MESOS-4100:
--------------------------------------

             Summary: Include ContainerID in certain Hook interface calls?
                 Key: MESOS-4100
                 URL: https://issues.apache.org/jira/browse/MESOS-4100
             Project: Mesos
          Issue Type: Improvement
          Components: c++ api
    Affects Versions: 0.25.0
            Reporter: Nicholas Parker
            Priority: Minor

I'm building an agent module which uses both the Isolator interface [1] to track containers over their lifespan and the Hook interface [2] to inject environment variables into those containers. Nearly all of the Isolator interface calls include the ContainerID, sometimes as the sole identifier. Meanwhile, the Hook.slaveExecutorEnvironmentDecorator call is only given an ExecutorInfo and doesn't have a ContainerID at all.

At the moment I'm working around the lack of a ContainerID in the Hook call by storing a temporary ExecutorInfo->ContainerID mapping when Isolator.prepare() is called, then reading and clearing that mapping when Hook.slaveExecutorEnvironmentDecorator() is called. While this workaround appears to work for now, I worry that it will be brittle in the future, since it depends on Isolator.prepare() consistently being called before Hook.slaveExecutorEnvironmentDecorator().

The immediate issue is specific to adding a ContainerID parameter to Hook.slaveExecutorEnvironmentDecorator(), but it may make sense to determine whether other Hook calls should have similar updates.

[1] https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/slave/isolator.hpp
[2] https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=include/mesos/hook.hpp
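The ExecutorInfo->ContainerID bookkeeping described in the ticket can be sketched as a small mutex-guarded map, keyed here by executor ID for simplicity. This is a hypothetical illustration of the workaround, not code from the reporter's module; the class and method names are invented:

```cpp
#include <map>
#include <mutex>
#include <string>

// Bridges the gap between Isolator::prepare() (which sees the
// ContainerID) and Hook::slaveExecutorEnvironmentDecorator() (which
// only sees the ExecutorInfo). The isolator records the mapping; the
// hook consumes it.
class ContainerIdTracker {
public:
  // Called from the isolator's prepare() path.
  void record(const std::string& executorId, const std::string& containerId) {
    std::lock_guard<std::mutex> lock(mutex_);
    mapping_[executorId] = containerId;
  }

  // Called from the hook. Returns the recorded ContainerID and clears
  // the entry, or an empty string if prepare() has not run yet -- the
  // ordering hazard the ticket warns about.
  std::string take(const std::string& executorId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = mapping_.find(executorId);
    if (it == mapping_.end()) {
      return "";
    }
    std::string containerId = it->second;
    mapping_.erase(it);
    return containerId;
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::string> mapping_;
};
```

The empty-string return path is exactly why the workaround is brittle: nothing in the module API guarantees prepare() runs before the decorator hook, which is the motivation for passing the ContainerID directly.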
[jira] [Commented] (MESOS-3909) isolator module headers depend on picojson headers
[ https://issues.apache.org/jira/browse/MESOS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047499#comment-15047499 ]

James Peach commented on MESOS-3909:
------------------------------------

Looking at {{stout/json.hpp}}, the dependency on {{picojson}} could easily be eliminated by moving some of the inline functions in the header to a {{.cpp}} file.

> isolator module headers depend on picojson headers
> --------------------------------------------------
>
>                 Key: MESOS-3909
>                 URL: https://issues.apache.org/jira/browse/MESOS-3909
>             Project: Mesos
>          Issue Type: Bug
>          Components: c++ api, modules
>            Reporter: James Peach
>            Assignee: James Peach
>
> When trying to build an isolator module, stout headers end up depending on
> {{picojson.hpp}}, which is not installed.
> {code}
> In file included from /opt/mesos/include/mesos/module/isolator.hpp:25:
> In file included from /opt/mesos/include/mesos/slave/isolator.hpp:30:
> In file included from /opt/mesos/include/process/dispatch.hpp:22:
> In file included from /opt/mesos/include/process/process.hpp:26:
> In file included from /opt/mesos/include/process/event.hpp:21:
> In file included from /opt/mesos/include/process/http.hpp:39:
> /opt/mesos/include/stout/json.hpp:23:10: fatal error: 'picojson.h' file not found
> #include <picojson.h>
>          ^
> 8 warnings and 1 error generated.
> {code}
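The fix James describes is the standard declaration/definition split: the installed header keeps only a declaration, and the private dependency is included solely in the compiled .cpp, so module builds never see it. A schematic sketch, with a hypothetical stand-in function rather than actual stout code:

```cpp
#include <string>

// --- json.hpp (the installed header): declaration only. Nothing here
// --- mentions picojson, so modules can include it freely.
std::string stringify(const std::string& value);

// --- json.cpp (compiled into the library): the only translation unit
// --- that needs the private dependency.
// #include <picojson.h>  // hypothetical: the include moves here
std::string stringify(const std::string& value) {
  // Stand-in for the real serialization call; quotes the value as a
  // JSON string literal (ignoring escaping for this sketch).
  return "\"" + value + "\"";
}
```

The trade-off is that the moved functions are no longer inlinable across the library boundary, which is usually an acceptable cost for keeping third-party headers out of the installed API.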
[jira] [Commented] (MESOS-2782) Document the sandbox
[ https://issues.apache.org/jira/browse/MESOS-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047500#comment-15047500 ]

Joseph Wu commented on MESOS-2782:
----------------------------------

The tests for sandbox expectations already exist:

* {{PathsTest.Executor}}
* {{GarbageCollectorIntegrationTest.ExitedExecutor}}
* {{GarbageCollectorIntegrationTest.DiskUsage}}
* {{SlaveRecoveryTest.GCExecutor}}
* Indirectly tested by {{FilesTest.*}} and {{FetcherTest.*}}

> Document the sandbox
> --------------------
>
>                 Key: MESOS-2782
>                 URL: https://issues.apache.org/jira/browse/MESOS-2782
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation
>            Reporter: Aaron Bell
>            Assignee: Joseph Wu
>              Labels: documentation, mesosphere
>
> The sandbox is the arena of debugging for most Mesos users. From an
> application- or framework-developer perspective, they need to know:
> - What it is
> - Where it is
> - How to use it, and how NOT to use it
> - What Mesos writes here (fetcher etc.)
> - Storage limits
> - Lifecycle and garbage collection
> This needs to be documented to help users get over the hump of learning to
> work with Mesos.
[jira] [Comment Edited] (MESOS-2782) Document the sandbox
[ https://issues.apache.org/jira/browse/MESOS-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047500#comment-15047500 ]

Joseph Wu edited comment on MESOS-2782 at 12/8/15 9:25 PM:
-----------------------------------------------------------

The tests for sandbox expectations already exist:

* {{PathsTest.Executor}}
* {{GarbageCollectorIntegrationTest.ExitedExecutor}}
* {{GarbageCollectorIntegrationTest.DiskUsage}}
* {{SlaveRecoveryTest.GCExecutor}}
* Indirectly tested by {{FilesTest.\*}} and {{FetcherTest.\*}}

was (Author: kaysoky):
The tests for sandbox expectations already exist:

* {{PathsTest.Executor}}
* {{GarbageCollectorIntegrationTest.ExitedExecutor}}
* {{GarbageCollectorIntegrationTest.DiskUsage}}
* {{SlaveRecoveryTest.GCExecutor}}
* Indirectly tested by {{FilesTest.*}} and {{FetcherTest.*}}

> Document the sandbox
> --------------------
>
>                 Key: MESOS-2782
>                 URL: https://issues.apache.org/jira/browse/MESOS-2782
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation
>            Reporter: Aaron Bell
>            Assignee: Joseph Wu
>              Labels: documentation, mesosphere
>
> The sandbox is the arena of debugging for most Mesos users. From an
> application- or framework-developer perspective, they need to know:
> - What it is
> - Where it is
> - How to use it, and how NOT to use it
> - What Mesos writes here (fetcher etc.)
> - Storage limits
> - Lifecycle and garbage collection
> This needs to be documented to help users get over the hump of learning to
> work with Mesos.