Re: [VOTE] Release Apache Mesos 0.23.0 (rc2)
+1 (non-binding)

CentOS 7.1 and Ubuntu 14.04: `make check` runs fine. There are known issues with `sudo make check` on CentOS 7.1.

On Fri, Jul 10, 2015 at 2:20 AM, Ian Downes idow...@twitter.com wrote:

No it doesn't block me (I normally compile without python...).

On Thu, Jul 9, 2015 at 4:48 PM, Adam Bordelon a...@mesosphere.io wrote:

Ian, is your PythonFramework error a blocker? If so, please file a JIRA and target it to 0.23.0. Otherwise, I'm ready to cut rc3 with the fix for https://issues.apache.org/jira/browse/MESOS-3025

On Thu, Jul 9, 2015 at 4:11 PM, Marco Massenzio ma...@mesosphere.io wrote:

This seems to be somewhat related to PB 2.4 v 2.5 (what Mesos uses) - and possibly, indirectly, to Py 2.6 v 2.7 (wild guess here).

The problem with Python is that it's always difficult to figure out where it goes looking for imports (unless you have a virtualenv and/or munge sys.path), so it may well be that it finds `mesos.interface` in the main system site-packages folder (where you may have an old version of the protobuf libraries) instead of the correct (for 2.5.0) place (under our build/3rdparty/... folders).

As in the other instance, a log dump of sys.path just before the import *may* shed some light (or add to the confusion).

IMO we should require Python == 2.7 (no idea if we can support Python 3, my guess is we can't, because of this https://github.com/google/protobuf/issues/9), but that's probably another story.

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, Jul 9, 2015 at 3:21 PM, Ian Downes idow...@twitter.com wrote:

The ExamplesTest.PythonFramework test fails differently for me on CentOS5 with python 2.6.6. I presume we don't require 2.7?

    [idownes@hostname build]$ MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter=ExamplesTest.PythonFramework
    Source directory: /home/idownes/workspace/mesos
    Build directory: /home/idownes/workspace/mesos/build
    -
    We cannot run any cgroups tests that require mounting
    hierarchies because you have the following hierarchies mounted:
    /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/freezer, /sys/fs/cgroup/memory, /sys/fs/cgroup/perf_event
    We'll disable the CgroupsNoHierarchyTest test fixture for now.
    -
    -
    We cannot run any Docker tests because:
    Failed to get docker version: Failed to execute 'docker --version': exited with status 127
    -
    /usr/bin/nc
    Note: Google Test filter = trimmed
    [==] Running 1 test from 1 test case.
    [--] Global test environment set-up.
    [--] 1 test from ExamplesTest
    [ RUN ] ExamplesTest.PythonFramework
    Using temporary directory '/tmp/ExamplesTest_PythonFramework_igPnUB'
    Traceback (most recent call last):
      File "/home/idownes/workspace/mesos/build/../src/examples/python/test_framework.py", line 24, in <module>
        from mesos.interface import mesos_pb2
      File "build/bdist.linux-x86_64/egg/mesos/interface/mesos_pb2.py", line 4, in <module>
    ImportError: cannot import name enum_type_wrapper
    ../../src/tests/script.cpp:83: Failure
    Failed
    python_framework_test.sh exited with status 1
    [ FAILED ] ExamplesTest.PythonFramework (136 ms)
    [--] 1 test from ExamplesTest (136 ms total)
    [--] Global test environment tear-down
    [==] 1 test from 1 test case ran. (169 ms total)
    [ PASSED ] 0 tests.
    [ FAILED ] 1 test, listed below:
    [ FAILED ] ExamplesTest.PythonFramework

    1 FAILED TEST
    YOU HAVE 10 DISABLED TESTS

    [idownes@hostname build]$ python --version
    Python 2.6.6

On Thu, Jul 9, 2015 at 2:53 PM, Vinod Kone vinodk...@gmail.com wrote:

I'm assuming the 50 min Jeff mentioned was when doing a 'make check' on a fresh copy of mesos source code. The majority of that time should be due to compilation of source and test code (both of which will be sped up by -j); a sequential run of the test suite should be within 10 min IIRC.

On Thu, Jul 9, 2015 at 2:40 PM, Marco Massenzio ma...@mesosphere.io wrote:

@Vinod: unfortunately, the tests must be run sequentially, so (at least, as far as I can tell) there's virtually no speedup in 'make check' by using the -j switch.

As someone else pointed out, it would be grand if we could have a 'test compilation' step (which can be run in parallel and speeds up) distinct from a 'run tests' step (which must run sequentially).

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, Jul 9, 2015 at 2:28 PM, Vinod Kone vinodk...@gmail.com wrote:

As a tangent, you can speed up the build by doing make -j#threads check.

On Thu, Jul 9, 2015 at 1:35 PM, Jeff Schroeder jeffschroe...@computer.org wrote:

I'm unable to replicate the same failure on another up to date RHEL 7.1 machine for some strange reason. Even blowing away the checkout, doing a
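
A note on Marco's suggestion in the rc2 thread above (dump sys.path just before the failing import): a minimal sketch of that check follows. It is an illustration only, not part of the Mesos test suite. It assumes Python 3.4+ for `importlib.util.find_spec`; on the Python 2.6 interpreter from the failing run, printing `sys.path` and the `__file__` attribute of the imported package would be the nearest equivalent. The helper name `describe_import` is hypothetical.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Return (origin, search_path) for module_name.

    origin is the file Python would load for the module, or None if the
    module cannot be found. find_spec imports parent packages but does not
    execute the target module itself.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        # ValueError: the module is loaded but has no usable spec.
        spec = None
    origin = spec.origin if spec is not None else None
    return origin, list(sys.path)


if __name__ == "__main__":
    # The two packages discussed in the thread; either may be absent.
    for name in ("google.protobuf", "mesos.interface"):
        origin, _ = describe_import(name)
        print("%s -> %s" % (name, origin or "not found"))
    print("sys.path, in lookup order:")
    for index, entry in enumerate(sys.path):
        print("  %2d %s" % (index, entry))
```

If the origin printed for `google.protobuf` points at the system site-packages rather than the build tree, that would support the stale-protobuf theory; if not, the cause lies elsewhere.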
Marathon can no longer deploy any apps after a failover
Problem:

If I restart the current framework leader for Marathon (the host shown in the Active Frameworks tab of the Mesos UI), a new one is elected after a moment, and any new deployments are stuck indefinitely in the 'deploying' state (empty black bar, 0/1 and hanging; at debug level I don't see any errors in the Marathon/Mesos logs).

Also, the old tasks are untouchable at that time: yes, they keep running, but I can't kill, restart, or scale them.

When that happens I can:

- stop Marathon on all masters
- remove the framework via a curl to the Mesos API /shutdown
- purge /marathon from the ZooKeeper CLI
- restart docker services on all slaves (that kills the zombie containers)
- restart mesos-slave services on all slaves (pampering my paranoia here)

Then I can deploy apps again.

How can I avoid this problem? Are there any basic settings I'm missing?

This is scary, as the reboot of a single master (out of 3 or 5 servers) freezes everything that is deployed using Marathon, and the steps to reclaim control introduce downtime to every single app running there.

Configuration:

Running Ubuntu 14.04.2 LTS
mesos 0.22.1-1.0.ubuntu1404
marathon 0.9.0-1.0.381.ubuntu1404
chronos 2.3.4-1.0.81.ubuntu1404

The cluster uses 3 masters and 15 slaves. The master machines also run the mesos-slave process (although those machines offer only a portion of their resources).

The Mesos/Marathon configuration relies heavily on defaults; the options that are set explicitly are listed below. The quorum is 2. The Marathon service runs on the 3 master machines.

root@mesos-master1 ~ # tree /etc/marathon/
/etc/marathon/
`-- conf
    |-- event_subscriber
    |-- framework_name
    |-- hostname
    |-- logging_level
    `-- zk

1 directory, 5 files

root@mesos-master1 ~ # tree /etc/mesos
/etc/mesos
`-- zk

0 directories, 1 file

root@mesos-master1 ~ # tree /etc/mesos-slave/
/etc/mesos-slave/
|-- containerizers
|-- docker_stop_timeout
|-- executor_registration_timeout
|-- executor_shutdown_grace_period
|-- hostname
|-- ip
|-- logging_level
`-- resources

0 directories, 8 files

root@mesos-master1 ~ # tree /etc/mesos-master
/etc/mesos-master
|-- cluster
|-- hostname
|-- ip
|-- logging_level
|-- quorum
`-- work_dir
Re: Marathon can no longer deploy any apps after a failover
Sometimes you need to check the ZooKeeper log, the slave log, and the master log. This is a pain point with Mesos: weird cases like this are very difficult to debug.

2015-07-16 20:29 GMT+08:00 Maciej Strzelecki maciej.strzele...@crealytics.com:

Problem: If I restart the current framework leader for Marathon (the host shown in the Active Frameworks tab of the Mesos UI), a new one is elected after a moment, and any new deployments are stuck indefinitely in the 'deploying' state (empty black bar, 0/1 and hanging; at debug level I don't see any errors in the Marathon/Mesos logs). Also, the old tasks are untouchable at that time: yes, they keep running, but I can't kill, restart, or scale them.

--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com
RE: Marathon can no longer deploy any apps after a failover
Maciej,

I had a similar problem, but it was solved by setting the LIBPROCESS_IP environment variable to the host IP address for the Marathon process.

Nikolay

From: Maciej Strzelecki [mailto:maciej.strzele...@crealytics.com]
Sent: Thursday, July 16, 2015 7:30 AM
To: user@mesos.apache.org
Subject: Marathon can no longer deploy any apps after a failover

Problem: If I restart the current framework leader for Marathon (the host shown in the Active Frameworks tab of the Mesos UI), a new one is elected after a moment, and any new deployments are stuck indefinitely in the 'deploying' state (empty black bar, 0/1 and hanging; at debug level I don't see any errors in the Marathon/Mesos logs). Also, the old tasks are untouchable at that time: yes, they keep running, but I can't kill, restart, or scale them.
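
For readers trying the LIBPROCESS_IP fix Nikolay describes above, here is a minimal sketch of building the environment for the Marathon process. It is an illustration under assumptions: the helper name `env_with_libprocess_ip` is hypothetical, and rejecting loopback and unspecified addresses reflects a guess that the master must be able to reach the advertised address, not documented Marathon behaviour.

```python
import ipaddress
import os


def env_with_libprocess_ip(host_ip, base_env=None):
    """Return a copy of the environment with LIBPROCESS_IP set to host_ip."""
    # ip_address raises ValueError for anything that is not a literal IP,
    # so hostnames and typos are caught before the process is started.
    address = ipaddress.ip_address(host_ip)
    # Assumption: the address must be reachable from the Mesos master,
    # so loopback (127.0.0.1) and unspecified (0.0.0.0) are rejected.
    if address.is_loopback or address.is_unspecified:
        raise ValueError("expected a routable host address, got %s" % host_ip)
    env = dict(os.environ if base_env is None else base_env)
    env["LIBPROCESS_IP"] = str(address)
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.Popen` when launching Marathon; the exact command line depends on how Marathon is installed and is not shown here.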
Re: Marathon can no longer deploy any apps after a failover
Sounds like a Marathon issue. Mind asking on Marathon's mailing list?

On Thu, Jul 16, 2015 at 8:02 AM, Nikolay Borodachev nbo...@adobe.com wrote:

Maciej,

I had a similar problem, but it was solved by setting the LIBPROCESS_IP environment variable to the host IP address for the Marathon process.

Nikolay
Mesos-DNS configuration problem with dockerized web application
Hi,

I can't access my application using Mesos-DNS. Neither port 8123 nor port 8080 is responding. I think I'm missing something in the configuration but can't find the problem myself.

I have a very basic Java application that listens on port 8080. I have created a Docker image and deployed this application to Marathon. My deployment configuration is the following:

$ cat app-slick.json
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "edvorkin/slick-swagger:1",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 9000, "protocol": "tcp" }
      ]
    }
  },
  "cmd": "java -jar /tmp/spray-slick-swagger-assembly-0.0.2.jar Boot",
  "id": "slick-swagger-demo",
  "instances": 1,
  "cpus": 0.1,
  "mem": 256,
  "constraints": [ ["hostname", "UNIQUE"] ]
}

The application was successfully deployed to 2 nodes and assigned random ports, 31990 and 31000, one on each node.

Then I installed and configured Mesos-DNS with this config.json:

{
  "zk": "zk://172.31.50.58:2181,172.31.50.59:2181,172.31.50.60:2181/mesos",
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["172.31.0.2"],
  "timeout": 5,
  "email": "root.mesos-dns.mesos"
}

and I got the following:

$ dig slick-swagger-demo.marathon.mesos

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.1 <<>> slick-swagger-demo.marathon.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20376
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;slick-swagger-demo.marathon.mesos. IN A

;; ANSWER SECTION:
slick-swagger-demo.marathon.mesos. 60 IN A 172.31.11.202
slick-swagger-demo.marathon.mesos. 60 IN A 172.31.11.203

;; Query time: 1 msec
;; SERVER: 54.86.164.193#53(54.86.164.193)
;; WHEN: Thu Jul 16 15:23:04 UTC 2015
;; MSG SIZE rcvd: 83

$ curl http://localhost:8123/v1/services/_slick-swagger-demo._tcp.marathon.mesos | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   289  100   289    0     0   1916      0 --:--:-- --:--:-- --:--:--  1926
[
    {
        "host": "slick-swagger-demo-15491-s42.marathon.mesos.",
        "ip": "172.31.11.203",
        "port": "31990",
        "service": "_slick-swagger-demo._tcp.marathon.mesos"
    },
    {
        "host": "slick-swagger-demo-20495-s43.marathon.mesos.",
        "ip": "172.31.11.202",
        "port": "31000",
        "service": "_slick-swagger-demo._tcp.marathon.mesos"
    }
]

But when I try to access my application using the DNS name, I can't get a response.

$ curl http://slick-swagger-demo.marathon.mesos:8080
curl: (7) Failed connect to slick-swagger-demo.marathon.mesos:8080; Connection refused

$ curl slick-swagger-demo.marathon.mesos:8123
404: Page Not Found

curl slick-swagger-demo.marathon.mesos:31990 produces the desired result, but that is bound to a random port.

How do I configure the mapping between random ports and my service? I would like to be able to access my server on port 80, for example:

curl http://slick-swagger-demo.marathon.mesos

Thanks
Eugene

--
This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, notify the sender immediately by return email and delete the message and any attachments from your system.
Re: Mesos-DNS configuration problem with dockerized web application
Thanks. I was hoping that Mesos-DNS would do this for me, so that I could run services on different ports even on the same node. I was hesitant to use HAProxy, but I think I have to use HAProxy/Bamboo to achieve this functionality.

From: Ondrej Smola ondrej.sm...@gmail.com
Reply-To: user@mesos.apache.org
Date: Thursday, July 16, 2015 at 2:55 PM
To: user@mesos.apache.org
Subject: Re: Mesos-DNS configuration problem with dockerized web application

Hi,

"portMappings": [ { "containerPort": 8080, "hostPort": 80, "servicePort": 9000, "protocol": "tcp" } ]

will work: you need to specify the required port as hostPort. The only limitation of this setup is that you won't be able to run multiple services on a single host with the same hostPort (port collision), but for most setups you should be OK with just choosing random/different ports for different services, or ensuring there are more nodes than requested instances with the same port.

If you want to use a random port, you will need some logic to query DNS and parse SRV records and, for example, set up HAProxy with the correctly assigned ports.

This problem can also be solved using SDN (for example flannel/weave), assigning each service a unique IP address so that you don't have to care about port collisions, but this is not related to Mesos-DNS - just info :)
Re: Mesos-DNS configuration problem with dockerized web application
I don't think there is a way Mesos-DNS can help you in this case. When you rely on classic DNS lookups, they map a DNS name to an IP address (A records); there is no port lookup (the client uses the port you provide).

https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing.html is a good starting point.

2015-07-16 22:24 GMT+02:00 Dvorkin-Contractor, Eugene (CORP) eugene.dvorkin-contrac...@adp.com:

Thanks. I was hoping that Mesos-DNS would do this for me, so that I could run services on different ports even on the same node. I was hesitant to use HAProxy, but I think I have to use HAProxy/Bamboo to achieve this functionality.
Re: Mesos-DNS configuration problem with dockerized web application
Hi,

"portMappings": [ { "containerPort": 8080, "hostPort": 80, "servicePort": 9000, "protocol": "tcp" } ]

will work: you need to specify the required port as hostPort. The only limitation of this setup is that you won't be able to run multiple services on a single host with the same hostPort (port collision), but for most setups you should be OK with just choosing random/different ports for different services, or ensuring there are more nodes than requested instances with the same port.

If you want to use a random port, you will need some logic to query DNS and parse SRV records and, for example, set up HAProxy with the correctly assigned ports.

This problem can also be solved using SDN (for example flannel/weave), assigning each service a unique IP address so that you don't have to care about port collisions, but this is not related to Mesos-DNS - just info :)

2015-07-16 17:58 GMT+02:00 Dvorkin-Contractor, Eugene (CORP) eugene.dvorkin-contrac...@adp.com:

Hi, I can't access my application using Mesos-DNS. Neither port 8123 nor port 8080 is responding. I think I'm missing something in the configuration but can't find the problem myself.
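
The "logic to query DNS ... and set up HAProxy" that Ondrej mentions in this thread could look roughly like the sketch below. It reads the JSON returned by the Mesos-DNS `/v1/services/...` endpoint shown in Eugene's message rather than raw SRV records, and turns it into HAProxy `server` lines. The function names and the backend name `slick` are hypothetical, and the sample payload is reconstructed from the output in the thread (whether `port` arrives as a string or a number is an assumption; the code accepts both).

```python
import json

# Reconstructed from the curl output quoted in the thread.
SAMPLE_REPLY = """[
  {"host": "slick-swagger-demo-15491-s42.marathon.mesos.",
   "ip": "172.31.11.203", "port": "31990",
   "service": "_slick-swagger-demo._tcp.marathon.mesos"},
  {"host": "slick-swagger-demo-20495-s43.marathon.mesos.",
   "ip": "172.31.11.202", "port": "31000",
   "service": "_slick-swagger-demo._tcp.marathon.mesos"}
]"""


def backends_from_services(payload):
    """Turn a /v1/services JSON reply into sorted (ip, port) pairs."""
    backends = []
    for record in json.loads(payload):
        ip = record.get("ip")
        port = record.get("port")
        # Skip records with missing or empty fields.
        if ip and port:
            backends.append((ip, int(port)))
    return sorted(backends)


def haproxy_server_lines(name, backends):
    """Render one HAProxy 'server' line per backend."""
    return [
        "server %s-%d %s:%d check" % (name, index, ip, port)
        for index, (ip, port) in enumerate(backends, start=1)
    ]


if __name__ == "__main__":
    for line in haproxy_server_lines("slick", backends_from_services(SAMPLE_REPLY)):
        print(line)
```

A real setup would fetch the payload periodically, rewrite the backend section only when the list changes, and reload HAProxy; those parts are left out of this sketch.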
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
+1 (binding)

This vote has been silent for almost a week. I assume everybody's busy testing.

My testing results: basic integration tests passed for Mesos 0.23.0 on CoreOS with DCOS GUI/CLI, Marathon, Chronos, Spark, HDFS, Cassandra, and Kafka. `make check` passes on Ubuntu and CentOS, but `sudo make check` fails on CentOS 7.1 due to errors in CentOS. See https://issues.apache.org/jira/browse/MESOS-3050 for more details. I'm not convinced this is serious enough to do another release candidate and voting round, but I'll let Tim and others chime in with their thoughts.

If we don't get enough deciding votes by 6pm Pacific today, I'll extend the vote for another day.

On Thu, Jul 9, 2015 at 6:09 PM, Khanduja, Vaibhav vaibhav.khand...@emc.com wrote:

+1

Sent from my iPhone. Please excuse the typos and brevity of this message.

On Jul 9, 2015, at 6:07 PM, Adam Bordelon a...@mesosphere.io wrote:

Hello Mesos community,

Please vote on releasing the following candidate as Apache Mesos 0.23.0.

0.23.0 includes the following:
- Per-container network isolation
- Dockerized slaves will properly recover Docker containers upon failover.
- Upgraded minimum required compilers to GCC 4.8+ or clang 3.5+.

as well as experimental support for:
- Fetcher Caching
- Revocable Resources
- SSL encryption
- Persistent Volumes
- Dynamic Reservations

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.23.0-rc3

The candidate for Mesos 0.23.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz

The tag to be voted on is 0.23.0-rc3:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.23.0-rc3

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1060

Please vote on releasing this package as Apache Mesos 0.23.0!

The vote is open until Thurs July 16th, 18:00 PDT 2015 and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.23.0
[ ] -1 Do not release this package because ...

Thanks,
-Adam-
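Before voting, testers usually check the downloaded tarball against the published checksum. The following is a minimal sketch of that step only (verifying the .asc signature needs a PGP tool and is not covered). It assumes the .md5 file holds either a plain 32-digit hex digest or space-separated hex pairs after a "name:" prefix; the actual layout of the file is not stated in this thread, and the helper names are hypothetical.

```python
import hashlib
import io
import re


def md5_hex(stream, chunk_size=1 << 20):
    """Return the MD5 hex digest of a binary file-like object."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def parse_md5(text):
    """Extract a digest from 'digest  name' or 'name: AB CD ...' text."""
    match = re.search(r"\b[0-9a-fA-F]{32}\b", text)
    if match:
        return match.group(0).lower()
    # Assumed fallback: hex pairs separated by spaces after a 'name:' prefix.
    compact = re.sub(r"\s+", "", text.rsplit(":", 1)[-1])
    if re.fullmatch(r"[0-9a-fA-F]{32}", compact):
        return compact.lower()
    raise ValueError("no MD5 digest found")


def checksum_matches(stream, md5_text):
    """Compare the stream's digest with the digest found in md5_text."""
    return md5_hex(stream) == parse_md5(md5_text)


# Self-check on in-memory data rather than the real tarball.
data = io.BytesIO(b"hello")
print(checksum_matches(data, "5d41402abc4b2a76b9719d911017c592  example.tar.gz"))
```

For the real candidate, the stream would be the tarball opened with `open(path, "rb")` and `md5_text` the contents of the downloaded .md5 file.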
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
Found a bug in HTTP API related code: MESOS-3055 https://issues.apache.org/jira/browse/MESOS-3055

If we don't fix this in 0.23.0, we cannot expect the 0.24.0 scheduler driver (that will send Calls) to properly subscribe with a 0.23.0 master. I could add a workaround in the driver to only send Calls if the master version is 0.24.0, but would prefer not to have to do that.

Also, on the review https://reviews.apache.org/r/36518/ for that bug, we realized that we might want to make Subscribe.force 'optional' instead of 'required'. That's an API change, which would be nice to go into 0.23.0 as well.

So, not a -1 per se, but if you are willing to cut another RC, I can land the fixes today. Sorry for the trouble.

On Thu, Jul 16, 2015 at 11:48 AM, Adam Bordelon a...@mesosphere.io wrote:

+1 (binding)

This vote has been silent for almost a week. I assume everybody's busy testing.

My testing results: basic integration tests passed for Mesos 0.23.0 on CoreOS with DCOS GUI/CLI, Marathon, Chronos, Spark, HDFS, Cassandra, and Kafka. `make check` passes on Ubuntu and CentOS, but `sudo make check` fails on CentOS 7.1 due to errors in CentOS. See https://issues.apache.org/jira/browse/MESOS-3050 for more details. I'm not convinced this is serious enough to do another release candidate and voting round, but I'll let Tim and others chime in with their thoughts.

If we don't get enough deciding votes by 6pm Pacific today, I'll extend the vote for another day.
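The driver-side workaround mentioned in this message (only sending Calls to a new enough master) comes down to a version comparison. The following is a minimal sketch, assuming dotted numeric version strings with an optional "-rcN" suffix and reading "is 0.24.0" as "is at least 0.24.0"; `master_supports_calls` is a hypothetical name and not a Mesos API.

```python
def parse_version(version):
    """Turn '0.23.0' or '0.23.0-rc3' into a tuple such as (0, 23, 0)."""
    core = version.split("-", 1)[0]  # drop an "-rcN" style suffix
    return tuple(int(part) for part in core.split("."))


def master_supports_calls(master_version, minimum="0.24.0"):
    """True if the master is at least the assumed minimum version."""
    return parse_version(master_version) >= parse_version(minimum)


for version in ("0.23.0", "0.24.0-rc1", "0.100.0"):
    print(version, master_supports_calls(version))
```

Comparing integer tuples rather than raw strings keeps a version such as 0.100.0 ordered after 0.24.0.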
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
Just to add my +1. Built and ran `make check` on Ubuntu 14.04, with and without SSL/libevent (no 'sudo'; I can test all 4 variants this evening on rc4).

On Thu, Jul 16, 2015 at 3:10 PM, Timothy Chen tnac...@gmail.com wrote:

As Adam mentioned, I also think this is not a blocker, as it only affects the way we test cgroups on CentOS 7.x due to a CentOS bug and doesn't actually impact normal Mesos operations.

My vote is +1 as well.

Tim
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
-1 so that we can cherry pick MESOS-3055.

The master crash bug is MESOS-3070 https://issues.apache.org/jira/browse/MESOS-3070, but the fix is non-trivial and the bug has been in the code base since before 0.23.0, so I won't make it a blocker.

I can't update the spreadsheet, so here are the commits I would like cherry-picked:

fc85cc512b7767fc2e3921b15cf6602c0c68593e
bfe6c07b79550bb3d1f2ab6f5344d740e6eb6f60

Thanks Adam.

On Thu, Jul 16, 2015 at 7:39 PM, Adam Bordelon a...@mesosphere.io wrote:

The 7 day voting period has ended with only 2 binding +1s (we needed 3) and no explicit -1s. However, Vinod says they've found a bug that crashes master when a framework uses duplicate task ids. Vinod, can you please share the new JIRA and officially vote -1 for rc3 if you want to call for an rc4?

Assuming we'll cut an rc4, I'm tracking the JIRAs/patches to pull in here:
https://docs.google.com/spreadsheets/d/14yUtwfU0mGQ7x7UcjfzZg2o1TuRMkn5SvJvetARM7JQ/edit#gid=0

Since the rc4 changes are minor (mostly tests) and we've heavily tested rc3, the next vote will only last for 3 (business) days.
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
The 7 day voting period has ended with only 2 binding +1s (we needed 3) and no explicit -1s. However, Vinod says they've found a bug that crashes master when a framework uses duplicate task ids. Vinod, can you please share the new JIRA and officially vote -1 for rc3 if you want to call for an rc4?

Assuming we'll cut an rc4, I'm tracking the JIRAs/patches to pull in here:
https://docs.google.com/spreadsheets/d/14yUtwfU0mGQ7x7UcjfzZg2o1TuRMkn5SvJvetARM7JQ/edit#gid=0

Since the rc4 changes are minor (mostly tests) and we've heavily tested rc3, the next vote will only last for 3 (business) days.

On Thu, Jul 16, 2015 at 6:38 PM, Marco Massenzio ma...@mesosphere.io wrote:

Just to add my +1. Built and ran `make check` on Ubuntu 14.04, with and without SSL/libevent (no 'sudo'; I can test all 4 variants this evening on rc4).
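The outcome reported in this message (2 binding +1s where 3 were needed) follows from the pass condition in the vote announcement. The following is a minimal sketch of one reading of that rule, assuming only binding (PMC) votes count, that at least three +1s are required, and that +1s must outnumber -1s; `vote_passes` is a hypothetical helper, not an Apache tool.

```python
def vote_passes(binding_votes, required_plus_ones=3):
    """binding_votes is a list of +1 / -1 integers cast by PMC members."""
    plus = sum(1 for vote in binding_votes if vote == 1)
    minus = sum(1 for vote in binding_votes if vote == -1)
    # Assumed rule: enough +1s, and a majority in favour.
    return plus >= required_plus_ones and plus > minus


# rc3 as described above: two binding +1s and no -1s.
print(vote_passes([1, 1]))
```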
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
What about MESOS-3055 in 0.23? Is that going to get passed up on even if we are going to cut another rc?

On Thursday, July 16, 2015, Vinod Kone vinodk...@gmail.com wrote:

-1 so that we can cherry pick MESOS-3055.

The master crash bug is MESOS-3070 https://issues.apache.org/jira/browse/MESOS-3070, but the fix is non-trivial and the bug has been in the code base since before 0.23.0, so I won't make it a blocker.

I can't update the spreadsheet, so here are the commits I would like cherry-picked:

fc85cc512b7767fc2e3921b15cf6602c0c68593e
bfe6c07b79550bb3d1f2ab6f5344d740e6eb6f60

Thanks Adam.
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
Thanks, Vinod. I've got those commits in the list already. We'll pull in fixes for MESOS-3055 and others for rc4.

I'll give it another night for Bernd to commit the fetcher fix and for Niklas to update the oversubscription doc. Then I'll cut rc4 tomorrow and leave the new vote open until next Wednesday.

See the dashboard for status on remaining issues:
https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12326227

Jeff, see my cherry-pick spreadsheet to see what we're planning to pull into rc4:
https://docs.google.com/spreadsheets/d/14yUtwfU0mGQ7x7UcjfzZg2o1TuRMkn5SvJvetARM7JQ/edit#gid=0

If anybody has any other high priority fixes or doc updates that they want in rc4, let me know asap.

On Thu, Jul 16, 2015 at 7:58 PM, Jeff Schroeder jeffschroe...@computer.org wrote:

What about MESOS-3055 in 0.23? Is that going to get passed up on even if we are going to cut another rc?
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
Adam - thanks. Please let me know as soon as you push an rc4; if I'm still home, I can test it against Ubuntu 14.04 with/without SSL and with/without sudo (or I can always VPN in :)

Very minor doc update: https://reviews.apache.org/r/36532/ (feel free to ignore).

Thanks, everyone!

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, Jul 16, 2015 at 8:05 PM, Adam Bordelon a...@mesosphere.io wrote:

Thanks, Vinod. I've got those commits in the list already. We'll pull in fixes for MESOS-3055 and others for rc4.

I'll give it another night for Bernd to commit the fetcher fix and for Niklas to update the oversubscription doc. Then I'll cut rc4 tomorrow and leave the new vote open until next Wednesday.

See the dashboard for status on remaining issues:
https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12326227

Jeff, see my cherry-pick spreadsheet to see what we're planning to pull into rc4:
https://docs.google.com/spreadsheets/d/14yUtwfU0mGQ7x7UcjfzZg2o1TuRMkn5SvJvetARM7JQ/edit#gid=0

If anybody has any other high priority fixes or doc updates that they want in rc4, let me know asap.
Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)
As Adam mentioned, I also think this is not a blocker, as it only affects the way we test cgroups on CentOS 7.x due to a CentOS bug and doesn't actually impact normal Mesos operations.

My vote is +1 as well.

Tim

On Thu, Jul 16, 2015 at 12:10 PM, Vinod Kone vinodk...@gmail.com wrote:

Found a bug in HTTP API related code: MESOS-3055 https://issues.apache.org/jira/browse/MESOS-3055

If we don't fix this in 0.23.0, we cannot expect the 0.24.0 scheduler driver (that will send Calls) to properly subscribe with a 0.23.0 master. I could add a workaround in the driver to only send Calls if the master version is 0.24.0, but would prefer not to have to do that.

Also, on the review https://reviews.apache.org/r/36518/ for that bug, we realized that we might want to make Subscribe.force 'optional' instead of 'required'. That's an API change, which would be nice to go into 0.23.0 as well.

So, not a -1 per se, but if you are willing to cut another RC, I can land the fixes today. Sorry for the trouble.