[jira] [Commented] (MESOS-2376) Allow libprocess ip and port to be configured
[ https://issues.apache.org/jira/browse/MESOS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328840#comment-14328840 ] Michael Roth commented on MESOS-2376:
---
I think Dario is referring to the MesosSchedulerDriver constructor. Wouldn't it be possible to enhance the FrameworkInfo message type, like MasterInfo/SlaveInfo, to carry host and port parameters, made optional so as to stay backward compatible?

Allow libprocess ip and port to be configured
---------------------------------------------
Key: MESOS-2376
URL: https://issues.apache.org/jira/browse/MESOS-2376
Project: Mesos
Issue Type: Improvement
Components: java api
Reporter: Dario Rexin
Priority: Minor

Currently, if we want to configure the IP that libprocess uses for communication, we have to set the env var LIBPROCESS_IP (or LIBPROCESS_PORT for the port). For the Java API this means the variables have to be set before the JVM is started, because setting env vars from within Java is not possible / non-trivial. It would therefore be great to be able to pass them in to the constructor.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2376) Allow libprocess ip and port to be configured
[ https://issues.apache.org/jira/browse/MESOS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328840#comment-14328840 ] Michael Roth edited comment on MESOS-2376 at 2/20/15 12:31 PM:
---
I think Dario is referring to the MesosSchedulerDriver constructor. Wouldn't it be possible to enhance the FrameworkInfo message type, like MasterInfo/SlaveInfo, to carry ip and port parameters, made optional so as to stay backward compatible?

was (Author: mroth): I think Dario is referring to the MesosSchedulerDriver constructor. Wouldn't it be possible to enhance the FrameworkInfo message type like Master/SlaveInfo containing host and port parameters but as optional to be backward compatible?
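For context on the workaround the description mentions: since libprocess reads LIBPROCESS_IP/LIBPROCESS_PORT only at startup, a JVM-based framework today has to be launched from a wrapper that sets them first. A minimal sketch of such a wrapper; the address, port, jar name, and main class are all placeholders:

```python
import os
import subprocess

# Build a child environment with the libprocess variables set *before* the
# JVM starts; they cannot be changed from within Java once it is running.
env = dict(os.environ)
env["LIBPROCESS_IP"] = "192.168.1.10"   # placeholder address for this host
env["LIBPROCESS_PORT"] = "9050"         # placeholder port

# Hypothetical framework entry point; replace with your scheduler's jar/class.
cmd = ["java", "-cp", "my-framework.jar", "com.example.MyScheduler"]
# subprocess.run(cmd, env=env)  # actual launch left commented out in this sketch
```

The proposed constructor parameters would make this wrapper unnecessary.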
[jira] [Created] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
Niklas Quarfot Nielsen created MESOS-2378:
---
Summary: ASF build break: reference to 'slave' is ambiguous
Key: MESOS-2378
URL: https://issues.apache.org/jira/browse/MESOS-2378
Project: Mesos
Issue Type: Bug
Reporter: Niklas Quarfot Nielsen

{code}
g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.22.0\"
  -DPACKAGE_STRING=\"mesos 0.22.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\"
  -DPACKAGE=\"mesos\" -DVERSION=\"0.22.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
  -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
  -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
  -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1
  -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1
  -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1
  -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1
  -I. -I../../src -Wall -Werror
  -DLIBDIR=\"/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.22.0/_inst/lib\"
  -DPKGLIBEXECDIR=\"/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.22.0/_inst/libexec/mesos\"
  -DPKGDATADIR=\"/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.22.0/_inst/share/mesos\"
  -I../../include -I../../3rdparty/libprocess/include
  -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos
  -I../3rdparty/libprocess/3rdparty/boost-1.53.0
  -I../3rdparty/libprocess/3rdparty/picojson-4f93734
  -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
  -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
  -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
  -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include
  -I../3rdparty/zookeeper-3.4.5/src/c/generated
  -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
  -DSOURCE_DIR=\"/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.22.0/_build/..\"
  -DBUILD_DIR=\"/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.22.0/_build\"
  -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/gtest/include
  -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/include
  -I/home/jenkins/tools/java/jdk1.6.0_20-64/include
  -I/home/jenkins/tools/java/jdk1.6.0_20-64/include/linux
  -DZOOKEEPER_VERSION=\"3.4.5\" -I/usr/include/subversion-1 -I/usr/include/apr-1
  -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11
  -MT tests/mesos_tests-composing_containerizer_tests.o -MD -MP
  -MF tests/.deps/mesos_tests-composing_containerizer_tests.Tpo
  -c -o tests/mesos_tests-composing_containerizer_tests.o
  `test -f 'tests/composing_containerizer_tests.cpp' || echo '../../src/'`tests/composing_containerizer_tests.cpp
mv -f examples/.deps/balloon_executor-balloon_executor.Tpo examples/.deps/balloon_executor-balloon_executor.Po
g++ [second invocation with the same flags; log truncated before the reported error]
{code}
[jira] [Commented] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
[ https://issues.apache.org/jira/browse/MESOS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329703#comment-14329703 ] Niklas Quarfot Nielsen commented on MESOS-2378:
---
[~karya] This may be relevant for the 'internal' rework.
[jira] [Commented] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
[ https://issues.apache.org/jira/browse/MESOS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329807#comment-14329807 ] Kapil Arya commented on MESOS-2378:
---
Oh, never mind. It looks like the build for the review request I pushed earlier. That is supposed to fail, since I didn't fix src/tests/ because I am still waiting for a response on the way forward.
[jira] [Closed] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
[ https://issues.apache.org/jira/browse/MESOS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen closed MESOS-2378.
---
Resolution: Won't Fix
[jira] [Commented] (MESOS-2103) Expose number of processes and threads in a container
[ https://issues.apache.org/jira/browse/MESOS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329887#comment-14329887 ] Chi Zhang commented on MESOS-2103:
---
https://reviews.apache.org/r/31250/

Expose number of processes and threads in a container
-----------------------------------------------------
Key: MESOS-2103
URL: https://issues.apache.org/jira/browse/MESOS-2103
Project: Mesos
Issue Type: Improvement
Components: isolation
Affects Versions: 0.20.0
Reporter: Ian Downes
Assignee: Chi Zhang
Labels: twitter

The CFS cpu statistics (cpus_nr_throttled, cpus_nr_periods, cpus_throttled_time) are difficult to interpret:
1) nr_throttled is the number of intervals where *any* throttling occurred.
2) throttled_time is the aggregate time *across all runnable tasks* (tasks in the Linux sense).

For example, in a typical 60 second sampling interval: with nr_periods = 600, nr_throttled could be 60, i.e., 10% of intervals, but throttled_time could be much higher than (60/600) * 60 = 6 seconds if there is more than one task that is runnable but throttled. *Each* throttled task contributes to the total throttled time.

Small test to demonstrate throttled_time > nr_periods * quota_interval: 5 x {{'openssl speed'}} running with quota=100ms:
{noformat}
cat cpu.stat; sleep 1; cat cpu.stat
nr_periods 3228
nr_throttled 1276
throttled_time 528843772540
nr_periods 3238
nr_throttled 1286
throttled_time 531668964667
{noformat}
All 10 intervals were throttled (100%), for a total throttled time of ~2.8 seconds within a 1 second window (more than 100% of the time interval).

It would be helpful to expose the number of processes and threads in the container cgroup. This would be at a very coarse granularity but would give some guidance.
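The delta arithmetic in the description can be checked directly from the two cpu.stat samples; a small sketch, with the values copied from the samples quoted above:

```python
# Two cpu.stat samples taken one second apart (numbers from the example above).
before = {"nr_periods": 3228, "nr_throttled": 1276, "throttled_time": 528843772540}
after = {"nr_periods": 3238, "nr_throttled": 1286, "throttled_time": 531668964667}

periods = after["nr_periods"] - before["nr_periods"]        # CFS periods elapsed
throttled = after["nr_throttled"] - before["nr_throttled"]  # periods with any throttling
pct_throttled = 100.0 * throttled / periods

# throttled_time is in nanoseconds and is summed over *all* runnable tasks,
# which is why it can exceed the wall-clock length of the sampling window.
throttled_seconds = (after["throttled_time"] - before["throttled_time"]) / 1e9

print(periods, throttled, pct_throttled, round(throttled_seconds, 2))
# → 10 10 100.0 2.83
```

This is exactly the ambiguity the issue describes: 100% of periods throttled, and ~2.8 seconds of throttled time inside a 1 second window.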
[jira] [Commented] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
[ https://issues.apache.org/jira/browse/MESOS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329882#comment-14329882 ] Niklas Quarfot Nielsen commented on MESOS-2378:
---
Yeah - not sure why we get buildbot emails for that, then. I will mark it as Won't Fix.
[jira] [Updated] (MESOS-2136) Expose per-cgroup memory pressure
[ https://issues.apache.org/jira/browse/MESOS-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chi Zhang updated MESOS-2136:
---
Target Version/s: 0.23.0 (was: 0.22.0)

Expose per-cgroup memory pressure
---------------------------------
Key: MESOS-2136
URL: https://issues.apache.org/jira/browse/MESOS-2136
Project: Mesos
Issue Type: Improvement
Components: isolation
Reporter: Ian Downes
Assignee: Chi Zhang
Labels: twitter

The cgroup memory controller can provide information on the memory pressure of a cgroup. This takes the form of an event-based notification in which events of level (low, medium, critical) are generated when the kernel takes specific actions to allocate memory. This signal is probably more informative than comparing memory usage to the memory limit.
[jira] [Commented] (MESOS-2136) Expose per-cgroup memory pressure
[ https://issues.apache.org/jira/browse/MESOS-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329889#comment-14329889 ] Chi Zhang commented on MESOS-2136:
---
[~nnielsen] Some patches are still in review. I will bump. Thanks!
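For reference, the notification mechanism the MESOS-2136 description alludes to is the cgroup-v1 memory.pressure_level interface: a listener opens an eventfd and registers it by writing "<event_fd> <pressure_fd> <level>" to cgroup.event_control. A minimal sketch, not the Mesos isolator implementation; the cgroup path is a placeholder, and os.eventfd requires Python 3.10+ on Linux:

```python
import os

def event_control_line(event_fd, pressure_fd, level):
    """Compose the registration string written to cgroup.event_control
    for cgroup-v1 memory pressure events."""
    assert level in ("low", "medium", "critical")
    return "%d %d %s" % (event_fd, pressure_fd, level)

def register_pressure_listener(cgroup_dir, level="low"):
    """Register an eventfd for memory pressure events on a cgroup-v1
    memory cgroup; returns the eventfd to poll/read. Linux-only sketch."""
    efd = os.eventfd(0)  # Python 3.10+; kernel bumps the counter per event
    pfd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
        ctl.write(event_control_line(efd, pfd, level))
    return efd

# Usage (needs a mounted cgroup-v1 memory hierarchy; path is hypothetical):
# efd = register_pressure_listener("/sys/fs/cgroup/memory/mesos/<container>", "medium")
```

Reading the returned eventfd then yields a count of pressure events since the last read, which is the event-based signal the issue proposes surfacing.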
[jira] [Commented] (MESOS-2378) ASF build break: reference to 'slave' is ambiguous
[ https://issues.apache.org/jira/browse/MESOS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329804#comment-14329804 ] Kapil Arya commented on MESOS-2378:
---
Do you know which tree produced it?
[jira] [Commented] (MESOS-2103) Expose number of processes and threads in a container
[ https://issues.apache.org/jira/browse/MESOS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329888#comment-14329888 ] Chi Zhang commented on MESOS-2103: -- [~nnielsen] thanks! I don't think we can make it by Sunday. Expose number of processes and threads in a container - Key: MESOS-2103 URL: https://issues.apache.org/jira/browse/MESOS-2103 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.20.0 Reporter: Ian Downes Assignee: Chi Zhang Labels: twitter The CFS cpu statistics (cpus_nr_throttled, cpus_nr_periods, cpus_throttled_time) are difficult to interpret. 1) nr_throttled is the number of intervals where *any* throttling occurred 2) throttled_time is the aggregate time *across all runnable tasks* (tasks in the Linux sense). For example, in a typical 60 second sampling interval: nr_periods = 600, nr_throttled could be 60, i.e., 10% of intervals, but throttled_time could be much higher than (60/600) * 60 = 6 seconds if there is more than one task that is runnable but throttled. *Each* throttled task contributes to the total throttled time. Small test to demonstrate throttled_time > nr_periods * quota_interval: 5 x {{'openssl speed'}} running with quota=100ms: {noformat} cat cpu.stat; sleep 1; cat cpu.stat nr_periods 3228 nr_throttled 1276 throttled_time 528843772540 nr_periods 3238 nr_throttled 1286 throttled_time 531668964667 {noformat} All 10 intervals throttled (100%) for a total time of 2.8 seconds in 1 second (more than 100% of the time interval). It would be helpful to expose the number of processes and tasks in the container cgroup. This would be at a very coarse granularity but would give some guidance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
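The throttling arithmetic above can be checked directly from the two cpu.stat samples. A minimal sketch (the CpuStat struct and helper names are illustrative, not Mesos code):

```cpp
#include <cstdint>

// Snapshot of the cgroup CFS counters from cpu.stat.
struct CpuStat {
  uint64_t nrPeriods;        // nr_periods
  uint64_t nrThrottled;      // nr_throttled
  uint64_t throttledTimeNs;  // throttled_time (nanoseconds)
};

// Fraction of CFS periods between two samples in which *any* throttling
// occurred.
double throttledPeriodFraction(const CpuStat& before, const CpuStat& after) {
  uint64_t periods = after.nrPeriods - before.nrPeriods;
  if (periods == 0) {
    return 0.0;
  }
  return static_cast<double>(after.nrThrottled - before.nrThrottled) / periods;
}

// Aggregate throttled time between two samples, summed across all
// runnable tasks, which is why it can exceed wall-clock time.
double throttledSeconds(const CpuStat& before, const CpuStat& after) {
  return (after.throttledTimeNs - before.throttledTimeNs) / 1e9;
}
```

With the sampled values above (nr_throttled 1276 to 1286 over 10 periods, throttled_time 528843772540 to 531668964667 ns), this yields 100% of periods throttled and about 2.8 seconds of throttled time within a 1-second window, as the ticket describes.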
[jira] [Updated] (MESOS-2108) Add configure flag or environment variable to enable SSL/libevent Socket
[ https://issues.apache.org/jira/browse/MESOS-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2108: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add configure flag or environment variable to enable SSL/libevent Socket Key: MESOS-2108 URL: https://issues.apache.org/jira/browse/MESOS-2108 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1913) Create libevent/SSL-backed Socket implementation
[ https://issues.apache.org/jira/browse/MESOS-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-1913: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Create libevent/SSL-backed Socket implementation Key: MESOS-1913 URL: https://issues.apache.org/jira/browse/MESOS-1913 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2050) InMemoryAuxProp plugin used by Authenticators results in SEGFAULT
[ https://issues.apache.org/jira/browse/MESOS-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2050: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) InMemoryAuxProp plugin used by Authenticators results in SEGFAULT - Key: MESOS-2050 URL: https://issues.apache.org/jira/browse/MESOS-2050 Project: Mesos Issue Type: Bug Affects Versions: 0.21.0 Reporter: Vinod Kone Assignee: Till Toenshoff Observed this on ASF CI: Basically, as part of the recent Auth refactor for modules, the loading of secrets is being done once per Authenticator Process instead of once in the Master. Since, InMemoryAuxProp plugin manipulates static variables (e.g, 'properties') it results in SEGFAULT when one Authenticator (e.g., for slave) does load() while another Authenticator (e.g., for framework) does lookup(), as both these methods manipulate static 'properties'. {code} [ RUN ] MasterTest.LaunchDuplicateOfferTest Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp' I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in 702ns I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the db in 450ns I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1104 03:37:55.527135 28378 master.cpp:318] Master 20141104-033755-3176252227-49988-28363 (proserpina.apache.org) started on 67.195.81.189:49988 I1104 
03:37:55.527180 28378 master.cpp:364] Master only allowing authenticated frameworks to register I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing authenticated slaves to register I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials' I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising offers for all slaves I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from a replica in EMPTY status I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.189:49988 I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is master@67.195.81.189:49988 with id 20141104-033755-3176252227-49988-28363 I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master! I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to STARTING I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 497965ns I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to STARTING I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from a replica in STARTING status I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 293974ns I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to VOTING I1104 03:37:55.532790 28390 recover.cpp:568] 
Successfully joined the Paxos group I1104 03:37:55.533107 28390 recover.cpp:452] Recover process terminated I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit promise request with proposal 1 I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 321021ns I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1 I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to fill missing position I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1104 03:37:55.537719
[jira] [Updated] (MESOS-2069) Basic fetcher cache functionality
[ https://issues.apache.org/jira/browse/MESOS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2069: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Basic fetcher cache functionality - Key: MESOS-2069 URL: https://issues.apache.org/jira/browse/MESOS-2069 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Labels: fetcher, slave Original Estimate: 48h Remaining Estimate: 48h Add a flag to CommandInfo URI protobufs that indicates that files downloaded by the fetcher shall be cached in a repository. To be followed by MESOS-2057 for concurrency control. Also see MESOS-336 for the overall goals for the fetcher cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2253) Mesos 0.22.0 Release candidate 1
[ https://issues.apache.org/jira/browse/MESOS-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2253: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Mesos 0.22.0 Release candidate 1 Key: MESOS-2253 URL: https://issues.apache.org/jira/browse/MESOS-2253 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2348) Introduce a new filter abstraction for Resources.
[ https://issues.apache.org/jira/browse/MESOS-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2348: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) Introduce a new filter abstraction for Resources. - Key: MESOS-2348 URL: https://issues.apache.org/jira/browse/MESOS-2348 Project: Mesos Issue Type: Task Components: general Reporter: Michael Park Assignee: Michael Park Labels: mesosphere h1. Motivation The main motivation here is to improve the design of the filter abstraction for {{Resources}}. With the new design we gain: 1. No duplicate code. 2. No need to introduce a new class for every filter. 3. Safer code with less work. 4. Ability to compose filters. h2. Overview of the Current Design. I think I'll need to start here to provide a clear motivation. Here's the current design in short: {code} class Filter { public: virtual Resources apply(const Resources& resources) const = 0; }; class FooFilter : public Filter { public: virtual Resources apply(const Resources& resources) const { Resources result; foreach (const Resource& resource, resources) { if (/* resource is Foo. */) { result += resource; } } return result; } }; class BarFilter : public Filter { public: virtual Resources apply(const Resources& resources) const { Resources result; foreach (const Resource& resource, resources) { if (/* resource is Bar. */) { result += resource; } } return result; } }; {code} h3. Disadvantages 1. Duplicate code. Every derived {{Filter}} will have duplicate code, specifically: {code} Resources result; foreach (const Resource& resource, resources) { if (/* resource satisfies some predicate. */) { result += resource; } } return result; {code} 2. Need to introduce a new class definition for every new {{Filter}}. We should be able to create new filters inline and use them in cases where the filter is only needed once and would only hurt readability at the global level.
If the filter is useful in many contexts, by all means give it a name and put it somewhere. This is equivalent to lambda expressions which allow us to create new functions inline in cases where the function is only useful in this specific context but would only hurt readability at the global level. If the lambda is useful in many contexts, we give it a name and put it somewhere. 3. The constraints are too weak. A {{Filter}} must return a subset of the original {{Resources}}. It need not be a strict subset, but it must not be a strict superset. With the pure virtual apply, the only constraint we put on a new {{Filter}} definition is that we take {{Resources}} and return {{Resources}}. We should strive for code that prevents preventable bugs. 4. Inability to compose filters. I've defined 2 filters above, {{FooFilter}} and {{BarFilter}}. We should be able to give rise to a new filter with a composition of the filters above. For example, if I wanted to {{AND}} the filters, I shouldn't have to introduce {{FooAndBarFilter}}. This is equivalent to if we had predicates for these. Suppose we have {{isFoo(resource)}} and {{isBar(resource)}}. Would we introduce {{isNotFoo}}, {{isNotBar}}, {{isFooAndBar}}, {{isFooOrBar}}, {{isFooNotBar}}, etc? If {{FooAndBar}} is a common concept we use all the time, sure. But in general, we would simply compose our predicates: {{!isFoo(resource)}}, {{!isBar(resource)}}, {{isFoo(resource) && isBar(resource)}}, {{isFoo(resource) || isBar(resource)}}, {{isFoo(resource) && !isBar(resource)}}. h2. Overview of the New Design {code} class Filter { public: typedef lambda::function<bool(const Resource&)> Predicate; Filter(const Predicate& _predicate) : predicate(_predicate) {} Resources operator()(const Resources& resources) const { Resources result; foreach (const Resource& resource, resources) { if (predicate(resource)) { result += resource; } } return result; } Filter operator ! (); private: friend Filter operator && (const Filter& lhs, const Filter& rhs); friend Filter operator || (const Filter& lhs, const Filter& rhs); Predicate predicate; }; bool isFoo(const Resource& resource) { return /* resource is Foo. */; } Filter FooFilter = Filter(isFoo); Filter BarFilter = Filter([](const Resource& resource) { return /* resource is Bar. */; }); {code} h3. Addressing the Disadvantages 1. No duplicate code. We've removed the duplicate code by making the predicate the customization point. 2. No need to introduce a new class
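The proposed design can be sketched as a self-contained program. Here {{std::string}} and {{std::vector}} stand in for {{Resource}} and {{Resources}}, and the composition operators are written as members rather than friends, purely for illustration:

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for the Mesos types, for illustration only.
typedef std::string Resource;
typedef std::vector<Resource> Resources;

class Filter {
public:
  typedef std::function<bool(const Resource&)> Predicate;

  explicit Filter(const Predicate& predicate) : predicate(predicate) {}

  // Applying a filter keeps exactly the resources satisfying the
  // predicate, so the result is always a subset of the input.
  Resources operator()(const Resources& resources) const {
    Resources result;
    for (const Resource& resource : resources) {
      if (predicate(resource)) {
        result.push_back(resource);
      }
    }
    return result;
  }

  // Composition: new filters arise from existing ones, with no new classes.
  Filter operator!() const {
    Predicate p = predicate;
    return Filter([p](const Resource& r) { return !p(r); });
  }

  Filter operator&&(const Filter& other) const {
    Predicate p = predicate, q = other.predicate;
    return Filter([p, q](const Resource& r) { return p(r) && q(r); });
  }

  Filter operator||(const Filter& other) const {
    Predicate p = predicate, q = other.predicate;
    return Filter([p, q](const Resource& r) { return p(r) || q(r); });
  }

private:
  Predicate predicate;
};
```

With this in place, expressions such as {{(fooFilter || barFilter)(resources)}} or {{(!fooFilter)(resources)}} work directly, without defining a new filter class per combination.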
[jira] [Updated] (MESOS-2248) 0.22.0 release
[ https://issues.apache.org/jira/browse/MESOS-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2248: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) 0.22.0 release -- Key: MESOS-2248 URL: https://issues.apache.org/jira/browse/MESOS-2248 Project: Mesos Issue Type: Epic Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen Mesos release 0.22.0 will include the following major feature(s): - Module Hooks (MESOS-2060) - SSL support (MESOS-910) - Disk quota isolation in Mesos containerizer (MESOS-1587 and MESOS-1588) Minor features and fixes: - Task labels (MESOS-2120) - Service discovery info for tasks and executors (MESOS-2208) - Configurable graceful shutdown (MESOS-1571) - Authentication module fixes (...) and Authenticatee modules similar to Authenticator modules (MESOS-2001, MESOS-2050) - Docker containerizer able to recover when running in a container (MESOS-2115) - Containerizer fixes (...) - Various bug fixes (...) Possible major features: - Container level network isolation (MESOS-1585) - Dynamic Reservations (MESOS-2018) - (Mesos slave should cache executors (MESOS-336)) This ticket will be used to track blockers to this release. For reference (per Jan 22nd) this has gone into Mesos since 0.21.1: https://gist.github.com/nqn/76aeb41a555625659ed8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2139) Enable master's Accept call handler to support Dynamic Reservation
[ https://issues.apache.org/jira/browse/MESOS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2139: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Enable master's Accept call handler to support Dynamic Reservation --- Key: MESOS-2139 URL: https://issues.apache.org/jira/browse/MESOS-2139 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere The allocated resources in the allocator need to be updated when a dynamic reservation is performed because we need to transition the {{Resources}} that are marked {{reservationType=STATIC}} to {{DYNAMIC}}. {{Resources::apply(Offer::Operation)}} is used to determine the resulting set of resources after an operation. This is to be used to update the resources in places such as the allocator and the total slave resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2110) Configurable Ping Timeouts
[ https://issues.apache.org/jira/browse/MESOS-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2110: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Configurable Ping Timeouts -- Key: MESOS-2110 URL: https://issues.apache.org/jira/browse/MESOS-2110 Project: Mesos Issue Type: Improvement Components: master, slave Reporter: Adam B Assignee: Adam B Labels: master, network, slave, timeout After a series of ping failures, the master considers the slave lost and calls shutdownSlave, requiring a slave that later reconnects to kill its tasks and re-register with a new slaveId. On the other side, after a similar timeout, the slave will consider the master lost and try to detect a new master. These timeouts are currently hardcoded constants (5 * 15s), which may not be well-suited for all scenarios. - Some clusters may tolerate a longer slave process restart period, and wouldn't want tasks to be killed upon reconnect. - Some clusters may have higher-latency networks (e.g. cross-datacenter, or for volunteer computing efforts), and would like to tolerate longer periods without communication. We should provide flags/mechanisms on the master to control its tolerance for non-communicative slaves, and (less importantly?) on the slave to tolerate missing masters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
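For illustration, the hardcoded tolerance above works out as follows (the constant names here are hypothetical; the actual flags would be introduced by this ticket):

```cpp
#include <chrono>

// Hypothetical defaults mirroring today's hardcoded constants (5 * 15s).
const std::chrono::seconds slavePingTimeout(15);
const int maxSlavePingTimeouts = 5;

// Total time a slave may be unresponsive before the master considers it
// lost and shuts it down.
const std::chrono::seconds healthTimeout =
    slavePingTimeout * maxSlavePingTimeouts;
```

That is, a slave can be silent for 75 seconds before the master gives up on it; making both factors configurable lets operators trade failover speed against tolerance for slow networks or restarts.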
[jira] [Updated] (MESOS-2165) When cyrus sasl MD5 isn't installed configure passes, tests fail without any output
[ https://issues.apache.org/jira/browse/MESOS-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2165: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) When cyrus sasl MD5 isn't installed configure passes, tests fail without any output --- Key: MESOS-2165 URL: https://issues.apache.org/jira/browse/MESOS-2165 Project: Mesos Issue Type: Bug Reporter: Cody Maloney Assignee: Till Toenshoff Labels: mesosphere Sample Dockerfile to make such a host: {code} FROM centos:centos7 RUN yum install -y epel-release gcc python-devel RUN yum install -y python-pip RUN yum install -y rpm-build redhat-rpm-config autoconf make gcc gcc-c++ patch libtool git python-devel ruby-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel rubygems apr-devel apr-util-devel subversion-devel maven libselinux-python {code} Use: 'docker run -i -t imagename /bin/bash' to run the image, get a shell inside where you can 'git clone' mesos and build/run the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2160) Add support for allocator modules
[ https://issues.apache.org/jira/browse/MESOS-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2160: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add support for allocator modules - Key: MESOS-2160 URL: https://issues.apache.org/jira/browse/MESOS-2160 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Alexander Rukletsov Labels: mesosphere Currently Mesos supports only the DRF allocator, changing which requires hacking the Mesos source code, which, in turn, sets a high entry barrier. Allocator modules make it easy to tweak the resource allocation policy: they enable swapping allocation policies without editing Mesos source code. Custom allocators may be written by anyone and need not be distributed together with Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1831) Master should send PingSlaveMessage instead of PING
[ https://issues.apache.org/jira/browse/MESOS-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-1831: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Master should send PingSlaveMessage instead of PING - Key: MESOS-1831 URL: https://issues.apache.org/jira/browse/MESOS-1831 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Adam B Labels: mesosphere In 0.21.0 master sends PING message with an embedded PingSlaveMessage for backwards compatibility (https://reviews.apache.org/r/25867/). In 0.22.0, master should send PingSlaveMessage directly instead of PING. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2229) Add max allowed age to Slave stats.json endpoint
[ https://issues.apache.org/jira/browse/MESOS-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2229: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add max allowed age to Slave stats.json endpoint -- Key: MESOS-2229 URL: https://issues.apache.org/jira/browse/MESOS-2229 Project: Mesos Issue Type: Improvement Components: json api Reporter: Sunil Abraham Assignee: Alexander Rojas Labels: mesosphere Currently max allowed age gets logged, but it would be great to have this in the slave's stats.json endpoint for programmatic access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
[ https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2157: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints Key: MESOS-2157 URL: https://issues.apache.org/jira/browse/MESOS-2157 Project: Mesos Issue Type: Task Components: master Reporter: Niklas Quarfot Nielsen Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere, newbie master/state.json exports the entire state of the cluster and can, for large clusters, become massive (tens of megabytes of JSON). Often, a client only needs information about subsets of the entire state, for example all connected slaves, or information (registration info, tasks, etc.) belonging to a particular framework. We can partition state.json into many smaller endpoints, but for starters, being able to get slave information and task information per framework would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2351) Enable label and environment decorators (hooks) to remove label and environment entries
[ https://issues.apache.org/jira/browse/MESOS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2351: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) Enable label and environment decorators (hooks) to remove label and environment entries --- Key: MESOS-2351 URL: https://issues.apache.org/jira/browse/MESOS-2351 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen We need to change the semantics of decorators to be able to not only add labels and environment variables, but also remove them. The change is fairly small: the hook manager (and call site) use CopyFrom instead of MergeFrom, and hook implementors pass on the labels and environment from task and executor commands, respectively. In the future, we can tag labels such that only labels belonging to a hook type (across master and slave) can be inspected and changed. For now, the active hooks are selected by the operator and can therefore be trusted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2074) Fetcher cache test fixture
[ https://issues.apache.org/jira/browse/MESOS-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2074: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) Fetcher cache test fixture -- Key: MESOS-2074 URL: https://issues.apache.org/jira/browse/MESOS-2074 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 72h Remaining Estimate: 72h To accelerate providing good test coverage for the fetcher cache (MESOS-336), we can provide a framework that canonicalizes creating and running a number of tasks and allows easy parametrization with combinations of the following: - whether to cache or not - whether to make what has been downloaded executable or not - whether to extract from an archive or not - whether to download from a file system, http, or... We can create a simple HTTP server in the test fixture to support the latter. Furthermore, the tests need to be robust wrt. varying numbers of StatusUpdate messages. An accumulating update message sink that reports the final state is needed. All this has already been programmed in this patch; it just needs to be rebased: https://reviews.apache.org/r/21316/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2335) Mesos Lifecycle Modules
[ https://issues.apache.org/jira/browse/MESOS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329116#comment-14329116 ] Bernd Mathiske commented on MESOS-2335: --- I suggest we separate the two issues (lifecycle and stream observation) into two tickets. What we need right now for several of our immediate use cases is a very simple module. We can even go with one that has zero API besides module instantiation and destruction at the right moments in the master or slave life cycle, hence the name. This is then the same as so-called anonymous modules, i.e. modules that get loaded, but are not called by any callback. Mesos Lifecycle Modules --- Key: MESOS-2335 URL: https://issues.apache.org/jira/browse/MESOS-2335 Project: Mesos Issue Type: Improvement Components: master, modules, slave Reporter: Bernd Mathiske Assignee: Till Toenshoff Labels: features Original Estimate: 168h Remaining Estimate: 168h A new kind of module that receives callbacks at significant life cycle events of its host libprocess process. Typically the latter is a Mesos slave or master, and the lifetime of the libprocess process coincides with the underlying OS process. h4. Motivation and Use Cases We want to add customized and experimental capabilities that concern the lifetime of Mesos components without protruding into Mesos source code and without creating new build process dependencies for everybody. Example use cases: 1. A slave or master life cycle module that gathers fail-over incidents and reports summaries thereof to a remote data sink. 2. A slave module that observes host computer metrics and correlates these with task activity. This can be used to find resource leaks and to prevent, or guide, oversubscription. 3. Upgrades and provisioning that require shutdown and restart. h4.
Specifics The specific life cycle events that we want to get notified about and want to be able to act upon are: - Process is spawning/initializing - Process is terminating/finalizing In all these cases, a reference to the process is passed as a parameter, giving the module access for inspection and reaction. h4. Module Classification Unlike other named modules, a life cycle module does not directly replace or provide essential Mesos functionality (such as an Isolator module does). Unlike a decorator module it does not directly add or inject data into Mesos core either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2377) Fix leak in libevent's version EventLoop::delay
[ https://issues.apache.org/jira/browse/MESOS-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere updated MESOS-2377: Sprint: Mesosphere Q1 Sprint 3 - 2/20 (was: Mesosphere Q1 Sprint 3 - 3/6) Fix leak in libevent's version EventLoop::delay --- Key: MESOS-2377 URL: https://issues.apache.org/jira/browse/MESOS-2377 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.22.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess Fix For: 0.22.0 I was finally able to verify the delay event is leaking through massif. Easy fix. Patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2016) docker_name_prefix is too generic
[ https://issues.apache.org/jira/browse/MESOS-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2016: -- Sprint: Mesosphere Q4 Sprint 2 - 11/14, Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 2 - 11/14, Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) docker_name_prefix is too generic - Key: MESOS-2016 URL: https://issues.apache.org/jira/browse/MESOS-2016 Project: Mesos Issue Type: Bug Reporter: Jay Buffington Assignee: Timothy Chen From docker.hpp and docker.cpp: {quote} // Prefix used to name Docker containers in order to distinguish those // created by Mesos from those created manually. extern std::string DOCKER_NAME_PREFIX; // TODO(benh): At some point to run multiple slaves we'll need to make // the Docker container name creation include the slave ID. string DOCKER_NAME_PREFIX = "mesos-"; {quote} This name is too generic. A common pattern in docker land is to run everything in a container and use volume mounts to share sockets and do RPC between containers. CoreOS has popularized this technique. Inevitably, what people do is start a container named mesos-slave, which runs the docker containerizer recovery code, which removes all containers that start with "mesos-". And then they ask, "huh, why did my mesos-slave docker container die? I don't see any error messages..." Ideally, we should do what Ben suggested and add the slave id to the name prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
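Ben's TODO — including the slave ID in the prefix — could look roughly like the sketch below. The names are illustrative, not the actual docker.cpp code; the point is that the recovery sweep would then only match containers carrying this slave's own prefix, leaving a manually started "mesos-slave" container alone.

```cpp
#include <string>

// Hypothetical container name: prefix + slave ID + container ID, so each
// slave's containers are distinguishable from everyone else's.
std::string dockerContainerName(const std::string& slaveId,
                                const std::string& containerId) {
  return "mesos-" + slaveId + "-" + containerId;
}

// Recovery would only remove containers matching this slave's prefix.
bool ownedBySlave(const std::string& name, const std::string& slaveId) {
  const std::string prefix = "mesos-" + slaveId + "-";
  return name.compare(0, prefix.size(), prefix) == 0;
}
```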
[jira] [Updated] (MESOS-2072) Fetcher cache eviction
[ https://issues.apache.org/jira/browse/MESOS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2072: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Fetcher cache eviction -- Key: MESOS-2072 URL: https://issues.apache.org/jira/browse/MESOS-2072 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 336h Remaining Estimate: 336h Delete files from the fetcher cache so that a given cache size is never exceeded. Succeed in doing so while concurrent downloads are on their way and new requests are pouring in. Idea: measure the size of each download before it begins, make enough room before the download. This means that only download mechanisms that divulge the size before the main download will be supported. AFAWK, those in use so far have this property. The calculation of how much space to free needs to be under concurrency control, accumulating all space needed for competing, incomplete download requests. (The Python script that performs fetcher caching for Aurora does not seem to implement this. See https://gist.github.com/zmanji/f41df77510ef9d00265a, imagine several of these programs running concurrently, each one's _cache_eviction() call succeeding, each perceiving the SAME free space being available.) Ultimately, a conflict resolution strategy is needed if just the downloads underway already exceed the cache capacity. Then, as a fallback, direct download into the work directory will be used for some tasks. TBD how to pick which task gets treated how. At first, only support copying of any downloaded files to the work directory for task execution. This isolates the task life cycle after starting a task from cache eviction considerations. (Later, we can add symbolic links that avoid copying. 
But then eviction of fetched files used by ongoing tasks must be blocked, which adds complexity. Another future extension is MESOS-1667, "Extract from URI while downloading into work dir".) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
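The "make enough room before the download, under concurrency control" accounting described above can be sketched as a small reservation class. This is illustrative only — invented names, and written as if all calls are serialized through a single actor (so no locking is shown); it is not the Mesos fetcher implementation. Reserving up front is exactly what prevents the failure mode in the Aurora gist, where concurrent evictions each perceive the same free space.

```cpp
#include <cstdint>

// Tracks cache capacity against space already promised to competing,
// incomplete downloads. All calls assumed serialized (actor-style).
class CacheSpace {
 public:
  explicit CacheSpace(uint64_t capacity) : capacity_(capacity) {}

  // Called before a download begins, using the size measured up front.
  // Returns false if the cache cannot hold it; the caller would then
  // fall back to a direct download into the work directory.
  bool reserve(uint64_t bytes) {
    if (reserved_ + bytes > capacity_) return false;
    reserved_ += bytes;
    return true;
  }

  // Called when a cached file is evicted (or a download is abandoned).
  void release(uint64_t bytes) { reserved_ -= bytes; }

  uint64_t reserved() const { return reserved_; }

 private:
  uint64_t capacity_;
  uint64_t reserved_ = 0;
};
```

Because the second `reserve` call sees the first call's bytes already accounted for, two concurrent fetches can never both claim the same free space.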
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2226: -- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Kapil Arya Labels: flaky-test Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING 
status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0114 18:51:34.734076 4734 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 87441ns I0114 18:51:34.734441 4734 replica.cpp:679] Persisted action at 0 I0114 18:51:34.740272 4739 replica.cpp:511] Replica received write request for position 0 I0114 18:51:34.740910 4739 leveldb.cpp:438] Reading position from leveldb took 59846ns I0114 18:51:34.741672 4739 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took
[jira] [Updated] (MESOS-2155) Make docker containerizer killing orphan containers optional
[ https://issues.apache.org/jira/browse/MESOS-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2155: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Make docker containerizer killing orphan containers optional Key: MESOS-2155 URL: https://issues.apache.org/jira/browse/MESOS-2155 Project: Mesos Issue Type: Improvement Components: docker Reporter: Timothy Chen Assignee: Timothy Chen Currently, on recovery, the docker containerizer will kill containers that are not recognized by the containerizer. We want to make this behavior optional, as there are certain situations in which we want to let the docker containers continue to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2377) Fix leak in libevent's version EventLoop::delay
[ https://issues.apache.org/jira/browse/MESOS-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2377: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) Fix leak in libevent's version EventLoop::delay --- Key: MESOS-2377 URL: https://issues.apache.org/jira/browse/MESOS-2377 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.22.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess Fix For: 0.22.0 I was finally able to verify the delay event is leaking through massif. Easy fix. Patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache
[ https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2057: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Concurrency control for fetcher cache - Key: MESOS-2057 URL: https://issues.apache.org/jira/browse/MESOS-2057 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 96h Remaining Estimate: 96h Having added a URI flag to CommandInfo messages (in MESOS-2069) that indicates caching of files downloaded by the fetcher in a repository, now ensure that when a URI is cached, it is only ever downloaded once for the same user on the same slave as long as the slave keeps running. This holds even if multiple tasks request the same URI concurrently. If multiple requests for the same URI occur, perform only one of them and reuse the result. Make concurrent requests for the same URI wait for the one download. Different URIs from different CommandInfos can be downloaded concurrently. No cache eviction, cleanup or failover will be handled for now. Additional tickets will be filed for these enhancements. (So don't use this feature in production until the whole epic is complete.) Note that implementing this does not suffice for production use. This ticket contains the main part of the fetcher logic, though. See the epic MESOS-336 for the rest of the features that lead to a fully functional fetcher cache.
The proposed general approach is to keep all bookkeeping about what is in which stage of being fetched and where it resides in the slave's MesosContainerizerProcess, so that all concurrent access is disambiguated and controlled by an actor (aka libprocess process). Depends on MESOS-2056 and MESOS-2069. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
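A minimal sketch of the download-once bookkeeping described above. This is illustrative and standalone — in Mesos the state would live inside the MesosContainerizerProcess actor, which serializes all access, so no locking is shown here; concurrent requests arriving through the actor would simply find the entry already present and reuse it.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Deduplicates fetches: the first request for a URI runs the fetch
// function; every later request for the same URI reuses the result.
class UriCache {
 public:
  // fetch maps a URI to a local cache path (e.g. the real downloader).
  using Fetch = std::function<std::string(const std::string&)>;

  explicit UriCache(Fetch fetch) : fetch_(std::move(fetch)) {}

  std::string get(const std::string& uri) {
    auto it = cache_.find(uri);
    if (it != cache_.end()) return it->second;  // reuse prior download
    std::string path = fetch_(uri);             // runs at most once per URI
    cache_[uri] = path;
    return path;
  }

 private:
  Fetch fetch_;
  std::map<std::string, std::string> cache_;
};
```

In the real design the cached value would be a future, so a second request arriving while the first download is still in flight waits on the same future rather than re-downloading.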
[jira] [Commented] (MESOS-2317) Remove deprecated checkpoint=false code
[ https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329066#comment-14329066 ] Joerg Schad commented on MESOS-2317: I guess we should also update http://mesos.apache.org/documentation/latest/slave-recovery/. Remove deprecated checkpoint=false code --- Key: MESOS-2317 URL: https://issues.apache.org/jira/browse/MESOS-2317 Project: Mesos Issue Type: Task Affects Versions: 0.22.0 Reporter: Adam B Assignee: Cody Maloney Labels: checkpoint Cody's plan from MESOS-444 was: 1) Make it so the flag can't be changed at the command line 2) Remove the checkpoint variable entirely from slave/flags.hpp. This is a fairly involved change since a number of unit tests depend on manually setting the flag, as well as the default being non-checkpointing. 3) Remove logic around checkpointing in the slave 4) Drop the flag from the SlaveInfo struct, remove logic inside the master (Will require a deprecation cycle). Only 1) has been implemented/committed. This ticket is to track the remaining work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2115: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Improve recovering Docker containers when slave is contained Key: MESOS-2115 URL: https://issues.apache.org/jira/browse/MESOS-2115 Project: Mesos Issue Type: Epic Components: docker Reporter: Timothy Chen Assignee: Timothy Chen Labels: docker Currently, when the docker containerizer is recovering, it checks the checkpointed executor pids to determine which containers are still running, and removes the remaining containers from docker ps that aren't recognized. This is problematic when the slave itself runs in a docker container: when the slave container dies, all the forked processes are removed as well, so the checkpointed executor pids are no longer valid. We have to assume the docker containers might still be running even though the checkpointed executor pids are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2119) Add Socket tests
[ https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2119: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add Socket tests Key: MESOS-2119 URL: https://issues.apache.org/jira/browse/MESOS-2119 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere Add more Socket specific tests to get coverage while doing libev to libevent (w and wo SSL) move -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2373) DRFSorter needs to distinguish resources from different slaves.
[ https://issues.apache.org/jira/browse/MESOS-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2373: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) DRFSorter needs to distinguish resources from different slaves. --- Key: MESOS-2373 URL: https://issues.apache.org/jira/browse/MESOS-2373 Project: Mesos Issue Type: Bug Components: allocation Reporter: Michael Park Assignee: Michael Park Labels: mesosphere Currently the {{DRFSorter}} aggregates total and allocated resources across multiple slaves, which only works for scalar resources. We need to distinguish resources from different slaves. Suppose we have 2 slaves and 1 framework. The framework is allocated all resources from both slaves. {code} Resources slaveResources = Resources::parse("cpus:2;mem:512;ports:[31000-32000]").get(); DRFSorter sorter; sorter.add(slaveResources); // Add slave1 resources sorter.add(slaveResources); // Add slave2 resources // Total resources in sorter at this point is // cpus(*):4; mem(*):1024; ports(*):[31000-32000]. // The scalar resources get aggregated correctly but ports do not. sorter.add("F"); // The 2 calls to allocated only work because we simply do: // allocation[name] += resources; // without checking that the 'resources' is available in the total. sorter.allocated("F", slaveResources); sorter.allocated("F", slaveResources); // At this point, sorter.allocation("F") is: // cpus(*):4; mem(*):1024; ports(*):[31000-32000]. {code} To provide some context, this issue came up while trying to reserve all unreserved resources from every offer.
{code} for (const Offer& offer : offers) { Resources unreserved = offer.resources().unreserved(); Resources reserved = unreserved.flatten(role, Resource::FRAMEWORK); Offer::Operation reserve; reserve.set_type(Offer::Operation::RESERVE); reserve.mutable_reserve()->mutable_resources()->CopyFrom(reserved); driver->acceptOffers({offer.id()}, {reserve}); } {code} Suppose the slave resources are the same as above: {quote} Slave1: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}} Slave2: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}} {quote} Initial (incorrect) total resources in the DRFSorter is: {quote} {{cpus(\*):4; mem(\*):1024; ports(\*):\[31000-32000\]}} {quote} We receive 2 offers, 1 from each slave: {quote} Offer1: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}} Offer2: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}} {quote} At this point, the resources allocated for the framework are: {quote} {{cpus(\*):4; mem(\*):1024; ports(\*):\[31000-32000\]}} {quote} After first {{RESERVE}} operation with Offer1: The allocated resources for the framework become: {quote} {{cpus(\*):2; mem(\*):512; cpus(role):2; mem(role):512; ports(role):\[31000-32000\]}} {quote} During second {{RESERVE}} operation with Offer2: {code:title=HierarchicalAllocatorProcess::updateAllocation} // ... FrameworkSorter* frameworkSorter = frameworkSorters[frameworks[frameworkId].role]; Resources allocation = frameworkSorter->allocation(frameworkId.value()); // Update the allocated resources. Try<Resources> updatedAllocation = allocation.apply(operations); CHECK_SOME(updatedAllocation); // ... {code} {{allocation}} in the above code is: {quote} {{cpus(\*):2; mem(\*):512; cpus(role):2; mem(role):512; ports(role):\[31000-32000\]}} {quote} We try to {{apply}} a {{RESERVE}} operation and we fail to find {{ports(\*):\[31000-32000\]}}, which leads to the {{CHECK}} failure at {{CHECK_SOME(updatedAllocation);}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
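The scalar-versus-ranges asymmetry driving this bug can be shown in a few standalone lines (illustrative only, no Mesos types): scalar resources from two slaves aggregate correctly by addition, but range resources like ports aggregate by set union, so the same range contributed by two different slaves collapses into one — exactly why the sorter's total undercounts ports.

```cpp
#include <set>
#include <utility>

// Scalars (cpus, mem) aggregate by addition: 2 + 2 = 4 is correct.
double addScalar(double a, double b) { return a + b; }

// Ranges (ports) aggregate by set union: two identical [31000-32000]
// ranges from two slaves collapse into one entry, losing a slave's worth.
std::set<std::pair<int, int>> unionRanges(
    std::set<std::pair<int, int>> a,
    const std::set<std::pair<int, int>>& b) {
  a.insert(b.begin(), b.end());
  return a;
}
```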
[jira] [Updated] (MESOS-2085) Add support encrypted and non-encrypted communication in parallel for cluster upgrade
[ https://issues.apache.org/jira/browse/MESOS-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2085: -- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Add support encrypted and non-encrypted communication in parallel for cluster upgrade - Key: MESOS-2085 URL: https://issues.apache.org/jira/browse/MESOS-2085 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere During cluster upgrade from non-encrypted to encrypted communication, we need to support an interim where: 1) A master can have connections to both encrypted and non-encrypted slaves 2) A slave that supports encrypted communication connects to a master that has not yet been upgraded. 3) Frameworks are encrypted but the master has not been upgraded yet. 4) Master has been upgraded but frameworks haven't. 5) A slave process has upgraded but running executor processes haven't. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2070) Implement simple slave recovery behavior for fetcher cache
[ https://issues.apache.org/jira/browse/MESOS-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2070: -- Sprint: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Implement simple slave recovery behavior for fetcher cache -- Key: MESOS-2070 URL: https://issues.apache.org/jira/browse/MESOS-2070 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Labels: newbie Original Estimate: 6h Remaining Estimate: 6h Clean the fetcher cache completely upon slave restart/recovery. This implements correct, albeit not ideal behavior. More efficient schemes that restore knowledge about cached files or even resume downloads can be added later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.
[ https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2215: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks. --- Key: MESOS-2215 URL: https://issues.apache.org/jira/browse/MESOS-2215 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 0.21.0 Reporter: Steve Niemitz Assignee: Timothy Chen Once the slave restarts and recovers the task, I see this error in the log for all tasks that were recovered every second or so. Note, these were NOT docker tasks: W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage for container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited with status 1 stderr = Error: No such image or container: mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21 However the tasks themselves are still healthy and running. The slave was launched with --containerizers=mesos,docker - More info: it looks like the docker containerizer is a little too ambitious about recovering containers, again this was not a docker task: I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd' of framework 20150109-161713-715350282-5050-290797- Looking into the source, it looks like the problem is that the ComposingContainerizer runs recover in parallel, but neither the docker containerizer nor mesos containerizer check if they should recover the task or not (i.e. whether they were the ones that launched it).
Perhaps this needs to be written into the checkpoint somewhere? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
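Writing the owning containerizer into the checkpoint, as suggested above, could be sketched like this (illustrative names, not the actual Mesos checkpoint format): at launch the owner is recorded, and on recovery each containerizer skips containers it did not launch.

```cpp
#include <map>
#include <string>

// Hypothetical ownership checkpoint: maps a container ID to the name of
// the containerizer ("docker" or "mesos") that launched it.
class LaunchRegistry {
 public:
  // Called at launch time, before the container starts.
  void checkpoint(const std::string& containerId,
                  const std::string& containerizer) {
    owner_[containerId] = containerizer;
  }

  // Called during recovery: only the launching containerizer recovers
  // (and, on failure, reaps) the container.
  bool shouldRecover(const std::string& containerId,
                     const std::string& containerizer) const {
    auto it = owner_.find(containerId);
    return it != owner_.end() && it->second == containerizer;
  }

 private:
  std::map<std::string, std::string> owner_;
};
```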
[jira] [Updated] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-1806: -- Sprint: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20) Substituting etcd or ReplicatedLog for Zookeeper Key: MESOS-1806 URL: https://issues.apache.org/jira/browse/MESOS-1806 Project: Mesos Issue Type: Task Reporter: Ed Ropple Assignee: Cody Maloney Priority: Minor adam_mesos eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one. -- Consider it filed. =) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2333) Securing Sandboxes via Filebrowser Access Control
[ https://issues.apache.org/jira/browse/MESOS-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2333: -- Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 3 - 3/6 (was: Mesosphere Q1 Sprint 3 - 2/20) Securing Sandboxes via Filebrowser Access Control - Key: MESOS-2333 URL: https://issues.apache.org/jira/browse/MESOS-2333 Project: Mesos Issue Type: Improvement Components: security Reporter: Adam B Assignee: Alexander Rojas Labels: authorization, filebrowser, mesosphere, security As it stands now, anybody with access to the master or slave web UI can use the filebrowser to view the contents of any attached/mounted paths on the master or slave. Currently, the attached paths include master and slave logs as well as executor/task sandboxes. While there's a chance that the master and slave logs could contain sensitive information, it's much more likely that sandboxes could contain customer data or other files that should not be globally accessible. Securing the sandboxes is the primary goal of this ticket. There are four filebrowser endpoints: browse, read, download, and debug. Here are some potential solutions. 1) We could easily provide flags that globally enable/disable each endpoint, allowing coarse-grained access control. This might be a reasonable short-term plan. We would also want to update the web UIs to display an "Access Denied" error, rather than showing links that open up blank pailers. 2) Each master and slave handles its own authn/authz. Slaves will need to have an authenticator, and there must be a way to provide each node with credentials and ACLs, and keep these in sync across the cluster. 3) Filter all slave communications through the master(s), which already has credentials and ACLs. We'll have to restrict access to the filebrowser (and other?) endpoints to the (leading?) master.
Then the master can perform the authentication and authorization, only passing the request on to the slave if auth succeeds. 3a) The slave returns the browse/read/download response back through the master. This could be a network bottleneck. 3b) Upon authn/z success, the master redirects the request to the appropriate slave, which will send the response directly back to the requester. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2050) InMemoryAuxProp plugin used by Authenticators results in SEGFAULT
[ https://issues.apache.org/jira/browse/MESOS-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2050: -- Target Version/s: 0.23.0 (was: 0.21.0, 0.22.0) InMemoryAuxProp plugin used by Authenticators results in SEGFAULT - Key: MESOS-2050 URL: https://issues.apache.org/jira/browse/MESOS-2050 Project: Mesos Issue Type: Bug Affects Versions: 0.21.0 Reporter: Vinod Kone Assignee: Till Toenshoff Observed this on ASF CI: Basically, as part of the recent Auth refactor for modules, the loading of secrets is being done once per Authenticator Process instead of once in the Master. Since, InMemoryAuxProp plugin manipulates static variables (e.g, 'properties') it results in SEGFAULT when one Authenticator (e.g., for slave) does load() while another Authenticator (e.g., for framework) does lookup(), as both these methods manipulate static 'properties'. {code} [ RUN ] MasterTest.LaunchDuplicateOfferTest Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp' I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in 702ns I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the db in 450ns I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1104 03:37:55.527135 28378 master.cpp:318] Master 20141104-033755-3176252227-49988-28363 (proserpina.apache.org) started on 67.195.81.189:49988 I1104 03:37:55.527180 28378 master.cpp:364] Master only allowing 
authenticated frameworks to register I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing authenticated slaves to register I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials' I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising offers for all slaves I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from a replica in EMPTY status I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.189:49988 I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is master@67.195.81.189:49988 with id 20141104-033755-3176252227-49988-28363 I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master! I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to STARTING I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 497965ns I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to STARTING I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from a replica in STARTING status I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 293974ns I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to VOTING I1104 03:37:55.532790 28390 recover.cpp:568] Successfully joined the Paxos group I1104 03:37:55.533107 28390 
recover.cpp:452] Recover process terminated I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit promise request with proposal 1 I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 321021ns I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1 I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to fill missing position I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1104 03:37:55.537719 28392 leveldb.cpp:343] Persisting action (8 bytes) to
[jira] [Commented] (MESOS-2377) Fix leak in libevent's version EventLoop::delay
[ https://issues.apache.org/jira/browse/MESOS-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329354#comment-14329354 ] Niklas Quarfot Nielsen commented on MESOS-2377: --- Can you add the review link? Fix leak in libevent's version EventLoop::delay --- Key: MESOS-2377 URL: https://issues.apache.org/jira/browse/MESOS-2377 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.22.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess Fix For: 0.22.0 I was finally able to verify the delay event is leaking through massif. Easy fix. Patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2377) Fix leak in libevent's version EventLoop::delay
[ https://issues.apache.org/jira/browse/MESOS-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329360#comment-14329360 ] Joris Van Remoortere commented on MESOS-2377: - https://reviews.apache.org/r/31218/ Fix leak in libevent's version EventLoop::delay --- Key: MESOS-2377 URL: https://issues.apache.org/jira/browse/MESOS-2377 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.22.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess Fix For: 0.22.0 I was finally able to verify the delay event is leaking through massif. Easy fix. Patch coming soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)