Review Request 18945: Add --version flag to mesos master and slave
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18945/ --- Review request for mesos. Bugs: MESOS-1068 https://issues.apache.org/jira/browse/MESOS-1068 Repository: mesos-git Description --- Subj. Looks like a bit copy-paste, please advise where to put this to be shared between slave and master code. I can see some useful defines like -DPACKAGE_STRING during build, but I don't see if it's used somewhere. Diffs - src/master/flags.hpp 159b2de5878927613ba94f81005dba601f072026 src/master/main.cpp 4c74a1b387dc99aa223cb9cf8a096d3b4a126a0a src/slave/flags.hpp e4d98a53cbfb7f9ca828f17e82d492274cb9969d src/slave/main.cpp a498a6ae6a79c7155c07a5d6dc2d6c9dc8ae060f Diff: https://reviews.apache.org/r/18945/diff/ Testing --- make check passes, manually tested --version option on master and slave Thanks, Nikita Vetoshkin
Re: Review Request 18945: Add --version flag to mesos master and slave
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18945/#review36604 --- Ship it! src/master/main.cpp https://reviews.apache.org/r/18945/#comment67651 This is already included above. - Benjamin Hindman On March 9, 2014, 5:29 p.m., Nikita Vetoshkin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18945/ --- (Updated March 9, 2014, 5:29 p.m.) Review request for mesos. Bugs: MESOS-1068 https://issues.apache.org/jira/browse/MESOS-1068 Repository: mesos-git Description --- Subj. Looks like a bit copy-paste, please advise where to put this to be shared between slave and master code. I can see some useful defines like -DPACKAGE_STRING during build, but I don't see if it's used somewhere. Diffs - src/master/flags.hpp 159b2de5878927613ba94f81005dba601f072026 src/master/main.cpp 4c74a1b387dc99aa223cb9cf8a096d3b4a126a0a src/slave/flags.hpp e4d98a53cbfb7f9ca828f17e82d492274cb9969d src/slave/main.cpp a498a6ae6a79c7155c07a5d6dc2d6c9dc8ae060f Diff: https://reviews.apache.org/r/18945/diff/ Testing --- make check passes, manually tested --version option on master and slave Thanks, Nikita Vetoshkin
Review Request 18946: Moved JNI code to separate library
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18946/ --- Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-855 https://issues.apache.org/jira/browse/MESOS-855 Repository: mesos-git Description --- Introduced a new environment variable (MESOS_NATIVE_JAVA_LIBRARY). That variable points to libmesos_java. libmesos_java contains the JNI-specific code (formerly part of libmesos) and dynamically links against libmesos. A typical Java-based framework relies on mesos.jar to do the loading but may use some extra logic in its startup to make sure MESOS_NATIVE[_JAVA]_LIBRARY is set and valid. That extra logic would need to be adapted to use the new environment variable instead of the old one. Diffs - bin/mesos-slave-flags.sh.in dc73aef src/Makefile.am 61d832b src/java/generated/org/apache/mesos/MesosNativeLibrary.java.in 231d1e2 Diff: https://reviews.apache.org/r/18946/diff/ Testing --- make check and functional testing with external, Java-based frameworks Thanks, Till Toenshoff
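The "extra logic" a framework might run at startup can be sketched roughly as follows. This is only an illustration in Python, not code from the patch or from mesos.jar: the helper name `resolve_native_java_library` and its `exists` parameter are hypothetical, and the old variable name MESOS_NATIVE_LIBRARY is inferred from the MESOS_NATIVE[_JAVA]_LIBRARY shorthand in the description.

```python
import os

NEW_VAR = "MESOS_NATIVE_JAVA_LIBRARY"  # variable introduced by this review
OLD_VAR = "MESOS_NATIVE_LIBRARY"       # assumed name of the previous variable


def resolve_native_java_library(env, exists=os.path.isfile):
    """Return the path to libmesos_java taken from env, or raise.

    Hypothetical helper: `env` is a mapping such as os.environ and
    `exists` is the predicate used to validate the path.
    """
    path = env.get(NEW_VAR, "")
    if not path:
        hint = ""
        if env.get(OLD_VAR):
            # The old variable is set; point the operator at the new one.
            hint = f" ({OLD_VAR} is set, but {NEW_VAR} is expected now)"
        raise LookupError(f"{NEW_VAR} is not set{hint}")
    if not exists(path):
        raise FileNotFoundError(f"{NEW_VAR} points to a missing file: {path}")
    return path
```

A framework launcher could call `resolve_native_java_library(os.environ)` before starting the JVM and fail early with a readable message instead of an UnsatisfiedLinkError later on.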
Re: Review Request 18946: Moved JNI code to separate library
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18946/#review36617 --- Bad patch! Reviews applied: [18946] Failed command: make -j3 check GTEST_FILTER='' /dev/null Error: ev.c:1531:31: warning: 'ev_default_loop_ptr' initialized and declared 'extern' [enabled by default] ev.c: In function 'evpipe_write': ev.c:2160:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c:2172:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c: In function 'pipecb': ev.c:2193:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] ev.c:2207:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] In file included from /usr/include/c++/4.6/ext/hash_set:61:0, from src/glog/stl_logging.h:54, from src/stl_logging_unittest.cc:34: /usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp] In file included from src/utilities.h:73:0, from src/googletest.h:38, from src/stl_logging_unittest.cc:48: src/base/mutex.h:137:0: warning: _XOPEN_SOURCE redefined [enabled by default] /usr/include/features.h:166:0: note: this is the location of the previous definition warning: no files found matching 'Makefile' under directory 'docs' warning: no files found matching 'indexsidebar.html' under directory 'docs' zip_safe flag not set; analyzing archive contents... 
/usr/bin/ld: cannot find -lmesos collect2: ld returned 1 exit status make[2]: *** [libmesos_java.la] Error 1 make[2]: *** Waiting for unfinished jobs make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 - Mesos ReviewBot On March 9, 2014, 7 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18946/ --- (Updated March 9, 2014, 7 p.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-855 https://issues.apache.org/jira/browse/MESOS-855 Repository: mesos-git Description --- Introduced a new environment variable (MESOS_NATIVE_JAVA_LIBRARY). That variable points to libmesos_java. libmesos_java contains the JNI-specific code (formerly part of libmesos) and dynamically links against libmesos. A typical Java-based framework relies on mesos.jar to do the loading but may use some extra logic in its startup to make sure MESOS_NATIVE[_JAVA]_LIBRARY is set and valid. That extra logic would need to be adapted to use the new environment variable instead of the old one. Diffs - bin/mesos-slave-flags.sh.in dc73aef src/Makefile.am 61d832b src/java/generated/org/apache/mesos/MesosNativeLibrary.java.in 231d1e2 Diff: https://reviews.apache.org/r/18946/diff/ Testing --- make check and functional testing with external, Java-based frameworks Thanks, Till Toenshoff
Review Request 18947: Fixed Python-Egg to adhere to --without-included-zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/ --- Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-550 https://issues.apache.org/jira/browse/MESOS-550 Repository: mesos-git Description --- The python-egg build-process now checks if libzookeeper_mt.a has been produced. Added LDFLAGS propagation into setup.py. Diffs - src/python/setup.py.in 02f00ef Diff: https://reviews.apache.org/r/18947/diff/ Testing --- ../configure CPPFLAGS=-I/usr/local/include/zookeeper LDFLAGS=-L/usr/local/lib --without-included-zookeeper make check and ../configure make check Thanks, Till Toenshoff
Re: Review Request 18947: Fixed Python-Egg to adhere to --without-included-zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/#review36618 --- Bad patch! Reviews applied: [18947] Failed command: make -j3 check GTEST_FILTER='' /dev/null Error: ev.c:1531:31: warning: 'ev_default_loop_ptr' initialized and declared 'extern' [enabled by default] ev.c: In function 'evpipe_write': ev.c:2160:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c:2172:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c: In function 'pipecb': ev.c:2193:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] ev.c:2207:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] In file included from /usr/include/c++/4.6/ext/hash_set:61:0, from src/glog/stl_logging.h:54, from src/stl_logging_unittest.cc:34: /usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp] In file included from src/utilities.h:73:0, from src/googletest.h:38, from src/stl_logging_unittest.cc:48: src/base/mutex.h:137:0: warning: _XOPEN_SOURCE redefined [enabled by default] /usr/include/features.h:166:0: note: this is the location of the previous definition warning: no files found matching 'Makefile' under directory 'docs' warning: no files found matching 'indexsidebar.html' under directory 'docs' zip_safe flag not set; analyzing archive contents... WARNING: '.' 
not a valid package name; please use only.-separated package names in setup.py package init file 'src/__init__.py' not found (or not a regular file) cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] g++: error: : No such file or directory error: command 'g++' failed with exit status 1 make[2]: *** [python/dist/mesos-0.19.0-py2.7-linux-x86_64.egg] Error 1 make[2]: *** Waiting for unfinished jobs make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 - Mesos ReviewBot On March 9, 2014, 10:53 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/ --- (Updated March 9, 2014, 10:53 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-550 https://issues.apache.org/jira/browse/MESOS-550 Repository: mesos-git Description --- The python-egg build-process now checks if libzookeeper_mt.a has been produced. Added LDFLAGS propagation into setup.py. Diffs - src/python/setup.py.in 02f00ef Diff: https://reviews.apache.org/r/18947/diff/ Testing --- ../configure CPPFLAGS=-I/usr/local/include/zookeeper LDFLAGS=-L/usr/local/lib --without-included-zookeeper make check and ../configure make check Thanks, Till Toenshoff
Re: Review Request 18947: Fixed Python-Egg to adhere to --without-included-zookeeper
On March 9, 2014, 11:18 p.m., Mesos ReviewBot wrote: Bad patch! Reviews applied: [18947] Failed command: make -j3 check GTEST_FILTER='' /dev/null Error: ev.c:1531:31: warning: 'ev_default_loop_ptr' initialized and declared 'extern' [enabled by default] ev.c: In function 'evpipe_write': ev.c:2160:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c:2172:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result] ev.c: In function 'pipecb': ev.c:2193:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] ev.c:2207:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result] In file included from /usr/include/c++/4.6/ext/hash_set:61:0, from src/glog/stl_logging.h:54, from src/stl_logging_unittest.cc:34: /usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp] In file included from src/utilities.h:73:0, from src/googletest.h:38, from src/stl_logging_unittest.cc:48: src/base/mutex.h:137:0: warning: _XOPEN_SOURCE redefined [enabled by default] /usr/include/features.h:166:0: note: this is the location of the previous definition warning: no files found matching 'Makefile' under directory 'docs' warning: no files found matching 'indexsidebar.html' under directory 'docs' zip_safe flag not set; analyzing archive contents... WARNING: '.' 
not a valid package name; please use only.-separated package names in setup.py package init file 'src/__init__.py' not found (or not a regular file) cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option '-Wstrict-prototypes' is valid for Ada/C/ObjC but not for C++ [enabled by default] g++: error: : No such file or directory error: command 'g++' failed with exit status 1 make[2]: *** [python/dist/mesos-0.19.0-py2.7-linux-x86_64.egg] Error 1 make[2]: *** Waiting for unfinished jobs make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 Interesting failure. Seems to be caused when LDFLAGS are getting stuffed into EXTRA_LINK_ARGS - not on OSX but only on linux systems. Workaround upcoming... - Till --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/#review36618 --- On March 9, 2014, 10:53 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/ --- (Updated March 9, 2014, 10:53 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-550 https://issues.apache.org/jira/browse/MESOS-550 Repository: mesos-git Description --- The python-egg build-process now checks if libzookeeper_mt.a has been produced. Added LDFLAGS propagation into setup.py. 
Diffs - src/python/setup.py.in 02f00ef Diff: https://reviews.apache.org/r/18947/diff/ Testing --- ../configure CPPFLAGS=-I/usr/local/include/zookeeper LDFLAGS=-L/usr/local/lib --without-included-zookeeper make check and ../configure make check Thanks, Till Toenshoff
Re: Review Request 18947: Fixed Python-Egg to adhere to --without-included-zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/ --- (Updated March 10, 2014, 2:27 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Changes --- Addressed build error on linux/gcc with empty LDFLAGS. Bugs: MESOS-550 https://issues.apache.org/jira/browse/MESOS-550 Repository: mesos-git Description --- The python-egg build-process now checks if libzookeeper_mt.a has been produced. Added LDFLAGS propagation into setup.py. Diffs (updated) - src/python/setup.py.in 02f00ef Diff: https://reviews.apache.org/r/18947/diff/ Testing (updated) --- ../configure --without-included-zookeeper CPPFLAGS=-I/usr/local/include/zookeeper LDFLAGS=-L/usr/local/lib LIBS=-lzookeeper_mt make check and ../configure make check Thanks, Till Toenshoff
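The earlier log line `g++: error: : No such file or directory` is consistent with an empty string being handed to the linker as an argument. The updated diff itself is not quoted in this thread, so the following is only a sketch, in Python, of one way a setup.py could turn LDFLAGS into link arguments without producing an empty entry; the function name `extra_link_args` is hypothetical.

```python
import shlex


def extra_link_args(ldflags):
    """Split an LDFLAGS string into a list of linker arguments.

    An unset, empty, or whitespace-only value yields [] rather than [''],
    so no empty argument ever reaches the compiler driver.
    """
    return shlex.split(ldflags or "")


# Naive alternatives keep an empty element when LDFLAGS is empty:
naive_wrapped = [""]            # what wrapping the raw string in a list gives
naive_split = "".split(" ")     # also [''], since the separator is explicit
```

With this helper, `extra_link_args("-L/usr/local/lib")` gives a one-element list and an empty LDFLAGS gives an empty list, which can be passed as-is to an extension's link arguments.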
Re: Review Request 18826: Add developer tools, update list of frameworks.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18826/ --- (Updated March 10, 2014, 2:41 a.m.) Review request for mesos. Changes --- Reorganize frameworks into sections. Add eBay and PayPal as adopters. Repository: mesos-git Description --- See summary Diffs (updated) - docs/home.md 84484b4 docs/mesos-frameworks.md 62838c9 docs/powered-by-mesos.md c09bf55 docs/tools.md c71b876 Diff: https://reviews.apache.org/r/18826/diff/ Testing --- Thanks, Tobi Knaup
Re: Review Request 18947: Fixed Python-Egg to adhere to --without-included-zookeeper
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/#review36621 --- Patch looks great! Reviews applied: [18947] All tests passed. - Mesos ReviewBot On March 10, 2014, 2:27 a.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18947/ --- (Updated March 10, 2014, 2:27 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-550 https://issues.apache.org/jira/browse/MESOS-550 Repository: mesos-git Description --- The python-egg build-process now checks if libzookeeper_mt.a has been produced. Added LDFLAGS propagation into setup.py. Diffs - src/python/setup.py.in 02f00ef Diff: https://reviews.apache.org/r/18947/diff/ Testing --- ../configure --without-included-zookeeper CPPFLAGS=-I/usr/local/include/zookeeper LDFLAGS=-L/usr/local/lib LIBS=-lzookeeper_mt make check and ../configure make check Thanks, Till Toenshoff
Re: Review Request 18946: Moved JNI code to separate library
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18946/#review36626 --- Patch looks great! Reviews applied: [18946] All tests passed. - Mesos ReviewBot On March 10, 2014, 3:27 a.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18946/ --- (Updated March 10, 2014, 3:27 a.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-855 https://issues.apache.org/jira/browse/MESOS-855 Repository: mesos-git Description --- Introduced a new environment variable (MESOS_NATIVE_JAVA_LIBRARY). That variable points to libmesos_java. libmesos_java contains the JNI-specific code (formerly part of libmesos) and dynamically links against libmesos. A typical Java-based framework relies on mesos.jar to do the loading but may use some extra logic in its startup to make sure MESOS_NATIVE[_JAVA]_LIBRARY is set and valid. That extra logic would need to be adapted to use the new environment variable instead of the old one. Diffs - bin/mesos-slave-flags.sh.in dc73aef src/Makefile.am 384b312 src/java/generated/org/apache/mesos/MesosNativeLibrary.java.in 231d1e2 Diff: https://reviews.apache.org/r/18946/diff/ Testing --- make check and functional testing with external, Java-based frameworks Thanks, Till Toenshoff
Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #1684
See https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/1684/ -- Started by an SCM change Started by an SCM change Building remotely on ubuntu2 in workspace https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/ws/ Fetching changes from the remote Git repository Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/mesos.git FATAL: Failed to fetch from https://git-wip-us.apache.org/repos/asf/mesos.git hudson.plugins.git.GitException: Failed to fetch from https://git-wip-us.apache.org/repos/asf/mesos.git at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:623) at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:855) at hudson.plugins.git.GitSCM.checkout(GitSCM.java:880) at hudson.model.AbstractProject.checkout(AbstractProject.java:1411) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:651) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:560) at hudson.model.Run.execute(Run.java:1670) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:231) Caused by: hudson.plugins.git.GitException: Command git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/mesos.git +refs/heads/*:refs/remotes/origin/* returned status code 128: stdout: stderr: remote: Counting objects: 39, done. 
remote: Compressing objects: 100% (25/25), done.
remote: Total 25 (delta 21), reused 0 (delta 0) error: unable to create temporary sha1 filename .git/objects/4b: Input/output error fatal: failed to write object fatal: unpack-objects failed at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1173) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1043) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$200(CliGitAPIImpl.java:74) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:207) at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153) at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:328) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
[jira] [Commented] (MESOS-890) Figure out a way to migrate a live Mesos cluster to a different ZooKeeper cluster
[ https://issues.apache.org/jira/browse/MESOS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925447#comment-13925447 ]

Vinod Kone commented on MESOS-890:
----------------------------------

Having slaves and schedulers do leader detection across multiple ZooKeeper clusters requires a significant amount of code change in Mesos. The arguments against it are the same as the ones Raul pointed out in #2.

Figure out a way to migrate a live Mesos cluster to a different ZooKeeper cluster
---------------------------------------------------------------------------------

Key: MESOS-890
URL: https://issues.apache.org/jira/browse/MESOS-890
Project: Mesos
Issue Type: Improvement
Components: master
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales

I've been chatting with [~vinodkone] about approaching a live ZK cluster migration. Here are the options we came up with. For the descriptions we treat `zk1` as the current working cluster, `obs` as a bunch of ZooKeeper Observers [1] and `zk2` as the new cluster to which we need to migrate.

Approach #1: Using Observers

With this option we need to:
* add obs to zk1
* restart slaves to have them use obs to find their master
* restart the framework, having it use obs to find the mesos master
* restart the mesos masters, having them use obs to perform their election
* stop all ZK obs and remove their data (since they will need to sync up with an entirely new cluster, we need to lose the old data)
* restart the ZK obs, having them be part of zk2
* at this point the slaves, the framework and the masters can reach the ZK obs again and an election happens
* optionally, restart slaves, the framework and masters again using zk2 instead of the ZK obs if you want to decommission the observers.

This assumes that we can do the last three steps in 75 secs (75 secs being the slave health check timeout). This is a reasonable assumption if the data size in zk2 is small enough to ensure that the ZK obs can sync up quickly with zk2. If zk2 is a new cluster with no data then this should be very fast.

The good things about this approach are:
* no mesos code change
* it is very easy to roll back halfway through, if need be

The hard issues are:
* Manipulating the ZK obs (i.e.: stopping, removing the data from zk1 and starting again) needs to be done with care. Messing up configs or not removing the data from zk1 on any of the ZK obs will cause problems
* we need to restart all slaves to have them use the ZK obs instead of connecting to zk1 directly. But with slave recovery this isn't an issue, just an extra step.
* same thing for the framework and the masters

Approach #2: Dual publishing from mesos masters

With this option we would augment the election handling code in mesos masters to have it deal with the notion of primary and secondary ZK clusters. Master registration and election would then work as follows:
* create an ephemeral|sequential znode in zk1 (e.g.: /path/to/znode/mesos_23)
* create an ephemeral, but not sequential, znode in zk2 with the exact same path as what was created in zk1 (e.g.: /path/to/znode/mesos_23)
* make sure both sessions, in zk1 and zk2, are always in the same state (i.e.: if one expires, the other one should be closed, etc.)

For now, let's omit a few implementation details which might need extra care and assume we can make this work consistently in such a way that zk2 accurately reflects elections that happen in zk1. This means that regardless of being connected to zk1 or zk2, you always get the same master. Once we have this, the migration steps would be:
* restart slaves to have them use zk2, where masters can be found by virtue of what we implemented above
* restart the framework so that it finds the mesos master in zk2
* stop all mesos masters (they all need to be stopped before moving to the next step)
* start all mesos masters using zk2 as their primary and only cluster

Again, this assumes we can do the last two steps in 75 secs (or, if we needed to, we could bump the slave health check timeout). Which, again, sounds achievable given that masters have no state and their start-up time is very short.

The good things about this approach are:
- no tinkering with extra ZK servers nor with ZK configs

The hard issues are:
- extra code needs to be added to the election handling bits of mesos master to address a very rare, but possible, use case of cluster migration. It might take a bit of time to get that code right.
- it's easier to end up with a bad state if any of the mesos masters ends up with a bad config or is restarted earlier and ends up publishing differently than the other masters. This could lead to elections with differing results.

Thoughts?

[1]
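The dual-publishing idea in Approach #2 can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `FakeZk`, `DualPublisher` and `leader` are hypothetical names, no real ZooKeeper client is involved, and the election rule used here (the znode with the lowest sequence number wins) is the common ZooKeeper recipe, assumed rather than taken from the Mesos code.

```python
class FakeZk:
    """In-memory stand-in for a ZooKeeper cluster: znode name -> data."""

    def __init__(self):
        self.znodes = {}
        self._seq = 0

    def create_sequential(self, prefix, data):
        # Mimics an ephemeral|sequential znode: the cluster picks the suffix.
        name = f"{prefix}{self._seq}"
        self._seq += 1
        self.znodes[name] = data
        return name

    def create(self, name, data):
        # Mimics an ephemeral, non-sequential znode with a caller-chosen name.
        self.znodes[name] = data

    def delete(self, name):
        self.znodes.pop(name, None)


def leader(zk, prefix="mesos_"):
    """Return the data of the znode with the lowest sequence number, or None."""
    if not zk.znodes:
        return None
    name = min(zk.znodes, key=lambda n: int(n[len(prefix):]))
    return zk.znodes[name]


class DualPublisher:
    """A master that registers in the primary cluster and mirrors to the secondary."""

    def __init__(self, primary, secondary, address):
        self.primary, self.secondary, self.address = primary, secondary, address
        self.name = None

    def register(self):
        # The sequence number comes from the primary; the secondary reuses the name.
        self.name = self.primary.create_sequential("mesos_", self.address)
        self.secondary.create(self.name, self.address)

    def expire(self):
        # Keep both sessions in the same state: drop the znode in both clusters.
        self.primary.delete(self.name)
        self.secondary.delete(self.name)
```

Because the secondary reuses the exact znode names assigned by the primary, a client running `leader()` against either cluster sees the same master, which is the property the migration steps rely on. The sketch ignores the hard parts named in the issue, such as partial failures between the two writes.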
Re: What happens when I call reconcileTasks and database divergence
Hey David, You might want to look at Aurora and Marathon to see how they do state reconciliation. We are working on a new feature, persistent state in the master (MESOS-764, https://issues.apache.org/jira/browse/MESOS-764), which should make reconciliation even easier.