[jira] [Updated] (MESOS-2708) Design doc for the Executor HTTP API
[ https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-2708: -- Sprint: Mesosphere Sprint 17 Design doc for the Executor HTTP API Key: MESOS-2708 URL: https://issues.apache.org/jira/browse/MESOS-2708 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Anand Mazumdar Labels: mesosphere This tracks the design of the Executor HTTP API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3319) Mesos will not build when configured with gperftools enabled
Greg Mann created MESOS-3319: Summary: Mesos will not build when configured with gperftools enabled Key: MESOS-3319 URL: https://issues.apache.org/jira/browse/MESOS-3319 Project: Mesos Issue Type: Bug Reporter: Greg Mann Mesos configured with {{--enable-perftools}} currently will not build on OSX 10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not current. The stable release is now 2.4, which builds successfully on both of these platforms. This issue is resolved when Mesos builds successfully out of the box with gperftools enabled. After this ticket is resolved, the libprocess profiler should be tested to confirm that it still works; if it does not, it should be fixed.
[jira] [Commented] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.
[ https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715346#comment-14715346 ] Greg Mann commented on MESOS-2466: -- It seems Mesos currently will not build successfully with gperftools enabled (upon which the profiler depends), so I removed that environment variable from this ticket and created new issues (MESOS-3319 and MESOS-3320) for fixing the gperftools build and documenting the environment variable, respectively. Write documentation for all the LIBPROCESS_* environment variables. --- Key: MESOS-2466 URL: https://issues.apache.org/jira/browse/MESOS-2466 Project: Mesos Issue Type: Documentation Reporter: Alexander Rojas Assignee: Greg Mann Labels: documentation, mesosphere libprocess uses a set of environment variables to modify its behaviour; however, these variables are not documented anywhere, nor is it defined where the documentation should be. What is needed is a decision on where the environment variables should be documented (a new doc file or an existing one), after which the documentation should be added there. After searching in the code, these are the variables which need to be documented: # {{LIBPROCESS_IP}} # {{LIBPROCESS_PORT}} # {{LIBPROCESS_ADVERTISE_IP}} # {{LIBPROCESS_ADVERTISE_PORT}}
[jira] [Assigned] (MESOS-3319) Mesos will not build when configured with gperftools enabled
[ https://issues.apache.org/jira/browse/MESOS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-3319: Assignee: Greg Mann Mesos will not build when configured with gperftools enabled Key: MESOS-3319 URL: https://issues.apache.org/jira/browse/MESOS-3319 Project: Mesos Issue Type: Bug Reporter: Greg Mann Assignee: Greg Mann Labels: build
[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history
[ https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715153#comment-14715153 ] Alexander Rukletsov commented on MESOS-3307: [~bobrik], you should be able to get the list of endpoints by hitting the {{/help}} endpoint. I think history size is also an option; my feeling, however, is that we need a more general solution rather than a band-aid. I would also like [~jmlvanre] to chime in. Configurable size of completed task / framework history --- Key: MESOS-3307 URL: https://issues.apache.org/jira/browse/MESOS-3307 Project: Mesos Issue Type: Bug Reporter: Ian Babrou We try to make Mesos work with multiple frameworks and mesos-dns at the same time. The goal is to have a set of frameworks per team / project on a single Mesos cluster. At this point our mesos state.json is at 4 MB and it takes a while to assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively pushing mesos-master CPU usage through the roof. It's at 100%+ all the time. 
Here's the problem: {noformat} mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n 1 20150606-001827-252388362-5050-5982-0003 16 20150606-001827-252388362-5050-5982-0005 18 20150606-001827-252388362-5050-5982-0029 73 20150606-001827-252388362-5050-5982-0007 141 20150606-001827-252388362-5050-5982-0009 154 20150820-154817-302720010-5050-15320- 289 20150606-001827-252388362-5050-5982-0004 510 20150606-001827-252388362-5050-5982-0012 666 20150606-001827-252388362-5050-5982-0028 923 20150116-002612-269165578-5050-32204-0003 1000 20150606-001827-252388362-5050-5982-0001 1000 20150606-001827-252388362-5050-5982-0006 1000 20150606-001827-252388362-5050-5982-0010 1000 20150606-001827-252388362-5050-5982-0011 1000 20150606-001827-252388362-5050-5982-0027 mesos λ fgrep 1000 -r src/master src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10; src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000; {noformat} Active tasks are just 6% of state.json response: {noformat} mesos λ cat ~/temp/mesos-state.json | jq -c . | wc 1 14796 4138942 mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc 16 37 252774 {noformat} I see four options that can improve the situation: 1. Add query string param to exclude completed tasks from state.json and use it in mesos-dns and similar tools. There is no need for mesos-dns to know about completed tasks, it's just extra load on master and mesos-dns. 2. Make history size configurable. 3. Make JSON serialization faster. With 1s of tasks even without history it would take a lot of time to serialize tasks for mesos-dns. Doing it every 60 seconds instead of every 5 seconds isn't really an option. 4. Create event bus for mesos master. Marathon has it and it'd be nice to have it in Mesos. This way mesos-dns could avoid polling master state and switch to listening for events. All can be done independently. 
Note to Mesosphere folks: please start distributing debug symbols with your distribution. I have been asking for it for a while and it is really helpful: https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501 Perf report for leading master: !http://i.imgur.com/iz7C3o0.png! I'm on 0.23.0.
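The measurements in the report above (completed tasks per framework, and active tasks at 252774 of 4138942 characters, roughly 6.1% of the response) can be reproduced offline with a short script. The following is a minimal sketch in Python rather than jq. It assumes only the field names visible in the jq commands ({{frameworks}}, {{tasks}}, {{completed_tasks}}, {{framework_id}}); the function names are hypothetical, and {{without_completed_tasks}} only imitates on the client side what option 1 would let the master skip. It is not an existing Mesos parameter or API.

```python
import copy
import json
from collections import Counter


def completed_task_counts(state):
    """Count completed tasks per framework_id, like the jq | sort | uniq -c pipeline."""
    counts = Counter()
    for framework in state.get("frameworks", []):
        for task in framework.get("completed_tasks", []):
            counts[task["framework_id"]] += 1
    return counts


def compact_size(value):
    """Approximate size of the compact JSON encoding (similar to `jq -c . | wc -c`)."""
    return len(json.dumps(value, separators=(",", ":")))


def active_task_share(state):
    """Fraction of the compact state that is taken up by active tasks."""
    active = sum(compact_size(f.get("tasks", [])) for f in state.get("frameworks", []))
    return active / compact_size(state)


def without_completed_tasks(state):
    """Return a copy of the state with every completed_tasks list emptied.

    Hypothetical client-side stand-in for option 1 (excluding completed
    tasks from the response); the input is left unmodified.
    """
    trimmed = copy.deepcopy(state)
    for framework in trimmed.get("frameworks", []):
        framework["completed_tasks"] = []
    return trimmed
```

To try it against a saved response, load the file with {{json.load}} and pass the resulting dictionary to {{completed_task_counts}} and {{active_task_share}}. Comparing {{compact_size}} before and after {{without_completed_tasks}} gives a rough idea of how much smaller the payload for mesos-dns could be.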
[jira] [Updated] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.
[ https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-2466: - Description: libprocess uses a set of environment variables to modify its behaviour; however, these variables are not documented anywhere, nor is it defined where the documentation should be. What is needed is a decision on where the environment variables should be documented (a new doc file or an existing one), after which the documentation should be added there. After searching in the code, these are the variables which need to be documented: # {{LIBPROCESS_IP}} # {{LIBPROCESS_PORT}} # {{LIBPROCESS_ADVERTISE_IP}} # {{LIBPROCESS_ADVERTISE_PORT}} was: libprocess uses a set of environment variables to modify its behaviour; however, these variables are not documented anywhere, nor is it defined where the documentation should be. What is needed is a decision on where the environment variables should be documented (a new doc file or an existing one), after which the documentation should be added there. After searching in the code, these are the variables which need to be documented: # {{LIBPROCESS_ENABLE_PROFILER}} # {{LIBPROCESS_IP}} # {{LIBPROCESS_PORT}} # {{LIBPROCESS_ADVERTISE_IP}} # {{LIBPROCESS_ADVERTISE_PORT}} Write documentation for all the LIBPROCESS_* environment variables. --- Key: MESOS-2466 URL: https://issues.apache.org/jira/browse/MESOS-2466 Project: Mesos Issue Type: Documentation Reporter: Alexander Rojas Assignee: Greg Mann Labels: documentation, mesosphere
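Since the ticket above is about documenting these variables, a small usage sketch may help show how they are typically supplied. It is written in Python and rests on an assumption the ticket itself does not state: that {{LIBPROCESS_IP}} and {{LIBPROCESS_PORT}} select the address libprocess binds to, while the {{ADVERTISE}} variants give the address announced to peers. The helper name {{libprocess_env}} and the addresses used with it are hypothetical.

```python
import os


def libprocess_env(ip, port, advertise_ip=None, advertise_port=None, base=None):
    """Return an environment mapping with the LIBPROCESS_* variables set.

    `base` defaults to a copy of the current process environment. The
    meaning attributed to each variable is an assumption (see lead-in).
    """
    env = dict(os.environ if base is None else base)
    env["LIBPROCESS_IP"] = ip
    env["LIBPROCESS_PORT"] = str(port)
    # Set the advertise variables only when an advertised address is given.
    if advertise_ip is not None:
        env["LIBPROCESS_ADVERTISE_IP"] = advertise_ip
    if advertise_port is not None:
        env["LIBPROCESS_ADVERTISE_PORT"] = str(advertise_port)
    return env
```

The returned mapping can be passed as the {{env}} argument of {{subprocess.run}} when launching a libprocess-based binary, which keeps the settings scoped to that one child process instead of the whole shell session.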
[jira] [Created] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable
Greg Mann created MESOS-3320: Summary: Document LIBPROCESS_ENABLE_PROFILER environment variable Key: MESOS-3320 URL: https://issues.apache.org/jira/browse/MESOS-3320 Project: Mesos Issue Type: Documentation Reporter: Greg Mann This environment variable, used to enable the libprocess profiler, needs to be documented.
[jira] [Commented] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable
[ https://issues.apache.org/jira/browse/MESOS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715352#comment-14715352 ] Greg Mann commented on MESOS-3320: -- The gperftools build must be fixed, and functionality of the profiler confirmed, before this documentation is added. Document LIBPROCESS_ENABLE_PROFILER environment variable Key: MESOS-3320 URL: https://issues.apache.org/jira/browse/MESOS-3320 Project: Mesos Issue Type: Documentation Reporter: Greg Mann Labels: documentation, libprocess
[jira] [Created] (MESOS-3321) Spurious fetcher message about extracting an archive
Kapil Arya created MESOS-3321: - Summary: Spurious fetcher message about extracting an archive Key: MESOS-3321 URL: https://issues.apache.org/jira/browse/MESOS-3321 Project: Mesos Issue Type: Bug Components: fetcher Reporter: Kapil Arya The fetcher emits a spurious log message about not extracting an archive with .tgz extension, even though the tarball is extracted correctly. {code} I0826 19:02:08.304914 2109 logging.cpp:172] INFO level logging started! I0826 19:02:08.305253 2109 fetcher.cpp:413] Fetcher Info: {cache_directory:\/tmp\/mesos\/fetch\/slaves\/20150826-185716-251662764-5050-1-S0\/root,items:[{action:BYPASS_CACHE,uri:{extract:true,value:file:\/\/\/mesos\/sampleflaskapp.tgz}}],sandbox_directory:\/tmp\/mesos\/slaves\/20150826-185716-251662764-5050-1-S0\/frameworks\/20150826-185716-251662764-5050-1-\/executors\/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011\/runs\/e71f50b8-816d-46d5-bcc6-f9850a0402ed,user:root} I0826 19:02:08.306834 2109 fetcher.cpp:368] Fetching URI 'file:///mesos/sampleflaskapp.tgz' I0826 19:02:08.306864 2109 fetcher.cpp:242] Fetching directly into the sandbox directory I0826 19:02:08.306884 2109 fetcher.cpp:179] Fetching URI 'file:///mesos/sampleflaskapp.tgz' I0826 19:02:08.306900 2109 fetcher.cpp:159] Copying resource with command:cp '/mesos/sampleflaskapp.tgz' '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' I0826 19:02:08.309063 2109 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed' -xf 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' I0826 19:02:08.315313 2109 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' into '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed' W0826 19:02:08.315381 2109 fetcher.cpp:264] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: file:///mesos/sampleflaskapp.tgz I0826 19:02:08.315604 2109 fetcher.cpp:445] Fetched 'file:///mesos/sampleflaskapp.tgz' to '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable
[ https://issues.apache.org/jira/browse/MESOS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3320: - Labels: documentation libprocess (was: ) Document LIBPROCESS_ENABLE_PROFILER environment variable Key: MESOS-3320 URL: https://issues.apache.org/jira/browse/MESOS-3320 Project: Mesos Issue Type: Documentation Reporter: Greg Mann Labels: documentation, libprocess
[jira] [Created] (MESOS-3318) Disabling local message passing causes tests to fail
Joris Van Remoortere created MESOS-3318: --- Summary: Disabling local message passing causes tests to fail Key: MESOS-3318 URL: https://issues.apache.org/jira/browse/MESOS-3318 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Joris Van Remoortere If we add a flag to disable the shortcut of local message passing in libprocess between actors in the same OS process, there are tests that fail. A patch that implemented this behavior can be found here: https://reviews.apache.org/r/33315/
[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota
[ https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714507#comment-14714507 ] Alexander Rukletsov commented on MESOS-1791: [~hbogert], both reserve resources per role. The differences are that dynamic reservations are tied to particular agents (slaves) and can be controlled by frameworks, while quotas are cluster-wide and managed by operators. I would encourage you to take a look at the design doc (MESOS-2936) for more information. Introduce Master / Offer Resource Reservations aka Quota Key: MESOS-1791 URL: https://issues.apache.org/jira/browse/MESOS-1791 Project: Mesos Issue Type: Epic Components: allocation, master, replicated log Reporter: Tom Arnfeld Assignee: Alexander Rukletsov Labels: mesosphere Currently Mesos supports the ability to reserve resources (for a given role) on a per-slave basis, as introduced in MESOS-505. This allows you to almost statically partition off a set of resources on a set of machines, to guarantee certain types of frameworks get some resources. This is very useful, though it would also be very useful to be able to control these reservations through the master (instead of per-slave) for when I don't care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of (X,Y). I'm not sure what structure this could take, but apparently it has already been discussed. Would this be a CLI flag? Could there be an (authenticated) web interface to control these reservations?
[jira] [Issue Comment Deleted] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3235: -- Comment: was deleted (was: https://reviews.apache.org/r/37813/) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky - Key: MESOS-3235 URL: https://issues.apache.org/jira/browse/MESOS-3235 Project: Mesos Issue Type: Bug Affects Versions: 0.23.0 Reporter: Joseph Wu Assignee: Bernd Mathiske Labels: mesosphere On OSX, {{make clean make -j8 V=0 check}}: {code} [--] 3 tests from FetcherCacheHttpTest [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized HTTP/1.1 200 OK Date: Fri, 07 Aug 2015 17:23:05 GMT Content-Length: 30 I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0 E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on 10.0.79.8 Starting task 0 Forked command at 54363 sh -c './mesos-fetcher-test-cmd 0' E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Command exited with status 0 (pid: 54363) E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0 E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on 10.0.79.8 Starting task 1 Forked command at 54411 sh -c './mesos-fetcher-test-cmd 1' E0807 10:23:06.608708 355938304 
socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Command exited with status 0 (pid: 54411) E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] ../../src/tests/fetcher_cache_tests.cpp:860: Failure Failed to wait 15secs for awaitFinished(task.get()) *** Aborted at 1438968214 (unix time) try date -d @1438968214 if you are using GNU date *** [ FAILED ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms) [ RUN ] FetcherCacheHttpTest.HttpCachedConcurrent PC: @0x113723618 process::Owned::get() *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: *** @ 0x7fff8fcacf1a _sigtramp @ 0x7f9bc3109710 (unknown) @0x1136f07e2 mesos::internal::slave::Fetcher::fetch() @0x113862f9d mesos::internal::slave::MesosContainerizerProcess::fetch() @0x1138f1b5d _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_ @0x1138f18cf _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_ @0x1143768cf std::__1::function::operator()() @0x11435ca7f process::ProcessBase::visit() @0x1143ed6fe process::DispatchEvent::visit() @0x11271 process::ProcessBase::serve() @0x114343b4e process::ProcessManager::resume() @0x1143431ca process::internal::schedule() @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_ @ 0x7fff95090268 _pthread_body @ 0x7fff950901e5 _pthread_start @ 0x7fff9508e41d thread_start Failed to synchronize with 
slave (it's probably exited) make[3]: *** [check-local] Segmentation fault: 11 make[2]: *** [check-am] Error 2 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {code} This was encountered just once out of 3+ {{make check}}s.
[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715520#comment-14715520 ] Bernd Mathiske commented on MESOS-3235: --- I have been unable to reproduce this, so I could not debug it. And I looked at the source code and still could not find what caused this failure. So I think the best I can do at the moment is to add additional diagnostic output that may help catch the bug once it shows itself again. To this end I have prepared a patch that dumps the contents of all task/executor sandboxes in play iff a fetcher cache test ends prematurely. https://reviews.apache.org/r/37813/ FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky - Key: MESOS-3235 URL: https://issues.apache.org/jira/browse/MESOS-3235 Project: Mesos Issue Type: Bug Affects Versions: 0.23.0 Reporter: Joseph Wu Assignee: Bernd Mathiske Labels: mesosphere
[jira] [Commented] (MESOS-3310) Support provisioning images specified in volumes.
[ https://issues.apache.org/jira/browse/MESOS-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715548#comment-14715548 ] Jie Yu commented on MESOS-3310: --- commit 33058278e4839fdfaf65b2adc7785e61d74b6775 Author: Jie Yu yujie@gmail.com Date: Mon Aug 24 16:22:08 2015 -0700 Added a filesystem isolator test to test image in volume while the container root filesystem is also specified. Review: https://reviews.apache.org/r/37738 commit 40638c5413266f4a4d5117cde225247ad19b2f55 Author: Jie Yu yujie@gmail.com Date: Mon Aug 24 15:55:11 2015 -0700 Refactored filesystem isolator tests to allow multiple rootfses. Review: https://reviews.apache.org/r/37735 commit da2dfab8c77ae583eff1a5ce54f23f4b17831976 Author: Jie Yu yujie@gmail.com Date: Mon Aug 24 14:23:27 2015 -0700 Used recursive bind mounts for volumes. Review: https://reviews.apache.org/r/37734 commit 347d51ceca849cc26b9ada8f1014e4c578eeb47b Author: Jie Yu yujie@gmail.com Date: Mon Aug 24 12:44:12 2015 -0700 Added support for preparing images specified in volumes. Review: https://reviews.apache.org/r/37726 Support provisioning images specified in volumes. - Key: MESOS-3310 URL: https://issues.apache.org/jira/browse/MESOS-3310 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu This is related to MESOS-3095 and MESOS-3227. The idea is that we should allow command executor to run under host filesystem and provision the filesystem for the user. The command line executor will then chroot into user's root filesystem. This solves the issue that the command executor is not launchable in the user specified root filesystem. The design doc is here: https://docs.google.com/document/d/16hyLVRL0nz-KBts1J5stGyxZPniFPbPbs7R-ZRQVCH4/edit?usp=sharing
[jira] [Commented] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.
[ https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715670#comment-14715670 ] Greg Mann commented on MESOS-2466: -- Review here: https://reviews.apache.org/r/37814/ Anybody willing to shepherd this little one? :-) [~vinodkone]? [~nnielsen]? Write documentation for all the LIBPROCESS_* environment variables. --- Key: MESOS-2466 URL: https://issues.apache.org/jira/browse/MESOS-2466 Project: Mesos Issue Type: Documentation Reporter: Alexander Rojas Assignee: Greg Mann Labels: documentation, mesosphere
[jira] [Commented] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX
[ https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712661#comment-14712661 ] Alexander Rojas commented on MESOS-3316: The review [r/37747/|https://reviews.apache.org/r/37747/] introduced the issue. Can you [~xujyan] and your shepherd [~jieyu] take a look at it. provisioner_backend_tests.cpp breaks the build on OSX - Key: MESOS-3316 URL: https://issues.apache.org/jira/browse/MESOS-3316 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Priority: Blocker Labels: build-failure The test file makes an include of {{linux/fs.hpp}} which in turn includes {{mntent.h}} which is only available in linux. Building in OSX leads to: {noformat} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/event_call_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'. make[3]: Nothing to be done for `../../src/tests/no_executor_framework_test.sh'. 
make[3]: Nothing to be done for `../../src/tests/persistent_volume_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
         ^
1 error generated.
make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] Error 1
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.
[ https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712678#comment-14712678 ]
Klaus Ma commented on MESOS-3070:
-
Regarding #4.2, framework developers do not need to care about the new field (TaskUID) except in such special cases; they still use TaskID as before, and TaskUID is used only internally between Master and Slave. Personally, my one concern is the effort and risk: every TaskID would be replaced by a UUID internally. From this point of view, #3 (storing tasks in the master in a per-slave map) seems better, because it does not need to change the interaction between Master and Slave.

Master CHECK failure if a framework uses duplicated task id.
Key: MESOS-3070
URL: https://issues.apache.org/jira/browse/MESOS-3070
Project: Mesos
Issue Type: Bug
Components: master
Affects Versions: 0.22.1
Reporter: Jie Yu
Assignee: Klaus Ma

We observed this in one of our testing clusters. One framework (under development) keeps launching tasks using the same task_id. We don't expect the master to crash even if the framework is not doing what it's supposed to do. However, under the following series of events, this can happen and keeps crashing the master.
1) frameworkA launches task 'task_id_1' on slaveA
2) master fails over
3) slaveA has not re-registered yet
4) frameworkA re-registers and launches task 'task_id_1' on slaveB
5) slaveA re-registers and adds task 'task_id_1' to frameworkA
6) CHECK failure in addTask
{noformat}
I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with resources cpus(*):4; mem(*):32768 on slave 20150417-232509-1735470090-5050-48870-S25 (hostname)
...
...
F0716 21:52:50.760136 28805 master.hpp:362] Check failed: !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework framework_id
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
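Option #3 from the comment above (storing tasks in the master in a per-slave map) could look roughly like the following. This is only a minimal sketch, not the Mesos implementation: `FrameworkTasks`, `addTask`, and `slavesRunning` are hypothetical names, and task state is reduced to a string. The point is that the same task ID on two different slaves no longer collides, and a true duplicate is reported to the caller instead of triggering a CHECK failure.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the master's per-framework task table.
struct FrameworkTasks {
  // slave id -> (task id -> task state)
  std::map<std::string, std::map<std::string, std::string>> tasksBySlave;

  // Records a task under its slave. Returns false (instead of aborting) when
  // the same task ID is already recorded for that same slave.
  bool addTask(const std::string& slaveId,
               const std::string& taskId,
               const std::string& state) {
    return tasksBySlave[slaveId].emplace(taskId, state).second;
  }

  // Counts on how many slaves a given task ID is currently recorded.
  std::size_t slavesRunning(const std::string& taskId) const {
    std::size_t n = 0;
    for (const auto& entry : tasksBySlave) {
      n += entry.second.count(taskId);
    }
    return n;
  }
};
```

With this layout, steps 4 and 5 of the scenario both succeed (`task_id_1` is recorded once under slaveB and once under slaveA), and the caller can decide how to reconcile the two copies.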
[jira] [Commented] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX
[ https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712679#comment-14712679 ] Yan Xu commented on MESOS-3316: --- Sorry for the oversight. Committed a fix. {noformat:title=} commit 5a198ee92c9aa7f14187df7e30d05137fa63b0b3 Author: Jiang Yan Xu y...@jxu.me Date: Wed Aug 26 00:39:40 2015 -0700 Fixed provisioner_backend_tests.cpp which included a Linux-only header unconditionally. {noformat} provisioner_backend_tests.cpp breaks the build on OSX - Key: MESOS-3316 URL: https://issues.apache.org/jira/browse/MESOS-3316 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Priority: Blocker Labels: build-failure The test file makes an include of {{linux/fs.hpp}} which in turn includes {{mntent.h}} which is only available in linux. Building in OSX leads to: {noformat} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/event_call_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'. make[3]: Nothing to be done for `../../src/tests/no_executor_framework_test.sh'. 
make[3]: Nothing to be done for `../../src/tests/persistent_volume_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
         ^
1 error generated.
make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] Error 1
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX
Alexander Rojas created MESOS-3316: -- Summary: provisioner_backend_tests.cpp breaks the build on OSX Key: MESOS-3316 URL: https://issues.apache.org/jira/browse/MESOS-3316 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Priority: Blocker The test file makes an include of {{linux/fs.hpp}} which in turn includes {{mntent.h}} which is only available in linux. Building in OSX leads to: {noformat} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/event_call_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'. make[3]: Nothing to be done for `../../src/tests/no_executor_framework_test.sh'. 
make[3]: Nothing to be done for `../../src/tests/persistent_volume_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
         ^
1 error generated.
make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] Error 1
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3315) make check fails on OSX
Neil Conway created MESOS-3315: -- Summary: make check fails on OSX Key: MESOS-3315 URL: https://issues.apache.org/jira/browse/MESOS-3315 Project: Mesos Issue Type: Bug Environment: OSX 10.10.5 Reporter: Neil Conway Assignee: Yan Xu Priority: Minor {quote} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/neilc/mesos/build/..\ -DBUILD_DIR=\/Users/neilc/mesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g1 -O0 -std=c++11 
-stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp
In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
{quote}
It seems that {{provisioner_backend_tests.cpp}} should not unconditionally include {{linux/fs.hpp}}, as {{mntent.h}} is not provided on OSX.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
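One common way to avoid this kind of failure is to guard the Linux-only include with a platform check. The snippet below is a generic sketch of that technique and is not taken from the committed fix; `mountTableBackend` is a hypothetical helper that only exists to show both branches compiling.

```cpp
#include <string>

// Pull in the Linux-only header only on platforms that provide it.
#ifdef __linux__
#include <mntent.h>
#endif

// Hypothetical helper: reports which mount-table backend this build can use.
inline std::string mountTableBackend() {
#ifdef __linux__
  return "mntent";       // <mntent.h> was included above.
#else
  return "unsupported";  // e.g. OSX, where <mntent.h> does not exist.
#endif
}
```

The same guard can wrap an include of a project header such as `linux/fs.hpp`, together with the tests that depend on it, so that non-Linux builds never see the header.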
[jira] [Updated] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX
[ https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3316: -- Assignee: Yan Xu provisioner_backend_tests.cpp breaks the build on OSX - Key: MESOS-3316 URL: https://issues.apache.org/jira/browse/MESOS-3316 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Yan Xu Priority: Blocker Labels: build-failure The test file makes an include of {{linux/fs.hpp}} which in turn includes {{mntent.h}} which is only available in linux. Building in OSX leads to: {noformat} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/event_call_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'. make[3]: Nothing to be done for `../../src/tests/no_executor_framework_test.sh'. 
make[3]: Nothing to be done for `../../src/tests/persistent_volume_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
         ^
1 error generated.
make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] Error 1
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3317) URL query string order is undefined
Jan Schlicht created MESOS-3317:
---
Summary: URL query string order is undefined
Key: MESOS-3317
URL: https://issues.apache.org/jira/browse/MESOS-3317
Project: Mesos
Issue Type: Wish
Components: libprocess
Reporter: Jan Schlicht
Priority: Minor

A `process::http::URL` instance stores its query strings in a hashmap. Stringifying the instance emits the query strings in the hashmap's iteration order, which depends on the concrete implementation of the hash function. A well-defined query string order (e.g. alphabetical) may be important for bot detection. If the query strings should be in alphabetical order, multiple solutions are possible:
1. Use map instead of hashmap for storing the query strings in URLs
2. Order the query strings while creating the URL string
3. Provide a custom string hash function that guarantees a certain order
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
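Solution 2 from the list above (ordering the query strings while creating the URL string) can be sketched as follows. This assumes a plain `std::unordered_map` in place of the hashmap type used by libprocess; `stringifyQuery` is a hypothetical function name, and percent-encoding is left out to keep the example short.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Builds "k1=v1&k2=v2..." with keys in alphabetical order, regardless of the
// iteration order of the unordered container the query is stored in.
inline std::string stringifyQuery(
    const std::unordered_map<std::string, std::string>& query) {
  // Copying into std::map sorts the entries by key.
  std::map<std::string, std::string> ordered(query.begin(), query.end());

  std::string out;
  for (const auto& kv : ordered) {
    if (!out.empty()) {
      out += '&';
    }
    out += kv.first + '=' + kv.second;
  }
  return out;
}
```

Solution 1 amounts to storing the query in the ordered map from the start, which removes the copy at the cost of logarithmic lookups.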
[jira] [Updated] (MESOS-3317) URL query string order is undefined
[ https://issues.apache.org/jira/browse/MESOS-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Schlicht updated MESOS-3317:
Labels: easyfix mesosphere newbie (was: easyfix newbie)

URL query string order is undefined
---
Key: MESOS-3317
URL: https://issues.apache.org/jira/browse/MESOS-3317
Project: Mesos
Issue Type: Wish
Components: libprocess
Reporter: Jan Schlicht
Priority: Minor
Labels: easyfix, mesosphere, newbie

A `process::http::URL` instance stores its query strings in a hashmap. Stringifying the instance emits the query strings in the hashmap's iteration order, which depends on the concrete implementation of the hash function. A well-defined query string order (e.g. alphabetical) may be important for bot detection. If the query strings should be in alphabetical order, multiple solutions are possible:
1. Use map instead of hashmap for storing the query strings in URLs
2. Order the query strings while creating the URL string
3. Provide a custom string hash function that guarantees a certain order
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history
[ https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712967#comment-14712967 ]
Ian Babrou commented on MESOS-3307:
---
[~alex-mesos] is there a list of mesos endpoints? I wasn't able to find one. Having docs for this would be great. Any feedback on configurable history size? This is the simplest solution so far.

Configurable size of completed task / framework history
---
Key: MESOS-3307
URL: https://issues.apache.org/jira/browse/MESOS-3307
Project: Mesos
Issue Type: Bug
Reporter: Ian Babrou

We are trying to make Mesos work with multiple frameworks and mesos-dns at the same time. The goal is to have a set of frameworks per team / project on a single Mesos cluster. At this point our mesos state.json is at 4 MB and it takes a while to assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively pushing mesos-master CPU usage through the roof. It's at 100%+ all the time. Here's the problem:
{noformat}
mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
   1 20150606-001827-252388362-5050-5982-0003
  16 20150606-001827-252388362-5050-5982-0005
  18 20150606-001827-252388362-5050-5982-0029
  73 20150606-001827-252388362-5050-5982-0007
 141 20150606-001827-252388362-5050-5982-0009
 154 20150820-154817-302720010-5050-15320-
 289 20150606-001827-252388362-5050-5982-0004
 510 20150606-001827-252388362-5050-5982-0012
 666 20150606-001827-252388362-5050-5982-0028
 923 20150116-002612-269165578-5050-32204-0003
1000 20150606-001827-252388362-5050-5982-0001
1000 20150606-001827-252388362-5050-5982-0006
1000 20150606-001827-252388362-5050-5982-0010
1000 20150606-001827-252388362-5050-5982-0011
1000 20150606-001827-252388362-5050-5982-0027
mesos λ fgrep 1000 -r src/master
src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10;
src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000;
{noformat}
Active tasks are just 6% of the state.json response (252774 of 4138942 bytes):
{noformat}
mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
1 14796 4138942
mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
16 37 252774
{noformat}
I see four options that can improve the situation:
1. Add a query string parameter to exclude completed tasks from state.json and use it in mesos-dns and similar tools. There is no need for mesos-dns to know about completed tasks; it's just extra load on the master and on mesos-dns.
2. Make history size configurable.
3. Make JSON serialization faster. With 1s of tasks even without history it would take a lot of time to serialize tasks for mesos-dns. Doing it every 60 seconds instead of every 5 seconds isn't really an option.
4. Create an event bus for the mesos master. Marathon has one and it'd be nice to have it in Mesos. This way mesos-dns could avoid polling master state and switch to listening for events.
All can be done independently.

Note to mesosphere folks: please start distributing debug symbols with your distribution. I have been asking for this for a while and it is really helpful: https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501

Perf report for leading master: !http://i.imgur.com/iz7C3o0.png!

I'm on 0.23.0.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
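Option 2 above (a configurable history size) essentially replaces the compile-time constant with a runtime capacity. The following is a rough sketch of a bounded history under that assumption; `CompletedTaskHistory` is a hypothetical class, tasks are reduced to their IDs, and it makes no claim about which container the master actually uses.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Keeps at most `capacity` completed tasks, dropping the oldest first.
// A capacity of 0 disables the history entirely.
class CompletedTaskHistory {
 public:
  explicit CompletedTaskHistory(std::size_t capacity) : capacity_(capacity) {}

  void add(const std::string& taskId) {
    if (capacity_ == 0) {
      return;  // History disabled: nothing is retained.
    }
    if (tasks_.size() == capacity_) {
      tasks_.pop_front();  // Evict the oldest entry to make room.
    }
    tasks_.push_back(taskId);
  }

  std::size_t size() const { return tasks_.size(); }

  // Oldest retained task; only valid when size() > 0.
  const std::string& oldest() const { return tasks_.front(); }

 private:
  std::size_t capacity_;
  std::deque<std::string> tasks_;
};
```

The capacity would come from an operator-supplied setting rather than a constant, so a cluster polled heavily by tools such as mesos-dns could keep a much shorter history.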
[jira] [Commented] (MESOS-2058) Deprecate stats.json endpoints for Master and Slave
[ https://issues.apache.org/jira/browse/MESOS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713504#comment-14713504 ]
Ian Babrou commented on MESOS-2058:
---
[~dhamon] [~nnielsen] this broke the home page of the master's UI. It uses staged_tasks and friends, and in 0.23.0 you can't see the values.

Deprecate stats.json endpoints for Master and Slave
---
Key: MESOS-2058
URL: https://issues.apache.org/jira/browse/MESOS-2058
Project: Mesos
Issue Type: Task
Components: master, slave
Reporter: Dominic Hamon
Assignee: Dominic Hamon
Labels: twitter
Fix For: 0.23.0

With the introduction of the libprocess {{/metrics/snapshot}} endpoint, metrics are now duplicated in the Master and Slave between this and {{stats.json}}. We should deprecate the {{stats.json}} endpoints. Manual inspection of {{stats.json}} shows that all metrics are now covered by the new endpoint for Master and Slave.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2684) mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
[ https://issues.apache.org/jira/browse/MESOS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713497#comment-14713497 ]
Scott D.W. Rankin commented on MESOS-2684:
--
Hi all - I'm seeing this issue as well. We're running Marathon 0.8.2, Mesos 0.22.1 on CentOS 6.6 and are getting errors similar to the one pasted below pretty regularly. We can't reproduce it all the time, but it happens when initiating a deployment from Marathon.
26 Aug 2015 09:35:01.213 host=mesosnode6-aws-west tag=mesos-slave[30248]: F0826 06:35:01.136056 30280 slave.cpp:3354] CHECK_SOME(os::touch(path)): Failed to open file: No such file or directory
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: *** Check failure stack trace: ***
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e765cd (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e7a5e7 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e78469 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e7876d (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de17c5696 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1a1855a (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1a1c0a9 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1a510ff (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e18b83 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x3de1e1978c (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x39d58079d1 (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=mesos-slave[30248]: @ 0x39d54e88fd (unknown)
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=init: mesos-slave main process (30248) killed by ABRT signal
26 Aug 2015 09:35:01.369 host=mesosnode6-aws-west tag=init: mesos-slave main process ended, respawning

mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
--
Key: MESOS-2684
URL: https://issues.apache.org/jira/browse/MESOS-2684
Project: Mesos
Issue Type: Bug
Components: slave
Affects Versions: 0.21.1
Reporter: Steven Schlansker
Attachments: mesos-slave-restart.txt

mesos-slave can encounter a variety of problems while attempting to launch a task. If the task fails, that is unfortunate, but not the end of the world. Other tasks should not be affected. However, if the task failure happens to trigger an assertion, the entire slave comes crashing down:
F0501 19:10:46.095464 1705 paths.hpp:342] CHECK_SOME(mkdir): No space left on device Failed to create executor directory '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'
Immediately afterwards, all tasks on this slave were declared TASK_KILLED when mesos-slave restarted. Something as simple as a 'mkdir' failing is not worthy of an assertion failure.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
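The behaviour the issue asks for, failing one task instead of aborting the whole process, can be illustrated with a small POSIX-only sketch. `LaunchResult` and `launchTask` are hypothetical names and do not correspond to the slave's real code path; the sketch only shows a `mkdir` error being returned to the caller rather than asserted on.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical outcome of trying to launch one task.
struct LaunchResult {
  std::string state;   // "TASK_RUNNING" or "TASK_FAILED"
  std::string reason;  // Empty on success.
};

// Creates the executor directory (non-recursively) and reports a failure for
// this task only, rather than asserting and taking the process down.
inline LaunchResult launchTask(const std::string& executorDir) {
  if (::mkdir(executorDir.c_str(), 0755) != 0 && errno != EEXIST) {
    const int error = errno;  // Save errno before any further calls.
    return LaunchResult{
        "TASK_FAILED",
        "Failed to create executor directory '" + executorDir + "': " +
            std::strerror(error)};
  }
  return LaunchResult{"TASK_RUNNING", ""};
}
```

A caller would translate a `TASK_FAILED` result into a status update for that one task and carry on serving the others.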
[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota
[ https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713626#comment-14713626 ]
Hans van den Bogert commented on MESOS-1791:
I see this ticket is related to dynamic reservations, but how exactly are they related? Could one say that dynamic reservations are a more restricted form of quotas, as the latter do not place reservations on specific resources/slaves? Or are they the same thing?

Introduce Master / Offer Resource Reservations aka Quota
Key: MESOS-1791
URL: https://issues.apache.org/jira/browse/MESOS-1791
Project: Mesos
Issue Type: Epic
Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
Labels: mesosphere

Currently Mesos supports the ability to reserve resources (for a given role) on a per-slave basis, as introduced in MESOS-505. This allows you to almost statically partition off a set of resources on a set of machines, to guarantee certain types of frameworks get some resources. This is very useful, though it is also very useful to be able to control these reservations through the master (instead of per-slave) for when I don't care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of (X,Y). I'm not sure what structure this could take, but apparently it has already been discussed. Would this be a CLI flag? Could there be an (authenticated) web interface to control these reservations?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3158) Libprocess Process: Join runqueue workers during finalization
[ https://issues.apache.org/jira/browse/MESOS-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715792#comment-14715792 ] Greg Mann commented on MESOS-3158: -- Review here: https://reviews.apache.org/r/37821/ Libprocess Process: Join runqueue workers during finalization - Key: MESOS-3158 URL: https://issues.apache.org/jira/browse/MESOS-3158 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Joris Van Remoortere Assignee: Greg Mann Labels: beginner, libprocess, mesosphere, newbie The lack of synchronization between ProcessManager destruction and the thread pool threads running the queued processes means that the shared state that is part of the ProcessManager gets destroyed prematurely. Synchronizing the ProcessManager destructor with draining the work queues and stopping the workers will allow us to not require leaking the shared state to avoid use beyond destruction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3322) Upgrade vendored google-glog to 0.3.4
Neil Conway created MESOS-3322: -- Summary: Upgrade vendored google-glog to 0.3.4 Key: MESOS-3322 URL: https://issues.apache.org/jira/browse/MESOS-3322 Project: Mesos Issue Type: Improvement Reporter: Neil Conway Assignee: Neil Conway Priority: Minor This brings a few improvements; it should also mean we can drop the patch we currently apply to address some glog bugs that likely have been fixed upstream (see [#860]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3322) Upgrade vendored google-glog to 0.3.4
[ https://issues.apache.org/jira/browse/MESOS-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715816#comment-14715816 ] Neil Conway commented on MESOS-3322: https://reviews.apache.org/r/37823/ https://reviews.apache.org/r/37824/ Upgrade vendored google-glog to 0.3.4 - Key: MESOS-3322 URL: https://issues.apache.org/jira/browse/MESOS-3322 Project: Mesos Issue Type: Improvement Reporter: Neil Conway Assignee: Neil Conway Priority: Minor This brings a few improvements; it should also mean we can drop the patch we currently apply to address some glog bugs that likely have been fixed upstream (see [#860]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota
[ https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715871#comment-14715871 ]
Qian Zhang commented on MESOS-1791:
---
[~alex-mesos], if dynamic reservation is per role, then how can we guarantee that the reserved resources will be re-offered to the stateful framework that made the dynamic reservation? For example, both the Cassandra framework and the HDFS framework belong to role1, and HDFS dynamically reserves some resources on an agent for role1. I think it may then be possible for the allocator to offer those resources to Cassandra, since it also belongs to role1, but HDFS actually expects to be offered those resources.

Introduce Master / Offer Resource Reservations aka Quota
Key: MESOS-1791
URL: https://issues.apache.org/jira/browse/MESOS-1791
Project: Mesos
Issue Type: Epic
Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
Labels: mesosphere

Currently Mesos supports the ability to reserve resources (for a given role) on a per-slave basis, as introduced in MESOS-505. This allows you to almost statically partition off a set of resources on a set of machines, to guarantee certain types of frameworks get some resources. This is very useful, though it is also very useful to be able to control these reservations through the master (instead of per-slave) for when I don't care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of (X,Y). I'm not sure what structure this could take, but apparently it has already been discussed. Would this be a CLI flag? Could there be an (authenticated) web interface to control these reservations?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716016#comment-14716016 ] Klaus Ma commented on MESOS-3063: - Update the example with UT scripts; and also add [~mcypark] as reviewer. Add an example framework using dynamic reservation -- Key: MESOS-3063 URL: https://issues.apache.org/jira/browse/MESOS-3063 Project: Mesos Issue Type: Task Reporter: Michael Park Assignee: Klaus Ma An example framework using dynamic reservation should be added to # test dynamic reservations further, and # serve as a reference for those who want to use the dynamic reservation feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3312) Factor out JSON to repeated protobuf conversion
[ https://issues.apache.org/jira/browse/MESOS-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715927#comment-14715927 ] Alexander Rukletsov edited comment on MESOS-3312 at 8/27/15 3:39 AM: - https://reviews.apache.org/r/37826/ https://reviews.apache.org/r/37827/ https://reviews.apache.org/r/37830/ was (Author: alex-mesos): https://reviews.apache.org/r/37826/ https://reviews.apache.org/r/37827/ Factor out JSON to repeated protobuf conversion --- Key: MESOS-3312 URL: https://issues.apache.org/jira/browse/MESOS-3312 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov Assignee: Alexander Rukletsov Labels: mesosphere In general, a collection of protobuf messages is itself wrapped in another protobuf message, which makes JSON -> protobuf conversion straightforward. This is not always the case: for example, the {{Resources}} class is not a protobuf, though it is protobuf-convertible. To facilitate conversions like JSON -> {{Resources}} and avoid writing code for each particular case, we propose to introduce a {{JSON::Array}} -> {{repeated protobuf}} conversion. With this in place, {{JSON::Array}} -> {{Resources}} boils down to {{JSON::Array}} -> {{repeated Resource}} -> (extra c-tor call) -> {{Resources}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
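The conversion chain proposed in MESOS-3312 can be sketched outside of Mesos as follows. This is a minimal illustration in Python rather than the actual stout/Mesos C++ code: `Resource`, `Resources`, `parse_resource`, `parse_repeated`, and `parse_resources` are hypothetical stand-ins, and the `name`/`scalar` fields are assumed for the example only.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Resource:
    """Stand-in for one `Resource` message (fields are assumptions)."""
    name: str
    scalar: float


class Resources:
    """Stand-in for the non-protobuf `Resources` class.

    It is constructed from a repeated `Resource`, mirroring the
    "extra c-tor call" step in the ticket.
    """

    def __init__(self, items: List[Resource]) -> None:
        self._totals: Dict[str, float] = {}
        for item in items:
            self._totals[item.name] = self._totals.get(item.name, 0.0) + item.scalar

    def get(self, name: str) -> float:
        return self._totals.get(name, 0.0)


def parse_resource(obj: dict) -> Resource:
    # JSON object -> single message; raises KeyError on missing fields.
    return Resource(name=str(obj["name"]), scalar=float(obj["scalar"]))


def parse_repeated(array: list, parse_one: Callable[[dict], T]) -> List[T]:
    # Generic JSON array -> repeated message step, written once and
    # reused for any element type.
    if not isinstance(array, list):
        raise TypeError("expected a JSON array")
    return [parse_one(element) for element in array]


def parse_resources(text: str) -> Resources:
    # JSON array -> repeated Resource -> (constructor call) -> Resources
    return Resources(parse_repeated(json.loads(text), parse_resource))
```

The point of the design is that only `parse_repeated` is generic; a type such as `Resources` needs nothing beyond a constructor that accepts the repeated message.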
[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716017#comment-14716017 ] Klaus Ma commented on MESOS-3063: - Update the example with UT scripts; and also add [~mcypark] as reviewer. Add an example framework using dynamic reservation -- Key: MESOS-3063 URL: https://issues.apache.org/jira/browse/MESOS-3063 Project: Mesos Issue Type: Task Reporter: Michael Park Assignee: Klaus Ma An example framework using dynamic reservation should be added to # test dynamic reservations further, and # serve as a reference for those who want to use the dynamic reservation feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3323) Auto-generate protos for stout tests
Alexander Rukletsov created MESOS-3323: -- Summary: Auto-generate protos for stout tests Key: MESOS-3323 URL: https://issues.apache.org/jira/browse/MESOS-3323 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov Assignee: Kapil Arya Priority: Minor Stout protobufs (AFAIK right now it's just a single file, {{protobuf_tests.proto}}) are not generated automatically. Including a proto generation step would be cleaner and more convenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
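As a rough illustration of what a proto generation step involves, the sketch below builds the `protoc` invocation for a single file. It is not the stout build integration: the helper name `protoc_command` and the directory paths are assumptions for the example, and only the standard `--proto_path` and `--cpp_out` options of `protoc` are used.

```python
from pathlib import Path
from typing import List


def protoc_command(proto_file: str, out_dir: str) -> List[str]:
    """Build the argv that generates C++ sources for one .proto file."""
    proto = Path(proto_file)
    return [
        "protoc",
        # Directory in which protoc resolves the .proto file and its imports.
        f"--proto_path={proto.parent.as_posix()}",
        # Directory that receives the generated .pb.h / .pb.cc files.
        f"--cpp_out={out_dir}",
        proto.as_posix(),
    ]


# Hypothetical paths; the real location of protobuf_tests.proto may differ.
print(" ".join(protoc_command("tests/protobuf_tests.proto", "build/tests")))
```

A build system would run this command whenever the `.proto` file is newer than the generated sources, instead of relying on checked-in generated files.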
[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota
[ https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715905#comment-14715905 ] Alexander Rukletsov commented on MESOS-1791: [~qianzhang], have a look at the [persistent volumes documentation|https://mesos.apache.org/documentation/latest/persistent-volume/]. There was also a [talk at MesosCon|http://mesoscon2015.sched.org/event/7151c36724e5c3bc9de9e452fe4c866a#.Vd5uBtOqqko]; hopefully the video will be available soon. If your question remains unanswered, I would encourage you to continue on the devlist rather than on the quota epic, so that other contributors may chime in. Introduce Master / Offer Resource Reservations aka Quota Key: MESOS-1791 URL: https://issues.apache.org/jira/browse/MESOS-1791 Project: Mesos Issue Type: Epic Components: allocation, master, replicated log Reporter: Tom Arnfeld Assignee: Alexander Rukletsov Labels: mesosphere Currently Mesos supports the ability to reserve resources (for a given role) on a per-slave basis, as introduced in MESOS-505. This allows you to almost statically partition off a set of resources on a set of machines, to guarantee certain types of frameworks get some resources. This is very useful, but it would also be very useful to control these reservations through the master (instead of per-slave) for cases where I don't care which nodes I get, as long as I get X cpu and Y RAM, or Z sets of (X,Y). I'm not sure what structure this could take, but apparently it has already been discussed. Would this be a CLI flag? Could there be an (authenticated) web interface to control these reservations? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3312) Factor out JSON to repeated protobuf conversion
[ https://issues.apache.org/jira/browse/MESOS-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715927#comment-14715927 ] Alexander Rukletsov commented on MESOS-3312: https://reviews.apache.org/r/37826/ https://reviews.apache.org/r/37827/ Factor out JSON to repeated protobuf conversion --- Key: MESOS-3312 URL: https://issues.apache.org/jira/browse/MESOS-3312 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov Assignee: Alexander Rukletsov Labels: mesosphere In general, a collection of protobuf messages is itself wrapped in another protobuf message, which makes JSON -> protobuf conversion straightforward. This is not always the case: for example, the {{Resources}} class is not a protobuf, though it is protobuf-convertible. To facilitate conversions like JSON -> {{Resources}} and avoid writing code for each particular case, we propose to introduce a {{JSON::Array}} -> {{repeated protobuf}} conversion. With this in place, {{JSON::Array}} -> {{Resources}} boils down to {{JSON::Array}} -> {{repeated Resource}} -> (extra c-tor call) -> {{Resources}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-3063) Add an example framework using dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus Ma updated MESOS-3063: Comment: was deleted (was: Update the example with UT scripts; and also add [~mcypark] as reviewer.) Add an example framework using dynamic reservation -- Key: MESOS-3063 URL: https://issues.apache.org/jira/browse/MESOS-3063 Project: Mesos Issue Type: Task Reporter: Michael Park Assignee: Klaus Ma An example framework using dynamic reservation should be added to # test dynamic reservations further, and # serve as a reference for those who want to use the dynamic reservation feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)