[jira] [Commented] (MESOS-1812) Queued tasks are not actually launched in the order they were queued
[ https://issues.apache.org/jira/browse/MESOS-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138738#comment-14138738 ] Alexander Rukletsov commented on MESOS-1812: Do we (and should we?) guarantee the order is preserved? Queued tasks are not actually launched in the order they were queued Key: MESOS-1812 URL: https://issues.apache.org/jira/browse/MESOS-1812 Project: Mesos Issue Type: Bug Components: slave Reporter: Tom Arnfeld Even though tasks are assigned and queued in the order in which they are launched (e.g multiple tasks in reply to one offer), due to timing issues with the futures, this can sometimes break the causality and end up not being launched in order. Example trace from a slave... In this example the Task_Tracker_10 task should be launched before slots_Task_Tracker_10. {code} I0918 02:10:50.371445 17072 slave.cpp:933] Got assigned task Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.372110 17072 slave.cpp:933] Got assigned task slots_Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.372172 17073 gc.cpp:84] Unscheduling '/mnt/mesos-slave/slaves/20140915-112519-3171422218-5050-5016-6/frameworks/20140916-233111-3171422218-5050-14295-0015' from gc I0918 02:10:50.375018 17072 slave.cpp:1043] Launching task slots_Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.386282 17072 slave.cpp:1153] Queuing task 'slots_Task_Tracker_10' for executor executor_Task_Tracker_10 of framework '20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.386312 17070 mesos_containerizer.cpp:537] Starting container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' for executor 'executor_Task_Tracker_10' of framework '20140916-233111-3171422218-5050-14295-0015' I0918 02:10:50.388942 17072 slave.cpp:1043] Launching task Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.406277 17070 
launcher.cpp:117] Forked child with pid '817' for container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' I0918 02:10:50.406563 17072 slave.cpp:1153] Queuing task 'Task_Tracker_10' for executor executor_Task_Tracker_10 of framework '20140916-233111-3171422218-5050-14295-0015 I0918 02:10:50.408499 17069 mesos_containerizer.cpp:647] Fetching URIs for container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' using command '/usr/local/libexec/mesos/mesos-fetcher' I0918 02:11:11.650687 17071 slave.cpp:2873] Current usage 17.34%. Max allowed age: 5.086371210668750days I0918 02:11:16.590270 17075 slave.cpp:2355] Monitoring executor 'executor_Task_Tracker_10' of framework '20140916-233111-3171422218-5050-14295-0015' in container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' I0918 02:11:17.701015 17070 slave.cpp:1664] Got registration for executor 'executor_Task_Tracker_10' of framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:11:17.701897 17070 slave.cpp:1783] Flushing queued task slots_Task_Tracker_10 for executor 'executor_Task_Tracker_10' of framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:11:17.702350 17070 slave.cpp:1783] Flushing queued task Task_Tracker_10 for executor 'executor_Task_Tracker_10' of framework 20140916-233111-3171422218-5050-14295-0015 I0918 02:11:18.588388 17070 mesos_containerizer.cpp:1112] Executor for container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' has exited I0918 02:11:18.588665 17070 mesos_containerizer.cpp:996] Destroying container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' I0918 02:11:18.599234 17072 slave.cpp:2413] Executor 'executor_Task_Tracker_10' of framework 20140916-233111-3171422218-5050-14295-0015 has exited with status 1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
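The reordering described above can be reproduced outside Mesos. The following is a minimal sketch, not the slave's actual code: it assumes each task is queued only once its own prerequisite future resolves, and the names launch_when_ready, unordered and ordered are hypothetical. When the futures resolve in a different order than the tasks arrived, the queue order flips; draining the futures strictly in arrival order keeps it.

```python
import asyncio


async def launch_when_ready(name, ready, queue):
    # Queue the task as soon as its own prerequisite future resolves.
    await ready
    queue.append(name)


async def unordered(names, resolve_order):
    # One independent waiter per task: queue order follows resolution order.
    loop = asyncio.get_running_loop()
    ready = {n: loop.create_future() for n in names}
    queue = []
    waiters = [
        asyncio.ensure_future(launch_when_ready(n, ready[n], queue))
        for n in names
    ]
    for n in resolve_order:
        ready[n].set_result(None)
        await asyncio.sleep(0)  # yield so the woken waiter can run
    await asyncio.gather(*waiters)
    return queue


async def ordered(names, resolve_order):
    # A single waiter walks the tasks in arrival order, so a later task
    # cannot be queued before an earlier one, whatever resolves first.
    loop = asyncio.get_running_loop()
    ready = {n: loop.create_future() for n in names}
    queue = []

    async def drain():
        for n in names:
            await ready[n]
            queue.append(n)

    waiter = asyncio.ensure_future(drain())
    for n in resolve_order:
        ready[n].set_result(None)
        await asyncio.sleep(0)
    await waiter
    return queue
```

Whether Mesos should pay for the second form is exactly the question raised in the comment: it trades some concurrency for a guarantee the framework can rely on.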
[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139077#comment-14139077 ]

Timothy St. Clair commented on MESOS-1806:
--
[~tnachen] got a branch? I'm game for assist, and I'm sure the folks on your end are looking to resolve the delta between kube.

Substituting etcd or ReplicatedLog for Zookeeper
Key: MESOS-1806
URL: https://issues.apache.org/jira/browse/MESOS-1806
Project: Mesos
Issue Type: Task
Reporter: Ed Ropple
Priority: Minor

adam_mesos eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one.
--
Consider it filed. =)
[jira] [Commented] (MESOS-1392) Failure when znode is removed before we can read its contents.
[ https://issues.apache.org/jira/browse/MESOS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139098#comment-14139098 ]

Jay Buffington commented on MESOS-1392:
---
Looks like this is resolved by this commit: https://github.com/apache/mesos/commit/14c605e8ce425ec8c517d8e4f899eb3ddeede56a

Failure when znode is removed before we can read its contents.
--
Key: MESOS-1392
URL: https://issues.apache.org/jira/browse/MESOS-1392
Project: Mesos
Issue Type: Bug
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Assignee: Yan Xu

Looks like the following can occur when a znode goes away right before we can read its contents:
{noformat: title=Slave exit}
I0520 16:33:45.721727 29155 group.cpp:382] Trying to create path '/home/mesos/test/master' in ZooKeeper
I0520 16:33:48.600837 29155 detector.cpp:134] Detected a new leader: (id='2617')
I0520 16:33:48.601428 29147 group.cpp:655] Trying to get '/home/mesos/test/master/info_002617' in ZooKeeper
Failed to detect a master: Failed to get data for ephemeral node '/home/mesos/test/master/info_002617' in ZooKeeper: no node
Slave Exit Status: 1
{noformat}
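One common way to tolerate this kind of race is to treat "no node" on the read as a signal to list the group again rather than as a fatal error. The sketch below only illustrates that idea against an in-memory stand-in; FakeStore, NoNodeError and detect_leader are hypothetical names, and nothing here is taken from the linked commit or from the ZooKeeper client API.

```python
class NoNodeError(Exception):
    """Raised by the fake store when a path no longer exists (hypothetical)."""


class FakeStore:
    """In-memory stand-in for a group of ephemeral nodes."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)
        self.vanish_once = None  # path that disappears right before its first read

    def children(self):
        return sorted(self.nodes)

    def get(self, path):
        if path == self.vanish_once:
            # Simulate the node expiring between the listing and the read.
            self.vanish_once = None
            del self.nodes[path]
        if path not in self.nodes:
            raise NoNodeError(path)
        return self.nodes[path]


def detect_leader(store, max_attempts=3):
    """Return the data of the lowest-numbered member, or None if there is none."""
    for _ in range(max_attempts):
        members = store.children()
        if not members:
            return None
        leader = members[0]  # zero-padded sequence numbers sort correctly as strings
        try:
            return store.get(leader)
        except NoNodeError:
            continue  # membership changed under us: list again and retry
    return None
```

With two members and the first one vanishing before the read, detect_leader falls through to the second member instead of exiting.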
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139097#comment-14139097 ]

Timothy St. Clair commented on MESOS-1384:
--
Having a pluggable architecture would enable folks to do the following:
1. Test PoC ideas in a clean way without impacting mainline.
2. Enable service providers to write custom interfaces that may only apply to their workflow. *This is the big one*
3. Prevent Mesos from accreting too much into its core without having well thought out boundaries on interfaces and adaptability over time. By forcing the step, it helps to define clear boundaries.
...

Add support for loadable MesosModule
Key: MESOS-1384
URL: https://issues.apache.org/jira/browse/MESOS-1384
Project: Mesos
Issue Type: Improvement
Affects Versions: 0.19.0
Reporter: Timothy St. Clair
Assignee: Niklas Quarfot Nielsen

I think we should break this into multiple phases.
-(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE*
(2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET*
(3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT*
[jira] [Created] (MESOS-1814) Task attempted to use more offers than requested in example framework
Vinod Kone created MESOS-1814: - Summary: Task attempted to use more offers than requested in example framework Key: MESOS-1814 URL: https://issues.apache.org/jira/browse/MESOS-1814 Project: Mesos Issue Type: Bug Reporter: Vinod Kone {code} [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_2PcFCh' Enabling authentication for the framework WARNING: Logging before InitGoogleLogging() is written to STDERR I0917 23:14:35.199069 31510 process.cpp:1771] libprocess is initialized on 127.0.1.1:34609 for 8 cpus I0917 23:14:35.199794 31510 logging.cpp:177] Logging to STDERR I0917 23:14:35.225342 31510 leveldb.cpp:176] Opened db in 22.197149ms I0917 23:14:35.231133 31510 leveldb.cpp:183] Compacted db in 5.601897ms I0917 23:14:35.231498 31510 leveldb.cpp:198] Created db iterator in 215441ns I0917 23:14:35.231608 31510 leveldb.cpp:204] Seeked to beginning of db in 11488ns I0917 23:14:35.231722 31510 leveldb.cpp:273] Iterated through 0 keys in the db in 14016ns I0917 23:14:35.231917 31510 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0917 23:14:35.233129 31526 recover.cpp:425] Starting replica recovery I0917 23:14:35.233614 31526 recover.cpp:451] Replica is in EMPTY status I0917 23:14:35.234994 31526 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0917 23:14:35.240116 31519 recover.cpp:188] Received a recover response from a replica in EMPTY status I0917 23:14:35.240782 31519 recover.cpp:542] Updating replica status to STARTING I0917 23:14:35.242846 31524 master.cpp:286] Master 20140917-231435-16842879-34609-31503 (saucy) started on 127.0.1.1:34609 I0917 23:14:35.243191 31524 master.cpp:332] Master only allowing authenticated frameworks to register I0917 23:14:35.243288 31524 master.cpp:339] Master allowing unauthenticated slaves to register I0917 23:14:35.243399 31524 credentials.hpp:36] Loading credentials for authentication from 
'/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' W0917 23:14:35.243588 31524 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I0917 23:14:35.243846 31524 master.cpp:366] Authorization enabled I0917 23:14:35.244882 31520 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:34609 I0917 23:14:35.245224 31520 master.cpp:120] No whitelist given. Advertising offers for all slaves I0917 23:14:35.246934 31524 master.cpp:1211] The newly elected leader is master@127.0.1.1:34609 with id 20140917-231435-16842879-34609-31503 I0917 23:14:35.247234 31524 master.cpp:1224] Elected as the leading master! I0917 23:14:35.247336 31524 master.cpp:1042] Recovering from registrar I0917 23:14:35.247542 31526 registrar.cpp:313] Recovering registrar I0917 23:14:35.250555 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252326 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252821 31520 slave.cpp:169] Slave started on 1)@127.0.1.1:34609 I0917 23:14:35.253552 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.253906 31520 slave.cpp:317] Slave hostname: saucy I0917 23:14:35.254004 31520 slave.cpp:318] Slave checkpoint: true I0917 23:14:35.254818 31520 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/0/meta' I0917 23:14:35.255106 31519 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13.99622ms I0917 23:14:35.255235 31519 replica.cpp:320] Persisted replica status to STARTING I0917 23:14:35.255419 31519 recover.cpp:451] Replica is in STARTING status I0917 23:14:35.255834 31519 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0917 23:14:35.256000 31519 recover.cpp:188] Received a recover response from a replica 
in STARTING status I0917 23:14:35.256217 31519 recover.cpp:542] Updating replica status to VOTING I0917 23:14:35.256641 31520 status_update_manager.cpp:193] Recovering status update manager I0917 23:14:35.257064 31520 containerizer.cpp:252] Recovering containerizer I0917 23:14:35.257725 31520 slave.cpp:3220] Finished recovery I0917 23:14:35.258463 31520 slave.cpp:600] New master detected at master@127.0.1.1:34609 I0917 23:14:35.258769 31524 status_update_manager.cpp:167] New master detected at master@127.0.1.1:34609 I0917 23:14:35.258885 31520 slave.cpp:636] No credentials provided. Attempting to register without authentication I0917 23:14:35.259024 31520 slave.cpp:647] Detecting new master I0917 23:14:35.259863 31520 slave.cpp:169] Slave started on 2)@127.0.1.1:34609 I0917 23:14:35.260288 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988;
[jira] [Commented] (MESOS-1812) Queued tasks are not actually launched in the order they were queued
[ https://issues.apache.org/jira/browse/MESOS-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139171#comment-14139171 ] Dominic Hamon commented on MESOS-1812: -- MESOS-497 doesn't have any reasoning other than it would be nice so I would also like to hear why this is important. I'm not saying it isn't, just want to make sure we're not artificially adding constraints to the system. Queued tasks are not actually launched in the order they were queued Key: MESOS-1812 URL: https://issues.apache.org/jira/browse/MESOS-1812 Project: Mesos Issue Type: Bug Components: slave Reporter: Tom Arnfeld Even though tasks are assigned and queued in the order in which they are launched (e.g multiple tasks in reply to one offer), due to timing issues with the futures, this can sometimes break the causality and end up not being launched in order. Example trace from a slave... In this example the Task_Tracker_10 task should be launched before slots_Task_Tracker_10. 
[jira] [Commented] (MESOS-1662) Mesos doesn't limit swap
[ https://issues.apache.org/jira/browse/MESOS-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139176#comment-14139176 ]

Chi Hoang commented on MESOS-1662:
--
Wondering what happened with this fix. Status says fixed, but it wasn't included in 0.20.0.

Mesos doesn't limit swap
Key: MESOS-1662
URL: https://issues.apache.org/jira/browse/MESOS-1662
Project: Mesos
Issue Type: Bug
Components: isolation
Affects Versions: 0.19.1
Reporter: Andrew Forgue
Assignee: Anton Lindström

When using control groups, Mesos will limit memory usage, but if the CONFIG_MEMCG_SWAP config option is enabled, swap usage is not limited. This means that if a task asked for 1G and allocated 4G, it would fill 3G of swap. The expected behavior is that the cgroup should have OOMed.

The control group key for limiting both Memory+Swap is memory.memsw.limit_in_bytes (not memory.limit_in_bytes). It looks like CONFIG_MEMCG_SWAP showed up in Kernel 3.6.

Mesos should limit swap+memory if possible. I can't imagine when you'd want to limit memory but not swap, but there may be some situations.
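To make the two control files concrete, here is a minimal sketch of setting both limits on a cgroup v1 memory hierarchy. It is not the Mesos isolator: write_limits is a hypothetical helper, the cgroup directory is supplied by the caller, and the sketch assumes memory.memsw.limit_in_bytes is only present when the kernel has swap accounting enabled. The memory limit is written first because the kernel requires the memory+swap limit to be at least as large as the memory limit.

```python
import os
import tempfile


def write_limits(cgroup_dir, limit_bytes, limit_swap=True):
    """Write the memory limit, and the memory+swap limit when available.

    Returns the names of the control files that were written, in order.
    """
    written = []

    def write(name, value):
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(str(value))
        written.append(name)

    # Memory limit first: memsw may never be lower than the memory limit.
    write("memory.limit_in_bytes", limit_bytes)

    # The memsw file is absent when swap accounting is not enabled.
    memsw = os.path.join(cgroup_dir, "memory.memsw.limit_in_bytes")
    if limit_swap and os.path.exists(memsw):
        write("memory.memsw.limit_in_bytes", limit_bytes)
    return written


if __name__ == "__main__":
    # Dry run against a scratch directory instead of a real cgroup mount.
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "memory.memsw.limit_in_bytes"), "w").close()
    print(write_limits(scratch, 1 << 30))
```

Setting both files to the same value means a task that asked for 1G cannot spill into swap at all; a deployment that wants to allow some swap would write a larger memsw value instead.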
[jira] [Updated] (MESOS-1662) Mesos doesn't limit swap
[ https://issues.apache.org/jira/browse/MESOS-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-1662:
--
Fix Version/s: 0.20.0

Mesos doesn't limit swap
Key: MESOS-1662
URL: https://issues.apache.org/jira/browse/MESOS-1662
Project: Mesos
Issue Type: Bug
Components: isolation
Affects Versions: 0.19.1
Reporter: Andrew Forgue
Assignee: Anton Lindström
Fix For: 0.20.0

When using control groups, Mesos will limit memory usage, but if the CONFIG_MEMCG_SWAP config option is enabled, swap usage is not limited. This means that if a task asked for 1G and allocated 4G, it would fill 3G of swap. The expected behavior is that the cgroup should have OOMed.

The control group key for limiting both Memory+Swap is memory.memsw.limit_in_bytes (not memory.limit_in_bytes). It looks like CONFIG_MEMCG_SWAP showed up in Kernel 3.6.

Mesos should limit swap+memory if possible. I can't imagine when you'd want to limit memory but not swap, but there may be some situations.
[jira] [Updated] (MESOS-1814) Task attempted to use more offers than requested in example jave and python frameworks
[ https://issues.apache.org/jira/browse/MESOS-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-1814: -- Component/s: test Sprint: Mesos Q3 Sprint 5 Target Version/s: 0.21.0 Affects Version/s: 0.21.0 Shepherd: Yan Xu Story Points: 2 Summary: Task attempted to use more offers than requested in example jave and python frameworks (was: Task attempted to use more offers than requested in example framework) This is a latent bug in both the java and python example frameworks. Both these frameworks launch tasks without looking at whether the resources offered to it are enough to launch the task. We are seeing this now because of the recently landed change that offers frameworks resources with no memory or no cpu. Before this change, no such offers were made and hence the framework was lucky that any offer that it received matched its task requirements. I'll send a patch shortly. Task attempted to use more offers than requested in example jave and python frameworks -- Key: MESOS-1814 URL: https://issues.apache.org/jira/browse/MESOS-1814 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Reporter: Vinod Kone Assignee: Vinod Kone {code} [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_2PcFCh' Enabling authentication for the framework WARNING: Logging before InitGoogleLogging() is written to STDERR I0917 23:14:35.199069 31510 process.cpp:1771] libprocess is initialized on 127.0.1.1:34609 for 8 cpus I0917 23:14:35.199794 31510 logging.cpp:177] Logging to STDERR I0917 23:14:35.225342 31510 leveldb.cpp:176] Opened db in 22.197149ms I0917 23:14:35.231133 31510 leveldb.cpp:183] Compacted db in 5.601897ms I0917 23:14:35.231498 31510 leveldb.cpp:198] Created db iterator in 215441ns I0917 23:14:35.231608 31510 leveldb.cpp:204] Seeked to beginning of db in 11488ns I0917 23:14:35.231722 31510 leveldb.cpp:273] Iterated through 0 keys in the db in 14016ns I0917 
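The missing check described in the update above (launching tasks without looking at whether the offer is large enough) can be sketched as follows. This is not the example frameworks' code or the Mesos scheduler API: offers are modelled as plain dictionaries, and offer_totals and plan_tasks are hypothetical helpers. The point is only that an offer with no cpus or no mem yields zero tasks, so the framework declines it instead of over-committing.

```python
def offer_totals(offer):
    """Sum the scalar resources in an offer by name."""
    totals = {}
    for resource in offer["resources"]:
        name = resource["name"]
        totals[name] = totals.get(name, 0.0) + resource["value"]
    return totals


def plan_tasks(offer, task_cpus, task_mem, max_tasks):
    """Return how many tasks fit in the offer as a list of resource dicts.

    An empty list means the offer cannot hold even one task and should
    be declined.
    """
    totals = offer_totals(offer)
    free_cpus = totals.get("cpus", 0.0)
    free_mem = totals.get("mem", 0.0)

    tasks = []
    while len(tasks) < max_tasks and free_cpus >= task_cpus and free_mem >= task_mem:
        tasks.append({"cpus": task_cpus, "mem": task_mem})
        free_cpus -= task_cpus
        free_mem -= task_mem
    return tasks
```

For example, an offer carrying only cpus produces an empty plan, while an offer with 2 cpus and 300 MB of memory fits two tasks of 1 cpu and 128 MB each.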
[jira] [Created] (MESOS-1815) Create a guide to becoming a committer
Dominic Hamon created MESOS-1815:

Summary: Create a guide to becoming a committer
Key: MESOS-1815
URL: https://issues.apache.org/jira/browse/MESOS-1815
Project: Mesos
Issue Type: Documentation
Components: documentation
Reporter: Dominic Hamon
Assignee: Dominic Hamon

We have a committer's guide, but the process by which one becomes a committer is unclear. We should set some guidelines and a process by which we can grow contributors into committers.
[jira] [Commented] (MESOS-1815) Create a guide to becoming a committer
[ https://issues.apache.org/jira/browse/MESOS-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139232#comment-14139232 ]

Dominic Hamon commented on MESOS-1815:
--
Please review at https://reviews.apache.org/r/25785/

Create a guide to becoming a committer
--
Key: MESOS-1815
URL: https://issues.apache.org/jira/browse/MESOS-1815
Project: Mesos
Issue Type: Documentation
Components: documentation
Reporter: Dominic Hamon
Assignee: Dominic Hamon

We have a committer's guide, but the process by which one becomes a committer is unclear. We should set some guidelines and a process by which we can grow contributors into committers.
[jira] [Commented] (MESOS-1662) Mesos doesn't limit swap
[ https://issues.apache.org/jira/browse/MESOS-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139289#comment-14139289 ]

Chi Hoang commented on MESOS-1662:
--
awesome! thanks!

Mesos doesn't limit swap
Key: MESOS-1662
URL: https://issues.apache.org/jira/browse/MESOS-1662
Project: Mesos
Issue Type: Bug
Components: isolation
Affects Versions: 0.19.1
Reporter: Andrew Forgue
Assignee: Anton Lindström
Fix For: 0.20.0

When using control groups, Mesos will limit memory usage, but if the CONFIG_MEMCG_SWAP config option is enabled, swap usage is not limited. This means that if a task asked for 1G and allocated 4G, it would fill 3G of swap. The expected behavior is that the cgroup should have OOMed.

The control group key for limiting both Memory+Swap is memory.memsw.limit_in_bytes (not memory.limit_in_bytes). It looks like CONFIG_MEMCG_SWAP showed up in Kernel 3.6.

Mesos should limit swap+memory if possible. I can't imagine when you'd want to limit memory but not swap, but there may be some situations.
[jira] [Updated] (MESOS-1808) expose RTT in container stats
[ https://issues.apache.org/jira/browse/MESOS-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-1808:
--
Assignee: Chi Zhang (was: Jie Yu)

expose RTT in container stats
-
Key: MESOS-1808
URL: https://issues.apache.org/jira/browse/MESOS-1808
Project: Mesos
Issue Type: Task
Components: containerization
Reporter: Dominic Hamon
Assignee: Chi Zhang

Just as we expose the bandwidth, we should expose the RTT as a measure of the latency each container is experiencing. We can use {{ss}} to get the per-socket statistics, then filter and aggregate them to get a measure of RTT.
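A rough sketch of the "filter and aggregate" step follows. It assumes the text comes from running ss with its -i option, where TCP sockets carry a field of the form rtt:<average>/<variance> in milliseconds; that format is an assumption about the tool's output, and mean_rtt_ms is a hypothetical helper. Selecting only the sockets that belong to one container is left out.

```python
import re

# Matches "rtt:<avg>/<var>" but not "minrtt:<value>", which has no slash
# and no word boundary before "rtt".
RTT_FIELD = re.compile(r"\brtt:(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")


def mean_rtt_ms(ss_output):
    """Average the per-socket RTT values found in the given text.

    Returns None when no socket reported an RTT.
    """
    samples = [float(m.group(1)) for m in RTT_FIELD.finditer(ss_output)]
    if not samples:
        return None
    return sum(samples) / len(samples)
```

A mean is only one possible aggregate; a percentile or a per-destination breakdown may describe container latency better, which is a design choice the ticket leaves open.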
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139349#comment-14139349 ]

Timothy St. Clair commented on MESOS-1384:
--
Folks - I think this is ready for review. You might want to make a couple of minor changes around named loading, e.g. libFoo.so, libFoo.dylib. The load could check for an extension and, in its absence, do the right thing: load (Foo)

Add support for loadable MesosModule
Key: MESOS-1384
URL: https://issues.apache.org/jira/browse/MESOS-1384
Project: Mesos
Issue Type: Improvement
Affects Versions: 0.19.0
Reporter: Timothy St. Clair
Assignee: Niklas Quarfot Nielsen

I think we should break this into multiple phases.
-(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE*
(2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET*
(3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT*
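The named-loading suggestion (accept either a full file name or a bare name such as Foo) could look roughly like this. It is a sketch of the idea only, written in Python rather than against the stout DynamicLibrary class; library_filename is a hypothetical helper, and the rule "add a lib prefix and the platform's extension" is an assumption about what "the right thing" means.

```python
import os
import sys


def library_filename(name, platform=sys.platform):
    """Expand a bare library name to a platform-specific file name.

    Names that already end in .so or .dylib are returned unchanged.
    """
    base = os.path.basename(name)
    if base.endswith((".so", ".dylib")):
        return name

    extension = ".dylib" if platform == "darwin" else ".so"
    prefix = "" if base.startswith("lib") else "lib"
    return os.path.join(os.path.dirname(name), prefix + base + extension)
```

Versioned names such as libFoo.so.1 are not handled here and would be treated as bare names; a real implementation would need a rule for them.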
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139390#comment-14139390 ] Bernd Mathiske commented on MESOS-1384: --- [~tstclair] Thanks for the vote of confidence! We will make a code improvement pass now and also remove non-essentials to get to a minimal viable first patch. Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139390#comment-14139390 ] Bernd Mathiske edited comment on MESOS-1384 at 9/18/14 7:57 PM: [~tstclair] Thanks for the vote of confidence! We can a code improvement pass now and also remove non-essentials to get to a minimal viable first patch. However, we still have to solve the question what the command line interface should look like. Go for JSON right away? On the command line? Or maybe this: keep the simple format (lib path:module name,...) we have right now and also add a second flag that points at a JSON file? was (Author: bernd-mesos): [~tstclair] Thanks for the vote of confidence! We will can a code improvement pass now and also remove non-essentials to get to a minimal viable first patch. However, we still have to solve the question what the command line interface should look like. Go for JSON right away? On the command line? Or maybe this: keep the simple format (lib path:module name,...) we have right now and also add a second flag that points at a JSON file? Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. 
*TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
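For the simple flag format mentioned in the comment (lib path:module name,...), a parser could be as small as the sketch below. The flag's exact grammar is not spelled out in the thread, so this assumes comma-separated entries with the module name after the last colon; parse_modules_flag is a hypothetical name and the paths in the usage example are made up.

```python
def parse_modules_flag(value):
    """Parse 'path:module,path:module' into {path: [module, ...]}."""
    modules = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so paths containing colons still work.
        path, separator, name = entry.rpartition(":")
        if not separator or not path or not name:
            raise ValueError("expected 'path:module', got %r" % entry)
        modules.setdefault(path, []).append(name)
    return modules
```

Usage: parse_modules_flag("/tmp/libfoo.so:auth,/tmp/libfoo.so:isolator") groups both module names under the one library path. A JSON file passed through a second flag would carry the same mapping with room for per-module parameters, which is the trade-off the comment is weighing.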
[jira] [Comment Edited] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139390#comment-14139390 ] Bernd Mathiske edited comment on MESOS-1384 at 9/18/14 7:57 PM: [~tstclair] Thanks for the vote of confidence! We will can a code improvement pass now and also remove non-essentials to get to a minimal viable first patch. However, we still have to solve the question what the command line interface should look like. Go for JSON right away? On the command line? Or maybe this: keep the simple format (lib path:module name,...) we have right now and also add a second flag that points at a JSON file? was (Author: bernd-mesos): [~tstclair] Thanks for the vote of confidence! We will make a code improvement pass now and also remove non-essentials to get to a minimal viable first patch. Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139412#comment-14139412 ] Timothy St. Clair commented on MESOS-1384: -- Keep it simple for now, as I fully expect this to iterate over time. It's also auxiliary and nothing depends on it yet, so until that point happens there can be refinement. Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-1809) Modify docker pull to use docker inspect after a successful pull
[ https://issues.apache.org/jira/browse/MESOS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen resolved MESOS-1809. - Resolution: Fixed Modify docker pull to use docker inspect after a successful pull Key: MESOS-1809 URL: https://issues.apache.org/jira/browse/MESOS-1809 Project: Mesos Issue Type: Bug Reporter: Timothy Chen Assignee: Timothy Chen Currently in docker pull we read the stdout of pull to construct the docker image object, however it contains extra output from stdout. We should docker inspect after pull instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
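The change described in this issue can be sketched roughly as follows, in Python rather than the C++ of the Mesos source. This is only an illustration: the function name `pull_image` and the injectable `run` parameter are hypothetical, and the sketch assumes that `docker inspect <image>` prints a JSON array with one object per inspected item. The point is that the stdout of `docker pull` is ignored and the image description comes from `inspect` alone.

```python
import json
import subprocess


def _run(argv):
    """Run a command and return its stdout as text (raises on non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def pull_image(image, run=_run):
    """Pull `image`, then describe it using `docker inspect`.

    `run` is a hypothetical hook so the sketch can be exercised without Docker.
    """
    # The pull output may contain progress lines and other noise, so discard it.
    run(["docker", "pull", image])

    # Build the image description only from the structured inspect output.
    inspected = json.loads(run(["docker", "inspect", image]))
    if not isinstance(inspected, list) or len(inspected) != 1:
        raise ValueError("unexpected docker inspect output for %r" % image)
    return inspected[0]
```

Passing a fake `run` that returns canned strings is enough to check the control flow without a Docker daemon.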
[jira] [Updated] (MESOS-1809) Modify docker pull to use docker inspect after a successful pull
[ https://issues.apache.org/jira/browse/MESOS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-1809: -- Fix Version/s: 0.20.1 Modify docker pull to use docker inspect after a successful pull Key: MESOS-1809 URL: https://issues.apache.org/jira/browse/MESOS-1809 Project: Mesos Issue Type: Bug Reporter: Timothy Chen Assignee: Timothy Chen Fix For: 0.20.1 Currently in docker pull we read the stdout of pull to construct the docker image object, however it contains extra output from stdout. We should docker inspect after pull instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1675) Decouple version of the mesos library from the package release version
[ https://issues.apache.org/jira/browse/MESOS-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139531#comment-14139531 ] Timothy St. Clair commented on MESOS-1675: -- Provided that they linked to libmesos.so, I don't believe so. Decouple version of the mesos library from the package release version -- Key: MESOS-1675 URL: https://issues.apache.org/jira/browse/MESOS-1675 Project: Mesos Issue Type: Bug Reporter: Vinod Kone This discussion should be rolled into the larger discussion around how to version Mesos (APIs, packages, libraries etc). Some notes from libtool docs. http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html http://www.gnu.org/software/libtool/manual/html_node/Release-numbers.html#Release-numbers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1809) Modify docker pull to use docker inspect after a successful pull
[ https://issues.apache.org/jira/browse/MESOS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139532#comment-14139532 ] Timothy Chen commented on MESOS-1809: - commit 48db9a513fac0066c8f38aa98b8d893fdf298998 Author: Timothy Chen tnac...@apache.org Date: Thu Sep 18 02:11:40 2014 -0700 Modify Docker::pull to call inspect after pull. Review: https://reviews.apache.org/r/25758 Modify docker pull to use docker inspect after a successful pull Key: MESOS-1809 URL: https://issues.apache.org/jira/browse/MESOS-1809 Project: Mesos Issue Type: Bug Reporter: Timothy Chen Assignee: Timothy Chen Fix For: 0.20.1 Currently in docker pull we read the stdout of pull to construct the docker image object, however it contains extra output from stdout. We should docker inspect after pull instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139553#comment-14139553 ] Vinod Kone commented on MESOS-1384: --- Please have the flag as JSON. It's easy to maintain. Our JSON flag parser accepts a file with JSON or raw JSON string. Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
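As a rough illustration of a flag that accepts either a file containing JSON or a raw JSON string, here is a minimal Python sketch. It is not the Mesos flag parser: the function name `parse_json_flag`, the rule used to tell a path from a raw string, and the shape of the module description in the usage note are all assumptions made for the example.

```python
import json
import os


def parse_json_flag(value):
    """Parse a flag value given either as a path to a JSON file or as raw JSON.

    Assumption for this sketch: a value naming an existing file is read as a
    file; anything else is parsed as a raw JSON string.
    """
    if os.path.isfile(value):
        with open(value, "r", encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(value)
```

With a hypothetical module description such as `{"libraries": [{"path": "/tmp/libfoo.so", "modules": ["foo"]}]}`, the same function handles the value whether it is passed inline or saved to a file first.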
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139575#comment-14139575 ] Niklas Quarfot Nielsen commented on MESOS-1384: --- [~vinodkone] +1 Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1816) lxc execution driver support for docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugen Feller updated MESOS-1816: Summary: lxc execution driver support for docker containerizer (was: lxc execution driver for docker containerizer) lxc execution driver support for docker containerizer - Key: MESOS-1816 URL: https://issues.apache.org/jira/browse/MESOS-1816 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Eugen Feller Labels: docker Hi all, One way to get networking up and running in Docker is to use the bridge mode. The bridge mode results in Docker automatically assigning IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately this is not supported by Docker's libcontainer execution driver. Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used. In order to use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. For example: $ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" This will force Docker to use my own bridge br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container. In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration. Thanks. Best regards, Eugen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-1816) lxc execution driver for docker containerizer
Eugen Feller created MESOS-1816: --- Summary: lxc execution driver for docker containerizer Key: MESOS-1816 URL: https://issues.apache.org/jira/browse/MESOS-1816 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Eugen Feller Hi all, One way to get networking up and running in Docker is to use the bridge mode. The bridge mode results in Docker automatically assigning IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately this is not supported by Docker's libcontainer execution driver. Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used. In order to use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. For example: $ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" This will force Docker to use my own bridge br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container. In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration. Thanks. Best regards, Eugen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
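To make the quoting in the example command concrete, here is a small Python sketch that assembles the same `docker run` arguments as a list, so that each `--lxc-conf` setting stays a single argument despite the spaces around `=`. Only the flags quoted in the issue are used; the helper name `docker_run_argv`, the `BRIDGE_SETTINGS` constant, and the `ubuntu` image name are placeholders invented for the illustration.

```python
import shlex


def docker_run_argv(image, lxc_conf, command=()):
    """Build a `docker run` argument vector with Docker's own networking disabled."""
    argv = ["docker", "run", "-n=false"]
    for key, value in lxc_conf:
        # One argv element per setting, so "key = value" is not split by a shell.
        argv.append("--lxc-conf=%s = %s" % (key, value))
    argv.append(image)
    argv.extend(command)
    return argv


# The four lxc network settings quoted in the issue description.
BRIDGE_SETTINGS = [
    ("lxc.network.type", "veth"),
    ("lxc.network.link", "br0"),
    ("lxc.network.name", "eth0"),
    ("lxc.network.flags", "up"),
]

if __name__ == "__main__":
    # Print a shell-quoted command line; "ubuntu" is only a placeholder image.
    argv = docker_run_argv("ubuntu", BRIDGE_SETTINGS)
    print(" ".join(shlex.quote(arg) for arg in argv))
```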
[jira] [Updated] (MESOS-1816) lxc execution driver support for docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugen Feller updated MESOS-1816: Description: Hi all, One way to get networking up and running in Docker is to use the bridge mode. The bridge mode results in Docker automatically assigning IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately this is not supported by Docker's libcontainer execution driver. Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used. In order to use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. For example: $ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" ... This will force Docker to use my own bridge br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container. In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration. Thanks. Best regards, Eugen was: Hi all, One way to get networking up and running in Docker is to use the bridge mode. The bridge mode results in Docker automatically assigning IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately this is not supported by Docker's libcontainer execution driver. 
Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used. In order to use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. For example: $ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" This will force Docker to use my own bridge br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container. In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration. Thanks. Best regards, Eugen lxc execution driver support for docker containerizer - Key: MESOS-1816 URL: https://issues.apache.org/jira/browse/MESOS-1816 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Eugen Feller Labels: docker Hi all, One way to get networking up and running in Docker is to use the bridge mode. The bridge mode results in Docker automatically assigning IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately this is not supported by Docker's libcontainer execution driver. Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used. In order to use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. 
For example: $ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" ... This will force Docker to use my own bridge br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container. In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration. Thanks. Best regards, Eugen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1814) Task attempted to use more offers than requested in example Java and Python frameworks
[ https://issues.apache.org/jira/browse/MESOS-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139686#comment-14139686 ] Vinod Kone commented on MESOS-1814: --- https://reviews.apache.org/r/25801/ Task attempted to use more offers than requested in example jave and python frameworks -- Key: MESOS-1814 URL: https://issues.apache.org/jira/browse/MESOS-1814 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Reporter: Vinod Kone Assignee: Vinod Kone {code} [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_2PcFCh' Enabling authentication for the framework WARNING: Logging before InitGoogleLogging() is written to STDERR I0917 23:14:35.199069 31510 process.cpp:1771] libprocess is initialized on 127.0.1.1:34609 for 8 cpus I0917 23:14:35.199794 31510 logging.cpp:177] Logging to STDERR I0917 23:14:35.225342 31510 leveldb.cpp:176] Opened db in 22.197149ms I0917 23:14:35.231133 31510 leveldb.cpp:183] Compacted db in 5.601897ms I0917 23:14:35.231498 31510 leveldb.cpp:198] Created db iterator in 215441ns I0917 23:14:35.231608 31510 leveldb.cpp:204] Seeked to beginning of db in 11488ns I0917 23:14:35.231722 31510 leveldb.cpp:273] Iterated through 0 keys in the db in 14016ns I0917 23:14:35.231917 31510 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0917 23:14:35.233129 31526 recover.cpp:425] Starting replica recovery I0917 23:14:35.233614 31526 recover.cpp:451] Replica is in EMPTY status I0917 23:14:35.234994 31526 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0917 23:14:35.240116 31519 recover.cpp:188] Received a recover response from a replica in EMPTY status I0917 23:14:35.240782 31519 recover.cpp:542] Updating replica status to STARTING I0917 23:14:35.242846 31524 master.cpp:286] Master 20140917-231435-16842879-34609-31503 (saucy) started on 127.0.1.1:34609 I0917 23:14:35.243191 31524 
master.cpp:332] Master only allowing authenticated frameworks to register I0917 23:14:35.243288 31524 master.cpp:339] Master allowing unauthenticated slaves to register I0917 23:14:35.243399 31524 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' W0917 23:14:35.243588 31524 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I0917 23:14:35.243846 31524 master.cpp:366] Authorization enabled I0917 23:14:35.244882 31520 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:34609 I0917 23:14:35.245224 31520 master.cpp:120] No whitelist given. Advertising offers for all slaves I0917 23:14:35.246934 31524 master.cpp:1211] The newly elected leader is master@127.0.1.1:34609 with id 20140917-231435-16842879-34609-31503 I0917 23:14:35.247234 31524 master.cpp:1224] Elected as the leading master! 
I0917 23:14:35.247336 31524 master.cpp:1042] Recovering from registrar I0917 23:14:35.247542 31526 registrar.cpp:313] Recovering registrar I0917 23:14:35.250555 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252326 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252821 31520 slave.cpp:169] Slave started on 1)@127.0.1.1:34609 I0917 23:14:35.253552 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.253906 31520 slave.cpp:317] Slave hostname: saucy I0917 23:14:35.254004 31520 slave.cpp:318] Slave checkpoint: true I0917 23:14:35.254818 31520 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/0/meta' I0917 23:14:35.255106 31519 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13.99622ms I0917 23:14:35.255235 31519 replica.cpp:320] Persisted replica status to STARTING I0917 23:14:35.255419 31519 recover.cpp:451] Replica is in STARTING status I0917 23:14:35.255834 31519 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0917 23:14:35.256000 31519 recover.cpp:188] Received a recover response from a replica in STARTING status I0917 23:14:35.256217 31519 recover.cpp:542] Updating replica status to VOTING I0917 23:14:35.256641 31520 status_update_manager.cpp:193] Recovering status update manager I0917 23:14:35.257064 31520 containerizer.cpp:252] Recovering containerizer I0917 23:14:35.257725 31520 slave.cpp:3220] Finished recovery I0917 23:14:35.258463 31520 slave.cpp:600] New master detected at
[jira] [Commented] (MESOS-1813) Fail fast in example frameworks if task goes into unexpected state
[ https://issues.apache.org/jira/browse/MESOS-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139777#comment-14139777 ] Vinod Kone commented on MESOS-1813: --- https://reviews.apache.org/r/25805/ Fail fast in example frameworks if task goes into unexpected state -- Key: MESOS-1813 URL: https://issues.apache.org/jira/browse/MESOS-1813 Project: Mesos Issue Type: Improvement Components: test Reporter: Vinod Kone Assignee: Vinod Kone Most of the example frameworks launch a bunch of tasks and exit once *all* of them reach the FINISHED state. But if there is a bug in the code resulting in TASK_LOST, the framework waits forever. Instead, the framework should abort if an unexpected task state is encountered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
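The fail-fast behaviour asked for in this issue can be sketched as below. This is a simplified Python illustration, not code from the example frameworks or from the linked review: the `TaskTracker` class and its string return values are hypothetical, and the sketch assumes the task state names TASK_STAGING, TASK_STARTING, TASK_RUNNING and TASK_FINISHED, treating every other state as unexpected.

```python
import sys

# States that should not stop the framework.
NON_TERMINAL = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING"}
# The only terminal state the example frameworks expect.
EXPECTED_TERMINAL = {"TASK_FINISHED"}


class TaskTracker:
    """Decides what a framework should do after each status update."""

    def __init__(self, total_tasks):
        self.total_tasks = total_tasks
        self.finished = 0

    def status_update(self, task_id, state):
        """Return "continue", "done" or "abort" for one status update."""
        if state in NON_TERMINAL:
            return "continue"
        if state in EXPECTED_TERMINAL:
            self.finished += 1
            return "done" if self.finished == self.total_tasks else "continue"
        # TASK_LOST, TASK_FAILED, TASK_KILLED or anything unknown: fail fast
        # instead of waiting forever for a FINISHED update that never arrives.
        print("Aborting: task %s is in unexpected state %s" % (task_id, state),
              file=sys.stderr)
        return "abort"
```

In a real scheduler the "abort" result would presumably translate into aborting the driver and exiting with a non-zero status, so that the test run fails visibly.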
[jira] [Updated] (MESOS-1817) Completed tasks remains in TASK_RUNNING when framework is disconnected
[ https://issues.apache.org/jira/browse/MESOS-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-1817: -- Description: We have run into a problem that causes tasks which complete while a framework is disconnected and has a failover timeout to remain in a running state, even though the tasks have actually finished. This hogs the cluster and gives users an inconsistent view of the cluster state. Going to the slave, the task is finished. Going to the master, the task is still in a non-terminal state. When the scheduler reattaches or the failover timeout expires, the tasks finish correctly. The current workflow of this scheduler has a long failover timeout, but may on the other hand never reattach. Here is a test framework we have been able to reproduce the issue with: https://gist.github.com/nqn/9b9b1de9123a6e836f54 It launches many short-lived tasks (1 second sleep) and when killing the framework instance, the master reports the tasks as running even after several minutes: http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png When clicking on one of the slaves where, for example, task 49 runs, the slave knows that it completed: http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png Here is the log of a mesos-local instance where I reproduced it: https://gist.github.com/nqn/f7ee20601199d70787c0 (Here tasks 10 to 19 are stuck in the running state). There is a lot of output, so here is a filtered log for task 10: https://gist.github.com/nqn/a53e5ea05c5e41cd5a7d The problem turns out to be an issue with the ack-cycle of status updates: If the framework disconnects (with a failover timeout set), the status update manager on the slaves will keep trying to send the front of the status update stream to the master (which in turn forwards it to the framework). 
If the first status update after the disconnect is terminal, things work out fine; the master picks the terminal state up, removes the task and releases the resources. If, on the other hand, a non-terminal status is in the stream, the master will never know that the task finished (or failed) before the framework reconnects. During a discussion on the dev mailing list (http://mail-archives.apache.org/mod_mbox/mesos-dev/201409.mbox/%3cCADKthhAVR5mrq1s9HXw1BB_XFALXWWxjutp7MV4y3wP-Bh=a...@mail.gmail.com%3e) we enumerated a couple of options to solve this problem. First off, having two ack-cycles, one between masters and slaves and one between masters and frameworks, would be ideal. We would be able to replay the statuses in order while keeping the master state current. However, this requires us to persist the master state in replicated storage. As a first pass, we can make sure that tasks caught in a running state don't hog the cluster when they have completed and the framework is disconnected. Here is a proof-of-concept to work out of: https://github.com/nqn/mesos/tree/niklas/status-update-disconnect/ A new (optional) field has been added to the internal status update message: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/messages/messages.proto#L68 This makes it possible for the status update manager to set the field if the latest status was terminal: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/slave/status_update_manager.cpp#L501 I added a test which should highlight the issue as well: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/tests/fault_tolerance_tests.cpp#L2478 I would love some input on the approach before moving on. There are rough edges in the PoC which (of course) should be addressed before bringing it up for review. 
was: We have run into a problem that causes tasks which complete while a framework is disconnected and has a failover timeout to remain in a running state, even though the tasks have actually finished. This hogs the cluster and gives users an inconsistent view of the cluster state. Going to the slave, the task is finished. Going to the master, the task is still in a non-terminal state. When the scheduler reattaches or the failover timeout expires, the tasks finish correctly. The current workflow of this scheduler has a long failover timeout, but may on the other hand never reattach. Here is a test framework we have been able to reproduce the issue with: https://gist.github.com/nqn/9b9b1de9123a6e836f54 It launches many short-lived tasks (1 second sleep) and when killing the framework instance, the master reports the tasks as running even after several minutes: http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png When clicking on one of the slaves where, for example, task 49 runs, the slave knows that it completed: http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png Here is the log of a
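The ack-cycle problem described in MESOS-1817 can be illustrated with a toy model. The Python sketch below is not the proof-of-concept code: the class `StatusUpdateStream`, the function `master_view`, and the field name `latest_state` are all hypothetical stand-ins. It only models the behaviour stated in the description, namely that the slave keeps resending the front of the stream until it is acknowledged, and that an optional extra field could carry the latest state when that state is terminal.

```python
from collections import deque

TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class StatusUpdateStream:
    """Per-task queue of updates; only the front is resent until acknowledged."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, state):
        self.pending.append(state)

    def acknowledge(self):
        # Acknowledgements come from the framework; none arrive while it is
        # disconnected, so the front of the stream never advances.
        self.pending.popleft()

    def front_message(self, include_latest=False):
        """Build the message that would be sent to the master."""
        message = {"state": self.pending[0]}
        latest = self.pending[-1]
        if include_latest and latest in TERMINAL:
            # Hypothetical optional field, set only if the latest status is terminal.
            message["latest_state"] = latest
        return message


def master_view(message):
    """State the master would record for the task from one forwarded update."""
    return message.get("latest_state", message["state"])
```

With TASK_RUNNING and TASK_FINISHED queued and no acknowledgement, the plain message leaves the master at TASK_RUNNING, while the message carrying the extra field lets it see TASK_FINISHED and release the resources.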
[jira] [Created] (MESOS-1818) AllocatorTest/0.ResourcesUnused sometimes segfaults
Vinod Kone created MESOS-1818: - Summary: AllocatorTest/0.ResourcesUnused sometimes segfaults Key: MESOS-1818 URL: https://issues.apache.org/jira/browse/MESOS-1818 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Reporter: Vinod Kone Priority: Critical
{code}
[ RUN ] AllocatorTest/0.ResourcesUnused
*** Aborted at 1411088950 (unix time) try date -d @1411088950 if you are using GNU date ***
PC: @ 0x8649a4 mesos::SlaveID::value()
*** SIGSEGV (@0x2de9) received by PID 20876 (TID 0x7fb63a1c0940) from PID 11753; stack trace: ***
@ 0x7fb643ec4ca0 (unknown)
@ 0x8649a4 mesos::SlaveID::value()
@ 0x8741c7 mesos::hash_value()
@ 0x8f7448 boost::hash::operator()()
@ 0x8e0bed boost::unordered::detail::mix64_policy::apply_hash()
@ 0x7fb64694c1cf boost::unordered::detail::table::hash()
@ 0x7fb646973615 boost::unordered::detail::table::find_node()
@ 0x7fb64694c191 boost::unordered::detail::table_impl::count()
@ 0x7fb64691f3c1 boost::unordered::unordered_map::count()
@ 0x7fb6468f4373 hashmap::contains()
@ 0x7fb6468c5eda mesos::internal::master::Master::getSlave()
@ 0x7fb6468c0fc3 mesos::internal::master::Master::removeFramework()
@ 0x7fb6468afa9f mesos::internal::master::Master::unregisterFramework()
@ 0x7fb646904ab9 ProtobufProcess::handler1()
@ 0x7fb6469a1e81 _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDERKNS0_11FrameworkIDEEMNS1_26UnregisterFrameworkMessageEKFSB_vES8_RKSsES4_SD_SG_St12_PlaceholderILi1EESL_ILi26__callIvJS8_SI_EJLm0ELm1ELm2ELm3ELm4T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x7fb646983afe std::_Bind::operator()()
@ 0x7fb64695f83c std::_Function_handler::_M_invoke()
@ 0xc4e17f std::function::operator()()
@ 0x7fb6468ebd10 ProtobufProcess::visit()
@ 0x7fb6468a9892 mesos::internal::master::Master::_visit()
@ 0x7fb6468a8f46 mesos::internal::master::Master::visit()
@ 0x7fb6468ce670 process::MessageEvent::visit()
@ 0x86ad54 process::ProcessBase::serve()
@ 0x7fb6470e9738 process::ProcessManager::resume()
@ 0x7fb6470dff3f process::schedule()
@ 0x7fb643ebc83d start_thread
@ 0x7fb642c2426d clone
make[3]: *** [check-local] Segmentation fault
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139841#comment-14139841 ] Bernd Mathiske commented on MESOS-1384: --- OK, JSON it is then. Add support for loadable MesosModule Key: MESOS-1384 URL: https://issues.apache.org/jira/browse/MESOS-1384 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Niklas Quarfot Nielsen I think we should break this into multiple phases. -(1) Let's get the dynamic library loading via a stout-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/DynamicLibrary.h. - *DONE* (2) Use (1) to instantiate some classes in Mesos (like an Authenticator and/or isolator) from a dynamic library. This will give us some more experience with how we want to name the underlying library symbol, how we want to specify flags for finding the library, what types of validation we want when loading a library. *TARGET* (3) After doing (2) for one or two classes in Mesos I think we can formalize the approach in a mesos-ified version of https://github.com/timothysc/tests/blob/master/plugin_modules/MesosModule.h. *NEXT* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1813) Fail fast in example frameworks if task goes into unexpected state
[ https://issues.apache.org/jira/browse/MESOS-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-1813:

Sprint: Mesos Q3 Sprint 5

Fail fast in example frameworks if task goes into unexpected state

Key: MESOS-1813
URL: https://issues.apache.org/jira/browse/MESOS-1813
Project: Mesos
Issue Type: Improvement
Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
Fix For: 0.21.0

Most of the example frameworks launch a bunch of tasks and exit if *all* of them reach FINISHED state. But if there is a bug in the code resulting in TASK_LOST, the framework waits forever. Instead, the framework should abort if an unexpected task state is encountered.
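The fail-fast behaviour described in this ticket could be sketched roughly as follows. The TaskState enum is a stand-in for the Mesos task states (only FINISHED and TASK_LOST are named in the ticket; the other enumerators are assumed), and classify(), handleUpdate() and Progress are hypothetical helpers, not code from any example framework.

```cpp
#include <cstdlib>
#include <iostream>

// Stand-in for the Mesos task states; enumerators other than FINISHED and
// LOST are assumptions made for this sketch.
enum class TaskState { STAGING, STARTING, RUNNING, FINISHED, FAILED, KILLED, LOST };

enum class Action { KeepWaiting, CountFinished, Abort };

// Decide what a framework should do with a status update: count FINISHED,
// keep waiting on in-progress states, and abort on anything else.
constexpr Action classify(TaskState s) {
  return s == TaskState::FINISHED
             ? Action::CountFinished
             : (s == TaskState::STAGING || s == TaskState::STARTING ||
                s == TaskState::RUNNING)
                   ? Action::KeepWaiting
                   : Action::Abort;
}

struct Progress {
  int finished;  // Tasks that reached FINISHED so far.
  int total;     // Tasks the framework launched.
};

// Returns true once every task has finished. Exits the process with a
// failure status on an unexpected state instead of waiting forever.
inline bool handleUpdate(Progress& progress, TaskState state) {
  switch (classify(state)) {
    case Action::CountFinished:
      return ++progress.finished == progress.total;
    case Action::KeepWaiting:
      return false;
    case Action::Abort:
      std::cerr << "Unexpected task state, aborting" << std::endl;
      std::exit(EXIT_FAILURE);
  }
  return false;
}
```

Keeping the decision in a pure function separate from the exit call lets the policy be checked without running a cluster.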
[jira] [Updated] (MESOS-1814) Task attempted to use more offers than requested in example Java and Python frameworks
[ https://issues.apache.org/jira/browse/MESOS-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-1814:

Shepherd: Benjamin Mahler (was: Yan Xu)

Task attempted to use more offers than requested in example Java and Python frameworks

Key: MESOS-1814
URL: https://issues.apache.org/jira/browse/MESOS-1814
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.21.0
Reporter: Vinod Kone
Assignee: Vinod Kone

{code}
[ RUN ] ExamplesTest.JavaFramework
Using temporary directory '/tmp/ExamplesTest_JavaFramework_2PcFCh'
Enabling authentication for the framework
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0917 23:14:35.199069 31510 process.cpp:1771] libprocess is initialized on 127.0.1.1:34609 for 8 cpus
I0917 23:14:35.199794 31510 logging.cpp:177] Logging to STDERR
I0917 23:14:35.225342 31510 leveldb.cpp:176] Opened db in 22.197149ms
I0917 23:14:35.231133 31510 leveldb.cpp:183] Compacted db in 5.601897ms
I0917 23:14:35.231498 31510 leveldb.cpp:198] Created db iterator in 215441ns
I0917 23:14:35.231608 31510 leveldb.cpp:204] Seeked to beginning of db in 11488ns
I0917 23:14:35.231722 31510 leveldb.cpp:273] Iterated through 0 keys in the db in 14016ns
I0917 23:14:35.231917 31510 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned
I0917 23:14:35.233129 31526 recover.cpp:425] Starting replica recovery
I0917 23:14:35.233614 31526 recover.cpp:451] Replica is in EMPTY status
I0917 23:14:35.234994 31526 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0917 23:14:35.240116 31519 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0917 23:14:35.240782 31519 recover.cpp:542] Updating replica status to STARTING
I0917 23:14:35.242846 31524 master.cpp:286] Master 20140917-231435-16842879-34609-31503 (saucy) started on 127.0.1.1:34609
I0917 23:14:35.243191 31524 master.cpp:332] Master only allowing authenticated frameworks to register
I0917 23:14:35.243288 31524 master.cpp:339] Master allowing unauthenticated slaves to register
I0917 23:14:35.243399 31524 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials'
W0917 23:14:35.243588 31524 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' are too open. It is recommended that your credentials file is NOT accessible by others.
I0917 23:14:35.243846 31524 master.cpp:366] Authorization enabled
I0917 23:14:35.244882 31520 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:34609
I0917 23:14:35.245224 31520 master.cpp:120] No whitelist given. Advertising offers for all slaves
I0917 23:14:35.246934 31524 master.cpp:1211] The newly elected leader is master@127.0.1.1:34609 with id 20140917-231435-16842879-34609-31503
I0917 23:14:35.247234 31524 master.cpp:1224] Elected as the leading master!
I0917 23:14:35.247336 31524 master.cpp:1042] Recovering from registrar
I0917 23:14:35.247542 31526 registrar.cpp:313] Recovering registrar
I0917 23:14:35.250555 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I0917 23:14:35.252326 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I0917 23:14:35.252821 31520 slave.cpp:169] Slave started on 1)@127.0.1.1:34609
I0917 23:14:35.253552 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000]
I0917 23:14:35.253906 31520 slave.cpp:317] Slave hostname: saucy
I0917 23:14:35.254004 31520 slave.cpp:318] Slave checkpoint: true
I0917 23:14:35.254818 31520 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/0/meta'
I0917 23:14:35.255106 31519 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13.99622ms
I0917 23:14:35.255235 31519 replica.cpp:320] Persisted replica status to STARTING
I0917 23:14:35.255419 31519 recover.cpp:451] Replica is in STARTING status
I0917 23:14:35.255834 31519 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0917 23:14:35.256000 31519 recover.cpp:188] Received a recover response from a replica in STARTING status
I0917 23:14:35.256217 31519 recover.cpp:542] Updating replica status to VOTING
I0917 23:14:35.256641 31520 status_update_manager.cpp:193] Recovering status update manager
I0917 23:14:35.257064 31520 containerizer.cpp:252] Recovering containerizer
I0917 23:14:35.257725 31520 slave.cpp:3220] Finished recovery
I0917 23:14:35.258463 31520 slave.cpp:600] New master detected at master@127.0.1.1:34609
I0917 23:14:35.258769 31524
[jira] [Updated] (MESOS-1818) AllocatorTest/0.ResourcesUnused sometimes segfaults
[ https://issues.apache.org/jira/browse/MESOS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-1818:

Assignee: Benjamin Mahler (was: Vinod Kone)

AllocatorTest/0.ResourcesUnused sometimes segfaults

Key: MESOS-1818
URL: https://issues.apache.org/jira/browse/MESOS-1818
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.21.0
Reporter: Vinod Kone
Assignee: Benjamin Mahler
Priority: Critical
[jira] [Assigned] (MESOS-1818) AllocatorTest/0.ResourcesUnused sometimes segfaults
[ https://issues.apache.org/jira/browse/MESOS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone reassigned MESOS-1818:

Assignee: Vinod Kone

AllocatorTest/0.ResourcesUnused sometimes segfaults

Key: MESOS-1818
URL: https://issues.apache.org/jira/browse/MESOS-1818
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.21.0
Reporter: Vinod Kone
Assignee: Vinod Kone
Priority: Critical
[jira] [Updated] (MESOS-1818) AllocatorTest/0.ResourcesUnused sometimes segfaults
[ https://issues.apache.org/jira/browse/MESOS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-1818:

Sprint: Mesos Q3 Sprint 5

AllocatorTest/0.ResourcesUnused sometimes segfaults

Key: MESOS-1818
URL: https://issues.apache.org/jira/browse/MESOS-1818
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.21.0
Reporter: Vinod Kone
Assignee: Benjamin Mahler
Priority: Critical