[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhe updated MESOS-2706: Comment: was deleted (was: mesos/src/slave/slave.cpp

    if (!executor->isCommandExecutor()) {
      // If the executor is _not_ a command executor, this means that
      // the task will include the executor to run. The actual task to
      // run will be enqueued and subsequently handled by the executor
      // when it has registered to the slave.
      launch = slave->containerizer->launch(
          containerId,
          executorInfo_, // Modified to include the task's resources.
          executor->directory,
          slave->flags.switch_user ? Option<string>(user) : None(),
          slave->info.id(),
          slave->self(),
          info.checkpoint());
    } else {
      // An executor has _not_ been provided by the task and will
      // instead define a command and/or container to run. Right now,
      // these tasks will require an executor anyway and the slave
      // creates a command executor. However, it is up to the
      // containerizer how to execute those tasks and the generated
      // executor info works as a placeholder.
      // TODO(nnielsen): Obsolete the requirement for executors to run
      // one-off tasks.
      launch = slave->containerizer->launch(
          containerId,
          taskInfo,
          executorInfo_,
          executor->directory,
          slave->flags.switch_user ? Option<string>(user) : None(),
          slave->info.id(),
          slave->self(),
          info.checkpoint());
    })

When the docker-tasks grow, the time spare between Queuing task and Starting container grows

Key: MESOS-2706
URL: https://issues.apache.org/jira/browse/MESOS-2706
Project: Mesos
Issue Type: Bug
Components: docker
Affects Versions: 0.22.0
Environment: Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server. Every docker task requires 0.02 CPU and 128MB of memory, and the server has 8 CPUs and 24GB of memory, so Mesos can launch thousands of tasks in theory. The docker task is a very lightweight sshd service.
Reporter: chenqiuhao

At the beginning, Marathon launched docker tasks very quickly, but when the number of tasks on the single mesos-slave host reached 50, Marathon seemed to launch docker tasks slowly. So I checked the mesos-slave log and found that the gap between "Queuing task" and "Starting container" grew.

For example, launching the 1st docker task takes about 0.008s:

[root@CNSH231434 mesos-slave]# tail -f slave.out | egrep 'Queuing task|Starting container'
I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-
I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-'

Launching the 50th docker task takes about 4.9s:

I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-
I0508 16:12:15.801503 225778 docker.cpp:581] Starting container '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-'

And when I launched the 100th docker task, it took about 13s! I ran the same test on a host with 24 CPUs and 256GB of memory and got the same result. Has anybody had the same experience, or can someone help run the same stress test?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
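The per-task latency quoted above can be computed mechanically from the glog timestamps ("I<MMDD> HH:MM:SS.microseconds ..."). A minimal sketch of that arithmetic, using the 50th task's log lines from the report (the helper name is ours, not anything from Mesos):

```python
from datetime import datetime

def glog_time(line):
    # A glog line starts with e.g. "I0508 16:12:10.908596 225781 slave.cpp:1378] ..."
    # Field 0 is the severity letter followed by MMDD; field 1 is HH:MM:SS.microseconds.
    # The year is not logged, so this is only valid for same-day comparisons.
    fields = line.split()
    return datetime.strptime(fields[0][1:] + " " + fields[1], "%m%d %H:%M:%S.%f")

queued  = "I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task ..."
started = "I0508 16:12:15.801503 225778 docker.cpp:581] Starting container ..."

gap = (glog_time(started) - glog_time(queued)).total_seconds()
print(round(gap, 1))  # prints 4.9, matching the ~4.9s reported for the 50th task
```

Running the same subtraction over every queued/started pair in slave.out would make the growth curve (0.008s at task 1, 4.9s at task 50, 13s at task 100) directly plottable.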
[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537634#comment-14537634 ] yuhe commented on MESOS-2706: - mesos/src/slave/slave.cpp

    if (!executor->isCommandExecutor()) {
      // If the executor is _not_ a command executor, this means that
      // the task will include the executor to run. The actual task to
      // run will be enqueued and subsequently handled by the executor
      // when it has registered to the slave.
      launch = slave->containerizer->launch(
          containerId,
          executorInfo_, // Modified to include the task's resources.
          executor->directory,
          slave->flags.switch_user ? Option<string>(user) : None(),
          slave->info.id(),
          slave->self(),
          info.checkpoint());
    } else {
      // An executor has _not_ been provided by the task and will
      // instead define a command and/or container to run. Right now,
      // these tasks will require an executor anyway and the slave
      // creates a command executor. However, it is up to the
      // containerizer how to execute those tasks and the generated
      // executor info works as a placeholder.
      // TODO(nnielsen): Obsolete the requirement for executors to run
      // one-off tasks.
      launch = slave->containerizer->launch(
          containerId,
          taskInfo,
          executorInfo_,
          executor->directory,
          slave->flags.switch_user ? Option<string>(user) : None(),
          slave->info.id(),
          slave->self(),
          info.checkpoint());
    }

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537637#comment-14537637 ] yuhe commented on MESOS-2706: - It looks like the executor registering with the slave is what delays the start time. Who can help? Thanks.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2713) Docker resource usage
[ https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537814#comment-14537814 ] Ian Babrou commented on MESOS-2713: --- I made an example docker container that exposes the issue: https://github.com/bobrik/mesos-wrong-stats

Docker resource usage
--

Key: MESOS-2713
URL: https://issues.apache.org/jira/browse/MESOS-2713
Project: Mesos
Issue Type: Bug
Components: containerization, docker, isolation
Affects Versions: 0.22.1
Reporter: Ian Babrou

It looks like resource usage for docker containers on slaves is not very accurate (/monitor/statistics.json). For example, CPU usage is calculated by traversing the process tree and summing up CPU times. The resulting numbers are not even close to real usage; CPU time can even decrease. What is the reason for this when cgroup data can be used directly? Reading the cgroup location from the pid of a docker container is pretty straightforward.

Another similar question: what is the reason for setting isolation to posix instead of cgroups by default? It looks like it suffers from the same issues as the docker containerizer (incorrect stats). More docs on this topic would be great. Posix isolation also leads to bigger CPU usage from the mesos-slave process (the higher usage is posix isolation): http://i.imgur.com/jepk5m6.png

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
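The "read cgroup data directly" suggestion in the report can be sketched concretely. Assuming the cgroup v1 layout that Docker used at the time (the same /sys/fs/cgroup paths that appear in the MESOS-2714 log lines below), the helper names here are illustrative, not Mesos APIs:

```python
def cgroup_of(proc_cgroup_text, subsystem="cpuacct"):
    # Each line of /proc/<pid>/cgroup has the form
    #   hierarchy-id:comma-separated-subsystems:path
    # e.g. "3:cpuacct,cpu:/docker/<container-id>"
    for line in proc_cgroup_text.splitlines():
        _, subsystems, path = line.split(":", 2)
        if subsystem in subsystems.split(","):
            return path
    return None

def container_cpu_usage_ns(pid):
    # Cumulative CPU time in nanoseconds from the kernel's own accounting.
    # Unlike summing CPU times over a traversed process tree, this counter
    # never decreases while the cgroup exists.
    with open("/proc/%d/cgroup" % pid) as f:
        path = cgroup_of(f.read())
    with open("/sys/fs/cgroup/cpuacct%s/cpuacct.usage" % path) as f:
        return int(f.read())
```

Sampling `container_cpu_usage_ns` for the docker container's pid at two points in time and dividing the delta by the wall-clock interval gives a CPU-usage figure that cannot go backwards the way the process-tree sums in the report did.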
[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results
[ https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537754#comment-14537754 ] Ian Babrou commented on MESOS-2479: --- This works as expected for me. Can you submit a patch? I can do it for you, since this issue is driving me nuts.

Task filter input disappears entirely once the search query yields no results
-

Key: MESOS-2479
URL: https://issues.apache.org/jira/browse/MESOS-2479
Project: Mesos
Issue Type: Bug
Components: webui
Reporter: Joe Lee
Assignee: Joe Lee
Priority: Minor
Original Estimate: 1h
Remaining Estimate: 1h

The search filter at the head of each table on the Web UI disappears as soon as your search token yields no results, making it impossible to edit your search without refreshing the entire page. This looks to be a simple fix to the hide directive in the table header: the behavior was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299, and the fix undoes that change, as it seems erroneous.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-2714) Memory limit for docker containers is set inconsistently
[ https://issues.apache.org/jira/browse/MESOS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Babrou closed MESOS-2714. - Resolution: Invalid My bad, I forgot that I only updated 10 hosts out of 121 to 0.22.1 to see how it goes. 0.22.1 is definitely better :) Memory limit for docker containers is set inconsistently Key: MESOS-2714 URL: https://issues.apache.org/jira/browse/MESOS-2714 Project: Mesos Issue Type: Bug Components: docker, slave Affects Versions: 0.22.1 Reporter: Ian Babrou I launched 120 docker containers on unique nodes with marathon and monitoring said that they have different memory limits. Memory limit in marathon is set to 64mb, but 9 of 120 slaves reported limit of 96mb. Slaves are identical in terms of hardware and mesos slave versions. I read stats from docker stats api, not from cgroup file. It turned out, that some tasks were started with memory limit of 64mb and some with 96mb. The ones with 64mb were increased to 96mb: I0510 15:29:26.530024 41390 docker.cpp:1298] Updated 'cpu.shares' to 307 at /sys/fs/cgroup/cpu/docker/b020fd33df578a9287b25886b7d9de52353fa943a6c384c4303f8bb552f377cd for container 1e8c9f99-8519-4e35-bee6-69072f357c5e I0510 15:29:26.530828 41390 docker.cpp:1359] Updated 'memory.limit_in_bytes' to 96MB at /sys/fs/cgroup/memory/docker/b020fd33df578a9287b25886b7d9de52353fa943a6c384c4303f8bb552f377cd for container 1e8c9f99-8519-4e35-bee6-69072f357c5e In the end all tasks had 96mb limit in cgroup file, but memory limit reported by docker was different. I think that the limit should be set consistently and all slaves should behave identically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538339#comment-14538339 ] Adam Avilla commented on MESOS-2588: +1. I think this would be an excellent feature. Let me know how I can help get this going and exposed through Marathon / Chronos.

Create pre-create hook before a Docker container launches
-

Key: MESOS-2588
URL: https://issues.apache.org/jira/browse/MESOS-2588
Project: Mesos
Issue Type: Bug
Components: docker
Reporter: Timothy Chen
Assignee: haosdent

To support custom actions before a docker container launches, we should create an extensible hook that allows modules/hooks to be run before the container is launched.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches
[ https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538355#comment-14538355 ] haosdent commented on MESOS-2588: - Hi, [~hekaldama]. Timothy Chen says they still have some non-trivial docker changes pending in Mesos, so he suggested I start this issue after they finish submitting those patches. When I start and finish this issue, I will let you know. Thank you.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2690) --enable-optimize build fails with maybe-uninitialized
[ https://issues.apache.org/jira/browse/MESOS-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2690: -- Assignee: Joris Van Remoortere (was: Vinod Kone)

--enable-optimize build fails with maybe-uninitialized
--

Key: MESOS-2690
URL: https://issues.apache.org/jira/browse/MESOS-2690
Project: Mesos
Issue Type: Bug
Components: build
Environment: GCC 4.8 - 4.9
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Blocker
Fix For: 0.23.0

When building with the --enable-optimize flag, the build fails with 'maybe-uninitialized' errors. This is due to a bug in GCC that triggers false positives for this warning when building optimized code. Please see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59970 We can disable this warning when using GCC with --enable-optimize. A quick work-around until there is a patch:

../configure CXXFLAGS=-Wno-maybe-uninitialized your-other-flags-here

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2226: --- Story Points: 5

HookTest.VerifySlaveLaunchExecutorHook is flaky
---

Key: MESOS-2226
URL: https://issues.apache.org/jira/browse/MESOS-2226
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
Labels: flaky, flaky-test

Observed this on internal CI
{code}
[ RUN ] HookTest.VerifySlaveLaunchExecutorHook
Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms
I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns
I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns
I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns
I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns
I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned
I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery
I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status
I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
I0114 18:51:34.669117 4735 recover.cpp:195] Received a recover response from a replica in EMPTY status
I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING
I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns
I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING
I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status
I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status
I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING
I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns
I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING
I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group
I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated
I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018
I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register
I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register
I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled
I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given
I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process
I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master!
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar
I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar
I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer
I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1
I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns
I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1
I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position
I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2
I0114 18:51:34.734076 4734 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 87441ns
I0114 18:51:34.734441 4734 replica.cpp:679] Persisted action at 0
I0114 18:51:34.740272 4739 replica.cpp:511] Replica received write request for position 0
I0114 18:51:34.740910 4739 leveldb.cpp:438] Reading position from leveldb took 59846ns
I0114 18:51:34.741672 4739 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 189259ns
I0114 18:51:34.741919 4739 replica.cpp:679] Persisted action at 0
I0114 18:51:34.743000 4739 replica.cpp:658] Replica received learned notice for
[jira] [Updated] (MESOS-2649) Implement Resource Estimator
[ https://issues.apache.org/jira/browse/MESOS-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2649: -- Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Q2 Sprint 2) Implement Resource Estimator Key: MESOS-2649 URL: https://issues.apache.org/jira/browse/MESOS-2649 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Jie Yu Labels: twitter Resource estimator is the component in the slave that estimates the amount of oversubscribable resources. This needs to be integrated with the slave and resource monitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2691) Update Resource message to include revocable resources
[ https://issues.apache.org/jira/browse/MESOS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2691: -- Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Q2 Sprint 2) Update Resource message to include revocable resources -- Key: MESOS-2691 URL: https://issues.apache.org/jira/browse/MESOS-2691 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone Labels: twitter Need to update Resource message with a new subtype that indicates that the resource is revocable. It might also need to specify why it is revocable (e.g., oversubscribed). Also need to make sure all the operations on Resource(s) takes this new message into account. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2665) Fix queuing discipline wrapper in linux/routing/queueing
[ https://issues.apache.org/jira/browse/MESOS-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2665: -- Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Q2 Sprint 2) Fix queuing discipline wrapper in linux/routing/queueing - Key: MESOS-2665 URL: https://issues.apache.org/jira/browse/MESOS-2665 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett Assignee: Paul Brett Priority: Critical qdisc search function is dependent on matching a single hard coded handle and does not correctly test for interface, making the implementation fragile. Additionally, the current setup scripts (using dynamically created shell commands) do not match the hard coded handles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2693) Printing a resource should show information about reservation, disk etc
[ https://issues.apache.org/jira/browse/MESOS-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2693: -- Sprint: Twitter Q2 Sprint 3 - 5/11 Printing a resource should show information about reservation, disk etc --- Key: MESOS-2693 URL: https://issues.apache.org/jira/browse/MESOS-2693 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: brian wickman Labels: twitter While new fields like DiskInfo and ReservationInfo have been added to Resource protobuf, the output stream operator hasn't been updated to show these. This is valuable information to have in the logs during debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework
[ https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2697: -- Story Points: 2 Add a /teardown endpoint on master to teardown a framework -- Key: MESOS-2697 URL: https://issues.apache.org/jira/browse/MESOS-2697 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone We plan to rename /shutdown endpoint to /teardown to be compatible with the new API. /shutdown will be deprecated in 0.24.0 or later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2715) Python egg build breakage
Greg Bowyer created MESOS-2715: -- Summary: Python egg build breakage

Key: MESOS-2715
URL: https://issues.apache.org/jira/browse/MESOS-2715
Project: Mesos
Issue Type: Bug
Components: build, python api
Reporter: Greg Bowyer
Priority: Minor

Essentially a small build fix: the Python setup.py for the native code does not add -std=c++11 to its compiler flags. This is probably a dup. The fix is here for the interested: https://github.com/apache/mesos/pull/42

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework
[ https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2697: -- Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Q2 Sprint 2) Add a /teardown endpoint on master to teardown a framework -- Key: MESOS-2697 URL: https://issues.apache.org/jira/browse/MESOS-2697 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone We plan to rename /shutdown endpoint to /teardown to be compatible with the new API. /shutdown will be deprecated in 0.24.0 or later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2422) Use fq_codel qdisc for egress network traffic isolation
[ https://issues.apache.org/jira/browse/MESOS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2422: -- Sprint: Twitter Mesos Q1 Sprint 4, Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Mesos Q1 Sprint 4, Twitter Q2 Sprint 2) Use fq_codel qdisc for egress network traffic isolation --- Key: MESOS-2422 URL: https://issues.apache.org/jira/browse/MESOS-2422 Project: Mesos Issue Type: Task Reporter: Cong Wang Assignee: Cong Wang Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results
[ https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538357#comment-14538357 ] Ian Babrou commented on MESOS-2479: --- There you go: https://reviews.apache.org/r/34048/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2489) Enable a framework to perform reservation operations.
[ https://issues.apache.org/jira/browse/MESOS-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2489: Shepherd: Jie Yu Enable a framework to perform reservation operations. - Key: MESOS-2489 URL: https://issues.apache.org/jira/browse/MESOS-2489 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere h3. Goal This is the first step to supporting dynamic reservations. The goal of this task is to enable a framework to reply to a resource offer with *Reserve* and *Unreserve* offer operations as defined by {{Offer::Operation}} in {{mesos.proto}}. h3. Overview It's divided into a few subtasks so that it's clear what the small chunks to be addressed are. In summary, we need to introduce the {{Resource::ReservationInfo}} protobuf message to encapsulate the reservation information, enable the C++ {{Resources}} class to handle it then enable the master to handle reservation operations. h3. Expected Outcome * The framework will be able to send back reservation operations to (un)reserve resources. * The reservations are kept only in the master since we don't send the {{CheckpointResources}} message to checkpoint the reservations on the slave yet. * The reservations are considered to be reserved for the framework's role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?
[ https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538723#comment-14538723 ] Benjamin Mahler commented on MESOS-2598: [~marco-mesos] when marking as fixed, can you please include the commit that fixed it and the fix version? It's very helpful to have for posterity :)

Slave state.json frameworks.executors.queued_tasks wrong format?

Key: MESOS-2598
URL: https://issues.apache.org/jira/browse/MESOS-2598
Project: Mesos
Issue Type: Bug
Components: statistics
Affects Versions: 0.22.0
Environment: Linux version 3.10.0-229.1.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015
Reporter: Matthias Veit
Priority: Minor
Labels: newbie

queued_tasks.executor_id is expected to be a string and not a complete json object. It should have the very same format as the tasks array on the same level. Example, directly taken from the slave:
{noformat}
"queued_tasks": [
  {
    "data": "",
    "executor_id": {
      "command": {
        "argv": [],
        "uris": [
          {
            "executable": false,
            "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
          }
        ],
        "value": "cd storm-mesos* python bin/storm supervisor storm.mesos.MesosSupervisor"
      },
      "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
      "executor_id": "stage-ingestion-stats-slave-111-1428421145",
      "framework_id": "20150401-160104-251662508-5050-2197-0002",
      "name": "",
      "resources": {
        "cpus": 0.5,
        "disk": 0,
        "mem": 1000
      }
    },
    "id": "srv4.hw.ca1.foo.com-31708",
    "name": "worker srv4.hw.ca1.foo.com:31708",
    "resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 5120,
      "ports": "[31708-31708]"
    },
    "slave_id": "20150327-025553-218108076-5050-4122-S0"
  },
  ...
]
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
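Given the shape reported above, a quick way to flag the malformed entries is to walk the slave's state.json and check the type of each queued task's executor_id. A sketch, assuming the frameworks / executors / queued_tasks nesting shown in the report (the function name and sample ids are illustrative):

```python
import json

def malformed_queued_tasks(state):
    # Return the ids of queued tasks whose executor_id is a JSON object
    # rather than the expected plain string.
    bad = []
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("queued_tasks", []):
                if not isinstance(task.get("executor_id"), str):
                    bad.append(task.get("id"))
    return bad

state = json.loads("""
{"frameworks": [{"executors": [{"queued_tasks": [
  {"id": "srv4.hw.ca1.foo.com-31708", "executor_id": {"name": ""}},
  {"id": "ok-task", "executor_id": "stage-ingestion-stats-slave-111"}
]}]}]}
""")
print(malformed_queued_tasks(state))  # prints ['srv4.hw.ca1.foo.com-31708']
```

Feeding the real endpoint output (e.g. the body of GET /state.json) through this check would show exactly which queued tasks carry the object-valued executor_id.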
[jira] [Commented] (MESOS-2372) Test suite for verifying compatibility between Mesos components
[ https://issues.apache.org/jira/browse/MESOS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538515#comment-14538515 ] Marco Massenzio commented on MESOS-2372: [~karya] This needs promoting to an epic, with stories created for it. The main story being worked on will be placed back in the sprint and given 8 points (or whatever is appropriate). Test suite for verifying compatibility between Mesos components --- Key: MESOS-2372 URL: https://issues.apache.org/jira/browse/MESOS-2372 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Kapil Arya While our current unit/integration test suite catches functional bugs, it doesn't catch compatibility bugs (e.g., MESOS-2371). This is really crucial to provide operators the ability to do seamless upgrades on live clusters. We should have a test suite / framework (ideally running in CI, vetting each review on RB) that tests upgrade paths between master, slave, scheduler and executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2069) Basic fetcher cache functionality
[ https://issues.apache.org/jira/browse/MESOS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2069: --- Story Points: 8 (was: 1) Basic fetcher cache functionality - Key: MESOS-2069 URL: https://issues.apache.org/jira/browse/MESOS-2069 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Labels: fetcher, slave Original Estimate: 48h Remaining Estimate: 48h Add a flag to CommandInfo URI protobufs that indicates that files downloaded by the fetcher shall be cached in a repository. To be followed by MESOS-2057 for concurrency control. Also see MESOS-336 for the overall goals for the fetcher cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2072) Fetcher cache eviction
[ https://issues.apache.org/jira/browse/MESOS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2072: --- Story Points: 8 (was: 3) Fetcher cache eviction -- Key: MESOS-2072 URL: https://issues.apache.org/jira/browse/MESOS-2072 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 336h Remaining Estimate: 336h Delete files from the fetcher cache so that a given cache size is never exceeded. Succeed in doing so while concurrent downloads are underway and new requests are pouring in. Idea: measure the size of each download before it begins and make enough room before the download starts. This means that only download mechanisms that divulge the size before the main download will be supported. As far as we know, those in use so far have this property. The calculation of how much space to free needs to be under concurrency control, accumulating all space needed for competing, incomplete download requests. (The Python script that performs fetcher caching for Aurora does not seem to implement this. See https://gist.github.com/zmanji/f41df77510ef9d00265a; imagine several of these programs running concurrently, each one's _cache_eviction() call succeeding, each perceiving the SAME free space being available.) Ultimately, a conflict resolution strategy is needed if just the downloads underway already exceed the cache capacity. Then, as a fallback, direct download into the work directory will be used for some tasks. TBD how to pick which task gets treated how. At first, only support copying of any downloaded files to the work directory for task execution. This isolates the task life cycle after starting a task from cache eviction considerations. (Later, we can add symbolic links that avoid copying. But then eviction of fetched files used by ongoing tasks must be blocked, which adds complexity. Another future extension is MESOS-1667, Extract from URI while downloading into work dir.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
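The space-accounting idea above can be sketched as follows: every download reserves its known-in-advance size *before* it starts, so concurrent requests cannot all observe the same free space (the flaw noted for the Aurora gist), and eviction removes least-recently-used completed entries. This is an illustrative single-threaded model of the accounting rule, not the Mesos implementation; all names are made up.

```python
from collections import OrderedDict

class CacheAccounting:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = 0                 # bytes reserved for in-flight downloads
        self.entries = OrderedDict()     # path -> size, in LRU order

    def used(self):
        # In-flight reservations count against capacity, so competing,
        # incomplete requests are all accumulated.
        return self.pending + sum(self.entries.values())

    def reserve(self, size):
        """Evict completed entries until `size` fits; False if impossible."""
        if size > self.capacity:
            return False                 # fall back to direct download
        while self.used() + size > self.capacity:
            if not self.entries:
                return False             # in-flight downloads already fill cache
            self.entries.popitem(last=False)  # evict the LRU completed entry
        self.pending += size
        return True

    def complete(self, path, size):
        self.pending -= size
        self.entries[path] = size

cache = CacheAccounting(capacity=100)
print(cache.reserve(60))             # True: first download fits
print(cache.reserve(60))             # False: in-flight reservation is counted
cache.complete("/cache/a.tgz", 60)
print(cache.reserve(60))             # True: the completed entry gets evicted
```

The second `reserve` failing is exactly the case the ticket calls out: two concurrent `_cache_eviction()`-style checks must not both succeed against the same free space.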
[jira] [Assigned] (MESOS-2650) Modularize the Resource Estimator
[ https://issues.apache.org/jira/browse/MESOS-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen reassigned MESOS-2650: - Assignee: Niklas Quarfot Nielsen Modularize the Resource Estimator - Key: MESOS-2650 URL: https://issues.apache.org/jira/browse/MESOS-2650 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere Modularizing the resource estimator opens the door for org-specific implementations. Test the estimator module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2074) Fetcher cache test fixture
[ https://issues.apache.org/jira/browse/MESOS-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2074: --- Story Points: 5 (was: 1) Fetcher cache test fixture -- Key: MESOS-2074 URL: https://issues.apache.org/jira/browse/MESOS-2074 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 72h Remaining Estimate: 72h To accelerate providing good test coverage for the fetcher cache (MESOS-336), we can provide a framework that canonicalizes creating and running a number of tasks and allows easy parametrization with combinations of the following:
- whether to cache or not
- whether to make what has been downloaded executable or not
- whether to extract from an archive or not
- whether to download from a file system, HTTP, or...
We can create a simple HTTP server in the test fixture to support the latter. Furthermore, the tests need to be robust w.r.t. varying numbers of StatusUpdate messages. An accumulating update message sink that reports the final state is needed. All this has already been programmed in this patch; it just needs to be rebased: https://reviews.apache.org/r/21316/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
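The parametrization described above is a cross product of the listed dimensions. A minimal sketch of generating those combinations (dimension names are paraphrased from the ticket; the structure, not Mesos test code, is the point):

```python
from itertools import product

# Dimensions the fixture should vary, per the ticket text.
DIMENSIONS = {
    "cache": [True, False],        # whether to cache or not
    "executable": [True, False],   # whether to make the download executable
    "extract": [True, False],      # whether to extract from an archive
    "scheme": ["file", "http"],    # where to download from
}

# One dict per test case, covering every combination.
combinations = [dict(zip(DIMENSIONS, values))
                for values in product(*DIMENSIONS.values())]
print(len(combinations))  # 16 parameter sets: 2 * 2 * 2 * 2
```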
[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache
[ https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2057: --- Story Points: 8 (was: 13) Concurrency control for fetcher cache - Key: MESOS-2057 URL: https://issues.apache.org/jira/browse/MESOS-2057 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 96h Remaining Estimate: 96h Having added a URI flag to CommandInfo messages (in MESOS-2069) that indicates caching files downloaded by the fetcher in a repository, now ensure that when a URI is cached, it is only ever downloaded once for the same user on the same slave as long as the slave keeps running. This even holds if multiple tasks request the same URI concurrently. If multiple requests for the same URI occur, perform only one of them and reuse the result. Make concurrent requests for the same URI wait for the one download. Different URIs from different CommandInfos can be downloaded concurrently. No cache eviction, cleanup or failover will be handled for now. Additional tickets will be filed for these enhancements. (So don't use this feature in production until the whole epic is complete.) Note that implementing this does not suffice for production use. This ticket contains the main part of the fetcher logic, though. See the epic MESOS-336 for the rest of the features that lead to a fully functional fetcher cache. The proposed general approach is to keep all bookkeeping about what is in which stage of being fetched and where it resides in the slave's MesosContainerizerProcess, so that all concurrent access is disambiguated and controlled by an actor (aka libprocess process). Depends on MESOS-2056 and MESOS-2069. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
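The "download once, share the result" rule above can be sketched with a per-(user, URI) future table: the first request starts the download, and concurrent requests for the same URI wait on the same future. In Mesos the bookkeeping would live in one actor (MesosContainerizerProcess); in this illustrative Python model a lock plays that role, and the fake `download` function just maps a URI to a cache path.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class Fetcher:
    def __init__(self, download):
        self.download = download   # function(uri) -> local cache path
        self.inflight = {}         # (user, uri) -> Future for that download
        self.lock = threading.Lock()
        self.downloads = 0         # how many real downloads happened

    def fetch(self, user, uri):
        with self.lock:            # stands in for the single bookkeeping actor
            future = self.inflight.get((user, uri))
            starter = future is None
            if starter:            # first request for this (user, uri)
                future = Future()
                self.inflight[(user, uri)] = future
        if starter:
            self.downloads += 1    # only the starter thread reaches this
            future.set_result(self.download(uri))
        return future.result()     # everyone waits on the one download

fetcher = Fetcher(lambda uri: "/cache/" + uri.rsplit("/", 1)[-1])
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(lambda _: fetcher.fetch("alice", "http://x/a.tgz"),
                          range(4)))
print(fetcher.downloads)  # 1: four concurrent requests, one download
```

Different (user, URI) keys get independent futures, so unrelated downloads still proceed concurrently, matching the ticket.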
[jira] [Updated] (MESOS-2070) Implement simple slave recovery behavior for fetcher cache
[ https://issues.apache.org/jira/browse/MESOS-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2070: --- Story Points: 2 (was: 1) Implement simple slave recovery behavior for fetcher cache -- Key: MESOS-2070 URL: https://issues.apache.org/jira/browse/MESOS-2070 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Labels: newbie Original Estimate: 6h Remaining Estimate: 6h Clean the fetcher cache completely upon slave restart/recovery. This implements correct, albeit not ideal behavior. More efficient schemes that restore knowledge about cached files or even resume downloads can be added later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2551) C++ Scheduler library should send Call messages to Master
[ https://issues.apache.org/jira/browse/MESOS-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Jimenez updated MESOS-2551: -- Story Points: 8 C++ Scheduler library should send Call messages to Master - Key: MESOS-2551 URL: https://issues.apache.org/jira/browse/MESOS-2551 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Isabel Jimenez Currently, the C++ library sends different messages to Master instead of a single Call message. To vet the new Call API it should send Call messages. Master should be updated to handle all types of Calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache
[ https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2057: --- Story Points: 13 (was: 2) Concurrency control for fetcher cache - Key: MESOS-2057 URL: https://issues.apache.org/jira/browse/MESOS-2057 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Original Estimate: 96h Remaining Estimate: 96h -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master
[ https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Jimenez updated MESOS-2293: -- Sprint: Mesosphere Q1 Sprint 9 - 5/15 Story Points: 13 Implement the Call endpoint on master - Key: MESOS-2293 URL: https://issues.apache.org/jira/browse/MESOS-2293 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Isabel Jimenez Labels: mesosphere, twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2650) Modularize the Resource Estimator
[ https://issues.apache.org/jira/browse/MESOS-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2650: -- Story Points: 3 (was: 5) Modularize the Resource Estimator - Key: MESOS-2650 URL: https://issues.apache.org/jira/browse/MESOS-2650 Project: Mesos Issue Type: Task Reporter: Vinod Kone Labels: mesosphere Modularizing the resource estimator opens the door for org-specific implementations. Test the estimator module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2651) Implement QoS controller
[ https://issues.apache.org/jira/browse/MESOS-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen reassigned MESOS-2651: - Assignee: Niklas Quarfot Nielsen Implement QoS controller Key: MESOS-2651 URL: https://issues.apache.org/jira/browse/MESOS-2651 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere This is a component of the slave that informs the slave about the possible corrections that need to be performed (e.g., shut down a container using revocable resources). This needs to be integrated with the resource monitor. We need to figure out the metrics used for sending corrections (e.g., scheduling latency, usage, informed by executor/scheduler). We also need to figure out the feedback loop between the QoS controller and the Resource Estimator.
{code}
class QoSController
{
public:
  QoSController(ResourceMonitor* monitor);

  process::Queue<QoSCorrection> correction();
};
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace
[ https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538957#comment-14538957 ] Colin Williams commented on MESOS-2516: --- Thank you Move allocation-related types to mesos::master namespace Key: MESOS-2516 URL: https://issues.apache.org/jira/browse/MESOS-2516 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Assignee: Colin Williams Priority: Minor Labels: easyfix, newbie {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the {{master::allocator}} namespace. This is not consistent with the rest of the codebase: {{Isolator}}, {{Fetcher}}, {{Containerizer}} all live in the {{slave}} namespace. Namespace {{allocator}} should be killed for consistency. Since sorters are poorly named, they should be renamed (or namespaced) prior to this change in order not to pollute the {{master}} namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2516) Move allocation-related types to mesos::master namespace
[ https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Williams reassigned MESOS-2516: - Assignee: Colin Williams Move allocation-related types to mesos::master namespace Key: MESOS-2516 URL: https://issues.apache.org/jira/browse/MESOS-2516 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Assignee: Colin Williams Priority: Minor Labels: easyfix, newbie -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-354) Oversubscribe resources
[ https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lambert updated MESOS-354: Epic Colour: ghx-label-4 Oversubscribe resources --- Key: MESOS-354 URL: https://issues.apache.org/jira/browse/MESOS-354 Project: Mesos Issue Type: Epic Components: isolation, master, slave Reporter: brian wickman Priority: Minor Labels: mesosphere, twitter Attachments: mesos_virtual_offers.pdf This proposal is predicated upon offer revocation. The idea would be to add a new revoked status either by (1) piggybacking off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a new status update TASK_REVOKED. In order to augment an offer with metadata about revocability, there are options: 1) Add a revocable boolean to the Offer and a) offer only one type of Offer per slave at a particular time b) offer both revocable and non-revocable resources at the same time but require frameworks to understand that Offers can contain overlapping resources 2) Add a revocable_resources field on the Offer which is a superset of the regular resources field. By consuming resources <= revocable_resources in a launchTask, the Task becomes a revocable task. If launching a task with resources, the Task is non-revocable. The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) and non-revocable tasks are online higher-SLA tasks (e.g. services.) Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk. One of these resources is a rate (4 cpu seconds per second) and two of them are fixed values (8GB and 20GB respectively, though disk resources can be further broken down into spindles - fixed - and iops - a rate.) In practice, these are the maximum resources in the respective dimensions that this task will use. In reality, we provision tasks at some factor below peak, and only hit peak resource consumption in rare circumstances or perhaps at a diurnal peak.
In the meantime, we stand to gain from offering some constant factor of the difference between (reserved - actual) of non-revocable tasks as revocable resources, depending upon our tolerance for revocable task churn. The main challenge is coming up with an accurate short / medium / long-term prediction of resource consumption based upon current behavior. In many cases it would be OK to be sloppy: * CPU / iops / network IO are rates (compressible) and can often be OK below guarantees for brief periods of time while task revocation takes place * Memory slack can be provided by enabling swap and dynamically setting swap paging boundaries. Should swap ever be activated, that would be a signal to revoke. The master / allocator would piggyback on the slave heartbeat mechanism to learn of the amount of revocable resources available at any point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
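The sizing rule sketched in the proposal — offer some constant factor of the (reserved - actual) slack of non-revocable tasks as revocable resources — reduces to a few lines of arithmetic. This is an illustrative sketch; the 0.8 safety factor and the function name are made up, not from the proposal.

```python
def revocable_cpus(tasks, factor=0.8):
    """tasks: list of (reserved_cpus, used_cpus) pairs for non-revocable tasks.

    Returns the revocable CPU offer: factor * sum of per-task slack,
    where a task using more than its reservation contributes no slack.
    """
    slack = sum(max(reserved - used, 0.0) for reserved, used in tasks)
    return factor * slack

# A task reserved at 4 cores but using 1.5 contributes 2.5 cores of slack;
# a task at its reservation contributes none.
print(revocable_cpus([(4.0, 1.5), (2.0, 2.0)]))  # 0.8 * 2.5 = 2.0
```

The `factor` knob is exactly the "tolerance for revocable task churn" trade-off: a higher factor offers more revocable capacity but revokes more often when usage spikes back toward the reservation.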
[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhe updated MESOS-2706: Comment: was deleted (was: waiting for u ...) When the docker-tasks grow, the time spare between Queuing task and Starting container grows Key: MESOS-2706 URL: https://issues.apache.org/jira/browse/MESOS-2706 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 0.22.0 Environment: My environment info: Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server. Every docker-task requires 0.02 CPU and 128MB, and the server has 8 CPUs and 24G mem, so Mesos can launch thousands of tasks in theory. The docker-task is a very light-weight one that launches an sshd service. Reporter: chenqiuhao At the beginning, Marathon can launch docker-tasks very fast, but when the number of tasks on the single mesos-slave host reached 50, Marathon seemed to launch docker-tasks slowly. So I checked the mesos-slave log, and I found that the time spare between Queuing task and Starting container grew.
For example, launching the 1st docker task takes about 0.008s:
[root@CNSH231434 mesos-slave]# tail -f slave.out | egrep 'Queuing task|Starting container'
I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-
I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-'
Launching the 50th docker task takes about 4.9s:
I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-
I0508 16:12:15.801503 225778 docker.cpp:581] Starting container '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-'
And launching the 100th docker task takes about 13s! I did the same test on a host with 24 CPUs and 256G mem and got the same result. Has anybody had the same experience, or can anyone help run the same pressure test? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
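The gap the reporter measures can be computed directly from the glog timestamps of a 'Queuing task' / 'Starting container' pair. A small illustrative helper (it assumes both lines fall on the same day, as in the excerpt above; the truncated log lines here are shortened copies of the real ones):

```python
from datetime import datetime

def queue_to_start_seconds(queuing_line, starting_line):
    """Seconds between a 'Queuing task' and a 'Starting container' log line."""
    def stamp(line):
        # glog format: I0508 15:54:00.188350 <tid> <file:line>] message...
        return datetime.strptime(line.split()[1], "%H:%M:%S.%f")
    return (stamp(starting_line) - stamp(queuing_line)).total_seconds()

queuing = "I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task ..."
starting = "I0508 15:54:00.196832 225781 docker.cpp:581] Starting container ..."
print(queue_to_start_seconds(queuing, starting))  # 0.008482
```

Applied to the 50th-task pair above (16:12:10.908596 to 16:12:15.801503), the same helper gives the roughly 4.9s the reporter observed.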
[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539189#comment-14539189 ] yuhe commented on MESOS-2706: - waiting for u ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539190#comment-14539190 ] yuhe commented on MESOS-2706: - waiting for u ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539188#comment-14539188 ] yuhe commented on MESOS-2706: - waiting for u ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows
[ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhe updated MESOS-2706: Comment: was deleted (was: waiting for u ...) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1375) Log rotation capable
[ https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538880#comment-14538880 ] Benjamin Mahler commented on MESOS-1375: [~stevenschlansker] Sorry to hear that this is affecting you negatively; any reason you're not already using a tool like {{logrotate}}? Is there something preventing you from doing so? I assume you have to rotate logs for the rest of the daemons running on your system; can that log rotation tooling be re-used for mesos? Log rotation capable Key: MESOS-1375 URL: https://issues.apache.org/jira/browse/MESOS-1375 Project: Mesos Issue Type: Improvement Components: master, slave Affects Versions: 0.18.0 Reporter: Damien Hardy Labels: ops, twitter Please provide a way to let ops manage logs. A log4j-like configuration would be hard, but at least make rotation possible without restarting the service. Basing it on the external logrotate tool would be great: * write to a constant log file name * check for file change (recreated by logrotate) before write -- This message was sent by Atlassian JIRA (v6.3.4#6332)
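For reference, a minimal {{logrotate}} configuration sketch of the approach suggested above (the path and rotation policy here are assumptions, not Mesos defaults); {{copytruncate}} keeps the file name constant so the daemon never has to reopen its log:

```
/var/log/mesos/mesos-slave.* {
  daily
  rotate 7
  compress
  missingok
  notifempty
  copytruncate
}
```

Note that {{copytruncate}} can lose a few log lines written between the copy and the truncate, which is the usual trade-off for not signalling the daemon.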
[jira] [Updated] (MESOS-2636) Segfault in inline Try&lt;IP&gt; getIP(const std::string&amp; hostname, int family)
[ https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-2636: -- Description: We saw a segfault in production. Attaching the coredump, we see:

Core was generated by `/usr/local/sbin/mesos-slave --port=5051 --resources=cpus:23;mem:70298;ports:[31'.
Program terminated with signal 11, Segmentation fault.
#0 0x7f639867c77e in free () from /lib64/libc.so.6
(gdb) bt
#0 0x7f639867c77e in free () from /lib64/libc.so.6
#1 0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
#2 0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at ./3rdparty/stout/include/stout/net.hpp:201
#3 0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf expression opcode 0xf3) at src/process.cpp:837
#4 0x0042342f in main ()

was: We saw a segfault in production. Attaching the coredump, we see:

Core was generated by `/usr/local/sbin/mesos-slave --port=5051 --resources=cpus:23;mem:70298;ports:[31'.
Program terminated with signal 11, Segmentation fault.
#0 0x7f639867c77e in free () from /lib64/libc.so.6
(gdb) bt
#0 0x7f639867c77e in free () from /lib64/libc.so.6
#1 0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
#2 0x7f6399deeafa in net::getIP (hostname=smf1-azc-03-sr2.prod.twitter.com, family=2) at ./3rdparty/stout/include/stout/net.hpp:201
#3 0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf expression opcode 0xf3) at src/process.cpp:837
#4 0x0042342f in main ()

Segfault in inline Try&lt;IP&gt; getIP(const std::string&amp; hostname, int family) - Key: MESOS-2636 URL: https://issues.apache.org/jira/browse/MESOS-2636 Project: Mesos Issue Type: Bug Reporter: Chi Zhang Assignee: Chi Zhang Labels: twitter Fix For: 0.23.0 We saw a segfault in production. Attaching the coredump, we see:

Core was generated by `/usr/local/sbin/mesos-slave --port=5051 --resources=cpus:23;mem:70298;ports:[31'.
Program terminated with signal 11, Segmentation fault.
#0 0x7f639867c77e in free () from /lib64/libc.so.6
(gdb) bt
#0 0x7f639867c77e in free () from /lib64/libc.so.6
#1 0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
#2 0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at ./3rdparty/stout/include/stout/net.hpp:201
#3 0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf expression opcode 0xf3) at src/process.cpp:837
#4 0x0042342f in main ()
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
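The trace bottoms out in freeaddrinfo(), which is consistent with freeing an addrinfo list that getaddrinfo() never allocated (for example, on an error path, or via an uninitialized pointer). A hedged, self-contained sketch of the safe pattern; this is an illustration, not the actual stout fix:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <netdb.h>
#include <arpa/inet.h>

// Resolve 'hostname' to a dotted IPv4 string. freeaddrinfo() is called
// only when getaddrinfo() succeeded; on failure 'result' was never
// allocated, and freeing it is exactly the kind of crash in the
// backtrace above.
bool resolveIPv4(const std::string& hostname, std::string* ip)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof(hints)); // Uninitialized hints is another common pitfall.
  hints.ai_family = AF_INET;

  struct addrinfo* result = NULL;
  int error = getaddrinfo(hostname.c_str(), NULL, &hints, &result);
  if (error != 0) {
    return false; // Do NOT call freeaddrinfo(result) here.
  }

  char buffer[INET_ADDRSTRLEN];
  struct sockaddr_in* addr = (struct sockaddr_in*) result->ai_addr;
  const char* s = inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));
  freeaddrinfo(result); // Free only on the success path.

  if (s == NULL) {
    return false;
  }
  *ip = buffer;
  return true;
}
```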
[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master
[ https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2293: -- Labels: mesosphere (was: mesosphere twitter) Implement the Call endpoint on master - Key: MESOS-2293 URL: https://issues.apache.org/jira/browse/MESOS-2293 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Isabel Jimenez Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2616) Update C++ style guide on variable naming.
[ https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1453#comment-1453 ] Michael Park commented on MESOS-2616: - We'll slowly make progress on this. I mainly wanted to capture my thoughts on this topic as a whole. Update C++ style guide on variable naming. --- Key: MESOS-2616 URL: https://issues.apache.org/jira/browse/MESOS-2616 Project: Mesos Issue Type: Documentation Reporter: Till Toenshoff Assignee: Alexander Rukletsov Priority: Minor Fix For: 0.23.0 Our variable naming guide currently does not really explain the use cases for leading or trailing underscores, which are found a lot within our codebase. We should correct that. The following was copied from the review description to allow discussion where needed: Documents the patterns we use to name variables and function arguments in our codebase.

h4. Leading underscores to avoid ambiguity.

We use this pattern extensively in libprocess, stout and mesos; a few examples below.

* stout/try.hpp:105
{noformat}
Try(State _state, T* _t = NULL, const std::string& _message = "")
  : state(_state), t(_t), message(_message) {}
{noformat}

* process/http.hpp:480
{noformat}
URL(const std::string& _scheme,
    const std::string& _domain,
    const uint16_t _port = 80,
    const std::string& _path = "/",
    const hashmap<std::string, std::string>& _query =
      (hashmap<std::string, std::string>()),
    const Option<std::string>& _fragment = None())
  : scheme(_scheme),
    domain(_domain),
    port(_port),
    path(_path),
    query(_query),
    fragment(_fragment) {}
{noformat}

* slave/containerizer/linux_launcher.cpp:56
{noformat}
LinuxLauncher::LinuxLauncher(
    const Flags& _flags,
    int _namespaces,
    const string& _hierarchy)
  : flags(_flags),
    namespaces(_namespaces),
    hierarchy(_hierarchy) {}
{noformat}

h4. Trailing underscores as prime symbols.

We use this pattern in the code, though not extensively. We would like to see more pass-by-value instead of creating copies from a variable passed by const reference.
* master.cpp:2942
{noformat}
// Create and add the slave id.
SlaveInfo slaveInfo_ = slaveInfo;
slaveInfo_.mutable_id()->CopyFrom(newSlaveId());
{noformat}

* slave.cpp:4180
{noformat}
ExecutorInfo executorInfo_ = executor->info;
Resources resources = executorInfo_.resources();
resources += taskInfo.resources();
executorInfo_.mutable_resources()->CopyFrom(resources);
{noformat}

* status_update_manager.cpp:474
{noformat}
// Bounded exponential backoff.
Duration duration_ =
  std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX);
{noformat}

* containerizer/mesos/containerizer.cpp:109
{noformat}
// Modify the flags to include any changes to isolation.
Flags flags_ = flags;
flags_.isolation = isolation;
{noformat}

h4. Passing arguments by value.

* slave.cpp:2480
{noformat}
void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
{
  ...
  // Set the source before forwarding the status update.
  update.mutable_status()->set_source(
      pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR);
  ...
}
{noformat}

* process/metrics/timer.hpp:103
{noformat}
static void _time(Time start, Timer that)
{
  const Time stop = Clock::now();

  double value;

  process::internal::acquire(&that.data->lock);
  {
    that.data->lastValue = T(stop - start).value();
    value = that.data->lastValue.get();
  }
  process::internal::release(&that.data->lock);

  that.push(value);
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2622) Document the semantic change in decorator return values
[ https://issues.apache.org/jira/browse/MESOS-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2622: -- Story Points: 1 Document the semantic change in decorator return values --- Key: MESOS-2622 URL: https://issues.apache.org/jira/browse/MESOS-2622 Project: Mesos Issue Type: Documentation Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen Labels: mesosphere In order to enable decorator modules to _remove_ metadata (environment variables or labels), we changed the meaning of the return value for decorator hooks. The Result<T> return values mean:

||State||Before||After||
|Error|Error is propagated to the call-site|No change|
|None|The result of the decorator is not applied|No change|
|Some|The result of the decorator is *appended*|The result of the decorator *overwrites* the final labels/environment object|

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
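The |Some| row is the behavioural change. A toy sketch of append vs. overwrite, with a plain std::vector standing in for the repeated Label field of the Labels protobuf (this is an illustration, not the real hook API):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the repeated Label field in the Labels protobuf.
typedef std::vector<std::pair<std::string, std::string> > Labels;

// Old semantics: a Some result from the decorator hook was appended
// to the existing labels, so a module could only add entries.
Labels applyAppend(Labels current, const Labels& decorated)
{
  current.insert(current.end(), decorated.begin(), decorated.end());
  return current;
}

// New semantics: a Some result overwrites the final labels object,
// which is what allows a module to remove metadata.
Labels applyOverwrite(const Labels&, const Labels& decorated)
{
  return decorated;
}
```

Under the old semantics a decorator returning one label on a task with two labels yields three; under the new semantics it yields exactly the one label the decorator returned.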
[jira] [Commented] (MESOS-2645) Design doc for resource oversubscription
[ https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538894#comment-14538894 ] Niklas Quarfot Nielsen commented on MESOS-2645: --- This has been going through a few reviews already - when do we claim success here? :) Design doc for resource oversubscription Key: MESOS-2645 URL: https://issues.apache.org/jira/browse/MESOS-2645 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2645) Design doc for resource oversubscription
[ https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538916#comment-14538916 ] Vinod Kone commented on MESOS-2645: --- Can you update the design doc with the latest thinking? Also, I think the allocator design needs to be fleshed out before we can resolve this ticket. Agreed? Design doc for resource oversubscription Key: MESOS-2645 URL: https://issues.apache.org/jira/browse/MESOS-2645 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2716) Add non-const reference version of Option&lt;T&gt;::get.
Benjamin Mahler created MESOS-2716: -- Summary: Add non-const reference version of Option&lt;T&gt;::get. Key: MESOS-2716 URL: https://issues.apache.org/jira/browse/MESOS-2716 Project: Mesos Issue Type: Improvement Components: stout Reporter: Benjamin Mahler Currently Option only provides a const reference to the underlying object:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  ...
};
{code}

Since we use Option as a replacement for NULL, we often have optional variables that we need to perform non-const operations on. However, this requires taking a copy:

{code}
static void cleanup(const Response& response)
{
  if (response.type == Response::PIPE) {
    CHECK_SOME(response.reader);
    http::Pipe::Reader reader = response.reader.get(); // Remove const.
    reader.close();
  }
}
{code}

Taking a copy is hacky, but works for shared objects and some other copyable objects. Since Option represents a mutable variable, it makes sense to add non-const reference access to the underlying value:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  T& get();
  ...
};
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
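A minimal toy sketch of the proposal, showing how the non-const overload removes the copy (stout's real Option also tracks the NONE state, CHECKs in get(), and so on; this is not its implementation):

```cpp
#include <cassert>
#include <string>

// Toy Option<T> carrying both accessors proposed in this ticket.
template <typename T>
class Option
{
public:
  explicit Option(const T& _t) : t(_t) {}

  const T& get() const { return t; } // Existing const accessor.
  T& get() { return t; }             // Proposed non-const accessor.

private:
  T t;
};
```

With the non-const overload, calls in the style of response.reader.get().close() can mutate the held value directly on a non-const Option, with no copy needed to strip constness.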
[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace
[ https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538751#comment-14538751 ] Colin Williams commented on MESOS-2516: --- I'd like to start working on this issue, but I'm new enough to Jira that I'm not sure how to take the task. Move allocation-related types to mesos::master namespace Key: MESOS-2516 URL: https://issues.apache.org/jira/browse/MESOS-2516 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Priority: Minor Labels: easyfix, newbie {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the {{master::allocator}} namespace. This is not consistent with the rest of the codebase: {{Isolator}}, {{Fetcher}} and {{Containerizer}} all live in the {{slave}} namespace. The {{allocator}} namespace should be killed for consistency. Since sorters are poorly named, they should be renamed (or namespaced) prior to this change in order not to pollute the {{master}} namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1279) Add resize task primitive
[ https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-1279: -- Labels: mesosphere myriad (was: mesosphere) Add resize task primitive - Key: MESOS-1279 URL: https://issues.apache.org/jira/browse/MESOS-1279 Project: Mesos Issue Type: Sub-task Components: c++ api, master, slave Reporter: Niklas Quarfot Nielsen Labels: mesosphere, myriad As mentioned in MESOS-938, one way to support task replacement and scaling could be to split the responsibility into several smaller primitives to 1) reduce complexity, 2) make it easier to comprehend, and 3) make implementation easier and incremental. resizeTask() would be the primitive to either 1) scale a running task's resources down or 2) scale a running task's resources up by using extra auxiliary offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-1739: -- Labels: mesosphere myriad (was: mesosphere) Allow slave reconfiguration on restart -- Key: MESOS-1739 URL: https://issues.apache.org/jira/browse/MESOS-1739 Project: Mesos Issue Type: Epic Reporter: Patrick Reilly Assignee: Cody Maloney Labels: mesosphere, myriad Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-598) Also check 'git diff --shortstat --staged' in post-reviews.py.
[ https://issues.apache.org/jira/browse/MESOS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538865#comment-14538865 ] Benjamin Mahler commented on MESOS-598: --- [~pbrett] can you please include the commit when marking as fixed? It's very helpful for posterity. :) Also check 'git diff --shortstat --staged' in post-reviews.py. -- Key: MESOS-598 URL: https://issues.apache.org/jira/browse/MESOS-598 Project: Mesos Issue Type: Bug Reporter: Benjamin Hindman Assignee: Paul Brett We currently check if you have any changes before we run post-reviews.py, but we don't check for staged changes, which IIUC could get lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?
[ https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538771#comment-14538771 ] Marco Massenzio commented on MESOS-2598: [~bmahler] sure thing - please also be aware that we are doing some cleanup of old stuff, and that information may either not be available or only available with much difficulty. Ideally, people would keep their Jira up to date, but that's not always been the case and we're trying to bring us to a better place. Slave state.json frameworks.executors.queued_tasks wrong format? Key: MESOS-2598 URL: https://issues.apache.org/jira/browse/MESOS-2598 Project: Mesos Issue Type: Bug Components: statistics Affects Versions: 0.22.0 Environment: Linux version 3.10.0-229.1.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015 Reporter: Matthias Veit Priority: Minor Labels: newbie queued_tasks.executor_id is expected to be a string and not a complete json object. It should have the very same format as the tasks array on the same level. Example, directly taken from the slave:

{noformat}
"queued_tasks": [
  {
    "data": "",
    "executor_id": {
      "command": {
        "argv": [],
        "uris": [
          {
            "executable": false,
            "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
          }
        ],
        "value": "cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"
      },
      "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
      "executor_id": "stage-ingestion-stats-slave-111-1428421145",
      "framework_id": "20150401-160104-251662508-5050-2197-0002",
      "name": "",
      "resources": {
        "cpus": 0.5,
        "disk": 0,
        "mem": 1000
      }
    },
    "id": "srv4.hw.ca1.foo.com-31708",
    "name": "worker srv4.hw.ca1.foo.com:31708",
    "resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 5120,
      "ports": "[31708-31708]"
    },
    "slave_id": "20150327-025553-218108076-5050-4122-S0"
  },
  ...
]
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace
[ https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538806#comment-14538806 ] Vinod Kone commented on MESOS-2516: --- Added you to the contributors role. Now you can assign the task to yourself. Move allocation-related types to mesos::master namespace Key: MESOS-2516 URL: https://issues.apache.org/jira/browse/MESOS-2516 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Priority: Minor Labels: easyfix, newbie {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the {{master::allocator}} namespace. This is not consistent with the rest of the codebase: {{Isolator}}, {{Fetcher}} and {{Containerizer}} all live in the {{slave}} namespace. The {{allocator}} namespace should be killed for consistency. Since sorters are poorly named, they should be renamed (or namespaced) prior to this change in order not to pollute the {{master}} namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?
[ https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reopened MESOS-2598: [~marco-mesos] Took a look, this is still an issue on master FWICT: https://github.com/apache/mesos/blob/115db9250e096edf70da54c3f6aea18aefeb1a06/src/slave/http.cpp#L157

{code}
JSON::Object model(const TaskInfo& task)
{
  JSON::Object object;
  object.values["id"] = task.task_id().value();
  object.values["name"] = task.name();
  object.values["slave_id"] = task.slave_id().value();
  object.values["resources"] = model(task.resources());
  object.values["data"] = task.data();

  if (task.has_command()) {
    object.values["command"] = model(task.command());
  }

  if (task.has_executor()) {
    object.values["executor_id"] = model(task.executor()); // XXX Bug here.
  }

  return object;
}
{code}

Slave state.json frameworks.executors.queued_tasks wrong format? Key: MESOS-2598 URL: https://issues.apache.org/jira/browse/MESOS-2598 Project: Mesos Issue Type: Bug Components: statistics Affects Versions: 0.22.0 Environment: Linux version 3.10.0-229.1.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015 Reporter: Matthias Veit Priority: Minor Labels: newbie queued_tasks.executor_id is expected to be a string and not a complete json object. It should have the very same format as the tasks array on the same level. Example, directly taken from the slave:

{noformat}
"queued_tasks": [
  {
    "data": "",
    "executor_id": {
      "command": {
        "argv": [],
        "uris": [
          {
            "executable": false,
            "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
          }
        ],
        "value": "cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"
      },
      "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
      "executor_id": "stage-ingestion-stats-slave-111-1428421145",
      "framework_id": "20150401-160104-251662508-5050-2197-0002",
      "name": "",
      "resources": {
        "cpus": 0.5,
        "disk": 0,
        "mem": 1000
      }
    },
    "id": "srv4.hw.ca1.foo.com-31708",
    "name": "worker srv4.hw.ca1.foo.com:31708",
    "resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 5120,
      "ports": "[31708-31708]"
    },
    "slave_id": "20150327-025553-218108076-5050-4122-S0"
  },
  ...
]
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
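The line marked "XXX Bug here" serializes the whole ExecutorInfo, while the sibling tasks array emits only the id string. A hedged sketch of the shape of such a fix, using stub structs rather than the real mesos protobufs (all names here are hypothetical stand-ins):

```cpp
#include <cassert>
#include <string>

// Minimal stand-ins for the protobuf types that model() consumes.
struct ExecutorID { std::string value; };
struct ExecutorInfo { ExecutorID executor_id; };
struct TaskInfo
{
  bool has_executor;
  ExecutorInfo executor;
};

// Shape of the fix: emit only the executor id string, matching the
// format of the top-level "tasks" array, instead of serializing the
// full ExecutorInfo object.
std::string modelExecutorId(const TaskInfo& task)
{
  return task.has_executor ? task.executor.executor_id.value : "";
}
```

In the real code this would replace model(task.executor()) with something that extracts task.executor().executor_id().value().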
[jira] [Commented] (MESOS-1375) Log rotation capable
[ https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539084#comment-14539084 ] Timothy St. Clair commented on MESOS-1375: -- Now that systemd has pretty much taken over, it seems like folks (myself included) should update to simply take advantage of the journal. Log rotation capable Key: MESOS-1375 URL: https://issues.apache.org/jira/browse/MESOS-1375 Project: Mesos Issue Type: Improvement Components: master, slave Affects Versions: 0.18.0 Reporter: Damien Hardy Labels: ops, twitter Please provide a way to let ops manage logs. A log4j-like configuration would be hard, but at least make rotation possible without restarting the service. Basing it on the external logrotate tool would be great: * write to a constant log file name * check for file change (recreated by logrotate) before write -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1279) Add resize task primitive
[ https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538255#comment-14538255 ] Kannan Rajah commented on MESOS-1279: - This JIRA has not been updated in a year. This feature will benefit Apache Myriad, which launches placeholder tasks to represent the YARN containers. These container sizes will keep varying, so we need a way to resize the placeholder tasks to avoid wasting resources. Currently, we are thinking of a workaround in Myriad itself, but I would like to know if there is a plan for when the feature will be implemented. Add resize task primitive - Key: MESOS-1279 URL: https://issues.apache.org/jira/browse/MESOS-1279 Project: Mesos Issue Type: Sub-task Components: c++ api, master, slave Reporter: Niklas Quarfot Nielsen As mentioned in MESOS-938, one way to support task replacement and scaling could be to split the responsibility into several smaller primitives to 1) reduce complexity, 2) make it easier to comprehend, and 3) make implementation easier and incremental. resizeTask() would be the primitive to either 1) scale a running task's resources down or 2) scale a running task's resources up by using extra auxiliary offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2479) Task filter input disappears entirely once the search query yields no results
[ https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2479: -- Shepherd: Adam B Task filter input disappears entirely once the search query yields no results - Key: MESOS-2479 URL: https://issues.apache.org/jira/browse/MESOS-2479 Project: Mesos Issue Type: Bug Components: webui Reporter: Joe Lee Assignee: Joe Lee Priority: Minor Original Estimate: 1h Remaining Estimate: 1h The search filter at the head of each table on the Web UI disappears as soon as your search token yields no results, making it impossible to edit your search without refreshing the entire page. This looks to be a simple fix to the hide directive in the table header. The bug was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299; this fix undoes that change, as it seems erroneous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results
[ https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538251#comment-14538251 ] Adam B commented on MESOS-2479: --- [~bobrik] Could you please submit the patch? See http://mesos.apache.org/documentation/latest/submitting-a-patch/ You can add Thomas and myself (adam-mesos) as reviewers. Task filter input disappears entirely once the search query yields no results - Key: MESOS-2479 URL: https://issues.apache.org/jira/browse/MESOS-2479 Project: Mesos Issue Type: Bug Components: webui Reporter: Joe Lee Assignee: Joe Lee Priority: Minor Original Estimate: 1h Remaining Estimate: 1h The search filter at the head of each table on the Web UI disappears as soon as your search token yields no results, making it impossible to edit your search without refreshing the entire page. This looks to be a simple fix to the hide directive in the table header. The bug was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299; this fix undoes that change, as it seems erroneous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1279) Add resize task primitive
[ https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-1279: -- Labels: mesosphere (was: ) Add resize task primitive - Key: MESOS-1279 URL: https://issues.apache.org/jira/browse/MESOS-1279 Project: Mesos Issue Type: Sub-task Components: c++ api, master, slave Reporter: Niklas Quarfot Nielsen Labels: mesosphere As mentioned in MESOS-938, one way to support task replacement and scaling could be to split the responsibility into several smaller primitives to 1) reduce complexity, 2) make it easier to comprehend, and 3) make implementation easier and incremental. resizeTask() would be the primitive to either 1) scale a running task's resources down or 2) scale a running task's resources up by using extra auxiliary offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)