[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuhe updated MESOS-2706:

Comment: was deleted

(was: mesos/src/slave/slave.cpp 

  if (!executor->isCommandExecutor()) {
    // If the executor is _not_ a command executor, this means that
    // the task will include the executor to run. The actual task to
    // run will be enqueued and subsequently handled by the executor
    // when it has registered to the slave.
    launch = slave->containerizer->launch(
        containerId,
        executorInfo_, // modified to include the task's resources.
        executor->directory,
        slave->flags.switch_user ? Option<string>(user) : None(),
        slave->info.id(),
        slave->self(),
        info.checkpoint());
  } else {
    // An executor has _not_ been provided by the task and will
    // instead define a command and/or container to run. Right now,
    // these tasks will require an executor anyway and the slave
    // creates a command executor. However, it is up to the
    // containerizer how to execute those tasks and the generated
    // executor info works as a placeholder.
    // TODO(nnielsen): Obsolete the requirement for executors to run
    // one-off tasks.
    launch = slave->containerizer->launch(
        containerId,
        taskInfo,
        executorInfo_,
        executor->directory,
        slave->flags.switch_user ? Option<string>(user) : None(),
        slave->info.id(),
        slave->self(),
        info.checkpoint());
  })

 When the docker-tasks grow, the time spare between Queuing task and Starting 
 container grows
 

 Key: MESOS-2706
 URL: https://issues.apache.org/jira/browse/MESOS-2706
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
 Environment: My environment info:
 Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server.
 Every docker task requires 0.02 CPU and 128MB of memory, and the server has 8 CPUs and 
 24GB of memory.
 So Mesos can launch thousands of tasks in theory.
 And each docker task is a very lightweight sshd service.
Reporter: chenqiuhao

 At the beginning, Marathon could launch docker tasks very fast, but when the 
 number of tasks on the only mesos-slave host reached 50, Marathon seemed 
 to launch docker tasks slowly.
 So I checked the mesos-slave log, and I found that the time gap between 
 Queuing task and Starting container grew.
 For example, 
 launching the 1st docker task takes about 0.008s:
 [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
 task|Starting container'
 I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
 dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 launching the 50th docker task takes about 4.9s:
 I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
 dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
 '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 And when I launched the 100th docker task, it took about 13s!
 I did the same test on a server-host with 24 CPUs and 256GB of memory and got the 
 same result.
 Did somebody have the same experience, or can someone help run the same pressure 
 test?
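 A minimal sketch (an illustrative helper, not part of Mesos) that computes this 
 queue-to-start gap per task from the slave log, assuming only the glog line 
 format shown above:
 {code}
 // queue_gap.cpp: build with any C++11 compiler, run as ./queue_gap < slave.out
 #include <cstdio>
 #include <iostream>
 #include <map>
 #include <regex>
 #include <string>

 // Parses the "HH:MM:SS.ffffff" part of a glog line into seconds since midnight.
 static double parseTime(const std::string& hms) {
   int h = 0, m = 0;
   double s = 0.0;
   std::sscanf(hms.c_str(), "%d:%d:%lf", &h, &m, &s);
   return h * 3600 + m * 60 + s;
 }

 int main() {
   // glog lines look like: I0508 15:54:00.188350 225779 slave.cpp:1378] ...
   std::regex queued("^I\\d{4} ([0-9:.]+) .*Queuing task '([^']+)'");
   std::regex started(
       "^I\\d{4} ([0-9:.]+) .*Starting container '[^']+' for task '([^']+)'");

   std::map<std::string, double> queuedAt;  // task id -> queue timestamp
   std::string line;
   std::smatch m;

   while (std::getline(std::cin, line)) {
     if (std::regex_search(line, m, queued)) {
       queuedAt[m.str(2)] = parseTime(m.str(1));
     } else if (std::regex_search(line, m, started)) {
       auto it = queuedAt.find(m.str(2));
       if (it != queuedAt.end()) {
         std::cout << it->first << ": "
                   << (parseTime(m.str(1)) - it->second) << "s" << std::endl;
       }
     }
   }
 }
 {code}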



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537634#comment-14537634
 ] 

yuhe commented on MESOS-2706:
-

mesos/src/slave/slave.cpp 

  if (!executor->isCommandExecutor()) {
    // If the executor is _not_ a command executor, this means that
    // the task will include the executor to run. The actual task to
    // run will be enqueued and subsequently handled by the executor
    // when it has registered to the slave.
    launch = slave->containerizer->launch(
        containerId,
        executorInfo_, // modified to include the task's resources.
        executor->directory,
        slave->flags.switch_user ? Option<string>(user) : None(),
        slave->info.id(),
        slave->self(),
        info.checkpoint());
  } else {
    // An executor has _not_ been provided by the task and will
    // instead define a command and/or container to run. Right now,
    // these tasks will require an executor anyway and the slave
    // creates a command executor. However, it is up to the
    // containerizer how to execute those tasks and the generated
    // executor info works as a placeholder.
    // TODO(nnielsen): Obsolete the requirement for executors to run
    // one-off tasks.
    launch = slave->containerizer->launch(
        containerId,
        taskInfo,
        executorInfo_,
        executor->directory,
        slave->flags.switch_user ? Option<string>(user) : None(),
        slave->info.id(),
        slave->self(),
        info.checkpoint());
  }

 When the docker-tasks grow, the time spare between Queuing task and Starting 
 container grows
 

 Key: MESOS-2706
 URL: https://issues.apache.org/jira/browse/MESOS-2706
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
 Environment: My environment info:
 Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server.
 Every docker task requires 0.02 CPU and 128MB of memory, and the server has 8 CPUs and 
 24GB of memory.
 So Mesos can launch thousands of tasks in theory.
 And each docker task is a very lightweight sshd service.
Reporter: chenqiuhao

 At the beginning, Marathon could launch docker tasks very fast, but when the 
 number of tasks on the only mesos-slave host reached 50, Marathon seemed 
 to launch docker tasks slowly.
 So I checked the mesos-slave log, and I found that the time gap between 
 Queuing task and Starting container grew.
 For example, 
 launching the 1st docker task takes about 0.008s:
 [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
 task|Starting container'
 I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
 dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 launching the 50th docker task takes about 4.9s:
 I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
 dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
 '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 And when I launched the 100th docker task, it took about 13s!
 I did the same test on a server-host with 24 CPUs and 256GB of memory and got the 
 same result.
 Did somebody have the same experience, or can someone help run the same pressure 
 test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537637#comment-14537637
 ] 

yuhe commented on MESOS-2706:
-

The executor registering with the slave seems to delay the start time. Who can help? Thanks.


 When the docker-tasks grow, the time spare between Queuing task and Starting 
 container grows
 

 Key: MESOS-2706
 URL: https://issues.apache.org/jira/browse/MESOS-2706
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
 Environment: My environment info:
 Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server.
 Every docker task requires 0.02 CPU and 128MB of memory, and the server has 8 CPUs and 
 24GB of memory.
 So Mesos can launch thousands of tasks in theory.
 And each docker task is a very lightweight sshd service.
Reporter: chenqiuhao

 At the beginning, Marathon could launch docker tasks very fast, but when the 
 number of tasks on the only mesos-slave host reached 50, Marathon seemed 
 to launch docker tasks slowly.
 So I checked the mesos-slave log, and I found that the time gap between 
 Queuing task and Starting container grew.
 For example, 
 launching the 1st docker task takes about 0.008s:
 [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
 task|Starting container'
 I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
 dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 launching the 50th docker task takes about 4.9s:
 I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
 dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
 '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 And when I launched the 100th docker task, it took about 13s!
 I did the same test on a server-host with 24 CPUs and 256GB of memory and got the 
 same result.
 Did somebody have the same experience, or can someone help run the same pressure 
 test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2713) Docker resource usage

2015-05-11 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537814#comment-14537814
 ] 

Ian Babrou commented on MESOS-2713:
---

I made an example docker container that exposes the issue: 
https://github.com/bobrik/mesos-wrong-stats

 Docker resource usage 
 --

 Key: MESOS-2713
 URL: https://issues.apache.org/jira/browse/MESOS-2713
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker, isolation
Affects Versions: 0.22.1
Reporter: Ian Babrou

 Looks like resource usage for docker containers on slaves is not very 
 accurate (/monitor/statistics.json). For example, cpu usage is calculated by 
 traversing the process tree and summing up cpu times. The resulting numbers are not 
 even close to real usage; CPU time can even decrease.
 What is the reason for this if you can use cgroup data directly? Reading the 
 cgroup location from the pid of a docker container is pretty straightforward.
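 A minimal sketch of that cgroup-based reading (assuming the cgroup v1 layout; 
 mount paths vary by distro; illustrative code, not Mesos code):
 {code}
 // cgroup_cpu.cpp: ./cgroup_cpu <pid>
 #include <fstream>
 #include <iostream>
 #include <sstream>
 #include <string>

 // Returns the cpuacct cgroup path of `pid` from /proc/<pid>/cgroup,
 // e.g. "/docker/<container-id>" for a docker container.
 static std::string cpuacctCgroup(int pid) {
   std::ifstream in("/proc/" + std::to_string(pid) + "/cgroup");
   std::string line;
   while (std::getline(in, line)) {
     // Lines look like "3:cpuacct,cpu:/docker/<id>".
     std::istringstream fields(line);
     std::string id, subsystems, path;
     std::getline(fields, id, ':');
     std::getline(fields, subsystems, ':');
     std::getline(fields, path);
     if (subsystems.find("cpuacct") != std::string::npos) {
       return path;
     }
   }
   return "";
 }

 int main(int argc, char** argv) {
   if (argc != 2) {
     std::cerr << "usage: " << argv[0] << " <pid>" << std::endl;
     return 1;
   }
   // Cumulative CPU time in nanoseconds, straight from kernel accounting,
   // instead of summing per-process times from the process tree.
   std::ifstream usage(
       "/sys/fs/cgroup/cpuacct" + cpuacctCgroup(std::stoi(argv[1])) +
       "/cpuacct.usage");
   long long nanos = 0;
   usage >> nanos;
   std::cout << "cpu time: " << nanos / 1e9 << "s" << std::endl;
   return 0;
 }
 {code}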
 Another similar question: what is the reason to set isolation to posix 
 instead of cgroups by default? It looks like it suffers from the same issues as 
 the docker containerizer (incorrect stats). More docs on this topic would be 
 great.
 Posix isolation also leads to bigger CPU usage from the mesos slave process 
 (the higher usage is posix isolation): http://i.imgur.com/jepk5m6.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results

2015-05-11 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537754#comment-14537754
 ] 

Ian Babrou commented on MESOS-2479:
---

This works as expected for me. Can you submit a patch? I can do it for you 
since this issue is driving me nuts.

 Task filter input disappears entirely once the search query yields no results
 -

 Key: MESOS-2479
 URL: https://issues.apache.org/jira/browse/MESOS-2479
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Joe Lee
Assignee: Joe Lee
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 The search filter at the head of each table on the Web UI disappears as soon 
 as your search token yields no results, making it impossible to edit your 
 search without having to refresh the entire page.
 This looks to be a simple fix to the hide directive in the table header. The 
 behavior was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299; the fix 
 undoes that change, as it seems erroneous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MESOS-2714) Memory limit for docker containers is set inconsistently

2015-05-11 Thread Ian Babrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Babrou closed MESOS-2714.
-
Resolution: Invalid

My bad, I forgot that I only updated 10 hosts out of 121 to 0.22.1 to see how 
it goes. 0.22.1 is definitely better :)

 Memory limit for docker containers is set inconsistently
 

 Key: MESOS-2714
 URL: https://issues.apache.org/jira/browse/MESOS-2714
 Project: Mesos
  Issue Type: Bug
  Components: docker, slave
Affects Versions: 0.22.1
Reporter: Ian Babrou

 I launched 120 docker containers on unique nodes with marathon, and monitoring 
 said that they have different memory limits.
 The memory limit in marathon is set to 64mb, but 9 of 120 slaves reported a limit 
 of 96mb. The slaves are identical in terms of hardware and mesos slave versions.
 I read stats from the docker stats api, not from the cgroup file. It turned out that 
 some tasks were started with a memory limit of 64mb and some with 96mb. The 
 ones with 64mb were increased to 96mb:
 I0510 15:29:26.530024 41390 docker.cpp:1298] Updated 'cpu.shares' to 307 at 
 /sys/fs/cgroup/cpu/docker/b020fd33df578a9287b25886b7d9de52353fa943a6c384c4303f8bb552f377cd
  for container 1e8c9f99-8519-4e35-bee6-69072f357c5e
 I0510 15:29:26.530828 41390 docker.cpp:1359] Updated 'memory.limit_in_bytes' 
 to 96MB at 
 /sys/fs/cgroup/memory/docker/b020fd33df578a9287b25886b7d9de52353fa943a6c384c4303f8bb552f377cd
  for container 1e8c9f99-8519-4e35-bee6-69072f357c5e
 In the end all tasks had a 96mb limit in the cgroup file, but the memory limit 
 reported by docker was different.
 I think that the limit should be set consistently and all slaves should 
 behave identically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-11 Thread Adam Avilla (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538339#comment-14538339
 ] 

Adam Avilla commented on MESOS-2588:


+1. I think this would be an excellent feature. Let me know how I can help get 
this going and expose it through Marathon / Chronos.

 Create pre-create hook before a Docker container launches
 -

 Key: MESOS-2588
 URL: https://issues.apache.org/jira/browse/MESOS-2588
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Timothy Chen
Assignee: haosdent

 To be able to support custom actions before launching a docker 
 container, we should create an extensible hook that allows 
 modules/hooks to run before a docker container is launched.
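 A hypothetical sketch of what such a hook interface could look like 
 (illustrative only, not the actual Mesos module API):
 {code}
 #include <string>

 // Hypothetical pre-launch hook: a module would implement this and the
 // docker containerizer would invoke it right before starting the container.
 class DockerPreLaunchHook {
 public:
   virtual ~DockerPreLaunchHook() {}

   // Called with the container id and the image about to be launched;
   // returning false would abort the launch.
   virtual bool preLaunch(const std::string& containerId,
                          const std::string& image) = 0;
 };
 {code}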



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538355#comment-14538355
 ] 

haosdent commented on MESOS-2588:
-

Hi, [~hekaldama]. Timothy Chen says they still have some non-trivial changes to 
the docker support in mesos, so he suggested I start this issue after they finish 
submitting those patches. I will let you know when I start and finish this issue. 
Thank you.

 Create pre-create hook before a Docker container launches
 -

 Key: MESOS-2588
 URL: https://issues.apache.org/jira/browse/MESOS-2588
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Timothy Chen
Assignee: haosdent

 To be able to support custom actions before launching a docker 
 container, we should create an extensible hook that allows 
 modules/hooks to run before a docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2690) --enable-optimize build fails with maybe-uninitialized

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2690:
--
Assignee: Joris Van Remoortere  (was: Vinod Kone)

 --enable-optimize build fails with maybe-uninitialized
 --

 Key: MESOS-2690
 URL: https://issues.apache.org/jira/browse/MESOS-2690
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: GCC 4.8 - 4.9
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Blocker
 Fix For: 0.23.0


 When building with the `--enable-optimize` flag, the build fails with 
 `maybe-uninitialized` errors.
 This is due to a bug in GCC when building optimized code, triggering false 
 positives for this warning. Please see:
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59970
 We can disable this warning when using GCC + --enable-optimize.
 A quick work-around until there is a patch:
 ../configure CXXFLAGS=-Wno-maybe-uninitialized your-other-flags-here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2226:
---
Story Points: 5

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky, flaky-test

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with proposal 2
 I0114 18:51:34.734076  4734 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 87441ns
 I0114 18:51:34.734441  4734 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.740272  4739 replica.cpp:511] Replica received write request 
 for position 0
 I0114 18:51:34.740910  4739 leveldb.cpp:438] Reading position from leveldb 
 took 59846ns
 I0114 18:51:34.741672  4739 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 189259ns
 I0114 18:51:34.741919  4739 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.743000  4739 replica.cpp:658] Replica received learned notice 
 for 

[jira] [Updated] (MESOS-2649) Implement Resource Estimator

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2649:
--
Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11  (was: Twitter Q2 
Sprint 2)

 Implement Resource Estimator
 

 Key: MESOS-2649
 URL: https://issues.apache.org/jira/browse/MESOS-2649
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Jie Yu
  Labels: twitter

 Resource estimator is the component in the slave that estimates the amount of 
 oversubscribable resources.
 This needs to be integrated with the slave and resource monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2691) Update Resource message to include revocable resources

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2691:
--
Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11  (was: Twitter Q2 
Sprint 2)

 Update Resource message to include revocable resources
 --

 Key: MESOS-2691
 URL: https://issues.apache.org/jira/browse/MESOS-2691
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: twitter

 Need to update Resource message with a new subtype that indicates that the 
 resource is revocable. It might also need to specify why it is revocable 
 (e.g., oversubscribed).
 Also need to make sure all the operations on Resource(s) takes this new 
 message into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2665) Fix queuing discipline wrapper in linux/routing/queueing

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2665:
--
Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11  (was: Twitter Q2 
Sprint 2)

 Fix queuing discipline wrapper in linux/routing/queueing 
 -

 Key: MESOS-2665
 URL: https://issues.apache.org/jira/browse/MESOS-2665
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 The qdisc search function depends on matching a single hard-coded handle and 
 does not correctly test for the interface, making the implementation fragile.  
 Additionally, the current setup scripts (using dynamically created shell 
 commands) do not match the hard-coded handles.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2693) Printing a resource should show information about reservation, disk etc

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2693:
--
Sprint: Twitter Q2 Sprint 3 - 5/11

 Printing a resource should show information about reservation, disk etc
 ---

 Key: MESOS-2693
 URL: https://issues.apache.org/jira/browse/MESOS-2693
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: brian wickman
  Labels: twitter

 While new fields like DiskInfo and ReservationInfo have been added to the 
 Resource protobuf, the output stream operator hasn't been updated to show 
 them. This is valuable information to have in the logs during debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2697:
--
Story Points: 2

 Add a /teardown endpoint on master to teardown a framework
 --

 Key: MESOS-2697
 URL: https://issues.apache.org/jira/browse/MESOS-2697
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone

 We plan to rename the /shutdown endpoint to /teardown to be compatible with 
 the new API. /shutdown will be deprecated in 0.24.0 or later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2715) Python egg build breakage

2015-05-11 Thread Greg Bowyer (JIRA)
Greg Bowyer created MESOS-2715:
--

 Summary: Python egg build breakage
 Key: MESOS-2715
 URL: https://issues.apache.org/jira/browse/MESOS-2715
 Project: Mesos
  Issue Type: Bug
  Components: build, python api
Reporter: Greg Bowyer
Priority: Minor


Essentially a small build fix: the python setup.py for the native code does not 
add -std=c++11 to its compiler flags.

This is probably a dup.

The fix is here for the interested:

https://github.com/apache/mesos/pull/42



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2697:
--
Sprint: Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11  (was: Twitter Q2 
Sprint 2)

 Add a /teardown endpoint on master to teardown a framework
 --

 Key: MESOS-2697
 URL: https://issues.apache.org/jira/browse/MESOS-2697
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone

 We plan to rename the /shutdown endpoint to /teardown to be compatible with 
 the new API. /shutdown will be deprecated in 0.24.0 or later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2422) Use fq_codel qdisc for egress network traffic isolation

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2422:
--
Sprint: Twitter Mesos Q1 Sprint 4, Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 
- 5/11  (was: Twitter Mesos Q1 Sprint 4, Twitter Q2 Sprint 2)

 Use fq_codel qdisc for egress network traffic isolation
 ---

 Key: MESOS-2422
 URL: https://issues.apache.org/jira/browse/MESOS-2422
 Project: Mesos
  Issue Type: Task
Reporter: Cong Wang
Assignee: Cong Wang
  Labels: twitter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results

2015-05-11 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538357#comment-14538357
 ] 

Ian Babrou commented on MESOS-2479:
---

There you go: https://reviews.apache.org/r/34048/

 Task filter input disappears entirely once the search query yields no results
 -

 Key: MESOS-2479
 URL: https://issues.apache.org/jira/browse/MESOS-2479
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Joe Lee
Assignee: Joe Lee
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 The search filter at the head of each table on the Web UI disappears as soon 
 as your search token yields no results, making it impossible to edit your 
 search without having to refresh the entire page.
 This looks to be a simple fix to the hide directive in the table header. The 
 behavior was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299; the fix 
 undoes that change, as it seems erroneous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2489) Enable a framework to perform reservation operations.

2015-05-11 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-2489:

Shepherd: Jie Yu

 Enable a framework to perform reservation operations.
 -

 Key: MESOS-2489
 URL: https://issues.apache.org/jira/browse/MESOS-2489
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 h3. Goal
 This is the first step to supporting dynamic reservations. The goal of this 
 task is to enable a framework to reply to a resource offer with *Reserve* and 
 *Unreserve* offer operations as defined by {{Offer::Operation}} in 
 {{mesos.proto}}.
 h3. Overview
 It's divided into a few subtasks so that it's clear what the small chunks to 
 be addressed are. In summary, we need to introduce the 
 {{Resource::ReservationInfo}} protobuf message to encapsulate the reservation 
 information, enable the C++ {{Resources}} class to handle it then enable the 
 master to handle reservation operations.
 h3. Expected Outcome
 * The framework will be able to send back reservation operations to 
 (un)reserve resources.
 * The reservations are kept only in the master since we don't send the 
 {{CheckpointResources}} message to checkpoint the reservations on the slave 
 yet.
 * The reservations are considered to be reserved for the framework's role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?

2015-05-11 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538723#comment-14538723
 ] 

Benjamin Mahler commented on MESOS-2598:


[~marco-mesos] when marking as fixed, can you please include the commit that 
fixed it and the fix version? It's very helpful to have for posterity :)

 Slave state.json frameworks.executors.queued_tasks wrong format?
 

 Key: MESOS-2598
 URL: https://issues.apache.org/jira/browse/MESOS-2598
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Affects Versions: 0.22.0
 Environment: Linux version 3.10.0-229.1.2.el7.x86_64 
 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 
 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015
Reporter: Matthias Veit
Priority: Minor
  Labels: newbie

 queued_tasks.executor_id is expected to be a string and not a complete json 
 object. It should have the very same format as the tasks array on the same 
 level.
 Example, taken directly from the slave:
 {noformat}
 "queued_tasks": [
   {
     "data": "",
     "executor_id": {
       "command": {
         "argv": [],
         "uris": [
           {
             "executable": false,
             "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
           }
         ],
         "value": "cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"
       },
       "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
       "executor_id": "stage-ingestion-stats-slave-111-1428421145",
       "framework_id": "20150401-160104-251662508-5050-2197-0002",
       "name": "",
       "resources": {
         "cpus": 0.5,
         "disk": 0,
         "mem": 1000
       }
     },
     "id": "srv4.hw.ca1.foo.com-31708",
     "name": "worker srv4.hw.ca1.foo.com:31708",
     "resources": {
       "cpus": 1,
       "disk": 0,
       "mem": 5120,
       "ports": "[31708-31708]"
     },
     "slave_id": "20150327-025553-218108076-5050-4122-S0"
   },
   ...
 ]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2372) Test suite for verifying compatibility between Mesos components

2015-05-11 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538515#comment-14538515
 ] 

Marco Massenzio commented on MESOS-2372:


[~karya] This needs promoting to an epic, with stories created for it.
The main story being worked on will be placed back in the sprint and given 8 points 
(or whatever is appropriate).

 Test suite for verifying compatibility between Mesos components
 ---

 Key: MESOS-2372
 URL: https://issues.apache.org/jira/browse/MESOS-2372
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Kapil Arya

 While our current unit/integration test suite catches functional bugs, it 
 doesn't catch compatibility bugs (e.g., MESOS-2371). This is really crucial for 
 providing operators the ability to do seamless upgrades on live clusters.
 We should have a test suite / framework (ideally running on CI vetting each 
 review on RB) that tests upgrade paths between master, slave, scheduler and 
 executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2069) Basic fetcher cache functionality

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2069:
---
Story Points: 8  (was: 1)

 Basic fetcher cache functionality
 -

 Key: MESOS-2069
 URL: https://issues.apache.org/jira/browse/MESOS-2069
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
  Labels: fetcher, slave
   Original Estimate: 48h
  Remaining Estimate: 48h

 Add a flag to CommandInfo URI protobufs that indicates that files downloaded 
 by the fetcher shall be cached in a repository. To be followed by MESOS-2057 
 for concurrency control.
 Also see MESOS-336 for the overall goals for the fetcher cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2072) Fetcher cache eviction

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2072:
---
Story Points: 8  (was: 3)

 Fetcher cache eviction
 --

 Key: MESOS-2072
 URL: https://issues.apache.org/jira/browse/MESOS-2072
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
   Original Estimate: 336h
  Remaining Estimate: 336h

 Delete files from the fetcher cache so that a given cache size is never 
 exceeded. Succeed in doing so while concurrent downloads are on their way and 
 new requests are pouring in.
 Idea: measure the size of each download before it begins, make enough room 
 before the download. This means that only download mechanisms that divulge 
 the size before the main download will be supported. AFAWK, those in use so 
 far have this property. 
 The calculation of how much space to free needs to be under concurrency 
 control, accumulating all space needed for competing, incomplete download 
 requests. (The Python script that performs fetcher caching for Aurora does 
 not seem to implement this. See 
 https://gist.github.com/zmanji/f41df77510ef9d00265a, imagine several of these 
 programs running concurrently, each one's _cache_eviction() call succeeding, 
 each perceiving the SAME free space being available.)
 Ultimately, a conflict resolution strategy is needed if just the downloads 
 underway already exceed the cache capacity. Then, as a fallback, direct 
 download into the work directory will be used for some tasks. TBD how to pick 
 which task gets treated how. 
 At first, only support copying of any downloaded files to the work directory 
 for task execution. This isolates the task life cycle after starting a task 
 from cache eviction considerations. 
 (Later, we can add symbolic links that avoid copying. But then eviction of 
 fetched files used by ongoing tasks must be blocked, which adds complexity. 
 Another future extension is MESOS-1667, Extract from URI while downloading 
 into work dir.)
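 A minimal sketch of the space accounting described above (names are 
 illustrative, not the actual Mesos fetcher): reservations for incomplete 
 downloads accumulate under one lock, so concurrent requests cannot all 
 perceive the same free space.
 {code}
 #include <list>
 #include <map>
 #include <mutex>
 #include <stdexcept>
 #include <string>

 // Illustrative only: cache space is reserved before a download starts, and
 // least-recently-used completed entries are evicted until the reservation fits.
 class CacheSpace {
 public:
   explicit CacheSpace(size_t capacity) : capacity_(capacity), used_(0) {}

   // Reserve `size` bytes up front, before the download begins.
   void reserve(const std::string& uri, size_t size) {
     std::lock_guard<std::mutex> lock(mutex_);
     while (used_ + size > capacity_) {
       if (evictable_.empty()) {
         // The fallback from the description: no room even after evicting
         // everything evictable, so download directly into the work directory.
         throw std::runtime_error("cache full; use direct download");
       }
       const std::string victim = evictable_.front();
       evictable_.pop_front();
       used_ -= sizes_[victim];
       sizes_.erase(victim);  // ...and delete the cached file on disk.
     }
     used_ += size;
     sizes_[uri] = size;
   }

   // Mark a finished download as evictable (oldest entries evicted first).
   void complete(const std::string& uri) {
     std::lock_guard<std::mutex> lock(mutex_);
     evictable_.push_back(uri);
   }

 private:
   const size_t capacity_;
   size_t used_;  // includes reservations for incomplete downloads
   std::map<std::string, size_t> sizes_;
   std::list<std::string> evictable_;  // completed entries, oldest first
   std::mutex mutex_;
 };
 {code}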



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2650) Modularize the Resource Estimator

2015-05-11 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen reassigned MESOS-2650:
-

Assignee: Niklas Quarfot Nielsen

 Modularize the Resource Estimator
 -

 Key: MESOS-2650
 URL: https://issues.apache.org/jira/browse/MESOS-2650
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Niklas Quarfot Nielsen
  Labels: mesosphere

 Modularizing the resource estimator opens up the door for org specific 
 implementations.
 Test the estimator module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2074) Fetcher cache test fixture

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2074:
---
Story Points: 5  (was: 1)

 Fetcher cache test fixture
 --

 Key: MESOS-2074
 URL: https://issues.apache.org/jira/browse/MESOS-2074
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
   Original Estimate: 72h
  Remaining Estimate: 72h

 To accelerate providing good test coverage for the fetcher cache (MESOS-336), 
 we can provide a framework that canonicalizes creating and running a number 
 of tasks and allows easy parametrization with combinations of the following:
 - whether to cache or not
 - whether to make what has been downloaded executable or not
 - whether to extract from an archive or not
 - whether to download from a file system, http, or...
 We can create a simple HTTP server in the test fixture to support the latter.
 Furthermore, the tests need to be robust wrt. varying numbers of StatusUpdate 
 messages. An accumulating update message sink that reports the final state is 
 needed.
 All this has already been programmed in this patch, just needs to be rebased:
 https://reviews.apache.org/r/21316/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2057:
---
Story Points: 8  (was: 13)

 Concurrency control for fetcher cache
 -

 Key: MESOS-2057
 URL: https://issues.apache.org/jira/browse/MESOS-2057
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
   Original Estimate: 96h
  Remaining Estimate: 96h

 Having added a URI flag to CommandInfo messages (in MESOS-2069) that 
 indicates caching files downloaded by the fetcher in a repository, 
 now ensure that when a URI is cached, it is only ever downloaded once for 
 the same user on the same slave as long as the slave keeps running. 
 This even holds if multiple tasks request the same URI concurrently. If 
 multiple requests for the same URI occur, perform only one of them and reuse 
 the result. Make concurrent requests for the same URI wait for the one 
 download. 
 Different URIs from different CommandInfos can be downloaded concurrently.
 No cache eviction, cleanup or failover will be handled for now. Additional 
 tickets will be filed for these enhancements. (So don't use this feature in 
 production until the whole epic is complete.)
 Note that implementing this does not suffice for production use. This ticket 
 contains the main part of the fetcher logic, though. See the epic MESOS-336 
 for the rest of the features that lead to a fully functional fetcher cache.
 The proposed general approach is to keep all bookkeeping about what is in 
 which stage of being fetched and where it resides in the slave's 
 MesosContainerizerProcess, so that all concurrent access is disambiguated and 
 controlled by an actor (aka libprocess process).
 Depends on MESOS-2056 and MESOS-2069.
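 A minimal sketch of the one-download-per-URI idea, using plain standard C++ 
 futures for illustration (the real implementation would keep this bookkeeping 
 in the MesosContainerizerProcess actor, as described above):
 {code}
 #include <functional>
 #include <future>
 #include <map>
 #include <mutex>
 #include <string>

 // Illustrative only: the first request for a URI starts the download;
 // concurrent and later requests for the same URI share its result.
 class UriCache {
 public:
   std::shared_future<std::string> fetch(const std::string& uri) {
     std::lock_guard<std::mutex> lock(mutex_);
     auto it = downloads_.find(uri);
     if (it != downloads_.end()) {
       return it->second;  // Done or in flight: wait on the same future.
     }
     std::shared_future<std::string> result =
         std::async(std::launch::async, [uri]() { return download(uri); });
     downloads_.emplace(uri, result);
     return result;
   }

 private:
   // Placeholder for the real fetch; assumed to write the file and
   // return its local path.
   static std::string download(const std::string& uri) {
     return "/tmp/fetcher-cache/" +
            std::to_string(std::hash<std::string>()(uri));
   }

   std::mutex mutex_;
   std::map<std::string, std::shared_future<std::string>> downloads_;
 };
 {code}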



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2070) Implement simple slave recovery behavior for fetcher cache

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2070:
---
Story Points: 2  (was: 1)

 Implement simple slave recovery behavior for fetcher cache
 --

 Key: MESOS-2070
 URL: https://issues.apache.org/jira/browse/MESOS-2070
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
  Labels: newbie
   Original Estimate: 6h
  Remaining Estimate: 6h

 Clean the fetcher cache completely upon slave restart/recovery. This 
 implements correct, albeit not ideal behavior. More efficient schemes that 
 restore knowledge about cached files or even resume downloads can be added 
 later. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2551) C++ Scheduler library should send Call messages to Master

2015-05-11 Thread Isabel Jimenez (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Jimenez updated MESOS-2551:
--
Story Points: 8

 C++ Scheduler library should send Call messages to Master
 -

 Key: MESOS-2551
 URL: https://issues.apache.org/jira/browse/MESOS-2551
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Isabel Jimenez

 Currently, the C++ library sends different messages to Master instead of a 
 single Call message. To vet the new Call API it should send Call messages. 
 Master should be updated to handle all types of Calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2057) Concurrency control for fetcher cache

2015-05-11 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2057:
---
Story Points: 13  (was: 2)

 Concurrency control for fetcher cache
 -

 Key: MESOS-2057
 URL: https://issues.apache.org/jira/browse/MESOS-2057
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
   Original Estimate: 96h
  Remaining Estimate: 96h

 Having added a URI flag to CommandInfo messages (in MESOS-2069) that 
 indicates caching files downloaded by the fetcher in a repository, 
 now ensure that when a URI is cached, it is only ever downloaded once for 
 the same user on the same slave as long as the slave keeps running. 
 This even holds if multiple tasks request the same URI concurrently. If 
 multiple requests for the same URI occur, perform only one of them and reuse 
 the result. Make concurrent requests for the same URI wait for the one 
 download. 
 Different URIs from different CommandInfos can be downloaded concurrently.
 No cache eviction, cleanup or failover will be handled for now. Additional 
 tickets will be filed for these enhancements. (So don't use this feature in 
 production until the whole epic is complete.)
 Note that implementing this does not suffice for production use. This ticket 
 contains the main part of the fetcher logic, though. See the epic MESOS-336 
 for the rest of the features that lead to a fully functional fetcher cache.
 The proposed general approach is to keep all bookkeeping about what is in 
 which stage of being fetched and where it resides in the slave's 
 MesosContainerizerProcess, so that all concurrent access is disambiguated and 
 controlled by an actor (aka libprocess process).
 Depends on MESOS-2056 and MESOS-2069.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master

2015-05-11 Thread Isabel Jimenez (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Jimenez updated MESOS-2293:
--
  Sprint: Mesosphere Q1 Sprint 9 - 5/15
Story Points: 13

 Implement the Call endpoint on master
 -

 Key: MESOS-2293
 URL: https://issues.apache.org/jira/browse/MESOS-2293
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Isabel Jimenez
  Labels: mesosphere, twitter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2650) Modularize the Resource Estimator

2015-05-11 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2650:
--
Story Points: 3  (was: 5)

 Modularize the Resource Estimator
 -

 Key: MESOS-2650
 URL: https://issues.apache.org/jira/browse/MESOS-2650
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
  Labels: mesosphere

 Modularizing the resource estimator opens up the door for org specific 
 implementations.
 Test the estimator module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2651) Implement QoS controller

2015-05-11 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen reassigned MESOS-2651:
-

Assignee: Niklas Quarfot Nielsen

 Implement QoS controller
 

 Key: MESOS-2651
 URL: https://issues.apache.org/jira/browse/MESOS-2651
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Niklas Quarfot Nielsen
  Labels: mesosphere

 This is a component of the slave that informs the slave about the possible 
 corrections that need to be performed (e.g., shutting down a container that is 
 using revocable resources).
 This needs to be integrated with the resource monitor.
 We need to figure out the metrics used for sending corrections (e.g., scheduling 
 latency, usage, informed by the executor/scheduler).
 We also need to figure out the feedback loop between the QoS controller and 
 the Resource Estimator.
 {code}
 class QoSController {
 public:
   QoSController(ResourceMonitor* monitor);
   process::Queue<QoSCorrection> correction();
 };
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace

2015-05-11 Thread Colin Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538957#comment-14538957
 ] 

Colin Williams commented on MESOS-2516:
---

Thank you

 Move allocation-related types to mesos::master namespace
 

 Key: MESOS-2516
 URL: https://issues.apache.org/jira/browse/MESOS-2516
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Colin Williams
Priority: Minor
  Labels: easyfix, newbie

 {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the 
 {{master::allocator}} namespace. This is not consistent with the rest of the 
 codebase: {{Isolator}}, {{Fetcher}}, {{Containerizer}} all live in the {{slave}} 
 namespace. The {{allocator}} namespace should be killed for consistency.
 Since sorters are poorly named, they should be renamed (or namespaced) prior 
 to this change in order not to pollute the {{master}} namespace. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2516) Move allocation-related types to mesos::master namespace

2015-05-11 Thread Colin Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Williams reassigned MESOS-2516:
-

Assignee: Colin Williams

 Move allocation-related types to mesos::master namespace
 

 Key: MESOS-2516
 URL: https://issues.apache.org/jira/browse/MESOS-2516
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Colin Williams
Priority: Minor
  Labels: easyfix, newbie

 {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the 
 {{master::allocator}} namespace. This is not consistent with the rest of the 
 codebase: {{Isolator}}, {{Fetcher}}, {{Containerizer}} all live in the {{slave}} 
 namespace. The {{allocator}} namespace should be killed for consistency.
 Since sorters are poorly named, they should be renamed (or namespaced) prior 
 to this change in order not to pollute the {{master}} namespace. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-354) Oversubscribe resources

2015-05-11 Thread Chris Lambert (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lambert updated MESOS-354:

Epic Colour: ghx-label-4

 Oversubscribe resources
 ---

 Key: MESOS-354
 URL: https://issues.apache.org/jira/browse/MESOS-354
 Project: Mesos
  Issue Type: Epic
  Components: isolation, master, slave
Reporter: brian wickman
Priority: Minor
  Labels: mesosphere, twitter
 Attachments: mesos_virtual_offers.pdf


 This proposal is predicated upon offer revocation.
 The idea would be to add a new revoked status either by (1) piggybacking 
 off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a 
 new status update TASK_REVOKED.
 In order to augment an offer with metadata about revocability, there are 
 options:
   1) Add a revocable boolean to the Offer and
 a) offer only one type of Offer per slave at a particular time
 b) offer both revocable and non-revocable resources at the same time but 
 require frameworks to understand that Offers can contain overlapping resources
   2) Add a revocable_resources field on the Offer which is a superset of the 
 regular resources field.  By consuming resources <= revocable_resources in 
 a launchTask, the Task becomes a revocable task.  If launching a task with 
 <= resources, the Task is non-revocable.
 The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) 
 and non-revocable tasks are online higher-SLA tasks (e.g. services.)
 Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
 One of these resources is a rate (4 cpu seconds per second) and two of them 
 are fixed values (8GB and 20GB respectively, though disk resources can be 
 further broken down into spindles - fixed - and iops - a rate.)  In practice, 
 these are the maximum resources in the respective dimensions that this task 
 will use.  In reality, we provision tasks at some factor below peak, and only 
 hit peak resource consumption in rare circumstances or perhaps at a diurnal 
 peak.  
 In the meantime, we stand to gain from offering some constant factor of 
 the difference between (reserved - actual) of non-revocable tasks as 
 revocable resources, depending upon our tolerance for revocable task churn.  
 The main challenge is coming up with an accurate short / medium / long-term 
 prediction of resource consumption based upon current behavior.
 In many cases it would be OK to be sloppy:
   * CPU / iops / network IO are rates (compressible) and can often be OK 
 below guarantees for brief periods of time while task revocation takes place
   * Memory slack can be provided by enabling swap and dynamically setting 
 swap paging boundaries.  Should swap ever be activated, that would be a 
 signal to revoke.
 The master / allocator would piggyback on the slave heartbeat mechanism to 
 learn of the amount of revocable resources available at any point in time.
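 A tiny numeric illustration of that last idea (values and names assumed, not 
 part of the proposal itself): the slave would advertise a constant factor of 
 (reserved - actual) as revocable.
 {code}
 #include <algorithm>
 #include <iostream>

 int main() {
   const double reservedCpus = 4.0;  // what the non-revocable task asked for
   const double actualCpus   = 1.3;  // current measured usage (a rate)
   const double factor       = 0.8;  // tolerance for revocable-task churn

   const double revocable =
       factor * std::max(0.0, reservedCpus - actualCpus);

   // With these numbers the slave could advertise ~2.16 revocable cpus.
   std::cout << "revocable cpus: " << revocable << std::endl;
 }
 {code}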



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuhe updated MESOS-2706:

Comment: was deleted

(was: waiting for u ...)

 When the docker-tasks grow, the time spare between Queuing task and Starting 
 container grows
 

 Key: MESOS-2706
 URL: https://issues.apache.org/jira/browse/MESOS-2706
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
 Environment: My environment info:
 Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host-server.
 Every docker task requires 0.02 CPU and 128MB of memory, and the server has 8 CPUs and 
 24GB of memory.
 So Mesos can launch thousands of tasks in theory.
 And each docker task is a very lightweight sshd service.
Reporter: chenqiuhao

 At the beginning, Marathon could launch docker tasks very fast, but when the 
 number of tasks on the only mesos-slave host reached 50, Marathon seemed 
 to launch docker tasks slowly.
 So I checked the mesos-slave log, and I found that the time gap between 
 Queuing task and Starting container grew.
 For example, 
 launching the 1st docker task takes about 0.008s:
 [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing 
 task|Starting container'
 I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
 dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 launch the 50th docker task, it takes about 4.9s
 I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
 dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
 '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 And when i launch the 100th docker task,it takes about 13s!
 And I did the same test in one 24 Cpus and 256G mems server-host, it got the 
 same result.
 Did somebody have the same experience , or Can help to do the same pressure 
 test ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539189#comment-14539189
 ] 

yuhe commented on MESOS-2706:
-

waiting for u ...



[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539190#comment-14539190
 ] 

yuhe commented on MESOS-2706:
-

waiting for u ...



[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539188#comment-14539188
 ] 

yuhe commented on MESOS-2706:
-

waiting for u ...



[jira] [Issue Comment Deleted] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-05-11 Thread yuhe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuhe updated MESOS-2706:

Comment: was deleted

(was: waiting for u ...)



[jira] [Commented] (MESOS-1375) Log rotation capable

2015-05-11 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538880#comment-14538880
 ] 

Benjamin Mahler commented on MESOS-1375:


[~stevenschlansker] Sorry to hear that this is affecting you negatively. Is 
there any reason you're not already using a tool like {{logrotate}}, or 
something preventing you from doing so? I assume you already have to rotate 
logs for the rest of the daemons running on your system; can that log 
rotation tooling be reused for Mesos?

 Log rotation capable
 

 Key: MESOS-1375
 URL: https://issues.apache.org/jira/browse/MESOS-1375
 Project: Mesos
  Issue Type: Improvement
  Components: master, slave
Affects Versions: 0.18.0
Reporter: Damien Hardy
  Labels: ops, twitter

 Please provide a way to let ops manage logs.
 A log4j-like configuration would be hard, but at least make rotation possible 
 without restarting the service.
 Basing it on the external logrotate tool would be great:
  * write to a constant log file name
  * check for file change (recreated by logrotate) before write (see the 
 sketch below)
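
A minimal sketch of that reopen-on-rotation check, assuming the daemon writes to a fixed log path (illustrative only, not existing Mesos code):

{code}
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical sketch: before a write, detect whether logrotate has
// replaced the file behind 'fd', and reopen the constant path if so.
int reopenIfRotated(int fd, const char* path)
{
  struct stat fdStat, pathStat;
  if (fstat(fd, &fdStat) == 0 &&
      stat(path, &pathStat) == 0 &&
      fdStat.st_ino == pathStat.st_ino) {
    return fd; // Same inode: the file was not rotated.
  }

  // The path now names a new (or missing) file: reopen it for appending.
  close(fd);
  return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}
{code}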





[jira] [Updated] (MESOS-2636) Segfault in inline Try<IP> getIP(const std::string& hostname, int family)

2015-05-11 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-2636:
--
Description: 
We saw a segfault in production. Attaching the coredump, we see:

Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
--resources=cpus:23;mem:70298;ports:[31'.
Program terminated with signal 11, Segmentation fault.
#0  0x7f639867c77e in free () from /lib64/libc.so.6
(gdb) bt
#0  0x7f639867c77e in free () from /lib64/libc.so.6
#1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
#2  0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at 
./3rdparty/stout/include/stout/net.hpp:201
#3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
expression opcode 0xf3
) at src/process.cpp:837
#4  0x0042342f in main ()

  was:
We saw a segfault in production. Attaching the coredump, we see:

Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
--resources=cpus:23;mem:70298;ports:[31'.
Program terminated with signal 11, Segmentation fault.
#0  0x7f639867c77e in free () from /lib64/libc.so.6
(gdb) bt
#0  0x7f639867c77e in free () from /lib64/libc.so.6
#1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
#2  0x7f6399deeafa in net::getIP 
(hostname=smf1-azc-03-sr2.prod.twitter.com, family=2) at 
./3rdparty/stout/include/stout/net.hpp:201
#3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
expression opcode 0xf3
) at src/process.cpp:837
#4  0x0042342f in main ()


 Segfault in inline Try<IP> getIP(const std::string& hostname, int family)
 -

 Key: MESOS-2636
 URL: https://issues.apache.org/jira/browse/MESOS-2636
 Project: Mesos
  Issue Type: Bug
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter
 Fix For: 0.23.0


 We saw a segfault in production. Attaching the coredump, we see:
 Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
 --resources=cpus:23;mem:70298;ports:[31'.
 Program terminated with signal 11, Segmentation fault.
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 (gdb) bt
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
 #2  0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at 
 ./3rdparty/stout/include/stout/net.hpp:201
 #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
 expression opcode 0xf3
 ) at src/process.cpp:837
 #4  0x0042342f in main ()
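
For context, the getaddrinfo()/freeaddrinfo() contract that a crash in this spot typically violates: the result list may only be freed after a successful lookup. A minimal sketch of the safe pattern (illustrative; not necessarily the actual stout fix):

{code}
#include <netdb.h>
#include <cstring>

// Hypothetical sketch: only pass 'result' to freeaddrinfo() when
// getaddrinfo() succeeded and therefore initialized it.
bool resolve(const char* hostname, int family)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = family;

  struct addrinfo* result = NULL;
  if (getaddrinfo(hostname, NULL, &hints, &result) != 0) {
    return false; // Failure: 'result' was never allocated; do not free it.
  }

  // ... inspect 'result' ...

  freeaddrinfo(result); // Success path: free exactly once.
  return true;
}
{code}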





[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master

2015-05-11 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2293:
--
Labels: mesosphere  (was: mesosphere twitter)

 Implement the Call endpoint on master
 -

 Key: MESOS-2293
 URL: https://issues.apache.org/jira/browse/MESOS-2293
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Isabel Jimenez
  Labels: mesosphere







[jira] [Commented] (MESOS-2616) Update C++ style guide on variable naming.

2015-05-11 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1453#comment-1453
 ] 

Michael Park commented on MESOS-2616:
-

We'll slowly make progress on this. I mainly wanted to capture my thoughts on 
this topic as a whole.

 Update C++ style guide on variable naming. 
 ---

 Key: MESOS-2616
 URL: https://issues.apache.org/jira/browse/MESOS-2616
 Project: Mesos
  Issue Type: Documentation
Reporter: Till Toenshoff
Assignee: Alexander Rukletsov
Priority: Minor
 Fix For: 0.23.0


 Our variable naming guide currently does not really explain the use cases for 
 leading or trailing underscores, which appear a lot within our codebase. 
 We should correct that.
 The following was copied from the review description to allow discussion 
 where needed:
 Documents the patterns we use to name variables and function arguments in our 
 codebase.
 h4.Leading underscores to avoid ambiguity.
 We use this pattern extensively in libprocess, stout and mesos, a few 
 examples below.
 * stout/try.hpp:105
 {noformat}
 Try(State _state, T* _t = NULL, const std::string& _message = "")
   : state(_state), t(_t), message(_message) {}
 {noformat}
 * process/http.hpp:480
 {noformat}
   URL(const std::string& _scheme,
       const std::string& _domain,
       const uint16_t _port = 80,
       const std::string& _path = "/",
       const hashmap<std::string, std::string>& _query =
         (hashmap<std::string, std::string>()),
       const Option<std::string>& _fragment = None())
     : scheme(_scheme),
       domain(_domain),
       port(_port),
       path(_path),
       query(_query),
       fragment(_fragment) {}
 {noformat}
 * slave/containerizer/linux_launcher.cpp:56
 {noformat}
 LinuxLauncher::LinuxLauncher(
     const Flags& _flags,
     int _namespaces,
     const string& _hierarchy)
   : flags(_flags),
     namespaces(_namespaces),
     hierarchy(_hierarchy) {}
 {noformat}
 h4.Trailing underscores as prime symbols.
 We use this pattern in the code, though not extensively. We would like to see 
 more pass-by-value instead of creating copies from a variable passed by const 
 reference.
 * master.cpp:2942
 {noformat}
 // Create and add the slave id.
 SlaveInfo slaveInfo_ = slaveInfo;
 slaveInfo_.mutable_id()->CopyFrom(newSlaveId());
 {noformat}
 * slave.cpp:4180
 {noformat}
 ExecutorInfo executorInfo_ = executor->info;
 Resources resources = executorInfo_.resources();
 resources += taskInfo.resources();
 executorInfo_.mutable_resources()->CopyFrom(resources);
 {noformat}
 * status_update_manager.cpp:474
 {noformat}
 // Bounded exponential backoff.
 Duration duration_ =
 std::min(duration * 2, STATUS_UPDATE_RETRY_INTERVAL_MAX);
 {noformat}
 * containerizer/mesos/containerizer.cpp:109
 {noformat}
 // Modify the flags to include any changes to isolation.
 Flags flags_ = flags;
 flags_.isolation = isolation;
 {noformat}
 h4.Passing arguments by value.
 * slave.cpp:2480
 {noformat}
 void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
 {
   ...
   // Set the source before forwarding the status update.
   update.mutable_status()->set_source(
       pid == UPID() ? TaskStatus::SOURCE_SLAVE : TaskStatus::SOURCE_EXECUTOR);
   ...
 }
 {noformat}
 * process/metrics/timer.hpp:103
 {noformat}
   static void _time(Time start, Timer that)
   {
     const Time stop = Clock::now();

     double value;

     process::internal::acquire(&that.data->lock);
     {
       that.data->lastValue = T(stop - start).value();
       value = that.data->lastValue.get();
     }
     process::internal::release(&that.data->lock);

     that.push(value);
   }
 {noformat}





[jira] [Updated] (MESOS-2622) Document the semantic change in decorator return values

2015-05-11 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2622:
--
Story Points: 1

 Document the semantic change in decorator return values
 ---

 Key: MESOS-2622
 URL: https://issues.apache.org/jira/browse/MESOS-2622
 Project: Mesos
  Issue Type: Documentation
Reporter: Niklas Quarfot Nielsen
Assignee: Niklas Quarfot Nielsen
  Labels: mesosphere

 In order to enable decorator modules to _remove_ metadata (environment 
 variables or labels), we changed the meaning of the return value for 
 decorator hooks.
 The Result<T> return values mean:
 ||State||Before||After||
 |Error|Error is propagated to the call-site|No change|
 |None|The result of the decorator is not applied|No change|
 |Some|The result of the decorator is *appended*|The result of the decorator 
 *overwrites* the final labels/environment object|
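
A minimal sketch of how a hook's result would be applied under the new semantics (the names are illustrative, not the actual module API):

{code}
// Hypothetical sketch of the new 'Some' semantics for a label decorator:
// the hook's result replaces the final labels wholesale, which is what
// lets a decorator remove labels.
Labels applyDecorator(const Labels& original, const Result<Labels>& result)
{
  if (result.isSome()) {
    return result.get(); // After: overwrites (before: was appended).
  }
  // None (and the error path, which propagates to the call-site): the
  // decorator's result is not applied.
  return original;
}
{code}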





[jira] [Commented] (MESOS-2645) Design doc for resource oversubscription

2015-05-11 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538894#comment-14538894
 ] 

Niklas Quarfot Nielsen commented on MESOS-2645:
---

This has been going through a few reviews already - when do we claim success 
here? :)

 Design doc for resource oversubscription
 

 Key: MESOS-2645
 URL: https://issues.apache.org/jira/browse/MESOS-2645
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Niklas Quarfot Nielsen
  Labels: mesosphere







[jira] [Commented] (MESOS-2645) Design doc for resource oversubscription

2015-05-11 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538916#comment-14538916
 ] 

Vinod Kone commented on MESOS-2645:
---

Can you update the design doc with the latest thinking?

Also, I think the allocator design needs to be fleshed out before we can 
resolve this ticket. Agreed? 



[jira] [Created] (MESOS-2716) Add non-const reference version of Option<T>::get.

2015-05-11 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-2716:
--

 Summary: Add non-const reference version of Option<T>::get.
 Key: MESOS-2716
 URL: https://issues.apache.org/jira/browse/MESOS-2716
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Benjamin Mahler


Currently Option only provides a const reference to the underlying object:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  ...
};
{code}

Since we use Option as a replacement for NULL, we often have optional variables 
that we need to perform non-const operations on. However, this requires taking 
a copy:

{code}
static void cleanup(const Response& response)
{
  if (response.type == Response::PIPE) {
    CHECK_SOME(response.reader);
    http::Pipe::Reader reader = response.reader.get(); // Remove const.
    reader.close();
  }
}
{code}

Taking a copy is hacky, but works for shared objects and some other copyable 
objects. Since Option represents a mutable variable, it makes sense to add 
non-const reference access to the underlying value:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  T& get();
  ...
};
{code}
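
With the non-const overload, the copy in the cleanup() example above becomes unnecessary (illustrative usage, assuming a non-const 'response'):

{code}
// Hypothetical usage once T& get() exists: mutate the value in place.
if (response.type == Response::PIPE) {
  CHECK_SOME(response.reader);
  response.reader.get().close(); // Resolves to the T& overload.
}
{code}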





[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace

2015-05-11 Thread Colin Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538751#comment-14538751
 ] 

Colin Williams commented on MESOS-2516:
---

I'd like to start working on this issue, but I'm new enough to Jira that I'm 
not sure how to assign the task to myself.

 Move allocation-related types to mesos::master namespace
 

 Key: MESOS-2516
 URL: https://issues.apache.org/jira/browse/MESOS-2516
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Alexander Rukletsov
Priority: Minor
  Labels: easyfix, newbie

 {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the 
 {{master::allocator}} namespace. This is not consistent with the rest of the 
 codebase: {{Isolator}}, {{Fetcher}} and {{Containerizer}} all live in the 
 {{slave}} namespace. The {{allocator}} namespace should be killed for 
 consistency.
 Since sorters are poorly named, they should be renamed (or namespaced) prior 
 to this change in order not to pollute the {{master}} namespace.





[jira] [Updated] (MESOS-1279) Add resize task primitive

2015-05-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1279:
--
Labels: mesosphere myriad  (was: mesosphere)

 Add resize task primitive
 -

 Key: MESOS-1279
 URL: https://issues.apache.org/jira/browse/MESOS-1279
 Project: Mesos
  Issue Type: Sub-task
  Components: c++ api, master, slave
Reporter: Niklas Quarfot Nielsen
  Labels: mesosphere, myriad

 As mentioned in MESOS-938, one way to support task replacement and scaling 
 could be to split the responsibility into several smaller primitives to 1) 
 reduce complexity, 2) make it easier to comprehend, and 3) make the 
 implementation easier and more incremental.
 resizeTask() would be the primitive to either:
 1) scale a running task's resources down, or
 2) scale a running task's resources up using extra auxiliary offers (see the 
 sketch below).
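
A sketch of what such a primitive could look like on the scheduler driver (purely hypothetical; no such call exists in the API today):

{code}
// Hypothetical primitive: scaling down shrinks the task's existing
// allocation, scaling up consumes the auxiliary 'offers' for the delta.
virtual Status resizeTask(
    const TaskID& taskId,
    const Resources& desired,
    const std::vector<OfferID>& offers = std::vector<OfferID>());
{code}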





[jira] [Updated] (MESOS-1739) Allow slave reconfiguration on restart

2015-05-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1739:
--
Labels: mesosphere myriad  (was: mesosphere)

 Allow slave reconfiguration on restart
 --

 Key: MESOS-1739
 URL: https://issues.apache.org/jira/browse/MESOS-1739
 Project: Mesos
  Issue Type: Epic
Reporter: Patrick Reilly
Assignee: Cody Maloney
  Labels: mesosphere, myriad

 Make it so that, either via a slave restart or an out-of-process reconfigure 
 ping, the attributes and resources of a slave can be updated to be a superset 
 of what they used to be.





[jira] [Commented] (MESOS-598) Also check 'git diff --shortstat --staged' in post-reviews.py.

2015-05-11 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538865#comment-14538865
 ] 

Benjamin Mahler commented on MESOS-598:
---

[~pbrett] can you please include the commit when marking as fixed? It's very 
helpful for posterity. :)

 Also check 'git diff --shortstat --staged' in post-reviews.py.
 --

 Key: MESOS-598
 URL: https://issues.apache.org/jira/browse/MESOS-598
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Hindman
Assignee: Paul Brett

 We currently check if you have any changes before we run post-reviews.py, but 
 we don't check for staged changes, which IIUC could get lost.





[jira] [Commented] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?

2015-05-11 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538771#comment-14538771
 ] 

Marco Massenzio commented on MESOS-2598:


[~bmahler] sure thing - please also be aware that we are doing some cleanup of 
old stuff, so that information may either not be available, or only available 
with much difficulty.
Ideally, people would keep their Jira up to date, but that hasn't always been 
the case, and we're trying to get to a better place.

 Slave state.json frameworks.executors.queued_tasks wrong format?
 

 Key: MESOS-2598
 URL: https://issues.apache.org/jira/browse/MESOS-2598
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Affects Versions: 0.22.0
 Environment: Linux version 3.10.0-229.1.2.el7.x86_64 
 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 
 4.8.2-16) (GCC) ) #1 SMP Fri Mar 27 03:04:26 UTC 2015
Reporter: Matthias Veit
Priority: Minor
  Labels: newbie

 queued_tasks.executor_id is expected to be a string and not a complete JSON 
 object. It should have the very same format as the tasks array on the same 
 level.
 Example, taken directly from the slave:
 {noformat}
 "queued_tasks": [
   {
     "data": "",
     "executor_id": {
       "command": {
         "argv": [],
         "uris": [
           {
             "executable": false,
             "value": "http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"
           }
         ],
         "value": "cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"
       },
       "data": "{\"assignmentid\":\"srv4.hw.ca1.foo.com\",\"supervisorid\":\"srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\"}",
       "executor_id": "stage-ingestion-stats-slave-111-1428421145",
       "framework_id": "20150401-160104-251662508-5050-2197-0002",
       "name": "",
       "resources": {
         "cpus": 0.5,
         "disk": 0,
         "mem": 1000
       }
     },
     "id": "srv4.hw.ca1.foo.com-31708",
     "name": "worker srv4.hw.ca1.foo.com:31708",
     "resources": {
       "cpus": 1,
       "disk": 0,
       "mem": 5120,
       "ports": "[31708-31708]"
     },
     "slave_id": "20150327-025553-218108076-5050-4122-S0"
   },
   ...
 ]
 {noformat}





[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace

2015-05-11 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538806#comment-14538806
 ] 

Vinod Kone commented on MESOS-2516:
---

Added you to the contributors role. Now you can assign the task to yourself.



[jira] [Reopened] (MESOS-2598) Slave state.json frameworks.executors.queued_tasks wrong format?

2015-05-11 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reopened MESOS-2598:


[~marco-mesos] Took a look, this is still an issue on master FWICT:

https://github.com/apache/mesos/blob/115db9250e096edf70da54c3f6aea18aefeb1a06/src/slave/http.cpp#L157

{code}
JSON::Object model(const TaskInfo& task)
{
  JSON::Object object;
  object.values["id"] = task.task_id().value();
  object.values["name"] = task.name();
  object.values["slave_id"] = task.slave_id().value();
  object.values["resources"] = model(task.resources());
  object.values["data"] = task.data();

  if (task.has_command()) {
    object.values["command"] = model(task.command());
  }
  if (task.has_executor()) {
    object.values["executor_id"] = model(task.executor()); // XXX Bug here.
  }

  return object;
}
{code}
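
The fix implied by the report would be to emit just the id string, matching the tasks array (a sketch, not a committed patch):

{code}
  if (task.has_executor()) {
    // Match the 'tasks' array: expose only the executor's id string.
    object.values["executor_id"] = task.executor().executor_id().value();
  }
{code}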



[jira] [Commented] (MESOS-1375) Log rotation capable

2015-05-11 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539084#comment-14539084
 ] 

Timothy St. Clair commented on MESOS-1375:
--

Now that systemd has pretty much taken over, it seems like folks (myself 
included) should update to simply take advantage of the journal.   



[jira] [Commented] (MESOS-1279) Add resize task primitive

2015-05-11 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538255#comment-14538255
 ] 

Kannan Rajah commented on MESOS-1279:
-

This JIRA has not been updated in a year. This feature will benefit Apache 
Myriad, which launches placeholder tasks to represent the YARN containers. 
These container sizes will keep varying, so we need a way to resize the 
placeholder tasks to avoid wasting resources. Currently, we are thinking of a 
workaround in Myriad itself, but I would like to know if there is a plan for 
when the feature will be implemented.



[jira] [Updated] (MESOS-2479) Task filter input disappears entirely once the search query yields no results

2015-05-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2479:
--
Shepherd: Adam B

 Task filter input disappears entirely once the search query yields no results
 -

 Key: MESOS-2479
 URL: https://issues.apache.org/jira/browse/MESOS-2479
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Joe Lee
Assignee: Joe Lee
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 The search filter at the head of each table on the Web UI disappears as soon 
 as your search token yields no results, making it impossible to edit your 
 search without having to refresh the entire page.
 This looks to be a simple fix to the hide directive in the table header. The 
 problem was introduced by commit dfd466cf121bf3482acc73f0461e557a5c3ac299, and 
 the fix undoes that change, as it seems erroneous.





[jira] [Commented] (MESOS-2479) Task filter input disappears entirely once the search query yields no results

2015-05-11 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538251#comment-14538251
 ] 

Adam B commented on MESOS-2479:
---

[~bobrik] Could you please submit the patch? See 
http://mesos.apache.org/documentation/latest/submitting-a-patch/
You can add Thomas and myself (adam-mesos) as reviewers.



[jira] [Updated] (MESOS-1279) Add resize task primitive

2015-05-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1279:
--
Labels: mesosphere  (was: )
