[jira] [Created] (MESOS-2027) make distcheck on OSX 10.10

2014-11-03 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2027:
-

 Summary: make distcheck on OSX 10.10
 Key: MESOS-2027
 URL: https://issues.apache.org/jira/browse/MESOS-2027
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: OSX 10.10
Reporter: Till Toenshoff
Assignee: Till Toenshoff
Priority: Minor


It seems our ZooKeeper Yosemite hotfix is not applied correctly when 
running {{make distcheck}} on OSX 10.10.

{noformat}
config.status: executing depfiles commands
/Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I.  
-I./include -I./tests -I./generated  -Wall -Werror  -g -O2 -D_GNU_SOURCE -MT 
zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 
'src/zookeeper.c' || echo './'`src/zookeeper.c; \
then mv -f .deps/zookeeper.Tpo .deps/zookeeper.Plo; else rm -f 
.deps/zookeeper.Tpo; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall 
-Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo 
-c src/zookeeper.c  -fno-common -DPIC -o zookeeper.o
In file included from src/zookeeper.c:27:
In file included from ./include/zookeeper.h:34:
./include/recordio.h:76:9: error: expected ')'
int64_t htonll(int64_t v);
   ^
/usr/include/sys/_endian.h:141:25: note: expanded from macro 'htonll'
#define htonll(x)   __DARWIN_OSSwapInt64(x)
   ^
/usr/include/libkern/_OSByteOrder.h:78:30: note: expanded from macro 
'__DARWIN_OSSwapInt64'
   (__builtin_constant_p(x) ? __DARWIN_OSSwapConstInt64(x) : _OSSwapInt64(x))
^
./include/recordio.h:76:9: note: to match this '('
/usr/include/sys/_endian.h:141:25: note: expanded from macro 'htonll'
#define htonll(x)   __DARWIN_OSSwapInt64(x)
   ^
/usr/include/libkern/_OSByteOrder.h:78:5: note: expanded from macro 
'__DARWIN_OSSwapInt64'
   (__builtin_constant_p(x) ? __DARWIN_OSSwapConstInt64(x) : _OSSwapInt64(x))
   ^
In file included from src/zookeeper.c:27:
In file included from ./include/zookeeper.h:34:
{noformat}
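
The log above shows {{./include/recordio.h}} declaring a function named {{htonll}} while the OS X 10.10 system headers already define {{htonll}} as a macro, so the declaration is macro-expanded into invalid code. Below is a minimal sketch of one common way to avoid this kind of collision: declare the function only when no macro of that name is visible. It is not taken from the Mesos hotfix, and the helper name {{portable_htonll}} is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from ZooKeeper or Mesos): converts a 64-bit value
// from host byte order to network (big-endian) byte order on any host.
inline int64_t portable_htonll(int64_t v) {
  const uint64_t u = static_cast<uint64_t>(v);
  unsigned char bytes[8];
  for (int i = 0; i < 8; ++i) {
    // Most significant byte goes first.
    bytes[i] = static_cast<unsigned char>((u >> (56 - 8 * i)) & 0xff);
  }
  int64_t out;
  std::memcpy(&out, bytes, sizeof(out));
  return out;
}

// Define our own htonll only when the system headers have not already
// provided a macro with that name (as OS X 10.10 does).
#ifndef htonll
inline int64_t htonll(int64_t v) { return portable_htonll(v); }
#endif
```

With the guard in place, code that calls {{htonll}} picks up the system macro where it exists and the fallback elsewhere.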



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-11-03 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194647#comment-14194647
 ] 

Bernd Mathiske commented on MESOS-1316:
---

New patch for this issue: https://reviews.apache.org/r/27516/. This one is 
rebased to latest master and has a couple of minor edits. It replaces and is 
based on https://reviews.apache.org/r/21233 from [~benjaminhindman].

 Implement decent unit test coverage for the mesos-fetcher tool
 --

 Key: MESOS-1316
 URL: https://issues.apache.org/jira/browse/MESOS-1316
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt, test
Reporter: Tom Arnfeld
Assignee: Bernd Mathiske

 There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
 hence bugs like MESOS-1313 have accidentally slipped through.





[jira] [Created] (MESOS-2028) containerizer_pb2.py is empty

2014-11-03 Thread Thomas Rampelberg (JIRA)
Thomas Rampelberg created MESOS-2028:


 Summary: containerizer_pb2.py is empty
 Key: MESOS-2028
 URL: https://issues.apache.org/jira/browse/MESOS-2028
 Project: Mesos
  Issue Type: Bug
  Components: python api
Affects Versions: 0.20.1, 0.20.0, 0.21.0
Reporter: Thomas Rampelberg
Priority: Minor


The sed command to replace mesos.mesos_pb2 with mesos_pb2 is making 
containerizer_pb2.py blank. This has resulted in `mesos.interface` not being 
usable for containerizer work.





[jira] [Created] (MESOS-2029) Allow slave to checkpoint resources.

2014-11-03 Thread Jie Yu (JIRA)
Jie Yu created MESOS-2029:
-

 Summary: Allow slave to checkpoint resources.
 Key: MESOS-2029
 URL: https://issues.apache.org/jira/browse/MESOS-2029
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu


The checkpointed resources are independent of the slave lifecycle. In other 
words, even if the slave host reboots, it'll still recover the checkpointed 
resources (unlike other checkpointed data). The slave needs to verify during 
startup that the checkpointed resources are compatible with the resources of 
the slave (specified using --resources flag).





[jira] [Updated] (MESOS-1805) change const pass-by-value to const reference in stout

2014-11-03 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1805:
--
Target Version/s: 0.22.0  (was: 0.21.0)

 change const pass-by-value to const reference in stout
 --

 Key: MESOS-1805
 URL: https://issues.apache.org/jira/browse/MESOS-1805
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Affects Versions: 0.20.0
Reporter: Kamil Domański
Assignee: Kamil Domański
Priority: Trivial
  Labels: easyfix, patch, performance

 {{os::shell}} and an overload of {{strings::internal::fmt}} in stout take a 
 {{const std::string}} parameter instead of {{const std::string&}}.
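
To illustrate why the change matters, here is a small sketch that counts copies. The {{Tracked}} type and both functions are hypothetical and are not stout code; they only make the extra copy of a const pass-by-value parameter observable.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical type that counts copy constructions.
struct Tracked {
  static int copies;
  std::string data;

  explicit Tracked(std::string s) : data(std::move(s)) {}
  Tracked(const Tracked& other) : data(other.data) { ++copies; }
};

int Tracked::copies = 0;

// Const pass-by-value: the argument is copied on every call, even though
// the parameter cannot be modified inside the function.
std::size_t lengthByValue(const Tracked t) { return t.data.size(); }

// Const reference: the caller's object is used directly, no copy is made.
std::size_t lengthByReference(const Tracked& t) { return t.data.size(); }
```

Calling {{lengthByValue}} with an existing object increments the copy counter once, while {{lengthByReference}} leaves it unchanged.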





[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor

2014-11-03 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1873:
--
Target Version/s: 0.22.0  (was: 0.21.0)

 Don't pass task-related arguments to mesos-executor
 ---

 Key: MESOS-1873
 URL: https://issues.apache.org/jira/browse/MESOS-1873
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.20.1
 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise
Reporter: R.B. Boyer

 *TL;DR:* When a command executor is used with {{shell=false}} and an array of 
 arguments, those same arguments are directly passed to {{mesos-executor}} 
 which fails miserably.
 ---
 Attempting to launch a task using the command executor with {{shell=false}} 
 and passing arguments fails strangely.  
 {noformat:title=CommandInfo proto}
 command {
   value: "/my_program"
   user: "app"
   shell: false
   arguments: "my_program"
   arguments: "--start"
   arguments: "2014-10-06"
   arguments: "--end"
   arguments: "2014-10-07"
 }
 {noformat}
 Dies with:
 {noformat:title=stderr}
 Failed to load unknown flag 'end'
 Usage: my_program [...]
 Supported options:
   --[no-]help Prints this help message (default: false)
   --[no-]override Whether or not to override the command the executor 
 should run
   when the task is launched. Only this flag is expected 
 to be on
   the command line and all arguments after the flag will 
 be used as
   the subsequent 'argv' to be used with 'execvp' 
 (default: false)
 {noformat}
 This is coming from a failed attempt to have the slave launch 
 {{mesos-executor}}.  This is due to an adverse interaction between new 
 {{CommandInfo}} features and this blurb from {{src/slave/slave.cpp}}:
 {code}
 // Copy the CommandInfo to get the URIs and environment, but
 // update it to invoke 'mesos-executor' (unless we couldn't
 // resolve 'mesos-executor' via 'realpath', in which case just
 // echo the error and exit).
 executor.mutable_command()->MergeFrom(task.command());
 Result<string> path = os::realpath(
     path::join(flags.launcher_dir, "mesos-executor"));
 if (path.isSome()) {
   executor.mutable_command()->set_value(path.get());
 } else {
   executor.mutable_command()->set_value(
       "echo '" +
       (path.isError()
        ? path.error()
        : "No such file or directory") +
       "'; exit 1");
 }
 {code}
 This is failing to:
 * clear the {{arguments}} field
 * probably explicitly restore {{shell=true}}
 * clear {{container}} ?
 * clear {{user}} ?
 I was able to quickly fix this locally by making a man-in-the-middle program 
 at {{/usr/local/libexec/mesos/mesos-executor}} that stripped all args before 
 exec-ing the real {{mesos-executor}} binary.
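
A minimal sketch of the first two bullets above, assuming a plain struct as a stand-in for the protobuf {{CommandInfo}} message. The struct and the function name {{executorCommand}} are hypothetical; this is not the fix that was committed. Whether {{user}} and {{container}} should also be cleared is left open, as in the report, so the sketch keeps {{user}}.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the protobuf CommandInfo message.
struct CommandInfo {
  std::string value;
  std::string user;
  bool shell = true;
  std::vector<std::string> arguments;
};

// Builds the command used to launch 'mesos-executor' from the task's
// command: the task's argv is dropped and shell mode is restored so the
// task arguments are not parsed as executor flags.
CommandInfo executorCommand(const CommandInfo& task,
                            const std::string& executorPath) {
  CommandInfo command = task;  // Start from a copy, as MergeFrom does.
  command.value = executorPath;
  command.arguments.clear();   // Do not forward the task's arguments.
  command.shell = true;        // Explicitly restore shell=true.
  return command;
}
```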





[jira] [Updated] (MESOS-1856) Support specifying libnl3 install location.

2014-11-03 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1856:
--
Target Version/s: 0.22.0  (was: 0.21.0)

 Support specifying libnl3 install location.
 ---

 Key: MESOS-1856
 URL: https://issues.apache.org/jira/browse/MESOS-1856
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu







[jira] [Updated] (MESOS-1971) Switch cgroups_limit_swap default to true

2014-11-03 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1971:
--
Target Version/s: 0.22.0  (was: 0.21.0)

 Switch cgroups_limit_swap default to true
 -

 Key: MESOS-1971
 URL: https://issues.apache.org/jira/browse/MESOS-1971
 Project: Mesos
  Issue Type: Improvement
Reporter: Anton Lindström
Assignee: Anton Lindström
Priority: Trivial

 Switch cgroups_limit_swap to true by default; see MESOS-1662 for more 
 information.
 Thanks!





[jira] [Created] (MESOS-2030) Support persistent disk resource in master.

2014-11-03 Thread Jie Yu (JIRA)
Jie Yu created MESOS-2030:
-

 Summary: Support persistent disk resource in master.
 Key: MESOS-2030
 URL: https://issues.apache.org/jira/browse/MESOS-2030
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


We need to do the following in master in order to support persistent disk 
resource:
1) Add an API allowing the framework to release a persistent disk resource.
2) Maintain an in-memory data structure to track persistent disk resources on 
each slave. Update this data structure when slaves 
register/re-register/disconnect, etc.
3) Relay releasing of persistent disk resource to the corresponding slave 
according to the data structure maintained in 2)
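
The three steps above could be sketched roughly as follows. All names here ({{PersistentDiskTracker}}, string slave and disk IDs) are hypothetical and are not the master's actual types. Dropping a slave's entry on disconnect is also an assumption, since the issue only says the structure is updated at that point.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory tracker of persistent disk resources per slave.
class PersistentDiskTracker {
 public:
  // Step 2: record the disks a slave reports when it (re-)registers.
  void slaveRegistered(const std::string& slaveId,
                       const std::set<std::string>& diskIds) {
    disks_[slaveId] = diskIds;
  }

  // Step 2: forget a slave when it disconnects (assumed behavior).
  void slaveDisconnected(const std::string& slaveId) {
    disks_.erase(slaveId);
  }

  // Steps 1 and 3: a framework releases a disk. Returns true if the disk
  // was tracked for that slave, i.e. the release should be relayed to it.
  bool release(const std::string& slaveId, const std::string& diskId) {
    std::map<std::string, std::set<std::string> >::iterator it =
        disks_.find(slaveId);
    if (it == disks_.end()) {
      return false;
    }
    return it->second.erase(diskId) > 0;
  }

 private:
  std::map<std::string, std::set<std::string> > disks_;
};
```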





[jira] [Assigned] (MESOS-2030) Support persistent disk resource in master.

2014-11-03 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-2030:
-

Assignee: Jie Yu

 Support persistent disk resource in master.
 ---

 Key: MESOS-2030
 URL: https://issues.apache.org/jira/browse/MESOS-2030
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu






[jira] [Assigned] (MESOS-2031) Manage persistent directories on slave.

2014-11-03 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-2031:
-

Assignee: Jie Yu

 Manage persistent directories on slave.
 ---

 Key: MESOS-2031
 URL: https://issues.apache.org/jira/browse/MESOS-2031
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu

 Whenever a slave sees a persistent disk resource (in ExecutorInfo or 
 TaskInfo) that is new to it, it will create a persistent directory which is 
 for tasks to store persistent data.
 The slave needs to do the following after it's created:
 1) symlink into the executor sandbox so that tasks/executor can see it
 2) garbage collect it once it is released by the framework





[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor

2014-11-03 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-1873:
--
Fix Version/s: 0.21.0

 Don't pass task-related arguments to mesos-executor
 ---

 Key: MESOS-1873
 URL: https://issues.apache.org/jira/browse/MESOS-1873
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.20.1
 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise
Reporter: R.B. Boyer
 Fix For: 0.21.0







[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor

2014-11-03 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-1873:
--
Target Version/s:   (was: 0.22.0)

 Don't pass task-related arguments to mesos-executor
 ---

 Key: MESOS-1873
 URL: https://issues.apache.org/jira/browse/MESOS-1873
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.20.1
 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise
Reporter: R.B. Boyer
 Fix For: 0.21.0







[jira] [Updated] (MESOS-1143) Add a TASK_ERROR task status.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1143:
-
Sprint: Twitter Mesos Q4 Sprint 3

 Add a TASK_ERROR task status.
 -

 Key: MESOS-1143
 URL: https://issues.apache.org/jira/browse/MESOS-1143
 Project: Mesos
  Issue Type: Improvement
  Components: framework, master
Reporter: Benjamin Hindman
Assignee: Dominic Hamon

 During task validation we drop tasks that have errors and send TASK_LOST 
 status updates. In most circumstances a framework will want to relaunch a 
 task that has gone lost, and in the event the task is actually malformed 
 (thus invalid) this will result in an infinite loop of sending a task and 
 having it go lost.
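
A small sketch of how a framework might use the proposed status to break that loop. The enum lists only a hypothetical subset of task states and {{shouldRelaunch}} is an illustrative name, not part of the Mesos scheduler API.

```cpp
// Hypothetical subset of task states; TASK_ERROR is the proposed addition.
enum TaskState { TASK_RUNNING, TASK_FINISHED, TASK_LOST, TASK_ERROR };

// Returns true if the framework should resubmit the task after receiving
// a status update with the given state.
bool shouldRelaunch(TaskState state) {
  switch (state) {
    case TASK_LOST:
      return true;   // The task went lost; the same task may succeed later.
    case TASK_ERROR:
      return false;  // The task is malformed; resubmitting it would fail again.
    default:
      return false;  // Running or finished tasks need no relaunch.
  }
}
```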





[jira] [Updated] (MESOS-1830) Expose master stats differentiating between master-generated and slave-generated LOST tasks

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1830:
-
Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 
Sprint 3  (was: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2)

 Expose master stats differentiating between master-generated and 
 slave-generated LOST tasks
 ---

 Key: MESOS-1830
 URL: https://issues.apache.org/jira/browse/MESOS-1830
 Project: Mesos
  Issue Type: Story
  Components: master
Reporter: Bill Farner
Assignee: Dominic Hamon
Priority: Minor

 The master exports a monotonically-increasing counter of tasks transitioned 
 to TASK_LOST.  This loses fidelity of the source of the lost task.  A first 
 step in exposing the source of lost tasks might be to just differentiate 
 between TASK_LOST transitions initiated by the master vs the slave (and maybe 
 bad input from the scheduler).





[jira] [Updated] (MESOS-1941) Make executor's user owner of executor's cgroup directory

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1941:
-
Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3  (was: Twitter 
Mesos Q4 Sprint 2)

 Make executor's user owner of executor's cgroup directory
 -

 Key: MESOS-1941
 URL: https://issues.apache.org/jira/browse/MESOS-1941
 Project: Mesos
  Issue Type: Improvement
  Components: isolation, slave
Reporter: Mohit Soni
Assignee: Ian Downes
Priority: Minor

 Currently, when cgroups are enabled and an executor is spawned, the executor is 
 mounted under, for example, /sys/fs/cgroup/cpu/mesos/mesos-id. In the current 
 implementation this directory is writable only by the root user. This prevents 
 a process launched by the executor from mounting its child processes under 
 this cgroup.
 To enable an executor-spawned process to mount its child processes under its 
 cgroup directory, the cgroup directory should be made writable by the user 
 who spawns the executor.





[jira] [Updated] (MESOS-1456) Metric lifetime should be tied to process runstate, not lifetime.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1456:
-
Sprint: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, 
Twitter Mesos Q4 Sprint 3  (was: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, 
Twitter Mesos Q4 Sprint 2)

 Metric lifetime should be tied to process runstate, not lifetime.
 -

 Key: MESOS-1456
 URL: https://issues.apache.org/jira/browse/MESOS-1456
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Affects Versions: 0.19.0
Reporter: Dominic Hamon
Assignee: Dominic Hamon

 The usual pattern for termination of processes is {{terminate(..); wait(..); 
 delete ..;}} but the {{SchedulerProcess}} is terminated and then deleted some 
 time later.
 If the metrics endpoint is accessed within that period, it never returns as 
 it tries to access a {{Gauge}} that has a reference to a valid PID that is 
 not getting any timeslices (the {{SchedulerProcess}}). A one-off fix can be 
 made to the {{SchedulerProcess}} to move the metrics add/remove calls to 
 {{initialize}} and {{finalize}}, but this should be the general pattern for 
 every process with metrics. 





[jira] [Updated] (MESOS-1903) Add backoff to framework re-registration retries

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1903:
-
Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3  (was: Twitter 
Mesos Q4 Sprint 2)

 Add backoff to framework re-registration retries
 

 Key: MESOS-1903
 URL: https://issues.apache.org/jira/browse/MESOS-1903
 Project: Mesos
  Issue Type: Task
Reporter: Dominic Hamon
Assignee: Vinod Kone

 To avoid so many duplicate framework re-registration attempts (and thus offer 
 rescinds) we should add backoff to re-registration retries.
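
As an illustration of the idea, here is a minimal sketch of capped exponential backoff. The function name and parameters are hypothetical, and it is not the scheme that was implemented in Mesos. Random jitter, which such schemes often add so that retries from many frameworks do not line up, is left out to keep the example deterministic.

```cpp
#include <algorithm>
#include <cstdint>

// Returns the delay in milliseconds before retry number 'attempt'
// (0-based). The delay starts at 'baseMillis', doubles with each attempt,
// and never exceeds 'maxMillis'.
int64_t backoffMillis(int attempt, int64_t baseMillis, int64_t maxMillis) {
  int64_t delay = baseMillis;
  for (int i = 0; i < attempt; ++i) {
    if (delay > maxMillis / 2) {
      return maxMillis;  // Doubling again would exceed the cap.
    }
    delay *= 2;
  }
  return std::min(delay, maxMillis);
}
```

With a base of 1000 ms and a cap of 60000 ms, successive retries wait 1000, 2000, 4000 ms and so on until the cap is reached.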





[jira] [Updated] (MESOS-1974) Refactor the C++ 'Resources' abstraction.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1974:
-
Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3  (was: Twitter 
Mesos Q4 Sprint 2)

 Refactor the C++ 'Resources' abstraction.
 -

 Key: MESOS-1974
 URL: https://issues.apache.org/jira/browse/MESOS-1974
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
Assignee: Jie Yu

 The existing C++ 'Resources' interfaces are poorly designed. Some of them are 
 confusing and unintuitive. Some of them are overloaded with too much 
 functionality. For instance,
 {noformat}
 bool operator <= (const Resource& left, const Resource& right);
 {noformat}
 This interface is non-intuitive because A <= B doesn't imply !(B <= A).
 {noformat}
 Resource operator + (const Resource& left, const Resource& right);
 {noformat}
 This one is also non-intuitive because if 'left' is not compatible with 
 'right', the result is 'left' (why not right???). Similar for operator '-'.
 {noformat}
 Option<Resource> Resources::get(const Resource& r) const;
 {noformat}
 This one assumes Resources is flattened, but it might not be.
 As we start to introduce persistent disk resources (MESOS-1554), things will 
 get more complicated. For example, one may want two types of 'disk()' 
 functions: one returns the ephemeral disk bytes (with no disk info), one 
 returns the total disk bytes (including ones that have disk info). We may 
 wanna introduce a concept about Resource that indicates that a resource 
 cannot be merged or split (e.g., atomic?).
 Since we need to change this class anyway, I wanna take this chance to 
 refactor it.
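
One way to see why a comparison operator on resources can surprise callers is sketched below. It uses a simplified, hypothetical {{Resource}} struct rather than the Mesos protobuf type, and it assumes that resources with different names are simply not comparable. Under that assumption the comparison is false in both directions, so a caller cannot read a false result as "greater than".

```cpp
#include <string>

// Hypothetical, simplified scalar resource; not the Mesos protobuf type.
struct Resource {
  std::string name;
  double value;
};

// Resources with different names are not comparable, so the operator
// returns false for them regardless of argument order.
bool operator<=(const Resource& left, const Resource& right) {
  if (left.name != right.name) {
    return false;
  }
  return left.value <= right.value;
}
```

For example, with 2 cpus and 1024 MB of mem, both {{cpus <= mem}} and {{mem <= cpus}} evaluate to false.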





[jira] [Updated] (MESOS-1807) Disallow executors with cpu only or memory only resources

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1807:
-
Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 
Sprint 3  (was: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2)

 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: newbie

 Currently master allows executors to be launched with either only cpus or 
 only memory but we shouldn't allow that.
 This is because an executor is an actual unix process that is launched by the 
 slave. If an executor doesn't specify cpus, what should the cpu limits be 
 for that executor when there are no tasks running on it? If no cpu limits are 
 set then it might starve other executors/tasks on the slave violating 
 isolation guarantees. Same goes with memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.
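
A minimal sketch of the check the master could apply, assuming resources are represented as a map from name to scalar amount. {{ResourceMap}}, {{amountOf}} and {{isCpuOnlyOrMemOnly}} are hypothetical names, not Mesos code. The sketch flags only the two cases named in the summary and does not decide whether an executor with no resources at all should be accepted.

```cpp
#include <map>
#include <string>

// Hypothetical representation: resource name -> scalar amount.
typedef std::map<std::string, double> ResourceMap;

// Returns the amount for 'name', or 0 if the resource is not specified.
static double amountOf(const ResourceMap& resources, const std::string& name) {
  ResourceMap::const_iterator it = resources.find(name);
  return it == resources.end() ? 0.0 : it->second;
}

// Returns true if the executor specifies cpus without mem, or mem without
// cpus. These are the combinations the issue proposes to disallow.
bool isCpuOnlyOrMemOnly(const ResourceMap& resources) {
  const bool hasCpus = amountOf(resources, "cpus") > 0;
  const bool hasMem = amountOf(resources, "mem") > 0;
  return hasCpus != hasMem;
}
```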





[jira] [Updated] (MESOS-1751) Request for stats.json cannot be fulfilled after stopping the framework

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1751:
-
Sprint: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, 
Twitter Mesos Q4 Sprint 3  (was: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, 
Twitter Mesos Q4 Sprint 2)

 Request for stats.json cannot be fulfilled after stopping the framework 
 --

 Key: MESOS-1751
 URL: https://issues.apache.org/jira/browse/MESOS-1751
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
 Environment: Test case launched on Mac OS X Mavericks.
Reporter: Alexander Rukletsov
Assignee: Dominic Hamon
Priority: Minor

 Requesting stats.json from the master in a test case doesn't work after 
 calling the framework's {{driver.stop()}}. However, it works for state.json. I 
 think the problem is related to the {{stats()}} continuation {{_stats()}}. The 
 following test illustrates the issue:
 {code:title=TestCase.cpp|borderStyle=solid}
 TEST_F(MasterTest, RequestAfterDriverStop)
 {
   Try<PID<Master> > master = StartMaster();
   ASSERT_SOME(master);
   Try<PID<Slave> > slave = StartSlave();
   ASSERT_SOME(slave);
   MockScheduler sched;
   MesosSchedulerDriver driver(
       &sched, DEFAULT_FRAMEWORK_INFO, master.get(), DEFAULT_CREDENTIAL);
   driver.start();
   
   Future<process::http::Response> response_before =
       process::http::get(master.get(), "stats.json");
   AWAIT_READY(response_before);
   driver.stop();
   Future<process::http::Response> response_after =
       process::http::get(master.get(), "stats.json");
   AWAIT_READY(response_after);
   driver.join();
   Shutdown();  // Must shutdown before 'containerizer' gets deallocated.
 }
 {code}





[jira] [Updated] (MESOS-1930) Expose TASK_KILLED reason.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1930:
-
Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3  (was: Twitter 
Mesos Q4 Sprint 2)

 Expose TASK_KILLED reason.
 --

 Key: MESOS-1930
 URL: https://issues.apache.org/jira/browse/MESOS-1930
 Project: Mesos
  Issue Type: Story
Reporter: Alexander Rukletsov
Assignee: Dominic Hamon
Priority: Minor

 A task process may be killed by a SIGTERM or SIGKILL. The only possibility to 
 check how the task process has exited is to examine the message: 
 {{status.message().find("Terminated")}}. However, a task may not run in its 
 own process, hence the executor may not be able to provide an exit status. 
 What we actually want is an artificial task exit status that is rendered by 
 the executor.
 This may be resolved by adding second tier states or state explanations. Here 
 is a link to a discussion: https://reviews.apache.org/r/26382/





[jira] [Updated] (MESOS-487) Balloon framework fails to run due to bad flags

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-487:

Sprint: Twitter Mesos Q4 Sprint 3

 Balloon framework fails to run due to bad flags
 ---

 Key: MESOS-487
 URL: https://issues.apache.org/jira/browse/MESOS-487
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: twitter

 I suspect this has to do with the latest flags refactor.
 [vinod@smfd-bkq-03-sr4 build]$  sudo GLOG_v=1 ./bin/mesos-tests.sh 
 --gtest_filter=*Balloon* --verbose
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0529 22:28:13.094351 31506 process.cpp:1426] libprocess is initialized on 
 10.37.184.103:53425 for 24 cpus
 I0529 22:28:13.095010 31506 logging.cpp:91] Logging to STDERR
 Source directory: /home/vinod/mesos
 Build directory: /home/vinod/mesos/build
 -
 We cannot run any cgroups tests that require mounting
 hierarchies because you have the following hierarchies mounted:
 /cgroup
 We'll disable the CgroupsNoHierarchyTest test fixture for now.
 -
 Note: Google Test filter = 
 *Balloon*-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from CgroupsIsolatorTest
 [ RUN  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
 Using temporary directory 
 '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_pWWdE1'
 Launched master at 31574
 Failed to load unknown flag 'build_dir'
 Usage: lt-mesos-master [...]
 Supported options:
   --allocation_interval=VALUE  Amount of time to wait between performing
                                (batch) allocations (e.g., 500ms, 1sec, etc)
                                (default: 1secs)
   --cluster=VALUE              Human readable name for the cluster,
                                displayed in the webui
   --framework_sorter=VALUE     Policy to use for allocating resources
                                between a given user's frameworks. Options
                                are the same as for user_allocator
                                (default: drf)
   --[no-]help                  Prints this help message (default: false)
   --ip=VALUE                   IP address to listen on
   --log_dir=VALUE              Location to put log files (no default, nothing
                                is written to disk unless specified;
                                does not affect logging to stderr)
   --logbufsecs=VALUE           How many seconds to buffer log messages for
                                (default: 0)
   --port=VALUE                 Port to listen on (default: 5050)
   --[no-]quiet                 Disable logging to stderr (default: false)
   --[no-]root_submissions      Can root submit frameworks? (default: true)
   --slaves=VALUE               Initial slaves that should be
                                considered part of this cluster
                                (or if using ZooKeeper a URL) (default: *)
   --user_sorter=VALUE          Policy to use for allocating resources
                                between users. May be one of:
                                  dominant_resource_fairness (drf)
                                (default: drf)
   --webui_dir=VALUE            Location of the webui files/assets
                                (default: /usr/local/share/mesos/webui)
   --whitelist=VALUE            Path to a file with a list of slaves
                                (one per line) to advertise offers for;
                                should be of the form: file://path/to/file
                                (default: *)
   --zk=VALUE                   ZooKeeper URL (used for leader election
                                amongst masters)
                                May be one of:
                                  zk://host1:port1,host2:port2,.../path
                                  zk://username:password@host1:port1,host2:port2,.../path
                                  file://path/to/file (where file contains
                                  one of the above) (default: )
 {RED}Master crashed; failing test
 /home/vinod/mesos/src/tests/balloon_framework_test.sh: line 31: kill: (31574) 
 - No such process
 ../../src/tests/script.cpp:76: Failure
 Failed
 balloon_framework_test.sh exited with status 2
 [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (2031 ms)
 [--] 1 test from CgroupsIsolatorTest (2031 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (2031 ms total)
 [  PASSED  ] 0 tests.
 [  FAILED  ] 1 test, listed below:
 [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
  1 FAILED TEST




[jira] [Updated] (MESOS-723) Expose total number of resources allocated to the slave in its endpoint

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-723:

Sprint: Twitter Mesos Q4 Sprint 3

 Expose total number of resources allocated to the slave in its endpoint
 ---

 Key: MESOS-723
 URL: https://issues.apache.org/jira/browse/MESOS-723
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
  Labels: twitter

 This could be useful information if there are bugs in the master or slave 
 that cause slaves to overcommit their resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1718) Command executor can overcommit the slave.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1718:
-
Sprint: Twitter Mesos Q4 Sprint 3

 Command executor can overcommit the slave.
 --

 Key: MESOS-1718
 URL: https://issues.apache.org/jira/browse/MESOS-1718
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Ian Downes

 Currently we give a small amount of resources to the command executor, in 
 addition to resources used by the command task:
 https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448
 {code: title=}
 ExecutorInfo Slave::getExecutorInfo(
     const FrameworkID& frameworkId,
     const TaskInfo& task)
 {
   ...
   // Add an allowance for the command executor. This does lead to a
   // small overcommit of resources.
   executor.mutable_resources()->MergeFrom(
       Resources::parse(
         "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" +
         "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get());
   ...
 }
 {code}
 This leads to an overcommit of the slave. Ideally, for command tasks we can 
 transfer all of the task resources to the executor at the slave / isolation 
 level.
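The overcommit described above can be illustrated with a little arithmetic. The following is only a sketch: the allowance value `kExecutorCpus` is a made-up stand-in (the ticket names DEFAULT_EXECUTOR_CPUS but does not give its value), and the helper names are hypothetical rather than part of Mesos.

```cpp
#include <cassert>

// Hypothetical per-executor CPU allowance. The ticket refers to
// DEFAULT_EXECUTOR_CPUS but does not state its value.
const double kExecutorCpus = 0.1;

// CPUs committed today: every command task gets its own executor, and each
// executor receives the allowance on top of the task's resources.
double committedCpus(int tasks, double cpusPerTask)
{
  assert(tasks >= 0);
  return tasks * (cpusPerTask + kExecutorCpus);
}

// CPUs committed if the task's resources are instead transferred to the
// executor, as the ticket suggests, so that no extra allowance is added.
double committedCpusWithTransfer(int tasks, double cpusPerTask)
{
  assert(tasks >= 0);
  return tasks * cpusPerTask;
}

// True if the committed CPUs exceed what the slave advertised.
bool overcommitted(double slaveCpus, double committed)
{
  return committed > slaveCpus;
}
```

With these assumed numbers, four 1-CPU command tasks on a 4-CPU slave commit more than the slave has, while the transfer variant stays within the advertised amount.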





[jira] [Updated] (MESOS-2008) MasterAuthorizationTest.DuplicateReregistration is flaky

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-2008:
-
Sprint: Twitter Mesos Q4 Sprint 3

 MasterAuthorizationTest.DuplicateReregistration is flaky
 

 Key: MESOS-2008
 URL: https://issues.apache.org/jira/browse/MESOS-2008
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
 Environment: https://builds.apache.org/computer/ubuntu-4/
Reporter: Yan Xu
Assignee: Vinod Kone

 {noformat:title=}
 [ RUN  ] MasterAuthorizationTest.DuplicateReregistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX'
 I1029 08:25:26.021766 32232 leveldb.cpp:176] Opened db in 3.066621ms
 I1029 08:25:26.022734 32232 leveldb.cpp:183] Compacted db in 935019ns
 I1029 08:25:26.022766 32232 leveldb.cpp:198] Created db iterator in 4350ns
 I1029 08:25:26.022785 32232 leveldb.cpp:204] Seeked to beginning of db in 
 902ns
 I1029 08:25:26.022799 32232 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 387ns
 I1029 08:25:26.022831 32232 replica.cpp:741] Replica recovered with log 
 positions 0 -> 0 with 1 holes and 0 unlearned
 I1029 08:25:26.023305 32248 recover.cpp:437] Starting replica recovery
 I1029 08:25:26.023598 32248 recover.cpp:463] Replica is in EMPTY status
 I1029 08:25:26.025059 32260 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I1029 08:25:26.025320 32247 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I1029 08:25:26.025585 32256 recover.cpp:554] Updating replica status to 
 STARTING
 I1029 08:25:26.026546 32249 master.cpp:312] Master 
 20141029-082526-3142697795-40696-32232 (pomona.apache.org) started on 
 67.195.81.187:40696
 I1029 08:25:26.026561 32261 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 69ns
 I1029 08:25:26.026592 32249 master.cpp:358] Master only allowing 
 authenticated frameworks to register
 I1029 08:25:26.026592 32261 replica.cpp:320] Persisted replica status to 
 STARTING
 I1029 08:25:26.026605 32249 master.cpp:363] Master only allowing 
 authenticated slaves to register
 I1029 08:25:26.026639 32249 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX/credentials'
 I1029 08:25:26.026877 32249 master.cpp:392] Authorization enabled
 I1029 08:25:26.026901 32260 recover.cpp:463] Replica is in STARTING status
 I1029 08:25:26.027498 32261 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I1029 08:25:26.027541 32248 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@67.195.81.187:40696
 I1029 08:25:26.028055 32252 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I1029 08:25:26.028451 32247 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I1029 08:25:26.028733 32249 master.cpp:1242] The newly elected leader is 
 master@67.195.81.187:40696 with id 20141029-082526-3142697795-40696-32232
 I1029 08:25:26.028764 32249 master.cpp:1255] Elected as the leading master!
 I1029 08:25:26.028781 32249 master.cpp:1073] Recovering from registrar
 I1029 08:25:26.028904 32246 recover.cpp:554] Updating replica status to VOTING
 I1029 08:25:26.029163 32257 registrar.cpp:313] Recovering registrar
 I1029 08:25:26.029556 32251 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 485711ns
 I1029 08:25:26.029588 32251 replica.cpp:320] Persisted replica status to 
 VOTING
 I1029 08:25:26.029726 32253 recover.cpp:568] Successfully joined the Paxos 
 group
 I1029 08:25:26.029932 32253 recover.cpp:452] Recover process terminated
 I1029 08:25:26.030436 32250 log.cpp:656] Attempting to start the writer
 I1029 08:25:26.032152 32248 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I1029 08:25:26.032778 32248 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 597030ns
 I1029 08:25:26.032807 32248 replica.cpp:342] Persisted promised to 1
 I1029 08:25:26.033481 32254 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I1029 08:25:26.035429 32247 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I1029 08:25:26.036154 32247 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 690208ns
 I1029 08:25:26.036181 32247 replica.cpp:676] Persisted action at 0
 I1029 08:25:26.037344 32249 replica.cpp:508] Replica received write request 
 for position 0
 I1029 08:25:26.037395 32249 leveldb.cpp:438] Reading position from leveldb 
 took 22607ns
 I1029 08:25:26.038074 32249 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 647429ns
 I1029 

[jira] [Updated] (MESOS-2030) Support persistent disk resource in master.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-2030:
-
Sprint: Twitter Mesos Q4 Sprint 3

 Support persistent disk resource in master.
 ---

 Key: MESOS-2030
 URL: https://issues.apache.org/jira/browse/MESOS-2030
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu

 We need to do the following in master in order to support persistent disk 
 resource:
 1) Add an API allowing the framework to release a persistent disk resource.
 2) Maintain an in-memory data structure to track persistent disk resources on 
 each slave. Update this data structure when slaves 
 register/re-register/disconnect, etc.
 3) Relay releasing of persistent disk resource to the corresponding slave 
 according to the data structure maintained in 2)
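The three steps above could be sketched roughly as follows. This is not the master's actual design: the class `PersistentDiskTracker`, its method names, and the use of plain strings as slave and disk identifiers are all assumptions made for illustration, as is the choice to drop a slave's entry when it disconnects.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical master-side bookkeeping: slave ID -> IDs of the persistent
// disk resources known on that slave (step 2).
class PersistentDiskTracker
{
public:
  // Called when a slave registers or re-registers; replaces whatever was
  // previously recorded for that slave.
  void slaveRegistered(
      const std::string& slaveId,
      const std::set<std::string>& diskIds)
  {
    assert(!slaveId.empty());
    disks[slaveId] = diskIds;
  }

  // Called when a slave disconnects. Dropping the entry is an assumption of
  // this sketch; a real design might keep it around.
  void slaveDisconnected(const std::string& slaveId)
  {
    disks.erase(slaveId);
  }

  // Steps 1 and 3: a framework asks to release a disk. Returns true if the
  // disk was tracked on that slave, meaning the release should be relayed
  // to the slave.
  bool release(const std::string& slaveId, const std::string& diskId)
  {
    std::map<std::string, std::set<std::string> >::iterator it =
      disks.find(slaveId);
    if (it == disks.end()) {
      return false;
    }
    return it->second.erase(diskId) > 0;
  }

private:
  std::map<std::string, std::set<std::string> > disks;
};
```

Returning a boolean from `release` keeps the relay decision (step 3) outside the data structure, so the caller decides how to message the slave.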





[jira] [Updated] (MESOS-1902) Support persistent disk resource.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1902:
-
Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2  (was: Twitter Q4 
Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3)

 Support persistent disk resource.
 -

 Key: MESOS-1902
 URL: https://issues.apache.org/jira/browse/MESOS-1902
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu

 Mesos needs to provide a way to allow tasks to write persistent data which 
 won’t be garbage collected. For example, a task can write its persistent data 
 to some predefined directory. When this task finishes, the framework can 
 launch a new task which is able to access the persistent data written by the 
 previous task which Mesos would have usually garbage-collected.
 One way to achieve that is to provide a new type of disk resources which are 
 persistent. We call it persistent disk resource. When a framework launches a 
 task using persistent disk resources, the data the task writes will be 
 persisted. When the framework launches a new task using the same persistent 
 disk resource (after the previous task finishes), the new task will be able 
 to access the data written by the previous task.
 The persistent disk resource should be able to survive slave reboot or slave 
 info/id change.





[jira] [Updated] (MESOS-487) Balloon framework fails to run due to bad flags

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-487:

Story Points: 1

 Balloon framework fails to run due to bad flags
 ---

 Key: MESOS-487
 URL: https://issues.apache.org/jira/browse/MESOS-487
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: twitter

 I suspect this has to do with the latest flags refactor.
 [vinod@smfd-bkq-03-sr4 build]$  sudo GLOG_v=1 ./bin/mesos-tests.sh 
 --gtest_filter=*Balloon* --verbose
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0529 22:28:13.094351 31506 process.cpp:1426] libprocess is initialized on 
 10.37.184.103:53425 for 24 cpus
 I0529 22:28:13.095010 31506 logging.cpp:91] Logging to STDERR
 Source directory: /home/vinod/mesos
 Build directory: /home/vinod/mesos/build
 -
 We cannot run any cgroups tests that require mounting
 hierarchies because you have the following hierarchies mounted:
 /cgroup
 We'll disable the CgroupsNoHierarchyTest test fixture for now.
 -
 Note: Google Test filter = 
 *Balloon*-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from CgroupsIsolatorTest
 [ RUN  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
 Using temporary directory 
 '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_pWWdE1'
 Launched master at 31574
 Failed to load unknown flag 'build_dir'
 Usage: lt-mesos-master [...]
 Supported options:
   --allocation_interval=VALUE  Amount of time to wait between performing
                                (batch) allocations (e.g., 500ms, 1sec, etc)
                                (default: 1secs)
   --cluster=VALUE              Human readable name for the cluster,
                                displayed in the webui
   --framework_sorter=VALUE     Policy to use for allocating resources
                                between a given user's frameworks. Options
                                are the same as for user_allocator
                                (default: drf)
   --[no-]help                  Prints this help message (default: false)
   --ip=VALUE                   IP address to listen on
   --log_dir=VALUE              Location to put log files (no default, nothing
                                is written to disk unless specified;
                                does not affect logging to stderr)
   --logbufsecs=VALUE           How many seconds to buffer log messages for
                                (default: 0)
   --port=VALUE                 Port to listen on (default: 5050)
   --[no-]quiet                 Disable logging to stderr (default: false)
   --[no-]root_submissions      Can root submit frameworks? (default: true)
   --slaves=VALUE               Initial slaves that should be
                                considered part of this cluster
                                (or if using ZooKeeper a URL) (default: *)
   --user_sorter=VALUE          Policy to use for allocating resources
                                between users. May be one of:
                                  dominant_resource_fairness (drf)
                                (default: drf)
   --webui_dir=VALUE            Location of the webui files/assets
                                (default: /usr/local/share/mesos/webui)
   --whitelist=VALUE            Path to a file with a list of slaves
                                (one per line) to advertise offers for;
                                should be of the form: file://path/to/file
                                (default: *)
   --zk=VALUE                   ZooKeeper URL (used for leader election
                                amongst masters)
                                May be one of:
                                  zk://host1:port1,host2:port2,.../path
                                  zk://username:password@host1:port1,host2:port2,.../path
                                  file://path/to/file (where file contains
                                  one of the above) (default: )
 {RED}Master crashed; failing test
 /home/vinod/mesos/src/tests/balloon_framework_test.sh: line 31: kill: (31574) 
 - No such process
 ../../src/tests/script.cpp:76: Failure
 Failed
 balloon_framework_test.sh exited with status 2
 [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (2031 ms)
 [--] 1 test from CgroupsIsolatorTest (2031 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (2031 ms total)
 [  PASSED  ] 0 tests.
 [  FAILED  ] 1 test, listed below:
 [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
  1 FAILED TEST




[jira] [Updated] (MESOS-1718) Command executor can overcommit the slave.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1718:
-
Story Points: 3

 Command executor can overcommit the slave.
 --

 Key: MESOS-1718
 URL: https://issues.apache.org/jira/browse/MESOS-1718
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Ian Downes

 Currently we give a small amount of resources to the command executor, in 
 addition to resources used by the command task:
 https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448
 {code: title=}
 ExecutorInfo Slave::getExecutorInfo(
     const FrameworkID& frameworkId,
     const TaskInfo& task)
 {
   ...
   // Add an allowance for the command executor. This does lead to a
   // small overcommit of resources.
   executor.mutable_resources()->MergeFrom(
       Resources::parse(
         "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" +
         "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get());
   ...
 }
 {code}
 This leads to an overcommit of the slave. Ideally, for command tasks we can 
 transfer all of the task resources to the executor at the slave / isolation 
 level.





[jira] [Updated] (MESOS-2017) Segfault with Pure virtual method called when tests fail

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-2017:
-
Story Points: 5

 Segfault with Pure virtual method called when tests fail
 --

 Key: MESOS-2017
 URL: https://issues.apache.org/jira/browse/MESOS-2017
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
Reporter: Yan Xu
Assignee: Yan Xu
  Labels: twitter

 The most recent one:
 {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
 [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
 Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
 I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
 I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
 I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
 I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
 2018ns
 I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 335ns
 I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
 positions 0 -> 0 with 1 holes and 0 unlearned
 I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
 I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
 I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
 STARTING
 I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 591981ns
 I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
 STARTING
 I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
 I1030 05:55:06.940820 24489 master.cpp:312] Master 
 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
 67.195.81.187:40429
 I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
 authenticated frameworks to register
 I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
 authenticated slaves to register
 I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
 I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
 I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@67.195.81.187:40429
 I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
 master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
 I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
 I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
 I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
 I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
 I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 536365ns
 I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
 VOTING
 I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
 group
 I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
 I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
 I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 806463ns
 I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
 I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 603843ns
 I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
 I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
 for position 0
 I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
 took 28437ns
 I1030 05:55:06.952896 24476 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 623980ns
 I1030 05:55:06.952926 24476 replica.cpp:676] 

[jira] [Updated] (MESOS-2032) Update Maintenance design to account for persistent resources.

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-2032:
-
Sprint: Twitter Mesos Q4 Sprint 3

 Update Maintenance design to account for persistent resources.
 --

 Key: MESOS-2032
 URL: https://issues.apache.org/jira/browse/MESOS-2032
 Project: Mesos
  Issue Type: Task
  Components: framework, master, slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 With persistent resources and dynamic reservations, frameworks need to know 
 how long the resources will be unavailable during maintenance operations. 
 For example, if there will be a 10-minute reboot for a kernel upgrade, the 
 framework will not want to re-replicate all of its persistent data on the 
 machine. Rather, tolerating one unavailable replica for the maintenance 
 window would be preferred.
 I'd like to revisit the design to ensure it works well for persistent 
 resources as well.
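The trade-off in the reboot example can be sketched as a small decision function on the framework side. Both the function name `shouldReReplicate` and the idea of a single tolerance threshold are assumptions for illustration; the ticket does not prescribe how a framework should decide.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical framework-side policy: re-replicate persistent data only if
// the announced unavailability is longer than the framework is willing to
// run with one replica missing.
bool shouldReReplicate(
    std::chrono::minutes unavailability,
    std::chrono::minutes tolerance)
{
  assert(unavailability.count() >= 0 && tolerance.count() >= 0);
  return unavailability > tolerance;
}
```

A framework that tolerates 30 minutes with a missing replica would ride out the 10-minute reboot, but it could only make that call if the maintenance design tells it the expected duration.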





[jira] [Created] (MESOS-2034) Documentation for isolator namespaces/pid.

2014-11-03 Thread Ian Downes (JIRA)
Ian Downes created MESOS-2034:
-

 Summary: Documentation for isolator namespaces/pid.
 Key: MESOS-2034
 URL: https://issues.apache.org/jira/browse/MESOS-2034
 Project: Mesos
  Issue Type: Documentation
Affects Versions: 0.21.0
Reporter: Ian Downes
Assignee: Ian Downes








[jira] [Created] (MESOS-2033) Documentation for isolator filesystem/shared.

2014-11-03 Thread Ian Downes (JIRA)
Ian Downes created MESOS-2033:
-

 Summary: Documentation for isolator filesystem/shared.
 Key: MESOS-2033
 URL: https://issues.apache.org/jira/browse/MESOS-2033
 Project: Mesos
  Issue Type: Documentation
Affects Versions: 0.21.0
Reporter: Ian Downes
Assignee: Ian Downes








[jira] [Updated] (MESOS-1941) Make executor's user owner of executor's cgroup directory

2014-11-03 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1941:
--
Labels: twitter  (was: )

 Make executor's user owner of executor's cgroup directory
 -

 Key: MESOS-1941
 URL: https://issues.apache.org/jira/browse/MESOS-1941
 Project: Mesos
  Issue Type: Improvement
  Components: isolation, slave
Reporter: Mohit Soni
Assignee: Ian Downes
Priority: Minor
  Labels: twitter

 Currently, when cgroups are enabled and an executor is spawned, it is mounted 
 under, for example, /sys/fs/cgroup/cpu/mesos/mesos-id. In the current 
 implementation this directory is writable only by the root user, which 
 prevents a process launched by the executor from mounting its child 
 processes under this cgroup.
 To enable an executor-spawned process to mount its child processes under its 
 cgroup directory, the cgroup directory should be made writable by the user 
 that spawns the executor.





[jira] [Commented] (MESOS-2036) Fix the Json format for the --modules and update the help message

2014-11-03 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195370#comment-14195370
 ] 

Kapil Arya commented on MESOS-2036:
---

Updated the Json format and adjusted the --modules help message accordingly.

https://reviews.apache.org/r/27481

 Fix the Json format for the --modules and update the help message
 -

 Key: MESOS-2036
 URL: https://issues.apache.org/jira/browse/MESOS-2036
 Project: Mesos
  Issue Type: Bug
Reporter: Kapil Arya
Assignee: Kapil Arya
Priority: Blocker

 The Json format for specifying module-specific parameters is not correctly 
 reflected in the help message. 





[jira] [Updated] (MESOS-2025) OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-2025:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1
 

 Key: MESOS-2025
 URL: https://issues.apache.org/jira/browse/MESOS-2025
 Project: Mesos
  Issue Type: Bug
  Components: stout
 Environment: Ubuntu 14.04 with graphical interface
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Minor

 Reparenting does not always assign pid 1 (/sbin/init). If there is a user 
 init such as init --user with some other pid, this will be the new parent.
 Modify os_tests to check up the parent tree, and succeed if there is a path 
 to pid 1 without zombies along the way.
 This is not the cleanest fix, but I'm having trouble finding a way to find 
 the appropriate init to check for.
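The proposed check (walk up the parent tree and succeed if pid 1 is reached without meeting a zombie) might look roughly like the sketch below. It works on an in-memory process table rather than the real one, and `ProcessInfo` and `reachesInit` are hypothetical names, not the helpers used in os_tests.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sys/types.h>

// Hypothetical snapshot of one process: its parent and whether it is a
// zombie. A real test would fill this in from the OS process table.
struct ProcessInfo
{
  pid_t parent;
  bool zombie;
};

// Walks from `pid` up the parent chain. Returns true if pid 1 is reached and
// no process on the way (including `pid` itself) is a zombie or missing.
bool reachesInit(const std::map<pid_t, ProcessInfo>& table, pid_t pid)
{
  assert(pid > 0);
  std::set<pid_t> seen; // Guards against a malformed table with a cycle.
  while (pid != 1) {
    if (!seen.insert(pid).second) {
      return false;
    }
    std::map<pid_t, ProcessInfo>::const_iterator it = table.find(pid);
    if (it == table.end() || it->second.zombie) {
      return false;
    }
    pid = it->second.parent;
  }
  return true;
}
```

Under this check a process reparented to a user-level init still passes, because that init is itself a live descendant of pid 1.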





[jira] [Updated] (MESOS-1991) Remove dynamic allocation from Option

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1991:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Remove dynamic allocation from Option
 -

 Key: MESOS-1991
 URL: https://issues.apache.org/jira/browse/MESOS-1991
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Minor

 Remove dynamic allocations from Option class.
 The motivation for this is 3-fold:
 1. Reduce dynamic allocations. These can cause latency jitter as process 
 lifetime grows. This kind of jitter can make it hard to grasp the upper bound 
 of latency on certain operations under locks. This modification only moves 
 the allocated space of T, it does not reduce or increase the number of actual 
 construction / move calls unless the new move constructor is used.
 2. The commonly understood implication of Optional / Option / Nullable is 
 that it augments the type field by 1 bit in order to allow representation of 
 an unknown or null state. This is handy in cases where a type such as int64_t 
 fully utilizes its 64 bit storage space, and representing unknown would 
 otherwise require us to steal a number (such as INT64_MAX). This class should 
 not take on the additional responsibility of managing memory for the 
 augmented type.
 3. It can be very deceptive to a newcomer when Option<int64_t> does a dynamic 
 allocation. Intuitively you would not expect a type such as int64_t to do a 
 dynamic allocation or be expensive to copy. Naturally Option<BigType> would 
 be expected to be expensive to copy, and so a developer would be more 
 inclined to do something like std::shared_ptr<Option<BigType>>.
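One way to hold the value without a heap allocation is to keep it in an anonymous union inside the object and construct it with placement new. The class below is a minimal sketch of that idea under the name `InlineOption` (hypothetical, chosen to avoid suggesting it is stout's Option); it is not the patch attached to this ticket.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Minimal sketch: the T lives inside the object, next to a flag that records
// whether it has been constructed. No heap allocation is performed.
template <typename T>
class InlineOption
{
public:
  InlineOption() : engaged(false) {}

  InlineOption(const T& t) : engaged(true) { new (&value) T(t); }

  InlineOption(const InlineOption& that) : engaged(that.engaged)
  {
    if (engaged) { new (&value) T(that.value); }
  }

  InlineOption(InlineOption&& that) : engaged(that.engaged)
  {
    if (engaged) { new (&value) T(std::move(that.value)); }
  }

  ~InlineOption() { reset(); }

  InlineOption& operator=(const InlineOption& that)
  {
    if (this != &that) {
      reset();
      if (that.engaged) {
        new (&value) T(that.value);
        engaged = true;
      }
    }
    return *this;
  }

  bool isSome() const { return engaged; }
  bool isNone() const { return !engaged; }

  const T& get() const { assert(engaged); return value; }
  T& get() { assert(engaged); return value; }

private:
  // Destroys the held value, if any, and marks the option as empty.
  void reset()
  {
    if (engaged) {
      value.~T();
      engaged = false;
    }
  }

  // Anonymous union: reserves correctly aligned space for a T without
  // constructing it.
  union { T value; };
  bool engaged;
};
```

The cost is that every `InlineOption<T>` is at least as large as a `T` plus the flag, which is the behaviour point 2 above argues a reader would expect anyway.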





[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1316:

Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 2  (was: Q3 Sprint 
1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31)

 Implement decent unit test coverage for the mesos-fetcher tool
 --

 Key: MESOS-1316
 URL: https://issues.apache.org/jira/browse/MESOS-1316
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt, test
Reporter: Tom Arnfeld
Assignee: Bernd Mathiske

 There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
 hence bugs like MESOS-1313 have accidentally slipped through.





[jira] [Updated] (MESOS-2010) Libprocess: Introduce enable_shared_from_this

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-2010:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Libprocess: Introduce enable_shared_from_this
 -

 Key: MESOS-2010
 URL: https://issues.apache.org/jira/browse/MESOS-2010
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere

 add enable_shared_from_this to the configure check





[jira] [Updated] (MESOS-1330) Introduce stream abstraction to libprocess

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1330:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Introduce stream abstraction to libprocess
 --

 Key: MESOS-1330
 URL: https://issues.apache.org/jira/browse/MESOS-1330
 Project: Mesos
  Issue Type: Task
  Components: general, libprocess
Reporter: Niklas Quarfot Nielsen
Assignee: Joris Van Remoortere
  Labels: libprocess, network

 I think it makes sense to think in terms of different low- or middle-layer 
 transports (which can accommodate channels like SSL). We could capture 
 connection life-cycles and network send/receive primitives in a much more 
 explicit manner than libprocess currently does.
 I have a proof-of-concept transport / connection abstraction ready, which 
 we can use to iterate on a design.
 Notably, there are opportunities to change the current SocketManager/Socket 
 abstractions to explicit ConnectionManager/Connection abstractions, which 
 allow several composable communication layers.
 I am proposing to own this ticket and am looking for a shepherd to 
 (thoroughly) go over design considerations before jumping into an actual 
 implementation.
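 As a rough illustration of what composable communication layers could look like, here is a minimal Python sketch. It is not the proof of concept mentioned above and not libprocess code: the class names LoopbackConnection and XorLayer, and the send/recv/close method set, are assumptions made up for this example, and the XOR transform is only a placeholder for a real layer such as SSL.

```python
from abc import ABC, abstractmethod
from collections import deque


class Connection(ABC):
    """Hypothetical transport-agnostic connection with an explicit life cycle."""

    @abstractmethod
    def send(self, data: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> bytes: ...

    @abstractmethod
    def close(self) -> None: ...


class LoopbackConnection(Connection):
    """In-memory stand-in for a socket: whatever is sent is queued for recv()."""

    def __init__(self) -> None:
        self._queue = deque()
        self.closed = False

    def send(self, data: bytes) -> None:
        if self.closed:
            raise ConnectionError("send on closed connection")
        self._queue.append(data)

    def recv(self) -> bytes:
        # Return the oldest queued message, or empty bytes if nothing is pending.
        return self._queue.popleft() if self._queue else b""

    def close(self) -> None:
        self.closed = True


class XorLayer(Connection):
    """Toy layer that wraps another Connection (placeholder for e.g. SSL)."""

    def __init__(self, inner: Connection, key: int) -> None:
        self._inner = inner
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self._key for b in data)

    def send(self, data: bytes) -> None:
        self._inner.send(self._xor(data))

    def recv(self) -> bytes:
        return self._xor(self._inner.recv())

    def close(self) -> None:
        self._inner.close()
```

 Because XorLayer is itself a Connection, layers can be stacked in any order on top of a base transport, which is the property the ConnectionManager/Connection split is meant to make explicit.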





[jira] [Updated] (MESOS-2009) Libprocess: Introduce mutex

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-2009:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Libprocess: Introduce mutex
 ---

 Key: MESOS-2009
 URL: https://issues.apache.org/jira/browse/MESOS-2009
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere

 add mutex to the configure check





[jira] [Updated] (MESOS-1571) Signal escalation timeout is not configurable

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1571:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Signal escalation timeout is not configurable
 -

 Key: MESOS-1571
 URL: https://issues.apache.org/jira/browse/MESOS-1571
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen
Assignee: Alexander Rukletsov

 Even though the executor shutdown grace period is set to a larger interval, 
 the signal escalation timeout will still be 3 seconds. It should either be 
 configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
 Thoughts?
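 One way to make the escalation timeout depend on the grace period is sketched below in Python. This is only an illustration of the idea, not the Mesos implementation: the function name stop_with_escalation and the fraction parameter (how much of the grace period to wait before escalating) are assumptions introduced for this example.

```python
import signal
import subprocess


def stop_with_escalation(proc, grace_period_secs, fraction=0.5):
    """Send SIGTERM, then escalate to SIGKILL if the process is still alive.

    The escalation timeout is derived from the grace period instead of
    being a fixed constant. `fraction` is a hypothetical tuning knob.
    """
    escalation_timeout = grace_period_secs * fraction
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=escalation_timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The process ignored or outlived SIGTERM: escalate.
        proc.send_signal(signal.SIGKILL)
        proc.wait()
        return "killed"
```

 With this shape, raising the grace period automatically gives the process longer to exit before SIGKILL, and a separate flag could still override the derived value.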





[jira] [Updated] (MESOS-2011) Introduce mutex

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-2011:

Sprint: Mesosphere Q4 Sprint 2  (was: Mesosphere Q4 Sprint 1 10/31)

 Introduce mutex
 ---

 Key: MESOS-2011
 URL: https://issues.apache.org/jira/browse/MESOS-2011
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere

 * add mutex to the configure check
 * document use of mutex in style guide





[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1316:

Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31  (was: Q3 
Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 2)

 Implement decent unit test coverage for the mesos-fetcher tool
 --

 Key: MESOS-1316
 URL: https://issues.apache.org/jira/browse/MESOS-1316
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt, test
Reporter: Tom Arnfeld
Assignee: Bernd Mathiske

 There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
 hence bugs like MESOS-1313 have accidentally slipped through.





[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-11-03 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-1316:

Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31, Mesosphere 
Q4 Sprint 2  (was: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31)

 Implement decent unit test coverage for the mesos-fetcher tool
 --

 Key: MESOS-1316
 URL: https://issues.apache.org/jira/browse/MESOS-1316
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt, test
Reporter: Tom Arnfeld
Assignee: Bernd Mathiske

 There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
 hence bugs like MESOS-1313 have accidentally slipped through.





[jira] [Updated] (MESOS-2035) Add reason to containerizer proto Termination

2014-11-03 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-2035:
-
Description: When an isolator kills a task, the reason is unknown. As part 
of MESOS-1830, the reason is set to a general one but ideally we would have the 
termination reason to pass through to the status update.  (was: When an 
isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason 
is set to a general one but ideally we would have the termination reason to 
pass through to the status update. We could also differentiate a bad command 
(using the Command executor) from a termination from an isolator.)

 Add reason to containerizer proto Termination
 -

 Key: MESOS-2035
 URL: https://issues.apache.org/jira/browse/MESOS-2035
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Assignee: Dominic Hamon
Priority: Minor

 When an isolator kills a task, the reason is unknown. As part of MESOS-1830, 
 the reason is set to a general one but ideally we would have the termination 
 reason to pass through to the status update.





[jira] [Commented] (MESOS-1941) Make executor's user owner of executor's cgroup directory

2014-11-03 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195491#comment-14195491
 ] 

Ian Downes commented on MESOS-1941:
---

https://reviews.apache.org/r/27557/
https://reviews.apache.org/r/27558/

 Make executor's user owner of executor's cgroup directory
 -

 Key: MESOS-1941
 URL: https://issues.apache.org/jira/browse/MESOS-1941
 Project: Mesos
  Issue Type: Improvement
  Components: isolation, slave
Reporter: Mohit Soni
Assignee: Ian Downes
Priority: Minor
  Labels: twitter

 Currently, when cgroups are enabled and an executor is spawned, it is mounted 
 under, for example, /sys/fs/cgroup/cpu/mesos/mesos-id. In the current 
 implementation this directory is writable only by the root user, which prevents 
 a process launched by the executor from mounting its child processes under this 
 cgroup.
 To enable an executor-spawned process to mount its child processes under its 
 cgroup directory, the cgroup directory should be made writable by the user 
 who spawns the executor.
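 The ownership change described above could look roughly like the following Python sketch. It is not the patch under review: the function name make_writable_by is hypothetical, it only touches the directory itself (not the control files inside it), and on a real cgroup hierarchy it would have to run as root.

```python
import os
import stat


def make_writable_by(cgroup_dir: str, uid: int, gid: int) -> None:
    """Hand a cgroup directory over to the user that runs the executor."""
    # Transfer ownership of the directory to the executor's user and group.
    os.chown(cgroup_dir, uid, gid)
    # Make sure the new owner actually has write permission on it.
    mode = stat.S_IMODE(os.stat(cgroup_dir).st_mode)
    os.chmod(cgroup_dir, mode | stat.S_IWUSR)
```

 A caller would pass the executor's cgroup path, for example the /sys/fs/cgroup/cpu/mesos/mesos-id directory mentioned above, together with the uid and gid of the executor's user.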





[jira] [Created] (MESOS-2037) Update docs/configuration.md

2014-11-03 Thread Kapil Arya (JIRA)
Kapil Arya created MESOS-2037:
-

 Summary: Update docs/configuration.md
 Key: MESOS-2037
 URL: https://issues.apache.org/jira/browse/MESOS-2037
 Project: Mesos
  Issue Type: Documentation
Reporter: Kapil Arya
Assignee: Kapil Arya
Priority: Blocker


Update documentation for configuration flags (docs/configuration.md) to reflect 
the current state.
https://reviews.apache.org/r/27556/





[jira] [Updated] (MESOS-2025) OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1

2014-11-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2025:
--
Target Version/s: 0.21.0

 OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1
 

 Key: MESOS-2025
 URL: https://issues.apache.org/jira/browse/MESOS-2025
 Project: Mesos
  Issue Type: Bug
  Components: stout
 Environment: Ubuntu 14.04 with graphical interface
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Minor

 Reparenting does not always assign pid 1 (/sbin/init). If there is a user 
 init such as init --user with some other pid, this will be the new parent.
 Modify os_tests to check that the subtree has been reparented to a process 
 different from its original parent (a.k.a. child) and that it is not a zombie.
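 The proposed check can be sketched as follows, in Python rather than the C++ of os_tests. The sketch assumes Linux and the /proc/<pid>/stat layout "pid (comm) state ppid ..."; the helper names parse_proc_stat and reparented_and_alive are made up for this example.

```python
def parse_proc_stat(text: str):
    """Return (state, ppid) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    fields = text[text.rindex(")") + 2:].split()
    return fields[0], int(fields[1])


def reparented_and_alive(stat_text: str, original_parent: int) -> bool:
    """True if the process has a new parent and is not a zombie ('Z')."""
    state, ppid = parse_proc_stat(stat_text)
    return ppid != original_parent and state != "Z"
```

 The check deliberately does not compare the new parent against pid 1, so it holds both when init adopts the subtree and when a user-level init such as init --user does.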





[jira] [Commented] (MESOS-1981) Create docs/modules.md to record module API changes

2014-11-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195568#comment-14195568
 ] 

Till Toenshoff commented on MESOS-1981:
---

Given that Niklas is currently unavailable, I will take the liberty of committing 
this now.

 Create docs/modules.md to record module API changes
 ---

 Key: MESOS-1981
 URL: https://issues.apache.org/jira/browse/MESOS-1981
 Project: Mesos
  Issue Type: Bug
Reporter: Kapil Arya
Assignee: Kapil Arya

 The docs/modules.md file keeps a history of all module API changes.





[jira] [Assigned] (MESOS-1950) Add module writers guide

2014-11-03 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya reassigned MESOS-1950:
-

Assignee: Kapil Arya

 Add module writers guide
 

 Key: MESOS-1950
 URL: https://issues.apache.org/jira/browse/MESOS-1950
 Project: Mesos
  Issue Type: Documentation
  Components: modules
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya
Priority: Critical

 Similar to Apache Webserver's Developing Modules guide 
 (http://httpd.apache.org/docs/2.4/developer/modguide.html), we should write 
 up a comprehensive guide to writing robust modules.
 I started a draft here: 
 https://cwiki.apache.org/confluence/display/MESOS/Mesos+Modules+Developer+Guide
 It should be completed and/or copied (or moved) to docs/modules.md. Both may 
 be useful.





[jira] [Commented] (MESOS-1937) Create a document explaining the --modules flag

2014-11-03 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195570#comment-14195570
 ] 

Kapil Arya commented on MESOS-1937:
---

RR: https://reviews.apache.org/r/27453/

 Create a document explaining the --modules flag
 ---

 Key: MESOS-1937
 URL: https://issues.apache.org/jira/browse/MESOS-1937
 Project: Mesos
  Issue Type: Documentation
Reporter: Kapil Arya
Assignee: Kapil Arya
Priority: Blocker

 As the protobuf/JSON for --modules evolves, it becomes harder to explain 
 everything in the command-line help. We should create a man-page-style 
 document that explains all the intricacies of the --modules flag and refer to 
 the document in the command-line help.





[jira] [Updated] (MESOS-1950) Add module writers guide

2014-11-03 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-1950:
--
Shepherd: Till Toenshoff

 Add module writers guide
 

 Key: MESOS-1950
 URL: https://issues.apache.org/jira/browse/MESOS-1950
 Project: Mesos
  Issue Type: Documentation
  Components: modules
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya
Priority: Critical

 Similar to Apache Webserver's Developing Modules guide 
 (http://httpd.apache.org/docs/2.4/developer/modguide.html), we should write 
 up a comprehensive guide to writing robust modules.
 I started a draft here: 
 https://cwiki.apache.org/confluence/display/MESOS/Mesos+Modules+Developer+Guide
 It should be completed and/or copied (or moved) to docs/modules.md. Both may 
 be useful.





[jira] [Commented] (MESOS-1950) Add module writers guide

2014-11-03 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195574#comment-14195574
 ] 

Kapil Arya commented on MESOS-1950:
---

RR: https://reviews.apache.org/r/27453/


 Add module writers guide
 

 Key: MESOS-1950
 URL: https://issues.apache.org/jira/browse/MESOS-1950
 Project: Mesos
  Issue Type: Documentation
  Components: modules
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya
Priority: Critical

 Similar to Apache Webserver's Developing Modules guide 
 (http://httpd.apache.org/docs/2.4/developer/modguide.html), we should write 
 up a comprehensive guide to writing robust modules.
 I started a draft here: 
 https://cwiki.apache.org/confluence/display/MESOS/Mesos+Modules+Developer+Guide
 It should be completed and/or copied (or moved) to docs/modules.md. Both may 
 be useful.





[jira] [Updated] (MESOS-1950) Add module writers guide

2014-11-03 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-1950:
--
Target Version/s: 0.21.0

 Add module writers guide
 

 Key: MESOS-1950
 URL: https://issues.apache.org/jira/browse/MESOS-1950
 Project: Mesos
  Issue Type: Documentation
  Components: modules
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya
Priority: Critical

 Similar to Apache Webserver's Developing Modules guide 
 (http://httpd.apache.org/docs/2.4/developer/modguide.html), we should write 
 up a comprehensive guide to writing robust modules.
 I started a draft here: 
 https://cwiki.apache.org/confluence/display/MESOS/Mesos+Modules+Developer+Guide
 It should be completed and/or copied (or moved) to docs/modules.md. Both may 
 be useful.





[jira] [Updated] (MESOS-2001) Authenticatee modules similar to Authenticator modules

2014-11-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2001:
--
Sprint: Mesosphere Q4 Sprint 1 10/31, Mesosphere Q4 Sprint 2  (was: 
Mesosphere Q4 Sprint 1 10/31)

 Authenticatee modules similar to Authenticator modules
 --

 Key: MESOS-2001
 URL: https://issues.apache.org/jira/browse/MESOS-2001
 Project: Mesos
  Issue Type: Epic
  Components: modules
Reporter: Till Toenshoff
  Labels: authentication, module

 For complete module-based authentication, we will need to allow 
 for authenticatee modules just as we do for authenticator modules.
 h4.Motivation
 Allow third parties to quickly develop and plug in new authentication 
 methods. The modularized Authenticatee API will lower the barrier for the 
 community to provide new methods to Mesos. An example of such an additional, 
 next-step module could be PAM (LDAP, MySQL, NIS, UNIX) backed authentication. 
 cyrus-sasl2 itself already offers more than half a dozen mechanisms via its 
 standard plugins and these could be triggered by additional Authenticator / 
 Authenticatee modules. cyrus-sasl2 supports even more mechanisms when 
 custom built (about a full dozen) but we do not want to bundle 
 cyrus-sasl2 to enforce custom builds. Alternative authentication methods 
 (especially non-SASL based) may bring in new dependencies that we don't want to 
 impose on all of our users. Mesos users may be required to use custom 
 authentication techniques due to strict security policies.


