[jira] [Commented] (MESOS-1416) mesos-0.19.0 build directory is read-only

2014-10-08 Thread Da Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163405#comment-14163405
 ] 

Da Ma commented on MESOS-1416:
--

Hi team,

Would you share the steps to reproduce this issue? I'm new to Mesos :).

Thanks
Da Ma

 mesos-0.19.0 build directory is read-only
 -

 Key: MESOS-1416
 URL: https://issues.apache.org/jira/browse/MESOS-1416
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: Ubuntu 13.10
Reporter: Vinson Lee
Priority: Blocker

 The build creates a read-only mesos-0.19.0 directory. This blocks Jenkins 
 builds because the workspace cannot be automatically cleaned by the git 
 plugin.
 {noformat}
 [...]
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/gzip.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/fatal.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/linkedhashmap.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/protobuf.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/foreach.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/memory.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/hashset.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/format.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/error.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/uuid.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/numify.hpp
 warning: failed to remove 
 mesos-0.19.0/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp
 [...]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1871:
---
Description: 
{{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
signals are sent to the top process—that is {{sh -c}}—and not to the task 
directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process tree, 
if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates reporting 
success to the {{CommandExecutor}}, rendering the task detached from the parent 
process and still running. Because the {{CommandExecutor}} thinks the command 
terminated normally, its OS process exits normally and may not trigger 
containerizer's escalation which destroys cgroups.

Here is the test related to this issue: 
[https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].

  was:
{{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
signals are sent to the top process—that is {{sh -c}}—and not to the task 
directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process tree, 
if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates reporting 
success to the {{CommandExecutor}}, rendering the task detached from the parent 
process and still running. Because the {{CommandExecutor}} thinks the command 
terminated normally, its OS process exits normally and may not trigger 
containerizer's escalation which destroys cgroups.

Here is the test related to this issue: 
[https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0]. As expected, it fails 
on Linux, but surprisingly, it works on Mac OS 10.9.4.


 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to this issue: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Updated] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1871:
---
Description: 
{{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
signals are sent to the top process—that is {{sh -c}}—and not to the task 
directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process tree, 
if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates reporting 
success to the {{CommandExecutor}}, rendering the task detached from the parent 
process and still running. Because the {{CommandExecutor}} thinks the command 
terminated normally, its OS process exits normally and may not trigger 
containerizer's escalation which destroys cgroups.

Here is the test related to the first part: 
[https://gist.github.com/rukletsov/68259dfb02421813f9e6].
Here is the test related to the second part: 
[https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].

  was:
{{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
signals are sent to the top process—that is {{sh -c}}—and not to the task 
directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process tree, 
if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates reporting 
success to the {{CommandExecutor}}, rendering the task detached from the parent 
process and still running. Because the {{CommandExecutor}} thinks the command 
terminated normally, its OS process exits normally and may not trigger 
containerizer's escalation which destroys cgroups.

Here is the test related to this issue: 
[https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].


 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163607#comment-14163607
 ] 

Alexander Rukletsov commented on MESOS-1871:


It looks like this issue consists of two parts.

1. If {{CommandExecutor}} starts a task via {{sh -c}}, we reap the wrong process. 
Instead of reaping {{sh -c}}, it makes sense to monitor and reap the actual task 
process, or the whole process tree rooted at {{sh -c}}, i.e. call {{reaped()}} 
only once all processes in the tree have terminated. Otherwise, as illustrated by 
the test in the description, {{reaped()}} happily disables escalation, leaving 
the task process orphaned in the system.

2. In case we manage to enter the {{escalated()}} callback, we should ensure all 
children of {{sh -c}} receive {{SIGKILL}}. I'm not sure the current 
implementation via {{os::killtree}} provides such a guarantee.

As proposed by [~idownes], POSIX process groups might be a solution: reap the 
whole group. However, it would still be nice to obtain the OS pid of the task 
process, so that status update messages relate to the task process and not to 
the {{sh -c}} wrapper.
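The process-group idea can be sketched standalone (this is illustrative, not Mesos code, and the helper name `kill_group_demo` is made up): a wrapper process calls `setsid()` and forks a "task", and the parent then signals the whole group rather than just the wrapper.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 0 if signalling the process *group* took down the wrapper;
// non-zero return values indicate which step failed.
int kill_group_demo() {
  int fds[2];
  if (pipe(fds) != 0) return 1;

  pid_t wrapper = fork();
  if (wrapper < 0) return 2;
  if (wrapper == 0) {
    close(fds[0]);
    setsid();                              // new session & process group
    pid_t task = fork();
    if (task == 0) { pause(); _exit(0); }  // the "task": just waits
    char ok = 1;
    write(fds[1], &ok, 1);                 // tell parent the group exists
    pause();                               // the "sh -c" wrapper: waits too
    _exit(0);
  }

  close(fds[1]);
  char ok;
  if (read(fds[0], &ok, 1) != 1) return 3;  // block until setsid() has run

  // Signal the whole group rooted at the wrapper, not just the wrapper;
  // this reaches the "task" even though we never learned its pid.
  if (killpg(wrapper, SIGKILL) != 0) return 4;

  int status;
  if (waitpid(wrapper, &status, 0) != wrapper) return 5;
  if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL) return 6;
  return 0;
}
```

Note that the parent still only learns the wrapper's exit status here, which is exactly the limitation mentioned above: the task's own OS pid would have to be communicated separately for status updates.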

 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Assigned] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-1871:
--

Assignee: Alexander Rukletsov

 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Commented] (MESOS-156) Create framework that provides a high level resource request language

2014-10-08 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163620#comment-14163620
 ] 

Jay Buffington commented on MESOS-156:
--

It looks like this was opened before Aurora and Marathon were open-sourced. I 
suspect these frameworks meet your needs. Can this Jira be closed?

 Create framework that provides a high level resource request language
 -

 Key: MESOS-156
 URL: https://issues.apache.org/jira/browse/MESOS-156
 Project: Mesos
  Issue Type: Story
  Components: framework
Reporter: Andy Konwinski
   Original Estimate: 2m
  Remaining Estimate: 2m

 One of the primary points of confusion about Mesos is the mechanism it 
 provides frameworks to acquire new resources (e.g. cpu, ram, etc.). 
 Currently, frameworks receive callbacks with resource offers which they can 
 accept (entirely or only a portion) or reject. When they accept them, they 
 provide a task to be executed on those resources.
 Many engineers we have spoken to have said that they would find it more 
 intuitive to provide their executable up front with a description of which 
 and how many resources they want, and then have mesos do the scheduling.
 I propose that Mesos should ship with a framework that can very easily be 
 installed and run by new users, and this framework should accept Launch Job 
 Requests expressed via some language that describes the resource 
 requirements and where to find the executable for the tasks in the job.





[jira] [Created] (MESOS-1874) Offer network interfaces as resources

2014-10-08 Thread Jay Buffington (JIRA)
Jay Buffington created MESOS-1874:
-

 Summary: Offer network interfaces as resources
 Key: MESOS-1874
 URL: https://issues.apache.org/jira/browse/MESOS-1874
 Project: Mesos
  Issue Type: Improvement
Reporter: Jay Buffington


I have a use case where I want two tasks to bind to the same port on the same 
slave, but on different interfaces.

Ports are offered as a resource, but it is assumed that the task will bind to 
all interfaces (0.0.0.0). If task A gets the port resource 31201 and only binds 
to 127.0.0.1:31201, task B cannot get that port and bind to 10.1.2.3:31201 on 
the same host, even though 10.1.2.3:31201 is unused.
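The conflict is easy to reproduce with two plain sockets, independent of Mesos (the helper name below is made up, and the kernel picks an ephemeral port, so no specific address from this report is assumed): once one socket holds 127.0.0.1 on a port, a second bind of 0.0.0.0 on that port fails with EADDRINUSE.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns 0 if binding 0.0.0.0 on a port already bound on 127.0.0.1
// fails with EADDRINUSE; non-zero indicates which step failed.
int bind_conflict_demo() {
  // "Task A": loopback only, on a kernel-chosen ephemeral port.
  int a = socket(AF_INET, SOCK_STREAM, 0);
  if (a < 0) return 1;

  sockaddr_in addr = {};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
  addr.sin_port = 0;                              // let the kernel pick
  if (bind(a, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    close(a);
    return 2;
  }
  socklen_t len = sizeof(addr);
  if (getsockname(a, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    close(a);
    return 3;
  }
  // addr.sin_port now holds the assigned port.

  // "Task B": the same port on all interfaces, which is what the offer
  // model currently assumes tasks do.
  int b = socket(AF_INET, SOCK_STREAM, 0);
  if (b < 0) { close(a); return 4; }
  addr.sin_addr.s_addr = htonl(INADDR_ANY);       // 0.0.0.0
  int rc = bind(b, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  int saved = errno;
  close(a);
  close(b);
  return (rc == -1 && saved == EADDRINUSE) ? 0 : 5;
}
```

Binding two *specific* addresses on the same port, by contrast, succeeds, which is what per-interface offers would let the allocator express.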





[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163661#comment-14163661
 ] 

Timothy St. Clair commented on MESOS-1871:
--

Doesn't the executor get isolated by its container?  If this is not the case, 
then my world view is incorrect :-/ 

 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Commented] (MESOS-1046) Use of leading underscore in names (global symbols and defines)

2014-10-08 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163710#comment-14163710
 ] 

Dominic Hamon commented on MESOS-1046:
--

We can replace the include guards with the guidance from 
http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#The__define_Guard.

Continuations are a more intrusive change. The underscore scheme works really 
well for indicating continuations, but we do use two or more continuations in 
places. I'm loath to suggest numbering ({{launch}}, {{launch1}}, {{launch2}}), 
as I find that difficult to parse. Perhaps break up the underscores with a 
character like 'c' for continuation: {{launch}}, {{c_launch}}, {{c_c_launch}}?
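For reference, the guard scheme from the linked guidance looks roughly like the following; the guard name here is illustrative (a unique path-based name with no leading or double underscores), not a concrete proposal for stout's layout.

```cpp
// example.hpp -- illustrative header. The guard is derived from the full
// project path and ends (rather than begins) with an underscore, so it
// collides with nothing the C/C++ standards reserve for the implementation.
#ifndef STOUT_INCLUDE_STOUT_EXAMPLE_HPP_
#define STOUT_INCLUDE_STOUT_EXAMPLE_HPP_

inline int example_value() { return 42; }

#endif  // STOUT_INCLUDE_STOUT_EXAMPLE_HPP_
```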



 Use of leading underscore in names (global symbols and defines)
 ---

 Key: MESOS-1046
 URL: https://issues.apache.org/jira/browse/MESOS-1046
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt
Affects Versions: 0.19.0
Reporter: Till Toenshoff
Priority: Minor
  Labels: c, c++, libprocess, mesos, standards, stout

 Even though this appears to be a very common standard breach, I thought it 
 would still be nice to play entirely by the rules.
 If I get things right, then according to the 1999 C standard as well as the 
 2003 C++ standard, leading underscores followed by a capital letter and, 
 maybe even more importantly, double underscores are reserved for the 
 implementation of those standards. This appears to apply to both 
 global-namespace symbols and defines. 
 We are currently using double underscores in our include guards, and it may 
 be wise to fix that and any other collision with the standards relating to 
 the use of underscores.
 A nice compilation of the related standard quotes can be found at 
 http://stackoverflow.com/a/228797/91282





[jira] [Commented] (MESOS-1416) mesos-0.19.0 build directory is read-only

2014-10-08 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163732#comment-14163732
 ] 

Timothy St. Clair commented on MESOS-1416:
--

I don't believe this should be a problem on master; if not, please let us know. 

 mesos-0.19.0 build directory is read-only
 -

 Key: MESOS-1416
 URL: https://issues.apache.org/jira/browse/MESOS-1416
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: Ubuntu 13.10
Reporter: Vinson Lee
Priority: Blocker

 The build creates a read-only mesos-0.19.0 directory. This blocks Jenkins 
 builds because the workspace cannot be automatically cleaned by the git 
 plugin.
 {noformat}
 [...]
 {noformat}





[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163768#comment-14163768
 ] 

Alexander Rukletsov commented on MESOS-1871:


I think what happens is that the task process escapes its process tree and is 
not killed by {{PosixLauncher}}. Here is an orphaned process after launching 
the first test:
{code}
alex@alex-hh.local: ~ $ ps aux | grep handler
alex 5641   0.0  0.0  2432784624 s003  S+6:52PM   0:00.00 
grep handler
alex 5620   0.0  0.0  2447700688   ??  S 6:52PM   0:00.00 
sh -c ( handler() { echo SIGTERM; }; trap 'handler TERM' SIGTERM; echo $$; echo 
$(which sleep); while true; do date; sleep 1; done; exit 0 )

alex@alex-hh.local: ~ $ ps -p 5620 -o ppid=
1
{code}

 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2014-10-08 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163803#comment-14163803
 ] 

Ian Downes commented on MESOS-1871:
---

I looked at the code: {{os::killtree()}}'s behavior is incorrect.

1. The posix launcher puts the executor into its own session with {{setsid}}.
2. The posix launcher calls {{os::killtree(pid, SIGKILL, true, true)}}, where 
the {{true}} arguments request killing all processes in the group and session.
3. {{os::killtree()}} *returns early* if it can't find the *process* with 
{{pid}} (which is the scenario you're describing), so it doesn't actually 
continue to kill everything in the process group/session.

I modified the code early this year and perpetuated the existing bug. I'll file 
a ticket on this.
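The early-return problem can be demonstrated standalone (the helper name below is made up, not the actual os::killtree() code): even after the leading pid has exited and been reaped, the group it led can still contain live members, so signalling the group must still be attempted rather than bailing out.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 0 if killpg() still succeeds after the group leader has
// terminated and been reaped; non-zero indicates which step failed.
int kill_after_leader_exit_demo() {
  int fds[2];
  if (pipe(fds) != 0) return 1;

  pid_t leader = fork();
  if (leader < 0) return 2;
  if (leader == 0) {
    close(fds[0]);
    setsid();                                // leader of a fresh group
    pid_t member = fork();
    if (member == 0) { pause(); _exit(0); }  // member outlives the leader
    char ok = 1;
    write(fds[1], &ok, 1);
    _exit(0);                                // leader exits immediately
  }

  close(fds[1]);
  char ok;
  if (read(fds[0], &ok, 1) != 1) return 3;

  int status;
  if (waitpid(leader, &status, 0) != leader) return 4;  // reap the leader

  // The leading pid is gone -- kill(pid, 0) can no longer find it...
  if (kill(leader, 0) != -1 || errno != ESRCH) return 5;

  // ...but the group it led still has a live member, so signalling the
  // group works and must not be skipped:
  if (killpg(leader, SIGKILL) != 0) return 6;
  return 0;
}
```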


 Sending SIGTERM to a task command may render it orphaned
 

 Key: MESOS-1871
 URL: https://issues.apache.org/jira/browse/MESOS-1871
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov

 {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
 signals are sent to the top process—that is {{sh -c}}—and not to the task 
 directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
 tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
 reporting success to the {{CommandExecutor}}, rendering the task detached 
 from the parent process and still running. Because the {{CommandExecutor}} 
 thinks the command terminated normally, its OS process exits normally and may 
 not trigger containerizer's escalation which destroys cgroups.
 Here is the test related to the first part: 
 [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
 Here is the test related to the second part: 
 [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Created] (MESOS-1875) os::killtree() incorrectly returns early if pid has terminated

2014-10-08 Thread Ian Downes (JIRA)
Ian Downes created MESOS-1875:
-

 Summary: os::killtree() incorrectly returns early if pid has 
terminated
 Key: MESOS-1875
 URL: https://issues.apache.org/jira/browse/MESOS-1875
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.20.1, 0.19.1, 0.20.0, 0.19.0, 0.18.2, 0.18.1, 0.18.0
Reporter: Ian Downes


If groups == true and/or sessions == true then os::killtree() should continue 
to signal all processes in the process group and/or session, even if the 
leading pid has terminated.





[jira] [Updated] (MESOS-1875) os::killtree() incorrectly returns early if pid has terminated

2014-10-08 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-1875:
--
Description: If groups == true and/or sessions == true then os::killtree() 
should continue to signal all processes in the process group and/or session, 
even if the leading pid has terminated.  (was: If groups == true and/or 
sessions == true then os::kill tree should continue to signal all processes in 
the process group and/or session, even if the leading pid has terminated.)

 os::killtree() incorrectly returns early if pid has terminated
 --

 Key: MESOS-1875
 URL: https://issues.apache.org/jira/browse/MESOS-1875
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.19.0, 0.20.0, 0.19.1, 0.20.1
Reporter: Ian Downes

 If groups == true and/or sessions == true then os::killtree() should continue 
 to signal all processes in the process group and/or session, even if the 
 leading pid has terminated.





[jira] [Commented] (MESOS-156) Create framework that provides a high level resource request language

2014-10-08 Thread Andy Konwinski (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163866#comment-14163866
 ] 

Andy Konwinski commented on MESOS-156:
--

Sure!



 Create framework that provides a high level resource request language
 -

 Key: MESOS-156
 URL: https://issues.apache.org/jira/browse/MESOS-156
 Project: Mesos
  Issue Type: Story
  Components: framework
Reporter: Andy Konwinski
   Original Estimate: 2m
  Remaining Estimate: 2m

 One of the primary points of confusion about Mesos is the mechanism it 
 provides frameworks to acquire new resources (e.g. cpu, ram, etc.). 
 Currently, frameworks receive callbacks with resource offers which they can 
 accept (entirely or only a portion) or reject. When they accept them, they 
 provide a task to be executed on those resources.
 Many engineers we have spoken to have said that they would find it more 
 intuitive to provide their executable up front with a description of which 
 and how many resources they want, and then have mesos do the scheduling.
 I propose that Mesos should ship with a framework that can very easily be 
 installed and run by new users, and this framework should accept Launch Job 
 Requests expressed via some language that describes the resource 
 requirements and where to find the executable for the tasks in the job.





[jira] [Comment Edited] (MESOS-1416) mesos-0.19.0 build directory is read-only

2014-10-08 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163732#comment-14163732
 ] 

Timothy St. Clair edited comment on MESOS-1416 at 10/8/14 6:33 PM:
---

I don't believe this should be a problem on master, if not, please let us know. 


was (Author: tstclair):
I don't believe this should be a problem no master, if not, please let us know. 

 mesos-0.19.0 build directory is read-only
 -

 Key: MESOS-1416
 URL: https://issues.apache.org/jira/browse/MESOS-1416
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: Ubuntu 13.10
Reporter: Vinson Lee
Priority: Blocker

 The build creates a read-only mesos-0.19.0 directory. This blocks Jenkins 
 builds because the workspace cannot be automatically cleaned by the git 
 plugin.
 {noformat}
 [...]
 {noformat}





[jira] [Commented] (MESOS-1848) DRFAllocatorTest.DRFAllocatorProcess is flaky

2014-10-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163973#comment-14163973
 ] 

Till Toenshoff commented on MESOS-1848:
---

It turned out the described symptoms were caused by a custom SASL installation 
I had done on that VM. After removing all traces of it and rebuilding against a 
proper one, everything went back to normal. That certainly doesn't pin the 
problem to its exact cause, but it did the job for me. 

 DRFAllocatorTest.DRFAllocatorProcess is flaky
 -

 Key: MESOS-1848
 URL: https://issues.apache.org/jira/browse/MESOS-1848
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: Fedora 20
Reporter: Vinod Kone

 Observed this on CI. This is pretty strange because the authentication of 
 both the framework and slave timed out at the very beginning, even though we 
 don't manipulate clocks.
 {code}
 [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
 Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X'
 I0929 20:11:12.801327 16997 leveldb.cpp:176] Opened db in 489720ns
 I0929 20:11:12.801627 16997 leveldb.cpp:183] Compacted db in 168280ns
 I0929 20:11:12.801784 16997 leveldb.cpp:198] Created db iterator in 5820ns
 I0929 20:11:12.801898 16997 leveldb.cpp:204] Seeked to beginning of db in 
 1285ns
 I0929 20:11:12.802039 16997 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 792ns
 I0929 20:11:12.802160 16997 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0929 20:11:12.802441 17012 recover.cpp:425] Starting replica recovery
 I0929 20:11:12.802623 17012 recover.cpp:451] Replica is in EMPTY status
 I0929 20:11:12.803251 17012 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0929 20:11:12.803427 17012 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0929 20:11:12.803632 17012 recover.cpp:542] Updating replica status to 
 STARTING
 I0929 20:11:12.803911 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 33999ns
 I0929 20:11:12.804033 17012 replica.cpp:320] Persisted replica status to 
 STARTING
 I0929 20:11:12.804245 17012 recover.cpp:451] Replica is in STARTING status
 I0929 20:11:12.804592 17012 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0929 20:11:12.804775 17012 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0929 20:11:12.804952 17012 recover.cpp:542] Updating replica status to VOTING
 I0929 20:11:12.805115 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 15990ns
 I0929 20:11:12.805234 17012 replica.cpp:320] Persisted replica status to 
 VOTING
 I0929 20:11:12.805366 17012 recover.cpp:556] Successfully joined the Paxos 
 group
 I0929 20:11:12.805539 17012 recover.cpp:440] Recover process terminated
 I0929 20:11:12.809062 17017 master.cpp:312] Master 
 20140929-201112-2759502016-47295-16997 (fedora-20) started on 
 192.168.122.164:47295
 I0929 20:11:12.809432 17017 master.cpp:358] Master only allowing 
 authenticated frameworks to register
 I0929 20:11:12.809546 17017 master.cpp:363] Master only allowing 
 authenticated slaves to register
 I0929 20:11:12.810169 17017 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X/credentials'
 I0929 20:11:12.810510 17017 master.cpp:392] Authorization enabled
 I0929 20:11:12.811841 17016 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0929 20:11:12.812099 17013 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@192.168.122.164:47295
 I0929 20:11:12.813006 17017 master.cpp:1241] The newly elected leader is 
 master@192.168.122.164:47295 with id 20140929-201112-2759502016-47295-16997
 I0929 20:11:12.813164 17017 master.cpp:1254] Elected as the leading master!
 I0929 20:11:12.813279 17017 master.cpp:1072] Recovering from registrar
 I0929 20:11:12.813487 17013 registrar.cpp:312] Recovering registrar
 I0929 20:11:12.813824 17013 log.cpp:656] Attempting to start the writer
 I0929 20:11:12.814256 17013 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0929 20:11:12.814419 17013 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 25049ns
 I0929 20:11:12.814581 17013 replica.cpp:342] Persisted promised to 1
 I0929 20:11:12.814909 17013 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0929 20:11:12.815340 17013 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0929 20:11:12.815497 17013 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 19855ns
 I0929 20:11:12.815636 17013 

[jira] [Commented] (MESOS-1847) mesos-ec2 launch: tries to rsync before ssh is available

2014-10-08 Thread Killian Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164094#comment-14164094
 ] 

Killian Murphy commented on MESOS-1847:
---

I had the same issue.

Adding --wait 600 worked for me; adding --wait 180 did not. Testing by sshing 
into the created VM after the failure, it takes about 7-8 minutes before sshd 
is ready for login.
The only way to recover for me was to destroy the cluster and recreate it with 
the additional --wait option.

Here's the failure:

killian@nore ~/development/mesos/mesos-0.20.1/ec2: ./mesos_ec2.py -k kdefault 
-i ~/AWS/id_rsa-kdefault -s 1 launch k_mesos
Setting up security groups...
Checking for running cluster...
Launching instances...
Launched slaves, regid = r-87bd89ac
Launched master, regid = r-65bf8b4e
Waiting for instances to start up...
Waiting 60 more seconds...
Deploying files to master...
ssh: connect to host ec2-54-237-156-217.compute-1.amazonaws.com port 22: 
Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at 
/SourceCache/rsync/rsync-42/rsync/io.c(452) [sender=2.6.9]
Traceback (most recent call last):
  File "./mesos_ec2.py", line 571, in <module>
    main()
  File "./mesos_ec2.py", line 480, in main
    setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
  File "./mesos_ec2.py", line 334, in setup_cluster
    deploy_files(conn, "deploy." + opts.os, opts, master_nodes, slave_nodes, 
zoo_nodes)
  File "./mesos_ec2.py", line 445, in deploy_files
    subprocess.check_call(command, shell=True)
  File 
"/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py",
 line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'rsync -rv -e 'ssh -o 
StrictHostKeyChecking=no -i /Users/killian/AWS/id_rsa-kdefault' 
'/var/folders/8t/hp2txtm56h3byl8q5cdd33bmgp/T/tmp5VZqO3/' 
'r...@ec2-54-237-156-217.compute-1.amazonaws.com:/'' returned non-zero exit 
status 255



 mesos-ec2 launch: tries to rsync before ssh is available
 

 Key: MESOS-1847
 URL: https://issues.apache.org/jira/browse/MESOS-1847
 Project: Mesos
  Issue Type: Bug
  Components: ec2
Reporter: Kevin Matzen

 If you don't specify a wait time that is long enough, wait_for_cluster 
 will return once the instances have launched, but ssh will not necessarily be 
 available.  deploy_files will then execute rsync and possibly fail.  ssh 
 should be tested before continuing to the file deployment stage.  It's not 
 clear to me why opts.wait is even needed when you can simply test for 
 availability.
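A minimal sketch of the ssh test the reporter suggests, run before deploy_files invokes rsync. This is a hypothetical helper, not part of mesos_ec2.py; the `probe` parameter is an addition so the polling logic can be exercised without a real host:

```python
import subprocess
import time

def wait_for_ssh(host, timeout=600, interval=10, probe=None):
    # Poll until an ssh connection to `host` succeeds or `timeout` elapses.
    # By default, probe by running a no-op command over ssh with a short
    # connect timeout; a custom `probe` callable may be injected for tests.
    if probe is None:
        def probe():
            return subprocess.call(
                ["ssh", "-o", "StrictHostKeyChecking=no",
                 "-o", "ConnectTimeout=5", host, "true"]) == 0
    deadline = time.time() + timeout
    while time.time() < deadline:
        if probe():
            return True
        time.sleep(interval)
    raise RuntimeError("ssh not available on %s after %d seconds"
                       % (host, timeout))
```

With something like this in place, deploy_files could call wait_for_ssh(master) before running rsync, which would make a fixed opts.wait unnecessary.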



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1876) Remove deprecated 'slave_id' field in ReregisterSlaveMessage.

2014-10-08 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1876:
--

 Summary: Remove deprecated 'slave_id' field in 
ReregisterSlaveMessage.
 Key: MESOS-1876
 URL: https://issues.apache.org/jira/browse/MESOS-1876
 Project: Mesos
  Issue Type: Task
  Components: technical debt
Reporter: Benjamin Mahler


This is to follow through on removing the deprecated field that we've been 
phasing out. In 0.21.0, this field will no longer be read:

{code}
message ReregisterSlaveMessage {
  // TODO(bmahler): slave_id is deprecated.
  // 0.21.0: Now an optional field. Always written, never read.
  // 0.22.0: Remove this field.
  optional SlaveID slave_id = 1;
  required SlaveInfo slave = 2;
  repeated ExecutorInfo executor_infos = 4;
  repeated Task tasks = 3;
  repeated Archive.Framework completed_frameworks = 5;
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1876) Remove deprecated 'slave_id' field in ReregisterSlaveMessage.

2014-10-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1876:
---
Priority: Trivial  (was: Major)

 Remove deprecated 'slave_id' field in ReregisterSlaveMessage.
 -

 Key: MESOS-1876
 URL: https://issues.apache.org/jira/browse/MESOS-1876
 Project: Mesos
  Issue Type: Task
  Components: technical debt
Reporter: Benjamin Mahler
Priority: Trivial

 This is to follow through on removing the deprecated field that we've been 
 phasing out. In 0.21.0, this field will no longer be read:
 {code}
 message ReregisterSlaveMessage {
   // TODO(bmahler): slave_id is deprecated.
   // 0.21.0: Now an optional field. Always written, never read.
   // 0.22.0: Remove this field.
   optional SlaveID slave_id = 1;
   required SlaveInfo slave = 2;
   repeated ExecutorInfo executor_infos = 4;
   repeated Task tasks = 3;
   repeated Archive.Framework completed_frameworks = 5;
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1877) Unify stout include style

2014-10-08 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-1877:
---

 Summary: Unify stout include style
 Key: MESOS-1877
 URL: https://issues.apache.org/jira/browse/MESOS-1877
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Cody Maloney
Priority: Minor


Some of the files in stout use relative includes (stringify.hpp, for example), 
while others use absolute includes (result.hpp) to reach files that live 
inside stout. They should all use one format.
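For illustration, a small Python sketch (hypothetical, not part of the build) that classifies the two styles, so a header tree like stout's could be audited for mixed usage:

```python
import re

# Relative style:  #include "stringify.hpp"
# Absolute style:  #include <stout/result.hpp>
RELATIVE = re.compile(r'#include\s+"([^/"]+\.hpp)"')
ABSOLUTE = re.compile(r'#include\s+<stout/([^>]+\.hpp)>')

def classify_includes(source):
    # Return (relative, absolute) lists of stout headers included by `source`.
    return RELATIVE.findall(source), ABSOLUTE.findall(source)
```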



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1878) Access to sandbox on slave from master UI does not show the sandbox contents

2014-10-08 Thread Anindya Sinha (JIRA)
Anindya Sinha created MESOS-1878:


 Summary: Access to sandbox on slave from master UI does not show 
the sandbox contents
 Key: MESOS-1878
 URL: https://issues.apache.org/jira/browse/MESOS-1878
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Anindya Sinha
Priority: Minor


From the master UI, clicking Sandbox to go to the slave sandbox does not list the 
sandbox contents. The directory path of the sandbox shows up fine, but the 
actual contents of the sandbox are not displayed below it.

It looks like the following GET to the corresponding slave fails:
http://slave1:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path

Looking at the commits, I could confirm that the issue is not seen with commit 
'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos build with 
this commit being the last commit on mesos slave does not show this behavior.
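For reference, the failing request can be reproduced with a properly encoded query string. A small sketch (hypothetical helper; parameter names are taken from the request above, using Python 3's urllib):

```python
from urllib.parse import urlencode

def browse_url(slave, path, callback="angular.callbacks._9"):
    # Build the /files/browse.json request the web UI issues against a slave.
    query = urlencode({"jsonp": callback, "path": path})
    return "http://%s/files/browse.json?%s" % (slave, query)
```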




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1879) Handle a temporary one-way slave -> master socket closure.

2014-10-08 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1879:
--

 Summary: Handle a temporary one-way slave -> master socket 
closure.
 Key: MESOS-1879
 URL: https://issues.apache.org/jira/browse/MESOS-1879
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Priority: Minor


In the same spirit as MESOS-1668, we want to correctly handle a scenario where 
the slave -> master socket closes, and a new socket can be immediately 
re-established.

If this occurs, the ping / pongs will resume but there may be dropped messages 
sent by the slave, and so a re-registration would be a good safety net.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1878) Access to sandbox on slave from master UI does not show the sandbox contents

2014-10-08 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha updated MESOS-1878:
-
Description: 
From the master UI, clicking Sandbox to go to the slave sandbox does not list the 
sandbox contents. The directory path of the sandbox shows up fine, but the 
actual contents of the sandbox are not displayed below it.

It looks like the following GET to the corresponding slave fails:
http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path

Looking at the commits, I could confirm that the issue is not seen with commit 
'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos build with 
this commit being the last commit on mesos slave does not show this behavior.


  was:
From master UI, clicking Sandbox to go to slave sandbox does not list the 
sandbox contents. The directory path of the sandbox shows up fine, but not the 
actual contents of the sandbox that is displayed below.

Looks like the issue is it fails in the following GET from the corresponding 
slave:
http://slave1:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path

Looking at the commits, I could confirm that the issue is not seen with commit 
'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos build with 
this commit being the last commit on mesos slave does not show this behavior.



 Access to sandbox on slave from master UI does not show the sandbox contents
 

 Key: MESOS-1878
 URL: https://issues.apache.org/jira/browse/MESOS-1878
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Anindya Sinha
Priority: Minor

 From the master UI, clicking Sandbox to go to the slave sandbox does not list 
 the sandbox contents. The directory path of the sandbox shows up fine, but 
 the actual contents of the sandbox are not displayed below it.
 It looks like the following GET to the corresponding slave fails:
 http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path
 Looking at the commits, I could confirm that the issue is not seen with 
 commit 'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos 
 build with this commit being the last commit on mesos slave does not show 
 this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1880) config options page on mesos.apache.org out of date and missing versioning info.

2014-10-08 Thread Jay Buffington (JIRA)
Jay Buffington created MESOS-1880:
-

 Summary: config options page on mesos.apache.org out of date and 
missing versioning info.
 Key: MESOS-1880
 URL: https://issues.apache.org/jira/browse/MESOS-1880
 Project: Mesos
  Issue Type: Improvement
Reporter: Jay Buffington
Assignee: Dave Lester


http://mesos.apache.org/documentation/latest/configuration/ is old.  For 
example slave options doesn't list --containerizers which was introduced in 
0.20.0.

Also, I think there should be a note that the list on that page is for a 
particular version.  mesos-slave --help is the best way to get all the options 
for the particular version you're running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1880) config options page on mesos.apache.org out of date and missing versioning info.

2014-10-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1880:
--
Issue Type: Documentation  (was: Improvement)

 config options page on mesos.apache.org out of date and missing versioning 
 info.
 

 Key: MESOS-1880
 URL: https://issues.apache.org/jira/browse/MESOS-1880
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Jay Buffington
Assignee: Dave Lester
  Labels: newbie

 http://mesos.apache.org/documentation/latest/configuration/ is old.  For 
 example slave options doesn't list --containerizers which was introduced in 
 0.20.0.
 Also, I think there should be a note that the list on that page is for a 
 particular version.  mesos-slave --help is the best way to get all the 
 options for the particular version you're running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1880) config options page on mesos.apache.org out of date and missing versioning info.

2014-10-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1880:
--
Labels: newbie  (was: )

 config options page on mesos.apache.org out of date and missing versioning 
 info.
 

 Key: MESOS-1880
 URL: https://issues.apache.org/jira/browse/MESOS-1880
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Jay Buffington
Assignee: Dave Lester
  Labels: newbie

 http://mesos.apache.org/documentation/latest/configuration/ is old.  For 
 example slave options doesn't list --containerizers which was introduced in 
 0.20.0.
 Also, I think there should be a note that the list on that page is for a 
 particular version.  mesos-slave --help is the best way to get all the 
 options for the particular version you're running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1880) config options page on mesos.apache.org out of date and missing versioning info.

2014-10-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1880:
--
Component/s: documentation

 config options page on mesos.apache.org out of date and missing versioning 
 info.
 

 Key: MESOS-1880
 URL: https://issues.apache.org/jira/browse/MESOS-1880
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Jay Buffington
Assignee: Dave Lester
  Labels: newbie

 http://mesos.apache.org/documentation/latest/configuration/ is old.  For 
 example slave options doesn't list --containerizers which was introduced in 
 0.20.0.
 Also, I think there should be a note that the list on that page is for a 
 particular version.  mesos-slave --help is the best way to get all the 
 options for the particular version you're running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1878) Access to sandbox on slave from master UI does not show the sandbox contents

2014-10-08 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha updated MESOS-1878:
-
Description: 
From the master UI, clicking Sandbox to go to the slave sandbox does not list the 
sandbox contents. The directory path of the sandbox shows up fine, but the 
actual contents of the sandbox are not displayed below it.

It looks like the following GET to the corresponding slave fails:
http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path

Looking at the commits, I could confirm that the issue is not seen with commit 
'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos build with 
this commit being the last commit on mesos slave does not show this behavior.

Update: The issue was introduced by the following 2 commits:
ca2e8ef MESOS-1857 Fixed path::join() on older libstdc++ which lack back().
b08fccf Switched path::join() to be variadic

Note that commit ca2e8ef fixes a build issue (on older libstdc++) on top of 
commit b08fccf.

  was:
From master UI, clicking Sandbox to go to slave sandbox does not list the 
sandbox contents. The directory path of the sandbox shows up fine, but not the 
actual contents of the sandbox that is displayed below.

Looks like the issue is it fails in the following GET from the corresponding 
slave:
http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path

Looking at the commits, I could confirm that the issue is not seen with commit 
'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos build with 
this commit being the last commit on mesos slave does not show this behavior.



 Access to sandbox on slave from master UI does not show the sandbox contents
 

 Key: MESOS-1878
 URL: https://issues.apache.org/jira/browse/MESOS-1878
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Anindya Sinha
Priority: Minor

 From the master UI, clicking Sandbox to go to the slave sandbox does not list 
 the sandbox contents. The directory path of the sandbox shows up fine, but 
 the actual contents of the sandbox are not displayed below it.
 It looks like the following GET to the corresponding slave fails:
 http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path
 Looking at the commits, I could confirm that the issue is not seen with 
 commit 'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos 
 build with this commit being the last commit on mesos slave does not show 
 this behavior.
 Update: The issue was introduced by the following 2 commits:
 ca2e8ef MESOS-1857 Fixed path::join() on older libstdc++ which lack back().
 b08fccf Switched path::join() to be variadic
 Note that commit ca2e8ef fixes a build issue (on older libstdc++) on top 
 of commit b08fccf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1878) Access to sandbox on slave from master UI does not show the sandbox contents

2014-10-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1878:
---
 Target Version/s: 0.21.0
Affects Version/s: 0.21.0
 Assignee: Cody Maloney

[~cmaloney] can you take a look at this?

 Access to sandbox on slave from master UI does not show the sandbox contents
 

 Key: MESOS-1878
 URL: https://issues.apache.org/jira/browse/MESOS-1878
 Project: Mesos
  Issue Type: Bug
  Components: webui
Affects Versions: 0.21.0
Reporter: Anindya Sinha
Assignee: Cody Maloney
Priority: Minor

 From the master UI, clicking Sandbox to go to the slave sandbox does not list 
 the sandbox contents. The directory path of the sandbox shows up fine, but 
 the actual contents of the sandbox are not displayed below it.
 It looks like the following GET to the corresponding slave fails:
 http://slave-host:4891/files/browse.json?jsonp=angular.callbacks._9&path=sandbox-path
 Looking at the commits, I could confirm that the issue is not seen with 
 commit 'babb1c06ecf3077f292a19cfcbf1f1a4ed0e07b1'. Rolling back to a mesos 
 build with this commit being the last commit on mesos slave does not show 
 this behavior.
 Update: The issue was introduced by the following 2 commits:
 ca2e8ef MESOS-1857 Fixed path::join() on older libstdc++ which lack back().
 b08fccf Switched path::join() to be variadic
 Note that commit ca2e8ef fixes a build issue (on older libstdc++) on top 
 of commit b08fccf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1869) UpdateFramework message might reach the slave before Reregistered message and get dropped

2014-10-08 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164526#comment-14164526
 ] 

Benjamin Mahler commented on MESOS-1869:


Fixed as part of MESOS-1696:
https://reviews.apache.org/r/26206/

 UpdateFramework message might reach the slave before Reregistered message and 
 get dropped
 -

 Key: MESOS-1869
 URL: https://issues.apache.org/jira/browse/MESOS-1869
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Benjamin Mahler

 In reregisterSlave() we send 'SlaveReregisteredMessage' before we link the 
 slave pid, which means a temporary socket will be created and used.
 Subsequently, after linking, we send the UpdateFrameworkMessage, which 
 creates and uses a persistent socket.
 This might lead to out-of-order delivery, resulting in UpdateFrameworkMessage 
 reaching the slave before the SlaveReregisteredMessage and getting dropped 
 because the slave is not yet (re-)registered.
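The reordering described above can be illustrated with a toy delivery model (illustrative only, not Mesos code): each message has a send order and a latency, and a message sent first on a slow, freshly created socket can arrive after a message sent later on a fast persistent socket:

```python
def delivery_order(messages):
    # messages: list of (send_order, latency) pairs.
    # Arrival time = send order + latency; return send orders by arrival time.
    arrivals = sorted((send + latency, send) for send, latency in messages)
    return [send for _, send in arrivals]
```

For example, a SlaveReregisteredMessage sent at time 0 with latency 5 (temporary socket) arrives after an UpdateFrameworkMessage sent at time 1 with latency 1 (persistent socket).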



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1830) Expose master stats differentiating between master-generated and slave-generated LOST tasks

2014-10-08 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164536#comment-14164536
 ] 

Vinod Kone commented on MESOS-1830:
---

I added a proposal for how it could look in the attached review. Please take a 
look. 

Feedback welcome on the review or here.

 Expose master stats differentiating between master-generated and 
 slave-generated LOST tasks
 ---

 Key: MESOS-1830
 URL: https://issues.apache.org/jira/browse/MESOS-1830
 Project: Mesos
  Issue Type: Story
  Components: master
Reporter: Bill Farner
Priority: Minor

 The master exports a monotonically-increasing counter of tasks transitioned 
 to TASK_LOST.  This loses fidelity of the source of the lost task.  A first 
 step in exposing the source of lost tasks might be to just differentiate 
 between TASK_LOST transitions initiated by the master vs the slave (and maybe 
 bad input from the scheduler).
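A sketch of what per-source counters could look like (the source names and metric key format here are hypothetical; the ticket leaves the exact breakdown open):

```python
from collections import Counter

class LostTaskMetrics:
    # Count TASK_LOST transitions keyed by who initiated them,
    # e.g. "master", "slave", or "scheduler" (names hypothetical).
    def __init__(self):
        self.lost = Counter()

    def record(self, source):
        self.lost[source] += 1

    def snapshot(self):
        # Export one monotonically-increasing counter per source.
        return {"master/tasks_lost/source_%s" % s: n
                for s, n in self.lost.items()}
```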



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1469) No output from review bot on timeout

2014-10-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1469:
---
Component/s: reviewbot

 No output from review bot on timeout
 

 Key: MESOS-1469
 URL: https://issues.apache.org/jira/browse/MESOS-1469
 Project: Mesos
  Issue Type: Bug
  Components: build, reviewbot
Reporter: Dominic Hamon
Assignee: Dominic Hamon
Priority: Minor

 When the mesos review build times out, likely due to a long-running failing 
 test, we have no output to debug. We should find a way to stream the output 
 from the build instead of waiting for the build to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1234) Mesos ReviewBot should look at old reviews first

2014-10-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1234:
---
Component/s: reviewbot

 Mesos ReviewBot should look at old reviews first
 

 Key: MESOS-1234
 URL: https://issues.apache.org/jira/browse/MESOS-1234
 Project: Mesos
  Issue Type: Improvement
  Components: reviewbot
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.19.0


 Currently the ReviewBot looks at the newest reviews first, starving out old 
 reviews when there are enough new/updated reviews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1712) Automate disallowing of commits mixing mesos/libprocess/stout

2014-10-08 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1712:
---
Component/s: reviewbot

 Automate disallowing of commits mixing mesos/libprocess/stout
 -

 Key: MESOS-1712
 URL: https://issues.apache.org/jira/browse/MESOS-1712
 Project: Mesos
  Issue Type: Bug
  Components: reviewbot
Reporter: Vinod Kone

 For various reasons, we don't want to mix mesos/libprocess/stout changes into 
 a single commit. Typically, it is up to the reviewee/reviewer to catch this. 
 It would be nice to automate this via the pre-commit hook.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1881) Reviewbot should not apply reviews that are submitted.

2014-10-08 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1881:
--

 Summary: Reviewbot should not apply reviews that are submitted.
 Key: MESOS-1881
 URL: https://issues.apache.org/jira/browse/MESOS-1881
 Project: Mesos
  Issue Type: Bug
  Components: reviewbot
Reporter: Benjamin Mahler
Priority: Trivial


If a review contains a dependent review that is already submitted, reviewbot 
will still try to apply it and it will fail.

We should skip dependent reviews that are marked as submitted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1882) Add a --dry-run to verify-reviews.py.

2014-10-08 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1882:
--

 Summary: Add a --dry-run to verify-reviews.py.
 Key: MESOS-1882
 URL: https://issues.apache.org/jira/browse/MESOS-1882
 Project: Mesos
  Issue Type: Improvement
  Components: reviewbot
Reporter: Benjamin Mahler
Priority: Minor


To improve the ease of making changes to verify-reviews.py, we should add the 
ability to pass a {{\-\-dry\-run}} flag. This will print all commands to be 
executed.

Additional improvements that we may want to break out of this ticket:
# Rename verify-reviews.py to verify_reviews.py to allow importing.
# Make verify-reviews.py only execute when run as a {{\_\_main\_\_}}; if 
imported, it should merely make the library methods / classes available so that 
one can use the library from an interpreter.
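A sketch of the --dry-run behavior (hypothetical; the real change would thread the flag through verify-reviews.py's existing shell helper):

```python
import subprocess

def run(command, dry_run=False):
    # Always print the command; execute it only when not in dry-run mode.
    # Returns True when the command was actually executed.
    print(command)
    if dry_run:
        return False
    subprocess.check_call(command, shell=True)
    return True
```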



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor

2014-10-08 Thread R.B. Boyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

R.B. Boyer updated MESOS-1873:
--
Attachment: mesos_executor_overshare.v2.diff

Attaching a second attempt at the patch (mesos_executor_overshare.v2.diff); 
this time I have reliably verified the fix in a test environment by 
recompiling the library and swapping it out on a running system.

 Don't pass task-related arguments to mesos-executor
 ---

 Key: MESOS-1873
 URL: https://issues.apache.org/jira/browse/MESOS-1873
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.20.1
 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise
Reporter: R.B. Boyer
 Attachments: mesos_executor_overshare.v2.diff


 Attempting to launch a task using the command executor with {{shell=false}} 
 and passing arguments fails strangely.
 {noformat:title=CommandInfo proto}
 command {
   value: "/my_program"
   user: "app"
   shell: false
   arguments: "my_program"
   arguments: "--start"
   arguments: "2014-10-06"
   arguments: "--end"
   arguments: "2014-10-07"
 }
 {noformat}
 Dies with:
 {noformat:title=stderr}
 Failed to load unknown flag 'end'
 Usage: my_program [...]
 Supported options:
   --[no-]help Prints this help message (default: false)
   --[no-]override Whether or not to override the command the executor 
 should run
   when the task is launched. Only this flag is expected 
 to be on
   the command line and all arguments after the flag will 
 be used as
   the subsequent 'argv' to be used with 'execvp' 
 (default: false)
 {noformat}
 This is coming from a failed attempt to have the slave launch 
 {{mesos-executor}}.  This is due to an adverse interaction between new 
 {{CommandInfo}} features and this blurb from {{src/slave/slave.cpp}}:
 {code}
 // Copy the CommandInfo to get the URIs and environment, but
 // update it to invoke 'mesos-executor' (unless we couldn't
 // resolve 'mesos-executor' via 'realpath', in which case just
 // echo the error and exit).
 executor.mutable_command()->MergeFrom(task.command());
 Result<string> path = os::realpath(
     path::join(flags.launcher_dir, "mesos-executor"));
 if (path.isSome()) {
   executor.mutable_command()->set_value(path.get());
 } else {
   executor.mutable_command()->set_value(
       "echo '" +
       (path.isError()
            ? path.error()
            : "No such file or directory") +
       "'; exit 1");
 }
 {code}
 This is failing to:
 * clear the {{arguments}} field
 * probably explicitly restore {{shell=true}}
 * clear {{container}} ?
 * clear {{user}} ?
 I was able to quickly fix this locally by making a man-in-the-middle program 
 at {{/usr/local/libexec/mesos/mesos-executor}} that stripped all args before 
 exec-ing the real {{mesos-executor}} binary.
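A minimal model of the fix the bullets above suggest, with a dict standing in for CommandInfo (the real change belongs in C++ in src/slave/slave.cpp; this Python sketch only illustrates which fields to reset after the merge):

```python
def command_for_executor(task_command, executor_path):
    # Keep the merged URIs/environment, but reset the task-specific fields
    # so mesos-executor is not launched with the task's own arguments.
    command = dict(task_command)      # stands in for MergeFrom(task.command())
    command["value"] = executor_path  # invoke mesos-executor, not the task
    command["arguments"] = []         # don't forward task arguments
    command["shell"] = True           # run the executor as a shell command
    command.pop("user", None)         # leave task-specific user unset
    command.pop("container", None)    # leave task-specific container unset
    return command
```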



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor

2014-10-08 Thread R.B. Boyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

R.B. Boyer updated MESOS-1873:
--
Attachment: (was: mesos_executor_overshare.diff)

 Don't pass task-related arguments to mesos-executor
 ---

 Key: MESOS-1873
 URL: https://issues.apache.org/jira/browse/MESOS-1873
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.20.1
 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise
Reporter: R.B. Boyer
 Attachments: mesos_executor_overshare.v2.diff


 Attempting to launch a task using the command executor with {{shell=false}} 
 and passing arguments fails strangely.
 {noformat:title=CommandInfo proto}
 command {
   value: "/my_program"
   user: "app"
   shell: false
   arguments: "my_program"
   arguments: "--start"
   arguments: "2014-10-06"
   arguments: "--end"
   arguments: "2014-10-07"
 }
 {noformat}
 Dies with:
 {noformat:title=stderr}
 Failed to load unknown flag 'end'
 Usage: my_program [...]
 Supported options:
   --[no-]help      Prints this help message (default: false)
   --[no-]override  Whether or not to override the command the executor should run
                    when the task is launched. Only this flag is expected to be on
                    the command line and all arguments after the flag will be used
                    as the subsequent 'argv' to be used with 'execvp'
                    (default: false)
 {noformat}
 This is coming from a failed attempt to have the slave launch 
 {{mesos-executor}}.  This is due to an adverse interaction between new 
 {{CommandInfo}} features and this blurb from {{src/slave/slave.cpp}}:
 {code}
 // Copy the CommandInfo to get the URIs and environment, but
 // update it to invoke 'mesos-executor' (unless we couldn't
 // resolve 'mesos-executor' via 'realpath', in which case just
 // echo the error and exit).
 executor.mutable_command()->MergeFrom(task.command());
 Result<string> path = os::realpath(
     path::join(flags.launcher_dir, "mesos-executor"));
 if (path.isSome()) {
   executor.mutable_command()->set_value(path.get());
 } else {
   executor.mutable_command()->set_value(
       "echo '" +
       (path.isError()
            ? path.error()
            : "No such file or directory") +
       "'; exit 1");
 }
 {code}
 This is failing to:
 * clear the {{arguments}} field
 * probably explicitly restore {{shell=true}}
 * clear {{container}} ?
 * clear {{user}} ?
 I was able to quickly fix this locally by making a man-in-the-middle program 
 at {{/usr/local/libexec/mesos/mesos-executor}} that stripped all args before 
 exec-ing the real {{mesos-executor}} binary.
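 Such a wrapper can be tiny. A hedged sketch follows; the {{mesos-executor.real}} rename and the install path are assumptions from this workaround, not part of Mesos, and {{runStripped}} is an invented name. A real wrapper's {{main()}} would simply {{return runStripped("/usr/local/libexec/mesos/mesos-executor.real");}}.
 {code}
#include <unistd.h>  // execv
#include <cstdio>    // perror

// Hypothetical man-in-the-middle wrapper: compile and install at
// /usr/local/libexec/mesos/mesos-executor after renaming the real binary
// to mesos-executor.real. It discards every argument the slave passed
// along before exec-ing the real executor.
int runStripped(const char* realPath) {
  char* const argv[] = {const_cast<char*>("mesos-executor"), nullptr};
  execv(realPath, argv);  // replaces the process on success
  std::perror("execv");   // reached only if the exec failed
  return 1;
}
 {code}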





[jira] [Created] (MESOS-1883) Possible race between reregistration, launching tasks, and rescinding offers

2014-10-08 Thread Dominic Hamon (JIRA)
Dominic Hamon created MESOS-1883:


 Summary: Possible race between reregistration, launching tasks, 
and rescinding offers
 Key: MESOS-1883
 URL: https://issues.apache.org/jira/browse/MESOS-1883
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Dominic Hamon
Priority: Minor


When a framework reregisters, we rescind any offers we have sent; however, the 
framework may attempt to launch tasks before the rescind message is received. 
This leads to a number of lost tasks due to invalid offers.

Should we send offers before a framework is registered?


