[jira] [Updated] (MESOS-1667) Extract from URI while downloading into work dir

2015-08-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-1667:
--
Assignee: (was: Bernd Mathiske)

 Extract from URI while downloading into work dir
 

 Key: MESOS-1667
 URL: https://issues.apache.org/jira/browse/MESOS-1667
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Affects Versions: 0.20.0
 Environment: Every
Reporter: Bernd Mathiske
  Labels: features, mesosphere, performance
   Original Estimate: 96h
  Remaining Estimate: 96h

 When the fetcher downloads an extractable archive, e.g. a tar file, it 
 currently downloads it completely and only then starts extracting from it. 
 But only the end result is needed for execution. Thus the space used for the 
 downloaded copy of the archive is wasted. This can become critical in case of 
 large archives.
 The general idea to solve this issue is to perform the extraction while 
 downloading, and not storing intermediate results on disk. Possibly, this can 
 be achieved by arranging process pipes or by using some extraction library 
 code to stream the data through.
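Here is a minimal sketch of the process-pipe idea. It assumes `curl` and `tar` are installed on the agent; the issue does not say the fetcher uses either. A hypothetical helper builds a shell pipeline that streams the download directly into `tar`, so the archive itself is never written to the sandbox:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: single-quote a string for POSIX sh, escaping any
// embedded single quote as '\''.
std::string shellQuote(const std::string& s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') {
      out += "'\\''";
    } else {
      out += c;
    }
  }
  out += "'";
  return out;
}

// Hypothetical helper: build a pipeline that streams a gzipped tarball from
// `uri` straight into tar, extracting into `sandbox` with no archive on disk.
std::string streamingFetchCommand(const std::string& uri,
                                  const std::string& sandbox) {
  assert(!uri.empty() && !sandbox.empty());
  return "curl -sSfL " + shellQuote(uri) + " | tar -xzf - -C " +
         shellQuote(sandbox);
}
```

In many shells a pipeline's exit status is that of its last command (`tar`) alone. A failed download could therefore go unnoticed unless both stages are checked.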
 However, with in-stream extraction every fetch would require a fresh 
 download, whereas with the existing but not yet committed patch for MESOS-336 
 (https://reviews.apache.org/r/21316/), the fetcher cache could simply repeat 
 the extraction without downloading more than once. Thus choosing in-stream 
 extraction might result in an overall performance loss. We should therefore 
 give users extra options in CommandInfo.URI to choose how to handle this.
 In some cases, it could be possible to reuse the extracted assets directly, 
 also forgoing the repeated extraction. This could be handled with symlinks. 
 Then extraction can happen during downloading, and neither repeated 
 downloading nor repeated extraction occurs. The user has to be aware of the 
 safety issue, though: any post-extraction modifications to the downloaded 
 assets are visible to subsequent tasks. So an explicit flag in 
 CommandInfo.URI is called for here as well.
 Ideally, this issue would be solved as a follow-up of MESOS-336, because some 
 of the described benefits depend on it.
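The symlink-based reuse could look roughly like the following C++17 sketch. `linkExtractedAssets` is a hypothetical name. The cache layout, with one extracted directory per archive, is also an assumption:

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: expose an already-extracted cache directory inside a
// task sandbox via a symlink instead of extracting the archive again.
// Anything a task writes through the link is visible to every later task
// that links the same cache directory.
void linkExtractedAssets(const fs::path& cacheDir, const fs::path& link) {
  assert(fs::is_directory(cacheDir));
  // Replace a stale link (or file) left over from an earlier run.
  if (fs::exists(fs::symlink_status(link))) {
    fs::remove(link);
  }
  fs::create_directory_symlink(cacheDir, link);
}
```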



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.

2015-08-13 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695224#comment-14695224
 ] 

Klaus Ma commented on MESOS-3070:
-

*Current Status*:
- Reproduced the duplicate task ID CHECK failure with unit test cases
- The previous solution (sending KillTaskMessage to the slave on 
re-registration) triggers another CHECK failure (in removeTask)

*Next actions*:
- Find another solution:
Option 1. Send the list of rejected tasks to the slave within 
SlaveReregisteredMessage; the slave kills the executors/tasks accordingly
Option 2. Persist task info in the registry; reject duplicate tasks after the 
master restarts
- Add post-condition check according to the solution
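Option 1 can be sketched with a simplified model of the master's bookkeeping. The types and `addReregisteredTasks` are hypothetical and much simpler than the real master code. The sketch only shows that a duplicate ID can become an entry in a rejected list rather than a CHECK failure:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified per-framework bookkeeping (not the real master
// Framework struct): only the known task IDs are tracked.
struct FrameworkTasks {
  std::set<std::string> taskIds;
};

// When a slave (re-)registers its tasks, IDs already known for the framework
// are collected into a "rejected" list, to be sent back so the slave can
// kill them, instead of failing a CHECK.
std::vector<std::string> addReregisteredTasks(
    std::map<std::string, FrameworkTasks>& frameworks,
    const std::string& frameworkId,
    const std::vector<std::string>& reportedTaskIds) {
  assert(!frameworkId.empty());
  std::vector<std::string> rejected;
  FrameworkTasks& framework = frameworks[frameworkId];
  for (const std::string& taskId : reportedTaskIds) {
    // std::set::insert returns {iterator, false} if the ID already exists.
    if (!framework.taskIds.insert(taskId).second) {
      rejected.push_back(taskId);
    }
  }
  return rejected;
}
```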

 Master CHECK failure if a framework uses duplicated task id.
 

 Key: MESOS-3070
 URL: https://issues.apache.org/jira/browse/MESOS-3070
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.22.1
Reporter: Jie Yu
Assignee: Klaus Ma

 We observed this in one of our testing clusters.
 One framework (under development) keeps launching tasks using the same 
 task_id. We don't expect the master to crash even if the framework is not 
 doing what it's supposed to do. However, under a certain series of events, 
 this can happen and keep crashing the master.
 1) frameworkA launches task 'task_id_1' on slaveA
 2) master fails over
 3) slaveA has not re-registered yet
 4) frameworkA re-registers and launches task 'task_id_1' on slaveB
 5) slaveA re-registers and adds task 'task_id_1' to frameworkA
 6) CHECK failure in addTask
 {noformat}
 I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with 
 resources cpus(*):4; mem(*):32768 on slave 
 20150417-232509-1735470090-5050-48870-S25 (hostname)
 ...
 ...
 F0716 21:52:50.760136 28805 master.hpp:362] Check failed: 
 !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework 
 framework_id
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3255) Create TasksKiller Tests

2015-08-13 Thread Joerg Schad (JIRA)
Joerg Schad created MESOS-3255:
--

 Summary: Create TasksKiller Tests
 Key: MESOS-3255
 URL: https://issues.apache.org/jira/browse/MESOS-3255
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Joerg Schad
Assignee: Joerg Schad


As a follow-up to MESOS-3086, we should test both the old (Freeze) TasksKiller 
and the new (nonFreeze) TasksKiller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3224) Create a Mesos Contributor Newbie Guide

2015-08-13 Thread Diana Arroyo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diana Arroyo reassigned MESOS-3224:
---

Assignee: Diana Arroyo  (was: Timothy Chen)

 Create a Mesos Contributor Newbie Guide
 ---

 Key: MESOS-3224
 URL: https://issues.apache.org/jira/browse/MESOS-3224
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Timothy Chen
Assignee: Diana Arroyo

 Currently the website lacks a helpful guide that shows community users how 
 to start contributing to Mesos and understand its concepts, lowering the 
 barrier to getting involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3224) Create a Mesos Contributor Newbie Guide

2015-08-13 Thread Diana Arroyo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diana Arroyo updated MESOS-3224:

Assignee: Timothy Chen  (was: Diana Arroyo)

 Create a Mesos Contributor Newbie Guide
 ---

 Key: MESOS-3224
 URL: https://issues.apache.org/jira/browse/MESOS-3224
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Timothy Chen
Assignee: Timothy Chen

 Currently the website doesn't have a helpful guide for community users to 
 know how to start learning to contribute to Mesos, understand the concepts 
 and lower the barrier to get involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3256) Consistent naming of http request methods.

2015-08-13 Thread Joerg Schad (JIRA)
Joerg Schad created MESOS-3256:
--

 Summary: Consistent naming of http request methods.
 Key: MESOS-3256
 URL: https://issues.apache.org/jira/browse/MESOS-3256
 Project: Mesos
  Issue Type: Task
Reporter: Joerg Schad


Currently the http requests in libprocess/http.hpp are named post(), put(), and 
get(). This naming scheme did not work for the addition of delete in MESOS-3152, 
because delete is a C++ keyword; hence that call was named deleteRequest.

We should come up with a consistent naming scheme which is easily 
understandable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2880) Add Frameworkinfo.capabilities on framework re-registration

2015-08-13 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2880:
--
Description: Add support for adding capabilities. This should be 
straightforward.  (was: Part 1: Add support for adding capabilities. This 
should be straightforward.

Part 2: Add support for removing capabilities. This is a bit tricky because we 
need to deal with existing tasks and allocations for revocable resources.)
Summary: Add Frameworkinfo.capabilities on framework re-registration  
(was: Update Frameworkinfo.capabilities on framework re-registration)

 Add Frameworkinfo.capabilities on framework re-registration
 ---

 Key: MESOS-2880
 URL: https://issues.apache.org/jira/browse/MESOS-2880
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Aditi Dixit

 Add support for adding capabilities. This should be straightforward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

2015-08-13 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695609#comment-14695609
 ] 

Alexander Rukletsov commented on MESOS-2706:


This may be a Docker-related issue: the docker daemon processes requests more 
slowly when numerous docker containers are present.

 When the docker-tasks grow, the time spare between Queuing task and Starting 
 container grows
 

 Key: MESOS-2706
 URL: https://issues.apache.org/jira/browse/MESOS-2706
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.22.0
 Environment: My Environment info:
 Mesos 0.22.0 and Marathon 0.82-RC1, both running on one host server.
 Each docker task requires 0.02 CPU and 128 MB, and the server has 8 CPUs and 
 24 GB of memory.
 So in theory Mesos can launch thousands of tasks.
 Each docker task is very lightweight: it just launches an sshd service.
Reporter: chenqiuhao

 At the beginning, Marathon could launch docker tasks very quickly, but when 
 the number of tasks on the single mesos-slave host reached 50, Marathon seemed 
 to launch docker tasks slowly.
 So I checked the mesos-slave log and found that the time span between 
 'Queuing task' and 'Starting container' grew.
 For example, launching the 1st docker task takes about 0.008s:
 [root@CNSH231434 mesos-slave]# tail -f slave.out | egrep 'Queuing task|Starting container'
 I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor 
 dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 
 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 
 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 Launching the 50th docker task takes about 4.9s:
 I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor 
 dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework 
 '20150202-112355-2684495626-5050-26153-
 I0508 16:12:15.801503 225778 docker.cpp:581] Starting container 
 '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 
 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework 
 '20150202-112355-2684495626-5050-26153-'
 Launching the 100th docker task takes about 13s!
 I ran the same test on a server host with 24 CPUs and 256 GB of memory and 
 got the same result.
 Has anybody had the same experience, or can someone help run the same stress 
 test?
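These measurements can be reproduced from the log with a small hypothetical helper. It parses the time-of-day field of the two glog lines and subtracts them. It assumes both lines fall on the same day and follow the `I<MMDD> HH:MM:SS.micros` layout shown above:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: seconds since midnight from a glog line such as
// "I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task ...".
double glogSecondsOfDay(const std::string& line) {
  std::istringstream in(line);
  std::string severityAndDate;  // e.g. "I0508"
  std::string time;             // e.g. "15:54:00.188350"
  in >> severityAndDate >> time;
  assert(time.size() >= 8 && time[2] == ':' && time[5] == ':');
  int hours = std::stoi(time.substr(0, 2));
  int minutes = std::stoi(time.substr(3, 2));
  double seconds = std::stod(time.substr(6));
  return hours * 3600.0 + minutes * 60.0 + seconds;
}

// Delay between a "Queuing task" line and the matching "Starting container"
// line; assumes both were logged on the same day.
double queueToStartSeconds(const std::string& queuingLine,
                           const std::string& startingLine) {
  return glogSecondsOfDay(startingLine) - glogSecondsOfDay(queuingLine);
}
```

Applied to the excerpts above, it yields about 0.0085 s for the 1st task and about 4.89 s for the 50th.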



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3257) Zookeeper JVM test failure causes test harness to fail

2015-08-13 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695592#comment-14695592
 ] 

haosdent commented on MESOS-3257:
-

Hello, could you add the JDK version and operating system information for this 
problem?

 Zookeeper JVM test failure causes test harness to fail
 --

 Key: MESOS-3257
 URL: https://issues.apache.org/jira/browse/MESOS-3257
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett

 A failure in the ZooKeeper Java test setup causes the test harness to exit, 
 preventing subsequent tests from running.
 {code}
 [--] 2 tests from LogZooKeeperTest
 F0813 16:09:33.647265 13790 zookeeper.cpp:78] CHECK_SOME(jvm): Error looking 
 up symbol 'JNI_CreateJavaVM' in '' : 
 /home/pbrett/sandbox/perf.refactor2/build/src/.libs/mesos-tests: undefined 
 symbol: JNI_CreateJavaVM
 *** Check failure stack trace: ***
 @ 0x7f2d8cca7aac  google::LogMessage::Fail()
 @ 0x7f2d8cca79fb  google::LogMessage::SendToLog()
 @ 0x7f2d8cca740c  google::LogMessage::Flush()
 @ 0x7f2d8ccaa140  google::LogMessageFatal::~LogMessageFatal()
 @   0x8a938c  _CheckFatal::~_CheckFatal()
 @  0x12f68c0  
 mesos::internal::tests::ZooKeeperTest::SetUpTestCase()
 @  0x132a88a  testing::TestCase::RunSetUpTestCase()
 @  0x1334cf7  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x132fb94  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x1311635  testing::TestCase::Run()
 @  0x1317fca  testing::internal::UnitTestImpl::RunAllTests()
 @  0x1335427  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x1330128  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x1316cf0  testing::UnitTest::Run()
 @   0xc3a9d8  RUN_ALL_TESTS()
 @   0xc3a6c8  main
 @ 0x7f2d8818d9f4  __libc_start_main
 @   0x8a5fa9  (unknown)
 make[3]: *** [check-local] Aborted
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3258) Remove Frameworkinfo capabilities on re-registration

2015-08-13 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-3258:
-

 Summary: Remove Frameworkinfo capabilities on re-registration
 Key: MESOS-3258
 URL: https://issues.apache.org/jira/browse/MESOS-3258
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Aditi Dixit


Add support for removing capabilities. The idea is that we leave the running 
revocable tasks as is, but the framework will not get any new revocable offers.
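The following is a simplified, hypothetical model of these semantics. On re-registration the new capability set replaces the old one, running revocable tasks are left untouched, and only future offers are affected. `REVOCABLE_RESOURCES` is the FrameworkInfo capability name. Everything else here is illustrative:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified framework state (not the real master Framework):
// capabilities are plain strings such as "REVOCABLE_RESOURCES".
struct Framework {
  std::set<std::string> capabilities;
  std::set<std::string> runningRevocableTasks;
};

// On re-registration, the capabilities in the new FrameworkInfo replace the
// old ones. Running revocable tasks are deliberately left as is.
void updateCapabilities(Framework& framework,
                        const std::set<std::string>& newCapabilities) {
  framework.capabilities = newCapabilities;
}

// Future offers include revocable resources only while the capability is set.
bool offerRevocable(const Framework& framework) {
  return framework.capabilities.count("REVOCABLE_RESOURCES") > 0;
}
```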



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3149) Use setuptools to install python cli package

2015-08-13 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695571#comment-14695571
 ] 

haosdent commented on MESOS-3149:
-

Hi [~vinodkone] [~tillt], let me describe my idea about this patch here. This 
patch tries to fix the following problem: when executing mesos-ps, mesos-cat, 
mesos-scp, or mesos-tail, we get this error:
{code}
Traceback (most recent call last):
  File "/usr/local/bin/mesos-cat", line 12, in <module>
    from mesos import http
ImportError: cannot import name http
{code}

So I think this patch is necessary for 0.24. Could you help review and commit 
it? Or is there a better way to fix the problem above? Thank you in advance. 
Also thanks to [~marco-mesos] for helping push this patch. ;-)

 Use setuptools to install python cli package
 

 Key: MESOS-3149
 URL: https://issues.apache.org/jira/browse/MESOS-3149
 Project: Mesos
  Issue Type: Task
Reporter: haosdent
Assignee: haosdent

 mesos-ps/mesos-cat, which depend on src/cli/python/mesos, do not work on 
 OSX because src/cli/python is not installed into sys.path. It's time to 
 finish this TODO.
  
 {code}
 # Add 'src/cli/python' to PYTHONPATH.
 # TODO(benh): Remove this if/when we install the 'mesos' module via
 # PIP and setuptools.
 PYTHONPATH=@abs_top_srcdir@/src/cli/python:${PYTHONPATH}
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696145#comment-14696145
 ] 

Vinod Kone commented on MESOS-3260:
---

I'm reopening this to track the proper fix.

 SchedulerTest.* are broken on OSX and CentOS
 

 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.24.0
 Environment: OSX 10.10.5 (14F6a),
 Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Assignee: Vinod Kone
Priority: Blocker
 Fix For: 0.24.0


 Running a plain configure and make check on OSX currently leads to the 
 following:
 {noformat}
 [ RUN  ] SchedulerTest.Subscribe
 ../../src/tests/scheduler_tests.cpp:168: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::SUBSCRIBED
 Which is: SUBSCRIBED
 ../../src/tests/scheduler_tests.cpp:169: Failure
 Value of: event.get().subscribed().framework_id()
   Actual:
 Expected: id
 Which is: 20150813-222454-347252928-56290-60707-
 [  FAILED  ] SchedulerTest.Subscribe (183 ms)
 [ RUN  ] SchedulerTest.TaskRunning
 ../../src/tests/scheduler_tests.cpp:227: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::OFFERS
 Which is: OFFERS
 ../../src/tests/scheduler_tests.cpp:228: Failure
 Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0
 [libprotobuf FATAL 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
  CHECK failed: (index) < (size()):
 ../../src/tests/scheduler_tests.cpp:237: Failure
 Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, 
 _))...
  Expected: to be called at least once
Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:233: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, 
 _))...
  Expected: to be called once
Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:230: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, 
 _, _))...
  Expected: to be called once
Actual: never called - unsatisfied and active
 unknown file: Failure
 C++ exception with description "CHECK failed: (index) < (size()): " thrown in 
 the test body.
 *** Aborted at 1439497494 (unix time) try "date -d @1439497494" if you are 
 using GNU date ***
 PC: @ 0x7fb2c0f20490 (unknown)
 *** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack 
 trace: ***
 @ 0x7fff8a77ef1a _sigtramp
 @ 0x7fff532c9990 (unknown)
 @0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves()
 @0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown()
 @0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown()
 @0x10dbc8283 
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @0x10dbafab7 
 testing::internal::HandleExceptionsInMethodIfSupported()
 @0x10db6f8ba testing::Test::Run()
 @0x10db70deb testing::TestInfo::Run()
 @0x10db71ab7 testing::TestCase::Run()
 @0x10db804b3 testing::internal::UnitTestImpl::RunAllTests()
 @0x10dbc4fe3 
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @0x10dbb1ea7 
 testing::internal::HandleExceptionsInMethodIfSupported()
 @0x10db800b0 testing::UnitTest::Run()
 @0x10d10c8d1 RUN_ALL_TESTS()
 @0x10d108b87 main
 @ 0x7fff8da765c9 start
 Bus error: 10
 {noformat}
 Results on CentOS look similar.
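The failures show a HEARTBEAT event arriving where the test expected SUBSCRIBED or OFFERS. One way a test could tolerate heartbeats is to skip them while waiting for the next event. The sketch below uses a hypothetical enum instead of the real `scheduler::Event` protobuf. It is not necessarily how the eventual fix was written:

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical, simplified event model standing in for scheduler::Event.
enum class EventType { HEARTBEAT, SUBSCRIBED, OFFERS };

// Pops events until one that is not a HEARTBEAT is found; returns
// std::nullopt if the queue runs out first.
std::optional<EventType> nextNonHeartbeat(std::deque<EventType>& events) {
  while (!events.empty()) {
    EventType event = events.front();
    events.pop_front();
    if (event != EventType::HEARTBEAT) {
      return event;
    }
  }
  return std::nullopt;
}
```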



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3256) Consistent naming of http request methods.

2015-08-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695791#comment-14695791
 ] 

Benjamin Mahler commented on MESOS-3256:


Also consider just taking a Request object per the TODO:
https://github.com/apache/mesos/blob/0.23.0/3rdparty/libprocess/include/process/http.hpp#L649
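Here is a minimal sketch of the Request-object idea, assuming a method enum. These names are hypothetical and are not the libprocess API. Because the method is data rather than a function name, the conflict with the `delete` keyword disappears:

```cpp
#include <cassert>
#include <string>

// Hypothetical method enum and request type (not libprocess's http::Request).
enum class Method { Get, Post, Put, Delete };

struct Request {
  Method method;
  std::string url;
  std::string body;
};

std::string methodName(Method method) {
  switch (method) {
    case Method::Get: return "GET";
    case Method::Post: return "POST";
    case Method::Put: return "PUT";
    case Method::Delete: return "DELETE";
  }
  return "UNKNOWN";
}

// Stand-in for a single request(const Request&) entry point: it only renders
// the HTTP request line instead of sending anything.
std::string requestLine(const Request& request) {
  return methodName(request.method) + " " + request.url + " HTTP/1.1";
}
```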

 Consistent naming of http request methods.
 --

 Key: MESOS-3256
 URL: https://issues.apache.org/jira/browse/MESOS-3256
 Project: Mesos
  Issue Type: Task
Reporter: Joerg Schad

 Currently the http requests in libprocess/http.hpp are named post(), put(), 
 and get(). This naming scheme did not for the addition of delete with 
 Mesos-3152 as delete is a C++ keyword and hence that call was named 
 deleteRequest.
 We should come up with a consistent naming scheme which is easily 
 understandable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3154) Enable Mesos Agent Node to use arbitrary script / module to figure out IP, HOSTNAME

2015-08-13 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696219#comment-14696219
 ] 

Marco Massenzio commented on MESOS-3154:


This is essentially a copy & paste of the code in {{src/master/main.cpp}}; it 
doesn't make me proud, but at least it should speed up review & commit :)

 Enable Mesos Agent Node to use arbitrary script / module to figure out IP, 
 HOSTNAME
 ---

 Key: MESOS-3154
 URL: https://issues.apache.org/jira/browse/MESOS-3154
 Project: Mesos
  Issue Type: Story
  Components: slave
Reporter: Benjamin Hindman
Assignee: Marco Massenzio
  Labels: mesosphere

 Following MESOS-2902, we want to enable the same functionality in the 
 Mesos Agents too.
 This is probably best done once we implement the new {{os::shell}} semantics, 
 as described in MESOS-3142.
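The idea of running an operator-supplied command and using its output as the IP can be sketched as follows, for illustration only. The sketch uses POSIX `popen` rather than stout's `os::shell`. `runDiscoveryCommand` is a hypothetical name, not the code in `src/master/main.cpp`:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: run a discovery command and return its stdout with
// trailing whitespace removed; throws if the command cannot run or fails.
std::string runDiscoveryCommand(const std::string& command) {
  assert(!command.empty());
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    throw std::runtime_error("failed to run: " + command);
  }
  std::string output;
  char buffer[256];
  while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
    output += buffer;
  }
  if (pclose(pipe) != 0) {
    throw std::runtime_error("command failed: " + command);
  }
  std::string::size_type end = output.find_last_not_of(" \t\r\n");
  return end == std::string::npos ? "" : output.substr(0, end + 1);
}
```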



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2688) Slave should kill revocable tasks if oversubscription is disabled

2015-08-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2688:
--
Sprint:   (was: Twitter Mesos Q3 Sprint 3)

 Slave should kill revocable tasks if oversubscription is disabled
 -

 Key: MESOS-2688
 URL: https://issues.apache.org/jira/browse/MESOS-2688
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Jie Yu
  Labels: twitter

 If oversubscription is disabled on a restarted slave (that previously had it 
 enabled), the slave should kill its revocable tasks.
 The slave knows this from the Resources of each container, which it 
 checkpoints and recovers.
 Add a new reason OVERSUBSCRIPTION_DISABLED.
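The recovery-time decision can be sketched as below. The sketch uses a hypothetical `RecoveredContainer` record in place of the checkpointed container state. The returned tasks would be killed with the proposed reason:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a recovered container: the task it runs
// and whether its checkpointed Resources include revocable resources.
struct RecoveredContainer {
  std::string taskId;
  bool usesRevocableResources;
};

// Tasks to kill (with reason OVERSUBSCRIPTION_DISABLED) after a restart.
std::vector<std::string> tasksToKill(
    const std::vector<RecoveredContainer>& containers,
    bool oversubscriptionEnabled) {
  std::vector<std::string> kill;
  if (oversubscriptionEnabled) {
    return kill;
  }
  for (const RecoveredContainer& container : containers) {
    if (container.usesRevocableResources) {
      kill.push_back(container.taskId);
    }
  }
  return kill;
}
```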



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2695) Add master flag to enable/disable oversubscription

2015-08-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2695:
--
Sprint:   (was: Twitter Mesos Q3 Sprint 3)

 Add master flag to enable/disable oversubscription
 --

 Key: MESOS-2695
 URL: https://issues.apache.org/jira/browse/MESOS-2695
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Jie Yu
  Labels: twitter

 This flag lets an operator control cluster-level oversubscription. 
 The master should send revocable offers to a framework iff this flag is 
 enabled and the framework opts in to receive them.
 The master should ignore revocable resources from slaves if the flag is disabled.
 Tests are needed for all these scenarios.
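The offer rule can be sketched as a filter over a hypothetical resource model. For brevity, the sketch folds "ignore revocable resources from slaves" into the same filter. The issue instead describes the master ignoring them as they arrive. The usage checks cover the four flag/opt-in combinations the tests would need:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified resource model.
struct Resource {
  double cpus;
  bool revocable;
};

// Revocable resources are offered iff the cluster-level flag is enabled AND
// the framework opted in; non-revocable resources are always offered.
std::vector<Resource> offerableResources(
    const std::vector<Resource>& slaveResources,
    bool oversubscriptionFlag,
    bool frameworkOptedIn) {
  std::vector<Resource> offered;
  for (const Resource& resource : slaveResources) {
    assert(resource.cpus >= 0.0);
    if (!resource.revocable || (oversubscriptionFlag && frameworkOptedIn)) {
      offered.push_back(resource);
    }
  }
  return offered;
}
```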



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696187#comment-14696187
 ] 

Vinod Kone edited comment on MESOS-3260 at 8/14/15 12:03 AM:
-

commit 2a391e8036303f08aa42dc9c66c210940ec8d21f
Author: Vinod Kone vinodk...@gmail.com
Date:   Thu Aug 13 16:17:52 2015 -0700

Fixed scheduler tests to work with heartbeats.

Review: https://reviews.apache.org/r/37449




was (Author: vinodkone):
commit f011b0a98e5a1c28f6c670102374f5317488fa03
Author: Vinod Kone vinodk...@gmail.com
Date:   Thu Aug 13 16:17:52 2015 -0700

Fixed scheduler tests to work with heartbeats.

Review: https://reviews.apache.org/r/37449


 SchedulerTest.* are broken on OSX and CentOS
 

 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.24.0
 Environment: OSX 10.10.5 (14F6a),
 Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Assignee: Vinod Kone
Priority: Blocker
 Fix For: 0.24.0


 Running a plain configure and make check on OSX does currently lead to the 
 following:
 {noformat}
 [ RUN  ] SchedulerTest.Subscribe
 ../../src/tests/scheduler_tests.cpp:168: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::SUBSCRIBED
 Which is: SUBSCRIBED
 ../../src/tests/scheduler_tests.cpp:169: Failure
 Value of: event.get().subscribed().framework_id()
   Actual:
 Expected: id
 Which is: 20150813-222454-347252928-56290-60707-
 [  FAILED  ] SchedulerTest.Subscribe (183 ms)
 [ RUN  ] SchedulerTest.TaskRunning
 ../../src/tests/scheduler_tests.cpp:227: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::OFFERS
 Which is: OFFERS
 ../../src/tests/scheduler_tests.cpp:228: Failure
 Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0
 [libprotobuf FATAL 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
  CHECK failed: (index)  (size()):
 ../../src/tests/scheduler_tests.cpp:237: Failure
 Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, 
 _))...
  Expected: to be called at least once
Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:233: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, 
 _))...
  Expected: to be called once
Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:230: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, 
 _, _))...
  Expected: to be called once
Actual: never called - unsatisfied and active
 unknown file: Failure
 C++ exception with description CHECK failed: (index)  (size()):  thrown in 
 the test body.
 *** Aborted at 1439497494 (unix time) try date -d @1439497494 if you are 
 using GNU date ***
 PC: @ 0x7fb2c0f20490 (unknown)
 *** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack 
 trace: ***
 @ 0x7fff8a77ef1a _sigtramp
 @ 0x7fff532c9990 (unknown)
 @0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves()
 @0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown()
 @0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown()
 @0x10dbc8283 
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @0x10dbafab7 
 testing::internal::HandleExceptionsInMethodIfSupported()
 @0x10db6f8ba testing::Test::Run()
 @0x10db70deb testing::TestInfo::Run()
 @0x10db71ab7 testing::TestCase::Run()
 @0x10db804b3 testing::internal::UnitTestImpl::RunAllTests()
 @0x10dbc4fe3 
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @0x10dbb1ea7 
 testing::internal::HandleExceptionsInMethodIfSupported()
 @0x10db800b0 testing::UnitTest::Run()
 @0x10d10c8d1 RUN_ALL_TESTS()
 @0x10d108b87 main
 @ 0x7fff8da765c9 start
 Bus error: 10
 {noformat}
 Results on CentOS look similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-3260:
-

 Summary: SchedulerTest.* are broken on OSX and CentOS
 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
 Environment: OSX 10.10.5 (14F6a),
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Priority: Blocker


Running a plain configure and make check on OSX currently leads to the 
following:

{noformat}
[ RUN  ] SchedulerTest.Subscribe
../../src/tests/scheduler_tests.cpp:168: Failure
Value of: event.get().type()
  Actual: HEARTBEAT
Expected: Event::SUBSCRIBED
Which is: SUBSCRIBED
../../src/tests/scheduler_tests.cpp:169: Failure
Value of: event.get().subscribed().framework_id()
  Actual:
Expected: id
Which is: 20150813-222454-347252928-56290-60707-
[  FAILED  ] SchedulerTest.Subscribe (183 ms)
[ RUN  ] SchedulerTest.TaskRunning
../../src/tests/scheduler_tests.cpp:227: Failure
Value of: event.get().type()
  Actual: HEARTBEAT
Expected: Event::OFFERS
Which is: OFFERS
../../src/tests/scheduler_tests.cpp:228: Failure
Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0
[libprotobuf FATAL 
../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
 CHECK failed: (index) < (size()):
../../src/tests/scheduler_tests.cpp:237: Failure
Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, 
_))...
 Expected: to be called at least once
   Actual: never called - unsatisfied and active
../../src/tests/scheduler_tests.cpp:233: Failure
Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, _))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
../../src/tests/scheduler_tests.cpp:230: Failure
Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, _, 
_))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
unknown file: Failure
C++ exception with description "CHECK failed: (index) < (size()): " thrown in 
the test body.
*** Aborted at 1439497494 (unix time) try "date -d @1439497494" if you are 
using GNU date ***
PC: @ 0x7fb2c0f20490 (unknown)
*** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack 
trace: ***
@ 0x7fff8a77ef1a _sigtramp
@ 0x7fff532c9990 (unknown)
@0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves()
@0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown()
@0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown()
@0x10dbc8283 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@0x10dbafab7 
testing::internal::HandleExceptionsInMethodIfSupported()
@0x10db6f8ba testing::Test::Run()
@0x10db70deb testing::TestInfo::Run()
@0x10db71ab7 testing::TestCase::Run()
@0x10db804b3 testing::internal::UnitTestImpl::RunAllTests()
@0x10dbc4fe3 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@0x10dbb1ea7 
testing::internal::HandleExceptionsInMethodIfSupported()
@0x10db800b0 testing::UnitTest::Run()
@0x10d10c8d1 RUN_ALL_TESTS()
@0x10d108b87 main
@ 0x7fff8da765c9 start
Bus error: 10
{noformat}

Results on CentOS look similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695900#comment-14695900
 ] 

Vinod Kone commented on MESOS-3260:
---

I'm reverting some patches while we investigate the issue.

 SchedulerTest.* are broken on OSX and CentOS
 

 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
 Environment: OSX 10.10.5 (14F6a),
 Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Priority: Blocker

 Running a plain configure and make check on OSX currently leads to the following:
 {noformat}
 [ RUN      ] SchedulerTest.Subscribe
 ../../src/tests/scheduler_tests.cpp:168: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::SUBSCRIBED
 Which is: SUBSCRIBED
 ../../src/tests/scheduler_tests.cpp:169: Failure
 Value of: event.get().subscribed().framework_id()
   Actual:
 Expected: id
 Which is: 20150813-222454-347252928-56290-60707-
 [  FAILED  ] SchedulerTest.Subscribe (183 ms)
 [ RUN      ] SchedulerTest.TaskRunning
 ../../src/tests/scheduler_tests.cpp:227: Failure
 Value of: event.get().type()
   Actual: HEARTBEAT
 Expected: Event::OFFERS
 Which is: OFFERS
 ../../src/tests/scheduler_tests.cpp:228: Failure
 Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0
 [libprotobuf FATAL ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] CHECK failed: (index) < (size()):
 ../../src/tests/scheduler_tests.cpp:237: Failure
 Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, _))...
          Expected: to be called at least once
            Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:233: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, _))...
          Expected: to be called once
            Actual: never called - unsatisfied and active
 ../../src/tests/scheduler_tests.cpp:230: Failure
 Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, _, _))...
          Expected: to be called once
            Actual: never called - unsatisfied and active
 unknown file: Failure
 C++ exception with description "CHECK failed: (index) < (size()): " thrown in the test body.
 *** Aborted at 1439497494 (unix time) try "date -d @1439497494" if you are using GNU date ***
 PC: @ 0x7fb2c0f20490 (unknown)
 *** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack trace: ***
 @ 0x7fff8a77ef1a _sigtramp
 @ 0x7fff532c9990 (unknown)
 @ 0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves()
 @ 0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown()
 @ 0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown()
 @ 0x10dbc8283 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @ 0x10dbafab7 testing::internal::HandleExceptionsInMethodIfSupported()
 @ 0x10db6f8ba testing::Test::Run()
 @ 0x10db70deb testing::TestInfo::Run()
 @ 0x10db71ab7 testing::TestCase::Run()
 @ 0x10db804b3 testing::internal::UnitTestImpl::RunAllTests()
 @ 0x10dbc4fe3 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @ 0x10dbb1ea7 testing::internal::HandleExceptionsInMethodIfSupported()
 @ 0x10db800b0 testing::UnitTest::Run()
 @ 0x10d10c8d1 RUN_ALL_TESTS()
 @ 0x10d108b87 main
 @ 0x7fff8da765c9 start
 Bus error: 10
 {noformat}
 Results on CentOS look similar.





[jira] [Commented] (MESOS-3149) Use setuptools to install python cli package

2015-08-13 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695761#comment-14695761
 ] 

Marco Massenzio commented on MESOS-3149:


I've asked a couple of colleagues at Mesosphere who are familiar with Python to 
also review the patch, and we all seem to agree that the changes are fine.
We will be committing this patch soon.

 Use setuptools to install python cli package
 

 Key: MESOS-3149
 URL: https://issues.apache.org/jira/browse/MESOS-3149
 Project: Mesos
  Issue Type: Task
Reporter: haosdent
Assignee: haosdent

 mesos-ps and mesos-cat, which depend on src/cli/python/mesos, do not work on 
 OSX because src/cli/python is not installed into a location on sys.path. It's 
 time to finish this TODO:
 {code}
 # Add 'src/cli/python' to PYTHONPATH.
 # TODO(benh): Remove this if/when we install the 'mesos' module via
 # PIP and setuptools.
 PYTHONPATH=@abs_top_srcdir@/src/cli/python:${PYTHONPATH}
 {code}





[jira] [Commented] (MESOS-2841) FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata

2015-08-13 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695826#comment-14695826
 ] 

James DeFelice commented on MESOS-2841:
---

I've addressed the outstanding review comments and submitted an updated patch 
for review:
https://reviews.apache.org/r/37443/


 FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata
 --

 Key: MESOS-2841
 URL: https://issues.apache.org/jira/browse/MESOS-2841
 Project: Mesos
  Issue Type: Improvement
Reporter: James DeFelice
Assignee: Neil Conway
  Labels: mesosphere

 A framework instance may offer specific capabilities to the cluster: storage, 
 smartly-balanced request handling across deployed tasks, access to third-party 
 services outside of the cluster, etc. These capabilities may or may not be 
 used by all, or even most, Mesos clusters. However, it should be possible 
 for processes running in the cluster to discover the capabilities or features 
 of frameworks in order to achieve a higher level of functionality and a more 
 seamless integration experience across the cluster.
 A rich discovery API attached to FrameworkInfo could result in some form 
 of early lock-in: there are probably many ways to realize cross-framework 
 integration and external-service integration that we haven't considered yet. 
 Rather than over-specify a discovery info message type at the framework level, 
 I think FrameworkInfo should expose a **very generic** way to supply metadata 
 for interested consumers (other processes, tasks, etc.).
 Adding a Labels field to FrameworkInfo reuses an existing message type and 
 seems to fit well with the overall intent: attaching generic metadata to a 
 framework instance. These labels should be visible when querying a Mesos 
 master's state.json endpoint.
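 To make the proposal concrete, here is a minimal C++ sketch of labels as generic key/value metadata on a framework description, plus a lookup a consumer might use to discover an advertised capability. It is an illustration under assumptions, not the Mesos API: `Label`, `FrameworkInfoSketch`, and `findLabel` are hypothetical plain-C++ stand-ins (loosely modeled on the key plus optional value shape of Mesos's existing `Label` message), and the `capability.storage` key used below is invented.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one label: a key with an optional value.
struct Label {
  std::string key;
  std::optional<std::string> value;
};

// Hypothetical, heavily simplified framework description that carries
// arbitrary metadata as a flat list of labels.
struct FrameworkInfoSketch {
  std::string name;
  std::vector<Label> labels;
};

// Returns the first label with the given key, if any. A consumer can use
// this to discover a capability a framework advertises without Mesos
// defining a dedicated discovery message type.
std::optional<Label> findLabel(const FrameworkInfoSketch& info,
                               const std::string& key) {
  for (const Label& label : info.labels) {
    if (label.key == key) {
      return label;
    }
  }
  return std::nullopt;
}
```

 Because the labels are opaque key/value pairs, the meaning of a key like `capability.storage` is a convention between the framework and its consumers; Mesos itself only stores and exposes it.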





[jira] [Commented] (MESOS-2841) FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata

2015-08-13 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695852#comment-14695852
 ] 

Neil Conway commented on MESOS-2841:


[~jdef] Thanks for finishing this off! My apologies for being flaky 
(vacation/travel, etc.). I'm back now, in case any more work is needed here.

 FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata
 --

 Key: MESOS-2841
 URL: https://issues.apache.org/jira/browse/MESOS-2841
 Project: Mesos
  Issue Type: Improvement
Reporter: James DeFelice
Assignee: Neil Conway
  Labels: mesosphere






[jira] [Commented] (MESOS-2497) Create synchronous validations for Calls

2015-08-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695769#comment-14695769
 ] 

Benjamin Mahler commented on MESOS-2497:


Proper validation of the 'Accept' header and a bug fix:

{noformat}
commit 61f71f05ad2a7e9205565437b3243aa84072bf84
Author: Isabel Jimenez cont...@isabeljimenez.com
Date:   Thu Aug 13 10:37:19 2015 -0700

Updated /scheduler endopint to use Request::acceptsMediaType.

Review: https://reviews.apache.org/r/37403
{noformat}

{noformat}
commit b3c18d6d6179ac34be89545dc3b8a9333c91ebb7
Author: Benjamin Mahler benjamin.mah...@gmail.com
Date:   Thu Aug 13 11:43:06 2015 -0700

Ensure the Content-Type is set for the streaming scheduler endpoint.
{noformat}

 Create synchronous validations for Calls
 

 Key: MESOS-2497
 URL: https://issues.apache.org/jira/browse/MESOS-2497
 Project: Mesos
  Issue Type: Bug
Reporter: Isabel Jimenez
Assignee: Isabel Jimenez
  Labels: HTTP, mesosphere

 The /call endpoint will return a 202 Accepted code, but it has to perform some 
 basic validation first. If validation fails, it will return a 4xx code. We 
 need a mechanism that validates the request and sends back the appropriate 
 code.
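 The kind of synchronous check described here could look roughly like the sketch below, which validates a request before answering 202 and otherwise picks a 4xx code. Everything in it is an assumption for illustration: `validateCall` and the free function `acceptsMediaType` are hypothetical (the latter is not libprocess's `Request::acceptsMediaType`), `application/json` and `application/x-protobuf` are taken as the supported formats, and the Accept-header matching is simplified (q-values and other parameters are ignored, and comparisons are case-sensitive).

```cpp
#include <string>

// Trims leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
  const auto begin = s.find_first_not_of(" \t");
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(" \t");
  return s.substr(begin, end - begin + 1);
}

// Simplified media-range matching for an Accept header such as
// "text/html, application/*;q=0.9". Parameters (including q-values) are
// stripped and ignored; an empty header accepts anything.
bool acceptsMediaType(const std::string& accept, const std::string& mediaType) {
  if (trim(accept).empty()) return true;
  const std::string type = mediaType.substr(0, mediaType.find('/'));
  std::string::size_type start = 0;
  while (start <= accept.size()) {
    auto comma = accept.find(',', start);
    if (comma == std::string::npos) comma = accept.size();
    std::string range = accept.substr(start, comma - start);
    range = trim(range.substr(0, range.find(';')));
    if (range == "*/*" || range == mediaType || range == type + "/*") {
      return true;
    }
    start = comma + 1;
  }
  return false;
}

// Hypothetical synchronous validation: returns the HTTP status code the
// endpoint would send back before (or instead of) processing the call.
int validateCall(const std::string& method,
                 const std::string& contentType,
                 const std::string& accept,
                 bool bodyParses) {
  if (method != "POST") return 405;  // Method Not Allowed
  if (contentType != "application/json" &&
      contentType != "application/x-protobuf") {
    return 415;  // Unsupported Media Type
  }
  if (!acceptsMediaType(accept, "application/json") &&
      !acceptsMediaType(accept, "application/x-protobuf")) {
    return 406;  // Not Acceptable: no supported response format
  }
  if (!bodyParses) return 400;  // Bad Request
  return 202;                   // Accepted
}
```

 Keeping every rejection inside one synchronous function means the caller only needs to queue the call for asynchronous processing after it has already answered 202.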





[jira] [Created] (MESOS-3259) Support health checks in Docker Containerizer

2015-08-13 Thread Timothy Chen (JIRA)
Timothy Chen created MESOS-3259:
---

 Summary: Support health checks in Docker Containerizer 
 Key: MESOS-3259
 URL: https://issues.apache.org/jira/browse/MESOS-3259
 Project: Mesos
  Issue Type: Improvement
  Components: docker
Reporter: Timothy Chen
Assignee: Jojy Varghese


We need to support health checks that run inside the container (via docker exec) 
when tasks are launched with the Docker executor.

Health checks are defined in TaskInfo, but they are not yet supported by the 
Docker Containerizer.





[jira] [Updated] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3260:
--
 Assignee: Vinod Kone
Affects Version/s: 0.24.0
 Target Version/s: 0.24.0

 SchedulerTest.* are broken on OSX and CentOS
 

 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.24.0
 Environment: OSX 10.10.5 (14F6a),
 Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Assignee: Vinod Kone
Priority: Blocker






[jira] [Updated] (MESOS-3187) Docker cli option support

2015-08-13 Thread Vaibhav Khanduja (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Khanduja updated MESOS-3187:

Issue Type: Bug  (was: Improvement)

 Docker cli option support
 -

 Key: MESOS-3187
 URL: https://issues.apache.org/jira/browse/MESOS-3187
 Project: Mesos
  Issue Type: Bug
  Components: docker, slave
Reporter: Vaibhav Khanduja
Assignee: Vaibhav Khanduja
Priority: Minor

 The Mesos slave currently supports Docker as a container environment. However, 
 the Docker CLI supports many more options than the Mesos slave does. The 
 slave's command-line options should be extended to support such parameters.





[jira] [Commented] (MESOS-2406) Add CLI tool for creating persistent volumes for pre-existing data

2015-08-13 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696387#comment-14696387
 ] 

Klaus Ma commented on MESOS-2406:
-

If no developer is working on this, I'd like to give it a try.

 Add CLI tool for creating persistent volumes for pre-existing data
 --

 Key: MESOS-2406
 URL: https://issues.apache.org/jira/browse/MESOS-2406
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu

 This is for the case where the user has some pre-existing data under a 
 certain directory (e.g., /var/lib/cassandra) and wants to expose that 
 directory as a persistent volume to the framework.





[jira] [Assigned] (MESOS-2406) Add CLI tool for creating persistent volumes for pre-existing data

2015-08-13 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-2406:
---

Assignee: Klaus Ma

 Add CLI tool for creating persistent volumes for pre-existing data
 --

 Key: MESOS-2406
 URL: https://issues.apache.org/jira/browse/MESOS-2406
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Klaus Ma






[jira] [Commented] (MESOS-3189) TimeTest.Now fails with --enable-libevent

2015-08-13 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696461#comment-14696461
 ] 

Vinod Kone commented on MESOS-3189:
---

Try running the test in a loop to see if you can reproduce the failure:

make check GTEST_FILTER="*TimeTest.Now*" GTEST_REPEAT=-1 GTEST_BREAK_ON_FAILURE=1

 TimeTest.Now fails with --enable-libevent
 -

 Key: MESOS-3189
 URL: https://issues.apache.org/jira/browse/MESOS-3189
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Joris Van Remoortere
  Labels: beginner, libprocess, mesosphere, newbie

 [ RUN      ] TimeTest.Now
 ../../../3rdparty/libprocess/src/tests/time_tests.cpp:50: Failure
 Expected: (Microseconds(10)) < (Clock::now() - t1), actual: 8-byte object <10-27 00-00 00-00 00-00> vs 0ns
 [  FAILED  ] TimeTest.Now (0 ms)
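 The bytes `10-27 00-00 00-00 00-00` in the failure are gtest's raw dump of a value it could not print. Assuming the duration is held as a 64-bit nanosecond count in little-endian order (an assumption about libprocess's `Duration` layout), they decode to 0x2710 = 10000 ns, which is exactly `Microseconds(10)`; the failure therefore reports a measured elapsed time of 0ns rather than more than 10 microseconds. The sketch below performs that decoding and, for comparison, measures a 10-microsecond sleep with `std::chrono::steady_clock` (the standard library's monotonic clock, not libprocess's `Clock::now()`). The helper names are hypothetical.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>

// Reassembles a signed 64-bit integer from the bytes of a gtest
// "N-byte object <..>" dump, assuming little-endian byte order
// (least significant byte first).
std::int64_t decodeLittleEndian(const std::array<std::uint8_t, 8>& bytes) {
  std::uint64_t value = 0;
  for (std::size_t i = bytes.size(); i-- > 0;) {
    value = (value << 8) | bytes[i];
  }
  return static_cast<std::int64_t>(value);
}

// Sleeps for `duration` and returns the elapsed time as measured by
// std::chrono::steady_clock, converted to nanoseconds.
std::chrono::nanoseconds measureSleep(std::chrono::nanoseconds duration) {
  const auto start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(duration);
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);
}
```

 With a monotonic clock the measured sleep is never shorter than requested, so a reading of 0ns after a sleep points at the clock source rather than at the sleep itself.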





[jira] [Updated] (MESOS-3187) Docker cli option support

2015-08-13 Thread Vaibhav Khanduja (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Khanduja updated MESOS-3187:

Issue Type: Improvement  (was: Bug)

 Docker cli option support
 -

 Key: MESOS-3187
 URL: https://issues.apache.org/jira/browse/MESOS-3187
 Project: Mesos
  Issue Type: Improvement
  Components: docker, slave
Reporter: Vaibhav Khanduja
Assignee: Vaibhav Khanduja
Priority: Minor






[jira] [Commented] (MESOS-2516) Move allocation-related types to mesos::master namespace

2015-08-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696326#comment-14696326
 ] 

José Guilherme Vanz commented on MESOS-2516:


https://reviews.apache.org/r/37468/

 Move allocation-related types to mesos::master namespace
 

 Key: MESOS-2516
 URL: https://issues.apache.org/jira/browse/MESOS-2516
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: José Guilherme Vanz
Priority: Minor
  Labels: easyfix, newbie

 {{Allocator}}, {{Sorter}} and {{Comparator}} types live in the 
 {{master::allocator}} namespace. This is not consistent with the rest of the 
 codebase: {{Isolator}}, {{Fetcher}}, and {{Containerizer}} all live in the 
 {{slave}} namespace. The {{allocator}} namespace should be removed for 
 consistency.
 Since sorters are poorly named, they should be renamed (or namespaced) prior 
 to this change so as not to pollute the {{master}} namespace.





[jira] [Commented] (MESOS-3189) TimeTest.Now fails with --enable-libevent

2015-08-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696377#comment-14696377
 ] 

José Guilherme Vanz commented on MESOS-3189:


I tried to reproduce this problem, but could not.

I executed `configure --enable-libevent` and `make check` on the current HEAD 
commit, and the `TimeTest.Now` test passes (0 ms).
I'm using Fedora 22; maybe there is an issue with libevent on other OSes? 
What is your environment?

 TimeTest.Now fails with --enable-libevent
 -

 Key: MESOS-3189
 URL: https://issues.apache.org/jira/browse/MESOS-3189
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Joris Van Remoortere
  Labels: beginner, libprocess, mesosphere, newbie






[jira] [Comment Edited] (MESOS-3189) TimeTest.Now fails with --enable-libevent

2015-08-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696377#comment-14696377
 ] 

José Guilherme Vanz edited comment on MESOS-3189 at 8/14/15 3:32 AM:
-

I was trying to reproduce this issue, but could not.

I executed `configure --enable-libevent` and `make check` on the current HEAD 
commit, and the `TimeTest.Now` test passes (0 ms).
I'm using Fedora 22; maybe there is an issue with libevent on other OSes? 
What is your environment?


was (Author: jvanz):
I'm trying simulate these problem, but I could not.

I  executed `configure --enable-libevent` and `make check` in the current HEAD 
commit and the `TimeTest.Now(0ms)` test is passing. 
I'm using Fedora 22, maybe are there some issue in the libevent in other OS? 
What is your environment? 

 TimeTest.Now fails with --enable-libevent
 -

 Key: MESOS-3189
 URL: https://issues.apache.org/jira/browse/MESOS-3189
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Joris Van Remoortere
  Labels: beginner, libprocess, mesosphere, newbie






[jira] [Commented] (MESOS-2912) Provide a Python library for master detection

2015-08-13 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695435#comment-14695435
 ] 

Marco Massenzio commented on MESOS-2912:


Review posted at https://github.com/mesos/commons/pull/2

[~vinodkone] mentioned he had comments about it (but has not posted a review 
yet). It would be good to have those in before I add any more features or 
functionality to it.

 Provide a Python library for master detection
 -

 Key: MESOS-2912
 URL: https://issues.apache.org/jira/browse/MESOS-2912
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Marco Massenzio
  Labels: mesosphere

 When schedulers start interacting with the Mesos master via HTTP endpoints, 
 they need a way to detect masters.
 Mesos should provide a master detection Python library to make this easy for 
 frameworks.





[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.

2015-08-13 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695439#comment-14695439
 ] 

Vinod Kone commented on MESOS-3070:
---

Thanks for the update on progress.

I'm not sure how Option 1 works. If the old slave kills the duplicate task, the 
master will receive terminal updates for that task, which might confuse it into 
thinking that the *new* task has terminated.

Option 2 is a heavy hammer, because it might become a scalability bottleneck if 
the master has to persist every TaskInfo.

Note that a duplicate task ID is only a problem if the duplicate task is being 
launched on a different slave than the original one. If it were the same slave, 
the master would have rejected it!

So how about storing tasks in the master in a per-slave map instead of a global 
tasks map? That way, the master can be smarter when receiving duplicate task 
launches or status updates.
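A minimal sketch of the per-slave layout suggested above, for a single framework's tasks. The names `TaskSketch` and `TaskRegistrySketch` are hypothetical, IDs are plain strings rather than Mesos's `SlaveID`/`TaskID` types, and a `false` return stands in for whatever the master would actually do instead of failing a CHECK. Keying by slave first means the scenario in the description (the same task ID on slaveA and slaveB) no longer collides, a duplicate on the same slave is still rejected, and a status update only touches the task on the slave that sent it.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical task record; a real master would keep the full Task.
struct TaskSketch {
  std::string taskId;
  std::string state;
};

// One framework's tasks, grouped per slave instead of in a single
// global task-ID map.
class TaskRegistrySketch {
 public:
  // Adds a task on the given slave. Returns false (instead of failing a
  // CHECK) if that slave already has a task with the same ID.
  bool addTask(const std::string& slaveId, const TaskSketch& task) {
    auto& tasks = tasksBySlave_[slaveId];
    return tasks.emplace(task.taskId, task).second;
  }

  // Looks up a task by (slave, task ID); returns nullptr if absent.
  const TaskSketch* find(const std::string& slaveId,
                         const std::string& taskId) const {
    auto slave = tasksBySlave_.find(slaveId);
    if (slave == tasksBySlave_.end()) return nullptr;
    auto task = slave->second.find(taskId);
    return task == slave->second.end() ? nullptr : &task->second;
  }

  // Applies a status update only to the task on the reporting slave, so a
  // terminal update from the old slave cannot mark the new task as done.
  bool updateState(const std::string& slaveId, const std::string& taskId,
                   const std::string& state) {
    auto slave = tasksBySlave_.find(slaveId);
    if (slave == tasksBySlave_.end()) return false;
    auto task = slave->second.find(taskId);
    if (task == slave->second.end()) return false;
    task->second.state = state;
    return true;
  }

 private:
  std::unordered_map<std::string,
                     std::unordered_map<std::string, TaskSketch>> tasksBySlave_;
};
```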


 Master CHECK failure if a framework uses duplicated task id.
 

 Key: MESOS-3070
 URL: https://issues.apache.org/jira/browse/MESOS-3070
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.22.1
Reporter: Jie Yu
Assignee: Klaus Ma

 We observed this in one of our testing clusters.
 One framework (under development) keeps launching tasks using the same 
 task_id. We don't expect the master to crash even if the framework is not 
 doing what it's supposed to do. However, under the following sequence of 
 events, this can happen, and the master keeps crashing:
 1) frameworkA launches task 'task_id_1' on slaveA
 2) master fails over
 3) slaveA has not re-registered yet
 4) frameworkA re-registers and launches task 'task_id_1' on slaveB
 5) slaveA re-registers and adds task 'task_id_1' to frameworkA
 6) CHECK failure in addTask
 {noformat}
 I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with resources cpus(*):4; mem(*):32768 on slave 20150417-232509-1735470090-5050-48870-S25 (hostname)
 ...
 ...
 F0716 21:52:50.760136 28805 master.hpp:362] Check failed: !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework framework_id
 {noformat}





[jira] [Created] (MESOS-3257) Zookeeper JVM test failure causes test harness to fail

2015-08-13 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3257:
-

 Summary: Zookeeper JVM test failure causes test harness to fail
 Key: MESOS-3257
 URL: https://issues.apache.org/jira/browse/MESOS-3257
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett


A failure in the ZooKeeper JVM test setup causes the test harness to exit, 
preventing subsequent tests from running.

{code}
[----------] 2 tests from LogZooKeeperTest
F0813 16:09:33.647265 13790 zookeeper.cpp:78] CHECK_SOME(jvm): Error looking up symbol 'JNI_CreateJavaVM' in '' : /home/pbrett/sandbox/perf.refactor2/build/src/.libs/mesos-tests: undefined symbol: JNI_CreateJavaVM
*** Check failure stack trace: ***
@ 0x7f2d8cca7aac  google::LogMessage::Fail()
@ 0x7f2d8cca79fb  google::LogMessage::SendToLog()
@ 0x7f2d8cca740c  google::LogMessage::Flush()
@ 0x7f2d8ccaa140  google::LogMessageFatal::~LogMessageFatal()
@   0x8a938c  _CheckFatal::~_CheckFatal()
@  0x12f68c0  mesos::internal::tests::ZooKeeperTest::SetUpTestCase()
@  0x132a88a  testing::TestCase::RunSetUpTestCase()
@  0x1334cf7  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x132fb94  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x1311635  testing::TestCase::Run()
@  0x1317fca  testing::internal::UnitTestImpl::RunAllTests()
@  0x1335427  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x1330128  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x1316cf0  testing::UnitTest::Run()
@   0xc3a9d8  RUN_ALL_TESTS()
@   0xc3a6c8  main
@ 0x7f2d8818d9f4  __libc_start_main
@   0x8a5fa9  (unknown)
make[3]: *** [check-local] Aborted
{code}





[jira] [Updated] (MESOS-1013) ExamplesTest.JavaLog is flaky

2015-08-13 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1013:
-
Shepherd: Till Toenshoff  (was: Joris Van Remoortere)

 ExamplesTest.JavaLog is flaky
 -

 Key: MESOS-1013
 URL: https://issues.apache.org/jira/browse/MESOS-1013
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.19.0
Reporter: Vinod Kone
Assignee: Greg Mann
  Labels: flaky, mesosphere
 Attachments: ExamplesTest.JavaLog.logs


 The {{ExamplesTest.JavaLog}} test framework is flaky, possibly related to a 
 race condition between mutexes.
 {noformat}
 [ RUN      ] ExamplesTest.JavaLog
 Using temporary directory '/tmp/ExamplesTest_JavaLog_WBWEb9'
 Feb 18, 2014 12:10:57 PM TestLog main
 INFO: Starting a local ZooKeeper server
 ...
 F0218 12:10:58.575036 17450 coordinator.cpp:394] Check failed: !missing Not expecting local replica to be missing position 3 after the writing is done
 *** Check failure stack trace: ***
 tests/script.cpp:81: Failure
 Failed
 java_log_test.sh terminated with signal 'Aborted'
 [  FAILED  ] ExamplesTest.JavaLog (2166 ms)
 {noformat}
 Full logs attached.


