[jira] [Commented] (MESOS-2297) Add authentication support for HTTP API

2015-07-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612951#comment-14612951
 ] 

Till Toenshoff commented on MESOS-2297:
---

+1

 Add authentication support for HTTP API
 ---

 Key: MESOS-2297
 URL: https://issues.apache.org/jira/browse/MESOS-2297
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Isabel Jimenez
  Labels: mesosphere

 To start with, we will only support basic HTTP auth.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable

2015-07-07 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617296#comment-14617296
 ] 

Till Toenshoff commented on MESOS-1457:
---

I was pointed to this older issue because the patches did not get committed.

Seems Palak's solution is acceptable. It would be great if we could indeed get 
a comment into the ProcessBase constructor stating something like the proposed 
{noformat}
// Please provide a process ID prefix to ease debugging (See MESOS-1457).
{noformat}

[~PalakPC] could you possibly propose the above in a review-request and rebase 
those other two patches so we can get them committed?
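
To illustrate the motivation, here is a minimal, hypothetical sketch (not the actual libprocess API) of a base class whose constructor requires a human-readable ID prefix, appending a sequence number similar to the "name(N)" IDs libprocess generates:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the actual libprocess API: the constructor
// *requires* a human-readable ID prefix to ease debugging (see MESOS-1457)
// and appends a sequence number, similar to libprocess' "name(N)" IDs.
class ProcessBaseSketch {
public:
  explicit ProcessBaseSketch(const std::string& prefix)
    : id(prefix + "(" + std::to_string(++counter) + ")") {}

  const std::string id;  // e.g. "scheduler(1)"

private:
  static int counter;
};

int ProcessBaseSketch::counter = 0;
```

With a mandatory prefix, log lines and timeslice traces would always carry a recognizable name instead of an anonymous numeric ID.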

 Process IDs should be required to be human-readable 
 

 Key: MESOS-1457
 URL: https://issues.apache.org/jira/browse/MESOS-1457
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Dominic Hamon
Assignee: Palak Choudhary
Priority: Minor

 When debugging, it's very useful to understand which processes are getting 
 timeslices. As such, the human-readable names that can be passed to 
 {{ProcessBase}} are incredibly valuable; however, they are currently optional.
 If the constructor of {{ProcessBase}} took a mandatory string, every process 
 would get a human-readable name and debugging would be much easier.





[jira] [Assigned] (MESOS-3170) 0.23 Build fails when compiling against -lsasl2 which has been statically linked

2015-08-03 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-3170:
-

Assignee: Till Toenshoff

 0.23 Build fails when compiling against -lsasl2 which has been statically 
 linked
 

 Key: MESOS-3170
 URL: https://issues.apache.org/jira/browse/MESOS-3170
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Chris Heller
Assignee: Till Toenshoff
Priority: Minor
  Labels: easyfix
 Fix For: 0.24.0


 If the sasl library has been statically linked the check from CRAM-MD5 can 
 fail, due to missing symbols.





[jira] [Commented] (MESOS-1010) Python extension build is broken if gflags-dev is installed

2015-08-12 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693621#comment-14693621
 ] 

Till Toenshoff commented on MESOS-1010:
---

I have now reopened that review request - will also discuss it with some 
other committers today - stay tuned for more :)

 Python extension build is broken if gflags-dev is installed
 ---

 Key: MESOS-1010
 URL: https://issues.apache.org/jira/browse/MESOS-1010
 Project: Mesos
  Issue Type: Bug
  Components: build, python api
 Environment: Fedora 20, amd64, GCC: 4.8.2; OSX Yosemite, Apple LLVM 
 6.1.0 (~LLVM 3.6.0)
Reporter: Nikita Vetoshkin
Assignee: Greg Mann
  Labels: flaky-test, mesosphere

 In my environment mesos build from master results in broken python api module 
 {{_mesos.so}}:
 {noformat}
 nekto0n@ya-darkstar ~/workspace/mesos/src/python $ 
 PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c 'import _mesos'
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
 ImportError: 
 /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so:
  undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_
 {noformat}
 Unmangled version of symbol looks like this:
 {noformat}
 google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, 
 char const*, void*, void*)
 {noformat}
 During {{./configure}} step {{glog}} finds {{gflags}} development files and 
 starts using them, thus *implicitly* adding dependency on {{libgflags.so}}. 
 This breaks Python extensions module and perhaps can break other mesos 
 subsystems when moved to hosts without {{gflags}} installed.
 This task is done when the ExamplesTest.PythonFramework test passes on a 
 system with gflags installed.
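
The mangled symbol above can be decoded programmatically with the Itanium ABI helper shipped with GCC and Clang; this is a standalone demonstration, not Mesos code:

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Demangle a C++ symbol via the Itanium ABI helper available in
// GCC and Clang; returns the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
  int status = 0;
  char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && raw != nullptr) ? raw : mangled;
  std::free(raw);
  return result;
}
```

Feeding it the undefined symbol from the report yields the google::FlagRegisterer constructor quoted above, confirming the implicit libgflags.so dependency.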





[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design

2015-06-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2946:
--
Description: 
h4.Motivation
Design an interface covering authorizer modules while staying minimal invasive 
in regards to changes on the existing {{LocalAuthorizer}} implementation.


  was:
Motivation
Design an interface covering authorizer modules while staying minimal invasive 
in regards to changes on the existing {{LocalAuthorizer}} implementation.



 Authorizer Module: Interface design
 ---

 Key: MESOS-2946
 URL: https://issues.apache.org/jira/browse/MESOS-2946
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff

 h4.Motivation
 Design an interface covering authorizer modules while staying minimal 
 invasive in regards to changes on the existing {{LocalAuthorizer}} 
 implementation.





[jira] [Created] (MESOS-2947) Authorizer Module: Implementation, Integration & Tests

2015-06-26 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2947:
-

 Summary: Authorizer Module: Implementation, Integration & Tests
 Key: MESOS-2947
 URL: https://issues.apache.org/jira/browse/MESOS-2947
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff


h4.Motivation
Provide an example authorizer module based on the {{LocalAuthorizer}} 
implementation. Make sure that such an authorizer module can be fully unit- 
and integration-tested within the Mesos test suite.






[jira] [Updated] (MESOS-2945) Create an Authorizer Module

2015-06-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2945:
--
Epic Name: Authorizer Module

 Create an Authorizer Module
 ---

 Key: MESOS-2945
 URL: https://issues.apache.org/jira/browse/MESOS-2945
 Project: Mesos
  Issue Type: Epic
Reporter: Till Toenshoff

 h4. Motivation
 Allow for third parties to quickly develop and plug-in new authorizing 
 methods. The modularized Authorizer API will lower the barrier for the 
 community to provide new methods to Mesos. An example for such additional, 
 next step module could be LDAP / AD backed authorization. Alternative 
 authorizing methods may bring in new dependencies that we don't want to 
 enforce on all of our users. Mesos users may be required to use custom 
 authorizing techniques due to strict security policies.





[jira] [Updated] (MESOS-2947) Authorizer Module: Implementation, Integration & Tests

2015-06-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2947:
--
Labels: mesosphere module security  (was: )

 Authorizer Module: Implementation, Integration & Tests
 --

 Key: MESOS-2947
 URL: https://issues.apache.org/jira/browse/MESOS-2947
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Till Toenshoff
  Labels: mesosphere, module, security

 h4.Motivation
 Provide an example authorizer module based on the {{LocalAuthorizer}} 
 implementation. Make sure that such an authorizer module can be fully unit- 
 and integration-tested within the Mesos test suite.





[jira] [Created] (MESOS-2946) Authorizer Module: Interface design

2015-06-26 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2946:
-

 Summary: Authorizer Module: Interface design
 Key: MESOS-2946
 URL: https://issues.apache.org/jira/browse/MESOS-2946
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff


Motivation
Design an interface covering authorizer modules while staying minimal invasive 
in regards to changes on the existing {{LocalAuthorizer}} implementation.






[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design

2015-06-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2946:
--
Description: 
h4.Motivation
Design an interface covering authorizer modules while staying minimally 
invasive in regards to changes to the existing {{LocalAuthorizer}} 
implementation.


  was:
h4.Motivation
Design an interface covering authorizer modules while staying minimal invasive 
in regards to changes on the existing {{LocalAuthorizer}} implementation.



 Authorizer Module: Interface design
 ---

 Key: MESOS-2946
 URL: https://issues.apache.org/jira/browse/MESOS-2946
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff

 h4.Motivation
 Design an interface covering authorizer modules while staying minimally 
 invasive in regards to changes to the existing {{LocalAuthorizer}} 
 implementation.





[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design

2015-06-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2946:
--
Story Points: 2

 Authorizer Module: Interface design
 ---

 Key: MESOS-2946
 URL: https://issues.apache.org/jira/browse/MESOS-2946
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Till Toenshoff

 h4.Motivation
 Design an interface covering authorizer modules while staying minimally 
 invasive in regards to changes to the existing {{LocalAuthorizer}} 
 implementation.





[jira] [Created] (MESOS-2945) Create an Authorizer Module

2015-06-26 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2945:
-

 Summary: Create an Authorizer Module
 Key: MESOS-2945
 URL: https://issues.apache.org/jira/browse/MESOS-2945
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff


h4. Motivation
Allow for third parties to quickly develop and plug-in new authorizing methods. 
The modularized Authorizer API will lower the barrier for the community to 
provide new methods to Mesos. An example for such additional, next step module 
could be LDAP / AD backed authorization. Alternative authorizing methods may 
bring in new dependencies that we don't want to enforce on all of our users. 
Mesos users may be required to use custom authorizing techniques due to strict 
security policies.






[jira] [Issue Comment Deleted] (MESOS-3173) Mark Path::basename, Path::dirname as const functions.

2015-07-29 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3173:
--
Comment: was deleted

(was: https://reviews.apache.org/r/36773/)

 Mark Path::basename, Path::dirname as const functions.
 --

 Key: MESOS-3173
 URL: https://issues.apache.org/jira/browse/MESOS-3173
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Jan Schlicht
Assignee: Jan Schlicht
Priority: Trivial
  Labels: easyfix, mesosphere

 The functions Path::basename and Path::dirname in stout/path.hpp are not 
 marked const, although they could be. Marking them const would remove some 
 ambiguities in the usage of these functions.
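
A minimal sketch (hypothetical class, not the actual stout implementation) of the requested change: with basename() and dirname() const-qualified, they can be invoked through const references and on temporaries.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the actual stout API: basename() and dirname()
// are const member functions, so they can be called on const Path objects.
class Path {
public:
  explicit Path(const std::string& p) : value(p) {}

  std::string basename() const {
    const size_t pos = value.rfind('/');
    return pos == std::string::npos ? value : value.substr(pos + 1);
  }

  std::string dirname() const {
    const size_t pos = value.rfind('/');
    return pos == std::string::npos ? "." : value.substr(0, pos);
  }

private:
  std::string value;
};
```

Note this toy version ignores edge cases like trailing slashes, which the real implementation handles.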





[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable

2015-08-05 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658324#comment-14658324
 ] 

Till Toenshoff commented on MESOS-1457:
---

Shepherd will get assigned shortly.

 Process IDs should be required to be human-readable 
 

 Key: MESOS-1457
 URL: https://issues.apache.org/jira/browse/MESOS-1457
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Dominic Hamon
Assignee: Palak Choudhary
Priority: Minor

 When debugging, it's very useful to understand which processes are getting 
 timeslices. As such, the human-readable names that can be passed to 
 {{ProcessBase}} are incredibly valuable; however, they are currently optional.
 If the constructor of {{ProcessBase}} took a mandatory string, every process 
 would get a human-readable name and debugging would be much easier.





[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky

2015-08-05 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658434#comment-14658434
 ] 

Till Toenshoff commented on MESOS-830:
--

[~greggomann] I added some debug code into that macro which told me that 
pthread_rwlock_wrlock returned 22 (Invalid Argument), and from that I assumed 
that the lock in question had already been destroyed.

 ExamplesTest.JavaFramework is flaky
 ---

 Key: MESOS-830
 URL: https://issues.apache.org/jira/browse/MESOS-830
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Greg Mann
  Labels: flaky, mesosphere

 Identify the cause of the following test failure:
 [ RUN  ] ExamplesTest.JavaFramework
 Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8'
 Enabling authentication for the framework
 I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 
 172.25.133.171:52576
 I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 
 201311201513-2877626796-52576-3234
 I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing 
 authenticated frameworks to register!
 I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 
 1)@172.25.133.171:52576
 I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; 
 mem(*):7168; disk(*):481998; ports(*):[31000-32000]
 I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 
 2)@172.25.133.171:52576
 I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; 
 mem(*):7168; disk(*):481998; ports(*):[31000-32000]
 I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is 
 master@172.25.133.171:52576
 I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading 
 master!
 I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from 
 '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta'
 I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering 
 status update manager
 I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator
 I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery
 I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from 
 '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta'
 I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 
 3)@172.25.133.171:52576
 I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering 
 status update manager
 I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master 
 master@172.25.133.171:52576
 I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; 
 mem(*):7168; disk(*):481998; ports(*):[31000-32000]
 I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at 
 master@172.25.133.171:52576
 I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator
 I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master
 I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client 
 SASL
 I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from 
 '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta'
 I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master 
 detected at master@172.25.133.171:52576
 I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master
 I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery
 I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register 
 slave on vkone.local at slave(1)@172.25.133.171:52576
 I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering 
 status update manager
 I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at 
 master@172.25.133.171:52576
 I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 
 201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; 
 mem(*):7168; disk(*):481998; ports(*):[31000-32000]
 I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator
 I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master
 I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master 
 detected at master@172.25.133.171:52576
 I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master 
 master@172.25.133.171:52576; given slave ID 
 201311201513-2877626796-52576-3234-0
 I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register 
 slave on vkone.local at slave(2)@172.25.133.171:52576
 I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 
 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; 
 mem(*):7168; disk(*):481998; ports(*):[31000-32000]
 I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to 
 

[jira] [Updated] (MESOS-3170) 0.23 Build fails when compiling against -lsasl2 which has been statically linked

2015-08-03 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3170:
--
Shepherd: Till Toenshoff

 0.23 Build fails when compiling against -lsasl2 which has been statically 
 linked
 

 Key: MESOS-3170
 URL: https://issues.apache.org/jira/browse/MESOS-3170
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Chris Heller
Assignee: Chris Heller
Priority: Minor
  Labels: easyfix
 Fix For: 0.24.0


 If the sasl library has been statically linked the check from CRAM-MD5 can 
 fail, due to missing symbols.





[jira] [Created] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS

2015-08-13 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-3260:
-

 Summary: SchedulerTest.* are broken on OSX and CentOS
 Key: MESOS-3260
 URL: https://issues.apache.org/jira/browse/MESOS-3260
 Project: Mesos
  Issue Type: Bug
 Environment: OSX 10.10.5 (14F6a),
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Reporter: Till Toenshoff
Priority: Blocker


Running a plain configure and make check on OSX currently leads to the 
following:

{noformat}
[ RUN  ] SchedulerTest.Subscribe
../../src/tests/scheduler_tests.cpp:168: Failure
Value of: event.get().type()
  Actual: HEARTBEAT
Expected: Event::SUBSCRIBED
Which is: SUBSCRIBED
../../src/tests/scheduler_tests.cpp:169: Failure
Value of: event.get().subscribed().framework_id()
  Actual:
Expected: id
Which is: 20150813-222454-347252928-56290-60707-
[  FAILED  ] SchedulerTest.Subscribe (183 ms)
[ RUN  ] SchedulerTest.TaskRunning
../../src/tests/scheduler_tests.cpp:227: Failure
Value of: event.get().type()
  Actual: HEARTBEAT
Expected: Event::OFFERS
Which is: OFFERS
../../src/tests/scheduler_tests.cpp:228: Failure
Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0
[libprotobuf FATAL 
../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
 CHECK failed: (index) < (size()):
../../src/tests/scheduler_tests.cpp:237: Failure
Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, 
_))...
 Expected: to be called at least once
   Actual: never called - unsatisfied and active
../../src/tests/scheduler_tests.cpp:233: Failure
Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, _))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
../../src/tests/scheduler_tests.cpp:230: Failure
Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, _, 
_))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
unknown file: Failure
C++ exception with description "CHECK failed: (index) < (size()): " thrown in 
the test body.
*** Aborted at 1439497494 (unix time) try "date -d @1439497494" if you are 
using GNU date ***
PC: @ 0x7fb2c0f20490 (unknown)
*** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack 
trace: ***
@ 0x7fff8a77ef1a _sigtramp
@ 0x7fff532c9990 (unknown)
@0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves()
@0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown()
@0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown()
@0x10dbc8283 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@0x10dbafab7 
testing::internal::HandleExceptionsInMethodIfSupported()
@0x10db6f8ba testing::Test::Run()
@0x10db70deb testing::TestInfo::Run()
@0x10db71ab7 testing::TestCase::Run()
@0x10db804b3 testing::internal::UnitTestImpl::RunAllTests()
@0x10dbc4fe3 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@0x10dbb1ea7 
testing::internal::HandleExceptionsInMethodIfSupported()
@0x10db800b0 testing::UnitTest::Run()
@0x10d10c8d1 RUN_ALL_TESTS()
@0x10d108b87 main
@ 0x7fff8da765c9 start
Bus error: 10
{noformat}

Results on CentOS look similar.





[jira] [Updated] (MESOS-3149) Use setuptools to install python cli package

2015-08-12 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3149:
--
Shepherd: Till Toenshoff

 Use setuptools to install python cli package
 

 Key: MESOS-3149
 URL: https://issues.apache.org/jira/browse/MESOS-3149
 Project: Mesos
  Issue Type: Task
Reporter: haosdent
Assignee: haosdent

 mesos-ps/mesos-cat, which depend on src/cli/python/mesos, do not work on 
 OSX because src/cli/python is not installed to sys.path. It's time to 
 finish this TODO.
  
 {code}
 # Add 'src/cli/python' to PYTHONPATH.
 # TODO(benh): Remove this if/when we install the 'mesos' module via
 # PIP and setuptools.
 PYTHONPATH=@abs_top_srcdir@/src/cli/python:${PYTHONPATH}
 {code}





[jira] [Commented] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework

2015-07-27 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642691#comment-14642691
 ] 

Till Toenshoff commented on MESOS-2697:
---

commit 90f3fec71535bdf9c0cd5fc90c62e19a86b92470
Author: Joerg Schad jo...@mesosphere.io
Date:   Mon Jul 27 14:17:12 2015 +0200

Updated Authorization documentation to use /teardown endpoint.

With Mesos 0.23 the /shutdown endpoint has been deprecated in favor of
the /teardown endpoint. See MESOS-2697 for details.

Review: https://reviews.apache.org/r/36774

 Add a /teardown endpoint on master to teardown a framework
 --

 Key: MESOS-2697
 URL: https://issues.apache.org/jira/browse/MESOS-2697
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.23.0


 We plan to rename /shutdown endpoint to /teardown to be compatible with 
 the new API. /shutdown will be deprecated in 0.24.0 or later.





[jira] [Updated] (MESOS-3785) Use URI content modification time to trigger fetcher cache updates.

2015-11-09 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3785:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Use URI content modification time to trigger fetcher cache updates.
> ---
>
> Key: MESOS-3785
> URL: https://issues.apache.org/jira/browse/MESOS-3785
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Bernd Mathiske
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Instead of using checksums to trigger fetcher cache updates, we can for 
> starters use the content modification time (mtime), which is available for a 
> number of download protocols, e.g. HTTP and HDFS.
> Proposal: Instead of just fetching the content size, we fetch both size and 
> mtime together. As before, if there is no size, then caching fails and we 
> fall back on direct downloading to the sandbox. 
> Assuming a size is given, we compare the mtime from the fetch URI with the 
> mtime known to the cache. If it differs, we update the cache. (As a defensive 
> measure, a difference in size should also trigger an update.) 
> Not having an mtime available at the fetch URI is simply treated as a unique 
> valid mtime value that differs from all others. This means that when 
> initially there is no mtime, cache content remains valid until there is one. 
> Thereafter, a new lack of an mtime invalidates the cache once. In other 
> words: any change from no mtime to having one or back is the same as 
> encountering a new mtime.
> Note that this scheme does not require any new protobuf fields.
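
The proposed rule can be sketched as follows (hypothetical helper, not actual fetcher code). Using std::optional for a possibly-absent mtime, optional equality already gives the desired behavior: absent compares equal only to absent, so any transition between "no mtime" and "some mtime" triggers an update.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the proposed invalidation rule (not actual
// fetcher code). A changed size (defensive check), a changed mtime, or
// any transition between an absent and a present mtime triggers an update.
bool needsUpdate(const std::optional<int64_t>& cachedMtime,
                 const std::optional<int64_t>& fetchedMtime,
                 int64_t cachedSize,
                 int64_t fetchedSize) {
  if (cachedSize != fetchedSize) {
    return true;  // defensive: a size difference alone triggers an update
  }
  return cachedMtime != fetchedMtime;  // nullopt == nullopt stays valid
}
```

As described above, once the cache records an absent mtime, further fetches without an mtime leave the cache valid; only a change in either direction invalidates it.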





[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-10 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998699#comment-14998699
 ] 

Till Toenshoff commented on MESOS-3851:
---

I will be committing the workaround patch Tim has provided 
https://reviews.apache.org/r/40107/  (thanks a bunch [~tnachen]!) shortly after 
running a final check on it.

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> After https://reviews.apache.org/r/38900, i.e. updating CommandExecutor to 
> support rootfs, some tests show frequent crashes due to assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race between the executor receiving a 
> {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @   

[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.

2015-11-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3581:
--
Target Version/s:   (was: 0.26.0)

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.
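The distinction described above can be illustrated with a short sketch (illustrative only; the license text is truncated and the function is a made-up example). With default doxygen settings, a block opened with {{/**}} is treated as documentation, while a block opened with {{/*}} is ignored:

```cpp
/*
 * Licensed to the Apache Software Foundation (ASF) ...
 * (Plain C-style comment: doxygen ignores this block, so the license
 * header stays out of the generated documentation.)
 */

/**
 * Returns the sum of two integers.
 * (Javadoc-style comment: doxygen attaches this text to the function
 * below, which is what we want for real documentation blocks.)
 */
int add(int a, int b)
{
  return a + b;
}
```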



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3802) Clear the suppressed flag when deactive a framework

2015-11-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3802:
--
Target Version/s:   (was: 0.26.0)

> Clear the suppressed flag when deactive a framework
> ---
>
> Key: MESOS-3802
> URL: https://issues.apache.org/jira/browse/MESOS-3802
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> When a framework is deactivated, the suppressed flag is not cleared. As a 
> result the framework cannot get resources immediately after it is activated 
> again; we should clear this flag when deactivating the framework.
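The proposed behavior can be sketched as follows (hypothetical names, not the actual Mesos allocator code): deactivation resets the suppressed flag, so a framework that suppressed offers and then failed over receives offers again as soon as it reactivates.

```cpp
#include <string>
#include <unordered_map>

struct FrameworkState
{
  bool active = true;
  bool suppressed = false;
};

class Allocator
{
public:
  void suppressOffers(const std::string& id)
  {
    frameworks[id].suppressed = true;
  }

  void deactivateFramework(const std::string& id)
  {
    FrameworkState& framework = frameworks[id];
    framework.active = false;
    framework.suppressed = false;  // The proposed fix: reset the flag here.
  }

  void activateFramework(const std::string& id)
  {
    frameworks[id].active = true;
  }

  // A framework is considered for offers only if active and not suppressed.
  bool allocatable(const std::string& id)
  {
    const FrameworkState& framework = frameworks[id];
    return framework.active && !framework.suppressed;
  }

private:
  std::unordered_map<std::string, FrameworkState> frameworks;
};
```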





[jira] [Updated] (MESOS-3418) Factor out V1 API test helper functions

2015-11-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3418:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Factor out V1 API test helper functions
> ---
>
> Key: MESOS-3418
> URL: https://issues.apache.org/jira/browse/MESOS-3418
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joris Van Remoortere
>Assignee: Guangya Liu
>  Labels: beginner, mesosphere, newbie, v1_api
>
> We currently have some helper functionality for V1 API tests. This is copied 
> in a few test files.
> Factor this out into a common place once the API is stabilized.
> {code}
> // Helper class for using EXPECT_CALL since the Mesos scheduler API
>   // is callback based.
>   class Callbacks
>   {
>   public:
> MOCK_METHOD0(connected, void(void));
> MOCK_METHOD0(disconnected, void(void));
> MOCK_METHOD1(received, void(const std::queue<Event>&));
>   };
> {code}
> {code}
> // Enqueues all received events into a libprocess queue.
> // TODO(jmlvanre): Factor this common code out of tests into V1
> // helper.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> We can also update the helpers in {{/tests/mesos.hpp}} to support the V1 API. 
>  This would let us get rid of lines like:
> {code}
> v1::TaskInfo taskInfo = evolve(createTask(devolve(offer), "", 
> DEFAULT_EXECUTOR_ID));
> {code}
> In favor of:
> {code}
> v1::TaskInfo taskInfo = createTask(offer, "", DEFAULT_EXECUTOR_ID);
> {code}





[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-10 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998731#comment-14998731
 ] 

Till Toenshoff commented on MESOS-3851:
---

The following commit fixes the crash. We may still want to find the reason 
for the race condition, hence I will not close this ticket but will remove 
the target version (0.26.0) to unblock 0.26.0.

{noformat}
commit b6d4b28a4c9ca717ad8be5bbc27e40c005fc51ad
Author: Timothy Chen 
Date:   Tue Nov 10 15:46:17 2015 +0100

Removed unused checks in command executor.

Review: https://reviews.apache.org/r/40107
{noformat}

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Since https://reviews.apache.org/r/38900, i.e. updating the CommandExecutor 
> to support rootfs, some tests show frequent crashes due to assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
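The kind of tolerance such a race calls for can be sketched as follows (illustrative C++ only; the class and method names are hypothetical, not the actual command-executor code): rather than CHECK-failing when a task arrives before registration, the executor buffers it and drains the buffer once registered.

```cpp
#include <queue>
#include <string>

class Executor
{
public:
  // Called when the ExecutorRegisteredMessage finally arrives; any tasks
  // that raced ahead of registration are launched now.
  void registered()
  {
    isRegistered = true;
    while (!pending.empty()) {
      run(pending.front());
      pending.pop();
    }
  }

  // Called on RunTaskMessage; buffers the task if registration has not
  // happened yet instead of assuming registration already completed.
  void launchTask(const std::string& task)
  {
    if (!isRegistered) {
      pending.push(task);
      return;
    }
    run(task);
  }

  int launched = 0;  // Exposed for illustration.

private:
  void run(const std::string&) { ++launched; }

  bool isRegistered = false;
  std::queue<std::string> pending;
};
```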
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status 

[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-17 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3937:
--
Description: 
{noformat}
../configure
make check
sudo ./bin/mesos-tests.sh 
--gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
{noformat}

{noformat}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from DockerContainerizerTest
I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 4927ns
I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the db 
in 1605ns
I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (4)@10.0.2.15:50088
I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to STARTING
I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 1.016098ms
I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
STARTING
I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (5)@10.0.2.15:50088
I1117 15:08:09.282552 26400 master.cpp:367] Master 
59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
10.0.2.15:50088
I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/40AlT8/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
--zk_session_timeout="10secs"
I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing authenticated 
slaves to register
I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/40AlT8/credentials'
I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
authenticator
I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 1.075466ms
I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to VOTING
I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos group
I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
I1117 15:08:09.296187 26399 master.cpp:1379] Recovering from registrar
I1117 15:08:09.296717 26397 registrar.cpp:309] Recovering registrar
I1117 15:08:09.298842 26396 log.cpp:661] Attempting to start the writer
I1117 15:08:09.301563 26394 replica.cpp:496] Replica received implicit promise 
request from (6)@10.0.2.15:50088 with proposal 1
I1117 15:08:09.302561 26394 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 922719ns
I1117 15:08:09.302635 26394 replica.cpp:345] Persisted promised to 1
I1117 15:08:09.303755 26394 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1117 15:08:09.306161 26394 replica.cpp:391] Replica received explicit promise 
request from (7)@10.0.2.15:50088 for position 0 with proposal 2
I1117 15:08:09.306972 26394 

[jira] [Commented] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses

2015-11-05 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992311#comment-14992311
 ] 

Till Toenshoff commented on MESOS-3583:
---

[~anandmazumdar] shall we push this towards 0.27.0?

> Introduce sessions in HTTP Scheduler API Subscribed Responses
> -
>
> Key: MESOS-3583
> URL: https://issues.apache.org/jira/browse/MESOS-3583
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere, tech-debt
>
> Currently, the HTTP Scheduler API has no concept of sessions, i.e. a 
> {{SessionID}} or a {{TokenID}}. These would be useful in some failure 
> scenarios. As of now, if a framework fails over and then subscribes again 
> with the same {{FrameworkID}} and the {{force}} option set, the Mesos master 
> would subscribe it.
> If the previous instance of the framework/scheduler then sends a Call, 
> e.g. {{Call::KILL}}, with the same {{FrameworkID}} set, it would still be 
> accepted by the master, leading to erroneously killing a task.
> This is possible because we currently have no way of distinguishing 
> connections. It used to work in the previous driver implementation because 
> the master also performed a {{UPID}} check to verify that they matched and 
> only then allowed the call.
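The session idea can be sketched as follows (hypothetical names and flow, not the actual Mesos API): the master issues a fresh token on every SUBSCRIBE and rejects calls carrying a stale token, so a failed-over scheduler instance can no longer act on behalf of the framework.

```cpp
#include <string>
#include <unordered_map>

class Master
{
public:
  // Returns a new session token for the (re)subscribed framework,
  // invalidating any token issued to a previous instance.
  std::string subscribe(const std::string& frameworkId)
  {
    std::string token =
      frameworkId + "-session-" + std::to_string(++counter);
    sessions[frameworkId] = token;
    return token;
  }

  // A call is accepted only if its token matches the current session.
  bool acceptCall(const std::string& frameworkId, const std::string& token)
  {
    auto it = sessions.find(frameworkId);
    return it != sessions.end() && it->second == token;
  }

private:
  int counter = 0;
  std::unordered_map<std::string, std::string> sessions;
};
```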





[jira] [Updated] (MESOS-3460) Update Java Test Framework Support QuiesceOffer and reviveOffer

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3460:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Update Java Test Framework Support QuiesceOffer and reviveOffer
> ---
>
> Key: MESOS-3460
> URL: https://issues.apache.org/jira/browse/MESOS-3460
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
> Fix For: 0.26.0
>
>
> This is a follow-up to https://reviews.apache.org/r/38120/; we need to add 
> Java framework support for quiesceOffers.





[jira] [Updated] (MESOS-3461) Update Python Test Framework Support QuiesceOffer and reviveOffer

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3461:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Update Python Test Framework Support QuiesceOffer and reviveOffer
> -
>
> Key: MESOS-3461
> URL: https://issues.apache.org/jira/browse/MESOS-3461
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
> Fix For: 0.26.0
>
>
> This is a follow-up to https://reviews.apache.org/r/38121/; we need to add 
> Python framework support for quiesceOffers.





[jira] [Updated] (MESOS-3756) Generalized HTTP Authentication Modules

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3756:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Generalized HTTP Authentication Modules
> ---
>
> Key: MESOS-3756
> URL: https://issues.apache.org/jira/browse/MESOS-3756
> Project: Mesos
>  Issue Type: Task
>  Components: modules
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>
> Libprocess is going to factor out an authentication interface: MESOS-3231
> Here we propose that Mesos can provide implementations for this interface as 
> Mesos modules.





[jira] [Updated] (MESOS-3461) Update Python Test Framework Support QuiesceOffer and reviveOffer

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3461:
--
Fix Version/s: (was: 0.26.0)

> Update Python Test Framework Support QuiesceOffer and reviveOffer
> -
>
> Key: MESOS-3461
> URL: https://issues.apache.org/jira/browse/MESOS-3461
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
>
> This is a follow-up to https://reviews.apache.org/r/38121/; we need to add 
> Python framework support for quiesceOffers.





[jira] [Updated] (MESOS-3460) Update Java Test Framework Support QuiesceOffer and reviveOffer

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3460:
--
Fix Version/s: (was: 0.26.0)

> Update Java Test Framework Support QuiesceOffer and reviveOffer
> ---
>
> Key: MESOS-3460
> URL: https://issues.apache.org/jira/browse/MESOS-3460
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
>
> This is a follow-up to https://reviews.apache.org/r/38120/; we need to add 
> Java framework support for quiesceOffers.





[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave

2015-11-05 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992314#comment-14992314
 ] 

Till Toenshoff commented on MESOS-2295:
---

[~anandmazumdar]  Seems this issue got resolved via the subtasks, correct?

> Implement the Call endpoint on Slave
> 
>
> Key: MESOS-2295
> URL: https://issues.apache.org/jira/browse/MESOS-2295
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3810) Must be able to use NetworkInfo with mesos-executor

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3810:
--
Assignee: Spike Curtis

> Must be able to use NetworkInfo with mesos-executor
> ---
>
> Key: MESOS-3810
> URL: https://issues.apache.org/jira/browse/MESOS-3810
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
>Reporter: Spike Curtis
>Assignee: Spike Curtis
>Priority: Blocker
>
> ContainerInfo with included NetworkInfo can appear in one of two places 
> during a task launch: in the ExecutorInfo.container, or if using the 
> mesos-executor (aka command executor), within the TaskInfo.container.
> Mesos 0.25.0 correctly supports the former, but not the latter.  In that 
> case, the MesosContainerizer fails the task launch.





[jira] [Updated] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP

2015-11-05 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3558:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Make the CommandExecutor use the Executor Library speaking HTTP
> ---
>
> Key: MESOS-3558
> URL: https://issues.apache.org/jira/browse/MESOS-3558
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Instead of using the {{MesosExecutorDriver}} , we should make the 
> {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor 
> HTTP Library that we create in {{MESOS-3550}}. 
> This would act as a good validation of the {{HTTP API}} implementation.





[jira] [Updated] (MESOS-3688) Get Container Name information when launching a container task

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3688:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Get Container Name information when launching a container task
> --
>
> Key: MESOS-3688
> URL: https://issues.apache.org/jira/browse/MESOS-3688
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 0.24.1
>Reporter: Raffaele Di Fazio
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> We want to get the Docker Name (or Docker ID, or both) when launching a 
> container task with Mesos. The container name is generated by Mesos itself 
> (e.g. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps"), 
> and it would be nice to expose this information to frameworks so that it can 
> be used, for example by Marathon to pass it on to users via a REST API.
> To go a bit more in depth on our use case: we have files created by the 
> fluentd log driver that are named with the Docker Name or Docker ID (full or 
> short), and we need a mapping for the users of the REST API; thus the first 
> step is to make this information available from Mesos.





[jira] [Updated] (MESOS-3849) Corrected style in Makefiles

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3849:
--
Target Version/s: 0.26.0

> Corrected style in Makefiles
> 
>
> Key: MESOS-3849
> URL: https://issues.apache.org/jira/browse/MESOS-3849
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> The order of files in Makefiles is not strictly alphabetical.





[jira] [Updated] (MESOS-3835) Expose framework principal through state.json/state

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3835:
--
Affects Version/s: (was: 0.26.0)
Fix Version/s: (was: 0.26.0)

> Expose framework principal through state.json/state
> ---
>
> Key: MESOS-3835
> URL: https://issues.apache.org/jira/browse/MESOS-3835
> Project: Mesos
>  Issue Type: Wish
>  Components: master
>Reporter: Sargun Dhillon
>Assignee: Guangya Liu
>Priority: Trivial
>
> We would like to expose the framework principal through the master's 
> /state.json or /state endpoint. This is useful for debugging (from the 
> operator's perspective), and could be used for inspection while creating or 
> modifying ACLs.





[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously

2015-11-07 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995392#comment-14995392
 ] 

Till Toenshoff commented on MESOS-3736:
---

[~gilbert] I have bumped this to 0.27.0 as the RR seems to be WIP and we would 
love to cut 0.26.0 very soon.

> Support docker local store pull same image simultaneously 
> --
>
> Key: MESOS-3736
> URL: https://issues.apache.org/jira/browse/MESOS-3736
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> The current local store implements get() using the local puller. When the 
> same Docker image is requested multiple times concurrently, the local puller 
> untars the image tarball once per request and copies each result to the same 
> directory, which wastes time and computation. The local store/puller should 
> do this work only for the first request; simultaneous pull requests should 
> wait on the promised future and obtain the result once the first pull 
> finishes.
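The de-duplication pattern described above can be sketched with standard futures (illustrative only; the real store would use process::Future and the actual puller interface, and would also handle failed pulls and cleanup, which this sketch omits): the first request for an image starts the pull and publishes a shared future, and concurrent requests for the same image wait on that future instead of pulling again.

```cpp
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

class Store
{
public:
  explicit Store(std::function<std::string(const std::string&)> pull)
    : pull_(std::move(pull)) {}

  std::string get(const std::string& image)
  {
    std::shared_future<std::string> pulled;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = pulling_.find(image);
      if (it == pulling_.end()) {
        // First request: launch the pull and publish its future.
        it = pulling_.emplace(
            image,
            std::async(std::launch::async, pull_, image).share()).first;
      }
      pulled = it->second;
    }

    // Later requests for the same image block here instead of re-pulling.
    return pulled.get();
  }

private:
  std::function<std::string(const std::string&)> pull_;
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_future<std::string>> pulling_;
};
```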





[jira] [Updated] (MESOS-3736) Support docker local store pull same image simultaneously

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3736:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Support docker local store pull same image simultaneously 
> --
>
> Key: MESOS-3736
> URL: https://issues.apache.org/jira/browse/MESOS-3736
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> The current local store implements get() using the local puller. When the 
> same Docker image is requested multiple times concurrently, the local puller 
> untars the image tarball once per request and copies each result to the same 
> directory, which wastes time and computation. The local store/puller should 
> do this work only for the first request; simultaneous pull requests should 
> wait on the promised future and obtain the result once the first pull 
> finishes.





[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles

2015-11-07 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995399#comment-14995399
 ] 

Till Toenshoff commented on MESOS-3554:
---

[~jvanremoortere] from the status of this issue it seems this is not entirely 
resolved - shall we bump it up to 0.27.0, so that this does not block 0.26.0?

> Allocator changes trigger large re-compiles
> ---
>
> Key: MESOS-3554
> URL: https://issues.apache.org/jira/browse/MESOS-3554
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> Due to the templatized nature of the allocator, even small changes trigger 
> large recompiles of the code-base. This makes iterating on changes expensive 
> for developers.





[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3838:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may have 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> But currently {{Master::Http::teardown()}} keeps its authorization logic 
> inline; it would be better to put the authorization logic for teardown into 
> a common function {{authorizeTeardown()}}.
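The refactoring can be sketched as follows (types and names are illustrative, not the actual Mesos signatures): the inline check moves into one helper, matching the existing authorizeTask/authorizeFramework pattern, so every caller and future ACL change shares a single code path.

```cpp
#include <set>
#include <string>

// Hypothetical ACL container; the real one is the protobuf-based ACLs.
struct Acls
{
  // Principals allowed to tear down frameworks.
  std::set<std::string> teardownPrincipals;
};

// The proposed common function: one place to evolve teardown ACL logic.
bool authorizeTeardown(const std::string& principal, const Acls& acls)
{
  return acls.teardownPrincipals.count(principal) > 0;
}

// Master::Http::teardown() would then reduce to a call site such as:
//   if (!authorizeTeardown(request.principal, acls)) {
//     return Forbidden();
//   }
```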





[jira] [Updated] (MESOS-3802) Clear the suppressed flag when deactive a framework

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3802:
--
Fix Version/s: (was: 0.26.0)

> Clear the suppressed flag when deactive a framework
> ---
>
> Key: MESOS-3802
> URL: https://issues.apache.org/jira/browse/MESOS-3802
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> When a framework is deactivated, the suppressed flag is not cleared. As a 
> result the framework cannot get resources immediately after it is activated 
> again; we should clear this flag when deactivating the framework.





[jira] [Updated] (MESOS-3035) As a Developer I would like a standard way to run a Subprocess in libprocess

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3035:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> As a Developer I would like a standard way to run a Subprocess in libprocess
> 
>
> Key: MESOS-3035
> URL: https://issues.apache.org/jira/browse/MESOS-3035
> Project: Mesos
>  Issue Type: Story
>  Components: libprocess
>Reporter: Marco Massenzio
>Assignee: Marco Massenzio
>
> As part of MESOS-2830 and MESOS-2902 I have been researching the ability to 
> run a {{Subprocess}} and capture the {{stdout / stderr}} along with the exit 
> status code.
> {{process::subprocess()}} offers much of the functionality, but in a way that 
> still requires a lot of handiwork on the developer's part; we would like to 
> further abstract away the ability to just pass a string, an optional set of 
> command-line arguments and then collect the output of the command (bonus: 
> without blocking).
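The convenience interface described above could look like the following sketch (a hypothetical wrapper, not the libprocess API; a real implementation would build on {{process::subprocess()}} and avoid blocking, while this portable popen()-based version only illustrates the desired shape of "command in, output plus exit status out"):

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

// Runs a shell command and returns its stdout together with the raw
// wait status from pclose() (0 on success; decode with WEXITSTATUS).
std::pair<std::string, int> runCommand(const std::string& command)
{
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    throw std::runtime_error("Failed to run: " + command);
  }

  std::string output;
  std::array<char, 256> buffer;
  while (fgets(buffer.data(), buffer.size(), pipe) != nullptr) {
    output += buffer.data();
  }

  int status = pclose(pipe);
  return {output, status};
}
```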



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3835) Expose framework principal through state.json/state

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3835:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Expose framework principal through state.json/state
> ---
>
> Key: MESOS-3835
> URL: https://issues.apache.org/jira/browse/MESOS-3835
> Project: Mesos
>  Issue Type: Wish
>  Components: master
>Reporter: Sargun Dhillon
>Assignee: Guangya Liu
>Priority: Trivial
>
> We would like to expose the framework principal through the master's 
> /state.json or /state endpoint. This is useful for debugging (from the 
> operator's perspective), and could be used for inspection while creating or 
> modifying ACLs.





[jira] [Updated] (MESOS-3841) Master HTTP API support to get the leader

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3841:
--
Fix Version/s: (was: 0.26.0)

> Master HTTP API support to get the leader
> -
>
> Key: MESOS-3841
> URL: https://issues.apache.org/jira/browse/MESOS-3841
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Cosmin Lehene
>
> There's currently no good way to query the current leader of the master ensemble.
> Workarounds exist, such as getting the leader from {{/state.json}} (and parsing 
> it from leader@ip) or grepping it from {{master/redirect}}. 
> The scheduler API does an HTTP redirect, but that requires an HTTP POST 
> coming from a framework as well:
> {{POST /api/v1/scheduler  HTTP/1.1}}
> There should be a lightweight API call to get the current master. 
> This could be part of a more granular representation (REST) of the current 
> state.json.





[jira] [Updated] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2077:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
> -
>
> Key: MESOS-2077
> URL: https://issues.apache.org/jira/browse/MESOS-2077
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>  Labels: twitter
>
> For maintenance, sometimes operators will force the drain of a slave (via 
> SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary 
> (e.g. bad hardware).
> To eliminate alerting noise, we'd like to add a 'Reason' that expresses the 
> forced drain of the slave, so that these are not considered to be a generic 
> slave removal TASK_LOST.





[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown

2015-11-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3769:
--
Fix Version/s: (was: 0.26.0)

> Agent logs are misleading during agent shutdown
> ---
>
> Key: MESOS-3769
> URL: https://issues.apache.org/jira/browse/MESOS-3769
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: newbie
>
> When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test I 
> spotted the following logs:
> {noformat}
> I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received 
> status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for 
> task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update 
> TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of 
> framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to 
> master@172.18.6.110:62507
> I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting 
> down
> I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
> I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> {noformat}
> It looks like {{Slave::shutdown()}} uses wrong assumptions about possible 
> execution paths.





[jira] [Commented] (MESOS-3420) Resolve shutdown semantics for Machine/Down

2015-11-06 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993475#comment-14993475
 ] 

Till Toenshoff commented on MESOS-3420:
---

[~klaus1982] we are preparing to tag 0.26.0 - shall we push this one to 0.27.0?

> Resolve shutdown semantics for Machine/Down
> ---
>
> Key: MESOS-3420
> URL: https://issues.apache.org/jira/browse/MESOS-3420
> Project: Mesos
>  Issue Type: Task
>Reporter: Joris Van Remoortere
>Assignee: Klaus Ma
>  Labels: maintenance, mesosphere
>
> When an operator uses the {{machine/down}} endpoint, the master sends a 
> shutdown message to the agent.
> We need to discuss and resolve the semantics we want regarding how 
> operators and frameworks learn when their tasks are terminated.
> One option is to explicitly remove the agent from the master which will send 
> the {{TASK_LOST}} updates and {{SlaveLostMessage}} directly from the master. 
> The concern around this is that during a network partition, or if the agent 
> was down at the time, that these tasks could still be running.
> This is a general problem related to task life-times being dissociated from 
> the life-time of the agent.





[jira] [Commented] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-11-06 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993480#comment-14993480
 ] 

Till Toenshoff commented on MESOS-3339:
---

[~anandmazumdar] we are preparing to tag the 0.26.0 release. The posted RR 
seems to be work-in-progress. Shall we push this one to 0.27.0?

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, our testing infrastructure does not have a mechanism for 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> filter out events of a particular type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> This helper code is duplicated in at least two places currently, Scheduler 
> Library/Maintenance Primitives tests. 
> - The solution can be as trivial as moving this helper function to a common 
> test-header.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.
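Whichever route is taken, the heart of {{DROP_HTTP_CALLS}} is a drop-by-type pass over the received event queue. A dependency-free sketch of that filtering step follows; the enum and {{Event}} struct are stand-ins, not the real v1 scheduler protobufs or gmock types:

```cpp
#include <queue>

// Stand-in for the v1 scheduler Event type -- an assumption kept minimal
// for this sketch.
enum class EventType { SUBSCRIBED, OFFERS, HEARTBEAT, UPDATE };

struct Event
{
  EventType type;
};

// Move every event whose type differs from `dropped` into `out`,
// discarding the rest -- the drop-by-type behavior that DROP_HTTP_CALLS
// would generalize.
void filterEvents(
    std::queue<Event> events, EventType dropped, std::queue<Event>* out)
{
  while (!events.empty()) {
    if (events.front().type != dropped) {
      out->push(events.front());
    }
    events.pop();
  }
}
```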





[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2015-11-06 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993578#comment-14993578
 ] 

Till Toenshoff commented on MESOS-3832:
---

[~drexin] are you working on this one? We are preparing to tag 0.26.0 - 
shall we push this one to 0.27.0?

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}
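Schematically, the probable fix reduces to an early leader check in the request handler. The sketch below uses hypothetical, simplified types; the real code would reuse the existing {{Master::Http::redirect}} method and libprocess HTTP responses:

```cpp
#include <string>

// Hypothetical, simplified response type -- not the real
// process::http::Response from libprocess.
struct Response
{
  int code;              // HTTP status code
  std::string location;  // "Location" header target, if any
};

// Sketch of the handler guard: if this master is not leading, answer with
// "307 Temporary Redirect" pointing at the leader instead of processing
// the scheduler call.
Response handleSchedulerCall(bool isLeading, const std::string& leaderUrl)
{
  if (!isLeading) {
    return Response{307, leaderUrl};
  }
  return Response{200, ""};  // fall through to normal request processing
}
```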





[jira] [Commented] (MESOS-3405) Add JSON::protobuf for google::protobuf::RepeatedPtrField.

2015-11-06 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993583#comment-14993583
 ] 

Till Toenshoff commented on MESOS-3405:
---

[~mcypark] seems the RRs are ready to commit, no? We would love to cut the 
0.26.0 release and hence need to get this committed or pushed to 0.27.0.

> Add JSON::protobuf for google::protobuf::RepeatedPtrField.
> --
>
> Key: MESOS-3405
> URL: https://issues.apache.org/jira/browse/MESOS-3405
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Michael Park
>Assignee: Klaus Ma
>
> Currently, {{stout/protobuf.hpp}} provides a {{JSON::Protobuf}} utility which 
> converts a {{google::protobuf::Message}} into a {{JSON::Object}}.
> We should add the support for {{google::protobuf::RepeatedPtrField}} by 
> introducing overloaded functions.
> {code}
> namespace JSON {
>   Object protobuf(const google::protobuf::Message& message)
>   {
> Object object;
> /* Move the body of JSON::Protobuf constructor here. */
> return object;
>   }
>   template <typename T>
>   Array protobuf(const google::protobuf::RepeatedPtrField<T>& repeated)
>   {
> static_assert(std::is_convertible<T*, google::protobuf::Message*>::value,
>   "T must be a google::protobuf::Message");
> JSON::Array array;
> array.values.reserve(repeated.size());
> foreach (const T& elem, repeated) {
>   array.values.push_back(JSON::Protobuf(elem));
> }
> return array;
>   }
> }
> {code}
> The new {{RepeatedPtrField}} version can be used in at least the following 
> places:
> * {{src/common/http.cpp}}
> * {{src/master/http.cpp}}
> * {{src/slave/containerizer/mesos/containerizer.cpp}}
> * {{src/tests/reservation_endpoints_tests.cpp}}
> * {{src/tests/resources_tests.cpp}}: {{ResourcesTest.ParsingFromJSON}}
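The {{static_assert}} guard in the proposed overload relies on pointer convertibility to require that {{T}} is a protobuf message. A self-contained illustration of that pattern with stand-in types (not the real protobuf or stout APIs, which this sketch only imitates):

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Stand-ins for google::protobuf::Message and JSON::Array, used only to
// keep this sketch self-contained.
struct Message
{
  virtual ~Message() {}
  virtual std::string dump() const = 0;
};

struct Array
{
  std::vector<std::string> values;
};

// Same guard pattern as the proposed overload: reject T at compile time
// unless T* converts to Message*, i.e. T derives from Message.
template <typename T>
Array toArray(const std::vector<T>& repeated)
{
  static_assert(std::is_convertible<T*, Message*>::value,
                "T must be a Message");

  Array array;
  array.values.reserve(repeated.size());
  for (const T& elem : repeated) {
    array.values.push_back(elem.dump());
  }
  return array;
}

struct Task : Message
{
  std::string dump() const override { return "{}"; }
};
```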





[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation

2015-11-06 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993587#comment-14993587
 ] 

Till Toenshoff commented on MESOS-3063:
---

[~klaus1982] shall we push this towards 0.27.0?

> Add an example framework using dynamic reservation
> --
>
> Key: MESOS-3063
> URL: https://issues.apache.org/jira/browse/MESOS-3063
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Klaus Ma
>  Labels: mesosphere, persistent-volumes
>
> An example framework using dynamic reservation should be added to
> # test dynamic reservations further, and
> # to be used as a reference for those who want to use the dynamic reservation 
> feature.





[jira] [Updated] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.

2015-11-06 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2077:
--
Affects Version/s: (was: 0.26.0)
Fix Version/s: (was: 0.26.0)

> Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
> -
>
> Key: MESOS-2077
> URL: https://issues.apache.org/jira/browse/MESOS-2077
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>  Labels: twitter
>
> For maintenance, sometimes operators will force the drain of a slave (via 
> SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary 
> (e.g. bad hardware).
> To eliminate alerting noise, we'd like to add a 'Reason' that expresses the 
> forced drain of the slave, so that these are not considered to be a generic 
> slave removal TASK_LOST.





[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function

2015-11-06 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3838:
--
Affects Version/s: (was: 0.26.0)
Fix Version/s: (was: 0.26.0)

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may gain 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> Currently, however, {{Master::Http::teardown()}} carries its authorization 
> logic inline; it would be better to move the authorization logic for 
> teardown into a common function {{authorizeTeardown()}}.





[jira] [Commented] (MESOS-3728) Libprocess: Flaky behavior on test suite when finalizing.

2015-10-14 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956605#comment-14956605
 ] 

Till Toenshoff commented on MESOS-3728:
---

Just got a slightly different trace:

{noformat}
E1014 11:54:17.246136 4820992 process.cpp:1914] Failed to shutdown socket with 
fd 15: Socket is not connected
Assertion failed: (ec == 0), function unlock, file 
/BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp,
 line 45.
E1014 11:54:17.246408 4820992 process.cpp:1914] Failed to shutdown socket with 
fd 17: Socket is not connected
*** Aborted at 1444816457 (unix time) try "date -d @1444816457" if you are 
using GNU date ***
PC: @ 0x7fff94e370ae __pthread_kill
*** SIGABRT (@0x7fff94e370ae) received by PID 12800 (TID 0x7fff7a208000) stack 
trace: ***
@ 0x7fff9410d52a _sigtramp
@0x1011587f0 
_ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E_clES6_
@ 0x7fff9c71237b abort
@ 0x7fff9c6d99c4 __assert_rtn
@ 0x7fff8a27afc8 std::__1::mutex::unlock()
@0x1015329a9 
_ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_
@0x101532988 
_ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_
@0x101532a40 Synchronized<>::~Synchronized()
@0x1014f90f5 Synchronized<>::~Synchronized()
@0x10150d45c Gate::empty()
@0x1014f02cc process::ProcessManager::wait()
@0x1014f42b6 process::wait()
@0x100ed67ee process::wait()
@0x1014e9c82 process::ProcessManager::~ProcessManager()
@0x1014da875 process::ProcessManager::~ProcessManager()
@0x1014da848 process::finalize()
@0x100f3f72e main
@ 0x7fff9ba305ad start
make[5]: *** [check-local] Abort trap: 6
make[4]: *** [check-am] Error 2
make[3]: *** [check-recursive] Error 1
make[2]: *** [check-recursive] Error 1
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}

> Libprocess: Flaky behavior on test suite when finalizing.
> -
>
> Key: MESOS-3728
> URL: https://issues.apache.org/jira/browse/MESOS-3728
> Project: Mesos
>  Issue Type: Bug
> Environment: OS 10.11.1 Beta (15B30a),
> Apple LLVM version 7.0.0 (clang-700.0.72)
>Reporter: Till Toenshoff
>
> The issue manifests in the following stacktrace. Triggering the issue is not 
> too hard on my machine - fails in more than 10% of all attempts.
> {noformat}
> [--] Global test environment tear-down
> [==] 148 tests from 22 test cases ran. (1323 ms total)
> [  PASSED  ] 148 tests.
>   YOU HAVE 2 DISABLED TESTS
> Assertion failed: (ec == 0), function unlock, file 
> /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp,
>  line 45.
> *** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are 
> using GNU date ***
> PC: @ 0x7fff94e370ae __pthread_kill
> *** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) 
> stack trace: ***
> @ 0x7fff9410d52a _sigtramp
> @ 0x701034d8 (unknown)
> @ 0x7fff9c71237b abort
> @ 0x7fff9c6d99c4 __assert_rtn
> @ 0x7fff8a27afc8 std::__1::mutex::unlock()
> @0x102b109a9 
> _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_
> @0x102b10988 
> _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_
> @0x102b10a40 Synchronized<>::~Synchronized()
> @0x102ad70f5 Synchronized<>::~Synchronized()
> @0x102aeb45c Gate::empty()
> @0x102ace2cc process::ProcessManager::wait()
> @0x102ad22b6 process::wait()
> @0x1024b47ee process::wait()
> @0x1029b56a6 process::http::Connection::Data::~Data()
> @0x1029b55d5 process::http::Connection::Data::~Data()
> @0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared()
> @ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared()
> @0x1024ba50f std::__1::shared_ptr<>::~shared_ptr()
> @0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr()
> @0x1024ba4b5 process::http::Connection::~Connection()
> @0x1024a2cc5 process::http::Connection::~Connection()
> @0x10297b81d 
> _ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv
> @0x10297b6fd 
> _ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE
> @0x10297d8be 
> 

[jira] [Updated] (MESOS-3728) Libprocess: Flaky behavior on test suite when finalizing.

2015-10-14 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3728:
--
Summary: Libprocess: Flaky behavior on test suite when finalizing.  (was: 
Libprocess: Flaky behavior on test suite is finalizing.)

> Libprocess: Flaky behavior on test suite when finalizing.
> -
>
> Key: MESOS-3728
> URL: https://issues.apache.org/jira/browse/MESOS-3728
> Project: Mesos
>  Issue Type: Bug
> Environment: OS 10.11.1 Beta (15B30a),
> Apple LLVM version 7.0.0 (clang-700.0.72)
>Reporter: Till Toenshoff
>
> The issue manifests in the following stacktrace. Triggering the issue is not 
> too hard on my machine - fails in more than 10% of all attempts.
> {noformat}
> [--] Global test environment tear-down
> [==] 148 tests from 22 test cases ran. (1323 ms total)
> [  PASSED  ] 148 tests.
>   YOU HAVE 2 DISABLED TESTS
> Assertion failed: (ec == 0), function unlock, file 
> /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp,
>  line 45.
> *** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are 
> using GNU date ***
> PC: @ 0x7fff94e370ae __pthread_kill
> *** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) 
> stack trace: ***
> @ 0x7fff9410d52a _sigtramp
> @ 0x701034d8 (unknown)
> @ 0x7fff9c71237b abort
> @ 0x7fff9c6d99c4 __assert_rtn
> @ 0x7fff8a27afc8 std::__1::mutex::unlock()
> @0x102b109a9 
> _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_
> @0x102b10988 
> _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_
> @0x102b10a40 Synchronized<>::~Synchronized()
> @0x102ad70f5 Synchronized<>::~Synchronized()
> @0x102aeb45c Gate::empty()
> @0x102ace2cc process::ProcessManager::wait()
> @0x102ad22b6 process::wait()
> @0x1024b47ee process::wait()
> @0x1029b56a6 process::http::Connection::Data::~Data()
> @0x1029b55d5 process::http::Connection::Data::~Data()
> @0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared()
> @ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared()
> @0x1024ba50f std::__1::shared_ptr<>::~shared_ptr()
> @0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr()
> @0x1024ba4b5 process::http::Connection::~Connection()
> @0x1024a2cc5 process::http::Connection::~Connection()
> @0x10297b81d 
> _ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv
> @0x10297b6fd 
> _ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE
> @0x10297d8be 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_2clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSI_FSF_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clEST_
> @0x10297d730 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchI7NothingNS3_20AsyncExecutorProcessERKZZNS3_4http8internal7requestERKNS7_7RequestEbENK3$_2clENS7_10ConnectionEEUlvE_PvSE_SH_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSM_FSJ_T1_T2_ET3_T4_EUlPNS3_11ProcessBaseEE_SX_EEEvDpOT_
> @0x10297d3fc 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingNS2_20AsyncExecutorProcessERKZZNS2_4http8internal7requestERKNS6_7RequestEbENK3$_2clENS6_10ConnectionEEUlvE_PvSD_SG_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSL_FSI_T1_T2_ET3_T4_EUlPNS2_11ProcessBaseEE_NS_9allocatorISX_EEFvSW_EEclEOSW_
> @0x102aec69f std::__1::function<>::operator()()
> @0x102acef4f process::ProcessBase::visit()
> @0x102b0f7de process::DispatchEvent::visit()
> @0x1023417d1 process::ProcessBase::serve()
> @0x102acbce1 process::ProcessManager::resume()
> @0x102ad6a4c 
> process::ProcessManager::init_threads()::$_1::operator()()
> make[5]: *** [check-local] Abort trap: 6
> make[4]: *** [check-am] Error 2
> make[3]: *** [check-recursive] Error 1
> make[2]: *** [check-recursive] Error 1
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Created] (MESOS-3728) Libprocess: Flaky behavior on test suite is finalizing.

2015-10-14 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-3728:
-

 Summary: Libprocess: Flaky behavior on test suite is finalizing.
 Key: MESOS-3728
 URL: https://issues.apache.org/jira/browse/MESOS-3728
 Project: Mesos
  Issue Type: Bug
 Environment: OS 10.11.1 Beta (15B30a),
Apple LLVM version 7.0.0 (clang-700.0.72)
Reporter: Till Toenshoff


The issue manifests in the following stacktrace. Triggering the issue is not 
too hard on my machine - fails in more than 10% of all attempts.

{noformat}
[--] Global test environment tear-down
[==] 148 tests from 22 test cases ran. (1323 ms total)
[  PASSED  ] 148 tests.

  YOU HAVE 2 DISABLED TESTS

Assertion failed: (ec == 0), function unlock, file 
/BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp,
 line 45.
*** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are 
using GNU date ***
PC: @ 0x7fff94e370ae __pthread_kill
*** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) stack 
trace: ***
@ 0x7fff9410d52a _sigtramp
@ 0x701034d8 (unknown)
@ 0x7fff9c71237b abort
@ 0x7fff9c6d99c4 __assert_rtn
@ 0x7fff8a27afc8 std::__1::mutex::unlock()
@0x102b109a9 
_ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_
@0x102b10988 
_ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_
@0x102b10a40 Synchronized<>::~Synchronized()
@0x102ad70f5 Synchronized<>::~Synchronized()
@0x102aeb45c Gate::empty()
@0x102ace2cc process::ProcessManager::wait()
@0x102ad22b6 process::wait()
@0x1024b47ee process::wait()
@0x1029b56a6 process::http::Connection::Data::~Data()
@0x1029b55d5 process::http::Connection::Data::~Data()
@0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared()
@ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared()
@0x1024ba50f std::__1::shared_ptr<>::~shared_ptr()
@0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr()
@0x1024ba4b5 process::http::Connection::~Connection()
@0x1024a2cc5 process::http::Connection::~Connection()
@0x10297b81d 
_ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv
@0x10297b6fd 
_ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE
@0x10297d8be 
_ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_2clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSI_FSF_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clEST_
@0x10297d730 
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchI7NothingNS3_20AsyncExecutorProcessERKZZNS3_4http8internal7requestERKNS7_7RequestEbENK3$_2clENS7_10ConnectionEEUlvE_PvSE_SH_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSM_FSJ_T1_T2_ET3_T4_EUlPNS3_11ProcessBaseEE_SX_EEEvDpOT_
@0x10297d3fc 
_ZNSt3__110__function6__funcIZN7process8dispatchI7NothingNS2_20AsyncExecutorProcessERKZZNS2_4http8internal7requestERKNS6_7RequestEbENK3$_2clENS6_10ConnectionEEUlvE_PvSD_SG_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSL_FSI_T1_T2_ET3_T4_EUlPNS2_11ProcessBaseEE_NS_9allocatorISX_EEFvSW_EEclEOSW_
@0x102aec69f std::__1::function<>::operator()()
@0x102acef4f process::ProcessBase::visit()
@0x102b0f7de process::DispatchEvent::visit()
@0x1023417d1 process::ProcessBase::serve()
@0x102acbce1 process::ProcessManager::resume()
@0x102ad6a4c 
process::ProcessManager::init_threads()::$_1::operator()()
make[5]: *** [check-local] Abort trap: 6
make[4]: *** [check-am] Error 2
make[3]: *** [check-recursive] Error 1
make[2]: *** [check-recursive] Error 1
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}





[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown

2015-10-20 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3769:
--
Shepherd: Till Toenshoff

> Agent logs are misleading during agent shutdown
> ---
>
> Key: MESOS-3769
> URL: https://issues.apache.org/jira/browse/MESOS-3769
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Priority: Minor
>
> When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test I 
> spotted the following logs:
> {noformat}
> I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received 
> status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for 
> task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update 
> TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of 
> framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to 
> master@172.18.6.110:62507
> I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting 
> down
> I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
> I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> {noformat}
> It looks like {{Slave::shutdown()}} uses wrong assumptions about possible 
> execution paths.





[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown

2015-10-20 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3769:
--
Labels: newbie  (was: )

Looks like all code-paths in {{void Slave::shutdown(const UPID& from, const 
string& message)}} should check for {{message.empty()}} before trying to 
log it.
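A dependency-free sketch of that guard (the helper name is hypothetical, not the actual {{Slave::shutdown()}} code): append the message portion only when one was supplied, which avoids log lines like "] ; unregistering and shutting down".

```cpp
#include <string>

// Hypothetical helper illustrating the fix: build the shutdown log line
// and only append the "because '...'" part when a message was given.
std::string shutdownLogLine(const std::string& message)
{
  std::string line = "Agent asked to shut down";
  if (!message.empty()) {
    line += " because '" + message + "'";
  }
  return line;
}
```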


> Agent logs are misleading during agent shutdown
> ---
>
> Key: MESOS-3769
> URL: https://issues.apache.org/jira/browse/MESOS-3769
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: newbie
>
> When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test I 
> spotted the following logs:
> {noformat}
> I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received 
> status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for 
> task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update 
> TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of 
> framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to 
> master@172.18.6.110:62507
> I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting 
> down
> I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
> I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 
> 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> {noformat}
> It looks like {{Slave::shutdown()}} uses wrong assumptions about possible 
> execution paths.





[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation

2015-10-20 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3113:
--
Shepherd: Till Toenshoff

> Add resource usage section to containerizer documentation
> -
>
> Key: MESOS-3113
> URL: https://issues.apache.org/jira/browse/MESOS-3113
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Niklas Quarfot Nielsen
>Assignee: Gilbert Song
>  Labels: docathon, documentaion, mesosphere
>
> Currently, the containerizer documentation doesn't touch upon the usage() API 
> and how to interpret the collected statistics.





[jira] [Commented] (MESOS-3030) Build failure on OS 10.11 using Xcode 7.

2015-10-09 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950168#comment-14950168
 ] 

Till Toenshoff commented on MESOS-3030:
---

The issue persists all the way to the OS X 10.11 release; I will propose a fix.

> Build failure on OS 10.11 using Xcode 7.
> 
>
> Key: MESOS-3030
> URL: https://issues.apache.org/jira/browse/MESOS-3030
> Project: Mesos
>  Issue Type: Bug
> Environment: OS 10.11 Beta (15A215h), Apple LLVM version 7.0.0 
> (clang-700.0.57.2)
>Reporter: Till Toenshoff
>
> When trying to build Mesos (recent master) on OS X El Capitan (public beta 1) 
> with Apple's clang distribution via Xcode 7 (beta 3), the following warnings 
> trigger build failures:
> h6.Boost: unused-local-typedef 
> {noformat}
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/tuple/detail/tuple_basic.hpp:228:31:
>  error: unused typedef 'cons_element' [-Werror,-Wunused-local-typedef]
>   typedef typename impl::type cons_element;
> {noformat}
> h6.CyrusSASL2: deprecated-declarations
> {noformat}
> distcc[57619] ERROR: compile 
> /Users/till/.ccache/tmp/authentica.stdout.lobomacpro2.fritz.box.48363.0QJikQ.ii
>  on localhost failed
> ../../src/authentication/cram_md5/authenticatee.cpp:75:7: error: 
> 'sasl_dispose' is deprecated: first deprecated in OS X 10.11 
> [-Werror,-Wdeprecated-declarations]
>   sasl_dispose();
>   ^
> /usr/include/sasl/sasl.h:746:13: note: 'sasl_dispose' has been explicitly 
> marked deprecated here
> extern void sasl_dispose(sasl_conn_t **pconn) 
> __attribute__((availability(macosx,introduced=10.0,deprecated=10.11)));
> ^
> {noformat}
> 
> A simple workaround is disabling those warnings for now;
> {noformat}
> export CXXFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations"
> export CCFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations"
> {noformat}





[jira] [Assigned] (MESOS-3030) Build failure on OS 10.11 using Xcode 7.

2015-10-12 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-3030:
-

Assignee: Till Toenshoff

> Build failure on OS 10.11 using Xcode 7.
> 
>
> Key: MESOS-3030
> URL: https://issues.apache.org/jira/browse/MESOS-3030
> Project: Mesos
>  Issue Type: Bug
> Environment: OS 10.11 Beta (15A215h), Apple LLVM version 7.0.0 
> (clang-700.0.57.2)
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>
> When trying to build Mesos (recent master) on OS X El Capitan (public beta 1) 
> with apple's clang distribution via Xcode 7 (beta 3) the following warnings 
> trigger build failures;
> h6.Boost: unused-local-typedef 
> {noformat}
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/tuple/detail/tuple_basic.hpp:228:31:
>  error: unused typedef 'cons_element' [-Werror,-Wunused-local-typedef]
>   typedef typename impl::type cons_element;
> {noformat}
> h6.CyrusSASL2: deprecated-declarations
> {noformat}
> distcc[57619] ERROR: compile 
> /Users/till/.ccache/tmp/authentica.stdout.lobomacpro2.fritz.box.48363.0QJikQ.ii
>  on localhost failed
> ../../src/authentication/cram_md5/authenticatee.cpp:75:7: error: 
> 'sasl_dispose' is deprecated: first deprecated in OS X 10.11 
> [-Werror,-Wdeprecated-declarations]
>   sasl_dispose();
>   ^
> /usr/include/sasl/sasl.h:746:13: note: 'sasl_dispose' has been explicitly 
> marked deprecated here
> extern void sasl_dispose(sasl_conn_t **pconn) 
> __attribute__((availability(macosx,introduced=10.0,deprecated=10.11)));
> ^
> {noformat}
> 
> A simple workaround is disabling those warnings for now;
> {noformat}
> export CXXFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations"
> export CCFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3600) unable to build with non-default protobuf

2015-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3600:
--
Shepherd: Till Toenshoff

> unable to build with non-default protobuf
> -
>
> Key: MESOS-3600
> URL: https://issues.apache.org/jira/browse/MESOS-3600
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: James Peach
>
> If I install a custom protobuf into {{/opt/protobuf}}, I should be able to 
> pass {{--with-protobuf=/opt/protobuf}} to configure the build to use it.
> On OS X, this fails:
> {code}
> ...
> checking google/protobuf/message.h usability... yes
> checking google/protobuf/message.h presence... yes
> checking for google/protobuf/message.h... yes
> checking for _init in -lprotobuf... no
> configure: error: cannot find protobuf
> ---
> You have requested the use of a non-bundled protobuf but no suitable
> protobuf could be found.
> You may want specify the location of protobuf by providing a prefix
> path via --with-protobuf=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {code}
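As a hedged sketch only (this is not the actual Mesos {{configure.ac}}): the message "checking for _init in -lprotobuf" is the classic output of an {{AC_CHECK_LIB}} probe for the {{_init}} symbol, which is not a linkable symbol on OS X. A probe that merely verifies the library links at the given prefix would look roughly like:

```m4
dnl Hypothetical sketch: probe -lprotobuf for `main` instead of `_init`,
dnl which only verifies that the library links at the configured prefix.
AC_CHECK_LIB([protobuf], [main], [],
             [AC_MSG_ERROR([cannot find protobuf])])
```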



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3608) optionally install test binaries

2015-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3608:
--
Shepherd: Till Toenshoff

> optionally install test binaries
> 
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Priority: Minor
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2947) Authorizer Module: Implementation, Integration & Tests

2015-07-08 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2947:
--
Sprint: Mesosphere Sprint 14

 Authorizer Module: Implementation, Integration & Tests
 --

 Key: MESOS-2947
 URL: https://issues.apache.org/jira/browse/MESOS-2947
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Alexander Rojas
  Labels: mesosphere, module, security

 h4.Motivation
 Provide an example authorizer module based on the {{LocalAuthorizer}} 
 implementation. Make sure that such an authorizer module can be fully unit- 
 and integration-tested within the Mesos test suite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2946) Authorizer Module: Interface design

2015-07-07 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613069#comment-14613069
 ] 

Till Toenshoff edited comment on MESOS-2946 at 7/7/15 8:27 PM:
---

h4.Status Quo
As the current design stands, {{Authorizer}} is indeed an interface, but its 
default implementation is declared in the same header. Moreover, if one decides 
to create an alternative implementation for authorization, Mesos needs to be 
recompiled and all the places where the authorizer gets instantiated need to be 
updated.

h4.Design
Under the modularize version, the MVP for the {{Authorizer}} interface will 
look like:

{code}
class Authorizer
{
public:
  static Try<Authorizer*> create(const std::string& name);

  virtual ~Authorizer() {}

  virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0;

  virtual process::Future<bool> authorize(
      const ACL::RegisterFramework& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::RunTask& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::ShutdownFramework& request) = 0;

protected:
  Authorizer() {}
};
{code}

Where {{Authorizer::create(const std::string&)}} is the factory function which 
will construct the default {{LocalAuthorizer}} if local is selected and will 
use the existing facilities within {{ModuleManager}} to load the appropriate 
module in any other case.

In order to allow the {{LocalAuthorizer}} to play nicely with the general 
modules design, it needs a default constructor. This constraint leads to the 
existence of {{Authorizer::initialize(const Option<ACLs>&)}}, which is needed to 
pass initialization parameters to the {{LocalAuthorizer}}. Note that all other 
authorizers will use the {{ModuleManager}} mechanisms to pass initialization 
parameters. This follows the pattern used in the {{Authenticator}} module. The 
method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we 
go to a modules-only implementation.

All other methods remain unchanged from the original {{Authorizer}} interface.


was (Author: arojas):
h4.Status Quo
As the current design stands, {{Authorizer}} is indeed an interface, but its 
default implementation is declared in the same header. Moreover, if one decides 
to create an alternative implementation for authorization, Mesos needs to be 
recompiled and all the places where the authorizer gets instantiated need to be 
updated.

h4.Design
Under the modularize version, the MVP for the {{Authorizer}} interface will 
look like:

{code}
class Authorizer
{
public:
  static Try<Authorizer*> create(const std::string& name);

  virtual ~Authorizer() {}

  virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0;

  virtual process::Future<bool> authorize(
      const ACL::RegisterFramework& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::RunTask& request) = 0;
  virtual process::Future<bool> authorize(
      const ACL::ShutdownFramework& request) = 0;

protected:
  Authorizer() {}
};
{code}

Where {{Authorizer::create(const std::string&)}} is the factory function which 
will construct the default {{LocalAuthorizer}} if local is selected and will 
use the existing facilities within {{ModuleManager}} to load the appropriate 
module in any other case.

In order to allow the {{LocalAuthorizer}} to play nicely with the general 
modules design, it needs a default constructor. This constraint leads to the 
existence of {{Authorizer::initialize(const Option<ACLs>&)}}, which is needed to 
pass initialization parameters to the {{LocalAuthorizer}}. Note that all other 
authorizers will use the {{ModuleManager}} mechanisms to pass initialization 
parameters. This follows the pattern used in the {{Authorizator}} module. The 
method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we 
go to a modules-only implementation.

All other methods remain unchanged from the original {{Authorizer}} interface.

 Authorizer Module: Interface design
 ---

 Key: MESOS-2946
 URL: https://issues.apache.org/jira/browse/MESOS-2946
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Till Toenshoff
  Labels: mesosphere, module, security

 h4.Motivation
 Design an interface covering authorizer modules while staying minimally 
 invasive in regards to changes to the existing {{LocalAuthorizer}} 
 implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-708) Static files missing Last-Modified HTTP headers

2015-07-09 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621394#comment-14621394
 ] 

Till Toenshoff commented on MESOS-708:
--

See MESOS-3026 for some serious problems of these patches.

 Static files missing Last-Modified HTTP headers
 -

 Key: MESOS-708
 URL: https://issues.apache.org/jira/browse/MESOS-708
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess, webui
Affects Versions: 0.13.0
Reporter: Ross Allen
Assignee: Alexander Rojas
  Labels: mesosphere

 Static assets served by the Mesos master don't return Last-Modified HTTP 
 headers. That means clients receive a 200 status code and re-download assets 
 on every page request even if the assets haven't changed. Because Angular JS 
 does most of the work, the downloading happens only when you navigate to 
 Mesos master in your browser or use the browser's refresh.
 Example header for mesos.css:
 HTTP/1.1 200 OK
 Date: Thu, 26 Sep 2013 17:18:52 GMT
 Content-Length: 1670
 Content-Type: text/css
 Clients sometimes use the Date header for the same effect as 
 Last-Modified, but the date is always the time of the response from the 
 server, i.e. it changes on every request and makes the assets look new every 
 time.
 The Last-Modified header should be added and should be the last modified 
 time of the file. On subsequent requests for the same files, the master 
 should return 304 responses with no content rather than 200 with the full 
 files. It could save clients a lot of download time since Mesos assets are 
 rather heavyweight.
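The conditional-request behavior described above can be sketched as follows. This is an illustrative sketch only, not the Mesos/libprocess implementation: a handler compares the client's {{If-Modified-Since}} header against the asset's modification time and answers 304 with an empty body when the asset is unchanged.

```python
from email.utils import formatdate, parsedate_to_datetime
from typing import Optional, Tuple


def respond(asset_mtime: float,
            if_modified_since: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a static asset request."""
    # Always advertise when the asset was last modified.
    last_modified = formatdate(asset_mtime, usegmt=True)
    headers = {"Last-Modified": last_modified}

    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since).timestamp()
        # HTTP dates have one-second resolution, so compare whole seconds.
        if int(asset_mtime) <= int(client_time):
            return 304, headers, b""  # Not Modified: no body re-sent.

    return 200, headers, b"...asset bytes..."


# First request carries no validator and gets the full asset ...
status, headers, _ = respond(1380216000, None)
print(status)  # 200
# ... the revalidation request echoes Last-Modified back and gets a 304.
status, _, body = respond(1380216000, headers["Last-Modified"])
print(status)  # 304
```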



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3026) ProcessTest.Cache fails and hangs

2015-07-09 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621389#comment-14621389
 ] 

Till Toenshoff commented on MESOS-3026:
---

commit dab0977d2c9649fd9a7235c82cfa5d944ca32214
Author: Till Toenshoff toensh...@me.com
Date:   Fri Jul 10 00:13:06 2015 +0200

Reverted commit for HTTP caching of static assets.

This reverts commit d0300e1a47d1ba5d6714957fc258ab125fd53ed1.

We identified several issues in this implementation and the most important
one is described by MESOS-3026.

 ProcessTest.Cache fails and hangs
 -

 Key: MESOS-3026
 URL: https://issues.apache.org/jira/browse/MESOS-3026
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
 Environment: ubuntu 15.04/ ubuntu 14.04.2
 clang-3.6 / gcc 4.8.2
Reporter: Joris Van Remoortere
Assignee: Alexander Rojas
Priority: Blocker
  Labels: libprocess, tests

 {code}
 [ RUN  ] ProcessTest.Cache
 ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure
 Value of: response.get().status
   Actual: 200 OK
 Expected: 304 Not Modified
 [  FAILED  ] ProcessTest.Cache (1 ms)
 {code}
 The tests then finish running, but the gtest framework fails to terminate and 
 uses 100% CPU.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials

2015-07-09 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621487#comment-14621487
 ] 

Till Toenshoff commented on MESOS-3024:
---

+1 for undesired coupling of master-provided credentials with authentication 
activation. Let's keep in mind that e.g. ticket-based authentication mechanisms 
do not require credentials.

+1 for getting rid of activating the authentication requirement via the 
authenticate and authenticate_slaves flags and instead using the ACLs.
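For illustration, ACLs of the kind suggested might look like the following JSON. The format is assumed from the Mesos authorization documentation of that era, and the principal names are hypothetical:

```json
{
  "register_frameworks": [
    {
      "principals": { "values": ["framework-principal"] },
      "roles": { "values": ["*"] }
    }
  ],
  "run_tasks": [
    {
      "principals": { "values": ["framework-principal"] },
      "users": { "values": ["nobody"] }
    }
  ]
}
```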


 HTTP endpoint authN is enabled merely by specifying --credentials
 -

 Key: MESOS-3024
 URL: https://issues.apache.org/jira/browse/MESOS-3024
 Project: Mesos
  Issue Type: Bug
  Components: master, security
Reporter: Adam B
  Labels: authentication, http, mesosphere

 If I set `--credentials` on the master, framework and slave authentication 
 are allowed, but not required. On the other hand, http authentication is now 
 required for authenticated endpoints (currently only `/shutdown`). That means 
 that I cannot enable framework or slave authentication without also enabling 
 http endpoint authentication. This is undesirable.
 Framework and slave authentication have separate flags (`--authenticate` and 
 `--authenticate_slaves`) to require authentication for each. It would be 
 great if there was also such a flag for framework authentication. Or maybe we 
 get rid of these flags altogether and rely on ACLs to determine which 
 unauthenticated principals are even allowed to authenticate for each 
 endpoint/action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-17 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009819#comment-15009819
 ] 

Till Toenshoff commented on MESOS-3937:
---

I have that test also failing (100%) on a VMware Fusion box; exact same OS, 
compiler, and Docker configuration as tested by Bernd -- but not the same image 
/ machine.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
>Reporter: Bernd Mathiske
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected 

[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011684#comment-15011684
 ] 

Till Toenshoff commented on MESOS-3937:
---

Yes, that one. Sorry for the confusion.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
> I1117 15:08:09.296187 26399 

[jira] [Comment Edited] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011644#comment-15011644
 ] 

Till Toenshoff edited comment on MESOS-3937 at 11/18/15 6:58 PM:
-

Tim had a great hint and that fixed the problem for me; the docker executor 
image was outdated.

A manual {noformat}docker pull tnachen/test-executor{noformat} fixed the issue 
for me. 

Seems the reason for my problems was outdated proto code within the image.


was (Author: tillt):
Tim had a great hint and that fixed the problem for me; the docker executor 
image was outdated.

A manual: {noformat}docker pull tnachen/docker-executor{noformat} fixed the 
issue for me. 

Seems the reason for my problems was outdated proto code within the image.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 

[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011644#comment-15011644
 ] 

Till Toenshoff commented on MESOS-3937:
---

Tim had a great hint and that fixed the problem for me; the docker executor 
image was outdated.

A manual {noformat}docker pull tnachen/docker-executor{noformat} fixed the 
issue for me. 

Seems the reason for my problems was outdated proto code within the image.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 

[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-17 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009945#comment-15009945
 ] 

Till Toenshoff commented on MESOS-3937:
---

Are you saying that the lack of a memory limit on the container side would 
cause this test to fail?

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
>Reporter: Bernd Mathiske
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
> I1117 15:08:09.296187 26399 master.cpp:1379] 

[jira] [Updated] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX

2015-08-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3316:
--
Assignee: Yan Xu

 provisioner_backend_tests.cpp breaks the build on OSX
 -

 Key: MESOS-3316
 URL: https://issues.apache.org/jira/browse/MESOS-3316
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Yan Xu
Priority: Blocker
  Labels: build-failure

 The test file makes an include of {{linux/fs.hpp}} which in turn includes 
 {{mntent.h}} which is only available in linux.
 Building in OSX leads to:
 {noformat}
 g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
 -DPACKAGE_VERSION=\"0.25.0\" -DPACKAGE_STRING=\"mesos\ 0.25.0\" 
 -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
 -DVERSION=\"0.25.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
 -I../../src   -Wall -Werror -DLIBDIR=\"/usr/local/lib\" 
 -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
 -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
 -I../../3rdparty/libprocess/include 
 -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
 -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
 -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
 -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
 -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
 -I../3rdparty/zookeeper-3.4.5/src/c/generated 
 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
 -DSOURCE_DIR=\"/Users/alexander/Documents/workspace/pmesos/build/..\" 
 -DBUILD_DIR=\"/Users/alexander/Documents/workspace/pmesos/build\" 
 -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include 
 -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include  
 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
 -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 
 -I/usr/include/apr-1.0  -D_THREAD_SAFE -pthread -g -O0 -std=c++11 
 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT 
 tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF 
 tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o 
 tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 
 'tests/containerizer/provisioner_backend_tests.cpp' || echo 
 '../../src/'`tests/containerizer/provisioner_backend_tests.cpp
 make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/event_call_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/no_executor_framework_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/persistent_volume_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
 In file included from 
 ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
 ../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
 #include <mntent.h>
  ^
 1 error generated.
 make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] 
 Error 1
 make[2]: *** [check-am] Error 2
 make[1]: *** [check] Error 2
 make: *** [check-recursive] Error 1
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1194) protobuf-JSON rendering doesnt validate

2015-09-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734530#comment-14734530
 ] 

Till Toenshoff commented on MESOS-1194:
---

[~Akanksha08] ping :)

> protobuf-JSON rendering doesnt validate
> ---
>
> Key: MESOS-1194
> URL: https://issues.apache.org/jira/browse/MESOS-1194
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Affects Versions: 0.19.0
>Reporter: Till Toenshoff
>Assignee: Akanksha Agrawal
>Priority: Minor
>  Labels: json, newbie, protobuf, stout
>
> When using JSON::Protobuf(Message&), the supplied protobuf is not checked for 
> being properly initialized, hence e.g. required fields could be missing.
> It would be desirable to have a feedback mechanism in place for this 
> constructor - maybe this would do:
> {noformat}
> if (!message.IsInitialized()) { 
>   std::cerr << "Protobuf not initialized: " << 
> message.InitializationErrorString() << std::endl;
>   abort();
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3370) Deprecate the external containerizer

2015-09-11 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740945#comment-14740945
 ] 

Till Toenshoff commented on MESOS-3370:
---

Yes, a vote would be the right approach here, but I am also +1 (with a 
small tear in my eye ;) ).

> Deprecate the external containerizer
> 
>
> Key: MESOS-3370
> URL: https://issues.apache.org/jira/browse/MESOS-3370
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>
> To our knowledge, no one is using the external containerizer, and we could 
> clean up code paths in the slave and containerizer interface (the dual 
> launch() signatures).
> In a deprecation cycle, we can move this code into a module (dependent on 
> containerizer modules landing) and from there, move it into its own repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process

2015-12-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046570#comment-15046570
 ] 

Till Toenshoff commented on MESOS-4065:
---

From your results, this conclusion seems sensible to me.

We should actually file a bug report in the ZooKeeper JIRA so it can be properly 
handled upstream 
(https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
Could you please take care of that [~jdef]? 

> slave FD for ZK tcp connection leaked to executor process
> -
>
> Key: MESOS-4065
> URL: https://issues.apache.org/jira/browse/MESOS-4065
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1, 0.25.0
>Reporter: James DeFelice
>  Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root  1432 99.3  0.0 202420 12928 ?Rsl  21:32  13:51 
> ./etcd-mesos-executor -log_dir=./
> root  1450  0.4  0.1  38332 28752 ?Sl   21:32   0:03 ./etcd 
> --data-dir=etcd_data --name=etcd-1449178273 
> --listen-peer-urls=http://10.0.0.45:1025 
> --initial-advertise-peer-urls=http://10.0.0.45:1025 
> --listen-client-urls=http://10.0.0.45:1026 
> --advertise-client-urls=http://10.0.0.45:1026 
> --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025
>  --initial-cluster-state=existing
> core  1651  0.0  0.0   6740   928 pts/0S+   21:46   0:00 grep 
> --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root  1124  0.2  0.1 900496 25736 ?Ssl  21:11   0:04 
> /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core  1658  0.0  0.0   6740   832 pts/0S+   21:46   0:00 grep 
> --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process

2015-12-08 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046588#comment-15046588
 ] 

Till Toenshoff commented on MESOS-4065:
---

A tool that has been rather useful for debugging such issues within Mesos: 
https://github.com/tillt/mesos/commit/d6982ece26121c599426e6b5c573e8d8afeff837


> slave FD for ZK tcp connection leaked to executor process
> -
>
> Key: MESOS-4065
> URL: https://issues.apache.org/jira/browse/MESOS-4065
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1, 0.25.0
>Reporter: James DeFelice
>  Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root  1432 99.3  0.0 202420 12928 ?Rsl  21:32  13:51 
> ./etcd-mesos-executor -log_dir=./
> root  1450  0.4  0.1  38332 28752 ?Sl   21:32   0:03 ./etcd 
> --data-dir=etcd_data --name=etcd-1449178273 
> --listen-peer-urls=http://10.0.0.45:1025 
> --initial-advertise-peer-urls=http://10.0.0.45:1025 
> --listen-client-urls=http://10.0.0.45:1026 
> --advertise-client-urls=http://10.0.0.45:1026 
> --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025
>  --initial-cluster-state=existing
> core  1651  0.0  0.0   6740   928 pts/0S+   21:46   0:00 grep 
> --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root  1124  0.2  0.1 900496 25736 ?Ssl  21:11   0:04 
> /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core  1658  0.0  0.0   6740   832 pts/0S+   21:46   0:00 grep 
> --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4025:
--
Component/s: test

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'd say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4025:
--
Labels: flaky flaky-test test  (was: test)

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'd say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky

2015-12-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3586:
--
Labels: flaky flaky-test  (was: )

> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and 
> CGROUPS_ROOT_SlaveRecovery are flaky
> 
>
> Key: MESOS-3586
> URL: https://issues.apache.org/jira/browse/MESOS-3586
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04, 3.13.0-32 generic
> Debian 8, gcc 4.9.2
>Reporter: Miguel Bernadin
>  Labels: flaky, flaky-test
>
> I am installing Mesos 0.24.0 on 4 servers which have very similar hardware 
> and software configurations. 
> After performing ../configure, make, and make check, some servers 
> completed successfully and others failed on test [ RUN  ] 
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics.
> Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0
> I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 
> 20151005-143735-2393768202-35106-27900-S0
> Registered executor on svdidac038.techlabs.accenture.com
> Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0
> Forked command at 38510
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> PERFORMED MAKE CHECK NODE-002
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0
> I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 
> 20151005-143857-2360213770-50427-26325-S0
> Registered executor on svdidac039.techlabs.accenture.com
> Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 37028
> ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure
> Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6
> 2015-10-05 
> 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky

2015-12-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3586:
--
Component/s: test

> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and 
> CGROUPS_ROOT_SlaveRecovery are flaky
> 
>
> Key: MESOS-3586
> URL: https://issues.apache.org/jira/browse/MESOS-3586
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04, 3.13.0-32 generic
> Debian 8, gcc 4.9.2
>Reporter: Miguel Bernadin
>  Labels: flaky, flaky-test
>
> I am installing Mesos 0.24.0 on 4 servers which have very similar hardware 
> and software configurations. 
> After performing ../configure, make, and make check, some servers 
> completed successfully and others failed on test [ RUN  ] 
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics.
> Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0
> I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 
> 20151005-143735-2393768202-35106-27900-S0
> Registered executor on svdidac038.techlabs.accenture.com
> Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0
> Forked command at 38510
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> PERFORMED MAKE CHECK NODE-002
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0
> I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 
> 20151005-143857-2360213770-50427-26325-S0
> Registered executor on svdidac039.techlabs.accenture.com
> Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 37028
> ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure
> Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6
> 2015-10-05 
> 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3608) optionally install test binaries

2015-12-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039700#comment-15039700
 ] 

Till Toenshoff commented on MESOS-3608:
---

Due to the 0.26.0 release process, things have already been much delayed - 
sorry for that. I will not be able to review this until Monday.

> optionally install test binaries
> 
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.





[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process

2015-12-04 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041767#comment-15041767
 ] 

Till Toenshoff commented on MESOS-4065:
---

What we see here is that two processes (slave + executor) both use the same 
fd (10u), which is likely a bug.

> slave FD for ZK tcp connection leaked to executor process
> -
>
> Key: MESOS-4065
> URL: https://issues.apache.org/jira/browse/MESOS-4065
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1, 0.25.0
>Reporter: James DeFelice
>  Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root  1432 99.3  0.0 202420 12928 ?Rsl  21:32  13:51 
> ./etcd-mesos-executor -log_dir=./
> root  1450  0.4  0.1  38332 28752 ?Sl   21:32   0:03 ./etcd 
> --data-dir=etcd_data --name=etcd-1449178273 
> --listen-peer-urls=http://10.0.0.45:1025 
> --initial-advertise-peer-urls=http://10.0.0.45:1025 
> --listen-client-urls=http://10.0.0.45:1026 
> --advertise-client-urls=http://10.0.0.45:1026 
> --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025
>  --initial-cluster-state=existing
> core  1651  0.0  0.0   6740   928 pts/0S+   21:46   0:00 grep 
> --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root  1124  0.2  0.1 900496 25736 ?Ssl  21:11   0:04 
> /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core  1658  0.0  0.0   6740   832 pts/0S+   21:46   0:00 grep 
> --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.





[jira] [Commented] (MESOS-4045) NumifyTest.HexNumberTest fails

2015-12-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037918#comment-15037918
 ] 

Till Toenshoff commented on MESOS-4045:
---

This appears to be a gcc vs. clang issue -- or maybe even a libstdc++ vs. 
libc++ problem.

Building on OS X with gcc works fine, but fails with clang.

> NumifyTest.HexNumberTest fails
> --
>
> Key: MESOS-4045
> URL: https://issues.apache.org/jira/browse/MESOS-4045
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
> Environment: Mac OS X 10.11.1
>Reporter: Michael Park
>
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from NumifyTest
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../../3rdparty/libprocess/3rdparty/stout/tests/numify_tests.cpp:44: 
> Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> [--] 1 test from NumifyTest (0 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (0 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] NumifyTest.HexNumberTest
> {noformat}





[jira] [Commented] (MESOS-4061) Flaky tests: docker containerizer tests on debian 8 VM

2015-12-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038556#comment-15038556
 ] 

Till Toenshoff commented on MESOS-4061:
---

Just for the fun of it - same test on VMware Fusion:

{noformat}
real0m5.887s
user0m0.028s
sys 0m0.020s
{noformat}


> Flaky tests: docker containerizer tests on debian 8 VM
> --
>
> Key: MESOS-4061
> URL: https://issues.apache.org/jira/browse/MESOS-4061
> Project: Mesos
>  Issue Type: Bug
> Environment: debian 8, vagrant, virtual box
>Reporter: Jojy Varghese
>
> Following tests were failing for 0.26 rc3:
> * DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping
> * DockerContainerizerTest.ROOT_DOCKER_Recover
> * DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer
> * DockerContainerizerTest.ROOT_DOCKER_Launch_Executor
> * DockerContainerizerTest.ROOT_DOCKER_Launch
> * DockerContainerizerTest.ROOT_DOCKER_Usage
> * DockerContainerizerTest.ROOT_DOCKER_Update
> * DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker
> Note that this is not a comprehensive list. 





[jira] [Commented] (MESOS-4061) Flaky tests: docker containerizer tests on debian 8 VM

2015-12-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038584#comment-15038584
 ] 

Till Toenshoff commented on MESOS-4061:
---

The more interesting test here would actually be re-running the same command 
(with a new name), as that would cut out the download of the busybox image.

> Flaky tests: docker containerizer tests on debian 8 VM
> --
>
> Key: MESOS-4061
> URL: https://issues.apache.org/jira/browse/MESOS-4061
> Project: Mesos
>  Issue Type: Bug
> Environment: debian 8, vagrant, virtual box
>Reporter: Jojy Varghese
>
> Following tests were failing for 0.26 rc3:
> * DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping
> * DockerContainerizerTest.ROOT_DOCKER_Recover
> * DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer
> * DockerContainerizerTest.ROOT_DOCKER_Launch_Executor
> * DockerContainerizerTest.ROOT_DOCKER_Launch
> * DockerContainerizerTest.ROOT_DOCKER_Usage
> * DockerContainerizerTest.ROOT_DOCKER_Update
> * DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker
> Note that this is not a comprehensive list. 





[jira] [Updated] (MESOS-3608) optionally install test binaries

2015-12-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3608:
--
Labels: mesosphere  (was: )

> optionally install test binaries
> 
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>  Labels: mesosphere
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.





[jira] [Updated] (MESOS-4012) Update documentation to reflect the addition of installable tests.

2015-12-07 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4012:
--
Labels: mesosphere  (was: )

> Update documentation to reflect the addition of installable tests.  
> 
>
> Key: MESOS-4012
> URL: https://issues.apache.org/jira/browse/MESOS-4012
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>  Labels: mesosphere
>
> We may want to add the needed steps for administrators to create and run the 
> test-suite on anything other than the build machine. 
> One possible location could be {{docs/gettings-started.md}} for validating 
> the pre-requisites as described in that document. 





[jira] [Created] (MESOS-4091) Mesos protobuf message definition ContainerInfo skipped an index.

2015-12-07 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-4091:
-

 Summary: Mesos protobuf message definition ContainerInfo skipped 
an index.
 Key: MESOS-4091
 URL: https://issues.apache.org/jira/browse/MESOS-4091
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff


Looking at {{include/mesos/mesos.proto}}:

{noformat}
/**
 * Describes a container configuration and allows extensible
 * configurations for different container implementations.
 */
message ContainerInfo {
  // All container implementation types.
  enum Type {
DOCKER = 1;
MESOS = 2;
  }

  message DockerInfo {
// The docker image that is going to be passed to the registry.
required string image = 1;

// Network options.
enum Network {
  HOST = 1;
  BRIDGE = 2;
  NONE = 3;
}

optional Network network = 2 [default = HOST];

message PortMapping {
  required uint32 host_port = 1;
  required uint32 container_port = 2;
  // Protocol to expose as (ie: tcp, udp).
  optional string protocol = 3;
}

repeated PortMapping port_mappings = 3;

optional bool privileged = 4 [default = false];

// Allowing arbitrary parameters to be passed to docker CLI.
// Note that anything passed to this field is not guaranteed
// to be supported moving forward, as we might move away from
// the docker CLI.
repeated Parameter parameters = 5;

// With this flag set to true, the docker containerizer will
// pull the docker image from the registry even if the image
// is already downloaded on the slave.
optional bool force_pull_image = 6;
  }

  message MesosInfo {
optional Image image = 1;
  }

  required Type type = 1;
  repeated Volume volumes = 2;
  optional string hostname = 4;

  // Only one of the following *Info messages should be set to match
  // the type.
  optional DockerInfo docker = 3;
  optional MesosInfo mesos = 5;

  // A list of network requests. A framework can request multiple IP addresses
  // for the container.
  repeated NetworkInfo network_infos = 7;
}
{noformat}

Seems we are missing index 6 here.

A quick history check revealed no intention to remove a former field 6 - hence 
this appears to be a bug. Checked via:
{noformat}
$ git log -L 1500,1515:include/mesos/mesos.proto
{noformat}
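If a gap like this ever turns out to be intentional, newer protoc releases (>= 3.0, which also accept this for proto2 syntax files) let the .proto file document it with {{reserved}}, so the tag can never be reused accidentally. A sketch of what that could look like, not a proposal for what mesos.proto must do:

```protobuf
message ContainerInfo {
  // ... existing fields as above ...

  // Document the skipped tag explicitly so future edits cannot
  // reuse it with an incompatible type.
  reserved 6;
}
```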






[jira] [Updated] (MESOS-4015) Expose task / executor health in master & slave state.json

2015-12-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4015:
--
Fix Version/s: (was: 0.27.0)
   0.26.0

> Expose task / executor health in master & slave state.json
> --
>
> Key: MESOS-4015
> URL: https://issues.apache.org/jira/browse/MESOS-4015
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.25.0
>Reporter: Sargun Dhillon
>Assignee: Artem Harutyunyan
>Priority: Trivial
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> Right now, if I specify a healthcheck for a task, the only way to get to it 
> is via the Task Status updates that come to the framework. Unfortunately, 
> this information isn't exposed in the state.json either in the slave or 
> master. It'd be ideal to have that information to enable tools like Mesos-DNS 
> to be health-aware.





[jira] [Updated] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4106:
--
Fix Version/s: (was: 0.27.0)
   0.26.0

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.26.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set, however, only some tasks get killed, 
> while other remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Commented] (MESOS-3686) General cleanup of documentation

2015-12-16 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060691#comment-15060691
 ] 

Till Toenshoff commented on MESOS-3686:
---

It seems this commit introduced the duplication: 
https://github.com/apache/mesos/commit/b29ec4f110483555a5e1a65ef25a7ecc13e31b7f

This is the fix: https://reviews.apache.org/r/41463/ 

> General cleanup of documentation
> 
>
> Key: MESOS-3686
> URL: https://issues.apache.org/jira/browse/MESOS-3686
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, general
>Reporter: Chris Elsmore
>  Labels: documentation
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Part of the MesosCon Europe 2015 Hackathon!
> Current documentation is inconsistent, and could do with a clean up:-
> * File names use a mix of hyphens and underscores,  some start with 'mesos' 
> some not.
> * A general clean up of broken links, and markdown tables etc.





[jira] [Commented] (MESOS-3686) General cleanup of documentation

2015-12-16 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060636#comment-15060636
 ] 

Till Toenshoff commented on MESOS-3686:
---

Seems we have duplicate documentation now:

https://github.com/apache/mesos/blob/master/docs/mesos-documentation-guide.md
https://github.com/apache/mesos/blob/master/docs/documentation-guide.md


> General cleanup of documentation
> 
>
> Key: MESOS-3686
> URL: https://issues.apache.org/jira/browse/MESOS-3686
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, general
>Reporter: Chris Elsmore
>  Labels: documentation
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Part of the MesosCon Europe 2015 Hackathon!
> Current documentation is inconsistent, and could do with a clean up:-
> * File names use a mix of hyphens and underscores,  some start with 'mesos' 
> some not.
> * A general clean up of broken links, and markdown tables etc.





[jira] [Assigned] (MESOS-3742) Site needs to get updated as it still lists MesosCon Europe as an upcoming event

2015-12-16 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-3742:
-

Assignee: Till Toenshoff

> Site needs to get updated as it still lists MesosCon Europe as an upcoming 
> event
> 
>
> Key: MESOS-3742
> URL: https://issues.apache.org/jira/browse/MESOS-3742
> Project: Mesos
>  Issue Type: Bug
>  Components: project website
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>
> The Apache website does need to get updated as it still lists MesosCon Europe 
> as an upcoming event
> Even the registration page 
> (http://events.linuxfoundation.org/events/mesoscon-europe/attend/register) 
> still seems to accept registrations - something we might want to get fixed 
> upstream.





[jira] [Updated] (MESOS-3844) getting started documentation has flaws, corrections suggested (http://mesos.apache.org/gettingstarted/)

2015-12-16 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3844:
--
Shepherd: Till Toenshoff  (was: Benjamin Hindman)

> getting started documentation has flaws, corrections suggested 
> (http://mesos.apache.org/gettingstarted/)
> 
>
> Key: MESOS-3844
> URL: https://issues.apache.org/jira/browse/MESOS-3844
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, project website, test
>Affects Versions: 0.25.0
> Environment: CentOS 7 AWS Linux image: AWS EC2 MarketPlace CentOS 7 
> (x86_64) with Updates HVM (a t2.medium instance)
>Reporter: Manne Laukkanen
>Assignee: Kevin Klues
>Priority: Trivial
>  Labels: build, documentation, mesosphere
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Getting started documentation, while having great virtues, has room for 
> improvement:
> 1) Documentation is illogical and wrong for this part:
>  " $ wget http://www.apache.org/dist/mesos/0.25.0/mesos-0.25.0.tar.gz
>  $ tar -zxf mesos-0.25.0.tar.gz" ...then, later:
> "# Install a few utility tools
> $ sudo yum install -y tar wget
> ..obviously using tar and wget is not possible before installing them.
> 2) Although vi is fine for many, utility tools having:
> sudo yum install -y tar wget nano
> might make editing e.g. the WANDISCO -repo file way easier for newbies.
> 3) Advice to launch Mesos with localhost option ( " ./bin/mesos-master.sh 
> --ip=127.0.0.1 --work_dir=/var/lib/mesos " ) will lead into a state where 
> Mesos UI can not be reached in port :5050 in a production environment e.g. in 
> AWS EC2. Mentioning this would help, not hinder deployment.





[jira] [Updated] (MESOS-4118) Update Getting Started for Mac OS X El Capitan

2015-12-16 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4118:
--
Shepherd: Till Toenshoff

> Update Getting Started for Mac OS X El Capitan
> --
>
> Key: MESOS-4118
> URL: https://issues.apache.org/jira/browse/MESOS-4118
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
> Environment: Mac OS X
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation, mesosphere
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> This ticket pertains to the Getting Started guide on the apache mesos website
> The current instructions for installing on Mac OS X only include instructions 
> for Yosemite.  The instructions to build for El Capitan are identical except 
> in the case of upgrading from Yosemite to El Capitan.  To build after an 
> upgrade requires a trivial (but important) step which is non-obvious -- you 
> have to rerun 'xcode-select --install' after you complete the upgrade.
> Let's change the heading for installing on Mac OS X to say:
> Mac OS X Yosemite & El Capitan
> and then add a comment at the bottom of the section to point out that a rerun 
> of 'xcode-select --install' is necessary after an upgrade from Yosemite to El 
> Capitan.





[jira] [Updated] (MESOS-4134) Add note about tunneling in site-docker README

2015-12-16 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4134:
--
Shepherd: Till Toenshoff

> Add note about tunneling in site-docker README
> --
>
> Key: MESOS-4134
> URL: https://issues.apache.org/jira/browse/MESOS-4134
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation
>
> If we are running the site-docker container on a remote machine, we should 
> set up a tunnel to localhost to view the site locally.  The README should 
> explain how to do so.





<    1   2   3   4   5   6   7   8   9   10   >