[jira] [Commented] (MESOS-8010) AfterTest.Loop is flaky.

2017-09-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180065#comment-16180065
 ] 

Till Toenshoff commented on MESOS-8010:
---

Rerunning all of the libprocess tests in repetition now - hopefully I will have 
more input then.

> AfterTest.Loop is flaky.
> 
>
> Key: MESOS-8010
> URL: https://issues.apache.org/jira/browse/MESOS-8010
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.5.0
> Environment: Apple LLVM version 9.0.0 (clang-900.0.37)
>Reporter: Till Toenshoff
>  Labels: flaky, flaky-test
>
> The following just happened in a local test build of the current master. I 
> have not yet dug deeper by retrying or raising the log level.
> {noformat}
> [ RUN  ] FutureTest.After1
> PC: @0x10d53b127 process::Future<>::Data::clearAllCallbacks()
> *** SIGSEGV (@0xb0) received by PID 49169 (TID 0x7d88f000) stack trace: 
> ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fb52ce054f0 (unknown)
> @0x10d572be7 process::Future<>::_set<>()
> @0x10d5729d5 process::Future<>::set()
> @0x10d572990 process::Promise<>::_set<>()
> @0x10d572835 process::Promise<>::set()
> @0x10d5727bb AfterTest_Loop_Test::TestBody()::$_4::operator()()
> @0x10d572765 
> _ZZNK7process6FutureINS_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS4_OT_ENUlvE_clEv
> @0x10d57273d 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZNK7process6FutureINS3_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS8_OT_EUlvE_EEEvDpOT_
> @0x10d572659 
> _ZNSt3__110__function6__funcIZNK7process6FutureINS2_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS7_OT_EUlvE_NS_9allocatorISF_EEFvvEEclEv
> @0x10d51982b std::__1::function<>::operator()()
> @0x10d51d405 
> _ZN7process8internal3runINSt3__18functionIFvvEEEJEEEvRKNS2_6vectorIT_NS2_9allocatorIS7_DpOT0_
> @0x10d537c05 process::Future<>::discard()
> @0x10d5591ff process::internal::Loop<>::run()
> @0x10d569be3 
> _ZZN7process8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS2_8TestBodyEvE3$_37NothingS5_E3runENS_6FutureIS5_EEENKUlRKS8_E_clESA_
> @0x10d56f1ed 
> _ZZNK7process6FutureI7NothingE5onAnyIRZNS_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS6_8TestBodyEvE3$_3S1_S1_E3runES2_EUlRKS2_E_vEESB_OT_NS2_6PreferEENUlSB_E_clESB_
> @0x10d56f1bd 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZNK7process6FutureI7NothingE5onAnyIRZNS3_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNSA_8TestBodyEvE3$_3S5_S5_E3runES6_EUlRKS6_E_vEESF_OT_NS6_6PreferEEUlSF_E_SF_EEEvDpOT_
> @0x10d56ef79 
> _ZNSt3__110__function6__funcIZNK7process6FutureI7NothingE5onAnyIRZNS2_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS9_8TestBodyEvE3$_3S4_S4_E3runES5_EUlRKS5_E_vEESE_OT_NS5_6PreferEEUlSE_E_NS_9allocatorISK_EEFvSE_EEclESE_
> @0x10d51918e std::__1::function<>::operator()()
> @0x10d516545 
> _ZN7process8internal3runINSt3__18functionIFvRKNS_6FutureI7NothingEJRS6_EEEvRKNS2_6vectorIT_NS2_9allocatorISD_DpOT0_
> @0x10d51622a process::Future<>::_set<>()
> @0x10d516035 process::Future<>::set()
> @0x10d515ff0 process::Promise<>::_set<>()
> @0x10d515f95 process::Promise<>::set()
> @0x10d515f42 _ZZN7process5afterERK8DurationENKUlvE_clEv
> @0x10d515efd 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process5afterERK8DurationEUlvE_EEEvDpOT_
> @0x10d515c19 
> _ZNSt3__110__function6__funcIZN7process5afterERK8DurationEUlvE_NS_9allocatorIS6_EEFvvEEclEv
> @0x10d51982b std::__1::function<>::operator()()
> @0x10e2a7c19 process::Timer::operator()()
> @0x10e2a762a process::timedout()
> @0x10e390265 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRNS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS6_EJRKNS_12placeholders4__phILi1EESB_EEEvDpOT_
> @0x10e38ff79 
> _ZNSt3__110__function6__funcINS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS5_EJRKNS_12placeholders4__phILi1EENS6_ISI_EESB_EclESA_
> make[6]: *** [check-local] Segmentation fault: 11
> make[5]: *** [check-am] Error 2
> make[4]: *** [check-recursive] Error 1
> make[3]: *** [check] Error 2
> make[2]: *** [check-recursive] Error 1
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8010) AfterTest.Loop is flaky.

2017-09-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180061#comment-16180061
 ] 

Till Toenshoff commented on MESOS-8010:
---

[~benjaminhindman] I did not notice any other failures, but I no longer have the 
complete log -> so I am not sure :( .






[jira] [Commented] (MESOS-8015) Design a scheduler (V1) HTTP API authenticatee mechanism.

2017-09-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180047#comment-16180047
 ] 

Till Toenshoff commented on MESOS-8015:
---

https://docs.google.com/document/d/1nr7Pi5Dyy5kXcHJWk1UKfEYjUtc3CdvpTKIoHO6iqqs/edit

> Design a scheduler (V1) HTTP API authenticatee mechanism.
> -
>
> Key: MESOS-8015
> URL: https://issues.apache.org/jira/browse/MESOS-8015
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API, modules, scheduler api, security
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>  Labels: design, mesosphere, modularization, scheduler, security
>
> Provide a design proposal for a scheduler HTTP API authenticatee module.





[jira] [Commented] (MESOS-8010) AfterTest.Loop is flaky.

2017-09-25 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180003#comment-16180003
 ] 

Benjamin Hindman commented on MESOS-8010:
-

[~tillt]: did any tests fail above this? In particular, did {{AfterTest.Loop}} 
fail?






[jira] [Updated] (MESOS-7448) Add support for pruning the list of gone agents in the registry.

2017-09-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7448:
--
Shepherd: Vinod Kone

> Add support for pruning the list of gone agents in the registry.
> 
>
> Key: MESOS-7448
> URL: https://issues.apache.org/jira/browse/MESOS-7448
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> The list of gone agents in the registry can grow unbounded. We need to 
> implement a pruning operation similar to the already existing 
> {{PruneUnreachable}} for unreachable agents.
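The pruning described above can be reduced to a small sketch. Note that the type and function names here ({{GoneAgent}}, {{pruneGone}}, the age-based policy) are hypothetical illustrations, not the actual Mesos registry API; real Mesos would express this as a registrar {{Operation}} like the existing {{PruneUnreachable}}.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registry entry: (agent id, removal time in secs).
using GoneAgent = std::pair<std::string, long>;

// Drop gone agents whose removal is older than a configured max age.
// This mirrors the policy sketched in the ticket; the real registrar
// applies such changes as persisted registry operations.
std::vector<GoneAgent> pruneGone(std::vector<GoneAgent> gone,
                                 long nowSecs, long maxAgeSecs) {
  gone.erase(std::remove_if(gone.begin(), gone.end(),
                            [&](const GoneAgent& agent) {
                              return nowSecs - agent.second > maxAgeSecs;
                            }),
             gone.end());
  return gone;
}
```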





[jira] [Updated] (MESOS-3394) Pull in glog 0.3.6 (when it's released)

2017-09-25 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3394:

Summary: Pull in glog 0.3.6 (when it's released)  (was: Change glog 
download target for Windows when pull req is moved upstream)

> Pull in glog 0.3.6 (when it's released)
> ---
>
> Key: MESOS-3394
> URL: https://issues.apache.org/jira/browse/MESOS-3394
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, cmake, mesosphere
>
> To build Mesos on Windows, we have to build glog on Windows. But glog did not 
> build on Windows, so we had to submit a patch to the project. So, to build on 
> Windows, we download the patched version directly from the pull request that 
> was sent to the glog repository on GitHub.
> Once these patches are merged upstream, we need to change this to point at the 
> "real" glog release instead of the pull request.
> (For details see the `CMakeLists.txt` in `3rdparty/libprocess/3rdparty`.)





[jira] [Assigned] (MESOS-3542) Separate libmesos into compiling from many binaries.

2017-09-25 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3542:
---

Assignee: (was: Andrew Schwartzmeyer)

> Separate libmesos into compiling from many binaries.
> 
>
> Key: MESOS-3542
> URL: https://issues.apache.org/jira/browse/MESOS-3542
> Project: Mesos
>  Issue Type: Epic
>  Components: cmake
>Reporter: Alex Clemmer
>  Labels: cmake, mesosphere, microsoft, windows-mvp
>
> Historically, libmesos has been built as one huge monolithic binary. An 
> alternative would be to build it from a number of smaller libraries (_e.g._, 
> libagent, _etc_.).





[jira] [Commented] (MESOS-3542) Separate libmesos into compiling from many binaries.

2017-09-25 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179881#comment-16179881
 ] 

Andrew Schwartzmeyer commented on MESOS-3542:
-

This is a mammoth re-architecting effort that is going to involve everyone; 
un-assigning from myself for now.






[jira] [Commented] (MESOS-7604) SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows

2017-09-25 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179880#comment-16179880
 ] 

Andrew Schwartzmeyer commented on MESOS-7604:
-

I traced it a bit. The current way we're copying the {{id}} results in 
{{"latest"}} being copied as the {{id}} instead of the actual {{id}}.

This patch at least resolves _that part_ of the problem, and lets the 
{{AgentInfo}}s appear equivalent.

{noformat}
diff --git i/src/slave/slave.cpp w/src/slave/slave.cpp
index 319ff124d..f03f28810 100644
--- i/src/slave/slave.cpp
+++ w/src/slave/slave.cpp
@@ -6112,7 +6112,7 @@ Future<Nothing> Slave::recover(const Try<state::State>& state)
 // TODO(vinod): Also check for version compatibility.

 SlaveInfo _info(info);
-_info.mutable_id()->CopyFrom(slaveState->id);
+_info.mutable_id()->CopyFrom(slaveState->info.get().id());
 if (flags.recover == "reconnect" &&
 !(_info == slaveState->info.get())) {
   string message = strings::join(
{noformat}

But this raises the question of why the first {{id}} shows up as {{"latest"}} 
instead of being resolved to the actual {{id}}. Also, the test will still fail 
later due to other problems (pending investigation).
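The failure mode above can be reduced to a small sketch. The types here are heavily simplified hypothetical stand-ins (real Mesos uses the {{SlaveInfo}}/{{SlaveID}} protobufs and {{CopyFrom}}); the point is only that recovery must take the agent id from the checkpointed info, not from a directory name that may be the {{"latest"}} symlink.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the SlaveInfo protobuf.
struct SlaveInfo {
  std::string id;        // agent id; empty until assigned by the master
  std::string hostname;  // everything else that must match across restarts
  bool operator==(const SlaveInfo& other) const {
    return id == other.id && hostname == other.hostname;
  }
};

// Hypothetical stand-in for the recovered state.
struct SlaveState {
  std::string id;   // directory name recovered from disk; can be "latest"
                    // if the symlink rather than the real id was read
  SlaveInfo info;   // checkpointed info, carrying the actual agent id
};

// Mirrors the patched logic: copy the id from the checkpointed info,
// not from the (possibly symlinked) directory name, before comparing.
bool compatibleForReconnect(SlaveInfo fresh, const SlaveState& state) {
  fresh.id = state.info.id;  // the fix; state.id could be "latest"
  return fresh == state.info;
}
```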

> SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows
> -
>
> Key: MESOS-7604
> URL: https://issues.apache.org/jira/browse/MESOS-7604
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
> Environment: Windows
>Reporter: Joseph Wu
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, windows
>
> {code}
> [ RUN  ] SlaveTest.ExecutorReregistrationTimeoutFlag
> rk ae9679b1-67c9-4db6-8187-0641b0e929d2-
> I0601 23:53:23.488337  2748 master.cpp:1156] Master terminating
> I0601 23:53:23.492337  2728 hierarchical.cpp:579] Removed agent 
> ae9679b1-67c9-4db6-8187-0641b0e929d2-S0
> I0601 23:53:23.530340  1512 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0601 23:53:23.544342  2728 master.cpp:436] Master 
> f07f4fdd-cd91-4d62-bf33-169b20d02020 (ip-172-20-128-1.ec2.internal) started 
> on 172.20.128.1:51241
> I0601 23:53:23.545341  2728 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="false" --authenticate_frameworks="false" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="C:\temp\FWZORI\credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/webui" --work_dir="C:\temp\FWZORI\master" 
> --zk_session_timeout="10secs"
> I0601 23:53:23.550338  2728 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0601 23:53:23.550338  2728 credentials.hpp:37] Loading credentials for 
> authentication from 'C:\temp\FWZORI\credentials'
> I0601 23:53:23.552338  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0601 23:53:23.553339  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0601 23:53:23.554340  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0601 23:53:23.555341  2728 master.cpp:640] Authorization enabled
> I0601 23:53:23.570340  2124 master.cpp:2159] Elected as the leading master!
> I0601 23:53:23.570340  2124 master.cpp:1698] Recovering from registrar
> I0601 23:53:23.573341  1920 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0601 23:53:23.573341  1920 registrar.cpp:493] Applied 1 operations in 0ns; 
> attempting to update the registry
> I0601 23:53:23.575342  1920 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0601 23:53:23.576344  1920 registrar.cpp:422] Successfully recovered 
> registrar
> I0601 23:53:23.577342  2728 master.cpp:1797] Recovered 0 agents from the 
> registry (167B); 

[jira] [Created] (MESOS-8015) Design a scheduler (V1) HTTP API authenticatee mechanism.

2017-09-25 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-8015:
-

 Summary: Design a scheduler (V1) HTTP API authenticatee mechanism.
 Key: MESOS-8015
 URL: https://issues.apache.org/jira/browse/MESOS-8015
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API, modules, scheduler api, security
Reporter: Till Toenshoff
Assignee: Till Toenshoff


Provide a design proposal for a scheduler HTTP API authenticatee module.





[jira] [Updated] (MESOS-8010) AfterTest.Loop is flaky.

2017-09-25 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-8010:
---
Summary: AfterTest.Loop is flaky.  (was: FutureTest.After1 is flaky.)

Looking at the stack trace, it appears to be from AfterTest.Loop, but triggered 
asynchronously after that particular test completed. Adjusted the summary.
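The hazard read from the trace (a timer firing into state the finished test already tore down) is the classic deferred-callback lifetime problem. A minimal sketch of one defensive pattern follows; it uses {{std::shared_ptr}}/{{std::weak_ptr}} instead of libprocess's actual Future/Promise machinery, and the names ({{LoopState}}, {{makeGuardedTick}}) are illustrative, not Mesos APIs.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for per-test state that a timer callback touches.
struct LoopState {
  int iterations = 0;
};

// Build a callback that guards against its owner having been destroyed:
// it holds only a weak_ptr and becomes a no-op once the state is gone,
// instead of dereferencing freed memory as in the crash above.
std::function<void()> makeGuardedTick(const std::shared_ptr<LoopState>& state) {
  std::weak_ptr<LoopState> weak = state;
  return [weak]() {
    if (auto locked = weak.lock()) {
      locked->iterations++;  // safe: state is still alive
    }
    // else: the owner already tore down; silently drop the tick
  };
}
```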






[jira] [Comment Edited] (MESOS-3009) Reproduce systemd cgroup behavior

2017-09-25 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179773#comment-16179773
 ] 

James Peach edited comment on MESOS-3009 at 9/25/17 9:08 PM:
-

With {{systemd-233}} I see systemd nuking the memory cgroup which breaks the 
Mesos agent:
{noformat}
systemd(kernel.function("SyS_rmdir@fs/namei.c:3936")): /sys/fs/cgroup/memory
 0x7f7559be3c47 : rmdir+0x7/0x30 [/usr/lib64/libc-2.25.so]
 0x7f755b2fa169 : cg_trim+0x109/0x1f0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b2fc280 : cg_create_everywhere+0xa0/0xb0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x55d79d8bf861 : unit_realize_cgroup_now.lto_priv.582+0x101/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8bfc88 : unit_realize_cgroup_now.lto_priv.582+0x528/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8bfc88 : unit_realize_cgroup_now.lto_priv.582+0x528/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8c1ce2 : unit_realize_cgroup+0x1d2/0x200 [/usr/lib/systemd/systemd]
 0x55d79d8a1daa : slice_start.lto_priv.202+0x2a/0x90 [/usr/lib/systemd/systemd]
 0x55d79d8b898c : job_perform_on_unit.lto_priv.583+0x5fc/0x6d0 
[/usr/lib/systemd/systemd]
 0x55d79d85e3a8 : manager_dispatch_run_queue+0x258/0x640 
[/usr/lib/systemd/systemd]
 0x7f755b33f8ca : source_dispatch+0x14a/0x380 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b33fbca : sd_event_dispatch+0xca/0x1d0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b341007 : sd_event_run+0x77/0x200 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x55d79d853384 : manager_loop+0x605/0x676 [/usr/lib/systemd/systemd]
 0x55d79d85b2b6 : main+0x39b6/0x4710 [/usr/lib/systemd/systemd]
 0x7f7559b0250a : __libc_start_main+0xea/0x1c0 [/usr/lib64/libc-2.25.so]
 0x55d79d85c05a : _start+0x2a/0x30 [/usr/lib/systemd/systemd]
{noformat}
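Since systemd can remove a cgroup directory out from under the agent (the {{cg_trim}} call in the trace above), any code that caches a cgroup path has to tolerate it vanishing. A minimal sketch of such a check, using plain POSIX {{stat}} rather than the actual Mesos cgroups helpers ({{cgroupExists}} is a hypothetical name):

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: true only if the cgroup directory still exists.
// Callers should treat 'false' as "systemd (or something else) removed
// our hierarchy" and recreate it or fail gracefully, rather than assume
// a path recorded at container-launch time is still valid.
bool cgroupExists(const std::string& path) {
  struct stat sb;
  return ::stat(path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
}
```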



> Reproduce systemd cgroup behavior 
> --
>
> Key: MESOS-3009
> URL: https://issues.apache.org/jira/browse/MESOS-3009
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> It has been noticed before that systemd reorganizes the cgroup hierarchy 
> created by the Mesos agent. Because of this, Mesos is no longer able to find 
> the cgroup, and there is also a chance of systemd undoing the isolation that 
> the agent puts in place.





[jira] [Commented] (MESOS-3009) Reproduce systemd cgroup behavior

2017-09-25 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179773#comment-16179773
 ] 

James Peach commented on MESOS-3009:


With {{systemd-233}} I see systemd nuking the memory cgroup, which breaks the 
Mesos agent:
{noformat}
systemd(kernel.function("SyS_rmdir@fs/namei.c:3936")): /sys/fs/cgroup/memory
 0x7f7559be3c47 : rmdir+0x7/0x30 [/usr/lib64/libc-2.25.so]
 0x7f755b2fa169 : cg_trim+0x109/0x1f0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b2fc280 : cg_create_everywhere+0xa0/0xb0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x55d79d8bf861 : unit_realize_cgroup_now.lto_priv.582+0x101/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8bfc88 : unit_realize_cgroup_now.lto_priv.582+0x528/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8bfc88 : unit_realize_cgroup_now.lto_priv.582+0x528/0x23b0 
[/usr/lib/systemd/systemd]
 0x55d79d8c1ce2 : unit_realize_cgroup+0x1d2/0x200 [/usr/lib/systemd/systemd]
 0x55d79d8a1daa : slice_start.lto_priv.202+0x2a/0x90 [/usr/lib/systemd/systemd]
 0x55d79d8b898c : job_perform_on_unit.lto_priv.583+0x5fc/0x6d0 
[/usr/lib/systemd/systemd]
 0x55d79d85e3a8 : manager_dispatch_run_queue+0x258/0x640 
[/usr/lib/systemd/systemd]
 0x7f755b33f8ca : source_dispatch+0x14a/0x380 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b33fbca : sd_event_dispatch+0xca/0x1d0 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x7f755b341007 : sd_event_run+0x77/0x200 
[/usr/lib/systemd/libsystemd-shared-233.so]
 0x55d79d853384 : manager_loop+0x605/0x676 [/usr/lib/systemd/systemd]
 0x55d79d85b2b6 : main+0x39b6/0x4710 [/usr/lib/systemd/systemd]
 0x7f7559b0250a : __libc_start_main+0xea/0x1c0 [/usr/lib64/libc-2.25.so]
 0x55d79d85c05a : _start+0x2a/0x30 [/usr/lib/systemd/systemd]
{noformat}

> Reproduce systemd cgroup behavior 
> --
>
> Key: MESOS-3009
> URL: https://issues.apache.org/jira/browse/MESOS-3009
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> It has been noticed before that systemd reorganizes the cgroup hierarchy 
> created by the mesos slave. Because of this, mesos is no longer able to find 
> the cgroup, and there is also a chance of undoing the isolation that the 
> mesos slave puts in place.
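For context, one commonly documented mitigation for this class of problem is cgroup delegation: with {{Delegate=yes}} set on the agent's unit, systemd treats the unit's cgroup subtree as delegated and will not trim cgroups created underneath it. The drop-in below is an illustrative sketch; the file path and unit name are assumptions, not taken from this ticket.

```ini
# Hypothetical drop-in: /etc/systemd/system/mesos-slave.service.d/10-delegate.conf
# Delegate=yes asks systemd to leave this unit's cgroup subtree alone,
# so it will not remove cgroups the agent creates underneath it.
[Service]
Delegate=yes
```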



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8014) Provide HTTP authenticatee interface re/usable for the scheduler library.

2017-09-25 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-8014:
-

 Summary: Provide HTTP authenticatee interface re/usable for the 
scheduler library.
 Key: MESOS-8014
 URL: https://issues.apache.org/jira/browse/MESOS-8014
 Project: Mesos
  Issue Type: Epic
  Components: HTTP API, modules, scheduler api, security
Reporter: Till Toenshoff
Assignee: Till Toenshoff


h4. Motivation

Authentication and authorization have been added to most Mesos APIs at this 
point. Schedulers making use of the Mesos HTTP scheduler library, however, 
currently only support hard-wired basic HTTP authentication.

To secure the master’s HTTP scheduler API, the {{/api/v1/scheduler}} endpoint 
must be authenticated. Without authentication, a malicious or buggy actor from 
within or outside the cluster could send requests to these master endpoints, 
potentially disrupting running schedulers or tasks, injecting harmful tasks, or 
exposing privileged information.


h4. Goals

- Support custom authentication of schedulers based on the Mesos V1 HTTP 
scheduler API library 
[/src/scheduler/scheduler.cpp|https://github.com/apache/mesos/blob/8198579fea7e433e202bd33f4ea62eb235859365/src/scheduler/scheduler.cpp].
- Require minimal operator configuration when enabling scheduler authentication 
for a simple default use case.
- Provide a thin, reusable layer of abstraction enabling any HTTP API consumer 
to authenticate.
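The goals above suggest an interface shape like the following minimal sketch. Every name here ({{HttpAuthenticatee}}, {{scheme()}}, {{authorization()}}) is a hypothetical illustration, not the actual Mesos module API, and credential handling is reduced to a pre-encoded token:

```cpp
#include <string>

// Hypothetical sketch of a pluggable HTTP authenticatee interface.
// Names are illustrative only, not the actual Mesos module API.
class HttpAuthenticatee {
public:
  virtual ~HttpAuthenticatee() {}

  // The authentication scheme this module implements, e.g. "Basic".
  virtual std::string scheme() const = 0;

  // The value to place in the 'Authorization' header of a request.
  virtual std::string authorization() const = 0;
};

// A default implementation mirroring today's hard-wired basic HTTP
// authentication (the base64 encoding of the credential is assumed
// to have happened already).
class BasicHttpAuthenticatee : public HttpAuthenticatee {
public:
  explicit BasicHttpAuthenticatee(const std::string& token)
    : token_(token) {}

  std::string scheme() const override { return "Basic"; }

  std::string authorization() const override {
    return "Basic " + token_;
  }

private:
  const std::string token_;
};
```

A scheduler library programmed against the abstract base could accept any module implementing it, which is what would let operators swap in a non-basic mechanism without changing scheduler code.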






[jira] [Commented] (MESOS-7951) Extend the KillPolicy

2017-09-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179729#comment-16179729
 ] 

Greg Mann commented on MESOS-7951:
--

Design doc here: 
https://docs.google.com/document/d/1xRaOEe2K7OIVrDTOY9UDwwJbCIwXF3wZUrXYl8Pqy24/edit?usp=sharing

> Extend the KillPolicy
> -
>
> Key: MESOS-7951
> URL: https://issues.apache.org/jira/browse/MESOS-7951
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, executor, HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> After introducing the {{KillPolicy}} in MESOS-4909, some interactions with 
> framework developers have led to suggestions of a couple of possible 
> improvements to this interface. Namely:
> * Allowing the framework to specify a command to be run to initiate 
> termination, rather than a signal to be sent, would allow some developers to 
> avoid wrapping their application in a signal handler. This is useful because 
> a signal handler wrapper modifies the application's process tree, which may 
> make introspection and debugging more difficult in the case of well-known 
> services with standard debugging procedures.
> * In the case of terminations which do begin with a signal, it would be 
> useful to allow the framework to specify the signal to be sent, rather than 
> assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, 
> each initiated with a [different 
> signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html].
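The second suggestion (a framework-chosen signal) could reduce, inside the executor, to a lookup along these lines; the helper name and the fallback-to-SIGTERM behavior are assumptions sketching the idea, not existing Mesos code:

```cpp
#include <csignal>
#include <map>
#include <string>

// Hypothetical helper: resolve a framework-supplied signal name from
// an extended KillPolicy, falling back to SIGTERM (today's behavior)
// when no recognized signal is specified.
int resolveKillSignal(const std::string& name) {
  static const std::map<std::string, int> signals = {
    {"SIGTERM", SIGTERM},
    {"SIGINT", SIGINT},
    {"SIGQUIT", SIGQUIT},  // e.g. one of PostgreSQL's shutdown modes
  };

  auto it = signals.find(name);
  return it == signals.end() ? SIGTERM : it->second;
}
```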





[jira] [Assigned] (MESOS-7963) Task groups can lose the container limitation status.

2017-09-25 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-7963:
--

Assignee: James Peach

> Task groups can lose the container limitation status.
> -
>
> Key: MESOS-7963
> URL: https://issues.apache.org/jira/browse/MESOS-7963
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, executor
>Reporter: James Peach
>Assignee: James Peach
>
> If you run a single task in a task group and that task fails with a container 
> limitation, that status update can be lost and only the executor failure will 
> be reported to the framework.
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a",
> "task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> }, {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }, {
> "name": "disk",
> "type": "SCALAR",
> "scalar": {
> "value": 2
> }
> }
> ],
> "command": {
> "value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M 
> count=64 ; sleep 1"
> }
> }
> ]
> }'
> I0911 11:48:01.480689  7340 scheduler.cpp:184] Version: 1.5.0
> I0911 11:48:01.488868  7339 scheduler.cpp:470] New master detected at 
> master@17.228.224.108:5050
> Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to 
> agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0'
> Received status update TASK_RUNNING for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   message: 'Command terminated with signal Killed'
>   source: SOURCE_EXECUTOR
> {noformat}
> However, the agent logs show that this failed with a memory limitation:
> {noformat}
> I0911 11:48:02.235818  7012 http.cpp:532] Processing call 
> WAIT_NESTED_CONTAINER
> I0911 11:48:02.236395  7013 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:02.237083  7016 slave.cpp:4875] Forwarding the update 
> TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050
> I0911 11:48:02.283661  7007 status_update_manager.cpp:395] Received status 
> update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:04.771455  7014 memory.cpp:516] OOM detected for container 
> 474388fe-43c3-4372-b903-eaca22740996
> I0911 11:48:04.776445  7014 memory.cpp:556] Memory limit exceeded: Requested: 
> 64MB Maximum Used: 64MB
> ...
> I0911 11:48:04.776943  7012 containerizer.cpp:2681] Container 
> 474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource 
> [{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be 
> terminated
> {noformat}
> The following {{mesos-execute}} task will show the container limitation 
> correctly:
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211",
> "task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> },
> {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }],
> "command": {
> "value": "sleep 600"
> }
> }, {
> "name": "7247643c-5e4d-4b01-9839-e38db49f7f4d",
> "task_id": {"value" : "a7571608-3a53-4971-a187-41ed8be183ba"},
>  

[jira] [Commented] (MESOS-7963) Task groups can lose the container limitation status.

2017-09-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179635#comment-16179635
 ] 

Vinod Kone commented on MESOS-7963:
---

[~jpe...@apache.org] If you are working on this, can you assign it to yourself? 
Thanks.

> Task groups can lose the container limitation status.
> -
>
> Key: MESOS-7963
> URL: https://issues.apache.org/jira/browse/MESOS-7963
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, executor
>Reporter: James Peach
>
> If you run a single task in a task group and that task fails with a container 
> limitation, that status update can be lost and only the executor failure will 
> be reported to the framework.
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a",
> "task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> }, {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }, {
> "name": "disk",
> "type": "SCALAR",
> "scalar": {
> "value": 2
> }
> }
> ],
> "command": {
> "value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M 
> count=64 ; sleep 1"
> }
> }
> ]
> }'
> I0911 11:48:01.480689  7340 scheduler.cpp:184] Version: 1.5.0
> I0911 11:48:01.488868  7339 scheduler.cpp:470] New master detected at 
> master@17.228.224.108:5050
> Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to 
> agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0'
> Received status update TASK_RUNNING for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   message: 'Command terminated with signal Killed'
>   source: SOURCE_EXECUTOR
> {noformat}
> However, the agent logs show that this failed with a memory limitation:
> {noformat}
> I0911 11:48:02.235818  7012 http.cpp:532] Processing call 
> WAIT_NESTED_CONTAINER
> I0911 11:48:02.236395  7013 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:02.237083  7016 slave.cpp:4875] Forwarding the update 
> TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050
> I0911 11:48:02.283661  7007 status_update_manager.cpp:395] Received status 
> update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:04.771455  7014 memory.cpp:516] OOM detected for container 
> 474388fe-43c3-4372-b903-eaca22740996
> I0911 11:48:04.776445  7014 memory.cpp:556] Memory limit exceeded: Requested: 
> 64MB Maximum Used: 64MB
> ...
> I0911 11:48:04.776943  7012 containerizer.cpp:2681] Container 
> 474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource 
> [{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be 
> terminated
> {noformat}
> The following {{mesos-execute}} task will show the container limitation 
> correctly:
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211",
> "task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> },
> {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }],
> "command": {
> "value": "sleep 600"
> }
> }, {
> "name": "7247643c-5e4d-4b01-9839-e38db49f7f4d",
> 

[jira] [Assigned] (MESOS-7130) port_mapping isolator: executor hangs when running on EC2

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7130:
-

Assignee: Jie Yu
  Sprint: Mesosphere Sprint 65

> port_mapping isolator: executor hangs when running on EC2
> -
>
> Key: MESOS-7130
> URL: https://issues.apache.org/jira/browse/MESOS-7130
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Pierre Cheynier
>Assignee: Jie Yu
>
> Hi,
> I'm experiencing a weird issue: I'm using a CI to do testing on 
> infrastructure automation.
> I recently activated the {{network/port_mapping}} isolator.
> I'm able to make the changes work and pass the tests for bare-metal servers 
> and VirtualBox VMs using this configuration.
> But when I try on EC2 (on which my CI pipeline relies), it systematically 
> fails to run any container.
> It appears that the sandbox is created and the port_mapping isolator seems to 
> be OK according to the logs in stdout and stderr and the {{tc}} output:
> {noformat}
> + mount --make-rslave /run/netns
> + test -f /proc/sys/net/ipv6/conf/all/disable_ipv6
> + echo 1
> + ip link set lo address 02:44:20:bb:42:cf mtu 9001 up
> + ethtool -K eth0 rx off
> (...)
> + tc filter show dev eth0 parent :0
> + tc filter show dev lo parent :0
> I0215 16:01:13.941375 1 exec.cpp:161] Version: 1.0.2
> {noformat}
> Then the executor never comes back to the REGISTERED state and hangs 
> indefinitely. {{GLOG_v=3}} doesn't help here.
> My skills in this area are limited, but after loading the symbols and 
> attaching gdb to the mesos-executor process, I'm able to print this stack:
> {noformat}
> #0  0x7feffc1386d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feffbed69ec in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7ff0003dd8ec in void synchronized_wait std::mutex>(std::condition_variable*, std::mutex*) () from 
> /usr/lib64/libmesos-1.0.2.so
> #3  0x7ff0017d595d in Gate::arrive(long) () from 
> /usr/lib64/libmesos-1.0.2.so
> #4  0x7ff0017c00ed in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.0.2.so
> #5  0x7ff0017c5c05 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.0.2.so
> #6  0x004ab26f in process::wait(process::ProcessBase const*, Duration 
> const&) ()
> #7  0x004a3903 in main ()
> {noformat}
> I concluded that the underlying shell script launched by the isolator, or the 
> task itself, is just blocked. But I don't understand why.
> Here is a process tree to show that I have no task running but the executor is:
> {noformat}
> root 28420  0.8  3.0 1061420 124940 ?  Ssl  17:56   0:25 
> /usr/sbin/mesos-slave --advertise_ip=127.0.0.1 
> --attributes=platform:centos;platform_major_version:7;type:base 
> --cgroups_enable_cfs --cgroups_hierarchy=/sys/fs/cgroup 
> --cgroups_net_cls_primary_handle=0xC370 
> --container_logger=org_apache_mesos_LogrotateContainerLogger 
> --containerizers=mesos,docker 
> --credential=file:///etc/mesos-chef/slave-credential 
> --default_container_info={"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"}]}
>  --default_role=default --docker_registry=/usr/share/mesos/users 
> --docker_store_dir=/var/opt/mesos/store/docker 
> --egress_unique_flow_per_container --enforce_container_disk_quota 
> --ephemeral_ports_per_container=128 
> --executor_environment_variables={"PATH":"/bin:/usr/bin:/usr/sbin","CRITEO_DC":"par","CRITEO_ENV":"prod"}
>  --image_providers=docker --image_provisioner_backend=copy 
> --isolation=cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,disk/du,filesystem/shared,filesystem/linux,docker/runtime,network/cni,network/port_mapping
>  --logging_level=INFO 
> --master=zk://mesos:test@localhost.localdomain:2181/mesos 
> --modules=file:///etc/mesos-chef/slave-modules.json --port=5051 
> --recover=reconnect 
> --resources=ports:[31000-32000];ephemeral_ports:[32768-57344] --strict 
> --work_dir=/var/opt/mesos
> root 28484  0.0  2.3 433676 95016 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be1c-0242b44d071f/runs/1d3e6b1c-cda8-47e5-92c4-a161429a7ac6/stdout
>  --logrotate_options=rotate 5 --logrotate_path=logrotate --max_size=10MB
> root 28485  0.0  2.3 499212 94724 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> 

[jira] [Assigned] (MESOS-7966) check for maintenance on agent causes fatal error

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7966:
-

Assignee: Joseph Wu
  Sprint: Mesosphere Sprint 65
Priority: Blocker  (was: Critical)

> check for maintenance on agent causes fatal error
> -
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.0
>Reporter: Rob Johnson
>Assignee: Joseph Wu
>Priority: Blocker
>
> We interact with the maintenance API frequently to orchestrate gracefully 
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with 
> the API. This happens relatively frequently, and impacts us when downstream 
> frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: 
> slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're 
> happy to provide any other logs you need - please let me know what would be 
> useful for debugging.
> Thanks.





[jira] [Updated] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7975:
--
Sprint: Mesosphere Sprint 65

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate ordering of the conditional 
> checks when the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.
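A minimal sketch of the reordered check, with {{WSUCCEEDED(status)}} reduced to a boolean for brevity: consulting {{killed}} first makes an explicit kill take precedence over a zero exit status.

```cpp
enum TaskState { TASK_FINISHED, TASK_KILLED, TASK_FAILED };

// Sketch of the corrected ordering: the 'killed' flag is consulted
// before the exit status, so a task that exits 0 after a kill()
// still reports TASK_KILLED. 'succeeded' stands in for
// WSUCCEEDED(status).
TaskState computeTaskState(bool succeeded, bool killed) {
  if (killed) {
    // Send TASK_KILLED if the task was killed as a result of
    // kill() or shutdown(), regardless of the exit status.
    return TASK_KILLED;
  } else if (succeeded) {
    return TASK_FINISHED;
  }
  return TASK_FAILED;
}
```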





[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179628#comment-16179628
 ] 

Vinod Kone commented on MESOS-7975:
---

cc [~bmahler]

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate ordering of the conditional 
> checks when the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.





[jira] [Assigned] (MESOS-7991) fatal, check failed !framework->recovered()

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7991:
-

Assignee: Kapil Arya

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Kapil Arya
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> ```
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.```





[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Priority: Blocker  (was: Major)

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Kapil Arya
>Priority: Blocker
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> ```
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7991) fatal, check failed !framework->recovered()

2017-09-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7991:
--
Sprint: Mesosphere Sprint 65

> fatal, check failed !framework->recovered()
> ---
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jack Crawford
>Assignee: Kapil Arya
>
> The Mesos master crashed on what appears to be framework recovery.
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> ```
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed  google::LogMessage::Fail()
> @ 0x7f7bf800a5a0  google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3  google::LogMessage::Flush()
> @ 0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> RKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c  process::ProcessBase::visit()
> @ 0x7f7bf7f71403  process::ProcessManager::resume()
> @ 0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80  (unknown)
> @ 0x7f7bf58c86ba  start_thread
> @ 0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.```





[jira] [Commented] (MESOS-8002) Marathon can't start on macOS 11.12.x with Mesos 1.3.0

2017-09-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179618#comment-16179618
 ] 

Vinod Kone commented on MESOS-8002:
---

[~tillt] Is this something you can look into?

> Marathon can't start on macOS 11.12.x with Mesos 1.3.0
> --
>
> Key: MESOS-8002
> URL: https://issues.apache.org/jira/browse/MESOS-8002
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.3.0
> Environment: macOS 10.12.x 
>Reporter: Alex Lee
>
> We upgraded our Mesos cluster to 1.3.0 and ran into the following error when 
> starting Marathon 1.4.7:
> ```
> I0823 17:19:17.498087 101744640 group.cpp:340] Group process 
> (zookeeper-group(1)@127.0.0.1:57708) connected to ZooKeeper
> I0823 17:19:17.498652 101744640 group.cpp:830] Syncing group operations: 
> queue size (joins, cancels, datas) = (0, 0, 0)
> I0823 17:19:17.499153 101744640 group.cpp:418] Trying to create path 
> '/mesos/master' in ZooKeeper
> Assertion failed: (0), function hash, file 
> /BuildRoot/Library/Caches/com.apple.xbs/Sources/cmph/cmph-6/src/hash.c, line 
> 35.
> ```
> This was reported in: https://jira.mesosphere.com/browse/MARATHON-7727
> Interestingly, Marathon was able to start in the same cluster on a macOS 
> 10.11.6 host. We initially suspected an OS version issue and opened an issue 
> with Apple, but the macOS team responded that there may be a regression in 
> Mesos. The assertion is raised in libcmph, which libmesos.dylib invokes with 
> invalid input; the hash functions in libcmph don't look like they've changed 
> between 10.11.6 and 10.12.6, at least with respect to that assert(0) being 
> around.





[jira] [Commented] (MESOS-8010) FutureTest.After1 is flaky.

2017-09-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179610#comment-16179610
 ] 

Vinod Kone commented on MESOS-8010:
---

cc [~benjaminhindman] [~bmahler]

> FutureTest.After1 is flaky.
> ---
>
> Key: MESOS-8010
> URL: https://issues.apache.org/jira/browse/MESOS-8010
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.5.0
> Environment: Apple LLVM version 9.0.0 (clang-900.0.37)
>Reporter: Till Toenshoff
>  Labels: flaky, flaky-test
>
> The following just happened in a local test build of the current master. I 
> have not yet gone deeper into retrying or raising the log level.
> {noformat}
> [ RUN  ] FutureTest.After1
> PC: @0x10d53b127 process::Future<>::Data::clearAllCallbacks()
> *** SIGSEGV (@0xb0) received by PID 49169 (TID 0x7d88f000) stack trace: 
> ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fb52ce054f0 (unknown)
> @0x10d572be7 process::Future<>::_set<>()
> @0x10d5729d5 process::Future<>::set()
> @0x10d572990 process::Promise<>::_set<>()
> @0x10d572835 process::Promise<>::set()
> @0x10d5727bb AfterTest_Loop_Test::TestBody()::$_4::operator()()
> @0x10d572765 
> _ZZNK7process6FutureINS_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS4_OT_ENUlvE_clEv
> @0x10d57273d 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZNK7process6FutureINS3_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS8_OT_EUlvE_EEEvDpOT_
> @0x10d572659 
> _ZNSt3__110__function6__funcIZNK7process6FutureINS2_11ControlFlowI7NothingEEE9onDiscardIZN19AfterTest_Loop_Test8TestBodyEvE3$_4EERKS7_OT_EUlvE_NS_9allocatorISF_EEFvvEEclEv
> @0x10d51982b std::__1::function<>::operator()()
> @0x10d51d405 
> _ZN7process8internal3runINSt3__18functionIFvvEEEJEEEvRKNS2_6vectorIT_NS2_9allocatorIS7_DpOT0_
> @0x10d537c05 process::Future<>::discard()
> @0x10d5591ff process::internal::Loop<>::run()
> @0x10d569be3 
> _ZZN7process8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS2_8TestBodyEvE3$_37NothingS5_E3runENS_6FutureIS5_EEENKUlRKS8_E_clESA_
> @0x10d56f1ed 
> _ZZNK7process6FutureI7NothingE5onAnyIRZNS_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS6_8TestBodyEvE3$_3S1_S1_E3runES2_EUlRKS2_E_vEESB_OT_NS2_6PreferEENUlSB_E_clESB_
> @0x10d56f1bd 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZNK7process6FutureI7NothingE5onAnyIRZNS3_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNSA_8TestBodyEvE3$_3S5_S5_E3runES6_EUlRKS6_E_vEESF_OT_NS6_6PreferEEUlSF_E_SF_EEEvDpOT_
> @0x10d56ef79 
> _ZNSt3__110__function6__funcIZNK7process6FutureI7NothingE5onAnyIRZNS2_8internal4LoopIZN19AfterTest_Loop_Test8TestBodyEvE3$_2ZNS9_8TestBodyEvE3$_3S4_S4_E3runES5_EUlRKS5_E_vEESE_OT_NS5_6PreferEEUlSE_E_NS_9allocatorISK_EEFvSE_EEclESE_
> @0x10d51918e std::__1::function<>::operator()()
> @0x10d516545 
> _ZN7process8internal3runINSt3__18functionIFvRKNS_6FutureI7NothingEJRS6_EEEvRKNS2_6vectorIT_NS2_9allocatorISD_DpOT0_
> @0x10d51622a process::Future<>::_set<>()
> @0x10d516035 process::Future<>::set()
> @0x10d515ff0 process::Promise<>::_set<>()
> @0x10d515f95 process::Promise<>::set()
> @0x10d515f42 _ZZN7process5afterERK8DurationENKUlvE_clEv
> @0x10d515efd 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process5afterERK8DurationEUlvE_EEEvDpOT_
> @0x10d515c19 
> _ZNSt3__110__function6__funcIZN7process5afterERK8DurationEUlvE_NS_9allocatorIS6_EEFvvEEclEv
> @0x10d51982b std::__1::function<>::operator()()
> @0x10e2a7c19 process::Timer::operator()()
> @0x10e2a762a process::timedout()
> @0x10e390265 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRNS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS6_EJRKNS_12placeholders4__phILi1EESB_EEEvDpOT_
> @0x10e38ff79 
> _ZNSt3__110__function6__funcINS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS5_EJRKNS_12placeholders4__phILi1EENS6_ISI_EESB_EclESA_
> make[6]: *** [check-local] Segmentation fault: 11
> make[5]: *** [check-am] Error 2
> make[4]: *** [check-recursive] Error 1
> make[3]: *** [check] Error 2
> make[2]: *** [check-recursive] Error 1
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}





[jira] [Comment Edited] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179530#comment-16179530
 ] 

Jie Yu edited comment on MESOS-8011 at 9/25/17 6:37 PM:


I haven't seen this assertion failure before. What's the compiler version you 
used?


was (Author: jieyu):
I haven't seen this assertion failure before. 

> Enabling Port mapping generate segfault 
> 
>
> Key: MESOS-8011
> URL: https://issues.apache.org/jira/browse/MESOS-8011
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, network
>Affects Versions: 1.3.0, 1.3.1, 1.4.0
>Reporter: Jean-Baptiste
>  Labels: core, isolation, reliability
>
> h2. Overview
> After a successful build of Mesos in the different versions (1.3.0 / 1.3.1 / 
> 1.4.0 / 1.5.0), I still get stuck with the following segfault when starting 
> the Mesos agent:
> h2. Environment
> * *Debian* Linux 8.7 (Jessie)
> * *Kernel* 4.12 (also tried with 3.16 and 4.9)
> * *Mesos* 1.3.0 (also tried with 1.3.1, 1.4.0 and 1.5.0)
> * *Libnl* 3.2.27-2
> h2. Stack trace
> {code}
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
> InitGoogleLogging() is written to STDERR
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  
> 2717 parse.hpp:97] Specifying an absolute filename to read a command line 
> option out of without using 'file:// is deprecated and will be removed in a 
> future release. Simply adding 'file://' to the beginning of the path should 
> eliminate this warning.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  
> 2717 main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  
> 2717 main.cpp:323] Version: 1.3.1
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  
> 2717 logging.cpp:194] INFO level logging started!
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  
> 2717 systemd.cpp:238] systemd version `215` detected
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  
> 2717 systemd.cpp:246] Required functionality `Delegate` was introduced in 
> Version `218`. Your system may not function properly; however since some 
> distributions have patched systemd packages, your system may still be 
> functional. This is why we keep running. See MESOS-3352 for more information
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  
> 2717 main.cpp:432] Inializing systemd state
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  
> 2717 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  
> 2717 containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', 
> please update your --isolation flag to use 'disk/du'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  
> 2717 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  
> 2717 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer 
> hierarchy for the Linux launcher
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  
> 2717 provisioner.cpp:249] Using default backend 'overlay'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
> ../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
> boost::enable_if::type 
> boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
> Interval; typename 
> boost::enable_if::type = 
> bool]: Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
> (unix time) try "date -d @1506343306" if you are using GNU date ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
> received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 

[jira] [Commented] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179530#comment-16179530
 ] 

Jie Yu commented on MESOS-8011:
---

I haven't seen this assertion failure before. 

> Enabling Port mapping generate segfault 
> 
>
> Key: MESOS-8011
> URL: https://issues.apache.org/jira/browse/MESOS-8011
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, network
>Affects Versions: 1.3.0, 1.3.1, 1.4.0
>Reporter: Jean-Baptiste
>  Labels: core, isolation, reliability
>
> h2. Overview
> After a successful build of Mesos in the different versions (1.3.0 / 1.3.1 / 
> 1.4.0 / 1.5.0), I still get stuck with the following segfault when starting 
> the Mesos agent:
> h2. Environment
> * *Debian* Linux 8.7 (Jessie)
> * *Kernel* 4.12 (also tried with 3.16 and 4.9)
> * *Mesos* 1.3.0 (also tried with 1.3.1, 1.4.0 and 1.5.0)
> * *Libnl* 3.2.27-2
> h2. Stack trace
> {code}
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
> InitGoogleLogging() is written to STDERR
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  
> 2717 parse.hpp:97] Specifying an absolute filename to read a command line 
> option out of without using 'file:// is deprecated and will be removed in a 
> future release. Simply adding 'file://' to the beginning of the path should 
> eliminate this warning.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  
> 2717 main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  
> 2717 main.cpp:323] Version: 1.3.1
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  
> 2717 logging.cpp:194] INFO level logging started!
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  
> 2717 systemd.cpp:238] systemd version `215` detected
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  
> 2717 systemd.cpp:246] Required functionality `Delegate` was introduced in 
> Version `218`. Your system may not function properly; however since some 
> distributions have patched systemd packages, your system may still be 
> functional. This is why we keep running. See MESOS-3352 for more information
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  
> 2717 main.cpp:432] Inializing systemd state
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  
> 2717 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  
> 2717 containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', 
> please update your --isolation flag to use 'disk/du'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  
> 2717 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  
> 2717 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer 
> hierarchy for the Linux launcher
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  
> 2717 provisioner.cpp:249] Using default backend 'overlay'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
> ../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
> boost::enable_if::type 
> boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
> Interval; typename 
> boost::enable_if::type = 
> bool]: Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
> (unix time) try "date -d @1506343306" if you are using GNU date ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
> received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca312 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d124c3 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 

[jira] [Updated] (MESOS-564) Update Contribution Documentation

2017-09-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-564:

Summary: Update Contribution Documentation  (was: Update 'Mesos Developers 
Guide' Contribution Documentation)

> Update Contribution Documentation
> -
>
> Key: MESOS-564
> URL: https://issues.apache.org/jira/browse/MESOS-564
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dave Lester
>Assignee: Greg Mann
>  Labels: documentation, mesosphere
>
> Our contribution guide is currently fairly verbose, and it focuses on the 
> ReviewBoard workflow for making code contributions. It would be helpful for 
> new contributors to have a first-time contribution guide which focuses on 
> using GitHub PRs to make small contributions, since that workflow has a 
> smaller barrier to entry for new users.





[jira] [Updated] (MESOS-564) Update 'Mesos Developers Guide' Contribution Documentation

2017-09-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-564:

 Sprint: Mesosphere Sprint 64
 Labels: documentation mesosphere  (was: twitter)
Description: Our contribution guide is currently fairly verbose, and it 
focuses on the ReviewBoard workflow for making code contributions. It would be 
helpful for new contributors to have a first-time contribution guide which 
focuses on using GitHub PRs to make small contributions, since that workflow 
has a smaller barrier to entry for new users.
Component/s: documentation
 Issue Type: Improvement  (was: Bug)

> Update 'Mesos Developers Guide' Contribution Documentation
> --
>
> Key: MESOS-564
> URL: https://issues.apache.org/jira/browse/MESOS-564
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dave Lester
>Assignee: Greg Mann
>  Labels: documentation, mesosphere
>
> Our contribution guide is currently fairly verbose, and it focuses on the 
> ReviewBoard workflow for making code contributions. It would be helpful for 
> new contributors to have a first-time contribution guide which focuses on 
> using GitHub PRs to make small contributions, since that workflow has a 
> smaller barrier to entry for new users.





[jira] [Assigned] (MESOS-564) Update 'Mesos Developers Guide' Contribution Documentation

2017-09-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-564:
---

Assignee: Greg Mann

> Update 'Mesos Developers Guide' Contribution Documentation
> --
>
> Key: MESOS-564
> URL: https://issues.apache.org/jira/browse/MESOS-564
> Project: Mesos
>  Issue Type: Bug
>Reporter: Dave Lester
>Assignee: Greg Mann
>  Labels: twitter
>






[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-25 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179286#comment-16179286
 ] 

Charles Allen commented on MESOS-7999:
--

Cool, thanks! If that is the way the Mesos community wants to proceed going 
forward, then this can be considered closed.

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. This ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.





[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-25 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179281#comment-16179281
 ] 

James Peach commented on MESOS-7999:


Yes, exactly.

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. This ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.





[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-25 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179278#comment-16179278
 ] 

Charles Allen commented on MESOS-7999:
--

[~jamespeach] Thanks, just to make sure I understand, you are suggesting doing 
something like 
https://github.com/apache/mesos/blob/master/src/slave/metrics.cpp but with an 
anonymous module?

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. This ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.





[jira] [Updated] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8011:
---
Labels: core isolation reliability  (was: core isolation)

> Enabling Port mapping generate segfault 
> 
>
> Key: MESOS-8011
> URL: https://issues.apache.org/jira/browse/MESOS-8011
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, network
>Affects Versions: 1.3.0, 1.3.1, 1.4.0
>Reporter: Jean-Baptiste
>  Labels: core, isolation, reliability
>
> h2. Overview
> After a successful build of Mesos in the different versions (1.3.0 / 1.3.1 / 
> 1.4.0 / 1.5.0), I still get stuck with the following segfault when starting 
> the Mesos agent:
> h2. Environment
> * *Debian* Linux 8.7 (Jessie)
> * *Kernel* 4.12 (also tried with 3.16 and 4.9)
> * *Mesos* 1.3.0 (also tried with 1.3.1, 1.4.0 and 1.5.0)
> * *Libnl* 3.2.27-2
> h2. Stack trace
> {code}
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
> Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
> InitGoogleLogging() is written to STDERR
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  
> 2717 parse.hpp:97] Specifying an absolute filename to read a command line 
> option out of without using 'file:// is deprecated and will be removed in a 
> future release. Simply adding 'file://' to the beginning of the path should 
> eliminate this warning.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  
> 2717 main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  
> 2717 main.cpp:323] Version: 1.3.1
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  
> 2717 logging.cpp:194] INFO level logging started!
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  
> 2717 systemd.cpp:238] systemd version `215` detected
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  
> 2717 systemd.cpp:246] Required functionality `Delegate` was introduced in 
> Version `218`. Your system may not function properly; however since some 
> distributions have patched systemd packages, your system may still be 
> functional. This is why we keep running. See MESOS-3352 for more information
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  
> 2717 main.cpp:432] Inializing systemd state
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  
> 2717 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  
> 2717 containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', 
> please update your --isolation flag to use 'disk/du'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  
> 2717 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  
> 2717 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer 
> hierarchy for the Linux launcher
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  
> 2717 provisioner.cpp:249] Using default backend 'overlay'
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
> ../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
> boost::enable_if::type 
> boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
> Interval; typename 
> boost::enable_if::type = 
> bool]: Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
> (unix time) try "date -d @1506343306" if you are using GNU date ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
> received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca312 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d124c3 
> (unknown)
> Sep 25 12:41:46 ip-10-43-20-218 

[jira] [Created] (MESOS-8013) Add test for blkio statistics

2017-09-25 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-8013:
-

 Summary: Add test for blkio statistics
 Key: MESOS-8013
 URL: https://issues.apache.org/jira/browse/MESOS-8013
 Project: Mesos
  Issue Type: Task
  Components: cgroups
Reporter: Qian Zhang
Assignee: Qian Zhang


In [MESOS-6162|https://issues.apache.org/jira/browse/MESOS-6162], we added 
support for cgroups blkio statistics. In this ticket, we'd like to add a test 
verifying that the cgroups blkio statistics can be correctly retrieved via the 
Mesos containerizer's {{usage()}} method.
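For context, a minimal sketch of the aggregation such a test would check, assuming the kernel's blkio.throttle.io_service_bytes line format ("major:minor Op bytes"). The sample values are invented; a real test would read the container's cgroup file and compare against the containerizer's {{usage()}} output.

```shell
# Parse a blkio.throttle.io_service_bytes-style sample and sum the per-device
# Read/Write byte counters. The sample is illustrative, not from a real host.
sample='8:0 Read 4096
8:0 Write 8192
8:0 Sync 12288
8:0 Async 0
8:0 Total 12288
Total 12288'
read_bytes=$(printf '%s\n' "$sample" | awk '$2 == "Read"  {sum += $3} END {print sum}')
write_bytes=$(printf '%s\n' "$sample" | awk '$2 == "Write" {sum += $3} END {print sum}')
echo "read=${read_bytes} write=${write_bytes}"
```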



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-25 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179000#comment-16179000
 ] 

Andrei Budnik edited comment on MESOS-7500 at 9/25/17 2:18 PM:
---

The issue is caused by recompilation/relinking of an executable by the libtool 
wrapper script. E.g. when we launch `mesos-io-switchboard` for the first time, 
the executable might be missing, so the wrapper script starts to compile/link 
the corresponding executable. On slow machines this compilation takes quite a 
while, hence these tests become flaky.

One possible solution is to pass [\-\-enable-fast-install=no 
(--disable-fast-install)|http://mdcc.cx/pub/autobook/autobook-latest/html/autobook_85.html]
 as the $CONFIGURATION environment variable into the docker helper script.
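As a sketch of that workaround (the variable plumbing below is an assumption; the docker helper script's actual interface may differ), the flag would be exported so that configure picks it up:

```shell
# Hypothetical plumbing: pass --disable-fast-install through the
# CONFIGURATION environment variable, which the docker helper script
# is assumed to forward to ./configure. Names are illustrative.
CONFIGURATION="--disable-fast-install"
export CONFIGURATION
echo "configure will run as: ./configure ${CONFIGURATION}"
```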


was (Author: abudnik):
The issue is caused by recompilation/relinking of an executable by libtool 
wrapper script. E.g. when we launch `mesos-io-switchboard` for the first time, 
executable might be missing, so wrapper script starts to compile/link 
corresponding executable. On slow machines compilation takes quite a while, 
hence these tests become flaky.

One possible solution is to pass 
[--disable-fast-install|http://mdcc.cx/pub/autobook/autobook-latest/html/autobook_85.html]
 as $CONFIGURATION environment variable into docker helper script.

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed runs: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8012) Support Znode paths for masters in the new CLI

2017-09-25 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet reassigned MESOS-8012:
-

Assignee: (was: Armand Grillet)

> Support Znode paths for masters in the new CLI
> --
>
> Key: MESOS-8012
> URL: https://issues.apache.org/jira/browse/MESOS-8012
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Reporter: Kevin Klues
>
> Right now the new Mesos CLI only works in single-master mode with a single 
> master IP and port. We should add support for finding the Mesos leader in HA 
> mode by hitting a set of ZooKeeper instances, similar to how 
> {{mesos-resolve}} works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Jean-Baptiste (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste updated MESOS-8011:
-
Description: 
h2. Overview
After a successful build of Mesos across the different versions (1.3.0 / 1.3.1 
/ 1.4.0 / 1.5.0), I still hit the following segfault when starting the Mesos 
agent:

h2. Environment
* *Debian* Linux 8.7 (Jessie)
* *Kernel* 4.12 (also tried with 3.16 and 4.9)
* *Mesos* 1.3.0 (also tried with 1.3.1, 1.4.0 and 1.5.0)
* *Libnl* 3.2.27-2

h2. Stack trace
{code}
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
InitGoogleLogging() is written to STDERR
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  2717 
parse.hpp:97] Specifying an absolute filename to read a command line option out 
of without using 'file:// is deprecated and will be removed in a future 
release. Simply adding 'file://' to the beginning of the path should eliminate 
this warning.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  2717 
main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  2717 
main.cpp:323] Version: 1.3.1
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  2717 
logging.cpp:194] INFO level logging started!
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  2717 
systemd.cpp:238] systemd version `215` detected
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  2717 
systemd.cpp:246] Required functionality `Delegate` was introduced in Version 
`218`. Your system may not function properly; however since some distributions 
have patched systemd packages, your system may still be functional. This is why 
we keep running. See MESOS-3352 for more information
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  2717 
main.cpp:432] Inializing systemd state
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  2717 
systemd.cpp:326] Started systemd slice `mesos_executors.slice`
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  2717 
containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', please 
update your --isolation flag to use 'disk/du'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  2717 
containerizer.cpp:221] Using isolation: 
cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  2717 
linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  2717 
provisioner.cpp:249] Using default backend 'overlay'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
boost::enable_if::type 
boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
Interval; typename 
boost::enable_if::type = bool]: 
Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
(unix time) try "date -d @1506343306" if you are using GNU date ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca312 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d124c3 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d126a7 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d4d0dc 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d38dc2 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089dbe2c 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089cf201 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708944198 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x557ff33c4e7a 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069bdb45 
(unknown)
Sep 25 

[jira] [Updated] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Jean-Baptiste (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste updated MESOS-8011:
-
Description: 
h2. Overview
After a successful build of Mesos across the different versions (1.3.0 / 1.3.1 
/ 1.4.0 / 1.5.0), I still hit the following segfault when starting the `Mesos` 
agent:

h2. Environment
* *Debian* Linux `8.7` (Jessie)
* *Kernel* `4.12` (also tried with 3.16 and 4.9)
* *Mesos* `1.3.0` (also tried with 1.3.1, 1.4.0 and 1.5.0)
* *Libnl* `3.2.27-2`

h2. Stack trace
{code}
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
InitGoogleLogging() is written to STDERR
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  2717 
parse.hpp:97] Specifying an absolute filename to read a command line option out 
of without using 'file:// is deprecated and will be removed in a future 
release. Simply adding 'file://' to the beginning of the path should eliminate 
this warning.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  2717 
main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  2717 
main.cpp:323] Version: 1.3.1
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  2717 
logging.cpp:194] INFO level logging started!
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  2717 
systemd.cpp:238] systemd version `215` detected
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  2717 
systemd.cpp:246] Required functionality `Delegate` was introduced in Version 
`218`. Your system may not function properly; however since some distributions 
have patched systemd packages, your system may still be functional. This is why 
we keep running. See MESOS-3352 for more information
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  2717 
main.cpp:432] Inializing systemd state
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  2717 
systemd.cpp:326] Started systemd slice `mesos_executors.slice`
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  2717 
containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', please 
update your --isolation flag to use 'disk/du'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  2717 
containerizer.cpp:221] Using isolation: 
cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  2717 
linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  2717 
provisioner.cpp:249] Using default backend 'overlay'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
boost::enable_if::type 
boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
Interval; typename 
boost::enable_if::type = bool]: 
Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
(unix time) try "date -d @1506343306" if you are using GNU date ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca312 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d124c3 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d126a7 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d4d0dc 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d38dc2 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089dbe2c 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089cf201 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708944198 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x557ff33c4e7a 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069bdb45 

[jira] [Updated] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-25 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik updated MESOS-7500:
-
Story Points: 8  (was: 5)

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed runs: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-25 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-7500:


Assignee: Andrei Budnik  (was: Gastón Kleiman)

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed runs: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8012) Support Znode paths for masters in the new CLI

2017-09-25 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-8012:
--

 Summary: Support Znode paths for masters in the new CLI
 Key: MESOS-8012
 URL: https://issues.apache.org/jira/browse/MESOS-8012
 Project: Mesos
  Issue Type: Improvement
  Components: cli
Reporter: Kevin Klues
Assignee: Armand Grillet


Right now the new Mesos CLI only works in single-master mode with a single 
master IP and port. We should add support for finding the Mesos leader in HA 
mode by hitting a set of ZooKeeper instances, similar to how {{mesos-resolve}} 
works.
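A rough sketch of the URL handling involved, assuming the same {{zk://}} scheme that {{mesos-resolve}} accepts; the hosts and znode below are made up:

```shell
# Split a ZooKeeper master URL into its ensemble and znode path; a CLI
# supporting HA mode would then query these hosts for the current leader.
zk_url="zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos"
hosts=${zk_url#zk://}     # strip the scheme
hosts=${hosts%%/*}        # keep only the host:port list
znode=/${zk_url#zk://*/}  # everything after the ensemble
echo "ensemble=${hosts} znode=${znode}"
```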



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-25 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179000#comment-16179000
 ] 

Andrei Budnik commented on MESOS-7500:
--

The issue is caused by recompilation/relinking of an executable by the libtool 
wrapper script. E.g. when we launch `mesos-io-switchboard` for the first time, 
the executable might be missing, so the wrapper script starts to compile/link 
the corresponding executable. On slow machines this compilation takes quite a 
while, hence these tests become flaky.

One possible solution is to pass 
[--disable-fast-install|http://mdcc.cx/pub/autobook/autobook-latest/html/autobook_85.html]
 as the $CONFIGURATION environment variable into the docker helper script.

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed runs: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8011) Enabling Port mapping generate segfault

2017-09-25 Thread Jean-Baptiste (JIRA)
Jean-Baptiste created MESOS-8011:


 Summary: Enabling Port mapping generate segfault 
 Key: MESOS-8011
 URL: https://issues.apache.org/jira/browse/MESOS-8011
 Project: Mesos
  Issue Type: Bug
  Components: agent, network
Affects Versions: 1.4.0, 1.3.1, 1.3.0
Reporter: Jean-Baptiste


h2. Overview
After a successful build of Mesos across the different versions (1.3.0 / 1.3.1 
/ 1.4.0 / 1.5.0), I still hit the following segfault when starting the `Mesos` 
agent:

h2. Environment
* Debian Linux `8.7` (Jessie)
* Kernel `4.12` (also tried with 3.16 and 4.9)
* Mesos `1.3.0` (also tried with 1.3.1, 1.4.0 and 1.5.0)
* Libnl `3.2.27-2`

h2. Stack trace
{code}
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Starting Mesos Slave...
Sep 25 12:41:46 ip-10-43-20-218 systemd[1]: Started Mesos Slave.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: WARNING: Logging before 
InitGoogleLogging() is written to STDERR
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.510066  2717 
parse.hpp:97] Specifying an absolute filename to read a command line option out 
of without using 'file:// is deprecated and will be removed in a future 
release. Simply adding 'file://' to the beginning of the path should eliminate 
this warning.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510259  2717 
main.cpp:322] Build: 2017-09-04 19:29:27 by pbuilder
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.510275  2717 
main.cpp:323] Version: 1.3.1
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.511230  2717 
logging.cpp:194] INFO level logging started!
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517127  2717 
systemd.cpp:238] systemd version `215` detected
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.517174  2717 
systemd.cpp:246] Required functionality `Delegate` was introduced in Version 
`218`. Your system may not function properly; however since some distributions 
have patched systemd packages, your system may still be functional. This is why 
we keep running. See MESOS-3352 for more information
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.517293  2717 
main.cpp:432] Inializing systemd state
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.520074  2717 
systemd.cpp:326] Started systemd slice `mesos_executors.slice`
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: W0925 12:41:46.611994  2717 
containerizer.cpp:189] 'posix/disk' has been renamed as 'disk/du', please 
update your --isolation flag to use 'disk/du'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.612027  2717 
containerizer.cpp:221] Using isolation: 
cgroups/cpu,posix/mem,posix/disk,network/port_mapping,filesystem/posix
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615073  2717 
linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: I0925 12:41:46.615413  2717 
provisioner.cpp:249] Using default backend 'overlay'
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: mesos-slave: 
../3rdparty/boost-1.53.0/boost/icl/concept/interval.hpp:586: typename 
boost::enable_if::type 
boost::icl::non_empty::exclusive_less(const Type&, const Type&) [with Type = 
Interval; typename 
boost::enable_if::type = bool]: 
Assertion `!(icl::is_empty(left) || icl::is_empty(right))' failed.
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** Aborted at 1506343306 
(unix time) try "date -d @1506343306" if you are using GNU date ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: PC: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: *** SIGABRT (@0xa9d) 
received by PID 2717 (TID 0x7f270a0a2800) from PID 2717; stack trace: ***
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2706d56890 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d1067 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069d2448 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca266 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27069ca312 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d124c3 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d126a7 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d4d0dc 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f2708d38dc2 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089dbe2c 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 0x7f27089cf201 
(unknown)
Sep 25 12:41:46 ip-10-43-20-218 mesos-slave[2754]: @ 

[jira] [Commented] (MESOS-7130) port_mapping isolator: executor hangs when running on EC2

2017-09-25 Thread Pierre Cheynier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178853#comment-16178853
 ] 

Pierre Cheynier commented on MESOS-7130:


Interesting feedback. I had no time to pursue that in February; I'll try to 
see if it fixes the issue in my case.

> port_mapping isolator: executor hangs when running on EC2
> -
>
> Key: MESOS-7130
> URL: https://issues.apache.org/jira/browse/MESOS-7130
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Pierre Cheynier
>
> Hi,
> I'm experiencing a weird issue: I'm using a CI to do testing on 
> infrastructure automation.
> I recently activated the {{network/port_mapping}} isolator.
> I'm able to make the changes work and pass the tests for bare-metal servers 
> and VirtualBox VMs using this configuration.
> But when I try on EC2 (on which my CI pipeline relies), it systematically 
> fails to run any container.
> It appears that the sandbox is created and the port_mapping isolator seems to 
> be OK according to the logs in stdout and stderr and the {{tc}} output:
> {noformat}
> + mount --make-rslave /run/netns
> + test -f /proc/sys/net/ipv6/conf/all/disable_ipv6
> + echo 1
> + ip link set lo address 02:44:20:bb:42:cf mtu 9001 up
> + ethtool -K eth0 rx off
> (...)
> + tc filter show dev eth0 parent :0
> + tc filter show dev lo parent :0
> I0215 16:01:13.941375 1 exec.cpp:161] Version: 1.0.2
> {noformat}
> Then the executor never comes back to the REGISTERED state and hangs 
> indefinitely.
> {{GLOG_v=3}} doesn't help here.
> My skills in this area are limited, but after loading the symbols and 
> attaching gdb to the mesos-executor process, I'm able to print this stack:
> {noformat}
> #0  0x7feffc1386d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feffbed69ec in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7ff0003dd8ec in void synchronized_wait std::mutex>(std::condition_variable*, std::mutex*) () from 
> /usr/lib64/libmesos-1.0.2.so
> #3  0x7ff0017d595d in Gate::arrive(long) () from 
> /usr/lib64/libmesos-1.0.2.so
> #4  0x7ff0017c00ed in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.0.2.so
> #5  0x7ff0017c5c05 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.0.2.so
> #6  0x004ab26f in process::wait(process::ProcessBase const*, Duration 
> const&) ()
> #7  0x004a3903 in main ()
> {noformat}
> I concluded that the underlying shell script launched by the isolator, or 
> the task itself, is just blocked. But I don't understand why.
> Here is a process tree showing that I have no task running but the executor 
> is:
> {noformat}
> root 28420  0.8  3.0 1061420 124940 ?  Ssl  17:56   0:25 
> /usr/sbin/mesos-slave --advertise_ip=127.0.0.1 
> --attributes=platform:centos;platform_major_version:7;type:base 
> --cgroups_enable_cfs --cgroups_hierarchy=/sys/fs/cgroup 
> --cgroups_net_cls_primary_handle=0xC370 
> --container_logger=org_apache_mesos_LogrotateContainerLogger 
> --containerizers=mesos,docker 
> --credential=file:///etc/mesos-chef/slave-credential 
> --default_container_info={"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"}]}
>  --default_role=default --docker_registry=/usr/share/mesos/users 
> --docker_store_dir=/var/opt/mesos/store/docker 
> --egress_unique_flow_per_container --enforce_container_disk_quota 
> --ephemeral_ports_per_container=128 
> --executor_environment_variables={"PATH":"/bin:/usr/bin:/usr/sbin","CRITEO_DC":"par","CRITEO_ENV":"prod"}
>  --image_providers=docker --image_provisioner_backend=copy 
> --isolation=cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,disk/du,filesystem/shared,filesystem/linux,docker/runtime,network/cni,network/port_mapping
>  --logging_level=INFO 
> --master=zk://mesos:test@localhost.localdomain:2181/mesos 
> --modules=file:///etc/mesos-chef/slave-modules.json --port=5051 
> --recover=reconnect 
> --resources=ports:[31000-32000];ephemeral_ports:[32768-57344] --strict 
> --work_dir=/var/opt/mesos
> root 28484  0.0  2.3 433676 95016 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be1c-0242b44d071f/runs/1d3e6b1c-cda8-47e5-92c4-a161429a7ac6/stdout
>  --logrotate_options=rotate 5 --logrotate_path=logrotate --max_size=10MB
> root 28485  0.0  2.3 499212 94724 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
>