[jira] [Commented] (MESOS-5400) Add preliminary support for parsing ELF files in stout.

2016-06-16 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335449#comment-15335449
 ] 

Kevin Klues commented on MESOS-5400:


Need to patch up stout/Makefile.am to fix a regression introduced by the previous patch set:
https://reviews.apache.org/r/48838/

> Add preliminary support for parsing ELF files in stout.
> ---
>
> Key: MESOS-5400
> URL: https://issues.apache.org/jira/browse/MESOS-5400
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
> Fix For: 1.0.0
>
>
> The upcoming Nvidia GPU support for docker containers in Mesos relies on 
> consolidating all Nvidia shared libraries into a common location that can be 
> injected into a container as a volume.
> As part of this, we need some preliminary parsing capabilities for ELF files 
> to infer things about each shared library we are consolidating.
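
For illustration, a minimal standalone sketch of what such preliminary ELF
parsing can look like (plain {{<elf.h>}}, not stout's actual API): inspect
{{e_ident}} to tell whether a shared library is a 32- or 64-bit ELF object.
{code}
// Minimal sketch (not stout's actual API): read the ELF identification
// bytes and report the file class. Assumes a Linux host with <elf.h>.
#include <elf.h>
#include <fstream>
#include <iostream>

int main(int argc, char** argv)
{
  if (argc != 2) {
    std::cerr << "Usage: " << argv[0] << " <file>" << std::endl;
    return 1;
  }

  std::ifstream file(argv[1], std::ios::binary);
  unsigned char ident[EI_NIDENT];
  file.read(reinterpret_cast<char*>(ident), EI_NIDENT);

  // Check the four ELF magic bytes before trusting any other field.
  if (!file || ident[EI_MAG0] != ELFMAG0 || ident[EI_MAG1] != ELFMAG1 ||
      ident[EI_MAG2] != ELFMAG2 || ident[EI_MAG3] != ELFMAG3) {
    std::cerr << "Not an ELF file" << std::endl;
    return 1;
  }

  std::cout << (ident[EI_CLASS] == ELFCLASS64 ? "ELF64" : "ELF32")
            << std::endl;
  return 0;
}
{code}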



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-5629:
---
Description: 
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
(unix time) try "date -d @1466097149" if you are using GNU date ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
(unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
process::dispatch<>()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
_ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
mesos::internal::FilesProcess::authorize()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
mesos::internal::FilesProcess::browse()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
std::_Function_handler<>::_M_invoke()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
process::ProcessManager::resume()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
start_thread
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
process exited, code=killed, status=11/SEGV
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered 
failed state.
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time 
over, scheduling restart.
{code}

In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
observed this a number of times coming from {{browse()}}, and twice from 
{{read()}}.

The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
[this|https://reviews.apache.org/r/48563/] and 
[this|https://reviews.apache.org/r/48566/], which were done to repair a 
different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] on 
the master and agent.

Thanks go to [~bmahler] for digging into this a bit and discovering a possible 
cause 
[here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745],
 where use of {{defer()}} may be necessary to keep execution in the correct 
context.

  was:
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 

[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5629:
-
Assignee: Joerg Schad  (was: Greg Mann)

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745],
>  where use of {{defer()}} may be necessary to keep execution in the correct 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335383#comment-15335383
 ] 

Greg Mann edited comment on MESOS-5629 at 6/17/16 4:37 AM:
---

I was able to reproduce this on my local machine; the segfault occurs when a 
request for a particular path comes in to {{/files/browse}} *while* that path 
is being garbage-collected by the agent. This leads to {{FilesProcess::authorize()}} 
attempting to call an authorization callback which has just been removed from 
its {{authorizations}} map. We could avoid such races by changing 
[this|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745]
 and 
[this|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5939-L5946]
 to be deferred dispatches to the {{FilesProcess}} (i.e., {{authorize = 
defer(files, []() ... )}}).
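
A minimal sketch of that pattern, assuming libprocess ('files' stands in for
the {{FilesProcess}} handle and 'authorizeSandboxAccess' is a hypothetical
helper; this is not the actual slave.cpp code):
{code}
// Before: a plain callback that may be invoked from any execution context,
// racing with mutations of FilesProcess's 'authorizations' map:
std::function<process::Future<bool>(const Option<std::string>&)> authorize =
  [](const Option<std::string>& principal) {
    return authorizeSandboxAccess(principal);  // hypothetical helper
  };

// After: defer() wraps the callback in a dispatch onto the target process,
// so it runs serially within that process's context and cannot race with
// the map being mutated:
authorize = process::defer(files, [](const Option<std::string>& principal) {
  return authorizeSandboxAccess(principal);
});
{code}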

To reproduce, I:
1) Brought the hard disk usage to over 90%, to force GC
2) Launched a master and agent
3) Ran the test-framework once
4) Hit 'agent.host/files/debug' to get a valid path
5) Ran the attached script, 'test-browse.py', to rapidly send requests to 
'agent.host/files/browse' for the valid path
6) While 'test-browse.py' was running, repeatedly ran the test-framework until 
the sandbox from its first run was GC'd. At that point, the segfault is likely 
to occur.

Here's the stack trace I generated:
{code}
E0616 20:58:36.976708 183517184 process.cpp:2040] Failed to shutdown socket 
with fd 18: Socket is not connected
I0616 20:58:36.976701 179761152 slave.cpp:3783] executor(1)@127.0.0.1:51568 
exited
E0616 20:58:36.976786 183517184 process.cpp:2040] Failed to shutdown socket 
with fd 10: Socket is not connected
E0616 20:58:36.982058 183517184 process.cpp:2040] Failed to shutdown socket 
with fd 13: Socket is not connected
E0616 20:58:36.985937 183517184 process.cpp:2040] Failed to shutdown socket 
with fd 11: Socket is not connected
*** Aborted at 1466135916 (unix time) try "date -d @1466135916" if you are 
using GNU date ***
PC: @0x102658c48 process::PID<>::PID()
*** SIGSEGV (@0x0) received by PID 21788 (TID 0x10adfe000) stack trace: ***
@ 0x7fff90b6df1a _sigtramp
@0x10203c8ad std::__1::char_traits<>::compare()
@0x102658bcd process::PID<>::PID()
@0x1025de5eb process::Process<>::self()
@0x1033b6ecd process::dispatch<>()
@0x10323f7f0 
mesos::internal::slave::Framework::launchExecutor()::$_12::operator()()
@0x10323f714 
_ZNSt3__128__invoke_void_return_wrapperIN7process6FutureIbEEE6__callIJRZN5mesos8internal5slave9Framework14launchExecutorERKNS6_12ExecutorInfoERKNS6_8TaskInfoEE4$_12RK6OptionINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcS3_DpOT_
@0x10323f4a3 std::__1::__function::__func<>::operator()()
@0x1024dc667 std::__1::function<>::operator()()
@0x1024c365a mesos::internal::FilesProcess::authorize()
@0x1024bfe70 mesos::internal::FilesProcess::browse()
@0x1024eb297 
_ZNSt3__128__invoke_void_return_wrapperIN7process6FutureINS1_4http8Response6__callIJRNS_6__bindIRMN5mesos8internal12FilesProcessEFS5_RKNS3_7RequestERK6OptionINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEJPSB_RNS_12placeholders4__phILi1EEERNSU_ILi2EESE_SO_EEES5_DpOT_
@0x1024eae13 
_ZNSt3__110__function6__funcINS_6__bindIRMN5mesos8internal12FilesProcessEFN7process6FutureINS6_4http8ResponseEEERKNS8_7RequestERK6OptionINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEJPS5_RNS_12placeholders4__phILi1EEERNST_ILi2EENSI_ISY_EEFSA_SD_SN_EEclESD_SN_
@0x104cc4342 std::__1::function<>::operator()()
@0x104beb674 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENK3$_3clERKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultENKUlRKNS5_IbEEE_clESG_
@0x104beeae1 
_ZZZNK7process9_DeferredIZZNS_11ProcessBase5visitERKNS_9HttpEventEENK3$_3clERKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultEUlRKNS6_IbEEE_EcvNSt3__18functionIFvT_EEEISH_EEvENKUlSH_E_clESH_ENKUlvE_clEv
@0x104beeaad 
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZZNK7process9_DeferredIZZNS3_11ProcessBase5visitERKNS3_9HttpEventEENK3$_3clERKNS3_6FutureI6OptionINS3_4http14authentication20AuthenticationResultEUlRKNSA_IbEEE_EcvNS_8functionIFvT_EEEISL_EEvENKUlSL_E_clESL_EUlvE_EEEvDpOT_
@0x104bee7ec 
_ZNSt3__110__function6__funcIZZNK7process9_DeferredIZZNS2_11ProcessBase5visitERKNS2_9HttpEventEENK3$_3clERKNS2_6FutureI6OptionINS2_4http14authentication20AuthenticationResultEUlRKNS9_IbEEE_EcvNS_8functionIFvT_EEEISK_EEvENKUlSK_E_clESK_EUlvE_NS_9allocatorIST_EEFvvEEclEv
@0x10208a7d1 std::__1::function<>::operator()()
@0x1020ea359 
_ZZN7process8dispatchERKNS_4UPIDERKNSt3__18functionIFvvNKUlPNS_11ProcessBaseEE_clESA_
@

[jira] [Commented] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335383#comment-15335383
 ] 

Greg Mann commented on MESOS-5629:
--

I was able to reproduce this on my local machine; the segfault occurs when a 
request for a particular path comes in to {{/files/browse}} *while* that path 
is being garbage-collected by the agent. This leads to {{FilesProcess::authorize()}} 
attempting to call an authorization callback which has just been removed from 
its {{authorizations}} map. We could avoid such races by changing 
[this|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745]
 and 
[this|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5939-L5946]
 to be deferred dispatches to the {{FilesProcess}} (i.e., {{authorize = 
defer(files, []() ... )}}).

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with 

[jira] [Commented] (MESOS-5619) Add task_num to mesos-execute

2016-06-16 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335358#comment-15335358
 ] 

Klaus Ma commented on MESOS-5619:
-

It's {{mesos-execute}} (the command scheduler). I've updated the ticket info accordingly.

> Add task_num to mesos-execute
> -
>
> Key: MESOS-5619
> URL: https://issues.apache.org/jira/browse/MESOS-5619
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> According to the current code, {{mesos-execute}} will only launch one task. 
> It's better to add a parameter to specify how many tasks to launch.
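
A hedged sketch of what such a parameter could look like using stout's flags
(the surrounding {{Flags}} class and {{main()}} are illustrative assumptions;
only the idea of a {{task_num}} flag defaulting to one task comes from this
ticket):
{code}
#include <iostream>

#include <stout/flags.hpp>

struct Flags : public virtual flags::FlagsBase
{
  Flags()
  {
    // Hypothetical flag: how many copies of the task to launch.
    add(&Flags::task_num,
        "task_num",
        "Number of tasks to launch",
        1);
  }

  int task_num;
};

int main(int argc, char** argv)
{
  Flags flags;

  auto load = flags.load("MESOS_", argc, argv);
  if (load.isError()) {
    std::cerr << load.error() << std::endl;
    return 1;
  }

  // mesos-execute would then launch 'flags.task_num' tasks in a loop
  // instead of exactly one.
  std::cout << "Would launch " << flags.task_num << " task(s)" << std::endl;
  return 0;
}
{code}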



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5619) Add task_num to mesos-execute

2016-06-16 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-5619:

Summary: Add task_num to mesos-execute  (was: Add task_num to 
mesos-executor)

> Add task_num to mesos-execute
> -
>
> Key: MESOS-5619
> URL: https://issues.apache.org/jira/browse/MESOS-5619
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> According to the current code, {{mesos-executor}} will only launch one task. 
> It's better to add a parameter to specify how many tasks to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5619) Add task_num to mesos-execute

2016-06-16 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-5619:

Description: According to the current code, {{mesos-execute}} will only launch 
one task. It's better to add a parameter to specify how many tasks to launch.  
(was: According to the current code, {{mesos-executor}} will only launch one 
task. It's better to add a parameter to specify how many tasks to launch.)

> Add task_num to mesos-execute
> -
>
> Key: MESOS-5619
> URL: https://issues.apache.org/jira/browse/MESOS-5619
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> According to the current code, {{mesos-execute}} will only launch one task. 
> It's better to add a parameter to specify how many tasks to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335219#comment-15335219
 ] 

Greg Mann commented on MESOS-5629:
--

Sorry [~haosd...@gmail.com]!! Wrong link; should be fixed now :-)

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745],
>  where use of {{defer()}} may be necessary to keep execution in the correct 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5629:
-
Sprint: Mesosphere Sprint 37

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745],
>  where use of {{defer()}} may be necessary to keep execution in the correct 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5629:
-
Description: 
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
(unix time) try "date -d @1466097149" if you are using GNU date ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
(unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
process::dispatch<>()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
_ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
mesos::internal::FilesProcess::authorize()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
mesos::internal::FilesProcess::browse()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
std::_Function_handler<>::_M_invoke()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
process::ProcessManager::resume()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
start_thread
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
process exited, code=killed, status=11/SEGV
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered 
failed state.
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time 
over, scheduling restart.
{code}

In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
observed this a number of times coming from {{browse()}}, and twice from 
{{read()}}.

The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
[this|https://reviews.apache.org/r/48563/] and 
[this|https://reviews.apache.org/r/48566/], which were done to repair a 
different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] on 
the master.

Thanks go to [~bmahler] for digging into this a bit and discovering a possible 
cause 
[here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745],
 where use of {{defer()}} may be necessary to keep execution in the correct 
context.

  was:
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 

[jira] [Commented] (MESOS-4248) mesos slave can't start in CentOS-7 docker container

2016-06-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335211#comment-15335211
 ] 

haosdent commented on MESOS-4248:
-

cc [~jvenus]

> mesos slave can't start in CentOS-7 docker container
> 
>
> Key: MESOS-4248
> URL: https://issues.apache.org/jira/browse/MESOS-4248
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.26.0
> Environment: My host OS is Debian Jessie,  the container OS is CentOS 
> 7.2.
> {code}
> # cat /etc/system-release
> CentOS Linux release 7.2.1511 (Core) 
> # rpm -qa |grep mesos
> mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64
> mesosphere-el-repo-7-1.noarch
> mesos-0.26.0-0.2.145.centos701406.x86_64
> $ docker version
> Client:
>  Version:  1.9.1
>  API version:  1.21
>  Go version:   go1.4.2
>  Git commit:   a34a1d5
>  Built:Fri Nov 20 12:59:02 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.9.1
>  API version:  1.21
>  Go version:   go1.4.2
>  Git commit:   a34a1d5
>  Built:Fri Nov 20 12:59:02 UTC 2015
>  OS/Arch:  linux/amd64
> {code}
>Reporter: Yubao Liu
>
> // See the "Environment" field above for the relevant software versions.
> "systemctl start mesos-slave" can't start mesos-slave:
> {code}
> # journalctl -u mesos-slave
> 
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Started Mesos Slave.
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Starting Mesos Slave...
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210180 12838 
> logging.cpp:172] INFO level logging started!
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210603 12838 
> main.cpp:190] Build: 2015-12-16 23:06:16 by root
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210625 12838 
> main.cpp:192] Version: 0.26.0
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210634 12838 
> main.cpp:195] Git tag: 0.26.0
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210644 12838 
> main.cpp:199] Git SHA: d3717e5c4d1bf4fca5c41cd7ea54fae489028faa
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210765 12838 
> containerizer.cpp:142] Using isolation: posix/cpu,posix/mem,filesystem/posix
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.215638 12838 
> linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.220279 12838 
> systemd.cpp:128] systemd version `219` detected
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.227017 12838 
> systemd.cpp:210] Started systemd slice `mesos_executors.slice`
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: Failed to create a 
> containerizer: Could not create MesosContainerizer: Failed to create 
> launcher: Failed to locate systemd cgroups hierarchy: does not exist
> Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service: main process 
> exited, code=exited, status=1/FAILURE
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Unit mesos-slave.service entered 
> failed state.
> Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service failed.
> {code}
> I used strace to debug it, mesos-slave tried to access 
> "/sys/fs/cgroup/systemd/mesos_executors.slice",  but it's actually at 
> "/sys/fs/cgroup/systemd/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope/mesos_executors.slice/",
>mesos-slave should check "/proc/self/cgroup" to find those intermediate 
> directories:
> {code}
> # cat /proc/self/cgroup 
> 8:perf_event:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 7:blkio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 6:net_cls,net_prio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 5:freezer:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 4:devices:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 3:cpu,cpuacct:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 2:cpuset:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 1:name=systemd:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> {code}
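
For illustration, a standalone sketch of the suggested lookup (not Mesos
code): parse {{/proc/self/cgroup}} and recover the systemd hierarchy path
that contains the intermediate directories.
{code}
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Return the path component of the "name=systemd" hierarchy from
// /proc/self/cgroup, e.g. "/system.slice/docker-<id>.scope" when
// running inside a Docker container.
std::string systemdCgroupPath()
{
  std::ifstream cgroups("/proc/self/cgroup");
  std::string line;
  while (std::getline(cgroups, line)) {
    // Each line has the form "<id>:<subsystems>:<path>".
    std::istringstream fields(line);
    std::string id, subsystems, path;
    std::getline(fields, id, ':');
    std::getline(fields, subsystems, ':');
    std::getline(fields, path);
    if (subsystems == "name=systemd") {
      return path;
    }
  }
  return "/";  // Fall back to the hierarchy root.
}

int main()
{
  std::cout << "mesos_executors.slice lives under: /sys/fs/cgroup/systemd"
            << systemdCgroupPath() << std::endl;
  return 0;
}
{code}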



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335195#comment-15335195
 ] 

haosdent commented on MESOS-5629:
-

Hi [~greggomann], I could not open 
https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712],
>  where use of {{defer()}} may be necessary to keep execution in the correct 

[jira] [Commented] (MESOS-5503) Implement GET_MAINTENANCE_STATUS Call in v1 master API.

2016-06-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335162#comment-15335162
 ] 

haosdent commented on MESOS-5503:
-

[~vinodkone] Thanks a lot for your help. I've just rebased the chain. Could 
you review it again? Thank you in advance.

> Implement GET_MAINTENANCE_STATUS Call in v1 master API.
> ---
>
> Key: MESOS-5503
> URL: https://issues.apache.org/jira/browse/MESOS-5503
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4248) mesos slave can't start in CentOS-7 docker container

2016-06-16 Thread Justin Venus (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335119#comment-15335119
 ] 

Justin Venus commented on MESOS-4248:
-

I can reproduce this issue on CoreOS and CentOS 7 hosts.

The container image I use is centos7, with systemd supervising the 
mesos-slave process.
The versions I've observed this issue with are Mesos 0.25.0 and 0.26.0.

The patch above works for 0.26.0.
I'm about to upgrade to 0.27.2 and will report back.

Also: I have run a mesos-slave container on Ubuntu 12.04 with this patch and 
have observed no negative side effects.

Here is part of my build setup that could be used to reproduce the issue; 
you'll need to remove some stuff from the Dockerfile:
https://gist.github.com/anonymous/66fc10436c2c381c2aea6dda122ce205

> mesos slave can't start in CentOS-7 docker container
> 
>
> Key: MESOS-4248
> URL: https://issues.apache.org/jira/browse/MESOS-4248
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.26.0
> Environment: My host OS is Debian Jessie,  the container OS is CentOS 
> 7.2.
> {code}
> # cat /etc/system-release
> CentOS Linux release 7.2.1511 (Core) 
> # rpm -qa |grep mesos
> mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64
> mesosphere-el-repo-7-1.noarch
> mesos-0.26.0-0.2.145.centos701406.x86_64
> $ docker version
> Client:
>  Version:  1.9.1
>  API version:  1.21
>  Go version:   go1.4.2
>  Git commit:   a34a1d5
>  Built:Fri Nov 20 12:59:02 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.9.1
>  API version:  1.21
>  Go version:   go1.4.2
>  Git commit:   a34a1d5
>  Built:Fri Nov 20 12:59:02 UTC 2015
>  OS/Arch:  linux/amd64
> {code}
>Reporter: Yubao Liu
>
> // See the "Environment" field above for the relevant software versions.
> "systemctl start mesos-slave" can't start mesos-slave:
> {code}
> # journalctl -u mesos-slave
> 
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Started Mesos Slave.
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Starting Mesos Slave...
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210180 12838 
> logging.cpp:172] INFO level logging started!
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210603 12838 
> main.cpp:190] Build: 2015-12-16 23:06:16 by root
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210625 12838 
> main.cpp:192] Version: 0.26.0
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210634 12838 
> main.cpp:195] Git tag: 0.26.0
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210644 12838 
> main.cpp:199] Git SHA: d3717e5c4d1bf4fca5c41cd7ea54fae489028faa
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210765 12838 
> containerizer.cpp:142] Using isolation: posix/cpu,posix/mem,filesystem/posix
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.215638 12838 
> linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.220279 12838 
> systemd.cpp:128] systemd version `219` detected
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.227017 12838 
> systemd.cpp:210] Started systemd slice `mesos_executors.slice`
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: Failed to create a 
> containerizer: Could not create MesosContainerizer: Failed to create 
> launcher: Failed to locate systemd cgroups hierarchy: does not exist
> Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service: main process 
> exited, code=exited, status=1/FAILURE
> Dec 24 10:35:25 mesos-slave1 systemd[1]: Unit mesos-slave.service entered 
> failed state.
> Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service failed.
> {code}
> I used strace to debug it, mesos-slave tried to access 
> "/sys/fs/cgroup/systemd/mesos_executors.slice",  but it's actually at 
> "/sys/fs/cgroup/systemd/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope/mesos_executors.slice/",
>mesos-slave should check "/proc/self/cgroup" to find those intermediate 
> directories:
> {code}
> # cat /proc/self/cgroup 
> 8:perf_event:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 7:blkio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 6:net_cls,net_prio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 5:freezer:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 4:devices:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
> 

[jira] [Comment Edited] (MESOS-5628) `QuotaHandler` should only make one authorization request to the authorizer.

2016-06-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334837#comment-15334837
 ] 

Alexander Rukletsov edited comment on MESOS-5628 at 6/17/16 12:37 AM:
--

https://reviews.apache.org/r/48039
https://reviews.apache.org/r/48038/
https://reviews.apache.org/r/48040/
https://reviews.apache.org/r/48824/


was (Author: mcypark):
https://reviews.apache.org/r/48039
https://reviews.apache.org/r/48038/
https://reviews.apache.org/r/48040/

> `QuotaHandler` should only make one authorization request to the authorizer.
> 
>
> Key: MESOS-5628
> URL: https://issues.apache.org/jira/browse/MESOS-5628
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> Currently, the {{QuotaHandler}} makes two authorization requests to the 
> authorizer for a single action (for example, {{SetQuota}} and 
> {{UpdateQuota}}). It then uses the following loop to determine its behavior.
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (!authorized) {
> return Forbidden();
>   }
> }
> return _set(quotaInfo, forced);
> {code}
> This depends on the fact that {{LocalAuthorizer::authorized}} returns 
> {{true}} when it receives a request it does not support. Since an answer of 
> {{true}} from {{authorized}} means the action is permitted, this is clearly 
> incorrect. In general, this kind of global invariant is difficult to keep in 
> sync and correct.
> Another issue is that a seemingly innocent transformation of this loop would 
> break the logic:
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (authorized) {
> return _set(quotaInfo, forced);
>   }
> }
> return Forbidden();
> {code}
> Attempting to make multiple requests to the authorizer for an action and 
> trying to combine the results is complicated. It would be much simpler to 
> make one request per action.
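
A sketch of the one-request-per-action shape (a fragment; the action constant
and surrounding names are illustrative assumptions, not the actual
{{QuotaHandler}} patch):
{code}
// Build a single authorization::Request for the action and branch on the
// single result, instead of combining several answers from the authorizer.
authorization::Request request;
request.set_action(authorization::SET_QUOTA_WITH_ROLE);  // illustrative
if (principal.isSome()) {
  request.mutable_subject()->set_value(principal.get());
}
request.mutable_object()->set_value(quotaInfo.role());

// One result to branch on: no loop over multiple answers, and no reliance
// on how the authorizer answers requests it does not support.
return authorizer->authorized(request)
  .then([=](bool authorized) -> process::Future<process::http::Response> {
    if (!authorized) {
      return process::http::Forbidden();
    }
    return _set(quotaInfo, forced);
  });
{code}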



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5629:
--
Labels: authorization mesosphere security  (was: authorization mesosphere)

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere, security
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712],
>  where use of {{defer()}} may be necessary to keep execution in the correct 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-5629:


Assignee: Greg Mann

> Agent segfaults after request to '/files/browse'
> 
>
> Key: MESOS-5629
> URL: https://issues.apache.org/jira/browse/MESOS-5629
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: authorization, mesosphere
> Fix For: 1.0.0
>
>
> We observed a number of agent segfaults today on an internal testing cluster. 
> Here is a log excerpt:
> {code}
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
> status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
> e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
> status_update_manager.cpp:824] Checkpointing ACK for status update 
> TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
> datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
> 6d4248cd-2832-4152-b5d0-defbf36f6759-
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
> http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
> (unix time) try "date -d @1466097149" if you are using GNU date ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
> by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
> process::dispatch<>()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
> _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
> mesos::internal::FilesProcess::authorize()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
> mesos::internal::FilesProcess::browse()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
> std::_Function_handler<>::_M_invoke()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
> _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
> process::ProcessManager::resume()
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 
> (unknown)
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
> start_thread
> Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
> process exited, code=killed, status=11/SEGV
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service 
> entered failed state.
> Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
> Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff 
> time over, scheduling restart.
> {code}
> In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
> observed this a number of times coming from {{browse()}}, and twice from 
> {{read()}}.
> The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
> [this|https://reviews.apache.org/r/48563/] and 
> [this|https://reviews.apache.org/r/48566/], which were done to repair a 
> different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] 
> on the master.
> Thanks go to [~bmahler] for digging into this a bit and discovering a 
> possible cause 
> [here|https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712],
>  where use of {{defer()}} may be necessary to keep execution in the correct 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5629:
-
Description: 
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
(unix time) try "date -d @1466097149" if you are using GNU date ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
(unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
process::dispatch<>()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
_ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
mesos::internal::FilesProcess::authorize()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
mesos::internal::FilesProcess::browse()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
std::_Function_handler<>::_M_invoke()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
process::ProcessManager::resume()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
start_thread
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
process exited, code=killed, status=11/SEGV
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered 
failed state.
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time 
over, scheduling restart.
{code}

In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
observed this a number of times coming from {{browse()}}, and twice from 
{{read()}}.

The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
[this|https://reviews.apache.org/r/48563/] and 
[this|https://reviews.apache.org/r/48566/], which were done to repair a 
different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] on 
the master.

Thanks go to [~bmahler] for digging into this a bit and discovering a possible 
cause 
[here](https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712),
 where use of {{defer()}} may be necessary to keep execution in the correct 
context.

  was:
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 

[jira] [Updated] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5629:
-
Description: 
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
(unix time) try "date -d @1466097149" if you are using GNU date ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
(unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
process::dispatch<>()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
_ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
mesos::internal::FilesProcess::authorize()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
mesos::internal::FilesProcess::browse()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
std::_Function_handler<>::_M_invoke()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
process::ProcessManager::resume()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
start_thread
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
process exited, code=killed, status=11/SEGV
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered 
failed state.
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time 
over, scheduling restart.
{code}

In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
observed this a number of times coming from {{browse()}}, and twice from 
{{read()}}.

The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: 
[this|https://reviews.apache.org/r/48563/] and 
[this|https://reviews.apache.org/r/48566/], which were done to repair a 
different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] on 
the master.

Thanks go to [~bmahler] for digging into this a bit and discovering a possible 
cause 
[here|https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712],
 where use of {{defer()}} may be necessary to keep execution in the correct 
context.

  was:
We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 

[jira] [Created] (MESOS-5629) Agent segfaults after request to '/files/browse'

2016-06-16 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5629:


 Summary: Agent segfaults after request to '/files/browse'
 Key: MESOS-5629
 URL: https://issues.apache.org/jira/browse/MESOS-5629
 Project: Mesos
  Issue Type: Bug
 Environment: CentOS 7, Mesos 1.0.0-rc1 with patches
Reporter: Greg Mann
Priority: Blocker
 Fix For: 1.0.0


We observed a number of agent segfaults today on an internal testing cluster. 
Here is a log excerpt:
{code}
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 
status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING 
(UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task 
datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 
6d4248cd-2832-4152-b5d0-defbf36f6759-
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 
http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 
(unix time) try "date -d @1466097149" if you are using GNU date ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 
(unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: ***
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 
process::dispatch<>()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 
_ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 
mesos::internal::FilesProcess::authorize()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea 
mesos::internal::FilesProcess::browse()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 
std::_Function_handler<>::_M_invoke()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb 
_ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 
process::ProcessManager::resume()
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown)
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 
start_thread
Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main 
process exited, code=killed, status=11/SEGV
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered 
failed state.
Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed.
Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time 
over, scheduling restart.
{code}

In every case, the stack trace indicates one of the {{/files/*}} endpoints; I 
observed this a number of times coming from {{browse()}}, and twice from 
{{read()}}.

Thanks go to [~bmahler] for digging into this a bit and discovering a possible 
cause 
[here](https://github.com/mesosphere/mesos-private/blob/greg/1.0-w-fixes/src/slave/slave.cpp#L5704-L5712),
 where use of {{defer()}} may be necessary to keep execution in the correct 
context.
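
For illustration, here is a minimal libprocess sketch of the difference
{{defer()}} makes (the names are illustrative, not the actual slave code):

{code}
#include <process/defer.hpp>
#include <process/future.hpp>
#include <process/process.hpp>

using process::Future;
using process::Process;
using process::defer;

class ExampleProcess : public Process<ExampleProcess>
{
public:
  void watch(const Future<bool>& future)
  {
    // Unsafe: the continuation runs on whatever thread satisfies the
    // future, so `this` may be accessed outside the actor's context:
    //   future.then([this](bool b) { return handle(b); });

    // Safe: defer() re-dispatches the continuation onto this process,
    // so it executes serialized within the actor's execution context.
    future.then(defer(self(), [this](bool b) { return handle(b); }));
  }

private:
  bool handle(bool b) { return b; }
};
{code}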



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334888#comment-15334888
 ] 

Joerg Schad commented on MESOS-5588:


The problem is that those fields are optional, i.e., we get a valid protobuf 
which passes validation.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> During parsing of the acls by the authorizer, errors are ignored. This can 
> lead to undetected security issues.
> Consider the following acl with a typo (usr instead of user)
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags it will interpret the acl in the 
> following way, which gives any principal access to any framework.
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5628) `QuotaHandler` should only make one authorization request to the authorizer.

2016-06-16 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-5628:

Fix Version/s: 1.0.0

> `QuotaHandler` should only make one authorization request to the authorizer.
> 
>
> Key: MESOS-5628
> URL: https://issues.apache.org/jira/browse/MESOS-5628
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> Currently, the {{QuotaHandler}} makes two authorization requests to the 
> authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
> following loop to determine its behavior.
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (!authorized) {
> return Forbidden();
>   }
> }
> return _set(quotaInfo, forced);
> {code}
> This depends on the fact that {{LocalAuthorizer::authorized}} returns 
> {{true}} when it receives a request it does not support. Considering that 
> {{true}} as an answer to {{authorized}} means authorized, this is clearly 
> incorrect. In general, this type of global invariant is difficult to keep in 
> sync and correct.
> Another issue is that a seemingly innocent transformation of this loop would 
> break the logic:
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (authorized) {
> return _set(quotaInfo, forced);
>   }
> }
> return Forbidden();
> {code}
> Attempting to make multiple requests to the authorizer for an action and 
> trying to combine the results is complicated. It would be much simpler to 
> make one request per action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5257) Add autodiscovery for GPU resources

2016-06-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-5257:
---
Fix Version/s: 1.0.0

> Add autodiscovery for GPU resources
> ---
>
> Key: MESOS-5257
> URL: https://issues.apache.org/jira/browse/MESOS-5257
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: isolator
> Fix For: 1.0.0
>
>
> Right now, the only way to enumerate the available GPUs on an agent is to use 
> the `--nvidia_gpu_devices` flag and explicitly list them out.  Instead, we 
> should leverage NVML to autodiscover the GPUs that are available and only use 
> this flag as a way to explicitly list out the GPUs you want to make available 
> in order to restrict access to some of them.
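
For reference, a minimal NVML enumeration sketch (an illustration of the 
autodiscovery idea only, not the actual isolator code; link against 
-lnvidia-ml):

{code}
#include <iostream>

#include <nvml.h>

// Sketch: initialize NVML and count the GPUs visible to the driver,
// which is the information the agent would use for autodiscovery.
int main()
{
  if (nvmlInit() != NVML_SUCCESS) {
    std::cerr << "Failed to initialize NVML" << std::endl;
    return 1;
  }

  unsigned int count = 0;
  if (nvmlDeviceGetCount(&count) == NVML_SUCCESS) {
    std::cout << "Discovered " << count << " GPU(s)" << std::endl;
  }

  nvmlShutdown();
  return 0;
}
{code}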



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5628) `QuotaHandler` should only make one authorization request to the authorizer.

2016-06-16 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334837#comment-15334837
 ] 

Michael Park edited comment on MESOS-5628 at 6/16/16 10:33 PM:
---

https://reviews.apache.org/r/48039
https://reviews.apache.org/r/48038/
https://reviews.apache.org/r/48040/


was (Author: mcypark):
https://reviews.apache.org/r/48039

> `QuotaHandler` should only make one authorization request to the authorizer.
> 
>
> Key: MESOS-5628
> URL: https://issues.apache.org/jira/browse/MESOS-5628
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Currently, the {{QuotaHandler}} makes two authorization requests to the 
> authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
> following loop to determine its behavior.
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (!authorized) {
> return Forbidden();
>   }
> }
> return _set(quotaInfo, forced);
> {code}
> This depends on the fact that {{LocalAuthorizer::authorized}} returns 
> {{true}} when it receives a request it does not support. Considering that 
> {{true}} as an answer to {{authorized}} means authorized, this is clearly 
> incorrect. In general, this type of global invariant is difficult to keep in 
> sync and correct.
> Another issue is that a seemingly innocent transformation of this loop would 
> break the logic:
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (authorized) {
> return _set(quotaInfo, forced);
>   }
> }
> return Forbidden();
> {code}
> Attempting to make multiple requests to the authorizer for an action and 
> trying to combine the results is complicated. It would be much simpler to 
> make one request per action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5628) `QuotaHandler` should only make one authorization request to the authorizer.

2016-06-16 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-5628:

Description: 
Currently, the {{QuotaHandler}} makes two authorization requests to the 
authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
following loop to determine its behavior.

{code}
foreach (bool authorized, authorizeResults) {
  if (!authorized) {
return Forbidden();
  }
}
return _set(quotaInfo, forced);
{code}

This depends on the fact that {{LocalAuthorizer::authorized}} returns {{true}} 
when it receives a request it does not support. Considering that {{true}} as an 
answer to {{authorized}} means authorized, this is clearly incorrect. In 
general, this type of global invariant is difficult to keep in sync and correct.

Another issue is that a seemingly innocent transformation of this loop would 
break the logic:

{code}
foreach (bool authorized, authorizeResults) {
  if (authorized) {
return _set(quotaInfo, forced);
  }
}
return Forbidden();
{code}

Attempting to make multiple requests to the authorizer for an action and trying 
to combine the results is complicated. It would be much simpler to make one 
request per action.

  was:
Currently, the {{QuotaHandler}} makes two authorization requests to the 
authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
following loop to determine its behavior.

{code}
foreach (bool authorized, authorizeResults) {
  if (!authorized) {
return Forbidden();
  }
}
return _set(quotaInfo, forced);
{code}

This depends on the fact that {{LocalAuthorizer::authorized}} returns {{true}} 
when it receives a request it does not support. Considering that {{true}} as an 
answer to {{authorized}} means authorized, this is clearly incorrect.

Another issue is that a seemingly innocent transformation of this loop can 
break the logic:

{code}
foreach (bool authorized, authorizeResults) {
  if (authorized) {
return _set(quotaInfo, forced);
  }
}
return Forbidden();
{code}

Attempting to make multiple requests to the authorizer for an action and trying 
to combine the results is complicated. It would be much simpler to make one 
request per action.


> `QuotaHandler` should only make one authorization request to the authorizer.
> 
>
> Key: MESOS-5628
> URL: https://issues.apache.org/jira/browse/MESOS-5628
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> Currently, the {{QuotaHandler}} makes two authorization requests to the 
> authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
> following loop to determine its behavior.
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (!authorized) {
> return Forbidden();
>   }
> }
> return _set(quotaInfo, forced);
> {code}
> This depends on the fact that {{LocalAuthorizer::authorized}} returns 
> {{true}} when it receives a request it does not support. Considering that 
> {{true}} as an answer to {{authorized}} means authorized, this is clearly 
> incorrect. In general, this type of global invariant is difficult to keep in 
> sync and correct.
> Another issue is that a seemingly innocent transformation of this loop would 
> break the logic:
> {code}
> foreach (bool authorized, authorizeResults) {
>   if (authorized) {
> return _set(quotaInfo, forced);
>   }
> }
> return Forbidden();
> {code}
> Attempting to make multiple requests to the authorizer for an action and 
> trying to combine the results is complicated. It would be much simpler to 
> make one request per action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5628) `QuotaHandler` should only make one authorization request to the authorizer.

2016-06-16 Thread Michael Park (JIRA)
Michael Park created MESOS-5628:
---

 Summary: `QuotaHandler` should only make one authorization request 
to the authorizer.
 Key: MESOS-5628
 URL: https://issues.apache.org/jira/browse/MESOS-5628
 Project: Mesos
  Issue Type: Task
Reporter: Michael Park
Assignee: Michael Park


Currently, the {{QuotaHandler}} makes two authorization requests to the 
authorizer. For example, {{SetQuota}} and {{UpdateQuota}}. It then uses the 
following loop to determine its behavior.

{code}
foreach (bool authorized, authorizeResults) {
  if (!authorized) {
return Forbidden();
  }
}
return _set(quotaInfo, forced);
{code}

This depends on the fact that {{LocalAuthorizer::authorized}} returns {{true}} 
when it receives a request it does not support. Considering that {{true}} as an 
answer to {{authorized}} means authorized, this is clearly incorrect.

Another issue is that a seemingly innocent transformation of this loop can 
break the logic:

{code}
foreach (bool authorized, authorizeResults) {
  if (authorized) {
return _set(quotaInfo, forced);
  }
}
return Forbidden();
{code}

Attempting to make multiple requests to the authorizer for an action and trying 
to combine the results is complicated. It would be much simpler to make one 
request per action.
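
For illustration, a sketch of what the single-request flow could look like 
(the {{UPDATE_QUOTA}} action, the request fields, and the surrounding handler 
names are assumptions here, not the final API):

{code}
// One action, one authorization request: the handler acts on a single
// answer, so no result-combining loop is needed. Illustrative sketch only.
authorization::Request request;
request.set_action(authorization::UPDATE_QUOTA);
request.mutable_subject()->set_value(principal);
request.mutable_object()->set_value(quotaInfo.role());

return authorizer->authorized(request)
  .then([=](bool authorized) -> Future<http::Response> {
    if (!authorized) {
      return Forbidden();
    }
    return _set(quotaInfo, forced);
  });
{code}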



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5615) When using command executor, the ExecutorInfo is useless for sandbox authorization

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334830#comment-15334830
 ] 

Joerg Schad commented on MESOS-5615:


A few more comments on this: IMO a general issue with creating an 
executorInfo on the agent (i.e., one that existed already before the fixes 
proposed here) is that agent/state and master/state will differ, as only the 
agent contains this new executorInfo.

In particular, this means that the copied Labels/DiscoveryInfo can appear both 
on the TaskInfo and the ExecutorInfo. These fields can be custom-generated by 
frameworks and custom-consumed by external tools, so we should make sure users 
(both framework writers and operators/tool writers) are aware of this.

> When using command executor, the ExecutorInfo is useless for sandbox 
> authorization
> --
>
> Key: MESOS-5615
> URL: https://issues.apache.org/jira/browse/MESOS-5615
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, security, slave
>Affects Versions: 1.0.0
>Reporter: Alexander Rojas
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: authorization, mesosphere, modularization, security
> Fix For: 1.0.0
>
>
> The design for sandbox access authorization uses the {{ExecutorInfo}} 
> associated with the task as the main authorization space and the 
> {{FrameworkInfo}} as a secondary one. This allows module writers to use 
> fields such as labels for authorization.
> When a task uses the _command executor_ it doesn't provide an 
> {{ExecutorInfo}}, but the info object is generated automatically inside the 
> agent. As such, information which could be used for authorization (e.g. 
> labels) is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5627) Quota-related authorization actions should be removed rather than deprecated.

2016-06-16 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-5627:

Description: 
Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
https://reviews.apache.org/r/47399/. However, these authorization actions are 
new and therefore can be removed instead.

Similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
{{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.

  was:
Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
https://reviews.apache.org/r/47399/. However, these authorization actions are 
new and therefore can be removed instead.

Also, similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
{{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.


> Quota-related authorization actions should be removed rather than deprecated.
> -
>
> Key: MESOS-5627
> URL: https://issues.apache.org/jira/browse/MESOS-5627
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
> {{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
> https://reviews.apache.org/r/47399/. However, these authorization actions are 
> new and therefore can be removed instead.
> Similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
> https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
> {{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334813#comment-15334813
 ] 

Joerg Schad edited comment on MESOS-5588 at 6/16/16 10:17 PM:
--

The above review (if all issues are resolved) solves only one part of the 
problem: i.e., that the object is missing.
This is imo the most prominent problem as it might end up allowing some action 
for ANY object.

But the same problem can appear on the layer above, i.e., typos on the action 
name level:

{code}
   "view_frameworks": [
  {
"principals": { "type": "ANY" },
"user": { "type": "NONE" }
  }
]
{code}

This would result in the ACLs for that action not being considered.

One potential way to check that all acls are parsed could be the following 
(note that a problem here is that we validate the protobuf, but the protobuf 
is valid):
We could get the action count from the json file (count the objects) and 
compare it to the action count in the protobuf.

Any other ideas?



was (Author: js84):
The above review (if all issues are resolved) solves only one part of the 
problem: i.e., that the object is missing.
This is imo the most prominent problem as it might end up allowing some action 
for ANY object.

But the same problem can appear on the layer above, i.e., typos on the action 
name level:

{code}
   "view_frameworks": [
  {
"principals": { "type": "ANY" },
"usr": { "type": "NONE" }
  }
]
{code}

This would result in the ACLs for that action not being considered.

One potential way to check that all acls are parsed could be the following 
(note that a problem here is that we validate the protobuf, but the protobuf 
is valid):
We could get the action count from the json file (count the objects) and 
compare it to the action count in the protobuf.

Any other ideas?


> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> During parsing of the acls by the authorizer, errors are ignored. This can 
> lead to undetected security issues.
> Consider the following acl with a typo (usr instead of user)
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags it will interpret the acl in the 
> following way, which gives any principal access to any framework.
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334813#comment-15334813
 ] 

Joerg Schad commented on MESOS-5588:


The above review (if all issues are resolved) solves only one part of the 
problem: i.e., that the object is missing.
This is imo the most prominent problem as it might end up allowing some action 
for ANY object.

But the same problem can appear on the layer above, i.e., typos on the action 
name level:

{code}
   "view_frameworks": [
  {
"principals": { "type": "ANY" },
"usr": { "type": "NONE" }
  }
]
{code}

This would result in the ACLs for that action not being considered.

One potential way to check that all acls are parsed could be the following 
(note that a problem here is that we validate the protobuf, but the protobuf 
is valid):
We could get the action count from the json file (count the objects) and 
compare it to the action count in the protobuf.

Any other ideas?
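
For illustration, a rough sketch of that count check, assuming stout's JSON 
parser and protobuf reflection (the helper and its place in the parsing path 
are hypothetical):

{code}
#include <string>
#include <vector>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

#include <stout/error.hpp>
#include <stout/json.hpp>
#include <stout/none.hpp>
#include <stout/option.hpp>
#include <stout/try.hpp>

// Hypothetical check: compare the number of top-level keys in the raw
// ACLs JSON against the number of fields actually populated on the
// parsed protobuf. A mismatch means some key (e.g. a misspelled action
// name) was silently dropped during parsing.
Option<Error> validateAclActionCount(
    const std::string& aclsJson,
    const google::protobuf::Message& acls)
{
  Try<JSON::Object> json = JSON::parse<JSON::Object>(aclsJson);
  if (json.isError()) {
    return Error("Invalid ACLs JSON: " + json.error());
  }

  std::vector<const google::protobuf::FieldDescriptor*> fields;
  acls.GetReflection()->ListFields(acls, &fields);

  if (json.get().values.size() != fields.size()) {
    return Error("ACLs JSON contains keys that were not recognized");
  }

  return None();
}
{code}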


> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> During parsing of the acls by the authorizer, errors are ignored. This can 
> lead to undetected security issues.
> Consider the following acl with a typo (usr instead of user)
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags it will interpret the acl in the 
> following way, which gives any principal access to any framework.
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5627) Quota-related authorization actions should be removed rather than deprecated.

2016-06-16 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-5627:

Description: 
Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
https://reviews.apache.org/r/47399/. However, these authorization actions are 
new and therefore can be removed instead.

Also, similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
{{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.

  was:
Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
https://reviews.apache.org/r/47399/. However, these authorization actions are 
new and therefore can be instead, removed.

Also, similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
{{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.


> Quota-related authorization actions should be removed rather than deprecated.
> -
>
> Key: MESOS-5627
> URL: https://issues.apache.org/jira/browse/MESOS-5627
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
> {{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
> https://reviews.apache.org/r/47399/. However, these authorization actions are 
> new and therefore can be removed instead.
> Also, similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} 
> in https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should 
> become {{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5627) Quota-related authorization actions should be removed rather than deprecated.

2016-06-16 Thread Michael Park (JIRA)
Michael Park created MESOS-5627:
---

 Summary: Quota-related authorization actions should be removed 
rather than deprecated.
 Key: MESOS-5627
 URL: https://issues.apache.org/jira/browse/MESOS-5627
 Project: Mesos
  Issue Type: Bug
Reporter: Michael Park
Assignee: Michael Park
 Fix For: 1.0.0


Quota-related actions {{SET_QUOTA_WITH_ROLE}} and 
{{DESTROY_QUOTA_WITH_PRINCIPAL}} were "deprecated" in 
https://reviews.apache.org/r/47399/. However, these authorization actions are 
new and therefore can be instead, removed.

Also, similar to the transition from {{RUN_TASK_WITH_USER}}, to {{RUN_TASK}} in 
https://reviews.apache.org/r/48113, {{UPDATE_QUOTA_WITH_ROLE}} should become 
{{UPDATE_QUOTA}} and be accompanied with {{QuotaInfo}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5555) Always provide access to NVIDIA control devices within containers (if GPU isolation is enabled).

2016-06-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-:
---
Summary: Always provide access to NVIDIA control devices within containers 
(if GPU isolation is enabled).  (was: Change semantics for granting access to 
/dev/nvidiactl, etc)

> Always provide access to NVIDIA control devices within containers (if GPU 
> isolation is enabled).
> 
>
> Key: MESOS-
> URL: https://issues.apache.org/jira/browse/MESOS-
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
> Fix For: 1.0.0
>
>
> Currently, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` is only granted 
> to / revoked from a container as GPUs are added to and removed from it. On 
> some level, this makes sense because most jobs don't need access to these 
> devices unless they are also using a GPU. However, there are cases when 
> access to these files is appropriate even when not making use of a GPU: 
> running `nvidia-smi` to control the global state of the underlying nvidia 
> driver, for example.
> 
> We should add `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist 
> of devices to include in every container when the `gpu/nvidia` isolator is 
> enabled. This will allow a container to run standard nvidia driver tools 
> (such as `nvidia-smi`) without failing with abnormal errors when no GPUs have 
> been granted to it. As such, these tools will now report that no GPUs are 
> installed instead of failing abnormally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5592) Pass NetworkInfoto CNI Plugins

2016-06-16 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5592:
--
Summary: Pass NetworkInfoto CNI Plugins  (was: Pass NetworkInfo.Labels to 
CNI Plugins)

> Pass NetworkInfoto CNI Plugins
> --
>
> Key: MESOS-5592
> URL: https://issues.apache.org/jira/browse/MESOS-5592
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dan Osborne
> Fix For: 1.0.0
>
>
> Mesos has adopted the Container Network Interface as a simple means of 
> networking Mesos tasks launched by the Unified Containerizer. The CNI 
> specification covers a minimum feature set, granting the flexibility to add 
> customized networking functionality in the form of agreements made between 
> the orchestrator and CNI plugin.
> This proposal is to pass NetworkInfo.Labels to the CNI plugin by injecting it 
> into the CNI network configuration json during plugin invocation.
> Design Doc on this change: 
> https://docs.google.com/document/d/1rxruCCcJqpppsQxQrzTbHFVnnW6CgQ2oTieYAmwL284/edit?usp=sharing
> reviewboard: https://reviews.apache.org/r/48527/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5592) Pass NetworkInfo to CNI Plugins

2016-06-16 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5592:
--
Summary: Pass NetworkInfo to CNI Plugins  (was: Pass NetworkInfoto CNI 
Plugins)

> Pass NetworkInfo to CNI Plugins
> ---
>
> Key: MESOS-5592
> URL: https://issues.apache.org/jira/browse/MESOS-5592
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dan Osborne
> Fix For: 1.0.0
>
>
> Mesos has adopted the Container Network Interface as a simple means of 
> networking Mesos tasks launched by the Unified Containerizer. The CNI 
> specification covers a minimum feature set, granting the flexibility to add 
> customized networking functionality in the form of agreements made between 
> the orchestrator and CNI plugin.
> This proposal is to pass NetworkInfo.Labels to the CNI plugin by injecting it 
> into the CNI network configuration json during plugin invocation.
> Design Doc on this change: 
> https://docs.google.com/document/d/1rxruCCcJqpppsQxQrzTbHFVnnW6CgQ2oTieYAmwL284/edit?usp=sharing
> reviewboard: https://reviews.apache.org/r/48527/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5412) Support CNI_ARGS

2016-06-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334603#comment-15334603
 ] 

Jie Yu commented on MESOS-5412:
---

The CNI spec just merged the 'args' support:
https://github.com/containernetworking/cni/pull/247

We should pass NetworkInfo to the plugin using the 'args' field.

> Support CNI_ARGS
> 
>
> Key: MESOS-5412
> URL: https://issues.apache.org/jira/browse/MESOS-5412
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Dan Osborne
>
> Mesos-CNI should support the 
> [CNI_ARGS|https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters]
>  field.
> This would allow CNI plugins to be able to implement advanced networking 
> capabilities without needing modifications to Mesos. Current use case I am 
> facing: Allowing users to specify policy for their CNI plugin. 
> I'm proposing the following implementation: Pass a task's [NetworkInfo 
> Labels|https://github.com/apache/mesos/blob/b7e50fe8b20c96cda5546db5f2c2f47bee461edb/include/mesos/mesos.proto#L1732]
>  to the CNI plugin as CNI_ARGS. CNI args are simply key-value pairs split by 
> a '=', e.g. "FOO=BAR;ABC=123", which could be easily generated from the 
> NetworkInfo's key-value labels.
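
As a rough sketch, generating that string from the labels might look like the 
following ({{toCniArgs}} is a hypothetical helper, and escaping of ';' or '=' 
inside values is not handled):

{code}
#include <string>

#include <mesos/mesos.pb.h>

// Hypothetical helper: serialize NetworkInfo labels into the CNI_ARGS
// format, i.e. "KEY=VALUE" pairs joined by ';' (no escaping handled).
std::string toCniArgs(const mesos::NetworkInfo& networkInfo)
{
  std::string args;

  for (const mesos::Label& label : networkInfo.labels().labels()) {
    if (!args.empty()) {
      args += ";";
    }
    args += label.key() + "=" + label.value();
  }

  return args;
}
{code}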



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5619) Add task_num to mesos-executor

2016-06-16 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334584#comment-15334584
 ] 

Joseph Wu commented on MESOS-5619:
--

[~klausma] This ticket has the {{cli}} component, are you proposing a change to 
{{mesos-execute}} (the command scheduler) or {{mesos-executor}} (the command 
executor)?

> Add task_num to mesos-executor
> --
>
> Key: MESOS-5619
> URL: https://issues.apache.org/jira/browse/MESOS-5619
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> According to the current code, {{mesos-executor}} will only launch one task. 
> It would be better to add a parameter to specify how many tasks to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5503) Implement GET_MAINTENANCE_STATUS Call in v1 master API.

2016-06-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334564#comment-15334564
 ] 

Vinod Kone commented on MESOS-5503:
---

Can you rebase these reviews?

> Implement GET_MAINTENANCE_STATUS Call in v1 master API.
> ---
>
> Key: MESOS-5503
> URL: https://issues.apache.org/jira/browse/MESOS-5503
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5484) Implement GET_METRICS Call in v1 master API.

2016-06-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334553#comment-15334553
 ] 

Vinod Kone edited comment on MESOS-5484 at 6/16/16 7:54 PM:


commit 3441e4ff93423691be15380851e3241634a2ba87
Author: haosdent huang 
Date:   Thu Jun 16 12:51:46 2016 -0700

Implemented v1::master::Call::GET_METRICS.

Review: https://reviews.apache.org/r/48602/

commit ec5faa1b19c347b9ff78b1f37c668a2efac7aa59
Author: haosdent huang 
Date:   Thu Jun 16 12:51:42 2016 -0700

Exposed metrics information via `process::metrics::snapshot`.

Review: https://reviews.apache.org/r/48601/



was (Author: vinodkone):
commit 3441e4ff93423691be15380851e3241634a2ba87
Author: haosdent huang 
Date:   Thu Jun 16 12:51:46 2016 -0700

Implemented v1::master::Call::GET_METRICS.

Review: https://reviews.apache.org/r/48602/


> Implement GET_METRICS Call in v1 master API.
> 
>
> Key: MESOS-5484
> URL: https://issues.apache.org/jira/browse/MESOS-5484
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: haosdent
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5605) Improve documentation for using persistent volumes.

2016-06-16 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-5605:
---
Assignee: (was: Joerg Schad)

> Improve documentation for using persistent volumes. 
> 
>
> Key: MESOS-5605
> URL: https://issues.apache.org/jira/browse/MESOS-5605
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>
> When using persistent volumes at ArangoDB we ran into a few pitfalls.
> We should document them so that others can avoid those issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5605) Improve documentation for using persistent volumes.

2016-06-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334239#comment-15334239
 ] 

Max Neunhöffer commented on MESOS-5605:
---

Added an anchor.

https://reviews.apache.org/r/48224/

should probably be mentioned here as well.

> Improve documentation for using persistent volumes. 
> 
>
> Key: MESOS-5605
> URL: https://issues.apache.org/jira/browse/MESOS-5605
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> When using persistent volumes at ArangoDB we ran into a few pitfalls.
> We should document them so that others can avoid those issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5618) Added a metric indicating if replicated log for the registrar has recovered or not.

2016-06-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334229#comment-15334229
 ] 

Jie Yu edited comment on MESOS-5618 at 6/16/16 5:31 PM:


https://reviews.apache.org/r/48801
https://reviews.apache.org/r/48802
https://reviews.apache.org/r/48803
https://reviews.apache.org/r/48804


was (Author: jieyu):
ttps://reviews.apache.org/r/48801
ttps://reviews.apache.org/r/48802
ttps://reviews.apache.org/r/48803
ttps://reviews.apache.org/r/48804

> Added a metric indicating if replicated log for the registrar has recovered 
> or not.
> ---
>
> Key: MESOS-5618
> URL: https://issues.apache.org/jira/browse/MESOS-5618
> Project: Mesos
>  Issue Type: Improvement
>  Components: replicated log
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> This gives operators insight into the state of the replicated log for the 
> registrar. The operator needs to know when it is safe to move on to another 
> master in the upgrade orchestration pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5626) The discard logic in RecoverProtocolProcess is problematic.

2016-06-16 Thread Jie Yu (JIRA)
Jie Yu created MESOS-5626:
-

 Summary: The discard logic in RecoverProtocolProcess is 
problematic.
 Key: MESOS-5626
 URL: https://issues.apache.org/jira/browse/MESOS-5626
 Project: Mesos
  Issue Type: Bug
  Components: replicated log
Affects Versions: 0.28.2, 0.26.1, 0.27.3
Reporter: Jie Yu


The discard logic in RecoverProtocolProcess is problematic. It's likely that 
doing a 'discard' on the returned 'future' won't cause the 
RecoverProtocolProcess to terminate.

Right now, this is what we do when reacting to a discard on the returned future:
{code}
void discard()
{
  terminating = true;
  chain.discard();
}
{code}

We expect that 'chain' will become terminal and terminate the process in 
'finished'. However, it's likely that 'chain' is already terminal (e.g., during 
retry). As a result, the process might never be terminated.
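
One possible shape of a fix, sketched below (an illustration of the idea, not 
an actual patch), is to handle the already-terminal case explicitly so the 
process still terminates:

{code}
void discard()
{
  terminating = true;
  chain.discard();

  // If `chain` is already terminal (e.g., between retries), discarding it
  // is a no-op and `finished` will not fire again, so terminate the
  // process directly instead of waiting for the chain to transition.
  if (!chain.isPending()) {
    process::terminate(self());
  }
}
{code}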



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334036#comment-15334036
 ] 

Vinod Kone commented on MESOS-5624:
---

Committed the workaround.

commit 5ee100ffe11c8f541810387dacf96c1dadf89a12
Author: Jan Schlicht 
Date:   Thu Jun 16 08:46:34 2016 -0700

Changed CMake build to be out-of-place.

Using an out-of-place CMake build works around MESOS-5624.

Review: https://reviews.apache.org/r/48795/


> In-place builds fail with CMake
> ---
>
> Key: MESOS-5624
> URL: https://issues.apache.org/jira/browse/MESOS-5624
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Affects Versions: 1.0.0
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> In a cloned mesos repository, running
> {noformat}
> ./bootstrap && mkdir build && cd build && cmake .. && make
> {noformat}
> works, while
> {noformat}
> ./bootstrap && cmake . && make
> {noformat}
> will fail to compile with the following error:
> {noformat}
> Building CXX object 
> src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
> cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
> -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
> -DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 
> -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 
> -DPKGDATADIR=\"/usr/local/share/mesos\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
> -DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
> -I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
> -I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
> -I/usr/include/apr-1.0 
> -I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
> -I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
> -I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
> -I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
> -I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
> -I/mnt/mesos/mesos/3rdparty/libprocess/include 
> -I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
> -I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated
>  -I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
> CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
>  -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
> static member function 'static 
> Try 
> mesos::internal::slave::appc::Store::create(const 
> mesos::internal::slave::Flags&)':
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
>  error: 'mesos::uri::fetcher' has not been declared
>Try uriFetcher = uri::fetcher::create();
>   ^
> make[2]: *** 
> [src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
>  Error 1
> make[2]: Leaving directory `/mnt/mesos/mesos'
> make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
> make[1]: Leaving directory `/mnt/mesos/mesos'
> make: *** [all] Error 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334027#comment-15334027
 ] 

Jan Schlicht commented on MESOS-5624:
-

Workaround, by building out-of-place:
https://reviews.apache.org/r/48795/

> In-place builds fail with CMake
> ---
>
> Key: MESOS-5624
> URL: https://issues.apache.org/jira/browse/MESOS-5624
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Affects Versions: 1.0.0
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> In a cloned mesos repository, running
> {noformat}
> ./bootstrap && mkdir build && cd build && cmake .. && make
> {noformat}
> works, while
> {noformat}
> ./bootstrap && cmake . && make
> {noformat}
> will fail to compile with the following error:
> {noformat}
> Building CXX object 
> src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
> cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
> -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
> -DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 
> -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 
> -DPKGDATADIR=\"/usr/local/share/mesos\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
> -DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
> -I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
> -I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
> -I/usr/include/apr-1.0 
> -I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
> -I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
> -I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
> -I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
> -I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
> -I/mnt/mesos/mesos/3rdparty/libprocess/include 
> -I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
> -I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated
>  -I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
> CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
>  -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
> static member function 'static 
> Try 
> mesos::internal::slave::appc::Store::create(const 
> mesos::internal::slave::Flags&)':
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
>  error: 'mesos::uri::fetcher' has not been declared
>Try uriFetcher = uri::fetcher::create();
>   ^
> make[2]: *** 
> [src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
>  Error 1
> make[2]: Leaving directory `/mnt/mesos/mesos'
> make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
> make[1]: Leaving directory `/mnt/mesos/mesos'
> make: *** [all] Error 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5588:
---
Shepherd: Till Toenshoff
Story Points: 5
  Labels: mesosphere security  (was: )

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333900#comment-15333900
 ] 

Alexander Rojas commented on MESOS-5588:


[r/48781/|https://reviews.apache.org/r/48781/]: Marked some optional fields in 
state.json as required.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: mesosphere, security
> Fix For: 1.0.0
>
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5588:
---
 Priority: Blocker  (was: Major)
Fix Version/s: 1.0.0

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5625) Document the overall treatment of scarce resources.

2016-06-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5625:
--

 Summary: Document the overall treatment of scarce resources.
 Key: MESOS-5625
 URL: https://issues.apache.org/jira/browse/MESOS-5625
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


This document should clarify the overall treatment of scarce resources.

Please refer to http://markmail.org/thread/ojoz5zyko2l5srld for some initial 
discussion.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-5624:

Affects Version/s: 1.0.0

> In-place builds fail with CMake
> ---
>
> Key: MESOS-5624
> URL: https://issues.apache.org/jira/browse/MESOS-5624
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Affects Versions: 1.0.0
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> In a cloned mesos repository, running
> {noformat}
> ./bootstrap && mkdir build && cd build && cmake .. && make
> {noformat}
> works, while
> {noformat}
> ./bootstrap && cmake . && make
> {noformat}
> will fail to compile with the following error:
> {noformat}
> Building CXX object 
> src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
> cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
> -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
> -DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 
> -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 
> -DPKGDATADIR=\"/usr/local/share/mesos\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
> -DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
> -I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
> -I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
> -I/usr/include/apr-1.0 
> -I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
> -I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
> -I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
> -I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
> -I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
> -I/mnt/mesos/mesos/3rdparty/libprocess/include 
> -I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
> -I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated
>  -I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
> CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
>  -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
> static member function 'static 
> Try 
> mesos::internal::slave::appc::Store::create(const 
> mesos::internal::slave::Flags&)':
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
>  error: 'mesos::uri::fetcher' has not been declared
>Try uriFetcher = uri::fetcher::create();
>   ^
> make[2]: *** 
> [src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
>  Error 1
> make[2]: Leaving directory `/mnt/mesos/mesos'
> make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
> make[1]: Leaving directory `/mnt/mesos/mesos'
> make: *** [all] Error 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5615) When using command executor, the ExecutorInfo is useless for sandbox authorization

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332751#comment-15332751
 ] 

Joerg Schad edited comment on MESOS-5615 at 6/16/16 12:58 PM:
--

Refactored sandbox authorization logic to use ObjectAuthorizer.
https://reviews.apache.org/r/48764/

Added 'labels' and 'discovery' to generated 'ExecutorInfo'.
https://reviews.apache.org/r/48765/

Added tests for sandbox authorization.
Review: https://reviews.apache.org/r/48789

Added note about generation of `ExecutorInfo` for `CommandInfo`.
Review: https://reviews.apache.org/r/48790


was (Author: js84):
Refactored sandbox authorization logic to use ObjectAuthorizer.
https://reviews.apache.org/r/48764/

Added 'labels' and 'discovery' to generated 'ExecutorInfo'.
https://reviews.apache.org/r/48765/

> When using command executor, the ExecutorInfo is useless for sandbox 
> authorization
> --
>
> Key: MESOS-5615
> URL: https://issues.apache.org/jira/browse/MESOS-5615
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, security, slave
>Affects Versions: 1.0.0
>Reporter: Alexander Rojas
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: authorization, mesosphere, modularization, security
> Fix For: 1.0.0
>
>
> The design for sandbox access authorization uses the {{ExecutorInfo}} 
> associated with the task as the main authorization space and the 
> {{FrameworkInfo}} as a secondary one. This allows module writers to use fields 
> such as labels for authorization.
> When a task uses the _command executor_ it doesn't provide an 
> {{ExecutorInfo}}, but the info object is generated automatically inside the 
> agent. As such, information that could be used for authorization (e.g., 
> labels) is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333676#comment-15333676
 ] 

Jan Schlicht commented on MESOS-5624:
-

A red herring. This include path is the same for a (working) out-of-place build.

> In-place builds fail with CMake
> ---
>
> Key: MESOS-5624
> URL: https://issues.apache.org/jira/browse/MESOS-5624
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> In a cloned mesos repository, running
> {noformat}
> ./bootstrap && mkdir build && cd build && cmake .. && make
> {noformat}
> works, while
> {noformat}
> ./bootstrap && cmake . && make
> {noformat}
> will fail to compile with the following error:
> {noformat}
> Building CXX object 
> src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
> cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
> -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
> -DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 
> -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 
> -DPKGDATADIR=\"/usr/local/share/mesos\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
> -DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
> -I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
> -I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
> -I/usr/include/apr-1.0 
> -I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
> -I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
> -I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
> -I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
> -I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
> -I/mnt/mesos/mesos/3rdparty/libprocess/include 
> -I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
> -I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated
>  -I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
> CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
>  -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
> static member function 'static 
> Try 
> mesos::internal::slave::appc::Store::create(const 
> mesos::internal::slave::Flags&)':
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
>  error: 'mesos::uri::fetcher' has not been declared
>Try uriFetcher = uri::fetcher::create();
>   ^
> make[2]: *** 
> [src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
>  Error 1
> make[2]: Leaving directory `/mnt/mesos/mesos'
> make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
> make[1]: Leaving directory `/mnt/mesos/mesos'
> make: *** [all] Error 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333639#comment-15333639
 ] 

Jan Schlicht commented on MESOS-5624:
-

{{-I/mnt/mesos/mesos/src/src}} looks odd. It seems like some include paths 
might be generated incorrectly.

> In-place builds fail with CMake
> ---
>
> Key: MESOS-5624
> URL: https://issues.apache.org/jira/browse/MESOS-5624
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> In a cloned mesos repository, running
> {noformat}
> ./bootstrap && mkdir build && cd build && cmake .. && make
> {noformat}
> works, while
> {noformat}
> ./bootstrap && cmake . && make
> {noformat}
> will fail to compile with the following error:
> {noformat}
> Building CXX object 
> src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
> cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
> -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
> -DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 
> -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 
> -DPKGDATADIR=\"/usr/local/share/mesos\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
> -DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
> -I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
> -I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
> -I/usr/include/apr-1.0 
> -I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
> -I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
> -I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
> -I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
> -I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
> -I/mnt/mesos/mesos/3rdparty/libprocess/include 
> -I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
> -I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
> -I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated
>  -I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
> CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
>  -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
> static member function 'static 
> Try 
> mesos::internal::slave::appc::Store::create(const 
> mesos::internal::slave::Flags&)':
> /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
>  error: 'mesos::uri::fetcher' has not been declared
>Try uriFetcher = uri::fetcher::create();
>   ^
> make[2]: *** 
> [src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
>  Error 1
> make[2]: Leaving directory `/mnt/mesos/mesos'
> make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
> make[1]: Leaving directory `/mnt/mesos/mesos'
> make: *** [all] Error 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1589#comment-1589
 ] 

Alexander Rojas edited comment on MESOS-5588 at 6/16/16 11:55 AM:
--

After consideration, it seems many of us copied and pasted the last message 
when introducing new ACLs, so once one came with {{optional}}, the next pasted 
one inherited it accidentally.


was (Author: arojas):
After consideration it seems many of use copied and pasted the last message 
when introducing new ACLs, so once one came with {{optional}} the next pasted 
once inherited accidentally.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1589#comment-1589
 ] 

Alexander Rojas edited comment on MESOS-5588 at 6/16/16 11:55 AM:
--

After consideration, it seems many of us copied and pasted the last message 
when introducing new ACLs, so once one came with {{optional}}, the next pasted 
one inherited the same attribute accidentally.


was (Author: arojas):
After consideration it seems many of use copied and pasted the last message 
when introducing new ACLs, so once one came with {{optional}} the next pasted 
one inherited accidentally.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5624) In-place builds fail with CMake

2016-06-16 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-5624:
---

 Summary: In-place builds fail with CMake
 Key: MESOS-5624
 URL: https://issues.apache.org/jira/browse/MESOS-5624
 Project: Mesos
  Issue Type: Bug
  Components: cmake
Reporter: Jan Schlicht
Assignee: Jan Schlicht


In a cloned mesos repository, running
{noformat}
./bootstrap && mkdir build && cd build && cmake .. && make
{noformat}
works, while
{noformat}
./bootstrap && cmake . && make
{noformat}
will fail to compile with the following error:
{noformat}
Building CXX object 
src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
cd /mnt/mesos/mesos/src && /usr/bin/c++   -DBUILD_DATE="\"2016-3-3 10:20\"" 
-DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DBUILD_TIME=\"100\" 
-DBUILD_USER=\"frank\" -DHAS_AUTHENTICATION=1 -DLIBDIR=\"/usr/local/libmesos\" 
-DPICOJSON_USE_INT64 -DPKGDATADIR=\"/usr/local/share/mesos\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_STATIC_LIB 
-DVERSION=\"1.0.0\" -D__STDC_FORMAT_MACROS -std=c++11 -g 
-I/mnt/mesos/mesos/include -I/mnt/mesos/mesos/include/mesos 
-I/mnt/mesos/mesos/src -I/mnt/mesos/mesos/3rdparty/stout/include 
-I/usr/include/apr-1.0 
-I/mnt/mesos/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 
-I/mnt/mesos/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include 
-I/mnt/mesos/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 
-I/mnt/mesos/mesos/3rdparty/protobuf-2.6.1/src/protobuf-2.6.1-lib/lib/include 
-I/usr/include/subversion-1 -I/mnt/mesos/mesos/src/src 
-I/mnt/mesos/mesos/3rdparty/libprocess/include 
-I/mnt/mesos/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 
-I/mnt/mesos/mesos/3rdparty/libev-4.22/src/libev-4.22 
-I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include 
-I/mnt/mesos/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated 
-I/mnt/mesos/mesos/3rdparty/leveldb-1.4/src/leveldb-1.4/include-o 
CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o
 -c /mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp
/mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp: In 
static member function 'static 
Try 
mesos::internal::slave::appc::Store::create(const 
mesos::internal::slave::Flags&)':
/mnt/mesos/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:119:46:
 error: 'mesos::uri::fetcher' has not been declared
   Try uriFetcher = uri::fetcher::create();
  ^
make[2]: *** 
[src/CMakeFiles/mesos-1.0.0.dir/slave/containerizer/mesos/provisioner/appc/store.cpp.o]
 Error 1
make[2]: Leaving directory `/mnt/mesos/mesos'
make[1]: *** [src/CMakeFiles/mesos-1.0.0.dir/all] Error 2
make[1]: Leaving directory `/mnt/mesos/mesos'
make: *** [all] Error 2
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5623) Add test cases for scarce resources

2016-06-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5623:
--

 Summary: Add test cases for scarce resources
 Key: MESOS-5623
 URL: https://issues.apache.org/jira/browse/MESOS-5623
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


Add some test cases for the scarce resources change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5622) Update allocator to handle scarce resources

2016-06-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5622:
--

 Summary: Update allocator to handle scarce resources
 Key: MESOS-5622
 URL: https://issues.apache.org/jira/browse/MESOS-5622
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


The allocator should be updated to handle scarce resources; the idea is to 
exclude scarce resources from all sorters in the allocator.
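A rough illustration of the idea (a sketch only; 'nonScarce()' and 
'flags.scarce_resources' are assumed names, not existing APIs): strip scarce 
resources before feeding the sorters, so that DRF shares are computed over 
non-scarce resources only.
{code}
// Hypothetical sketch inside the hierarchical allocator, e.g. in addSlave():
const Resources forSorting = nonScarce(total, flags.scarce_resources);

roleSorter->add(slaveId, forSorting);
quotaRoleSorter->add(slaveId, forSorting.nonRevocable());
{code}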



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5621) Add helper function to get non-scarce resources

2016-06-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5621:
--

 Summary: Add helper function to get non-scarce resources
 Key: MESOS-5621
 URL: https://issues.apache.org/jira/browse/MESOS-5621
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


We need a helper function to get all non-scarce resources, so as to help the 
allocator obtain the non-scarce resource information.
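A minimal sketch of such a helper (the name and signature are assumptions):
{code}
// Returns the subset of 'resources' whose names are not in 'scarce'.
Resources nonScarce(
    const Resources& resources,
    const hashset<std::string>& scarce)
{
  Resources result;
  foreach (const Resource& resource, resources) {
    if (!scarce.contains(resource.name())) {
      result += resource;
    }
  }
  return result;
}
{code}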



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5620) Add a new flag in master to define the scarce resources.

2016-06-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5620:
--

 Summary: Add a new flag in master to define the scarce resources.
 Key: MESOS-5620
 URL: https://issues.apache.org/jira/browse/MESOS-5620
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


Add a new flag to define the scarce resources; these resources will be 
excluded from DRF.
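A sketch of what such a flag could look like among the master's flag 
definitions (the flag name and help text are assumptions):
{code}
// Hypothetical flag; would be passed as e.g. --scarce_resources=gpus.
add(&Flags::scarce_resources,
    "scarce_resources",
    "Comma-separated list of resource names (e.g., 'gpus') that should\n"
    "be excluded from the DRF fair-share calculation in the allocator.");
{code}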



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

2016-06-16 Thread Abhishek Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333459#comment-15333459
 ] 

Abhishek Dasgupta commented on MESOS-2717:
--

Why can't we merge the two containerization techniques into one? Also, if we 
accommodate hypervisors, we should differentiate them from containerizers. A 
hypervisor acts differently than a container, and going forward it would be 
highly difficult to maintain if we try to merge the hypervisor technique into 
the container technique. Instead, I propose merging the two container 
techniques (Docker and Mesos) while keeping a separate hypervisor technique. 
What do you say?

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>Assignee: Abhishek Dasgupta
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them; here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter slave routing up to the administrator
> 2. IP Address assignment
> 
> At first, it can be up to the Frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer which would instrument a DHCP 
> server and collect IP + MAC address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetch from HTTP(s) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the easiest 
> part. Additional command-line arguments should also be fetchable. For Unix 
> VMs, the sandbox could show the output of the serial console (a rough sketch 
> follows below).
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the number of necessary hoops to jump through and 
> would thus investigate working directly with Qemu, maintaining an open 
> connection to the monitor to assert status.
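For point 4 above, a rough sketch of how resource constraints could map onto a 
qemu command line ('resources', 'image', and 'sandbox' are assumed inputs; the 
qemu flags themselves are standard):
{code}
// Hypothetical mapping from Mesos resources to qemu arguments.
const Option<double> cpus = resources.cpus();
const Option<Bytes> mem = resources.mem();

const std::vector<std::string> argv = {
  "qemu-system-x86_64",
  "-smp", stringify(static_cast<int>(cpus.getOrElse(1.0))),
  "-m", stringify(mem.getOrElse(Megabytes(512)).megabytes()),
  "-drive", "file=" + image + ",format=qcow2",
  // Surface the serial console in the sandbox (point 4 above).
  "-serial", "file:" + sandbox + "/serial.log"
};
{code}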



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1589#comment-1589
 ] 

Alexander Rojas commented on MESOS-5588:


After consideration, it seems many of us copied and pasted the last message 
when introducing new ACLs, so once one came with {{optional}}, the next pasted 
one inherited it accidentally.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1587#comment-1587
 ] 

Alexander Rukletsov commented on MESOS-5588:


I think the precedent for {{optional}} was set in 
https://reviews.apache.org/r/41681/diff/28-29/ . Unfortunately, I don't see any 
comments on why this change was made; maybe there was some offline discussion 
(cc [~adam-mesos], [~gradywang]). Since then, newly added actions have been 
following the pattern. I don't see any reason why we should use {{required}} in 
some cases and {{optional}} in others, so let's pick one and restore consistency.

Keep in mind that proto3 doesn't support {{required}} fields; so even though 
it makes sense to have the protobuf parser check ACL integrity for us, we might 
have to revisit this in the future and implement validation ourselves.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1563#comment-1563
 ] 

Alexander Rukletsov commented on MESOS-5588:


But we can still check whether certain fields are present _after_ parsing, 
right?
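A sketch of such a post-parse check (the {{has_users()}} accessor is an 
assumption based on the ACL messages discussed in this ticket):
{code}
// Hypothetical validation after protobuf parsing: reject any ACL whose
// object entity was silently dropped (e.g., due to a misspelled JSON key).
foreach (const ACL::ViewFramework& acl, acls.view_frameworks()) {
  if (!acl.has_users()) {
    return Error("'view_frameworks' ACL is missing its object entity");
  }
}
{code}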

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1559#comment-1559
 ] 

Alexander Rukletsov commented on MESOS-5588:


If we choose the validation path, we can fix it as part of MESOS-5406.

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5588) Improve error handling when parsing acls.

2016-06-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1540#comment-1540
 ] 

Joerg Schad commented on MESOS-5588:


One weird inconsistency is that we have different optional/required semantics 
in different ACLs:

{code}
  // Specifies which roles a principal can reserve resources for.
  message ReserveResources {
// Subjects: Framework principal or Operator username.
required Entity principals = 1;

// Objects: The principal(s) can reserve resources for these roles.
required Entity roles = 2;
  }
{code}

vs. 
{code}
  // Which principals are authorized to access the Mesos logs.
  message AccessMesosLog {
// Subjects: HTTP Username.
required Entity principals = 1;

// Objects: Given implicitly. Use Entity type ANY or NONE to allow or deny
// access.
optional Entity logs = 2;
  }
{code}

> Improve error handling when parsing acls.
> -
>
> Key: MESOS-5588
> URL: https://issues.apache.org/jira/browse/MESOS-5588
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Errors during parsing of the authorizer's ACLs are ignored. This can lead to 
> undetected security issues.
> Consider the following ACL with a typo (usr instead of user):
> {code}
>"view_frameworks": [
>   {
> "principals": { "type": "ANY" },
> "usr": { "type": "NONE" }
>   }
> ]
> {code}
> When the master is started with these flags, it will interpret the ACL in the 
> following way, which gives any principal access to any framework:
> {noformat}
> view_frameworks {
>   principals {
> type: ANY
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5615) When using command executor, the ExecutorInfo is useless for sandbox authorization

2016-06-16 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1521#comment-1521
 ] 

Alexander Rojas commented on MESOS-5615:


I personally don't know where this becomes an issue, since the type gives an 
extra level of "safeness": a label may be unique across all tasks, but 
executors are not tasks, i.e., there is some type safety involved.

> When using command executor, the ExecutorInfo is useless for sandbox 
> authorization
> --
>
> Key: MESOS-5615
> URL: https://issues.apache.org/jira/browse/MESOS-5615
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, security, slave
>Affects Versions: 1.0.0
>Reporter: Alexander Rojas
>Assignee: Joerg Schad
>Priority: Blocker
>  Labels: authorization, mesosphere, modularization, security
> Fix For: 1.0.0
>
>
> The design for sandbox access authorization uses the {{ExecutorInfo}} 
> associated with the task as the main authorization space and the 
> {{FrameworkInfo}} as a secondary one. This allows module writers to use fields 
> such as labels for authorization.
> When a task uses the _command executor_ it doesn't provide an 
> {{ExecutorInfo}}, but the info object is generated automatically inside the 
> agent. As such, information that could be used for authorization (e.g., 
> labels) is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5585) Support the pids cgroup in the agent

2016-06-16 Thread Abhishek Dasgupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Dasgupta reassigned MESOS-5585:


Assignee: Abhishek Dasgupta

> Support the pids cgroup in the agent
> 
>
> Key: MESOS-5585
> URL: https://issues.apache.org/jira/browse/MESOS-5585
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 1.0.0
>Reporter: Jeffrey Schroeder
>Assignee: Abhishek Dasgupta
>Priority: Minor
>
> http://kernelnewbies.org/Linux_4.3#head-6d5a75f66376fbdc0a77e2386b5aa743d8f7aeb8
> For most fork-bomb style attacks, the memory limit should neutralize them, 
> but if the task requests a lot of memory, it could still impact the host. 
> This is a nice feature that gives cluster operators some flexibility in 
> multi-tenant scenarios.
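A minimal sketch of the underlying control that such an isolator would manage 
(the cgroup paths and the use of {{cgroups::write}} here are assumptions):
{code}
// Hypothetical: cap a container's task count via the cgroups-v1 'pids'
// subsystem; the hierarchy mount and cgroup name are assumed.
Try<Nothing> limit = cgroups::write(
    "/sys/fs/cgroup/pids",   // hierarchy
    "mesos/" + containerId,  // cgroup
    "pids.max",              // control file
    "64");                   // maximum number of pids in the cgroup

if (limit.isError()) {
  return Failure("Failed to set pids.max: " + limit.error());
}
{code}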



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)