[jira] [Updated] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6937:
-
Sprint: Mesosphere Sprint 49

> ContentType/MasterAPITest.ReserveResources/1 fails during Writer close
> --
>
> Key: MESOS-6937
> URL: https://issues.apache.org/jira/browse/MESOS-6937
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: ASF CI, Ubuntu 14.04, libevent and SSL enabled
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: tests
> Attachments: MasterAPITest.ReserveResources.txt
>
>
> This was observed on ASF CI. Libevent was enabled, but the test in question 
> was not running in SSL-enabled mode. We see the following stack trace:
> {code}
> *** Error in `src/mesos-tests': double free or corruption (fasttop): 
> 0x2b4f7001bf70 ***
> *** Aborted at 1484691168 (unix time) try "date -d @1484691168" if you are 
> using GNU date ***
> PC: @ 0x2b4f2bc9ac37 (unknown)
> *** SIGABRT (@0x3e869c7) received by PID 27079 (TID 0x2b4f35be5700) from 
> PID 27079; stack trace: ***
> @ 0x2b4f2b236330 (unknown)
> @ 0x2b4f2bc9ac37 (unknown)
> @ 0x2b4f2bc9e028 (unknown)
> @ 0x2b4f2bcd72a4 (unknown)
> @ 0x2b4f2bce355e (unknown)
> @ 0x2b4f299e98a0 
> _ZNSt14_Function_base13_Base_managerIZN7process8internal4LoopIZNS1_4http4Pipe6Reader7readAllEvEUlvE_ZNS6_7readAllEvEUlRKSsE0_SsSsE3runENS1_6FutureISsEEEUlvE3_E10_M_managerERSt9_Any_dataRKSG_St18_Manager_operation
> @ 0x2b4f299fadb9 
> _ZN7process8internal4LoopIZNS_4http4Pipe6Reader7readAllEvEUlvE_ZNS4_7readAllEvEUlRKSsE0_SsSsE3runENS_6FutureISsEE
> @ 0x2b4f299fca57 
> _ZNSt17_Function_handlerIFvRKN7process6FutureISsEEEZNKS2_5onAnyIRZNS0_8internal4LoopIZNS0_4http4Pipe6Reader7readAllEvEUlvE_ZNSB_7readAllEvEUlRKSsE0_SsSsE3runES2_EUlS4_E2_vEES4_OT_NS2_6PreferEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
> @ 0x2b4f28a4cc16 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureISsJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
> @ 0x2b4f29a2479f process::Future<>::_set<>()
> @ 0x2b4f299f46a9 process::http::Pipe::Writer::close()
> @ 0x2b4f29a24d32 
> process::StreamingRequestDecoder::on_message_complete()
> @ 0x2b4f29b0641d http_parser_execute
> @ 0x2b4f29aaeafe process::internal::decode_recv()
> @ 0x2b4f29abc44b 
> _ZNSt17_Function_handlerIFvRKN7process6FutureImEEEZNKS2_5onAnyISt5_BindIFPFvS4_PcmNS0_7network8internal6SocketINS9_4inet7AddressEEEPNS0_23StreamingRequestDecoderEESt12_PlaceholderILi1EES8_mSE_SG_EEvEES4_OT_NS2_6PreferEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
> @  0x14e136e process::internal::run<>()
> @  0x14e5d9f process::Future<>::_set<>()
> @ 0x2b4f29a4c23d 
> _ZN7process8internal4LoopIZNS_2io8internal4readEiPvmEUlvE_ZNS3_4readEiS4_mEUlRK6OptionImEE0_S7_mE3runENS_6FutureIS7_EE
> @ 0x2b4f29a4dc6f 
> _ZNSt17_Function_handlerIFvRKN7process6FutureINS0_11ControlFlowImEZNKS4_5onAnyIRZNS0_8internal4LoopIZNS0_2io8internal4readEiPvmEUlvE_ZNSC_4readEiSD_mEUlRK6OptionImEE0_SG_mE3runENS1_ISG_EEEUlS6_E0_vEES6_OT_NS4_6PreferEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> @ 0x2b4f29a5bec6 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureINS_11ControlFlowImEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
> @ 0x2b4f29a5d971 process::Future<>::_set<>()
> @ 0x2b4f29a600a1 process::Promise<>::associate()
> @ 0x2b4f29a608da process::internal::thenf<>()
> @ 0x2b4f29b0170e 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureIsJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
> @ 0x2b4f29b01cd1 process::Future<>::_set<>()
> @ 0x2b4f29b00b36 process::io::internal::pollCallback()
> @ 0x2b4f29b0b990 event_process_active_single_queue
> @ 0x2b4f29b0bf06 event_process_active
> @ 0x2b4f29b0c662 event_base_loop
> @ 0x2b4f29aff96d process::EventLoop::run()
> @ 0x2b4f2b4f5a60 (unknown)
> @ 0x2b4f2b22e184 start_thread
> {code}
> Find the log from the failed run attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-17 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827419#comment-15827419
 ] 

Jay Guo commented on MESOS-6902:


[~bmahler] Sure, working on it.

> Add support for agent capabilities
> --
>
> Key: MESOS-6902
> URL: https://issues.apache.org/jira/browse/MESOS-6902
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> Similarly to how we might add support for master capabilities (MESOS-5675), 
> agent capabilities would also make sense: in a mixed cluster, the master 
> might have support for features that are not present on certain agents, and 
> vice versa.





[jira] [Created] (MESOS-6941) Add support for batch processing of status updates, to improve latency / throughput / cluster scalability.

2017-01-17 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6941:
--

 Summary: Add support for batch processing of status updates, to 
improve latency / throughput / cluster scalability.
 Key: MESOS-6941
 URL: https://issues.apache.org/jira/browse/MESOS-6941
 Project: Mesos
  Issue Type: Improvement
  Components: agent, framework api, master
Reporter: Benjamin Mahler


Currently, each task has its own independent status update stream. Within an 
individual stream, updates are sent to schedulers in a serial fashion: the 
agent sends the (N+1)th status update only after it receives the scheduler's 
acknowledgement of the Nth.

This approach substantially limits throughput and can backlog status updates 
when they arrive rapidly. Instead, we should add the ability for all available 
updates on a stream to be sent together (there should probably be a limit on 
the size of this "batch") so that the scheduler can process them as a group 
without incurring a round-trip acknowledgement latency between each update.

In addition, there may be cases of updates where the scheduler only wants the 
latest information (e.g. download status per MESOS-2256, or possibly health 
information). But this should be tackled separately.
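As a thought experiment, the batching described above can be sketched in Python (hypothetical names; this is not Mesos code): pending updates on a per-task stream are sent together up to a batch limit, and a single acknowledgement of the last update in a batch retires the whole batch.

```python
class StatusUpdateStream:
    """Hypothetical sketch of a per-task status update stream that
    batches pending updates instead of sending them one at a time."""

    def __init__(self, batch_limit=100):
        self.batch_limit = batch_limit
        self.pending = []          # updates not yet sent
        self.unacknowledged = []   # updates sent, awaiting acknowledgement

    def receive(self, update):
        self.pending.append(update)

    def next_batch(self):
        """Send all available updates together, bounded by batch_limit.
        Only one batch may be in flight at a time (the stream stays serial,
        but per batch rather than per update)."""
        if self.unacknowledged or not self.pending:
            return []
        batch = self.pending[:self.batch_limit]
        self.pending = self.pending[self.batch_limit:]
        self.unacknowledged = batch
        return batch

    def acknowledge(self, update):
        """Acknowledging the last update of the in-flight batch retires it."""
        if self.unacknowledged and update == self.unacknowledged[-1]:
            self.unacknowledged = []
            return True
        return False
```

With this shape, a scheduler that previously needed N round trips per stream needs roughly N / batch_limit.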





[jira] [Commented] (MESOS-6854) Prevent launching MULTI_ROLE framework's tasks on agents without MULTI_ROLE support.

2017-01-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827207#comment-15827207
 ] 

Benjamin Mahler commented on MESOS-6854:


[~guoger] I filed MESOS-6940 to replace this ticket; I think we want to avoid 
sending offers entirely.

The situation you mention is OK so long as the framework never changed its 
roles, which we will impose in phase 1. However, in phase 2, we have no way to 
know whether the framework changed its role during the lifetime of the agent. 
The only check available is to inspect the active executors and tasks, and 
that only tells us that the framework didn't change its role during the 
lifetime of the active tasks and executors on the agent. If the framework 
changed its role before these were launched, we might accidentally expose the 
agent to a changed framework role, and the old agent doesn't handle role 
changes.

Since this is rather complicated, and since this is new functionality, I think 
we can say: "if you want to use a MULTI_ROLE framework, upgrade your cluster to 
1.z. If there are agents registered that have not been upgraded, the MULTI_ROLE 
framework will not receive any offers for this agent. If the MULTI_ROLE 
framework was previously running without the MULTI_ROLE capability and has long 
running tasks on non-MULTI_ROLE agents, these tasks will continue to run". 
We'll need to be precise about this kind of thing in the upgrade notes that we 
publish.

> Prevent launching MULTI_ROLE framework's tasks on agents without MULTI_ROLE 
> support.
> 
>
> Key: MESOS-6854
> URL: https://issues.apache.org/jira/browse/MESOS-6854
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> The proposal for upgrades / backwards compatibility in phase 1 of multi-role 
> framework support is that we require that masters and agents are all upgraded 
> before a multi-role framework registers.
> We need to explicitly protect against this situation, given that it's 
> common for old agents to show up in a cluster. The master can prevent the 
> launching of MULTI_ROLE frameworks' tasks on agents without MULTI_ROLE 
> support.
> If we were to naively let this happen, the old agent would think the 
> resources are allocated to the "*" role, and there would need to be master 
> logic to deal with the old agent not populating Resource.AllocationInfo.
> The guard will need to be either version based or agent-capability based; 
> the latter seems like the stronger approach, given that some users upgrade 
> off of master rather than using release versions.
> We can initially start with the master side guard, and have the agent send 
> the capability once the agent-side implementation is complete.
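A minimal Python sketch of the master-side guard described above (hypothetical helper and error text; the real check would live in the Mesos master's task validation path): a launch is refused when a MULTI_ROLE framework targets an agent that has not advertised the MULTI_ROLE capability.

```python
def validate_launch(framework_capabilities, agent_capabilities):
    """Hypothetical master-side guard: refuse to launch a MULTI_ROLE
    framework's task on an agent lacking the MULTI_ROLE capability.
    Returns None on success, or an error string on rejection."""
    if ("MULTI_ROLE" in framework_capabilities
            and "MULTI_ROLE" not in agent_capabilities):
        return ("Task of framework with MULTI_ROLE capability cannot be "
                "launched on agent without the MULTI_ROLE capability")
    return None
```

Once agents send the capability, the same predicate works unchanged; until then, the master would treat every agent as non-MULTI_ROLE.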





[jira] [Created] (MESOS-6940) Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE capability.

2017-01-17 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6940:
--

 Summary: Do not send offers to MULTI_ROLE schedulers if agent does 
not have MULTI_ROLE capability.
 Key: MESOS-6940
 URL: https://issues.apache.org/jira/browse/MESOS-6940
 Project: Mesos
  Issue Type: Task
  Components: allocation, master
Reporter: Benjamin Mahler


Old agents that do not have the MULTI_ROLE capability cannot correctly receive 
tasks from schedulers that have the MULTI_ROLE capability *and are using 
multiple roles*. In this case, we should withhold the offer from the scheduler 
entirely, rather than send an offer only to reject the scheduler's operations.

Note also that we allow a single-role scheduler to upgrade into having the 
MULTI_ROLE capability (use of the {{FrameworkInfo.roles}} field) so long as it 
continues to use a single role (in phase 1 of multi-role support the roles 
cannot be changed). We could therefore continue sending offers if the scheduler 
is MULTI_ROLE capable but only uses a single role.

In phase 2 of multi-role support, we cannot safely allow a MULTI_ROLE scheduler 
to receive resources from a non-MULTI_ROLE agent, so it seems we should simply 
disallow MULTI_ROLE schedulers from receiving offers from non-MULTI_ROLE 
agents, regardless of how many roles the scheduler is using.
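The decision above could be sketched as a small predicate (Python, hypothetical names; the phase parameter is an assumption drawn from the description, not an actual allocator flag):

```python
def should_send_offer(framework_capabilities, framework_roles,
                      agent_capabilities, phase=1):
    """Hypothetical allocator-side filter. In phase 1, a MULTI_ROLE
    scheduler still using a single role may receive offers from old
    agents; in phase 2, offers from non-MULTI_ROLE agents are withheld
    from MULTI_ROLE schedulers regardless of role count."""
    framework_multi = "MULTI_ROLE" in framework_capabilities
    agent_multi = "MULTI_ROLE" in agent_capabilities

    # No mismatch: either the framework is single-role style, or the
    # agent understands multi-role allocations.
    if not framework_multi or agent_multi:
        return True

    if phase == 1:
        return len(framework_roles) == 1

    return False
```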





[jira] [Commented] (MESOS-4245) Add `dist` target to CMake solution

2017-01-17 Thread Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827182#comment-15827182
 ] 

Srinivas commented on MESOS-4245:
-

https://reviews.apache.org/r/55657/

> Add `dist` target to CMake solution
> ---
>
> Key: MESOS-4245
> URL: https://issues.apache.org/jira/browse/MESOS-4245
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Srinivas
>  Labels: cmake, mesosphere, microsoft, windows
>






[jira] [Commented] (MESOS-6249) On Mesos master failover the reregistered callback is not triggered

2017-01-17 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827147#comment-15827147
 ] 

Zhitao Li commented on MESOS-6249:
--

Strongly +1 for updating documentation as well as inline comments in header.

> On Mesos master failover the reregistered callback is not triggered
> ---
>
> Key: MESOS-6249
> URL: https://issues.apache.org/jira/browse/MESOS-6249
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.28.0, 0.28.1, 1.0.1
> Environment: OS X 10.11.6
>Reporter: Markus Jura
>
> On a Mesos master failover, the reregistered callback of the Java API is not 
> triggered. Only the registered callback is triggered, which makes it hard 
> for a framework to distinguish between these scenarios.
> This behaviour has been tested with the ConductR framework with Java API 
> versions 0.28.0, 0.28.1, and 1.0.1. Below you will find the logs from the 
> master that was re-elected and from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master 
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is 
> master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the 
> registry (0B) in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 
> 12us; attempting to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 
> 'registry' in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered 
> registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the 
> Registry (118B) ; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'conductr' at 
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr 
> with checkpointing disabled and capabilities [  ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 
> 38us; attempting to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 
> 'registry' in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 
> 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 
> (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8; 
> mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9; 
> mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed 
> resources  to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at 
> slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework 
> conductr (conductr) at 
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: 
> (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get 
> '/mesos/json.info_87' in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master 
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at 
> master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient 
> [sourceThread=conductr-akka.actor.default-dispatcher-2, 
> akkaTimestamp=09:44:20.009UTC, 
> akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client,
>  sourceActorSystem=conductr] - Mesos master has been disconnected..
> I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided. 

[jira] [Created] (MESOS-6939) Attributes on Resources

2017-01-17 Thread Gabriel Hartmann (JIRA)
Gabriel Hartmann created MESOS-6939:
---

 Summary: Attributes on Resources
 Key: MESOS-6939
 URL: https://issues.apache.org/jira/browse/MESOS-6939
 Project: Mesos
  Issue Type: Improvement
Reporter: Gabriel Hartmann


Resources (particularly disks) need attributes so that they can be selectively 
consumed by frameworks depending on characteristics like whether they are SSDs.
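A Python sketch of what attribute-based selection might look like (hypothetical resource schema; Mesos resources do not carry attributes today, which is exactly what this ticket proposes):

```python
def select_disks(resources, required_attributes):
    """Hypothetical sketch: pick disk resources whose attributes satisfy
    all required key/value pairs, e.g. {"type": "ssd"}. Non-disk
    resources and disks missing a required attribute are skipped."""
    matches = []
    for r in resources:
        if r.get("name") != "disk":
            continue
        attrs = r.get("attributes", {})
        if all(attrs.get(k) == v for k, v in required_attributes.items()):
            matches.append(r)
    return matches
```

A framework could then accept only the offered disks matching its requirements and decline the rest.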






[jira] [Updated] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6937:
-
Shepherd: Anand Mazumdar



[jira] [Assigned] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-6937:


Assignee: Greg Mann



[jira] [Updated] (MESOS-6938) Libprocess reinitialization is flaky, can segfault

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6938:
-
Description: 
This was observed on ASF CI. Based on the placement of the stacktrace, the 
segfault seems to occur during libprocess reinitialization, when 
{{process::initialize}} is called:
{code}
[--] 4 tests from Encryption/NetSocketTest
[ RUN  ] Encryption/NetSocketTest.EOFBeforeRecv/0
I0117 15:18:35.320691 27596 openssl.cpp:419] CA file path is unspecified! NOTE: 
Set CA file path with LIBPROCESS_SSL_CA_FILE=
I0117 15:18:35.320714 27596 openssl.cpp:424] CA directory path unspecified! 
NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
I0117 15:18:35.320719 27596 openssl.cpp:429] Will not verify peer certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0117 15:18:35.320726 27596 openssl.cpp:435] Will only verify peer certificate 
if presented!
NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
I0117 15:18:35.335141 27596 process.cpp:1234] libprocess is initialized on 
172.17.0.3:46415 with 16 worker threads
[   OK ] Encryption/NetSocketTest.EOFBeforeRecv/0 (422 ms)
[ RUN  ] Encryption/NetSocketTest.EOFBeforeRecv/1
I0117 15:18:35.390697 27596 process.cpp:1234] libprocess is initialized on 
172.17.0.3:39822 with 16 worker threads
[   OK ] Encryption/NetSocketTest.EOFBeforeRecv/1 (6 ms)
[ RUN  ] Encryption/NetSocketTest.EOFAfterRecv/0
I0117 15:18:35.998528 27596 openssl.cpp:419] CA file path is unspecified! NOTE: 
Set CA file path with LIBPROCESS_SSL_CA_FILE=
I0117 15:18:35.998559 27596 openssl.cpp:424] CA directory path unspecified! 
NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
I0117 15:18:35.998566 27596 openssl.cpp:429] Will not verify peer certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0117 15:18:35.998572 27596 openssl.cpp:435] Will only verify peer certificate 
if presented!
NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
I0117 15:18:36.010643 27596 process.cpp:1234] libprocess is initialized on 
172.17.0.3:47429 with 16 worker threads
[   OK ] Encryption/NetSocketTest.EOFAfterRecv/0 (664 ms)
[ RUN  ] Encryption/NetSocketTest.EOFAfterRecv/1
I0117 15:18:36.079453 27596 process.cpp:1234] libprocess is initialized on 
172.17.0.3:38149 with 16 worker threads
[   OK ] Encryption/NetSocketTest.EOFAfterRecv/1 (19 ms)
*** Aborted at 1484666316 (unix time) try "date -d @1484666316" if you are 
using GNU date ***
PC: @ 0x7f7643ad7c56 __memcpy_ssse3_back
*** SIGSEGV (@0x57c10f8) received by PID 27596 (TID 0x7f76393c2700) from PID 
92016888; stack trace: ***
@ 0x7f7644ba0370 (unknown)
@ 0x7f7643ad7c56 __memcpy_ssse3_back
@ 0x7f76443248e0 (unknown)
@ 0x7f7644324f8c (unknown)
@   0x422a4d process::UPID::UPID()
I0117 15:18:36.090376 27596 process.cpp:1234] libprocess is initialized on 
172.17.0.3:43835 with 16 worker threads
[--] 4 tests from Encryption/NetSocketTest (1116 ms total)

[--] 6 tests from SSLVerifyIPAdd/SSLTest
[ RUN  ] SSLVerifyIPAdd/SSLTest.BasicSameProcess/0
@   0x8ae4a8 process::DispatchEvent::DispatchEvent()
@   0x8a6a5e process::internal::dispatch()
@   0x8c0b44 process::dispatch<>()
@   0x8a598a process::ProcessBase::route()
@   0x98be53 process::ProcessBase::route<>()
@   0x988096 process::Help::initialize()
@   0x89ef2a process::ProcessManager::resume()
@   0x89b976 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
@   0x8adb3c 
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@   0x8ada80 
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
@   0x8ada0a 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f764431b230 (unknown)
@ 0x7f7644b98dc5 start_thread
@ 0x7f7643a8473d __clone
make[7]: *** [check-local] Segmentation fault
{code}


[jira] [Created] (MESOS-6938) Libprocess reinitialization is flaky, can segfault

2017-01-17 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6938:


 Summary: Libprocess reinitialization is flaky, can segfault
 Key: MESOS-6938
 URL: https://issues.apache.org/jira/browse/MESOS-6938
 Project: Mesos
  Issue Type: Bug
  Components: libprocess, tests
 Environment: ASF CI, CentOS 7, libevent and SSL enabled
Reporter: Greg Mann



Find attached the log from the failed run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6937:
--
Target Version/s: 1.2.0
Priority: Blocker  (was: Major)

> ContentType/MasterAPITest.ReserveResources/1 fails during Writer close
> --
>
> Key: MESOS-6937
> URL: https://issues.apache.org/jira/browse/MESOS-6937
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: ASF CI, Ubuntu 14.04, libevent and SSL enabled
>Reporter: Greg Mann
>Priority: Blocker
>  Labels: tests
> Attachments: MasterAPITest.ReserveResources.txt
>
>
> This was observed on ASF CI. Libevent was enabled, but the test in question 
> was not running in SSL-enabled mode. We see the following stack trace:
> {code}
> *** Error in `src/mesos-tests': double free or corruption (fasttop): 
> 0x2b4f7001bf70 ***
> *** Aborted at 1484691168 (unix time) try "date -d @1484691168" if you are 
> using GNU date ***
> PC: @ 0x2b4f2bc9ac37 (unknown)
> *** SIGABRT (@0x3e869c7) received by PID 27079 (TID 0x2b4f35be5700) from 
> PID 27079; stack trace: ***
> @ 0x2b4f2b236330 (unknown)
> @ 0x2b4f2bc9ac37 (unknown)
> @ 0x2b4f2bc9e028 (unknown)
> @ 0x2b4f2bcd72a4 (unknown)
> @ 0x2b4f2bce355e (unknown)
> @ 0x2b4f299e98a0 
> _ZNSt14_Function_base13_Base_managerIZN7process8internal4LoopIZNS1_4http4Pipe6Reader7readAllEvEUlvE_ZNS6_7readAllEvEUlRKSsE0_SsSsE3runENS1_6FutureISsEEEUlvE3_E10_M_managerERSt9_Any_dataRKSG_St18_Manager_operation
> @ 0x2b4f299fadb9 
> _ZN7process8internal4LoopIZNS_4http4Pipe6Reader7readAllEvEUlvE_ZNS4_7readAllEvEUlRKSsE0_SsSsE3runENS_6FutureISsEE
> @ 0x2b4f299fca57 
> _ZNSt17_Function_handlerIFvRKN7process6FutureISsEEEZNKS2_5onAnyIRZNS0_8internal4LoopIZNS0_4http4Pipe6Reader7readAllEvEUlvE_ZNSB_7readAllEvEUlRKSsE0_SsSsE3runES2_EUlS4_E2_vEES4_OT_NS2_6PreferEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
> @ 0x2b4f28a4cc16 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureISsJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
> @ 0x2b4f29a2479f process::Future<>::_set<>()
> @ 0x2b4f299f46a9 process::http::Pipe::Writer::close()
> @ 0x2b4f29a24d32 
> process::StreamingRequestDecoder::on_message_complete()
> @ 0x2b4f29b0641d http_parser_execute
> @ 0x2b4f29aaeafe process::internal::decode_recv()
> @ 0x2b4f29abc44b 
> _ZNSt17_Function_handlerIFvRKN7process6FutureImEEEZNKS2_5onAnyISt5_BindIFPFvS4_PcmNS0_7network8internal6SocketINS9_4inet7AddressEEEPNS0_23StreamingRequestDecoderEESt12_PlaceholderILi1EES8_mSE_SG_EEvEES4_OT_NS2_6PreferEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
> @  0x14e136e process::internal::run<>()
> @  0x14e5d9f process::Future<>::_set<>()
> @ 0x2b4f29a4c23d 
> _ZN7process8internal4LoopIZNS_2io8internal4readEiPvmEUlvE_ZNS3_4readEiS4_mEUlRK6OptionImEE0_S7_mE3runENS_6FutureIS7_EE
> @ 0x2b4f29a4dc6f 
> _ZNSt17_Function_handlerIFvRKN7process6FutureINS0_11ControlFlowImEZNKS4_5onAnyIRZNS0_8internal4LoopIZNS0_2io8internal4readEiPvmEUlvE_ZNSC_4readEiSD_mEUlRK6OptionImEE0_SG_mE3runENS1_ISG_EEEUlS6_E0_vEES6_OT_NS4_6PreferEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_
> @ 0x2b4f29a5bec6 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureINS_11ControlFlowImEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
> @ 0x2b4f29a5d971 process::Future<>::_set<>()
> @ 0x2b4f29a600a1 process::Promise<>::associate()
> @ 0x2b4f29a608da process::internal::thenf<>()
> @ 0x2b4f29b0170e 
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureIsJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
> @ 0x2b4f29b01cd1 process::Future<>::_set<>()
> @ 0x2b4f29b00b36 process::io::internal::pollCallback()
> @ 0x2b4f29b0b990 event_process_active_single_queue
> @ 0x2b4f29b0bf06 event_process_active
> @ 0x2b4f29b0c662 event_base_loop
> @ 0x2b4f29aff96d process::EventLoop::run()
> @ 0x2b4f2b4f5a60 (unknown)
> @ 0x2b4f2b22e184 start_thread
> {code}
> Find the log from the failed run attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4441) Allocate revocable resources beyond quota guarantee.

2017-01-17 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4441:

Priority: Major  (was: Blocker)

> Allocate revocable resources beyond quota guarantee.
> 
>
> Key: MESOS-4441
> URL: https://issues.apache.org/jira/browse/MESOS-4441
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>  Labels: mesosphere
>
> h4. Status Quo
> Currently, resources allocated to frameworks in a role with quota (aka a 
> quota'ed role) beyond the quota guarantee are marked non-revocable. This 
> limits our flexibility to revoke them if we decide to do so in the future.
> h4. Proposal
> Once the quota guarantee is satisfied, we need not allocate further resources 
> as non-revocable. Instead, we can mark all offered resources beyond the 
> guarantee as revocable. When {{RevocableInfo}} evolves in the future, 
> frameworks will get additional information about the "revocability" of a 
> resource (i.e. allocation slack).
> h4. Caveats
> Though it seems like a simple change, it has several implications.
> h6. Fairness
> Currently, the hierarchical allocator treats revocable resources like regular 
> resources when doing fairness calculations. This may prevent frameworks from 
> getting non-revocable resources as part of their role's quota guarantee if 
> they also accept some revocable resources.
> Consider the following scenario. A single framework in a role with quota set 
> to {{10}} CPUs is allocated {{10}} non-revocable CPUs as part of its quota, 
> plus an additional {{2}} revocable CPUs. Now a task using {{2}} non-revocable 
> CPUs finishes and its resources are returned. The role's total allocation is 
> {{8}} non-revocable + {{2}} revocable CPUs. However, the role may not be 
> offered an additional {{2}} non-revocable CPUs, since its total allocation 
> already satisfies the quota.
> h6. Resource math
> If we allocate resources beyond the guarantee as revocable, we should make 
> sure we do the accounting right: either we update the total agent resources 
> and mark them as revocable as well, or we bookkeep the resources as 
> non-revocable and convert them to revocable when necessary.
> h6. Coarse-grained nature of allocation
> The hierarchical allocator performs "coarse-grained" allocation, meaning it 
> always allocates all of an agent's remaining resources to a single framework. 
> This may lead to over-allocating some resources as non-revocable beyond the 
> quota guarantee.
> h6. Quotas smaller than fair share
> If the quota set for a role is smaller than its fair share, the proposed 
> change may reduce the amount of resources offered to that role if its 
> frameworks do not accept revocable resources. This is probably the most 
> important consequence of the change: operators may set a quota to get 
> guarantees, but then observe a decrease in the amount of resources the role 
> gets, which is not intuitive.
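The fairness caveat above can be sketched numerically. The following is a hypothetical illustration of the accounting problem, not actual Mesos allocator code; the function name and numbers are invented for this example.

```python
# Hypothetical illustration of the fairness caveat: if revocable resources
# count toward quota accounting, a role can appear to satisfy its guarantee
# while holding fewer non-revocable CPUs than its quota.

QUOTA = 10  # CPUs guaranteed to the role

def satisfies_quota(non_revocable, revocable):
    # Naive accounting: revocable CPUs count toward the guarantee.
    return non_revocable + revocable >= QUOTA

# Initially: 10 non-revocable + 2 revocable CPUs allocated.
assert satisfies_quota(10, 2)

# A task using 2 non-revocable CPUs finishes; allocation drops to 8 + 2.
# The naive accounting still reports the quota as satisfied, so the
# allocator may never offer the missing 2 non-revocable CPUs.
assert satisfies_quota(8, 2)

# Counting only non-revocable resources reveals the shortfall.
assert not 8 >= QUOTA
```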



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4441) Allocate revocable resources beyond quota guarantee.

2017-01-17 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4441:

Target Version/s:   (was: 0.28.0)

> Allocate revocable resources beyond quota guarantee.
> 
>
> Key: MESOS-4441
> URL: https://issues.apache.org/jira/browse/MESOS-4441
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4441) Allocate revocable resources beyond quota guarantee.

2017-01-17 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4441:

Assignee: (was: Michael Park)

> Allocate revocable resources beyond quota guarantee.
> 
>
> Key: MESOS-4441
> URL: https://issues.apache.org/jira/browse/MESOS-4441
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6078) Add an agent teardown endpoint

2017-01-17 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-6078:

Assignee: (was: Michael Park)

> Add an agent teardown endpoint
> -
>
> Key: MESOS-6078
> URL: https://issues.apache.org/jira/browse/MESOS-6078
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Cody Maloney
>  Labels: mesosphere
>
> Currently, when a whole agent machine is unexpectedly terminated for good 
> (e.g. AWS terminated the instance without warning), it still goes through the 
> Mesos slave removal rate limit before it is removed.
> If a couple of agents or a whole rack goes down in a cluster of thousands of 
> agents, this can become a problem.
> If the agent could be shut down cleanly, everything would get rescheduled. 
> But once the agent is gone, there is currently no good way for an 
> administrator to indicate that the node is gone for good and that its tasks 
> are lost and should be rescheduled (if appropriate) as soon as possible.
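To see why the rate limit matters at this scale, here is a back-of-the-envelope sketch. The rate-limit value below is hypothetical; the actual limit is configured on the master (e.g. via {{--slave_removal_rate_limit}}).

```python
# Back-of-the-envelope illustration of the agent-removal backlog described
# above. The numbers are hypothetical; check the master's actual removal
# rate-limit configuration for real values.

def minutes_to_remove(num_dead_agents, removals_per_minute):
    """How long dead agents wait in the removal rate-limit queue."""
    return num_dead_agents / removals_per_minute

# A 40-agent rack terminated at once, with 1 removal permitted per minute:
# tasks on the last agent stay unreported as lost for roughly 40 minutes.
backlog = minutes_to_remove(40, 1)
assert backlog == 40.0
```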



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6937:
-
Attachment: MasterAPITest.ReserveResources.txt

> ContentType/MasterAPITest.ReserveResources/1 fails during Writer close
> --
>
> Key: MESOS-6937
> URL: https://issues.apache.org/jira/browse/MESOS-6937
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: ASF CI, Ubuntu 14.04, libevent and SSL enabled
>Reporter: Greg Mann
>  Labels: tests
> Attachments: MasterAPITest.ReserveResources.txt
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4766) Improve allocator performance.

2017-01-17 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4766:

Assignee: (was: Michael Park)

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing un-necessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 fails during Writer close

2017-01-17 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6937:
-
Summary: ContentType/MasterAPITest.ReserveResources/1 fails during Writer 
close  (was: ContentType/MasterAPITest.ReserveResources/1 segfaults during 
Writer close)

> ContentType/MasterAPITest.ReserveResources/1 fails during Writer close
> --
>
> Key: MESOS-6937
> URL: https://issues.apache.org/jira/browse/MESOS-6937
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: ASF CI, Ubuntu 14.04, libevent and SSL enabled
>Reporter: Greg Mann
>  Labels: tests
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6937) ContentType/MasterAPITest.ReserveResources/1 segfaults during Writer close

2017-01-17 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6937:


 Summary: ContentType/MasterAPITest.ReserveResources/1 segfaults 
during Writer close
 Key: MESOS-6937
 URL: https://issues.apache.org/jira/browse/MESOS-6937
 Project: Mesos
  Issue Type: Bug
  Components: tests
 Environment: ASF CI, Ubuntu 14.04, libevent and SSL enabled
Reporter: Greg Mann





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826871#comment-15826871
 ] 

Benjamin Mahler commented on MESOS-6902:


[~neilc] sounds good, it will be picked up automatically in the v1 
protobuf-based endpoints, but we should add it to the old-style http endpoints. 
[~guoger] can you take that on as well?

> Add support for agent capabilities
> --
>
> Key: MESOS-6902
> URL: https://issues.apache.org/jira/browse/MESOS-6902
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> Similarly to how we might add support for master capabilities (MESOS-5675), 
> agent capabilities would also make sense: in a mixed cluster, the master 
> might have support for features that are not present on certain agents, and 
> vice versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-17 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826849#comment-15826849
 ] 

Neil Conway commented on MESOS-6902:


Should agent capabilities be exposed via one or more of the HTTP endpoints?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826840#comment-15826840
 ] 

Benjamin Mahler commented on MESOS-6902:


The following commits introduce capabilities in AgentInfo, which is exposed to 
the master. Executors also have AgentInfo available, but schedulers do not.

{noformat}

commit ec1a326397641af74b7349182159c07d360a4d73
Author: Jay Guo 
Date:   Tue Jan 17 13:14:36 2017 -0800

Added Capabilities to SlaveInfo protobuf message.

Frameworks can advertise their capabilities via the protobuf field
in FrameworkInfo and master can behave differently according them.
Similarly, agents should be able to advertise their capabilities in
a mixed cluster, so that master could make better decisions based
on them. For example, multi-role frameworks could only launch tasks
on agents with multi-role capabilities.

This allows us to handle upgrades in a more explicit manner, without
having to rely on version strings.

Review: https://reviews.apache.org/r/55562/
{noformat}

{noformat}
commit 4099daa87b9751d2917656e56dad2e416acc8a02
Author: Jay Guo 
Date:   Tue Jan 17 13:15:47 2017 -0800

Added capabilities to the master's Slave struct.

Review: https://reviews.apache.org/r/55563/
{noformat}

Right now there is no distinction between "internal" / "private" and "external" 
/ "public" capabilities either, which may be necessary in the future if we want 
to hide implementation details from the public API.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4245) Add `dist` target to CMake solution

2017-01-17 Thread Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas reassigned MESOS-4245:
---

Assignee: Srinivas  (was: Alex Clemmer)

> Add `dist` target to CMake solution
> ---
>
> Key: MESOS-4245
> URL: https://issues.apache.org/jira/browse/MESOS-4245
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Srinivas
>  Labels: cmake, mesosphere, microsoft, windows
>






[jira] [Commented] (MESOS-4766) Improve allocator performance.

2017-01-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826726#comment-15826726
 ] 

Benjamin Mahler commented on MESOS-4766:


The main work currently being done for this is MESOS-6904, which has superseded 
MESOS-3157. I'm not sure if [~jjanco] and [~xujyan] think it will make the 
release.

I will drop the target version for this epic; my plan is to tidy it up with 
what I think we should focus on once MESOS-6904 is complete.

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.





[jira] [Updated] (MESOS-4766) Improve allocator performance.

2017-01-17 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4766:
---
Target Version/s:   (was: 1.2.0)

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.





[jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.

2017-01-17 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6904:
---
Description: 
Per MESOS-3157:

{quote}
Our deployment environments have a lot of churn, with many short-lived 
frameworks that often revive offers. Running the allocator takes a long time 
(from seconds up to minutes).

In this situation, event-triggered allocation causes the event queue in the 
allocator process to get very long, and the allocator effectively becomes 
unresponsive (e.g. a revive offers message takes too long to come to the head of 
the queue).
{quote}

To remedy the above scenario, it is proposed to perform batching of the 
enqueued allocation operations so that a single allocation operation can 
satisfy N enqueued allocations. This should reduce the potential for 
backlogging in the allocator. See the discussion 
[here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
 in MESOS-3157.

  was:
"Our deployment environments have a lot of churn, with many short-lived 
frameworks that often revive offers. Running the allocator takes a long time 
(from seconds up to minutes).
In this situation, event-triggered allocation causes the event queue in the 
allocator process to get very long, and the allocator effectively becomes 
unresponsive (e.g. a revive offers message takes too long to come to the head of 
the queue)." - MESOS-3157 

To remedy the above scenario, it is proposed to track allocation candidates and 
only dispatch allocation work if there is no pending allocation in the 
allocator queue. When an enqueued allocation is processed, the tracked set of 
candidates is cleared. 

Current behavior will trigger allocation work on cluster events (e.g. 
`addSlave()`, `addFramework()`, etc) as well as during the periodic batched 
allocation running at a defined time interval. 

This ticket tracks the new direction the work has taken since the discussion 
in MESOS-3157, where a previous solution by [~jamespeach] introduced batched 
allocation only (which we currently run), as well as an approach to reduce 
redundant work in the queue. 

Summary: Perform batching of allocations to reduce allocator queue 
backlogging.  (was: Track resource allocation candidates and batch allocation 
work)

> Perform batching of allocations to reduce allocator queue backlogging.
> --
>
> Key: MESOS-6904
> URL: https://issues.apache.org/jira/browse/MESOS-6904
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>  Labels: allocator
>
> Per MESOS-3157:
> {quote}
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> {quote}
> To remedy the above scenario, it is proposed to perform batching of the 
> enqueued allocation operations so that a single allocation operation can 
> satisfy N enqueued allocations. This should reduce the potential for 
> backlogging in the allocator. See the discussion 
> [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
>  in MESOS-3157.





[jira] [Commented] (MESOS-3157) Only perform periodic resource allocations.

2017-01-17 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826703#comment-15826703
 ] 

Benjamin Mahler commented on MESOS-3157:


The work to perform batching of enqueued allocations will be done in 
MESOS-6904, unassigning this ticket.

> Only perform periodic resource allocations.
> ---
>
> Key: MESOS-3157
> URL: https://issues.apache.org/jira/browse/MESOS-3157
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate periodically on the allocation interval. This works great 
> and really improves responsiveness.





[jira] [Updated] (MESOS-3157) Only perform periodic resource allocations.

2017-01-17 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-3157:
---
Assignee: (was: Jacob Janco)

> Only perform periodic resource allocations.
> ---
>
> Key: MESOS-3157
> URL: https://issues.apache.org/jira/browse/MESOS-3157
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate periodically on the allocation interval. This works great 
> and really improves responsiveness.





[jira] [Commented] (MESOS-3968) DiskQuotaTest.SlaveRecovery is flaky

2017-01-17 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826567#comment-15826567
 ] 

Neil Conway commented on MESOS-3968:


Different flakiness observed:

{noformat}
[ RUN  ] DiskQuotaTest.SlaveRecovery
I0117 10:24:47.009428 3480249280 exec.cpp:162] Version: 1.2.0
I0117 10:24:47.017849 231346176 exec.cpp:237] Executor registered on agent 
ce4c04e6-4504-4e0b-ae39-72d04d61d42c-S0
Received SUBSCRIBED event
Subscribed executor on 10.0.1.9
Received LAUNCH event
Starting task bcda0281-ed47-43f9-b0fd-5c86d61357cd
/Users/neilc/ms/build-mesos/src/mesos-containerizer launch --help="false" 
--launch_info="{"command":{"shell":true,"value":"dd if=\/dev\/zero of=file 
bs=1048576 count=2 && sleep 1000"}}"
Forked command at 43217
I0117 10:24:47.080565 232955904 exec.cpp:283] Received reconnect request from 
agent ce4c04e6-4504-4e0b-ae39-72d04d61d42c-S0
I0117 10:24:47.089931 233492480 exec.cpp:260] Executor re-registered on agent 
ce4c04e6-4504-4e0b-ae39-72d04d61d42c-S0
Received SUBSCRIBED event
Subscribed executor on 10.0.1.9
2+0 records in
2+0 records out
2097152 bytes transferred in 0.006863 secs (305568437 bytes/sec)
../../mesos/src/tests/disk_quota_tests.cpp:655: Failure
Value of: usage->has_disk_limit_bytes()
  Actual: false
Expected: true
I0117 10:24:47.114868 232419328 exec.cpp:410] Executor asked to shutdown
Received SHUTDOWN event
Shutting down
Sending SIGTERM to process tree at pid 43217
[  FAILED  ] DiskQuotaTest.SlaveRecovery (426 ms)
{noformat}

> DiskQuotaTest.SlaveRecovery is flaky
> 
>
> Key: MESOS-3968
> URL: https://issues.apache.org/jira/browse/MESOS-3968
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Mahler
>  Labels: flaky-test, mesosphere
>
> {noformat: title=Failed Run}
> [ RUN  ] DiskQuotaTest.SlaveRecovery
> I1120 12:02:54.015383 29806 leveldb.cpp:176] Opened db in 2.965411ms
> I1120 12:02:54.018033 29806 leveldb.cpp:183] Compacted db in 2.585354ms
> I1120 12:02:54.018175 29806 leveldb.cpp:198] Created db iterator in 27134ns
> I1120 12:02:54.018275 29806 leveldb.cpp:204] Seeked to beginning of db in 
> 3025ns
> I1120 12:02:54.018375 29806 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 679ns
> I1120 12:02:54.018491 29806 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1120 12:02:54.021386 29838 recover.cpp:449] Starting replica recovery
> I1120 12:02:54.021692 29838 recover.cpp:475] Replica is in EMPTY status
> I1120 12:02:54.022189 29827 master.cpp:367] Master 
> 9a3c45ec-28b3-49e6-a83f-1f2035cc1105 (a51e6bb03b55) started on 
> 172.17.5.188:41228
> I1120 12:02:54.022212 29827 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/DsMniF/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/DsMniF/master" --zk_session_timeout="10secs"
> I1120 12:02:54.022557 29827 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1120 12:02:54.022569 29827 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1120 12:02:54.022578 29827 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/DsMniF/credentials'
> I1120 12:02:54.022896 29827 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1120 12:02:54.023217 29827 master.cpp:495] Authorization enabled
> I1120 12:02:54.023512 29831 whitelist_watcher.cpp:79] No whitelist given
> I1120 12:02:54.023814 29833 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (562)@172.17.5.188:41228
> I1120 12:02:54.023519 29832 hierarchical.cpp:153] Initialized hierarchical 
> allocator process
> I1120 12:02:54.025997 29831 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1120 12:02:54.027042 29832 recover.cpp:566] Updating replica status to 
> STARTING
> I1120 12:02:54.027354 29830 master.cpp:1612] The newly elected leader is 
> master@172.17.5.188:41228 with id 9a3c45ec-28b3-49e6-a83f-1f2035cc1105
> I1120 12:02:54.027385 29830 master.cpp:1625] Elected as the leading master!
> I1120 

[jira] [Updated] (MESOS-3396) Fully separate out libprocess and Stout CMake build system from the Mesos build system

2017-01-17 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-3396:

Description: 
The official goal is to be able to put libprocess and stout into a call to 
`ExternalProject_Add`, rather than having them built in-tree as they are now. 
Since Libprocess and Stout depend on a few variables being defined by the 
project that is building against them (such as the `LINUX` variable), this 
will involve, at minimum, figuring out which `-D` flags have to be passed 
through the `ExternalProject_Add` call.

NOTE: This goal may not be feasible. We will need to trigger a rebuild of many 
source files if we change a header in Libprocess or Stout, and a relink if we 
change a .cpp file in the source files of Libprocess. This might require a fair 
bit of effort.

Another complication is that `StoutConfigure` manages the dependencies of 
Stout; since Stout would be built through `ExternalProject_Add`, we will need 
to make sure this is managed in roughly the same way it is now.

  was:
This may or may not be elegant or helpful, as it means copying a lot of 
variables out and around. A good example is that the `LINUX` flag is defined in 
MesosConfigure right now, but it's used throughout the project. So, if you 
wanted to separate that out you'd have to define it independently for both 
codebases.

The same goes for the third-party directory structure.


> Fully separate out libprocess and Stout CMake build system from the Mesos 
> build system
> --
>
> Key: MESOS-3396
> URL: https://issues.apache.org/jira/browse/MESOS-3396
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: cmake, mesosphere, microsoft, windows-mvp
>
> The official goal is to be able to put libprocess and stout into a call to 
> `ExternalProject_Add`, rather than having them built in-tree as they are now. 
> Since Libprocess and Stout depend on a few variables being defined by the 
> project that is building against them (such as the `LINUX` variable), this 
> will involve, at minimum, figuring out which `-D` flags have to be 
> passed through the `ExternalProject_Add` call.
> NOTE: This goal may not be feasible. We will need to trigger a rebuild of 
> many source files if we change a header in Libprocess or Stout, and a relink 
> if we change a .cpp file in the source files of Libprocess. This might 
> require a fair bit of effort.
> Another complication is that `StoutConfigure` manages the dependencies of 
> Stout; since Stout would be built through `ExternalProject_Add`, we will need 
> to make sure this is managed in roughly the same way it is now.





[jira] [Updated] (MESOS-3396) Fully separate out libprocess and Stout CMake build system from the Mesos build system

2017-01-17 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-3396:

Priority: Minor  (was: Major)

> Fully separate out libprocess and Stout CMake build system from the Mesos 
> build system
> --
>
> Key: MESOS-3396
> URL: https://issues.apache.org/jira/browse/MESOS-3396
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>Priority: Minor
>  Labels: cmake, mesosphere, microsoft, windows-mvp
>
> The official goal is to be able to put libprocess and stout into a call to 
> `ExternalProject_Add`, rather than having them built in-tree as they are now. 
> Since Libprocess and Stout depend on a few variables being defined by the 
> project that is building against them (such as the `LINUX` variable), this 
> will involve, at minimum, figuring out which `-D` flags have to be 
> passed through the `ExternalProject_Add` call.
> NOTE: This goal may not be feasible. We will need to trigger a rebuild of 
> many source files if we change a header in Libprocess or Stout, and a relink 
> if we change a .cpp file in the source files of Libprocess. This might 
> require a fair bit of effort.
> Another complication is that `StoutConfigure` manages the dependencies of 
> Stout; since Stout would be built through `ExternalProject_Add`, we will need 
> to make sure this is managed in roughly the same way it is now.





[jira] [Commented] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-17 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826515#comment-15826515
 ] 

Till Toenshoff commented on MESOS-6780:
---

Hey [~klueska] - we discussed my approach and IIRC you had a better one in 
mind. I have finally reassigned this to you. 

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in 
> 27us; attempting to update the registry
> I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the 
> registry in 4.10496ms
> I1212 17:11:12.421835 14143488 registrar.cpp:392] Successfully recovered 
> registrar
> I1212 17:11:12.421998 17362944 master.cpp:1684] Recovered 0 agents from the 
> registry (136B); allowing 

[jira] [Updated] (MESOS-970) Upgrade bundled leveldb to 1.19

2017-01-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-970:
---
Shepherd: haosdent

> Upgrade bundled leveldb to 1.19
> ---
>
> Key: MESOS-970
> URL: https://issues.apache.org/jira/browse/MESOS-970
> Project: Mesos
>  Issue Type: Improvement
>  Components: replicated log
>Reporter: Benjamin Mahler
>Assignee: Tomasz Janiszewski
>
> We currently bundle leveldb 1.4, and the latest version is leveldb 1.19.
> A careful review of the fixes and changes in each release would be prudent. 
> Regression testing and performance testing would also be prudent, given the 
> replicated log is built on leveldb.





[jira] [Updated] (MESOS-6936) Add support for media types needed for streaming request/responses.

2017-01-17 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6936:
--
Sprint: Mesosphere Sprint 49

> Add support for media types needed for streaming request/responses.
> ---
>
> Key: MESOS-6936
> URL: https://issues.apache.org/jira/browse/MESOS-6936
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> As per the design document created as part of MESOS-3601, we need to add 
> support for the additional media types proposed to our API handlers for 
> supporting request streaming. These headers would also be used by the server 
> in the future for streaming responses.
> The following media types need to be added:
> {{RecordIO-Accept}}: Enables the client to perform content negotiation for 
> the contents of the stream. The supported values for this header would be 
> {{application/json}} and {{application/x-protobuf}}.
> {{RecordIO-Content-Type}}: The content type of the RecordIO stream sent by 
> the server. The supported values for this header would be 
> {{application/json}} and {{application/x-protobuf}}.
> The {{Content-Type}} for the response would be {{application/recordio}}. For 
> more details/examples see the alternate proposal section of the design doc:
> https://docs.google.com/document/d/1OV1D5uUmWNvTaX3qEO9fZGo4FRlCSqrx0IHq5GuLAk8/edit#





[jira] [Commented] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-17 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826427#comment-15826427
 ] 

Anand Mazumdar commented on MESOS-3601:
---

The implementation of the headers proposed in the design doc is being tracked 
in MESOS-6936.

Resolving this for now. 

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
> Fix For: 1.2.0
>
>
> From an HTTP standpoint the current set of headers returned when connecting 
> to the HTTP scheduler API are insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to be able to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage http as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages.  (This approach allows for things like gzip to be woven in fairly 
> easily in the future)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding [Section 14.11 RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].





[jira] [Created] (MESOS-6936) Add support for media types needed for streaming request/responses.

2017-01-17 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6936:
-

 Summary: Add support for media types needed for streaming 
request/responses.
 Key: MESOS-6936
 URL: https://issues.apache.org/jira/browse/MESOS-6936
 Project: Mesos
  Issue Type: Improvement
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
Priority: Blocker


As per the design document created as part of MESOS-3601, we need to add 
support for the additional media types proposed to our API handlers for 
supporting request streaming. These headers would also be used by the server in 
the future for streaming responses.

The following media types need to be added:

{{RecordIO-Accept}}: Enables the client to perform content negotiation for the 
contents of the stream. The supported values for this header would be 
{{application/json}} and {{application/x-protobuf}}.
{{RecordIO-Content-Type}}: The content type of the RecordIO stream sent by the 
server. The supported values for this header would be {{application/json}} and 
{{application/x-protobuf}}.

The {{Content-Type}} for the response would be {{application/recordio}}. For 
more details/examples see the alternate proposal section of the design doc:

https://docs.google.com/document/d/1OV1D5uUmWNvTaX3qEO9fZGo4FRlCSqrx0IHq5GuLAk8/edit#





[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-17 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6780:
--
Assignee: Kevin Klues  (was: Till Toenshoff)

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in 
> 27us; attempting to update the registry
> I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the 
> registry in 4.10496ms
> I1212 17:11:12.421835 14143488 registrar.cpp:392] Successfully recovered 
> registrar
> I1212 17:11:12.421998 17362944 master.cpp:1684] Recovered 0 agents from the 
> registry (136B); allowing 10mins for agents to re-register
> I1212 17:11:12.422780 3971208128 containerizer.cpp:220] Using isolation: 
> 

[jira] [Created] (MESOS-6935) Operator API to get current frameworks only.

2017-01-17 Thread James Peach (JIRA)
James Peach created MESOS-6935:
--

 Summary: Operator API to get current frameworks only.
 Key: MESOS-6935
 URL: https://issues.apache.org/jira/browse/MESOS-6935
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: James Peach


The master {{GET_FRAMEWORKS}} operator API always returns the current 
frameworks and the {{completed_frameworks}}. Since the set of 
{{completed_frameworks}} can be very large and is often not wanted, it would be 
helpful if there were a way to exclude it.
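Until such a filter exists, a caller can drop the completed list client-side. A minimal sketch, assuming the v1 operator API endpoint at {{/api/v1}} and the usual JSON response field names; the master URL and helper names are illustrative, not a tested client:

```python
# Client-side workaround: call GET_FRAMEWORKS and discard the (potentially
# huge) completed_frameworks list before further processing. Endpoint path
# and payload shape follow the v1 operator API; treat this as a sketch.
import json
import urllib.request


def strip_completed(response):
    """Keep only the active frameworks from a GET_FRAMEWORKS response."""
    inner = response.get("get_frameworks", {})
    return inner.get("frameworks", [])


def get_active_frameworks(master="http://127.0.0.1:5050"):
    req = urllib.request.Request(
        master + "/api/v1",
        data=json.dumps({"type": "GET_FRAMEWORKS"}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return strip_completed(json.load(resp))
```

The bandwidth cost of transferring the completed list is still paid, of course, which is why a server-side exclusion flag would be the real fix.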



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2017-01-17 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826373#comment-15826373
 ] 

Ilya Pronin commented on MESOS-3505:


V2 Schema 2 support is tracked in MESOS-6934.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>Assignee: Ilya Pronin
>  Labels: mesosphere
> Fix For: 1.2.0
>
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However, this combination is neither precise nor immutable.
> For this reason, when an image with a given {{repo:tag}} is already cached 
> locally on an agent host and a task requiring this {{repo:tag}} arrives, the 
> task may end up using an image different from the one the user intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> The native Mesos provisioner should support the same for Docker images. IMO 
> it's fine if image discovery by ID is not supported (thus still requiring 
> {{repo:tag}} to be specified); it looks like the [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it. But the 
> user can optionally specify an image ID to match against the cached / newly 
> pulled image. If the ID doesn't match the cached image, the store can 
> re-pull it; if the ID doesn't match the newly pulled image (manifest), the 
> provisioner can fail the request instead of having the user unknowingly run 
> the task on the wrong image.
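The matching step described above reduces to a content-digest comparison. A sketch of that check, assuming sha256 content digests of the exact manifest bytes (the helper name is hypothetical, not the Mesos provisioner's API):

```python
# Compare a cached or freshly pulled manifest against the digest the user
# specified. Docker content digests are "sha256:<hex>" over the exact
# manifest bytes served by the registry. Illustrative helper only.
import hashlib


def matches_digest(manifest_bytes, expected):
    algo, _, hexval = expected.partition(":")
    if algo != "sha256":
        raise ValueError("only sha256 digests handled in this sketch")
    return hashlib.sha256(manifest_bytes).hexdigest() == hexval
```

If the check fails against the cache, re-pull; if it fails against the newly pulled manifest, fail the provisioning request.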





[jira] [Commented] (MESOS-6934) Support pulling Docker images with V2 Schema 2 image manifest

2017-01-17 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826367#comment-15826367
 ] 

Ilya Pronin commented on MESOS-6934:


Review requests:
https://reviews.apache.org/r/53849/
https://reviews.apache.org/r/53850/

> Support pulling Docker images with V2 Schema 2 image manifest
> -
>
> Key: MESOS-6934
> URL: https://issues.apache.org/jira/browse/MESOS-6934
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>
> MESOS-3505 added support for pulling Docker images by their digest to the 
> Mesos Containerizer provisioner. However, currently it only works with images 
> that were pushed with Docker 1.9 and older or with Registry 2.2.1 and older. 
> Newer versions use Schema 2 manifests by default. Because of CAS constraints 
> the registry does not convert those manifests on-the-fly to Schema 1 when 
> they are being pulled by digest.
> Compatibility details are documented here: 
> https://docs.docker.com/registry/compatibility/
> Image Manifest V2, Schema 2 is documented here: 
> https://docs.docker.com/registry/spec/manifest-v2-2/





[jira] [Created] (MESOS-6934) Support pulling Docker images with V2 Schema 2 image manifest

2017-01-17 Thread Ilya Pronin (JIRA)
Ilya Pronin created MESOS-6934:
--

 Summary: Support pulling Docker images with V2 Schema 2 image 
manifest
 Key: MESOS-6934
 URL: https://issues.apache.org/jira/browse/MESOS-6934
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Ilya Pronin
Assignee: Ilya Pronin


MESOS-3505 added support for pulling Docker images by their digest to the Mesos 
Containerizer provisioner. However, currently it only works with images that 
were pushed with Docker 1.9 and older or with Registry 2.2.1 and older. Newer 
versions use Schema 2 manifests by default. Because of CAS constraints the 
registry does not convert those manifests on-the-fly to Schema 1 when they are 
being pulled by digest.

Compatibility details are documented here: 
https://docs.docker.com/registry/compatibility/
Image Manifest V2, Schema 2 is documented here: 
https://docs.docker.com/registry/spec/manifest-v2-2/
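The practical consequence for a puller: since the registry stores the manifest by content address and cannot rewrite it to Schema 1 without changing the digest, the client must advertise Schema 2 support itself via the {{Accept}} header. A sketch of the request construction (media types are from the Docker manifest spec; the helper names are ours, not Mesos code):

```python
# A digest-resolving puller must send the Schema 2 media type in Accept,
# because CAS prevents the registry from down-converting to Schema 1 when
# the client pulls by digest. Sketch only; helper names are hypothetical.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"
MANIFEST_V1 = "application/vnd.docker.distribution.manifest.v1+json"


def manifest_url(registry, repo, reference):
    # `reference` is either a tag ("latest") or a digest ("sha256:...").
    return "%s/v2/%s/manifests/%s" % (registry, repo, reference)


def manifest_headers(token=None):
    headers = {"Accept": "%s, %s" % (MANIFEST_V2, MANIFEST_V1)}
    if token:
        headers["Authorization"] = "Bearer " + token
    return headers
```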





[jira] [Updated] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.

2017-01-17 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6917:
--
Shepherd: Vinod Kone  (was: Anand Mazumdar)

> Segfault when the executor sets an invalid UUID  when sending a status update.
> --
>
> Key: MESOS-6917
> URL: https://issues.apache.org/jira/browse/MESOS-6917
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0
>Reporter: Aaron Wood
>Assignee: Aaron Wood
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.1, 1.2.0
>
>
> A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and 
> sends it off to the agent:
> {code}
> ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state 
> == ERROR: Not a valid UUID
> *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are 
> using GNU date ***
> PC: @ 0x7efeb6101428 (unknown)
> *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 
> 14007; stack trace: ***
> @ 0x7efeb64a6390 (unknown)
> @ 0x7efeb6101428 (unknown)
> @ 0x7efeb610302a (unknown)
> @ 0x560df739fa6e _Abort()
> @ 0x560df739fa9c _Abort()
> @ 0x7efebb53a5ad Try<>::get()
> @ 0x7efebb5363d6 Try<>::get()
> @ 0x7efebbd84809 
> mesos::internal::slave::validation::executor::call::validate()
> @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor()
> @ 0x7efebbc773b8 
> _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_
> @ 0x7efebbcb5808 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_
> @ 0x7efebbfb2aea std::function<>::operator()()
> @ 0x7efebcb158b8 
> _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb
> @ 0x7efebcb1a10a 
> _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv
> @ 0x7efebcb1c5f8 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7efebb5ce8ca std::function<>::operator()()
> @ 0x7efebb5c4b27 
> _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_
> @ 0x7efebb5d4e1e 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7efebcb30baf std::function<>::operator()()
> @ 0x7efebcb13fd6 process::ProcessBase::visit()
> @ 0x7efebcb1f3c8 process::DispatchEvent::visit()
> @ 0x7efebb3ab2ea process::ProcessBase::serve()
> @ 0x7efebcb0fe8a process::ProcessManager::resume()
> @ 0x7efebcb0c5a3 
> _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> @ 0x7efebcb1ea34 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7efebcb1e98a 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> @ 0x7efebcb1e91a 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7efeb6980c80 (unknown)
> @ 0x7efeb649c6ba start_thread
> @ 0x7efeb61d282d (unknown)
> Aborted (core dumped)
> {code}
> https://reviews.apache.org/r/55480/
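The abort in the trace is the classic pattern of calling {{Try::get()}} on an errorful {{Try}}: the agent parsed the executor-supplied UUID and dereferenced the result without checking for an error. The shape of the guard the fix introduces, sketched in Python with hypothetical names (the real fix lives in the agent's C++ call validation, see the review above):

```python
# Validate before dereferencing: return an error to the caller (which the
# agent can turn into a 400 response) instead of aborting the process.
# Names are illustrative, not the agent's actual API.
import uuid


def validate_update_uuid(raw_bytes):
    try:
        uuid.UUID(bytes=raw_bytes)  # raises ValueError unless 16 valid bytes
    except ValueError as e:
        return "Not a valid UUID: %s" % e  # reject the executor call
    return None  # valid: proceed with the status update
```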





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2017-01-17 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826118#comment-15826118
 ] 

Jie Yu commented on MESOS-3505:
---

commit 92595f4f120e48c98b48add4a58548cba7745312
Author: Ilya Pronin 
Date:   Tue Jan 17 14:33:00 2017 +0100

Added support for pulling Docker images by digest.

For now we can only use digests to pull images that were pushed with
Docker 1.9 and older or from Registry 2.2.1 and older. Newer versions
use Schema 2 manifests that are not converted by the registry when
pulling by digest.

Review: https://reviews.apache.org/r/53848/

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>Assignee: Ilya Pronin
>  Labels: mesosphere
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However, this combination is neither precise nor immutable.
> For this reason, when an image with a given {{repo:tag}} is already cached 
> locally on an agent host and a task requiring this {{repo:tag}} arrives, the 
> task may end up using an image different from the one the user intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> The native Mesos provisioner should support the same for Docker images. IMO 
> it's fine if image discovery by ID is not supported (thus still requiring 
> {{repo:tag}} to be specified); it looks like the [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it. But the 
> user can optionally specify an image ID to match against the cached / newly 
> pulled image. If the ID doesn't match the cached image, the store can 
> re-pull it; if the ID doesn't match the newly pulled image (manifest), the 
> provisioner can fail the request instead of having the user unknowingly run 
> the task on the wrong image.





[jira] [Commented] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.

2017-01-17 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826086#comment-15826086
 ] 

Vinod Kone commented on MESOS-6917:
---

Backported to 1.0.x

commit 8083dfe82a3b9135ca7493d29a5033eb10ee07da
Author: Vinod Kone 
Date:   Tue Jan 17 14:59:31 2017 +0100

Added MESOS-6917 to CHANGELOG for 1.0.3.

commit 1b68d74745454a5c9390923da770771c5acf968c
Author: Aaron Wood 
Date:   Tue Jan 17 14:52:49 2017 +0100

Fix segfault when the executor sets a UUID that is not a valid v4 UUID.

This fixes the segfault that occurs when an executor sets a UUID
that's not a valid v4 UUID and sends it off to the agent:

Review: https://reviews.apache.org/r/55480/


> Segfault when the executor sets an invalid UUID  when sending a status update.
> --
>
> Key: MESOS-6917
> URL: https://issues.apache.org/jira/browse/MESOS-6917
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0
>Reporter: Aaron Wood
>Assignee: Aaron Wood
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.1, 1.2.0
>
>
> A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and 
> sends it off to the agent:
> {code}
> ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state 
> == ERROR: Not a valid UUID
> *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are 
> using GNU date ***
> PC: @ 0x7efeb6101428 (unknown)
> *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 
> 14007; stack trace: ***
> @ 0x7efeb64a6390 (unknown)
> @ 0x7efeb6101428 (unknown)
> @ 0x7efeb610302a (unknown)
> @ 0x560df739fa6e _Abort()
> @ 0x560df739fa9c _Abort()
> @ 0x7efebb53a5ad Try<>::get()
> @ 0x7efebb5363d6 Try<>::get()
> @ 0x7efebbd84809 
> mesos::internal::slave::validation::executor::call::validate()
> @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor()
> @ 0x7efebbc773b8 
> _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_
> @ 0x7efebbcb5808 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_
> @ 0x7efebbfb2aea std::function<>::operator()()
> @ 0x7efebcb158b8 
> _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb
> @ 0x7efebcb1a10a 
> _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv
> @ 0x7efebcb1c5f8 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7efebb5ce8ca std::function<>::operator()()
> @ 0x7efebb5c4b27 
> _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_
> @ 0x7efebb5d4e1e 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7efebcb30baf std::function<>::operator()()
> @ 0x7efebcb13fd6 process::ProcessBase::visit()
> @ 0x7efebcb1f3c8 process::DispatchEvent::visit()
> @ 0x7efebb3ab2ea process::ProcessBase::serve()
> @ 0x7efebcb0fe8a process::ProcessManager::resume()
> @ 0x7efebcb0c5a3 
> _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> @ 0x7efebcb1ea34 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7efebcb1e98a 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> @ 0x7efebcb1e91a 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7efeb6980c80 (unknown)
> @ 0x7efeb649c6ba start_thread
> @ 0x7efeb61d282d (unknown)
> Aborted (core dumped)
> {code}
> 

[jira] [Created] (MESOS-6933) Executor does not respect grace period

2017-01-17 Thread Tomasz Janiszewski (JIRA)
Tomasz Janiszewski created MESOS-6933:
-

 Summary: Executor does not respect grace period
 Key: MESOS-6933
 URL: https://issues.apache.org/jira/browse/MESOS-6933
 Project: Mesos
  Issue Type: Bug
  Components: executor
Reporter: Tomasz Janiszewski


The Mesos default executor tries to honor the grace period by escalating to 
SIGKILL, but unfortunately this does not work. It launches {{command}} by 
wrapping it in {{sh -c}}, which causes the process tree to look like this:

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This causes {{sh}} to exit immediately, and the executor with it, while the 
wrapped {{command}} might need some more time to finish. The executor then 
thinks the command terminated gracefully, so it won't 
[escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
 to SIGKILL.

This leaks processes when the POSIX containerizer is used, because if the 
command ignores SIGTERM it gets reparented to init and is never killed. Using 
the pid namespace only masks the problem, because the hanging process is 
captured before it can shut down gracefully.

The fix is to send SIGTERM only to the {{sh}} children. {{sh}} will exit when 
all subprocesses finish; if they don't, they will be killed by the escalation 
to SIGKILL.
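The intended terminate-then-escalate behavior can be sketched as follows. This is a POSIX-only illustration that uses a process group as a stand-in for the executor's process tree, not the launcher's real logic:

```python
# Send SIGTERM, wait out the grace period, and escalate to SIGKILL only if
# something in the group is still alive. Sketch of the intended behavior;
# the real executor tracks the process tree rather than a process group.
import os
import signal
import time


def graceful_kill(pgid, grace_secs):
    os.killpg(pgid, signal.SIGTERM)
    deadline = time.time() + grace_secs
    while time.time() < deadline:
        try:
            os.killpg(pgid, 0)       # probe: raises once the group is gone
        except ProcessLookupError:
            return True              # everything exited within the grace period
        time.sleep(0.1)
    os.killpg(pgid, signal.SIGKILL)  # grace period expired: escalate
    return False
```

The key property is that escalation happens only after the grace period has actually elapsed for the children, instead of being skipped because the {{sh}} wrapper exited early.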

All versions from 0.20 onward are affected.

This test should pass 
[src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list 
thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]





[jira] [Updated] (MESOS-4989) Design document for docker volume driver

2017-01-17 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4989:
--
Fix Version/s: (was: 1.0.3)

> Design document for docker volume driver
> 
>
> Key: MESOS-4989
> URL: https://issues.apache.org/jira/browse/MESOS-4989
> Project: Mesos
>  Issue Type: Task
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>






[jira] [Updated] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2017-01-17 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-6010:

Shepherd: Jie Yu

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Sunzhe
>Assignee: Jan Schlicht
>Priority: Critical
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the ones below occur:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
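The 401 above is the registry's token-auth challenge: the client is expected to parse {{Www-Authenticate}}, fetch a token from the advertised realm ({{realm?service=...&scope=...}}), and retry the pull with {{Authorization: Bearer <token>}}. A sketch of parsing that challenge, with field names taken from the log above (the helper name is ours):

```python
# Parse 'Bearer realm="...",service="...",scope="..."' into a dict. The
# token endpoint is then realm?service=...&scope=..., and the retried
# manifest request carries "Authorization: Bearer <token>". Sketch only.
import re


def parse_bearer_challenge(header):
    scheme, _, params = header.partition(" ")
    if scheme != "Bearer":
        raise ValueError("not a Bearer challenge")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))
```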





[jira] [Updated] (MESOS-6922) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky

2017-01-17 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6922:
--
Assignee: Vinod Kone

Looks like the reason for the multiple status updates is that we call `resume` 
on the status update manager twice:

1) when the agent receives ReregisteredSlaveMessage from the master after 
recovery
2) when the agent receives UpdateFrameworkMessage from the master after 
re-registration

One way to fix the race is to drop the UpdateFrameworkMessage in the test.

> SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
> --
>
> Key: MESOS-6922
> URL: https://issues.apache.org/jira/browse/MESOS-6922
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: CentOS 7
>Reporter: Greg Mann
>Assignee: Vinod Kone
>  Labels: tests
> Attachments: SlaveRecoveryTest.RecoverTerminatedExecutor.txt
>
>
> This was observed on ASF CI. Find attached the log from a failed run; it 
> appears that too many status updates are being received:
> {code}
> /mesos/src/tests/slave_recovery_tests.cpp:1350: Failure
> Mock function called more times than expected - returning directly.
> Function call: statusUpdate(0x7ffcf00155b8, @0x2b3f4f7ab8c0 120-byte 
> object <50-66 6A-45 3F-2B 00-00 00-00 00-00 00-00 00-00 DF-13 00-00 00-00 
> 00-00 70-59 01-90 3F-2B 00-00 A0-D7 00-90 3F-2B 00-00 05-00 00-00 01-00 00-00 
> D0-01 91-04 00-00 00-00 D0-9C 00-90 3F-2B 00-00 C0-EB 01-90 3F-2B 00-00 18-00 
> 00-00 00-2B 00-00 47-98 7C-B9 92-29 D6-41 90-5B 02-90 3F-2B 00-00 00-00 00-00 
> 00-00 00-00 70-6E 01-90 3F-2B 00-00 00-00 00-00 00-00 00-00>)
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> {code}





[jira] [Updated] (MESOS-6035) Add non-recursive version of cgroups::get

2017-01-17 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6035:
--
Fix Version/s: (was: 1.2.0)

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need to get the top level cgroups instead of to get 
> all cgroups recursively. Add a non-recursive version could help to avoid 
> unnecessary paths.
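In file-system terms the split is simple: cgroups are directories, so the recursive get walks the whole subtree while the non-recursive variant lists only direct children. A sketch of the distinction (this mirrors the proposed {{cgroups::get}} split but is not the stout/cgroups C++ API):

```python
# Recursive: every descendant directory, relative to the root.
# Non-recursive: direct children only, avoiding the full hierarchy walk.
import os


def cgroups_get(root, recursive=True):
    if not recursive:
        return sorted(d for d in os.listdir(root)
                      if os.path.isdir(os.path.join(root, d)))
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        for d in dirnames:
            found.append(os.path.relpath(os.path.join(dirpath, d), root))
    return sorted(found)
```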





[jira] [Commented] (MESOS-6035) Add non-recursive version of cgroups::get

2017-01-17 Thread haosdent huang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825685#comment-15825685
 ] 

haosdent huang commented on MESOS-6035:
---

hi, [~vinodkone], Yan reverted the patches above because they failed the nested 
container test cases. We will have further discussions about how to refactor 
the cgroups test part. Because this issue is not resolved after the patch 
revert, I reopened it.

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
> Fix For: 1.2.0
>
>
> In some cases, we only need to get the top level cgroups instead of to get 
> all cgroups recursively. Add a non-recursive version could help to avoid 
> unnecessary paths.





[jira] [Created] (MESOS-6932) Rename functions, methods, constants, variables containing "slave" to "agent"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6932:
-

 Summary: Rename functions, methods, constants, variables 
containing "slave" to "agent"
 Key: MESOS-6932
 URL: https://issues.apache.org/jira/browse/MESOS-6932
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6931) Update comments to use the string "agent" instead of "slave"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6931:
-

 Summary: Update comments to use the string "agent" instead of 
"slave"
 Key: MESOS-6931
 URL: https://issues.apache.org/jira/browse/MESOS-6931
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6930) Rename directory names containing the string "slave" to "agent"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6930:
-

 Summary: Rename directory names containing the string "slave" to 
"agent"
 Key: MESOS-6930
 URL: https://issues.apache.org/jira/browse/MESOS-6930
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6929) Rename file names containing the string "slave" to "agent"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6929:
-

 Summary: Rename file names containing the string "slave" to "agent"
 Key: MESOS-6929
 URL: https://issues.apache.org/jira/browse/MESOS-6929
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6928) Rename tests that contain the string "slave" to "agent"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6928:
-

 Summary: Rename tests that contain the string "slave" to "agent"
 Key: MESOS-6928
 URL: https://issues.apache.org/jira/browse/MESOS-6928
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6927) Rename internal classes / structs that contain the string "slave" to "agent"

2017-01-17 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6927:
-

 Summary: Rename internal classes / structs that contain the string 
"slave" to "agent"
 Key: MESOS-6927
 URL: https://issues.apache.org/jira/browse/MESOS-6927
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Updated] (MESOS-5454) Slave to Agent rename (Phase II).

2017-01-17 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5454:
--
Labels: mesosphere tech-debt  (was: mesosphere)

> Slave to Agent rename (Phase II).
> -
>
> Key: MESOS-5454
> URL: https://issues.apache.org/jira/browse/MESOS-5454
> Project: Mesos
>  Issue Type: Epic
>Reporter: Clark Breyman
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> This ticket tracks the work needed for Phase II according to 
> https://docs.google.com/document/d/1P8_4wdk29I6NoVTjbFkRl05-tfxV9PY4WLoRNvExupM/edit#heading=h.9g7fqjh6652v





[jira] [Commented] (MESOS-5821) Clean up the thousands of compiler warnings on MSVC

2017-01-17 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825639#comment-15825639
 ] 

Vinod Kone commented on MESOS-5821:
---

[~hausdorff] Looks like you reopened this ticket after it was resolved? If it 
was by mistake, can you resolve it again?

> Clean up the thousands of compiler warnings on MSVC
> ---
>
> Key: MESOS-5821
> URL: https://issues.apache.org/jira/browse/MESOS-5821
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
>  Labels: mesosphere, microsoft, slave
> Fix For: 1.2.0
>
>
> Clean builds of Mesos on Windows will result in approximately {{5800 
> Warning(s)}} or more.





[jira] [Updated] (MESOS-6782) Inherit Environment from Parent containers image spec when launching DEBUG container

2017-01-17 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6782:
--
Fix Version/s: (was: 1.2.0)

> Inherit Environment from Parent containers image spec when launching DEBUG 
> container
> 
>
> Key: MESOS-6782
> URL: https://issues.apache.org/jira/browse/MESOS-6782
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: debugging, mesosphere
>
> Right now whenever we enter a DEBUG container we have a fresh environment. 
> For a better user experience, we should have the DEBUG container inherit the 
> environment set up in its parent container image spec (if there is one). 





[jira] [Commented] (MESOS-6035) Add non-recursive version of cgroups::get

2017-01-17 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825635#comment-15825635
 ] 

Vinod Kone commented on MESOS-6035:
---

[~haosd...@gmail.com] Looks like you reopened the ticket after it was marked as 
resolved? If it was by mistake, can you resolve it again?

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
> Fix For: 1.2.0
>
>
> In some cases, we only need to get the top level cgroups instead of to get 
> all cgroups recursively. Add a non-recursive version could help to avoid 
> unnecessary paths.


