[jira] [Created] (MESOS-9633) Create a utility framework to convert `RAW` disks and `MOUNT` disks.

2019-03-04 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9633:
--

 Summary: Create a utility framework to convert `RAW` disks and 
`MOUNT` disks.
 Key: MESOS-9633
 URL: https://issues.apache.org/jira/browse/MESOS-9633
 Project: Mesos
  Issue Type: Task
  Components: storage
Reporter: Chun-Hung Hsiao


Right now it is not easy for users to consume CSI disks. They need to come up 
with a framework to convert {{RAW}} disks to {{MOUNT}} disks first; only then 
can the {{MOUNT}} disks be consumed by existing frameworks, e.g., Marathon. It 
would be nice to have a utility framework to do this. For example, we could turn 
{{src/examples/test_csi_user_framework.cpp}} into a utility executable that 
provides the following functionalities:
 * {{create}}: takes a {{profile}} and a {{size}} and registers a framework to 
wait for an offer of a {{RAW}} disk matching the given {{profile}} and {{size}} 
then convert it to a {{MOUNT}} disk.
 * {{destroy}}: takes a {{vendor}} and an {{id}} and registers a framework to 
wait for an offer of a {{MOUNT}} disk matching the given {{vendor}} and {{id}} 
then convert it to a {{RAW}} disk.
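
As a rough illustration of the offer-matching step in {{create}}, here is a minimal sketch. It is not Mesos code: the {{Disk}} struct, the {{findRawDisk}} helper and the "at least the requested size" rule are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of an offered disk resource.
enum class DiskType { RAW, MOUNT };

struct Disk
{
  DiskType type;
  std::string profile;
  uint64_t sizeMb;
};

// Returns the index of the first offered `RAW` disk with the given profile
// that is at least `sizeMb` large, or -1 if no offered disk qualifies.
// Treating "matching the size" as "at least as large" is an assumption.
int findRawDisk(
    const std::vector<Disk>& offered,
    const std::string& profile,
    uint64_t sizeMb)
{
  for (std::size_t i = 0; i < offered.size(); ++i) {
    const Disk& disk = offered[i];
    if (disk.type == DiskType::RAW &&
        disk.profile == profile &&
        disk.sizeMb >= sizeMb) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A utility along these lines would keep declining offers until the helper returns a valid index, and only then issue the conversion.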



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-9565) Unit tests for destroying persistent volumes in SLRP.

2019-03-04 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784023#comment-16784023
 ] 

Chun-Hung Hsiao commented on MESOS-9565:


Committed the following patches:
{noformat}
commit bf3982d8d143e7a4d928047a4f7dec9d69479235
Author: Chun-Hung Hsiao 
Date: Thu Jan 31 14:32:54 2019 -0800

Made SLRP `PublishResources` test to check persistent volume cleanup.

This patch renames the `ROOT_PublishResources` test to
`ROOT_CreateDestroyPersistentMountVolume` and makes it verify that the
persistent volume is cleaned up after `DESTROY`.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.

Review: https://reviews.apache.org/r/69895{noformat}
{noformat}
commit 8aaf50291f8cc601cf1742dfca610bd27bc897f4
Author: Chun-Hung Hsiao 
Date: Thu Jan 31 19:05:48 2019 -0800

Made SLRP `PublishResourcesReboot` test to check persistent volume cleanup.

This patch renames the `ROOT_PublishResourcesReboot` test to
`ROOT_CreateDestroyPersistentMountVolumeWithReboot` and makes it verify
that the persistent volume is cleaned up with `DESTROY` after a reboot.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.

Review: https://reviews.apache.org/r/69896{noformat}
{noformat}
commit a6e757467d6443ccc1328a65cd3e36aaa871
Author: Chun-Hung Hsiao 
Date: Fri Feb 1 17:01:06 2019 -0800

Made SLRP `PublishResourcesRecovery` test to check volume cleanup.

This patch renames the `ROOT_PublishResourcesRecovery` test to
`ROOT_CreateDestroyPersistentMountVolumeWithRecovery` and makes it
verify that the persistent volume is cleaned up with `DESTROY` after
recovery.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.

Review: https://reviews.apache.org/r/69897{noformat}
{noformat}
commit 6b05bc8c05c1951999ea787ffb9c1815bb5ef8b4
Author: Chun-Hung Hsiao 
Date: Mon Feb 4 16:38:50 2019 -0800

Updated `CreateDestroyDisk*` SLRP tests to test pipelined operations.

This patch extends the code coverage of the `CreateDestroyDisk` and
`CreateDestroyDiskRecovery` tests by testing pipelined `RESERVE`,
`CREATE`, `DESTROY` and `UNRESERVE` operations along with `CREATE_DISK`
and `DESTROY_DISK`. It also renames `CreateDestroyDiskRecovery` to
`CreateDestroyDiskWithRecovery` for consistency.

Review: https://reviews.apache.org/r/69898{noformat}
{noformat}
commit 43bf73ca2f902356c3a5b92fb7cf535af3b7ccc4
Author: Chun-Hung Hsiao 
Date: Tue Feb 5 21:23:23 2019 -0800

Extracted common offer matching functions from SLRP tests.

This patch extracts lambda functions `isStoragePool`, `isMountDisk` and
`isPreprovisionedVolume` from all tests and makes them common utility
functions in the test fixture.

Review: https://reviews.apache.org/r/69904{noformat}
{noformat}
commit 069ac8ab9f298d11344accc964120103ba84ddbd
Author: Chun-Hung Hsiao 
Date: Thu Jan 31 16:19:43 2019 -0800

Added a SLRP unit test for failed persistent volume cleanup.

Test `ROOT_CreateDestroyPersistentMountVolumeFailed` verifies that if
SLRP fails to clean up a persistent volume, `DESTROY` would fail, the
persistent volume would remain, and dependent operations would be dropped.

Review: https://reviews.apache.org/r/69905{noformat}

> Unit tests for destroying persistent volumes in SLRP.
> -
>
> Key: MESOS-9565
> URL: https://issues.apache.org/jira/browse/MESOS-9565
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Major
>  Labels: mesosphere, storage
>
> The plan is to add/update the following unit tests to test persistent volume 
> destroy:
> * CreateDestroyDisk
> * CreateDestroyDiskWithRecovery
> * CreateDestroyPersistentMountVolume
> * CreateDestroyPersistentMountVolumeWithRecovery
> * CreateDestroyPersistentMountVolumeWithReboot
> * CreateDestroyPersistentBlockVolume
> * DestroyPersistentMountVolumeFailed
> * DestroyUnpublishedPersistentVolume
> * DestroyUnpublishedPersistentVolumeWithRecovery
> 

[jira] [Created] (MESOS-9632) Refactor SLRP with a CSI service manager.

2019-03-04 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9632:
--

 Summary: Refactor SLRP with a CSI service manager.
 Key: MESOS-9632
 URL: https://issues.apache.org/jira/browse/MESOS-9632
 Project: Mesos
  Issue Type: Task
  Components: storage
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


The CSI volume manager relies on service containers, which should be agnostic 
to CSI versions. As the first step of MESOS-9622, we should first refactor SLRP 
with a CSI service manager that manages service container lifecycles before 
refactoring it with a CSI volume manager.





[jira] [Commented] (MESOS-9622) Refactor SLRP with a CSI volume manager.

2019-03-04 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783896#comment-16783896
 ] 

Chun-Hung Hsiao commented on MESOS-9622:


Break the work down with MESOS-9632.

> Refactor SLRP with a CSI volume manager.
> 
>
> Key: MESOS-9622
> URL: https://issues.apache.org/jira/browse/MESOS-9622
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-ga, storage
>
> To support both CSI v0 and v1, SLRP needs to be agnostic to CSI versions. 
> This could be achieved by refactoring all CSI volume management code into a 
> CSI volume manager that can be implemented with CSI v0 and v1. Also, the 
> volume state proto needs to be agnostic to CSI spec version as well.
> Design doc: 
> https://docs.google.com/document/d/1LPy839zwFw6UcRhmr65iKeMaHcoj6uUX25yJVbMknlY/edit#heading=h.1iswiwd3imin





[jira] [Assigned] (MESOS-9622) Refactor SLRP with a CSI volume manager.

2019-03-04 Thread Chun-Hung Hsiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao reassigned MESOS-9622:
--

Assignee: Chun-Hung Hsiao

> Refactor SLRP with a CSI volume manager.
> 
>
> Key: MESOS-9622
> URL: https://issues.apache.org/jira/browse/MESOS-9622
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-ga, storage
>
> To support both CSI v0 and v1, SLRP needs to be agnostic to CSI versions. 
> This could be achieved by refactoring all CSI volume management code into a 
> CSI volume manager that can be implemented with CSI v0 and v1. Also, the 
> volume state proto needs to be agnostic to CSI spec version as well.
> Design doc: 
> https://docs.google.com/document/d/1LPy839zwFw6UcRhmr65iKeMaHcoj6uUX25yJVbMknlY/edit#heading=h.1iswiwd3imin





[jira] [Assigned] (MESOS-9619) Mesos Master Crashes with Launch Group when using Port Resources

2019-03-04 Thread Meng Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu reassigned MESOS-9619:
---

Assignee: (was: Meng Zhu)

> Mesos Master Crashes with Launch Group when using Port Resources
> 
>
> Key: MESOS-9619
> URL: https://issues.apache.org/jira/browse/MESOS-9619
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.4.3, 1.7.1
> Environment:  
> Testing in both Mesos 1.4.3 and Mesos 1.7.1
>Reporter: Nimi Wariboko Jr.
>Priority: Blocker
>  Labels: master, mesosphere
> Attachments: mesos-master.log, mesos-master.snippet.log
>
>
> Original Issue: 
> [https://lists.apache.org/thread.html/979c8799d128ad0c436b53f2788568212f97ccf324933524f1b4d189@%3Cuser.mesos.apache.org%3E]
>  When the ports resources are removed, Mesos functions normally (I'm able to 
> launch the task as many times as possible, while it always fails continually).
> Attached is a snippet of the mesos master log from OFFER to crash.





[jira] [Commented] (MESOS-9619) Mesos Master Crashes with Launch Group when using Port Resources

2019-03-04 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783689#comment-16783689
 ] 

Meng Zhu commented on MESOS-9619:
-

Additional validation on the master side is needed. For range and set type 
resources such as ports, we should validate that the same resources are not 
specified more than once across the tasks and executors: unlike scalars, 
[1, 2] + [1, 2] is still [1, 2], which will mess up the allocator accounting.

Specifically for the port resources here, we need to check whether only the 
host network namespace is used (which is the default setting); if so, the 
executor and tasks should have non-overlapping port resources.
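
To make the accounting problem concrete, here is a minimal sketch of set-style addition and of the overlap check suggested above. It does not use the Mesos {{Resources}} class; {{PortSet}}, {{add}} and {{hasOverlap}} are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a set/range resource such as `ports`.
using PortSet = std::set<uint32_t>;

// Set-style addition is a union, so adding the same ports twice
// yields the same set: {1, 2} + {1, 2} is still {1, 2}.
PortSet add(const PortSet& a, const PortSet& b)
{
  PortSet result = a;
  result.insert(b.begin(), b.end());
  return result;
}

// Returns true if any port appears in more than one of the given
// consumers (e.g., the executor's ports and each task's ports).
bool hasOverlap(const std::vector<PortSet>& consumers)
{
  PortSet seen;
  for (const PortSet& ports : consumers) {
    for (uint32_t port : ports) {
      // `insert` reports whether the port was newly added.
      if (!seen.insert(port).second) {
        return true;
      }
    }
  }
  return false;
}
```

With a check like {{hasOverlap}} rejecting the launch up front, the sum of the per-task and per-executor ports could no longer come out smaller than what the individual consumers claim to hold.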

> Mesos Master Crashes with Launch Group when using Port Resources
> 
>
> Key: MESOS-9619
> URL: https://issues.apache.org/jira/browse/MESOS-9619
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.4.3, 1.7.1
> Environment:  
> Testing in both Mesos 1.4.3 and Mesos 1.7.1
>Reporter: Nimi Wariboko Jr.
>Assignee: Meng Zhu
>Priority: Blocker
>  Labels: allocator, master, mesosphere
> Attachments: mesos-master.log, mesos-master.snippet.log
>
>
> Original Issue: 
> [https://lists.apache.org/thread.html/979c8799d128ad0c436b53f2788568212f97ccf324933524f1b4d189@%3Cuser.mesos.apache.org%3E]
>  When the ports resources are removed, Mesos functions normally (I'm able to 
> launch the task as many times as possible, while it always fails continually).
> Attached is a snippet of the mesos master log from OFFER to crash.





[jira] [Assigned] (MESOS-8292) Update webui to show fault domains

2019-03-04 Thread Benjamin Bannier (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-8292:
---

Assignee: (was: Benno Evers)

> Update webui to show fault domains
> --
>
> Key: MESOS-8292
> URL: https://issues.apache.org/jira/browse/MESOS-8292
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Vinod Kone
>Priority: Major
>  Labels: newbie
>
> At the least the nodes and tasks page should show what region and zone they 
> are running in. Maybe the home page can also show the region/zone of the 
> leading master.





[jira] [Commented] (MESOS-8241) Add metrics for offer operation feedback

2019-03-04 Thread Benno Evers (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783566#comment-16783566
 ] 

Benno Evers commented on MESOS-8241:


I've opened a review for the scope that is outlined in the comment above at: 
https://reviews.apache.org/r/70116/

Some ideas I've had for further metrics that might become interesting:

Master-wide versions of the per-framework metrics we currently collect about 
operation types:
 - master/operations/create_disk/finished
 - master/operations/create_disk/dropped
 - [...]

A counter to see how many user-provided operations failed validation:
 - master/invalid_operations

A per-framework counter for the number of unacknowledged operations.

A counter for the total number of operation update retries.

> Add metrics for offer operation feedback
> 
>
> Key: MESOS-8241
> URL: https://issues.apache.org/jira/browse/MESOS-8241
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Benno Evers
>Priority: Major
>  Labels: foundations, mesosphere, operation-feedback
>






[jira] [Commented] (MESOS-9620) Add metrics for volume gid manager

2019-03-04 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783418#comment-16783418
 ] 

Qian Zhang commented on MESOS-9620:
---

RR: https://reviews.apache.org/r/70112/

> Add metrics for volume gid manager
> --
>
> Key: MESOS-9620
> URL: https://issues.apache.org/jira/browse/MESOS-9620
> Project: Mesos
>  Issue Type: Task
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>Priority: Major
>
> We need to add some metrics for volume gid manager, e.g., total number of 
> gids configured for volume gid manager and the number of the free gids 
> available to volume gid manager, and add the new metrics into [this 
> doc|https://github.com/apache/mesos/blob/master/docs/monitoring.md].





[jira] [Assigned] (MESOS-9612) Resource provider manager assumes all operations are triggered by frameworks

2019-03-04 Thread Jan Schlicht (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-9612:
---

Assignee: Jan Schlicht

> Resource provider manager assumes all operations are triggered by frameworks
> 
>
> Key: MESOS-9612
> URL: https://issues.apache.org/jira/browse/MESOS-9612
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>Priority: Blocker
>  Labels: mesosphere, mesosphere-dss-ga, storage
>
> When the agent tries to apply an operation to resource provider resources, it 
> invokes {{ResourceProviderManager::applyOperation}} which in turn invokes 
> {{ResourceProviderManagerProcess::applyOperation}}. That function currently 
> assumes that the received message contains a valid {{FrameworkID}},
> {noformat}
> void ResourceProviderManagerProcess::applyOperation(
>     const ApplyOperationMessage& message)
> {
>   const Offer::Operation& operation = message.operation_info();
>
>   const FrameworkID& frameworkId = message.framework_id(); // `framework_id` is `optional`.
> {noformat}
> Since {{FrameworkID}} is not a trivial proto type, but one with a 
> {{required}} field {{value}}, a message composed with this {{frameworkId}} 
> cannot be serialized when the field is unset. The resulting failure in turn 
> triggers a {{CHECK}} failure in the agent's function interfacing with the 
> manager.
> A typical scenario where we would want to support operator API calls here is 
> to destroy leftover persistent volumes or reservations.
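
One possible shape of a guard, as a minimal sketch only. The structs below are hypothetical stand-ins, not the Mesos protobuf types; with real proto2 messages the presence check would be the generated {{has_framework_id()}} accessor.

```cpp
#include <string>

// Hypothetical stand-in for a message with an optional framework ID.
// `hasFrameworkId` mimics a proto2 presence bit.
struct ApplyOperation
{
  bool hasFrameworkId = false; // False for operator API calls.
  std::string frameworkId;
};

// Hypothetical stand-in for the message forwarded to the resource provider.
struct OutgoingEvent
{
  bool hasFrameworkId = false;
  std::string frameworkId;
};

// Copies the framework ID only if the sender actually set it, so an
// operator-triggered operation yields an event without the field
// instead of one carrying an uninitialized ID.
OutgoingEvent toEvent(const ApplyOperation& message)
{
  OutgoingEvent event;
  if (message.hasFrameworkId) {
    event.hasFrameworkId = true;
    event.frameworkId = message.frameworkId;
  }
  return event;
}
```

Whether the real fix should skip the field like this or handle operator-triggered operations on a separate path is left open here.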



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-9560) ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky

2019-03-04 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783145#comment-16783145
 ] 

Benjamin Bannier edited comment on MESOS-9560 at 3/4/19 9:16 AM:
-

Work-in-progress patches posted here, 
https://github.com/mesosphere/mesos-private/tree/bbannier/t/MESOS-9560. Pausing 
work for now.


was (Author: bbannier):
Work-in-progress patches posted here, 
https://github.com/mesosphere/mesos-private/tree/bbannier/t/MESOS-9560.

> ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
> 
>
> Key: MESOS-9560
> URL: https://issues.apache.org/jira/browse/MESOS-9560
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere, storage, test
> Attachments: consoleText.txt
>
>
> We observed a segfault in 
> {{ContentType/AgentAPITest.MarkResourceProviderGone/1}} on test teardown.
> {noformat}
> I0131 23:55:59.378453  6798 slave.cpp:923] Agent terminating
> I0131 23:55:59.378813 31143 master.cpp:1269] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal) disconnected
> I0131 23:55:59.378831 31143 master.cpp:3272] Disconnecting agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378846 31143 master.cpp:3291] Deactivating agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378891 31143 hierarchical.cpp:793] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 deactivated
> F0131 23:55:59.378891 31149 logging.cpp:67] RAW: Pure virtual method called
> @ 0x7f633aaaebdd  google::LogMessage::Fail()
> @ 0x7f633aab6281  google::RawLog__()
> @ 0x7f6339821262  __cxa_pure_virtual
> @ 0x55671cacc113  
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @ 0x55671b532e78  
> mesos::internal::tests::resource_provider::MockResourceProvider<>::disconnected()
> @ 0x7f633978f6b0  process::AsyncExecutorProcess::execute<>()
> @ 0x7f633979f218  
> _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_
> @ 0x7f633a9f5d01  process::ProcessBase::consume()
> @ 0x7f633aa1a08a  process::ProcessManager::resume()
> @ 0x7f633aa1db06  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f633acc9f80  execute_native_thread_routine
> @ 0x7f6337142e25  start_thread
> @ 0x7f6336241bad  __clone
> {noformat}





[jira] [Commented] (MESOS-9560) ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky

2019-03-04 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783145#comment-16783145
 ] 

Benjamin Bannier commented on MESOS-9560:
-

Work-in-progress patches posted here, 
https://github.com/mesosphere/mesos-private/tree/bbannier/t/MESOS-9560.

> ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
> 
>
> Key: MESOS-9560
> URL: https://issues.apache.org/jira/browse/MESOS-9560
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere, storage, test
> Attachments: consoleText.txt
>
>
> We observed a segfault in 
> {{ContentType/AgentAPITest.MarkResourceProviderGone/1}} on test teardown.
> {noformat}
> I0131 23:55:59.378453  6798 slave.cpp:923] Agent terminating
> I0131 23:55:59.378813 31143 master.cpp:1269] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal) disconnected
> I0131 23:55:59.378831 31143 master.cpp:3272] Disconnecting agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378846 31143 master.cpp:3291] Deactivating agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 at slave(1112)@172.16.10.236:43229 
> (ip-172-16-10-236.ec2.internal)
> I0131 23:55:59.378891 31143 hierarchical.cpp:793] Agent 
> a27bcaba-70cc-4ec3-9786-38f9512c61fd-S0 deactivated
> F0131 23:55:59.378891 31149 logging.cpp:67] RAW: Pure virtual method called
> @ 0x7f633aaaebdd  google::LogMessage::Fail()
> @ 0x7f633aab6281  google::RawLog__()
> @ 0x7f6339821262  __cxa_pure_virtual
> @ 0x55671cacc113  
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @ 0x55671b532e78  
> mesos::internal::tests::resource_provider::MockResourceProvider<>::disconnected()
> @ 0x7f633978f6b0  process::AsyncExecutorProcess::execute<>()
> @ 0x7f633979f218  
> _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_
> @ 0x7f633a9f5d01  process::ProcessBase::consume()
> @ 0x7f633aa1a08a  process::ProcessManager::resume()
> @ 0x7f633aa1db06  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f633acc9f80  execute_native_thread_routine
> @ 0x7f6337142e25  start_thread
> @ 0x7f6336241bad  __clone
> {noformat}





[jira] [Created] (MESOS-9631) MasterLoadTest.SimultaneousBatchedRequests segfaults on macOS

2019-03-04 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-9631:
---

 Summary: MasterLoadTest.SimultaneousBatchedRequests segfaults on 
macOS
 Key: MESOS-9631
 URL: https://issues.apache.org/jira/browse/MESOS-9631
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: macOS Mojave 10.14.3
Reporter: Jan Schlicht


Also tested on Linux, where this test succeeds. {{GLOG_v=1}} output of this 
test on macOS:
{noformat}
I0304 09:33:08.532002 155725824 master.cpp:414] Master 
8be09e79-ff3b-49bf-86e9-cde00fbdcdaa (172.18.8.49) started on 172.18.8.49:56584
I0304 09:33:08.532045 155725824 master.cpp:417] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/private/var/folders/0b/srgwj7vd2037pygpz1fpyqgmgn/T/uCWwLH/credentials"
 --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/private/var/folders/0b/srgwj7vd2037pygpz1fpyqgmgn/T/uCWwLH/master"
 --zk_session_timeout="10secs"
I0304 09:33:08.532878 155725824 master.cpp:466] Master only allowing 
authenticated frameworks to register
I0304 09:33:08.532889 155725824 master.cpp:472] Master only allowing 
authenticated agents to register
I0304 09:33:08.532896 155725824 master.cpp:478] Master only allowing 
authenticated HTTP frameworks to register
I0304 09:33:08.532903 155725824 credentials.hpp:37] Loading credentials for 
authentication from 
'/private/var/folders/0b/srgwj7vd2037pygpz1fpyqgmgn/T/uCWwLH/credentials'
I0304 09:33:08.533071 155725824 master.cpp:522] Using default 'crammd5' 
authenticator
I0304 09:33:08.533094 155725824 authenticator.cpp:520] Initializing server SASL
I0304 09:33:08.551656 155725824 auxprop.cpp:73] Initialized in-memory auxiliary 
property plugin
I0304 09:33:08.551702 155725824 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0304 09:33:08.551745 155725824 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0304 09:33:08.551766 155725824 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0304 09:33:08.551785 155725824 master.cpp:603] Authorization enabled
I0304 09:33:08.551923 154116096 whitelist_watcher.cpp:77] No whitelist given
I0304 09:33:08.551964 151969792 hierarchical.cpp:208] Initialized hierarchical 
allocator process
I0304 09:33:08.553930 151969792 master.cpp:2103] Elected as the leading master!
I0304 09:33:08.553966 151969792 master.cpp:1638] Recovering from registrar
I0304 09:33:08.554018 153579520 registrar.cpp:339] Recovering registrar
I0304 09:33:08.556378 155725824 registrar.cpp:383] Successfully fetched the 
registry (0B) in 2.342912ms
I0304 09:33:08.556512 155725824 registrar.cpp:487] Applied 1 operations in 
38854ns; attempting to update the registry
I0304 09:33:08.558737 153579520 registrar.cpp:544] Successfully updated the 
registry in 2.206976ms
I0304 09:33:08.558776 153579520 registrar.cpp:416] Successfully recovered 
registrar
I0304 09:33:08.55 153042944 master.cpp:1752] Recovered 0 agents from the 
registry (136B); allowing 10mins for agents to reregister
I0304 09:33:08.558929 155725824 hierarchical.cpp:248] Skipping recovery of 
hierarchical allocator: nothing to recover
I0304 09:33:08.561846 162198976 sched.cpp:232] Version: 1.8.0
I0304 09:33:08.562060 155189248 sched.cpp:336] New master detected at 
master@172.18.8.49:56584
I0304 09:33:08.562099 155189248 sched.cpp:401] Authenticating with master 
master@172.18.8.49:56584
I0304 09:33:08.562110 155189248 sched.cpp:408] Using default CRAM-MD5 
authenticatee
I0304 09:33:08.562196