[jira] [Commented] (MESOS-8275) Remove use of ::_stat on Windows

2018-04-23 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449297#comment-16449297
 ] 

Andrew Schwartzmeyer commented on MESOS-8275:
-

I have {{mtime}} working and have deleted the others, but there are still these 
to look at:

 
{noformat}
src/tests/files_tests.cpp
340:  ASSERT_EQ(0, ::stat(path::join("1", "2").c_str(), ));
343:  ASSERT_EQ(0, ::stat(path::join("1", "3").c_str(), ));
346:  ASSERT_EQ(0, ::stat(path::join("1", "three").c_str(), ));
349:  ASSERT_EQ(0, ::stat(path::join("1", "two").c_str(), ));

src/tests/container_logger_tests.cpp
770:  ASSERT_GE(::stat(stdoutPath.c_str(), ), 0);

src/slave/containerizer/docker.cpp
505:  if (::stat(directory.c_str(), ) < 0) {

src/tests/containerizer/mesos_containerizer_tests.cpp
696:  EXPECT_EQ(0, ::stat(stdoutPath.c_str(), ));
701:  EXPECT_EQ(0, ::stat(stderrPath.c_str(), ));

src/tests/containerizer/docker_containerizer_tests.cpp
4221:  ASSERT_GE(::stat(stdoutPath.c_str(), ), 0);

3rdparty/stout/include/stout/os/permissions.hpp
64:  if (::stat(path.c_str(), ) < 0) {
{noformat}

I think we can mostly ignore those in tests (and I already removed Linux/POSIX 
results). For {{permissions.hpp}}, it seems reasonable to me to just emit a 
warning pending MESOS-3176, which just leaves the use in {{docker.cpp}}...

> Remove use of ::_stat on Windows
> 
>
> Key: MESOS-8275
> URL: https://issues.apache.org/jira/browse/MESOS-8275
> Project: Mesos
>  Issue Type: Task
> Environment: Windows
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: stout, windows
>
> The Windows stat.hpp header has some remaining uses of non-long-path-aware 
> CRT APIs, specifically {{::_stat}}. This has been punted so far as not yet a 
> problem, but eventually should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8618) ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.

2018-04-23 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448971#comment-16448971
 ] 

Yan Xu commented on MESOS-8618:
---

This test failed because we didn't enable the replicated log registry, so the 
master doesn't know the agent when it reregisters. With MESOS-6406 the intention 
was not to send status updates actively when it is a known agent.

However, the discussion about this test exposed a bug: we are not sending the 
"status update state" in this case. I filed MESOS-8824 for this and will fix it 
next.

> ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.
> ---
>
> Key: MESOS-8618
> URL: https://issues.apache.org/jira/browse/MESOS-8618
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: ec Debian 9 with SSL
>Reporter: Alexander Rukletsov
>Assignee: Yan Xu
>Priority: Major
>  Labels: flaky-test
> Attachments: 
> ReconciliationTest.ReconcileStatusUpdateTaskState-badrun.txt
>
>
> {noformat}
> ../../src/tests/reconciliation_tests.cpp:1129
>   Expected: TASK_RUNNING
> To be equal to: update->state()
>   Which is: TASK_FINISHED
> {noformat}
> {noformat}
> ../../src/tests/reconciliation_tests.cpp:1130: Failure
>   Expected: TaskStatus::REASON_RECONCILIATION
>   Which is: 9
> To be equal to: update->reason()
>   Which is: 32
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8824) Send the task's latest "status update state" to frameworks when an unreachable agent reregisters.

2018-04-23 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8824:
-

 Summary: Send the task's latest "status update state" to 
frameworks when an unreachable agent reregisters.
 Key: MESOS-8824
 URL: https://issues.apache.org/jira/browse/MESOS-8824
 Project: Mesos
  Issue Type: Bug
Reporter: Yan Xu


With MESOS-6406 the master started to actively send frameworks status updates 
for reregistering agents if the agent:
 - has previously been removed by the master for being unreachable or
 - is unknown to the master due to the garbage collection of the
 unreachable and gone agents in the registry and the master's state.

However we sent the task's [latest 
state|https://github.com/apache/mesos/blob/3711d66aa9eb70e12b184d3c2f79bf56fbd9cffa/include/mesos/v1/mesos.proto#L2147]
 instead of its [latest status update 
state|https://github.com/apache/mesos/blob/3711d66aa9eb70e12b184d3c2f79bf56fbd9cffa/include/mesos/v1/mesos.proto#L2154]
 which means the framework could first get an update with a {{TASK_FINISHED}} 
and then later {{TASK_RUNNING}}.

This is inconsistent with the handling of other master-generated updates, e.g., 
[during 
reconciliation|https://github.com/apache/mesos/blob/3711d66aa9eb70e12b184d3c2f79bf56fbd9cffa/src/master/master.cpp#L8603];
 we should send the status update state instead.
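
A minimal sketch of the selection rule proposed here, assuming a simplified stand-in for the task record. The {{Task}} struct, its {{hasStatusUpdateState}} flag and the {{stateToReport()}} helper are hypothetical names for illustration, not the master's actual code. The sketch prefers the status update state when one is recorded and falls back to the latest state otherwise.

```cpp
#include <cassert>

// Simplified stand-in for the task states mentioned above.
enum class TaskState { TASK_STAGING, TASK_RUNNING, TASK_FINISHED };

// Hypothetical, reduced model of a task as tracked by the master.
struct Task
{
  TaskState state;              // Latest state of the task.
  bool hasStatusUpdateState;    // Whether a status update state is recorded.
  TaskState statusUpdateState;  // State of the latest status update, if any.
};

// Picks the state to put into a master-generated update: the status
// update state when present, otherwise the latest state.
TaskState stateToReport(const Task& task)
{
  return task.hasStatusUpdateState ? task.statusUpdateState : task.state;
}
```

For a task whose latest state is {{TASK_FINISHED}} but whose status update state is still {{TASK_RUNNING}}, this reports {{TASK_RUNNING}}, so the framework would not see {{TASK_FINISHED}} followed by {{TASK_RUNNING}}.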





[jira] [Commented] (MESOS-8729) Libprocess: deadlock in process::finalize

2018-04-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448927#comment-16448927
 ] 

Benjamin Mahler commented on MESOS-8729:


Filed MESOS-8823 to capture the broader effort needed to make process::finalize 
safe for use.

> Libprocess: deadlock in process::finalize
> -
>
> Key: MESOS-8729
> URL: https://issues.apache.org/jira/browse/MESOS-8729
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.6.0
> Environment: The issue has been reproduced on Ubuntu 16.04, master 
> branch, commit `42848653b2`. 
>Reporter: Andrei Budnik
>Priority: Major
>  Labels: deadlock, libprocess
> Attachments: deadlock.txt
>
>
> Since we are calling 
> [`libprocess::finalize()`|https://github.com/apache/mesos/blob/02ebf9986ab5ce883a71df72e9e3392a3e37e40e/src/slave/containerizer/mesos/io/switchboard_main.cpp#L157]
>  before returning from the IOSwitchboard's main function, we expect that all 
> http responses are going to be sent back to clients before IOSwitchboard 
> terminates. However, after [adding|https://reviews.apache.org/r/66147/] 
> `libprocess::finalize()` we have seen that IOSwitchboard might get stuck in 
> `libprocess::finalize()`. See attached stacktrace.





[jira] [Created] (MESOS-8823) process::finalize is broken.

2018-04-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8823:
--

 Summary: process::finalize is broken.
 Key: MESOS-8823
 URL: https://issues.apache.org/jira/browse/MESOS-8823
 Project: Mesos
  Issue Type: Epic
  Components: libprocess
Reporter: Benjamin Mahler


{{process::finalize}} can:

(1) deadlock: MESOS-8729
(2) crash if re-initialization occurs
(3) therefore, if any calls that use an implicit {{initialize()}} occur (e.g. 
{{spawn()}}) post-finalize(), they will crash
(4) crash if finalize() is called more than once





[jira] [Assigned] (MESOS-8778) Fatal error in `DRFSorter::unallocated()` in `SharedPersistentVolumeRescindOnDestroy` test.

2018-04-23 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu reassigned MESOS-8778:
---

Assignee: Meng Zhu

> Fatal error in `DRFSorter::unallocated()` in 
> `SharedPersistentVolumeRescindOnDestroy` test.
> ---
>
> Key: MESOS-8778
> URL: https://issues.apache.org/jira/browse/MESOS-8778
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
> Environment: Centos 6 SSL (internal CI)
> asf/master-99c73e0c (12-Apr-2018 09:17:26)
>Reporter: Andrei Budnik
>Assignee: Meng Zhu
>Priority: Major
>  Labels: flaky, flaky-test
> Attachments: SharedPersistentVolumeRescindOnDestroy-badrun.txt
>
>
> {code}
> 09:52:15 F0412 09:52:12.391731 23764 sorter.hpp:369] Check failed: 
> scalarQuantities.contains(quantitiesToRemove) cpus:1; mem:128 does not 
> contain cpus:1; mem:128; disk(reservations: [(STATIC,default-role)]):4096
> 09:52:15 *** Check failure stack trace: ***
> 09:52:15 @ 0x7ff0dfbf5abd  google::LogMessage::Fail()
> 09:52:15 @ 0x7ff0dfbf790d  google::LogMessage::SendToLog()
> 09:52:15 @ 0x7ff0dfbf56a3  google::LogMessage::Flush()
> 09:52:15 @ 0x7ff0dfbf8309  google::LogMessageFatal::~LogMessageFatal()
> 09:52:15 @ 0x7ff0dee93035  
> mesos::internal::master::allocator::DRFSorter::unallocated()
> 09:52:15 @ 0x7ff0dee69e91  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::untrackAllocatedResources()
> 09:52:15 @ 0x7ff0dee707a5  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::recoverResources()
> 09:52:15 @ 0x7ff0dfb46661  process::ProcessBase::consume()
> 09:52:15 @ 0x7ff0dfb5f8ca  process::ProcessManager::resume()
> 09:52:15 @ 0x7ff0dfb63346  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 09:52:15 @ 0x7ff0dfd1f2e0  execute_native_thread_routine
> 09:52:15 @ 0x7ff0dcd13aa1  start_thread
> 09:52:15 @ 0x7ff0dc0b8bcd  clone
> {code}
> Observed this failure in internal CI for test
> {code}
> DiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/1
> {code}
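
To illustrate why a check of this shape fires, here is a toy model of a "contains" test on scalar quantities. The {{Quantities}} alias and the {{contains()}} helper are assumptions made for this sketch. They ignore reservations and the other resource metadata that the real sorter tracks.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: resource name -> scalar amount.
using Quantities = std::map<std::string, double>;

// Returns true only if every entry of `subset` is present in `total`
// with at least the same amount.
bool contains(const Quantities& total, const Quantities& subset)
{
  for (const auto& entry : subset) {
    auto it = total.find(entry.first);
    if (it == total.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}
```

In the log above the tracked total is {{cpus:1; mem:128}} while the quantities to remove also include a {{disk}} entry of 4096, so a test like this one returns false and the {{CHECK}} aborts.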





[jira] [Commented] (MESOS-8614) DefaultExecutorTests occassionally crash in the V1 Scheduler code

2018-04-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448746#comment-16448746
 ] 

Joseph Wu commented on MESOS-8614:
--

Modified the JIRA title as this also appears in {{KillMultipleTasks/0}} (and 
probably can happen in any of the same tests using the V1 Scheduler mock).

> DefaultExecutorTests occassionally crash in the V1 Scheduler code
> -
>
> Key: MESOS-8614
> URL: https://issues.apache.org/jira/browse/MESOS-8614
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Chun-Hung Hsiao
>Priority: Major
>  Labels: flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, consoleText.1.log, 
> consoleText.2.log, consoleText.3.log
>
>
> Occasionally the {{DefaultExecutorTest.ResourceLimitation/0}} and 
> {{DefaultExecutorTest.ROOT_ContainerStatusForTask/0}} would crash with the 
> following logs:
> {noformat}
> I*** Aborted at 1519639358 (unix time) try "date -d @1519639358" if you are 
> using GNU date ***
> 0226 10:02:38.030114 21366 task_status_update_manager.cpp:538] Cleaning up 
> status update stream for task a332e0b5-a713-47b2-85d8-358ce6a4118a of 
> framework 507afc07-d395-4e76-aa11-4562ae07a9b3-
> I0226 10:02:38.029911 21370 gc.cpp:90] Scheduling 
> '/tmp/ROOT_DOCKER_DockerAndMesosContainerizers_DefaultExecutorTest_ResourceLimitation_0_UVxsKT/slaves/507afc07-d395-4e76-aa11-4562ae07a9b3-S0/frameworks/507afc07-d395-4e76-aa11-4562ae07a9b3-/executors/default/runs/37678c9e-fc27-40fa-8d26-b540ff88a381'
>  for gc 6.9968157333days in the future
> I0226 10:02:38.030480 21370 gc.cpp:90] Scheduling 
> '/tmp/ROOT_DOCKER_DockerAndMesosContainerizers_DefaultExecutorTest_ResourceLimitation_0_UVxsKT/slaves/507afc07-d395-4e76-aa11-4562ae07a9b3-S0/frameworks/507afc07-d395-4e76-aa11-4562ae07a9b3-/executors/default'
>  for gc 6.9968157333days in the future
> I0226 10:02:38.030591 21370 gc.cpp:90] Scheduling 
> '/tmp/ROOT_DOCKER_DockerAndMesosContainerizers_DefaultExecutorTest_ResourceLimitation_0_UVxsKT/slaves/507afc07-d395-4e76-aa11-4562ae07a9b3-S0/frameworks/507afc07-d395-4e76-aa11-4562ae07a9b3-'
>  for gc 6.9968157333days in the future
> PC: @ 0x7f9b6df74eb3 mesos::v1::scheduler::Mesos::send()
> *** SIGSEGV (@0x0) received by PID 32110 (TID 0x7f9b626a9700) from PID 0; 
> stack trace: ***
> @ 0x7f9b3717b9c2 (unknown)
> @ 0x7f9b37180689 (unknown)
> @ 0x7f9b371743e8 (unknown)
> @ 0x7f9b6b7d3670 (unknown)
> @ 0x7f9b6df74eb3 mesos::v1::scheduler::Mesos::send()
> @ 0x55a24270c0f6 
> _ZNK5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE17gmock_PerformImplISC_SF_N7testing8internal12ExcessiveArgESL_SL_SL_SL_SL_SL_SL_EEvRKSt5tupleIJSC_SF_EET_T0_T1_T2_T3_T4_T5_T6_T7_T8_
> @ 0x55a24270c26a 
> _ZN5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE7PerformERKSt5tupleIJSC_SF_EE
> @ 0x55a2425fcc1e 
> _ZN7testing8internal12DoBothActionI17PromiseArgActionPILi1EPN7process7PromiseIN5mesos2v19scheduler12Event_UpdateNS5_8internal5tests2v19scheduler23SendAcknowledgeActionP2INS6_11FrameworkIDENS6_7AgentID4ImplIFvPNS7_5MesosERKS8_EE7PerformERKSt5tupleIJSN_SP_EE
> @ 0x55a24262e2b7 
> testing::internal::FunctionMockerBase<>::UntypedPerformAction()
> @ 0x55a2438a2d19 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @ 0x55a24270f27a 
> mesos::internal::tests::scheduler::MockHTTPScheduler<>::events()
> @ 0x55a24268aae3 std::_Function_handler<>::_M_invoke()
> @ 0x7f9b6df78bf8 process::AsyncExecutorProcess::execute<>()
> @ 0x7f9b6df8155d 
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISH_SaISH_ESL_SR_RSL_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSX_FSU_T1_T2_EOT3_OT4_EUlSt10unique_ptrINS1_7PromiseISA_EESt14default_deleteIS1B_EEOSP_OSL_S3_E_JS1E_SP_SL_St12_PlaceholderILi1EEclEOS3_
> @ 0x7f9b6eb3c1f1 process::ProcessBase::consume()
> @ 0x7f9b6eb4eea2 process::ProcessManager::resume()
> @ 0x7f9b6eb52bb6 
> _ZNSt6thread11_State_implISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f9b6bcb283f (unknown)
> @ 0x7f9b6b7c96da start_thread
> @ 0x7f9b6b503d7f (unknown)
> {noformat}
> Attached logs of 3 crash instances.





[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448618#comment-16448618
 ] 

ASF GitHub Bot commented on MESOS-8750:
---

Github user m9a closed the pull request at:

https://github.com/apache/mesos/pull/279


> Check failed: !slaves.registered.contains(task->slave_id)
> -
>
> Key: MESOS-8750
> URL: https://issues.apache.org/jira/browse/MESOS-8750
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Affects Versions: 1.6.0
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>Priority: Critical
>
> It appears that in certain circumstances an unreachable task doesn't get 
> cleaned up from {{framework.unreachableTasks}} when the respective agent 
> re-registers, leading to this check failure later when the framework is being 
> removed. When an agent goes unreachable, the master adds the tasks from this 
> agent to {{framework.unreachableTasks}}. When such an agent re-registers, the 
> master removes the tasks that the agent specifies during re-registration from 
> this data structure. However, there could be tasks that the agent doesn't know 
> about, e.g. if the runTask message for them got dropped, and such tasks will 
> not get removed from {{unreachableTasks}}.
> {noformat}
> F0310 13:30:58.856665 62740 master.cpp:9671] Check failed: 
> !slaves.registered.contains(task->slave_id()) Unreachable task  of 
> framework 4f57975b-05dd-4118-8674-5b29a86c6a6c-0850 was found on registered 
> agent 683c4a92-b5a0-490c-998a-6113fc86d37a-S1428
> {noformat}
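
The sequence described above can be modelled with a toy data structure. This is only an illustration of how a task can be left behind. The {{Framework}} struct and both helpers are hypothetical names, not the master's actual code, and no fix is implied.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced model: task ID -> ID of the agent it ran on.
struct Framework
{
  std::map<std::string, std::string> unreachableTasks;
};

// Models the behaviour described above: only the tasks that the agent
// reports during re-registration are removed.
void removeReportedTasks(
    Framework& framework,
    const std::vector<std::string>& reportedTaskIds)
{
  for (const std::string& taskId : reportedTaskIds) {
    framework.unreachableTasks.erase(taskId);
  }
}

// Returns the tasks still recorded as unreachable for the given agent.
std::vector<std::string> staleTasks(
    const Framework& framework,
    const std::string& agentId)
{
  std::vector<std::string> result;
  for (const auto& entry : framework.unreachableTasks) {
    if (entry.second == agentId) {
      result.push_back(entry.first);
    }
  }
  return result;
}
```

If the agent re-registers knowing only one of its two tasks, the other one stays in {{unreachableTasks}} even though the agent is registered again, which is the state the check in the log rejects.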





[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process

2018-04-23 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448611#comment-16448611
 ] 

James DeFelice commented on MESOS-4065:
---

It looks like the linked ZK ticket was recently resolved.

> slave FD for ZK tcp connection leaked to executor process
> -
>
> Key: MESOS-4065
> URL: https://issues.apache.org/jira/browse/MESOS-4065
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1, 0.25.0, 1.2.2
>Reporter: James DeFelice
>Priority: Major
>  Labels: mesosphere, security
>
> {code}
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd
> root  1432 99.3  0.0 202420 12928 ?Rsl  21:32  13:51 
> ./etcd-mesos-executor -log_dir=./
> root  1450  0.4  0.1  38332 28752 ?Sl   21:32   0:03 ./etcd 
> --data-dir=etcd_data --name=etcd-1449178273 
> --listen-peer-urls=http://10.0.0.45:1025 
> --initial-advertise-peer-urls=http://10.0.0.45:1025 
> --listen-client-urls=http://10.0.0.45:1026 
> --advertise-client-urls=http://10.0.0.45:1026 
> --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025
>  --initial-cluster-state=existing
> core  1651  0.0  0.0   6740   928 pts/0S+   21:46   0:00 grep 
> --colour=auto -e etcd
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181
> etcd-meso 1432 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave
> root  1124  0.2  0.1 900496 25736 ?Ssl  21:11   0:04 
> /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave
> core  1658  0.0  0.0   6740   832 pts/0S+   21:46   0:00 grep 
> --colour=auto -e slave
> core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181
> mesos-sla 1124 root   10u IPv4  21973  0t0TCP 
> ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181
>  (ESTABLISHED)
> {code}
> I only tested against mesos 0.24.1 and 0.25.0.
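
The report does not say how the descriptor should be kept out of the executor. One common POSIX technique for keeping a descriptor from being inherited across {{exec()}} is the {{FD_CLOEXEC}} flag. The sketch below shows it with hypothetical helper names and does not imply that this is how the linked ZK ticket was resolved.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Marks `fd` close-on-exec so that it is closed automatically when the
// process calls exec(). Returns true on success.
bool setCloexec(int fd)
{
  int flags = ::fcntl(fd, F_GETFD);
  if (flags < 0) {
    return false;
  }
  return ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

// Reports whether `fd` currently has the close-on-exec flag set.
bool isCloexec(int fd)
{
  int flags = ::fcntl(fd, F_GETFD);
  return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

A descriptor flagged this way in the agent would not show up in {{lsof}} output for a process started via {{exec()}}, unlike the port 2181 connection in the listing above.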





[jira] [Assigned] (MESOS-8618) ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.

2018-04-23 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-8618:
-

Assignee: Yan Xu

> ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.
> ---
>
> Key: MESOS-8618
> URL: https://issues.apache.org/jira/browse/MESOS-8618
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: ec Debian 9 with SSL
>Reporter: Alexander Rukletsov
>Assignee: Yan Xu
>Priority: Major
>  Labels: flaky-test
> Attachments: 
> ReconciliationTest.ReconcileStatusUpdateTaskState-badrun.txt
>
>
> {noformat}
> ../../src/tests/reconciliation_tests.cpp:1129
>   Expected: TASK_RUNNING
> To be equal to: update->state()
>   Which is: TASK_FINISHED
> {noformat}
> {noformat}
> ../../src/tests/reconciliation_tests.cpp:1130: Failure
>   Expected: TaskStatus::REASON_RECONCILIATION
>   Which is: 9
> To be equal to: update->reason()
>   Which is: 32
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8821) stout's `os::permissions()` does not follow symlinks

2018-04-23 Thread Alexander Rojas (JIRA)
Alexander Rojas created MESOS-8821:
--

 Summary: stout's `os::permissions()` does not follow symlinks
 Key: MESOS-8821
 URL: https://issues.apache.org/jira/browse/MESOS-8821
 Project: Mesos
  Issue Type: Task
  Components: stout
Affects Versions: 1.5.0
Reporter: Alexander Rojas


The {{os::permissions()}} function is implemented using {{stat()}}:

{code}
inline Try<Permissions> permissions(const std::string& path)
{
  struct stat status;
  if (::stat(path.c_str(), &status) < 0) {
    return ErrnoError();
  }

  return Permissions(status.st_mode);
}
{code}

This works pretty well except in cases where the file given is a symlink, since 
symlinks are created with full 777 permissions but defer all the security 
questions to the real file, so probably this function should be implemented 
using {{lstat()}} instead, unless not following symlinks was the original 
intention.
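
For reference when weighing {{stat()}} against {{lstat()}}: per POSIX, {{stat()}} resolves a symlink and reports the mode of the target, while {{lstat()}} reports the mode of the link itself. The sketch below makes the difference observable. {{permissionBits()}} and {{isSymlink()}} are hypothetical helpers written for this illustration and are not part of stout.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns the permission bits of `path`, or -1 on error. With
// `follow == true` a symlink is resolved (stat); otherwise the link
// itself is examined (lstat).
int permissionBits(const std::string& path, bool follow)
{
  struct stat status;
  int rc = follow ? ::stat(path.c_str(), &status)
                  : ::lstat(path.c_str(), &status);
  if (rc < 0) {
    return -1;
  }
  return static_cast<int>(status.st_mode & 07777);
}

// Reports whether `path` itself is a symbolic link.
bool isSymlink(const std::string& path)
{
  struct stat status;
  return ::lstat(path.c_str(), &status) == 0 && S_ISLNK(status.st_mode);
}
```

For a link pointing at a file with mode 0600, {{permissionBits(link, true)}} yields 0600, whereas {{permissionBits(link, false)}} yields the link's own mode, which is platform dependent.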





[jira] [Created] (MESOS-8820) UpgradeTest.ReregisterOldAgentWithMultiRoleMaster is flaky

2018-04-23 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8820:
---

 Summary: UpgradeTest.ReregisterOldAgentWithMultiRoleMaster is flaky
 Key: MESOS-8820
 URL: https://issues.apache.org/jira/browse/MESOS-8820
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier


I see {{UpgradeTest.ReregisterOldAgentWithMultiRoleMaster}} fail rather often 
(>50%) on a busy 16-core machine (concurrent run of {{support/mesos-tidy.sh}} 
and big build).





[jira] [Assigned] (MESOS-8819) mesos.pom file hardcodes developers

2018-04-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-8819:
---

Assignee: Benjamin Bannier

> mesos.pom file hardcodes developers
> ---
>
> Key: MESOS-8819
> URL: https://issues.apache.org/jira/browse/MESOS-8819
> Project: Mesos
>  Issue Type: Task
>  Components: java api
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently {{src/java/mesos.pom.in}} hardcodes developers. The information 
> there duplicates {{docs/committers.md}}, is likely outdated, and will get out 
> of sync again in the future.
> It seems we should either automatically populate this field during the 
> release process or drop this field without replacement. We already point to 
> the dev mailing list which can be used to reach Mesos developers.





[jira] [Created] (MESOS-8819) mesos.pom file hardcodes developers

2018-04-23 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8819:
---

 Summary: mesos.pom file hardcodes developers
 Key: MESOS-8819
 URL: https://issues.apache.org/jira/browse/MESOS-8819
 Project: Mesos
  Issue Type: Task
  Components: java api
Reporter: Benjamin Bannier


Currently {{src/java/mesos.pom.in}} hardcodes developers. The information there 
duplicates {{docs/committers.md}}, is likely outdated, and will get out of sync 
again in the future.

It seems we should either automatically populate this field during the release 
process or drop this field without replacement. We already point to the dev 
mailing list which can be used to reach Mesos developers.





[jira] [Commented] (MESOS-8818) VolumeSandboxPathIsolatorTest.SharedParentTypeVolume fails on macOS

2018-04-23 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447800#comment-16447800
 ] 

Jan Schlicht commented on MESOS-8818:
-

cc [~jpe...@apache.org]

> VolumeSandboxPathIsolatorTest.SharedParentTypeVolume fails on macOS
> ---
>
> Key: MESOS-8818
> URL: https://issues.apache.org/jira/browse/MESOS-8818
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: macOS 10.13.4
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>Priority: Major
>  Labels: mesosphere
>
> This test fails on macOS with:
> {noformat}
> [ RUN  ] VolumeSandboxPathIsolatorTest.SharedParentTypeVolume
> I0423 10:55:19.624977 2767623040 containerizer.cpp:296] Using isolation { 
> environment_secret, filesystem/posix, volume/sandbox_path }
> I0423 10:55:19.625176 2767623040 provisioner.cpp:299] Using default backend 
> 'copy'
> ../../src/tests/containerizer/volume_sandbox_path_isolator_tests.cpp:130: 
> Failure
> create: Unknown or unsupported isolator 'volume/sandbox_path'
> [  FAILED  ] VolumeSandboxPathIsolatorTest.SharedParentTypeVolume (3 ms)
> {noformat}
> Likely a regression introduced in commit 
> {{189efed864ca2455674b0790d6be4a73c820afd6}} which removed 
> {{volume/sandbox_path}} for POSIX.





[jira] [Created] (MESOS-8818) VolumeSandboxPathIsolatorTest.SharedParentTypeVolume fails on macOS

2018-04-23 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8818:
---

 Summary: VolumeSandboxPathIsolatorTest.SharedParentTypeVolume 
fails on macOS
 Key: MESOS-8818
 URL: https://issues.apache.org/jira/browse/MESOS-8818
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: macOS 10.13.4
Reporter: Jan Schlicht
Assignee: Jan Schlicht


This test fails on macOS with:
{noformat}
[ RUN  ] VolumeSandboxPathIsolatorTest.SharedParentTypeVolume
I0423 10:55:19.624977 2767623040 containerizer.cpp:296] Using isolation { 
environment_secret, filesystem/posix, volume/sandbox_path }
I0423 10:55:19.625176 2767623040 provisioner.cpp:299] Using default backend 
'copy'
../../src/tests/containerizer/volume_sandbox_path_isolator_tests.cpp:130: 
Failure
create: Unknown or unsupported isolator 'volume/sandbox_path'
[  FAILED  ] VolumeSandboxPathIsolatorTest.SharedParentTypeVolume (3 ms)
{noformat}

Likely a regression introduced in commit 
{{189efed864ca2455674b0790d6be4a73c820afd6}} which removed 
{{volume/sandbox_path}} for POSIX.


