[jira] [Commented] (MESOS-7944) Implement jemalloc support for Mesos

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441877#comment-16441877
 ] 

Alexander Rukletsov commented on MESOS-7944:


{noformat}
commit ca21ca82071f2c53d5817424569977728260da65 (HEAD -> master, apache/master, yolo/bevers/jemalloc5)
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 05:26:01 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Refactored `DiskArtifact` in memory profiler.

commit 081ef161a2bd72eb61e354fe9f033f915a5f89cc
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 03:10:30 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Reconciled method names with the actions they perform.

commit 915bf398a930e2b7f0ec571fa5a5712d7bb3ca82
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 03:07:32 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Simplified download pipeline in MemoryProfiler.

commit 963a289c6b3bfcec420092105c1d39837a261bbc
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 01:35:40 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Cleaned up method ordering in memory profiler.

commit f2cb9073c61c1089db559014abe5b6d2d6d71714
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 01:50:42 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Simplified MemoryProfiler by removing a method overload.

commit a3560c16610f79b4b402e0c0af2dadbb16232fdc
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 01:01:02 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Removed std:: prefix in "memory_profiler.cpp".

commit 03221b0b56fb5432df50e77491717c9f2c3056ed
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 00:48:53 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Improved style, comments, and messages in MemoryProfiled.

This patch addresses the following:
  - Single quote instead of backtick in user facing messages;
  - Use process id instead of hardcoded name;
  - Typos and more precise wording in messages and comments;
  - Formatting of help endpoints;
  - Indentation;
  - No period at the end of error messages.

commit 47e88dd987b179d7db7d21b79b70d824560f5a40
Author: Alexander Rukletsov 
AuthorDate: Wed Apr 18 00:02:20 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Fixed help messages in MemoryProfiler.

commit f5bd65340257dbc7e1784f14a3837808d08be729
Author: Alexander Rukletsov 
AuthorDate: Tue Apr 17 23:05:49 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Ensured HTTP responses end with period and linefeed.

For consistency, all HTTP responses `MemoryProfiler` can produce now
end with a period and a linefeed character.

commit 58e224efe4173fe5061c968b41712053b7c22a3e
Author: Benno Evers 
AuthorDate: Tue Apr 17 18:31:47 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Added new --memory_profiling flag to agent and master binaries.

This flag allows explicit disabling of the memory profiler
endpoint in the master and agent binaries.

Review: https://reviews.apache.org/r/63370/

commit 81c9978c0f759f7e39f12ae5343a35aa123b0ba8
Author: Benno Evers 
AuthorDate: Tue Apr 17 18:31:36 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Added MemoryProfiler class to Libprocess.

This class exposes profiling functionality of jemalloc
memory allocator when it is detected to be the memory
allocator of the current process.

In particular, it enables developers to access live
statistics of current memory usage, and to create
heap profiles that show where most memory is allocated.

Review: https://reviews.apache.org/r/63368/

commit cae2d20c28af742d1b92f5882009cfd0f62dc9d6
Author: Benno Evers 
AuthorDate: Tue Apr 17 18:31:31 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Wed Apr 18 06:20:29 2018 +0200

Added jemalloc release tarball and build rules.

Review: https://reviews.apache.org/r/63366/
{noformat}

> Implement jemalloc support for Mesos
> 
>
> Key: 

[jira] [Commented] (MESOS-8367) The Mesos configuration guide documents the `--ip6` and `--ip6_discovery_command` flags incorrectly

2018-04-17 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441618#comment-16441618
 ] 

Gilbert Song commented on MESOS-8367:
-

[~avinash.mesos] [~vi...@twitter.com], do we still want to land this in 1.5.1?

> The Mesos configuration guide documents the `--ip6` and 
> `--ip6_discovery_command` flags incorrectly
> ---
>
> Key: MESOS-8367
> URL: https://issues.apache.org/jira/browse/MESOS-8367
> Project: Mesos
> Issue Type: Bug
> Components: documentation
> Reporter: Avinash Sridharan
> Assignee: Avinash Sridharan
> Priority: Major
>
> The Apache Mesos configuration guide lists the `ip6_discovery_command` and 
> the `ip6` flags as being common to both the master and the agent:
> http://mesos.apache.org/documentation/latest/configuration/master-and-agent/
> However, these flags have been introduced only in the agent and not in the master:
> ```
> [vagrant@centos8 bin]$ ./mesos-master.sh --help | grep discovery
>                                   with `--ip_discovery_command`.
>   --ip_discovery_command=VALUE    Optional IP discovery binary: if set, it is expected to emit
> [vagrant@centos8 bin]$ ./mesos-agent.sh --help | grep discovery
>   --appc_simple_discovery_uri_prefix=VALUE    URI prefix to be used for simple discovery of appc images,
>                                   with `--ip_discovery_command`.
>                                   with '--ip6_discovery_command'.
>   --ip6_discovery_command=VALUE   Optional IPv6 discovery binary: if set, it is expected to emit
>   --ip_discovery_command=VALUE    Optional IP discovery binary: if set, it is expected to emit
> [vagrant@centos8 bin]$ cd ~/dev/
> [vagrant@centos8 dev]$ ls
> ```
> So we need to fix the documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8247) Executor registered message is lost

2018-04-17 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441616#comment-16441616
 ] 

Gilbert Song commented on MESOS-8247:
-

[~abudnik][~alexr], do we still aim to land this in 1.5.1?

> Executor registered message is lost
> ---
>
> Key: MESOS-8247
> URL: https://issues.apache.org/jira/browse/MESOS-8247
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Priority: Major
>
> h3. Brief description of successful agent-executor communication.
> The executor sends a `RegisterExecutorMessage` to the agent during its 
> initialization step. The agent responds to the executor with an 
> `ExecutorRegisteredMessage` in the `registerExecutor()` method. Whenever the 
> executor receives `ExecutorRegisteredMessage`, it prints `Executor registered 
> on agent...` to its stderr log.
> h3. Problem description.
> The agent launches the built-in docker executor, which is stuck in the 
> `STAGING` state.
> stderr logs of the docker executor:
> {code}
> I1114 23:03:17.919090 14322 exec.cpp:162] Version: 1.2.3
> {code}
> They don't contain a message like `Executor registered on agent...`. At the 
> same time, the agent received `RegisterExecutorMessage` and sent a `runTask` 
> message to the executor.
> The stdout log consists of the same repeating message:
> {code}
> Received killTask for task ...
> {code}
> Also, the docker executor process has no child processes.
> Currently, the executor [doesn't attempt|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L320]
> to launch a task if it is not registered with the agent, while [task killing|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L343]
> doesn't have such a check.
> It looks like the `ExecutorRegisteredMessage` has been lost.
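
The asymmetry described in this report can be illustrated with a minimal sketch. `ToyExecutor` and its members are hypothetical names invented for illustration, not the code in `exec.cpp`; the sketch assumes only what the report states: launch requests are ignored until the executor is registered, while kill requests are handled regardless.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the executor behaviour described in the report.
// This is NOT the code from exec.cpp; all names are made up.
struct ToyExecutor {
  bool registered = false;       // Set once ExecutorRegisteredMessage arrives.
  std::vector<std::string> log;  // Stand-in for the executor's log output.

  void onExecutorRegistered() {
    registered = true;
    log.push_back("Executor registered on agent");
  }

  // Launch requests are ignored while the executor is unregistered.
  bool runTask(const std::string& taskId) {
    if (!registered) {
      return false;
    }
    log.push_back("Launching task " + taskId);
    return true;
  }

  // Kill requests have no registration check, so they are always logged.
  void killTask(const std::string& taskId) {
    log.push_back("Received killTask for task " + taskId);
  }
};
```

If the registration message never arrives, `runTask` keeps returning `false` and only the `killTask` lines accumulate, which matches the logs quoted in the report.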





[jira] [Commented] (MESOS-7967) Make `mesos-execute` work with old-style resources

2018-04-17 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441613#comment-16441613
 ] 

Gilbert Song commented on MESOS-7967:
-

[~mcypark], do we still aim to land this in 1.5.1?

> Make `mesos-execute` work with old-style resources
> --
>
> Key: MESOS-7967
> URL: https://issues.apache.org/jira/browse/MESOS-7967
> Project: Mesos
> Issue Type: Improvement
> Components: cli
> Reporter: Michael Park
> Priority: Major
>
> {{mesos-execute}} should be updated to be able to handle the
> "pre-reservation-refinement" resource format.
> For reservation refinement, a new resource format was introduced.
> The master and agent have been carefully updated to be able to handle
> pre/post reservation-refinement resource formats, whereas the example
> frameworks and {{mesos-execute}} were updated such that they require
> the new resource format. While the example frameworks are probably fine
> being updated to use the new format, {{mesos-execute}} is used as a
> developer tool, and as such we should update it to be more robust in its
> handling of resource formats.





[jira] [Commented] (MESOS-7705) Reconsider restricting the resource format for frameworks.

2018-04-17 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441611#comment-16441611
 ] 

Gilbert Song commented on MESOS-7705:
-

[~mcypark][~bmahler], could we re-target this to 1.5.2?

> Reconsider restricting the resource format for frameworks.
> --
>
> Key: MESOS-7705
> URL: https://issues.apache.org/jira/browse/MESOS-7705
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Michael Park
> Assignee: Michael Park
> Priority: Major
>
> We output the "endpoint" format through the endpoints
> for backward compatibility of external tooling. A framework should be
> able to use the result of an endpoint and pass it back to Mesos,
> since the result was produced by Mesos. This is especially applicable
> to the V1 API. We also allow the "pre-reservation-refinement" format
> because existing "resources files" are written in that format, and
> they should still be usable without modification.
> This is probably too flexible, however, since a framework without
> a RESERVATION_REFINEMENT capability could make refined reservations
> using the "post-reservation-refinement" format, although it wouldn't be
> offered such resources. It still seems undesirable if anyone were to
> run into it, and we should consider adding sensible restrictions.
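
One restriction of the kind this ticket asks to consider could look like the sketch below. It is only an illustration under stated assumptions: `Resource`, `isRefined`, and `validate` are hypothetical names, a resource is reduced to the number of reservations stacked on it, and the ticket itself does not decide which restrictions to adopt.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified resource: only the number of stacked
// reservations matters for this check.
struct Resource {
  int reservations;  // 0 = unreserved, 1 = reserved, >1 = refined reservation.
};

// A reservation is treated as "refined" when it is stacked on another one.
bool isRefined(const Resource& resource) {
  return resource.reservations > 1;
}

// One possible restriction: refuse refined reservations from frameworks
// that did not opt in to the RESERVATION_REFINEMENT capability.
bool validate(const std::vector<Resource>& resources,
              bool hasReservationRefinementCapability) {
  for (const Resource& resource : resources) {
    if (isRefined(resource) && !hasReservationRefinementCapability) {
      return false;
    }
  }
  return true;
}
```

With such a check, a framework without the capability could still pass back unrefined resources in any accepted format, but an attempt to create a refined reservation would be rejected up front.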





[jira] [Assigned] (MESOS-8781) Mesos master shouldn't silently drop operations

2018-04-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-8781:
-

Shepherd: Greg Mann
Assignee: Gastón Kleiman
  Sprint: Mesosphere Sprint 78
Story Points: 3

> Mesos master shouldn't silently drop operations
> ---
>
> Key: MESOS-8781
> URL: https://issues.apache.org/jira/browse/MESOS-8781
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Priority: Major
>
> We should make sure that all call sites of {{void Master::drop(Framework*, 
> const Offer::Operation&, const string&)}} send a status update if an 
> operation ID was specified. Alternatively, we should make sure that they do 
> NOT send one, and make that method send it instead.
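
The second option in this ticket (letting the drop routine send the update itself) can be sketched as follows. All types here are simplified, hypothetical stand-ins rather than the Mesos classes; an empty `id` stands for "no operation ID was specified", and the `OPERATION_ERROR` state is a placeholder, since the ticket does not say which state should be reported.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos types.
struct Operation {
  std::string id;  // Empty means the framework did not specify an ID.
};

struct StatusUpdate {
  std::string operationId;
  std::string state;
  std::string message;
};

// The drop routine emits the status update itself, so individual call
// sites cannot forget to send one. `sent` stands in for the channel
// back to the framework.
void drop(const Operation& operation,
          const std::string& message,
          std::vector<StatusUpdate>& sent) {
  // (Logging of the dropped operation would go here.)
  if (!operation.id.empty()) {
    // Placeholder state; the ticket does not name the state to use.
    sent.push_back({operation.id, "OPERATION_ERROR", message});
  }
}
```

Centralizing the update in one place trades a little flexibility at the call sites for the guarantee that an operation with an ID is never dropped silently.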





[jira] [Commented] (MESOS-8275) Remove use of ::_stat on Windows

2018-04-17 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441538#comment-16441538
 ] 

Andrew Schwartzmeyer commented on MESOS-8275:
-

The {{dev}}, {{inode}}, and {{mode}} functions can be {{delete}}d on Windows, 
but {{mtime}} will need to be rewritten with, e.g., {{GetFileTime}}.

> Remove use of ::_stat on Windows
> 
>
> Key: MESOS-8275
> URL: https://issues.apache.org/jira/browse/MESOS-8275
> Project: Mesos
> Issue Type: Task
> Environment: Windows
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Major
> Labels: stout, windows
>
> The Windows stat.hpp header has some remaining uses of non-long-path-aware 
> CRT APIs, specifically {{::_stat}}. This has been deferred so far because it 
> is not yet a problem, but it should eventually be fixed.





[jira] [Created] (MESOS-8799) Master should show dynamic resources in state endpoint

2018-04-17 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-8799:
---

 Summary: Master should show dynamic resources in state endpoint
 Key: MESOS-8799
 URL: https://issues.apache.org/jira/browse/MESOS-8799
 Project: Mesos
  Issue Type: Task
  Components: HTTP API, master
Affects Versions: 1.6.0
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


The master currently shows only static agent resources, i.e., resources defined 
in the agent's {{SlaveInfo}}, in its state endpoint. We should fix this code to 
show the dynamic resources so that at least resource provider resources are 
shown. We might need to filter out oversubscribed resources for 
backward compatibility.





[jira] [Commented] (MESOS-8585) Agent crashes when starting a task with an unknown user.

2018-04-17 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441486#comment-16441486
 ] 

Gilbert Song commented on MESOS-8585:
-

[~jamespeach], is there any update on this issue? Should we still mark it as 
a blocker for 1.5.1?

> Agent crashes when starting a task with an unknown user.
> 
>
> Key: MESOS-8585
> URL: https://issues.apache.org/jira/browse/MESOS-8585
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Karsten
> Assignee: James Peach
> Priority: Blocker
> Attachments: dcos-mesos-slave.service.1.gz, dcos-mesos-slave.service.2.gz
>
>
> The Marathon team has an integration test that tries to start a task with an 
> unknown user. The test expects a {{TASK_FAILED}}. However, we see 
> {{TASK_DROPPED}} instead. The agent logs seem to suggest that the agent 
> crashes and restarts.
>
> {code}
> 2018-02-14 14:55:45: I0214 14:55:45.319974  6213 slave.cpp:2542] Launching task 'sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6' for framework 120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001
> 2018-02-14 14:55:45: I0214 14:55:45.320605  6213 paths.cpp:727] Creating sandbox '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88' for user 'bad'
> 2018-02-14 14:55:45: F0214 14:55:45.321131  6213 paths.cpp:735] CHECK_SOME(mkdir): Failed to chown directory to 'bad': No such user 'bad' Failed to create executor directory '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88'
> 2018-02-14 14:55:45: *** Check failure stack trace: ***
> 2018-02-14 14:55:45: @ 0x7f72033444ad  google::LogMessage::Fail()
> 2018-02-14 14:55:45: @ 0x7f72033462dd  google::LogMessage::SendToLog()
> 2018-02-14 14:55:45: @ 0x7f720334409c  google::LogMessage::Flush()
> 2018-02-14 14:55:45: @ 0x7f7203346bd9  google::LogMessageFatal::~LogMessageFatal()
> 2018-02-14 14:55:45: @ 0x56544ca378f9  _CheckFatal::~_CheckFatal()
> 2018-02-14 14:55:45: @ 0x7f720270f30d  mesos::internal::slave::paths::createExecutorDirectory()
> 2018-02-14 14:55:45: @ 0x7f720273812c  mesos::internal::slave::Framework::addExecutor()
> 2018-02-14 14:55:45: @ 0x7f7202753e35  mesos::internal::slave::Slave::__run()
> 2018-02-14 14:55:45: @ 0x7f7202764292  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNS1_6FutureISt4listIbSaIbRKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSR_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaIS11_EESK_SN_SQ_SV_SZ_S15_EEvRKNS1_3PIDIT_EEMS17_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSI_OSL_OSO_OST_OSX_OS13_S3_E_ISI_SL_SO_ST_SX_S13_St12_PlaceholderILi1EEclEOS3_
> 2018-02-14 14:55:45: @ 0x7f72032a2b11  process::ProcessBase::consume()
> 2018-02-14 14:55:45: @ 0x7f72032b183c  process::ProcessManager::resume()
> 2018-02-14 14:55:45: @ 0x7f72032b6da6  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 2018-02-14 14:55:45: @ 0x7f72005ced73  (unknown)
> 2018-02-14 14:55:45: @ 0x7f72000cf52c  (unknown)
> 2018-02-14 14:55:45: @ 0x7f71ffe0d1dd  (unknown)
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Unit entered failed state.
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Failed with result 'signal'.
> 2018-02-14 14:57:20: dcos-mesos-slave.service: Service hold-off time over, scheduling restart.
> 2018-02-14 14:57:20: Stopped Mesos Agent: distributed systems kernel agent.
> 2018-02-14 14:57:20: Starting Mesos Agent: distributed systems kernel agent...
> {code}
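
The trace shows the agent aborting inside a `CHECK_SOME` when the `chown` fails. A minimal sketch of the alternative, where the error is returned to the caller and turned into a task failure, might look like this. It is not the actual fix: `TryPath`, `userExists`, and `launchTask` are hypothetical, the user lookup is hard-coded instead of consulting the OS, and the `TASK_STARTING` result is only a placeholder for "the launch may proceed".

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for a Try-like result: either a value or an error.
struct TryPath {
  bool ok;
  std::string value;  // Valid when ok is true.
  std::string error;  // Valid when ok is false.
};

// Hypothetical user lookup; a real agent would consult the OS user database.
bool userExists(const std::string& user) {
  return user == "root" || user == "nobody";
}

// Returns an error for an unknown user instead of aborting the process.
TryPath createExecutorDirectory(const std::string& path,
                                const std::string& user) {
  if (!userExists(user)) {
    return {false, "",
            "Failed to chown directory to '" + user +
            "': No such user '" + user + "'"};
  }
  // (Directory creation and chown would happen here.)
  return {true, path, ""};
}

// The caller maps the error to a task status rather than crashing.
std::string launchTask(const std::string& path, const std::string& user) {
  TryPath directory = createExecutorDirectory(path, user);
  return directory.ok ? "TASK_STARTING" : "TASK_FAILED";
}
```

Under these assumptions, launching with the user `bad` yields `TASK_FAILED`, which is the status the Marathon integration test expects, and the agent process keeps running.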





[jira] [Created] (MESOS-8798) Link the "unsecure" gRPC libraries to remove SSL dependency

2018-04-17 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8798:
--

 Summary: Link the "unsecure" gRPC libraries to remove SSL dependency
 Key: MESOS-8798
 URL: https://issues.apache.org/jira/browse/MESOS-8798
 Project: Mesos
  Issue Type: Improvement
Affects Versions: 1.5.0, 1.4.1, 1.6.0
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


gRPC can be built without SSL (the "unsecure" libraries), so we should use these 
libraries to avoid a build dependency between gRPC and OpenSSL.





[jira] [Assigned] (MESOS-8683) Remove _close from Windows close.hpp

2018-04-17 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8683:
---

Assignee: Andrew Schwartzmeyer

> Remove _close from Windows close.hpp
> 
>
> Key: MESOS-8683
> URL: https://issues.apache.org/jira/browse/MESOS-8683
> Project: Mesos
> Issue Type: Task
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Blocker
> Labels: stout, windows
>
> It should be {{CloseHandle}}, which requires MESOS-8675.





[jira] [Assigned] (MESOS-8681) Clean up os::sendfile on Windows

2018-04-17 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8681:
---

Shepherd: Andrew Schwartzmeyer
Assignee: Akash Gupta
Target Version/s: 1.6.0

> Clean up os::sendfile on Windows
> 
>
> Key: MESOS-8681
> URL: https://issues.apache.org/jira/browse/MESOS-8681
> Project: Mesos
> Issue Type: Task
> Reporter: Andrew Schwartzmeyer
> Assignee: Akash Gupta
> Priority: Blocker
> Labels: stout, windows
>
> We'll want to make sure {{os::sendfile}} is actually working correctly in 
> overlapped mode. We should revisit this whole area, and our sockets in general.
> Use {{WSARecv}} over {{recv}} for overlapped support in read.hpp.
> Use {{WSASend}} over {{send}} for overlapped support in write.hpp.





[jira] [Commented] (MESOS-8682) Remove remaining CRT functions stout on Windows

2018-04-17 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441126#comment-16441126
 ] 

Andrew Schwartzmeyer commented on MESOS-8682:
-

I removed this from epic MESOS-8668 because those that are needed are already 
in the epic, and the rest of those (under the TODO here) are not urgent.

> Remove remaining CRT functions stout on Windows
> ---
>
> Key: MESOS-8682
> URL: https://issues.apache.org/jira/browse/MESOS-8682
> Project: Mesos
> Issue Type: Task
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Major
> Labels: stout, windows
>
> h2. TODO
> * {{_waccess}} in access.hpp: does not use fd
> * {{_wmktemp_s}} in mktemp.hpp: does not use fd (reuse logic from mkdtemp.hpp)
> * {{recv}} in read.hpp (also MESOS-8681)
> * {{_stat}} in stat.hpp: does not use fd (MESOS-8275)
> * {{send}} in write.hpp (MESOS-8681)
> * {{strerror_r}} used in strerror.hpp: does not use fd, defined in windows.hpp
> * {{utime}} used in utime.hpp: does not use fd
> h2. DONE
> * {{_wopen}} in open.hpp: *uses fd* (MESOS-8673)
> * {{_close}} in close.hpp: *uses fd* (MESOS-8683)
> * {{_dup}} in dup.hpp: *uses fd* (MESOS-8684)
> * {{_chsize_s}} in ftruncate.hpp: *uses fd* (MESOS-8692)
> * {{_read}} in read.hpp: *uses fd* (also MESOS-8676)
> * {{_write}} in write.hpp: *uses fd* (MESOS-8676)
> * {{_lseek}} in lseek.hpp: *uses fd* (MESOS-8685)
> * {{fstat}} in libprocess/src/http.cpp and http_proxy.cpp: *uses fd*
> * {{_fdopen}} in stout/net.hpp: *uses fd*
> * {{_set_errno}} in kill.hpp: does not use fd (also MESOS-8759)





[jira] [Assigned] (MESOS-8674) Fix os::pipe to work in overlapped mode

2018-04-17 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8674:
---

Shepherd: Andrew Schwartzmeyer
Assignee: Akash Gupta  (was: Andrew Schwartzmeyer)
Target Version/s: 1.6.0
Priority: Blocker  (was: Major)

> Fix os::pipe to work in overlapped mode
> ---
>
> Key: MESOS-8674
> URL: https://issues.apache.org/jira/browse/MESOS-8674
> Project: Mesos
> Issue Type: Task
> Reporter: Andrew Schwartzmeyer
> Assignee: Akash Gupta
> Priority: Blocker
> Labels: stout, windows
>
> This will probably mean using named pipes instead of anonymous pipes. We need 
> to be able to read pipes asynchronously.





[jira] [Created] (MESOS-8797) Check failed in the default executor while running `MesosContainerizer/DefaultExecutorTest.TaskUsesExecutor/0` test.

2018-04-17 Thread Andrei Budnik (JIRA)
Andrei Budnik created MESOS-8797:


 Summary: Check failed in the default executor while running `MesosContainerizer/DefaultExecutorTest.TaskUsesExecutor/0` test.
 Key: MESOS-8797
 URL: https://issues.apache.org/jira/browse/MESOS-8797
 Project: Mesos
  Issue Type: Bug
  Components: executor
 Environment: Centos 7 SSL (internal CI)
master-[a95d9b8|https://github.com/apache/mesos/commit/a95d9b8fb53ab8fbf4a7b6d762c9e0749b4c013a]
 (17-Apr-2018 14:03:14)
Reporter: Andrei Budnik
 Attachments: DefaultExecutorTest.TaskUsesExecutor-badrun.txt

{code:java}
lt-mesos-default-executor: ../../3rdparty/stout/include/stout/option.hpp:119: T& Option::get() & [with T = std::basic_string]: Assertion `isSome()' failed.
*** Aborted at 1523976443 (unix time) try "date -d @1523976443" if you are using GNU date ***
PC: @ 0x7efcfd11f1f7 __GI_raise
*** SIGABRT (@0x4d44) received by PID 19780 (TID 0x7efcf5adb700) from PID 19780; stack trace: ***
@ 0x7efcfd9da5e0 (unknown)
@ 0x7efcfd11f1f7 __GI_raise
@ 0x7efcfd1208e8 __GI_abort
@ 0x7efcfd118266 __assert_fail_base
@ 0x7efcfd118312 __GI___assert_fail
@ 0x55a05fa269f7 mesos::internal::DefaultExecutor::waited()
@ 0x7efd002212d1 process::ProcessBase::consume()
@ 0x7efd0023a52a process::ProcessManager::resume()
@ 0x7efd0023dfa6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
@ 0x7efd003f9470 execute_native_thread_routine
@ 0x7efcfd9d2e25 start_thread
@ 0x7efcfd1e234d __clone
{code}
Observed this failure in internal CI for test
{code:java}
 MesosContainerizer/DefaultExecutorTest.TaskUsesExecutor/0{code}





[jira] [Assigned] (MESOS-8737) Update composing containerizer tests.

2018-04-17 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-8737:


Assignee: Andrei Budnik

> Update composing containerizer tests.
> -
>
> Key: MESOS-8737
> URL: https://issues.apache.org/jira/browse/MESOS-8737
> Project: Mesos
> Issue Type: Task
> Reporter: Andrei Budnik
> Assignee: Andrei Budnik
> Priority: Major
> Labels: mesosphere, test
>
> Composing containerizer tests need to be updated after the change to the type 
> and semantics of the return value of the `destroy()` method.





[jira] [Issue Comment Deleted] (MESOS-8585) Agent crashes when starting a task with an unknown user.

2018-04-17 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8585:
---
Comment: was deleted

(was: Promoting this to 1.6 blocker since the issue has been introduced in 1.6, 
[~jpe...@apache.org])

> Agent crashes when starting a task with an unknown user.
> 
>
> Key: MESOS-8585
> URL: https://issues.apache.org/jira/browse/MESOS-8585
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Karsten
> Assignee: James Peach
> Priority: Blocker
> Attachments: dcos-mesos-slave.service.1.gz, dcos-mesos-slave.service.2.gz
>





[jira] [Commented] (MESOS-8585) Agent crashes when starting a task with an unknown user.

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440904#comment-16440904
 ] 

Alexander Rukletsov commented on MESOS-8585:


Promoting this to a 1.6 blocker since the issue was introduced in 1.6, 
[~jpe...@apache.org].

> Agent crashes when starting a task with an unknown user.
> 
>
> Key: MESOS-8585
> URL: https://issues.apache.org/jira/browse/MESOS-8585
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Karsten
>Assignee: James Peach
>Priority: Blocker
> Attachments: dcos-mesos-slave.service.1.gz, 
> dcos-mesos-slave.service.2.gz
>
>
> The Marathon team has an integration test that tries to start a task with an 
> unknown user. The test expects {{TASK_FAILED}}, but we see 
> {{TASK_DROPPED}} instead. The agent logs suggest that the agent 
> crashes and restarts.
>  
> {code}
>  783 2018-02-14 14:55:45: I0214 14:55:45.319974  6213 slave.cpp:2542] 
> Launching task 'sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6' for 
> framework 120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001
> 784 2018-02-14 14:55:45: I0214 14:55:45.320605  6213 paths.cpp:727] 
> Creating sandbox 
> '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05
> 784 
> a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88'
>  for user 'bad'
> 785 2018-02-14 14:55:45: F0214 14:55:45.321131  6213 paths.cpp:735] 
> CHECK_SOME(mkdir): Failed to chown directory to 'bad': No such user 'bad' 
> Failed to create executor directory '/var/lib/mesos/slave/
> 785 
> slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac6
> 785 66d4acc88'
> 786 2018-02-14 14:55:45: *** Check failure stack trace: ***
> 787 2018-02-14 14:55:45: @ 0x7f72033444ad  
> google::LogMessage::Fail()
> 788 2018-02-14 14:55:45: @ 0x7f72033462dd  
> google::LogMessage::SendToLog()
> 789 2018-02-14 14:55:45: @ 0x7f720334409c  
> google::LogMessage::Flush()
> 790 2018-02-14 14:55:45: @ 0x7f7203346bd9  
> google::LogMessageFatal::~LogMessageFatal()
> 791 2018-02-14 14:55:45: @ 0x56544ca378f9  
> _CheckFatal::~_CheckFatal()
> 792 2018-02-14 14:55:45: @ 0x7f720270f30d  
> mesos::internal::slave::paths::createExecutorDirectory()
> 793 2018-02-14 14:55:45: @ 0x7f720273812c  
> mesos::internal::slave::Framework::addExecutor()
> 794 2018-02-14 14:55:45: @ 0x7f7202753e35  
> mesos::internal::slave::Slave::__run()
> 795 2018-02-14 14:55:45: @ 0x7f7202764292  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNS1_6FutureISt4
> 795 
> listIbSaIbRKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSR_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaIS11_EESK_SN_SQ_SV_SZ_S15_EEvRKNS1_3PIDIT_EEMS1
> 795 
> 7_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSI_OSL_OSO_OST_OSX_OS13_S3_E_ISI_SL_SO_ST_SX_S13_St12_PlaceholderILi1EEclEOS3_
> 796 2018-02-14 14:55:45: @ 0x7f72032a2b11  
> process::ProcessBase::consume()
> 797 2018-02-14 14:55:45: @ 0x7f72032b183c  
> process::ProcessManager::resume()
> 798 2018-02-14 14:55:45: @ 0x7f72032b6da6  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 799 2018-02-14 14:55:45: @ 0x7f72005ced73  (unknown)
> 800 2018-02-14 14:55:45: @ 0x7f72000cf52c  (unknown)
> 801 2018-02-14 14:55:45: @ 0x7f71ffe0d1dd  (unknown)
> 802 2018-02-14 14:57:15: dcos-mesos-slave.service: Main process exited, 
> code=killed, status=6/ABRT
> 803 2018-02-14 14:57:15: dcos-mesos-slave.service: Unit entered failed 
> state.
> 804 2018-02-14 14:57:15: dcos-mesos-slave.service: Failed with result 
> 'signal'.
> 805 2018-02-14 14:57:20: dcos-mesos-slave.service: Service hold-off time 
> over, scheduling restart.
> 806 2018-02-14 14:57:20: Stopped Mesos Agent: distributed systems kernel 
> agent.
> 807 2018-02-14 14:57:20: Starting Mesos Agent: distributed systems kernel 
> agent...
> {code}
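
The fatal log line above shows the agent aborting because the chown to user 'bad' failed inside a CHECK. As a rough illustration only (Python rather than Mesos's C++, and not the actual fix), the sketch below looks the user up first and returns a recoverable error that a caller could turn into a task failure instead of crashing the process. The function name and return convention are hypothetical.

```python
import os
import pwd


def create_executor_directory(path, user):
    """Create `path` and chown it to `user` (hypothetical helper).

    Returns (True, None) on success or (False, reason) on failure, so the
    caller can fail the task instead of aborting the whole process.
    """
    try:
        entry = pwd.getpwnam(user)  # raises KeyError for an unknown user
    except KeyError:
        return False, f"No such user '{user}'"

    try:
        os.makedirs(path, exist_ok=True)
        os.chown(path, entry.pw_uid, entry.pw_gid)
    except OSError as e:
        return False, f"Failed to create executor directory '{path}': {e}"
    return True, None
```

With this shape, an unknown user is reported before any directory is created, and the decision about what status to send stays with the caller.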





[jira] [Commented] (MESOS-8416) CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440897#comment-16440897
 ] 

Alexander Rukletsov commented on MESOS-8416:


[~gilbert], I promoted this to a blocker for 1.6.0 per your comment above. Can you 
please help me estimate the workload and find someone to help fix it before we 
cut the 1.6 branch?

> CHECK failure if trying to recover nested containers but the framework 
> checkpointing is not enabled.
> 
>
> Key: MESOS-8416
> URL: https://issues.apache.org/jira/browse/MESOS-8416
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: containerizer, mesosphere
>
> {noformat}
> I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
> I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: 
> vagrant-ubuntu-wily-64
> I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing 
> sending task status updates
> I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from 
> '/var/lib/mesos/slave/meta'
> I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed 
> resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
> I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering 
> task status update manager
> I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
> F0108 23:05:25.331529 31759 containerizer.cpp:919] 
> CHECK_SOME(container->directory): is NONE 
> *** Check failure stack trace: ***
> @ 0x7f769dbc98bd  google::LogMessage::Fail()
> @ 0x7f769dbc8c8e  google::LogMessage::SendToLog()
> @ 0x7f769dbc958d  google::LogMessage::Flush()
> @ 0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
> @ 0x556cb4c2b937  _CheckFatal::~_CheckFatal()
> @ 0x7f769c5ac653  
> mesos::internal::slave::MesosContainerizerProcess::recover()
> {noformat}
> If the framework does not enable checkpointing, no slave state is 
> checkpointed. However, containers are still checkpointed in the 
> runtime dir, which means recovering a nested container causes the CHECK 
> failure because its parent's sandbox dir is unknown.





[jira] [Commented] (MESOS-8257) Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440889#comment-16440889
 ] 

Alexander Rukletsov commented on MESOS-8257:


What is the status on this one, [~jasonlai], [~jieyu]?

> Unified Containerizer "leaks" a target container mount path to the host FS 
> when the target resolves to an absolute path
> ---
>
> Key: MESOS-8257
> URL: https://issues.apache.org/jira/browse/MESOS-8257
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Jason Lai
>Assignee: Jason Lai
>Priority: Critical
>  Labels: bug, containerizer, mountpath
>
> If a target path under the root FS provisioned from an image resolves to an 
> absolute path, it will not appear in the container root FS after 
> {{pivot_root(2)}} is called.
> A typical example: when the target path is under {{/var/run}} (e.g. 
> {{/var/run/some-dir}}), which is usually an absolute symlink to 
> {{/run}} in Debian images, the target path gets resolved to and created 
> at {{/run/some-dir}} in the host root FS after the container root FS is 
> provisioned. The target path is then unmounted after {{pivot_root(2)}} because it 
> is part of the old root (host FS).
> A workaround is to use {{/run}} instead of {{/var/run}}, but absolute 
> symlinks need to be resolved within the scope of the container root FS path.
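
To make "resolved within the scope of the container root FS" concrete, here is a minimal sketch in Python (not Mesos code; the function name and the link limit are assumptions for illustration). It walks the path component by component and re-anchors absolute symlink targets at the container root instead of the host root.

```python
import os


def resolve_in_root(root, path, max_links=40):
    """Resolve `path` as if `root` were "/", following symlinks inside it.

    Absolute symlink targets are re-anchored at `root` instead of the host
    root, and ".." never climbs above `root`.
    """
    root = os.path.realpath(root)
    pending = [p for p in path.split("/") if p and p != "."]
    resolved = []  # components below root that are already resolved
    links = 0

    while pending:
        part = pending.pop(0)
        if part == "..":
            if resolved:
                resolved.pop()
            continue
        candidate = os.path.join(root, *resolved, part)
        if os.path.islink(candidate):
            links += 1
            if links > max_links:
                raise OSError("Too many levels of symbolic links")
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = []  # absolute target: restart from the container root
            # Splice the link target in front of the remaining components.
            pending = [p for p in target.split("/") if p and p != "."] + pending
        else:
            resolved.append(part)

    return os.path.join(root, *resolved)
```

For a root FS where `var/run` is a symlink to `/run`, resolving `/var/run/some-dir` this way yields `<root>/run/some-dir` rather than `/run/some-dir` on the host.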





[jira] [Comment Edited] (MESOS-6240) Allow executor/agent communication over non-TCP/IP stream socket.

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440888#comment-16440888
 ] 

Alexander Rukletsov edited comment on MESOS-6240 at 4/17/18 2:06 PM:
-

No progress for several months, so I am removing the target version, 
[~benjaminhindman], [~avinash.mesos].


was (Author: alexr):
No progress for several month. I remove the target version, [~benjaminhindman], 
[~avinash.mesos].

> Allow executor/agent communication over non-TCP/IP stream socket.
> -
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Benjamin Hindman
>Priority: Critical
>  Labels: mesosphere
>
> Currently, executor/agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the 
> `MesosContainerizer`, when containers are running on CNI networks, this mode 
> of communication imposes constraints on the CNI network, since 
> there has to be connectivity between the CNI network (on which the executor is 
> running) and the agent. Introducing paths from a CNI network to the 
> underlying agent at best creates headaches for operators and at worst 
> introduces serious security holes in the network, since it breaks the 
> isolation between the container CNI network and the host network (on which 
> the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to run 
> on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.
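
As a small illustration of the proposal (plain Python sockets, not libprocess or any Mesos API; the socket file name and the "ack:" reply are made up for the example), two endpoints on the same host can exchange messages over a UNIX domain stream socket without any IP connectivity between their networks:

```python
import os
import socket
import tempfile
import threading


def serve_once(sock_path, ready):
    """Accept one connection on a UNIX domain socket and acknowledge one message."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(sock_path)  # creates a filesystem entry; no IP address involved
        server.listen(1)
        ready.set()  # signal that clients may connect now
        conn, _ = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ack:" + data)


def send_message(sock_path, payload):
    """Connect to the socket at `sock_path`, send `payload`, and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(payload)
        return client.recv(1024)


def demo():
    """Run the 'agent' side in a thread and talk to it from the 'executor' side."""
    sock_path = os.path.join(tempfile.mkdtemp(), "agent.sock")
    ready = threading.Event()
    server_thread = threading.Thread(target=serve_once, args=(sock_path, ready))
    server_thread.start()
    ready.wait()
    reply = send_message(sock_path, b"hello")
    server_thread.join()
    return reply
```

The only shared resource is the socket path on the host file system, so access can be controlled with ordinary file permissions or by choosing which directory is made visible inside the container.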





[jira] [Comment Edited] (MESOS-8796) GroupTest.GroupPathWithRestrictivePerms is flaky.

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440830#comment-16440830
 ] 

Alexander Rukletsov edited comment on MESOS-8796 at 4/17/18 1:23 PM:
-

For {{GroupPathWithRestrictivePerms}}, I see a few zookeeper timeouts like
{noformat}
ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 7705ms
{noformat}
which likely led to a lost connection and an expired session, and then to the test 
failure. Not sure whether the problem is in the test or in the Mac machine.


was (Author: alexr):
I see a few zookeeper timeouts like
{noformat}
ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 7705ms
{noformat}
which likely led to a lost connection and an expired session, and then to the test 
failure. Not sure whether the problem is in the test or in the Mac machine.

> GroupTest.GroupPathWithRestrictivePerms is flaky.
> -
>
> Key: MESOS-8796
> URL: https://issues.apache.org/jira/browse/MESOS-8796
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.0
> Environment: Mac OS with SSL enabled
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: flaky, flaky-test
> Attachments: GroupPathWithRestrictivePerms-badrun.txt, 
> GroupPathWithRestrictivePerms-goodrun.txt
>
>
> I see some failures related to zookeeper on our Mac machine. Current list of 
> failing tests:
> {noformat}
> GroupTest.GroupPathWithRestrictivePerms
> GroupTest.RetryableErrors
> {noformat}
> Full logs attached.





[jira] [Commented] (MESOS-8796) GroupTest.GroupPathWithRestrictivePerms is flaky.

2018-04-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440830#comment-16440830
 ] 

Alexander Rukletsov commented on MESOS-8796:


I see a few zookeeper timeouts like
{noformat}
ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 7705ms
{noformat}
which likely led to a lost connection and an expired session, and then to the test 
failure. Not sure whether the problem is in the test or in the Mac machine.

> GroupTest.GroupPathWithRestrictivePerms is flaky.
> -
>
> Key: MESOS-8796
> URL: https://issues.apache.org/jira/browse/MESOS-8796
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.0
> Environment: Mac OS with SSL enabled
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: flaky, flaky-test
> Attachments: GroupPathWithRestrictivePerms-badrun.txt, 
> GroupPathWithRestrictivePerms-goodrun.txt
>
>
> {noformat}
> ../../src/tests/group_tests.cpp:341
> Failed to wait 15secs for successGroup.join("succeed")
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8795) Catch up new CLI features to be the same as the old one.

2018-04-17 Thread Armand Grillet (JIRA)
Armand Grillet created MESOS-8795:
-

 Summary: Catch up new CLI features to be the same as the old one.
 Key: MESOS-8795
 URL: https://issues.apache.org/jira/browse/MESOS-8795
 Project: Mesos
  Issue Type: Task
Reporter: Armand Grillet
Assignee: Armand Grillet


https://github.com/mesosphere/mesos-cli


