[jira] [Created] (MESOS-6774) Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.

2016-12-09 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6774:
-

 Summary: Role sorter and quota role sorter can have more copies of 
shared resources in allocations than in total.
 Key: MESOS-6774
 URL: https://issues.apache.org/jira/browse/MESOS-6774
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Yan Xu


The way shared resources support works in the allocator is to allocate multiple 
copies of the shared resources so multiple frameworks can receive them. 
Multiple copies of the same shared resources don't affect the quantity of the 
sorter's allocations and total pool, so they have no impact on DRF.

To make resource accounting work, though, when copies of the same resource 
are added to a framework's allocation, we increase the size of the total pool 
in the sorter (again, adding these copies doesn't affect quantity) so that the 
*allocations in a sorter are always bounded by the total pool in the sorter*. 
This invariant is a requirement for the following logic in the allocator to 
work:

{code:title=Remove the resources from the framework sorter when they are unallocated from the framework}
  frameworkSorters[role]->unallocated(
      frameworkId.value(), slaveId, resources);
  frameworkSorters[role]->remove(slaveId, resources);
{code}

For example, if there are 2 copies of a shared disk allocated to framework1, the 
sorter's total pool has 2 copies of the disk as well.

However, we currently only do this for the framework sorter below a role because 
the allocator (implicitly) assumes that the role sorter, being the root-level 
sorter, has a total pool that's unchanged during allocation or resource 
recovery. This is not a problem right now because, for this reason, 
{{Sorter::add(const SlaveID& slaveId, const Resources& resources)}} and 
{{Sorter::remove(const SlaveID& slaveId, const Resources& resources)}} are not 
called during allocation or resource recovery.

This will likely change with MESOS-6375, when role sorters form a 
hierarchy and not all of them are bound to the physical size of the cluster. We 
should revisit the shared resource allocation logic then to make sure the 
invariant *allocations in a sorter are always bounded by the total pool in the 
sorter* holds.
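
The invariant can be illustrated with a small sketch. `ToySorter`, `allocateSharedCopy` and `recoverSharedCopy` are hypothetical names that only count copies of a named resource; they are not the Mesos `Sorter` API. The sketch assumes that each allocated copy of a shared resource is also added to the total pool, which is what keeps a later unallocate-then-remove valid.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a sorter that tracks copies of shared resources.
// Keys are resource names; values are the number of copies.
struct ToySorter {
  std::map<std::string, int> total;      // Copies in the total pool.
  std::map<std::string, int> allocated;  // Copies allocated to clients.

  // Adds one copy of `resource` to the total pool.
  void add(const std::string& resource) { ++total[resource]; }

  // Removes one copy of `resource` from the total pool.
  void remove(const std::string& resource) {
    assert(total[resource] > 0);
    --total[resource];
  }

  // Records one more allocated copy of `resource`.
  void allocate(const std::string& resource) { ++allocated[resource]; }

  // Records that one allocated copy of `resource` was returned.
  void unallocate(const std::string& resource) {
    assert(allocated[resource] > 0);
    --allocated[resource];
  }

  // The invariant: allocations never exceed the total pool.
  bool bounded() const {
    for (const auto& entry : allocated) {
      auto it = total.find(entry.first);
      int available = (it == total.end()) ? 0 : it->second;
      if (entry.second > available) return false;
    }
    return true;
  }
};

// Allocating a copy of a shared resource: grow the pool, then allocate.
inline void allocateSharedCopy(ToySorter& sorter, const std::string& resource) {
  sorter.add(resource);
  sorter.allocate(resource);
}

// Recovering a copy: shrink the allocation, then the pool.
inline void recoverSharedCopy(ToySorter& sorter, const std::string& resource) {
  sorter.unallocate(resource);
  sorter.remove(resource);
}
```

A sorter whose pool is fixed (as assumed for the role sorter) would break `bounded()` as soon as a second copy is allocated without a matching `add`.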



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6717) Add Windows support to agent test harness

2016-12-09 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737005#comment-15737005
 ] 

Joseph Wu edited comment on MESOS-6717 at 12/10/16 2:07 AM:


{code}
commit b9ef614c53373fdc3aadc0f237e7533b1d2a7209
Author: Alex Clemmer 
Date:   Thu Dec 8 17:18:01 2016 -0800

Stout: Moved `os::getenv` from `os.hpp` to `os/getenv.hpp`.

This commit moves `os::getenv` to its own file under the `stout/os/`
directory in preparation for a functional change to `os::temp`;
which should look for the standard environment variable `TMPDIR`
before falling back to `/tmp` on POSIX environments.

In other words, `stout/os/temp.hpp` needs to call `os::getenv`, but
doing so prior to this commit would introduce a circular header
dependency with `stout/os.hpp`.  (`stout/os.hpp` aggregates all header
files in `stout/os/` and therefore, no files in `stout/os/` should be
taking a dependency on it.)

Review: https://reviews.apache.org/r/54519/
{code}
{code}
commit ca0a8d552cd098593c3c3b0f76d5215846de7120
Author: Alex Clemmer 
Date:   Thu Dec 8 17:25:06 2016 -0800

Stout: Added logic for TMPDIR environment variable in `os::temp`.

`TMPDIR` is a POSIX-standard environment variable which can be used
to specify a temporary directory.  This variable is currently read
in the agent tests, but ignored in other parts of the codebase.
(`os::temp` is commonly used by `os::mkdtemp`.)

This commit is one of two commits that will normalize the location
of the temporary directory.

Review: https://reviews.apache.org/r/54489/
{code}
{code}
commit 883f5d2e31eb3f73e808e58c020f5b68ca7b2e1d
Author: Alex Clemmer 
Date:   Thu Dec 8 17:38:41 2016 -0800

Normalized how temporary directories are determined in tests.

This changes the Mesos tests to use the updated `os::temp` helper,
which (on POSIX) now checks the `TMPDIR` environment variable.

On Windows, this changes the temporary directory to an appropriate
location (`/tmp` does not exist on Windows by default).

Review: https://reviews.apache.org/r/54490/
{code}
{code}
commit 3511b5407710e9a0d0a668ce1663a8d89cc028ca
Author: Joseph Wu 
Date:   Fri Dec 9 17:50:38 2016 -0800

Removed the UUID from IO Switchboard tests.

The IO switchboard server creates a UNIX socket at a given path.
Due to OS constraints, this path must be less than 104 characters long.

In the tests, the path is set to a value based on the test directory.
If the test directory is too long, the UNIX socket creation will fail,
as observed in OSX, where the standard temporary directory does not
default to `/tmp` (as is the case on most Linux's).

The test directory was changed to provide platform-specific values
in this review:
https://reviews.apache.org/r/54490/

This commit shortens the UNIX socket address by removing the UUID.
This is safe because we are not running multiple IO switchboards
in the same test, in the same directory.
{code}


was (Author: kaysoky):
{code}
commit ca0a8d552cd098593c3c3b0f76d5215846de7120
Author: Alex Clemmer 
Date:   Thu Dec 8 17:25:06 2016 -0800

Stout: Added logic for TMPDIR environment variable in `os::temp`.

`TMPDIR` is a POSIX-standard environment variable which can be used
to specify a temporary directory.  This variable is currently read
in the agent tests, but ignored in other parts of the codebase.
(`os::temp` is commonly used by `os::mkdtemp`.)

This commit is one of two commits that will normalize the location
of the temporary directory.

Review: https://reviews.apache.org/r/54489/
{code}
{code}
commit 883f5d2e31eb3f73e808e58c020f5b68ca7b2e1d
Author: Alex Clemmer 
Date:   Thu Dec 8 17:38:41 2016 -0800

Normalized how temporary directories are determined in tests.

This changes the Mesos tests to use the updated `os::temp` helper,
which (on POSIX) now checks the `TMPDIR` environment variable.

On Windows, this changes the temporary directory to an appropriate
location (`/tmp` does not exist on Windows by default).

Review: https://reviews.apache.org/r/54490/
{code}
{code}
commit 3511b5407710e9a0d0a668ce1663a8d89cc028ca
Author: Joseph Wu 
Date:   Fri Dec 9 17:50:38 2016 -0800

Removed the UUID from IO Switchboard tests.

The IO switchboard server creates a UNIX socket at a given path.
Due to OS constraints, this path must be less than 104 characters long.

In the tests, the path is set to a value based on the test directory.
If the test directory is too long, the UNIX socket creation 

[jira] [Commented] (MESOS-6717) Add Windows support to agent test harness

2016-12-09 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737005#comment-15737005
 ] 

Joseph Wu commented on MESOS-6717:
--

{code}
commit ca0a8d552cd098593c3c3b0f76d5215846de7120
Author: Alex Clemmer 
Date:   Thu Dec 8 17:25:06 2016 -0800

Stout: Added logic for TMPDIR environment variable in `os::temp`.

`TMPDIR` is a POSIX-standard environment variable which can be used
to specify a temporary directory.  This variable is currently read
in the agent tests, but ignored in other parts of the codebase.
(`os::temp` is commonly used by `os::mkdtemp`.)

This commit is one of two commits that will normalize the location
of the temporary directory.

Review: https://reviews.apache.org/r/54489/
{code}
{code}
commit 883f5d2e31eb3f73e808e58c020f5b68ca7b2e1d
Author: Alex Clemmer 
Date:   Thu Dec 8 17:38:41 2016 -0800

Normalized how temporary directories are determined in tests.

This changes the Mesos tests to use the updated `os::temp` helper,
which (on POSIX) now checks the `TMPDIR` environment variable.

On Windows, this changes the temporary directory to an appropriate
location (`/tmp` does not exist on Windows by default).

Review: https://reviews.apache.org/r/54490/
{code}
{code}
commit 3511b5407710e9a0d0a668ce1663a8d89cc028ca
Author: Joseph Wu 
Date:   Fri Dec 9 17:50:38 2016 -0800

Removed the UUID from IO Switchboard tests.

The IO switchboard server creates a UNIX socket at a given path.
Due to OS constraints, this path must be less than 104 characters long.

In the tests, the path is set to a value based on the test directory.
If the test directory is too long, the UNIX socket creation will fail,
as observed in OSX, where the standard temporary directory does not
default to `/tmp` (as is the case on most Linux's).

The test directory was changed to provide platform-specific values
in this review:
https://reviews.apache.org/r/54490/

This commit shortens the UNIX socket address by removing the UUID.
This is safe because we are not running multiple IO switchboards
in the same test, in the same directory.
{code}
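
The lookup described in the `os::temp` commits above can be sketched roughly as follows, for POSIX only. `tempDirectory` and `fitsUnixSocketPath` are hypothetical names, not the stout implementation; treating an empty `TMPDIR` as unset is an assumption, and the 104-character bound is taken from the IO switchboard commit message.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not stout's `os::temp`): returns the directory named
// by `TMPDIR` if it is set and non-empty, otherwise falls back to "/tmp".
inline std::string tempDirectory() {
  const char* tmpdir = std::getenv("TMPDIR");
  if (tmpdir != nullptr && tmpdir[0] != '\0') {
    return std::string(tmpdir);
  }
  return "/tmp";
}

// Hypothetical check for the UNIX socket path bound quoted in the commit
// message ("less than 104 characters").
inline bool fitsUnixSocketPath(const std::string& path) {
  return path.size() < 104;
}
```

A test that builds its socket path under `tempDirectory()` could call `fitsUnixSocketPath` before binding, which would surface an overly long temporary directory early.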

> Add Windows support to agent test harness
> -
>
> Key: MESOS-6717
> URL: https://issues.apache.org/jira/browse/MESOS-6717
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>Priority: Blocker
>  Labels: microsoft, windows-mvp
>
> Of particular interest in `src/tests/CMakeLists.txt` is supporting enough of 
> the following that we can successfully run agent tests:
> TEST_HELPER_SRC
> MESOS_TESTS_UTILS_SRC





[jira] [Created] (MESOS-6773) Provide REST-style endpoints that map to v1 master/agent Calls.

2016-12-09 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6773:
--

 Summary: Provide REST-style endpoints that map to v1 master/agent 
Calls.
 Key: MESOS-6773
 URL: https://issues.apache.org/jira/browse/MESOS-6773
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: Benjamin Mahler


With the addition of V1 {{master::Call}} and {{agent::Call}} to replace the V0 
REST-style endpoints (e.g. /state, /metrics/snapshot, etc), users can no longer 
hit these endpoints in their browser or use query parameters. Also, tooling has 
to send POST data, which is a bit more onerous in most libraries than simply 
using a URL with query parameters.

Per the [design 
doc|https://docs.google.com/document/d/1XfgF4jDXZDVIEWQPx6Y4glgeTTswAAxw6j8dPDAtoeI],
 we can add a mapping to REST-style endpoints to provide users with a means to 
hit these endpoints without POST data.





[jira] [Commented] (MESOS-6772) Stop building mesos-slave.

2016-12-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736652#comment-15736652
 ] 

James Peach commented on MESOS-6772:


Rather than building binaries for both {{mesos-agent}} and {{mesos-slave}}, 
just install a symlink from the latter to the former.

> Stop building mesos-slave.
> --
>
> Key: MESOS-6772
> URL: https://issues.apache.org/jira/browse/MESOS-6772
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>






[jira] [Created] (MESOS-6772) Stop building mesos-slave.

2016-12-09 Thread James Peach (JIRA)
James Peach created MESOS-6772:
--

 Summary: Stop building mesos-slave.
 Key: MESOS-6772
 URL: https://issues.apache.org/jira/browse/MESOS-6772
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Assignee: James Peach








[jira] [Updated] (MESOS-5849) Agent sandboxes on Windows surpass the 260 character path length limit

2016-12-09 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-5849:

Assignee: Daniel Pravat  (was: Alex Clemmer)

> Agent sandboxes on Windows surpass the 260 character path length limit
> --
>
> Key: MESOS-5849
> URL: https://issues.apache.org/jira/browse/MESOS-5849
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
> Environment: Windows Server 2012, Windows Server 2016 RC
>Reporter: Lokendra Malik
>Assignee: Daniel Pravat
>Priority: Blocker
>  Labels: microsoft, tech-debt, windows
> Attachments: Pasted image at 2016_07_14 09_02 PM.png, mesoscrash.jpg
>
>
> When I deployed an application on a Windows mesos-agent, the agent service on 
> the Windows node crashed as soon as the application was deployed, and the logs 
> showed this error:
> I0714 07:20:09.788785  5640 containerizer.cpp:781] Starting container 
> '031878d5-32fa-41ed-8b23-d0d91fe34f05' for executor 
> 'windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf' of framework 
> '5c83c39f-75a0-4f38-9e47-633767b47976-'
> F0714 07:20:09.797576  5480 slave.cpp:6174] 
> CHECK_SOME(state::checkpoint(path, t)): Failed to create directory 
> 'E:\agentlogs\meta\slaves\803264d5-8f2d-46bb-8019-de0f9565c971-S5\frameworks\5c83c39f-75a0-4f38-9e47-633767b47976-\executors\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf\runs\031878d5-32fa-41ed-8b23-d0d91fe34f05\tasks\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf':
>  No such file or directory
> We debugged the issue and found that the file name had reached the maximum 
> file path length: 
> E:\agentlogs\meta\slaves\803264d5-8f2d-46bb-8019-de0f9565c971-S5\frameworks\5c83c39f-75a0-4f38-9e47-633767b47976-\executors\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf\runs\031878d5-32fa-41ed-8b23-d0d91fe34f05\tasks\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf
> I think the path length limit on Windows is 256, which was exceeded here and 
> caused the service to crash, while the same path works fine on Linux Mesos 
> agents. We may have to make the output of the current UUID.toString() method 
> shorter.
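
The path in the log above follows the agent's checkpoint layout, and a small sketch shows how quickly the IDs add up. `joinPath`, `taskCheckpointPath` and `exceedsWindowsPathLimit` are hypothetical helpers, not Mesos functions. The limit is assumed to be the 260 characters from the issue title, counted including the terminating NUL.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed limit: 260 characters including the terminating NUL.
constexpr std::size_t kWindowsPathLimit = 260;

// Joins path components with a backslash separator.
inline std::string joinPath(const std::vector<std::string>& parts) {
  std::string path;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) path += '\\';
    path += parts[i];
  }
  return path;
}

// Rebuilds the directory layout seen in the log message:
// <workDir>\meta\slaves\<agent>\frameworks\<framework>\executors\<executor>
//   \runs\<container>\tasks\<task>
inline std::string taskCheckpointPath(
    const std::string& workDir,
    const std::string& agentId,
    const std::string& frameworkId,
    const std::string& executorId,
    const std::string& containerId,
    const std::string& taskId) {
  return joinPath({workDir, "meta", "slaves", agentId,
                   "frameworks", frameworkId,
                   "executors", executorId,
                   "runs", containerId,
                   "tasks", taskId});
}

// True if the path, plus its terminating NUL, does not fit in the limit.
inline bool exceedsWindowsPathLimit(const std::string& path) {
  return path.size() + 1 > kWindowsPathLimit;
}
```

With IDs of roughly the lengths seen in the log, the fixed directory names and separators alone leave little room, so shortening any one ID only helps up to a point.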





[jira] [Created] (MESOS-6771) Add and vet `install` target

2016-12-09 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-6771:
---

 Summary: Add and vet `install` target
 Key: MESOS-6771
 URL: https://issues.apache.org/jira/browse/MESOS-6771
 Project: Mesos
  Issue Type: Bug
  Components: cmake
Reporter: Alex Clemmer
Assignee: Alex Clemmer


We need to be able to do something like `make install` and while CMake comes 
with something like this out of the box, we do need to vet it (at the very 
least).

As a general note (as jpeach suggests), we should take care not to generate 
separate binaries for `mesos-slave` and `mesos-agent`. If `mesos-slave` exists at 
all, it should be a symlink generated upon `install`.





[jira] [Created] (MESOS-6770) Handle SSL socket read and write events separately

2016-12-09 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6770:


 Summary: Handle SSL socket read and write events separately
 Key: MESOS-6770
 URL: https://issues.apache.org/jira/browse/MESOS-6770
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Greg Mann


The SSL socket code in libprocess currently does not distinguish between events 
received during reading and those received during writing. However, libevent 
does provide event flags with this information: {{BEV_EVENT_READING}} and 
{{BEV_EVENT_WRITING}}. We should make use of these flags to handle read and 
write events differently.
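
As a rough illustration of what handling the two directions differently could look like, the sketch below classifies an event bitmask. The constants are local stand-ins so that the example compiles without libevent; real code would include `<event2/bufferevent.h>` and test `BEV_EVENT_READING`, `BEV_EVENT_WRITING`, `BEV_EVENT_EOF` and `BEV_EVENT_ERROR` directly. `describeEvent` is a hypothetical helper, not part of libprocess.

```cpp
#include <string>

// Local stand-ins for libevent's bufferevent flags (assumed to mirror the
// values in <event2/bufferevent.h>); real code should use the BEV_EVENT_*
// macros instead of these constants.
constexpr short kReading = 0x01;  // Stand-in for BEV_EVENT_READING.
constexpr short kWriting = 0x02;  // Stand-in for BEV_EVENT_WRITING.
constexpr short kEof = 0x10;      // Stand-in for BEV_EVENT_EOF.
constexpr short kError = 0x20;    // Stand-in for BEV_EVENT_ERROR.

// Hypothetical classifier: names the condition and the direction it hit.
inline std::string describeEvent(short events) {
  std::string what;
  if (events & kEof) {
    what = "eof";
  } else if (events & kError) {
    what = "error";
  } else {
    what = "other";
  }

  if (events & kReading) return what + " while reading";
  if (events & kWriting) return what + " while writing";
  return what;
}
```

An event callback could branch on the same bits, for example failing only the pending read when the error arrived while reading.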





[jira] [Updated] (MESOS-6769) The server does not close its end of the connection after returning a response to a streaming request.

2016-12-09 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6769:
--
Description: 
Consider this scenario:
- The client starts to send a streaming request to the agent with the 
{{Connection: close}} header set. This means that the client is relying on the 
server to close its end of the connection after sending the response.
- If the request fails on the server (i.e., with validation errors), the 
server sends the response but does not close its end of the socket.
- Some client libraries (e.g., Python Requests) rely on the server to close its 
end of the socket after sending the response. Otherwise, the connection just 
hangs on the client when it has no more streaming data to send.

Libprocess should close its end of the connection after sending the response in 
such cases.

  was:
Consider this scenario, 
- The client starts to send a streaming request to the agent with the 
{{Connection: close}} header set. This means that the client is relying on the 
server to close it's end of the connection after sending the response.
- If the request failed on the server i.e., some validation errors. The server 
sends the response but does not close it's end of the socket.
- Some client libraries e.g., Python Requests rely on the server to close its 
end of the socket after sending the response. Otherwise, the connection just 
hangs on the client when it has no more streaming data to send in such cases.

Libprocess should close its end of the 


> The server does not close its end of the connection after returning a 
> response to a streaming request.
> ---
>
> Key: MESOS-6769
> URL: https://issues.apache.org/jira/browse/MESOS-6769
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: libprocess, mesosphere
>
> Consider this scenario:
> - The client starts to send a streaming request to the agent with the 
> {{Connection: close}} header set. This means that the client is relying on 
> the server to close its end of the connection after sending the response.
> - If the request fails on the server (i.e., with validation errors), the 
> server sends the response but does not close its end of the socket.
> - Some client libraries (e.g., Python Requests) rely on the server to close its 
> end of the socket after sending the response. Otherwise, the connection just 
> hangs on the client when it has no more streaming data to send.
> Libprocess should close its end of the connection after sending the response 
> in such cases.
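
The decision the server has to make can be sketched as a header check. `Headers`, `toLower` and `shouldCloseAfterResponse` are hypothetical names, not libprocess code, and the sketch assumes the `Connection` header carries a single option rather than a comma-separated list.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical request model: header names mapped to their values.
using Headers = std::map<std::string, std::string>;

// Lower-cases an ASCII string; header names and the "close" option are
// compared case-insensitively.
inline std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns true if the client asked for the connection to be closed, in which
// case the server should close its end once the response has been sent,
// whether the request succeeded or failed.
inline bool shouldCloseAfterResponse(const Headers& requestHeaders) {
  for (const auto& header : requestHeaders) {
    if (toLower(header.first) == "connection" &&
        toLower(header.second) == "close") {
      return true;
    }
  }
  return false;
}
```

The point of the issue is that this check has to be honored on the error path as well, even while the client is still streaming the request body.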





[jira] [Created] (MESOS-6769) The server does not close its end of the connection after returning a response to a streaming request.

2016-12-09 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6769:
-

 Summary: The server does not close its end of the connection 
after returning a response to a streaming request.
 Key: MESOS-6769
 URL: https://issues.apache.org/jira/browse/MESOS-6769
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Consider this scenario:
- The client starts to send a streaming request to the agent with the 
{{Connection: close}} header set. This means that the client is relying on the 
server to close its end of the connection after sending the response.
- If the request fails on the server (i.e., with validation errors), the 
server sends the response but does not close its end of the socket.
- Some client libraries (e.g., Python Requests) rely on the server to close its 
end of the socket after sending the response. Otherwise, the connection just 
hangs on the client when it has no more streaming data to send.

Libprocess should close its end of the 





[jira] [Updated] (MESOS-6350) Raise minimum required cmake version

2016-12-09 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-6350:

Labels: mesosphere microsoft tech-debt  (was: mesosphere tech-debt)

> Raise minimum required cmake version
> 
>
> Key: MESOS-6350
> URL: https://issues.apache.org/jira/browse/MESOS-6350
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Benjamin Bannier
>  Labels: mesosphere, microsoft, tech-debt
>
> We currently require at least cmake-2.8, which had its first point release in 
> 2010 and its last update in 2013. Meanwhile, upstream is preparing the release 
> of 3.7.0. While cmake support in Mesos is still experimental, we should evaluate 
> how much we can increase the minimum required version so we are not locked 
> into an old version lacking desirable features.





[jira] [Created] (MESOS-6768) Introduce a containerizer and executor suitable for scale testing

2016-12-09 Thread Ilya Pronin (JIRA)
Ilya Pronin created MESOS-6768:
--

 Summary: Introduce a containerizer and executor suitable for scale 
testing
 Key: MESOS-6768
 URL: https://issues.apache.org/jira/browse/MESOS-6768
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Ilya Pronin
Assignee: Ilya Pronin
Priority: Minor


The {{ScaleTestContainerizer}} and {{ScaleTestExecutor}} implement the basic 
behaviors of a containerizer and executor while consuming no resources. They 
are intended for use in scale testing a large number of agents interacting with 
a master.

The containerizer does no actual containerization, nor does it actually run 
the tasks. It simply runs a {{ScaleTestExecutor}}, which sends {{TASK_RUNNING}} 
on {{Executor::launchTask()}} and {{TASK_KILLED}} on {{Executor::killTask()}}.

Because no resources are actually consumed (but are accounted for by the agent) 
any amount of resources can be offered by the agent through {{--resources}} so 
that any desired number of tasks can be run.

Enable it with {{--containerizers=scale-test}}.
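
The executor behavior described above amounts to answering each callback with a status update and doing nothing else. The sketch below models only that: `ToyScaleTestExecutor` and `StatusUpdate` are hypothetical types that record updates in memory, and they do not use the Mesos `Executor` interface or driver.

```cpp
#include <string>
#include <vector>

// Hypothetical record of a status update the executor would send.
struct StatusUpdate {
  std::string taskId;
  std::string state;
};

// Hypothetical model of the described behavior: no work is performed, only
// the corresponding status update is produced for each callback.
class ToyScaleTestExecutor {
 public:
  // Reports the task as running without launching anything.
  void launchTask(const std::string& taskId) {
    updates_.push_back(StatusUpdate{taskId, "TASK_RUNNING"});
  }

  // Reports the task as killed; there is nothing to tear down.
  void killTask(const std::string& taskId) {
    updates_.push_back(StatusUpdate{taskId, "TASK_KILLED"});
  }

  // Updates produced so far, in order.
  const std::vector<StatusUpdate>& updates() const { return updates_; }

 private:
  std::vector<StatusUpdate> updates_;
};
```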

[~jieyu], can you shepherd this, please?





[jira] [Updated] (MESOS-5387) mesos-execute exit status is always success

2016-12-09 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5387:
--
Shepherd: Till Toenshoff

> mesos-execute exit status is always success
> ---
>
> Key: MESOS-5387
> URL: https://issues.apache.org/jira/browse/MESOS-5387
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.28.1
>Reporter: Luca Bruno
>
> mesos-execute should be able to return an exit status based on the status of 
> the task. Currently it always exits with 0.
> It's very hard for a caller to know the status of the task, for example from 
> a bash script calling mesos-execute.
> I believe mesos-execute is a simple but very useful cli tool for one-shot 
> tasks, and as such deserves more attention.
> Thanks.
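
One possible mapping from the final task state to a process exit status is sketched below. `exitStatusFor` is a hypothetical helper, and treating every terminal state other than `TASK_FINISHED` as a failure is an assumption, not the behavior of mesos-execute.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical mapping: only a task that finished normally yields success;
// any other terminal state (e.g. TASK_FAILED, TASK_KILLED, TASK_LOST) is
// reported as a failure to the calling shell.
inline int exitStatusFor(const std::string& taskState) {
  return taskState == "TASK_FINISHED" ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A bash script calling the tool could then branch on `$?` instead of parsing its output.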





[jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.

2016-12-09 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6743:

Description: 
If {{docker stop}} finishes with an error status, the executor should catch 
this and react instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
and what to do if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
However, in this case it is unclear what status updates we should send: 
{{TASK_KILLING}} for every kill retry? an extra update when we failed to kill a 
task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is 
killed or notify the framework and the operator that the container may still be 
running.

  was:
If {{docker stop}} finishes with an error status, the executor should catch 
this and react instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
and what to do if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
However, in this case it is unclear what status updates we should send: 
{TASK_KILLING}} for every kill retry? an extra update when we failed to kill a 
task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is 
killed or notify the framework and the operator that the container may still be 
running.


> Docker executor hangs forever if `docker stop` fails.
> -
>
> Key: MESOS-6743
> URL: https://issues.apache.org/jira/browse/MESOS-6743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.0.1, 1.1.0
>Reporter: Alexander Rukletsov
>  Labels: mesosphere
>
> If {{docker stop}} finishes with an error status, the executor should catch 
> this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
> and what to do if {{docker stop}} continues to fail.
> 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
> However, in this case it is unclear what status updates we should send: 
> {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill 
> a task? or set a specific reason in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is 
> killed or notify the framework and the operator that the container may still 
> be running.
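
Option 1 above can be sketched as a bounded retry loop. `stopWithRetries` and `StopResult` are hypothetical, and the `stop` callback stands in for running `docker stop` and checking its exit status. The number of attempts is left to the caller because the issue does not settle it.

```cpp
#include <functional>

// Outcome of the retry loop: whether the stop succeeded and how many
// attempts were made.
struct StopResult {
  bool stopped;
  int attempts;
};

// Calls `stop` until it reports success or `maxAttempts` is reached.
// `stop` is a stand-in for invoking `docker stop`; it returns true when the
// command finished without an error status.
inline StopResult stopWithRetries(
    const std::function<bool()>& stop, int maxAttempts) {
  int attempts = 0;
  while (attempts < maxAttempts) {
    ++attempts;
    if (stop()) {
      return StopResult{true, attempts};
    }
  }
  return StopResult{false, attempts};
}
```

When the result reports `stopped == false`, the caller still has to choose between options 2 and 3, so the retry loop narrows the open question without answering it.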





[jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.

2016-12-09 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6743:

Description: 
If {{docker stop}} finishes with an error status, the executor should catch 
this and react instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
and what to do if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
However, in this case it is unclear what status updates we should send: 
{TASK_KILLING}} for every kill retry? an extra update when we failed to kill a 
task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is 
killed or notify the framework and the operator that the container may still be 
running.

  was:
If {{docker stop}} finishes with an error status, the executor should catch 
this and react instead of indefinitely waiting for {{reaped}} to return.

An interesting question is _how_ to react. Here are possible solutions.

1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
and what to do if {{docker stop}} continues to fail.

2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
However, in this case it is unclear what status updates we should send: 
{[TASK_KILLING}} for every kill retry? an extra update when we failed to kill a 
task? or set a specific reason in {{TASK_KILLING}}?

3. Clean up and exit. In this case we should make sure the task container is 
killed or notify the framework and the operator that the container may still be 
running.


> Docker executor hangs forever if `docker stop` fails.
> -
>
> Key: MESOS-6743
> URL: https://issues.apache.org/jira/browse/MESOS-6743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.0.1, 1.1.0
>Reporter: Alexander Rukletsov
>  Labels: mesosphere
>
> If {{docker stop}} finishes with an error status, the executor should catch 
> this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
> and what to do if {{docker stop}} continues to fail.
> 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. 
> However, in this case it is unclear what status updates we should send: 
> {TASK_KILLING}} for every kill retry? an extra update when we failed to kill 
> a task? or set a specific reason in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is 
> killed or notify the framework and the operator that the container may still 
> be running.





[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766

2016-12-09 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6767:
---
Description: 
This error message can pop up in unexpected places (e.g. when running a 
LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).

We should likely just remove the UNREACHABLE() statement here as it's obviously 
reachable in cases where the command we are trying to launch is not found.

  was:
This error message can pop up in unexpected places (e.g. when running a 
LACUNH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).

We should likely just remove the UNREACHABLE() statement here as it's obviously 
reachable in cases where the command we are trying to launch is not found.


> Reached unreachable statement at 
> /mesos/src/slave/containerizer/mesos/launch.cpp:766
> --
>
> Key: MESOS-6767
> URL: https://issues.apache.org/jira/browse/MESOS-6767
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: containerizer, mesosphere
>
> This error message can pop up in unexpected places (e.g. when running a 
> LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).
> We should likely just remove the UNREACHABLE() statement here as it's 
> obviously reachable in cases where the command we are trying to launch is not 
> found.





[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766

2016-12-09 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6767:
---
Priority: Blocker  (was: Major)

> Reached unreachable statement at 
> /mesos/src/slave/containerizer/mesos/launch.cpp:766
> --
>
> Key: MESOS-6767
> URL: https://issues.apache.org/jira/browse/MESOS-6767
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Jie Yu
>Priority: Blocker
>  Labels: containerizer, mesosphere
>
> This error message can pop up in unexpected places (e.g. when running a 
> LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).
> We should likely just remove the UNREACHABLE() statement here as it's 
> obviously reachable in cases where the command we are trying to launch is not 
> found.





[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766

2016-12-09 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6767:
---
Story Points: 1
  Labels: containerizer mesosphere  (was: )

> Reached unreachable statement at 
> /mesos/src/slave/containerizer/mesos/launch.cpp:766
> --
>
> Key: MESOS-6767
> URL: https://issues.apache.org/jira/browse/MESOS-6767
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: containerizer, mesosphere
>
> This error message can pop up in unexpected places (e.g. when running a 
> LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).
> We should likely just remove the UNREACHABLE() statement here as it's 
> obviously reachable in cases where the command we are trying to launch is not 
> found.





[jira] [Created] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766

2016-12-09 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6767:
--

 Summary: Reached unreachable statement at 
/mesos/src/slave/containerizer/mesos/launch.cpp:766
 Key: MESOS-6767
 URL: https://issues.apache.org/jira/browse/MESOS-6767
 Project: Mesos
  Issue Type: Bug
Reporter: Kevin Klues
Assignee: Jie Yu


This error message can pop up in unexpected places (e.g. when running a 
LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it).

We should likely just remove the UNREACHABLE() statement here as it's obviously 
reachable in cases where the command we are trying to launch is not found.
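The reachability argument above follows from how the POSIX exec family behaves: `execvp` returns only when it fails, for example with `ENOENT` when the command does not exist. The following is a minimal sketch of that behaviour, not the code in launch.cpp; `runCommand` and the exit code 127 are assumptions chosen for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not taken from launch.cpp): runs `args` in a child
// process and returns the child's exit status, or -1 if fork/waitpid fails
// or the child did not exit normally.
int runCommand(const std::vector<std::string>& args)
{
  if (args.empty()) {
    return -1;
  }

  // Build the NULL-terminated argv array that execvp expects.
  std::vector<char*> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: on success execvp never returns.
    execvp(argv[0], argv.data());

    // Reaching this line means execvp failed (e.g. command not found),
    // so it must be handled as an error rather than marked unreachable.
    fprintf(stderr, "Failed to execute '%s': %s\n", argv[0], strerror(errno));
    _exit(127); // Assumed exit code, following the shell convention.
  }

  // Parent: wait for the child and report how it exited.
  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    return -1;
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `runCommand({"/nonexistent/command"})` exercises the failure path: the child prints the error and exits with 127 instead of aborting on an unreachable-statement check.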





[jira] [Commented] (MESOS-907) Add Kerberos Authentication support

2016-12-09 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734795#comment-15734795
 ] 

Klaus Ma commented on MESOS-907:


I think Mesos prefers to delegate this feature to users, who can implement it as a module :).

> Add Kerberos Authentication support
> ---
>
> Key: MESOS-907
> URL: https://issues.apache.org/jira/browse/MESOS-907
> Project: Mesos
>  Issue Type: Story
>  Components: general
>Reporter: Adam B
>Assignee: Tim Anderegg
>  Labels: security, twitter
>
> MESOS-704 added basic authentication support using CRAM-MD5 through SASL. Now 
> we should integrate Kerberos authentication using GSS-API, which is already 
> supported by SASL. Kerberos is a widely-used industry standard authentication 
> service, and integration with Mesos will make it easier for customers to 
> integrate their existing security process with Mesos.





[jira] [Commented] (MESOS-907) Add Kerberos Authentication support

2016-12-09 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734791#comment-15734791
 ] 

haosdent commented on MESOS-907:


Mesos does not support Kerberos yet.

> Add Kerberos Authentication support
> ---
>
> Key: MESOS-907
> URL: https://issues.apache.org/jira/browse/MESOS-907
> Project: Mesos
>  Issue Type: Story
>  Components: general
>Reporter: Adam B
>Assignee: Tim Anderegg
>  Labels: security, twitter
>
> MESOS-704 added basic authentication support using CRAM-MD5 through SASL. Now 
> we should integrate Kerberos authentication using GSS-API, which is already 
> supported by SASL. Kerberos is a widely-used industry standard authentication 
> service, and integration with Mesos will make it easier for customers to 
> integrate their existing security process with Mesos.





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2016-12-09 Thread Elve Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734764#comment-15734764
 ] 

Elve Xu commented on MESOS-6213:


This also fails on version 1.1.0 with macOS Sierra.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
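The compiler diagnostic quoted above points to `std::atomic_compare_exchange_strong()` as the replacement for the deprecated call. The following is a minimal sketch of that suggestion using the standard `<atomic>` header. It is not the change made in protobuf or Mesos, and `compareAndSwap64` is a hypothetical name used only for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a barrier compare-and-swap on a 64-bit value.
// Returns true if `target` held `oldValue` and was replaced by `newValue`.
bool compareAndSwap64(
    std::atomic<int64_t>& target,
    int64_t oldValue,
    int64_t newValue)
{
  // The default memory order is std::memory_order_seq_cst, which orders
  // memory accesses on both sides of the operation. `oldValue` is passed
  // by value here, so the caller's copy is not modified on failure.
  return target.compare_exchange_strong(oldValue, newValue);
}
```

With `std::atomic<int64_t> value(1)`, the call `compareAndSwap64(value, 1, 2)` succeeds and stores 2, while a second call with the stale expected value 1 fails and leaves the stored value unchanged.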


