[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15518125#comment-15518125
 ] 

Joseph Wu commented on MESOS-5821:
--

{code}
commit 439db8c36c50fd294b2c978cdc877d9bd77301b3
Author: Daniel Pravat 
Date:   Fri Sep 23 18:36:34 2016 -0700

Windows: Fixed warnings in `shell.hpp`.

The `spawn` functions return an `intptr_t`.  This patch deals with
warnings about implicitly casting it to `int`.

Review: https://reviews.apache.org/r/52065/
{code}
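
A minimal sketch of the pattern this kind of fix addresses (hypothetical code, not the actual patch): on 64-bit MSVC, `_spawnlp` returns an `intptr_t`, and assigning that straight to an `int` triggers a truncation warning.

{code}
// Minimal sketch (not the actual Mesos patch). On 64-bit MSVC,
// `_spawnlp` returns an `intptr_t`; implicitly assigning it to an
// `int` triggers a truncation warning.
#include <process.h>  // _spawnlp, _P_WAIT (Windows CRT)

int runAndWait(const char* program)
{
  intptr_t status =
    ::_spawnlp(_P_WAIT, program, program, static_cast<char*>(nullptr));

  // Narrow explicitly once we know the exit status fits in an `int`.
  return static_cast<int>(status);
}
{code}
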
{code}
commit 8dcd13d9f4a6023af574e15c1af42b0dd799e847
Author: Daniel Pravat 
Date:   Fri Sep 23 18:43:35 2016 -0700

Windows: Fixed warnings in `windows.hpp`.

The `_write` function takes an `unsigned int` type as the third
argument.  The underlying type of `size_t` depends on the
platform architecture.

Review: https://reviews.apache.org/r/52193/
{code}
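
As a hedged illustration (hypothetical code, not the actual patch), the warning and the explicit-cast fix look roughly like this:

{code}
// Minimal sketch (not the actual Mesos patch). `_write` takes an
// `unsigned int` count, but `size_t` is 64-bit on x64, so passing a
// `size_t` directly triggers an implicit-narrowing warning.
#include <io.h>       // _write (Windows CRT)
#include <climits>    // UINT_MAX
#include <string>

int writeAll(int fd, const std::string& data)
{
  if (data.size() > UINT_MAX) {
    return -1;  // too large for a single _write call
  }

  // Narrow explicitly after the range check above.
  return ::_write(fd, data.data(), static_cast<unsigned int>(data.size()));
}
{code}
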
{code}
commit 8bfb11f0711826e9cb899c7d162cf47de911c719
Author: Daniel Pravat 
Date:   Fri Sep 23 18:49:27 2016 -0700

Fixed warnings in StatisticsTest.Statistics.

This removes some implicit double-to-float casts.

Review: https://reviews.apache.org/r/52198/
{code}
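
For illustration, a hypothetical snippet of the narrowing in question and two ways to silence it:

{code}
// Minimal sketch (hypothetical values, not the actual test). A bare
// floating-point literal is a `double`, so initializing a `float` with
// it narrows implicitly and the compiler warns.
float implicit_narrowing = 99.9;            // warns: double -> float
float via_literal = 99.9f;                  // fix: use a float literal
float via_cast = static_cast<float>(99.9);  // fix: cast explicitly
{code}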

> Clean up the billions of compiler warnings on MSVC
> --
>
> Key: MESOS-5821
> URL: https://issues.apache.org/jira/browse/MESOS-5821
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
>  Labels: mesosphere, slave
>
> Clean builds of Mesos on Windows result in {{5800 Warning(s)}} or more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6246) Libprocess links will not generate an ExitedEvent if the socket creation fails

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517963#comment-15517963
 ] 

Joseph Wu commented on MESOS-6246:
--

| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + 
{{ExitedEvent}} |

> Libprocess links will not generate an ExitedEvent if the socket creation fails
> --
>
> Key: MESOS-6246
> URL: https://issues.apache.org/jira/browse/MESOS-6246
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.27.3, 0.28.2, 1.0.1
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere
>
> Noticed this while inspecting nearby code for potential races.
> Normally, when a libprocess actor (the "linkee") links to a remote process, 
> it does the following:
> 1) Create a socket.
> 2) Connect to the remote process (asynchronous).
> 3) Check that the connection succeeded.
> If (2) or (3) fails, the linkee will receive an {{ExitedEvent}}, which 
> indicates that the link broke.  If (1) fails, there is no 
> {{ExitedEvent}}:
> https://github.com/apache/mesos/blob/7c833abbec9c9e4eb51d67f7a8e7a8d0870825f8/3rdparty/libprocess/src/process.cpp#L1558-L1562
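
To make the asymmetry concrete, here is a self-contained illustration of the bug pattern (assumed names; not libprocess code):

{code}
// The exit callback fires for connect failures, but a failure creating
// the socket returns early and notifies nobody.
#include <functional>
#include <iostream>

void link(bool creationFails, bool connectFails,
          const std::function<void()>& onExited)
{
  if (creationFails) {
    return;  // step (1) failed: no ExitedEvent analogue is delivered
  }

  if (connectFails) {
    onExited();  // steps (2)/(3) failed: the linkee is notified
  }
}

int main()
{
  link(true, false, [] { std::cout << "exited\n"; });  // prints nothing
  link(false, true, [] { std::cout << "exited\n"; });  // prints "exited"
  return 0;
}
{code}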



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6245) Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.

2016-09-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6245:
--
Description: It seems that the agent code sets 
{{StatusUpdate}}->{{slave_id}} but does not set 
{{TaskStatus}}->{{slave_id}} if it's not already set. When the driver 
receives such a status update and explicit acknowledgements are enabled, it 
passes the {{TaskStatus}} to the scheduler. But the scheduler has no way of 
acknowledging this update, since {{slave_id}} is not present. Note that 
implicit acknowledgements still work since they use the {{slave_id}} from 
{{StatusUpdate}}; hence we never noticed this in our tests, as all of them 
use implicit acknowledgements on the driver.  (was: It seems that the driver has an 
old check relying on the `PID`. The `PID` is always `UPID()` for HTTP based 
executors. If a scheduler is using explicit acknowledgements, it won't ever be 
able to acknowledge the update since the driver would clean up the {{uuid}} 
field!

Note that all our tests use implicit acknowledgements and we never got around 
to catching this issue till Marathon started using the HTTP based executors.)

> Driver based schedulers performing explicit acknowledgements cannot 
> acknowledge updates from HTTP based executors.
> --
>
> Key: MESOS-6245
> URL: https://issues.apache.org/jira/browse/MESOS-6245
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0, 1.0.2
>
>
> It seems that the agent code sets {{StatusUpdate}}->{{slave_id}} but does not 
> set {{TaskStatus}}->{{slave_id}} if it's not already set. When the driver 
> receives such a status update and explicit acknowledgements are enabled, it 
> passes the {{TaskStatus}} to the scheduler. But the scheduler has no way of 
> acknowledging this update, since {{slave_id}} is not present. Note that 
> implicit acknowledgements still work since they use the {{slave_id}} from 
> {{StatusUpdate}}; hence we never noticed this in our tests, as all of them 
> use implicit acknowledgements on the driver.
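
A sketch of the kind of agent-side fix this description implies (assumed include path and helper name; not the actual patch):

{code}
// Make sure TaskStatus carries the slave_id before the driver hands it
// to a scheduler that does explicit acknowledgements.
#include <messages/messages.hpp>  // mesos::internal::StatusUpdate (assumed)

void ensureSlaveId(mesos::internal::StatusUpdate* update)
{
  if (update->has_slave_id() && !update->status().has_slave_id()) {
    update->mutable_status()->mutable_slave_id()->CopyFrom(
        update->slave_id());
  }
}
{code}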



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6234) Potential socket leak during Zookeeper network changes

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517678#comment-15517678
 ] 

Joseph Wu edited comment on MESOS-6234 at 9/24/16 12:12 AM:


| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |


was (Author: kaysoky):
| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + 
{{ExitedEvent}} |
| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |

> Potential socket leak during Zookeeper network changes
> --
>
> Key: MESOS-6234
> URL: https://issues.apache.org/jira/browse/MESOS-6234
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.28.3, 1.0.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> There is a potential leak when using the version of {{link}} with 
> {{RemoteConnection::RECONNECT}}.  This was originally implemented to refresh 
> links during master recovery. 
> The leak occurs here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1592-L1597
> ^ The comment here is not correct, as that is *not* the last reference to the 
> {{existing}} socket.
> At this point, the {{existing}} socket may be a perfectly valid link.  Valid 
> links will all have a reference inside a callback loop created here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1503-L1509
> -
> We need to stop the callback loop but prevent any resulting {{ExitedEvents}} 
> from being sent due to stopping the callback loop.  This means discarding the 
> callback loop's future after we have called {{swap_implementing_socket}}.
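
A sketch of that ordering (assumed names; not the actual patch):

{code}
// Swap the socket first, then discard the old callback loop's future so
// tearing it down cannot emit a spurious ExitedEvent.
#include <process/future.hpp>
#include <stout/nothing.hpp>

void stopOldLoop(process::Future<Nothing> oldCallbackLoop)
{
  // swap_implementing_socket(...) would already have been called here.
  oldCallbackLoop.discard();  // stop the loop without an ExitedEvent
}
{code}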



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6245) Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.

2016-09-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6245:
--
Summary: Driver based schedulers performing explicit acknowledgements 
cannot acknowledge updates from HTTP based executors.  (was: Schedulers 
performing explicit acknowledgements cannot acknowledge updates from HTTP based 
executors.)

> Driver based schedulers performing explicit acknowledgements cannot 
> acknowledge updates from HTTP based executors.
> --
>
> Key: MESOS-6245
> URL: https://issues.apache.org/jira/browse/MESOS-6245
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0, 1.0.2
>
>
> It seems that the driver has an old check relying on the `PID`. The `PID` is 
> always `UPID()` for HTTP based executors. If a scheduler is using explicit 
> acknowledgements, it won't ever be able to acknowledge the update since the 
> driver would clean up the {{uuid}} field!
> Note that all our tests use implicit acknowledgements and we never got around 
> to catching this issue till Marathon started using the HTTP based executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6245) Schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.

2016-09-23 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6245:
-

 Summary: Schedulers performing explicit acknowledgements cannot 
acknowledge updates from HTTP based executors.
 Key: MESOS-6245
 URL: https://issues.apache.org/jira/browse/MESOS-6245
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
 Fix For: 1.1.0, 1.0.2


It seems that the driver has an old check relying on the `PID`. The `PID` is 
always `UPID()` for HTTP based executors. If a scheduler is using explicit 
acknowledgements, it won't ever be able to acknowledge the update since the 
driver would clean up the {{uuid}} field!

Note that all our tests use implicit acknowledgements and we never got around 
to catching this issue till Marathon started using the HTTP based executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6233) Master CHECK fails during recovery while relinking to other masters

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517685#comment-15517685
 ] 

Joseph Wu edited comment on MESOS-6233 at 9/23/16 9:58 PM:
---

| https://reviews.apache.org/r/52182/ | Fix relink race that causes this check 
failure |


was (Author: kaysoky):
| https://reviews.apache.org/r/52180/ | Fix relink race that causes this check 
failure |

> Master CHECK fails during recovery while relinking to other masters
> ---
>
> Key: MESOS-6233
> URL: https://issues.apache.org/jira/browse/MESOS-6233
> Project: Mesos
>  Issue Type: Bug
>  Components: general, master
>Affects Versions: 0.28.3, 1.0.1
>Reporter: Alex Kaplan
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> Mesos Version: 1.0.1
> OS: CoreOS 1068
> {code}
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 
> 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not 
> sending an update to agentoverlay-agent@10.4.4.1:5051
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 
> 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure 
> stack trace: ***
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908829fd  google::LogMessage::Fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19088482d  google::LogMessage::SendToLog()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908825ec  google::LogMessage::Flush()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190885129  google::LogMessageFatal::~LogMessageFatal()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908171dd  process::SocketManager::swap_implementing_socket()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19081aa90  process::SocketManager::link_connect()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908227f9  
> _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @   
> 0x41eb26  
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @   
> 0x42a36f  process::Future<>::fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19085283c  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190852f17  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18d616631  bufferevent_run_deferred_callbacks_locked
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18d60cc5d  event_base_loop
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190865a1d  process::EventLoop::run()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18eeabd73  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18e6a852c  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18e3e61dd  (unknown)
> Sep 22 20:05:18 node-44a84215535c systemd[1]: 
> [0;1;39mdcos-mesos-master.service: Main process exited, code=killed, 
> status=6/ABRT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6233) Master CHECK fails during recovery while relinking to other masters

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517685#comment-15517685
 ] 

Joseph Wu commented on MESOS-6233:
--

| https://reviews.apache.org/r/52180/ | Fix relink race that causes this check 
failure |

> Master CHECK fails during recovery while relinking to other masters
> ---
>
> Key: MESOS-6233
> URL: https://issues.apache.org/jira/browse/MESOS-6233
> Project: Mesos
>  Issue Type: Bug
>  Components: general, master
>Affects Versions: 0.28.3, 1.0.1
>Reporter: Alex Kaplan
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> Mesos Version: 1.0.1
> OS: CoreOS 1068
> {code}
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 
> 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not 
> sending an update to agentoverlay-agent@10.4.4.1:5051
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 
> 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure 
> stack trace: ***
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908829fd  google::LogMessage::Fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19088482d  google::LogMessage::SendToLog()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908825ec  google::LogMessage::Flush()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190885129  google::LogMessageFatal::~LogMessageFatal()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908171dd  process::SocketManager::swap_implementing_socket()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19081aa90  process::SocketManager::link_connect()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc1908227f9  
> _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @   
> 0x41eb26  
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @   
> 0x42a36f  process::Future<>::fail()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc19085283c  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190852f17  process::network::LibeventSSLSocketImpl::event_callback()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18d616631  bufferevent_run_deferred_callbacks_locked
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18d60cc5d  event_base_loop
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc190865a1d  process::EventLoop::run()
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18eeabd73  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18e6a852c  (unknown)
> Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ 
> 0x7fc18e3e61dd  (unknown)
> Sep 22 20:05:18 node-44a84215535c systemd[1]: 
> [0;1;39mdcos-mesos-master.service: Main process exited, code=killed, 
> status=6/ABRT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6234) Potential socket leak during Zookeeper network changes

2016-09-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517678#comment-15517678
 ] 

Joseph Wu commented on MESOS-6234:
--

| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + 
{{ExitedEvent}} |
| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |

> Potential socket leak during Zookeeper network changes
> --
>
> Key: MESOS-6234
> URL: https://issues.apache.org/jira/browse/MESOS-6234
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.28.3, 1.0.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
>
> There is a potential leak when using the version of {{link}} with 
> {{RemoteConnection::RECONNECT}}.  This was originally implemented to refresh 
> links during master recovery. 
> The leak occurs here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1592-L1597
> ^ The comment here is not correct, as that is *not* the last reference to the 
> {{existing}} socket.
> At this point, the {{existing}} socket may be a perfectly valid link.  Valid 
> links will all have a reference inside a callback loop created here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1503-L1509
> -
> We need to stop the callback loop but prevent any resulting {{ExitedEvents}} 
> from being sent due to stopping the callback loop.  This means discarding the 
> callback loop's future after we have called {{swap_implementing_socket}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6244) Add support for streaming HTTP request bodies in libprocess.

2016-09-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6244:
--

 Summary: Add support for streaming HTTP request bodies in 
libprocess.
 Key: MESOS-6244
 URL: https://issues.apache.org/jira/browse/MESOS-6244
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Mahler


We currently have support for streaming responses. See MESOS-2438.  Servers can 
start sending the response body before the body is complete. Clients can start 
reading a response before the body is complete. This is an optimization for 
large responses and is a requirement for infinite "streaming" style endpoints.

We currently do not have support for streaming requests. This would allow a 
client to stream a large or infinite request body to the server without having 
the complete body in hand, and it would allow a server to read request 
bodies before they have been completely received over the connection.

This is a requirement if we want to allow clients to "stream" data into a 
server, i.e. an infinite request body.
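
To make the idea concrete, a hypothetical sketch mirroring the {{http::Pipe}} mechanism libprocess already uses for streaming responses (the request-side attachment below is an assumption, not an existing API):

{code}
#include <process/http.hpp>

using process::http::Pipe;
using process::http::Request;

void streamRequestBody()
{
  Pipe pipe;

  Request request;
  request.method = "POST";
  // Hypothetical: hand the server a reader that yields chunks as they
  // are written, instead of a complete `request.body` string.
  // request.reader = pipe.reader();

  Pipe::Writer writer = pipe.writer();
  writer.write("chunk 1");  // the client sends before the body is complete
  writer.write("chunk 2");
  writer.close();           // signals end-of-body
}
{code}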



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3454) Remove duplicated logic in Flags::load

2016-09-23 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517541#comment-15517541
 ] 

Michael Park commented on MESOS-3454:
-

[~greggomann] I can be the shepherd, but I'll have to schedule the review for a 
few weeks from now. If you want to put together a patch before then and 
discuss it, I'm happy to help.

> Remove duplicated logic in Flags::load
> --
>
> Key: MESOS-3454
> URL: https://issues.apache.org/jira/browse/MESOS-3454
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Klaus Ma
>Priority: Minor
>
> In {{flags.hpp}}, there are two functions with almost the same logic; this 
> ticket covers merging the duplicated parts.
> {code}
> inline Try<Nothing> FlagsBase::load(
>     const Option<std::string>& prefix,
>     int* argc,
>     char*** argv,
>     bool unknowns,
>     bool duplicates)
> ...
> inline Try<Nothing> FlagsBase::load(
>     const Option<std::string>& prefix,
>     int argc,
>     const char* const* argv,
>     bool unknowns,
>     bool duplicates)
> ...
> {code}
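
One possible shape for the merge, as a sketch (assumed signatures matching the snippet above; not a reviewed patch):

{code}
// Let the mutable-argv overload delegate the shared parsing to the
// const overload, so the logic lives in one place.
inline Try<Nothing> FlagsBase::load(
    const Option<std::string>& prefix,
    int* argc,
    char*** argv,
    bool unknowns,
    bool duplicates)
{
  // char** converts implicitly to const char* const*, so we can reuse
  // the const overload for the parsing...
  Try<Nothing> result = load(prefix, *argc, *argv, unknowns, duplicates);

  // ...and then strip the recognized flags out of argc/argv here
  // (omitted in this sketch).
  return result;
}
{code}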



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6243) Expose failures and unknown container cases from Containerizer::destroy.

2016-09-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6243:
--

 Summary: Expose failures and unknown container cases from 
Containerizer::destroy.
 Key: MESOS-6243
 URL: https://issues.apache.org/jira/browse/MESOS-6243
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


Currently the callers of `destroy` cannot determine whether the call
succeeds or fails (without a secondary call to `wait()`).

Exposing the result also allows the caller to distinguish between a failure
and waiting on an unknown container. This is important for the upcoming
agent child container API, as the end user would benefit from the
distinction.
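
A sketch of the shape this could take (assumed types; not the final API):

{code}
#include <process/future.hpp>
#include <stout/option.hpp>

class ContainerID;           // stand-ins for the real Mesos types
class ContainerTermination;

class Containerizer
{
public:
  virtual ~Containerizer() {}

  // None => the container is unknown; a failed future => destroy failed.
  virtual process::Future<Option<ContainerTermination>> destroy(
      const ContainerID& containerId) = 0;
};
{code}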



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Jamie Briant (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517376#comment-15517376
 ] 

Jamie Briant commented on MESOS-6118:
-

I can't give you any code, but if you can give me a version that dumps the 
output to a file rather than to the log (because it's the logging that's 
truncating), I'll be happy to run it.

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: linux, slave
> Fix For: 1.1.0, 1.0.2
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process will crash every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6242) Expose unknown container case on Containerizer::wait.

2016-09-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6242:
--

 Summary: Expose unknown container case on Containerizer::wait.
 Key: MESOS-6242
 URL: https://issues.apache.org/jira/browse/MESOS-6242
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


This allows the caller to distinguish between a failure and waiting on an 
unknown container.

This is important for the upcoming agent nested container API, as the end-user 
would benefit from the distinction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6241) Add agent::Call / agent::Response API for managing nested containers.

2016-09-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6241:
--

 Summary: Add agent::Call / agent::Response API for managing nested 
containers.
 Key: MESOS-6241
 URL: https://issues.apache.org/jira/browse/MESOS-6241
 Project: Mesos
  Issue Type: Task
  Components: HTTP API, slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


In order to manage nested containers from executors or from tooling, we'll need 
an API for doing so.

Per the [design 
doc|https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/]
 we will start with the following:

* Launch: create a new nested container underneath the parent container.
* Wait: wait for the nested container to terminate.
* Kill: kill a non-terminal nested container.
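
A sketch of what one of these calls might look like from C++ (the enum and message names are assumptions based on the design doc, not a settled API at the time of this ticket):

{code}
#include <mesos/v1/agent/agent.pb.h>  // assumed generated headers
#include <mesos/v1/mesos.pb.h>

mesos::v1::agent::Call makeWaitCall(const mesos::v1::ContainerID& containerId)
{
  mesos::v1::agent::Call call;
  call.set_type(mesos::v1::agent::Call::WAIT_NESTED_CONTAINER);
  call.mutable_wait_nested_container()->mutable_container_id()
    ->CopyFrom(containerId);
  return call;
}
{code}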



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2016-09-23 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517296#comment-15517296
 ] 

Jie Yu commented on MESOS-6236:
---

That means we need to move the ns-related functions from src/linux/ns.hpp to stout.

> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently there is no standard way in Mesos to launch a child process in a 
> different namespace (e.g. {{net}}, {{mnt}}). A user may leverage 
> {{Subprocess}} and provide their own {{clone}} callback, but this approach is 
> error-prone.
> One possible solution is to implement a {{Subprocess}} child hook. In 
> [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have 
> introduced a child hook framework in subprocess and implemented three child 
> hooks: {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing 
> another child hook, {{SETNS}}, so that other components (e.g., health check) 
> can call it to enter the namespaces of a specific process.
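
A sketch of what such a {{SETNS}} hook could boil down to (assumed shape; the hook does not exist yet). It must run in the child before exec:

{code}
// Compile with -D_GNU_SOURCE on glibc for setns().
#include <fcntl.h>
#include <sched.h>      // setns, CLONE_NEWNET
#include <unistd.h>
#include <cstdio>       // snprintf
#include <sys/types.h>  // pid_t

// Join the network namespace of `pid`; returns 0 on success.
int enterNetNamespace(pid_t pid)
{
  char path[64];
  snprintf(path, sizeof(path), "/proc/%d/ns/net", static_cast<int>(pid));

  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    return -1;
  }

  int result = setns(fd, CLONE_NEWNET);
  close(fd);
  return result;
}
{code}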



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6240) Allow executor/agent communication over domain sockets and named PIPES

2016-09-23 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6240:
-
Summary: Allow executor/agent communication over domain sockets and named 
PIPES  (was: Allow executor/agent communication over domain socker and named 
PIPES)

> Allow executor/agent communication over domain sockets and named PIPES
> --
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, the executor/agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the `MesosContainerizer`, 
> when containers are running on CNI networks, this mode of communication 
> starts imposing constraints on the CNI network, since there now has to be 
> connectivity between the CNI network (on which the executor is running) and 
> the agent. Introducing paths from a CNI network to the underlying agent, at 
> best, creates headaches for operators and at worst introduces serious 
> security holes in the network, since it breaks the isolation between the 
> container CNI network and the host network (on which the agent is running).
> In order to simplify/strengthen deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to 
> run on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6240) Allow executor/agent communication over domain socker and named PIPES

2016-09-23 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-6240:


 Summary: Allow executor/agent communication over domain socker and 
named PIPES
 Key: MESOS-6240
 URL: https://issues.apache.org/jira/browse/MESOS-6240
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
 Environment: Linux and Windows
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Currently, the executor/agent communication happens specifically over TCP 
sockets. This works fine in most cases, but for the `MesosContainerizer`, 
when containers are running on CNI networks, this mode of communication 
starts imposing constraints on the CNI network, since there now has to be 
connectivity between the CNI network (on which the executor is running) and 
the agent. Introducing paths from a CNI network to the underlying agent, at 
best, creates headaches for operators and at worst introduces serious 
security holes in the network, since it breaks the isolation between the 
container CNI network and the host network (on which the agent is running).

In order to simplify/strengthen deployment of Mesos containers on CNI networks, 
we therefore need to move away from using TCP/IP sockets for executor/agent 
communication. Since the executor and agent are guaranteed to run on the same 
host, the above problems can be resolved if, for the `MesosContainerizer`, we 
use UNIX domain sockets or named pipes instead of TCP/IP sockets for the 
executor/agent communication.
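
A minimal sketch of the proposed transport on Linux (not Mesos code; the socket path is illustrative):

{code}
// A UNIX domain socket needs no IP connectivity between the
// container's CNI network and the host.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>

int listenOnDomainSocket(const char* path)  // e.g. a path in the sandbox
{
  int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  sockaddr_un addr;
  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      ::listen(fd, 16) < 0) {
    ::close(fd);
    return -1;
  }

  return fd;
}
{code}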




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517212#comment-15517212
 ] 

Kevin Klues commented on MESOS-6118:


Ack






> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: linux, slave
> Fix For: 1.1.0, 1.0.2
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process will crash every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6239) Fix warnings and errors produced by new hardened CXXFLAGS

2016-09-23 Thread Aaron Wood (JIRA)
Aaron Wood created MESOS-6239:
-

 Summary: Fix warnings and errors produced by new hardened CXXFLAGS
 Key: MESOS-6239
 URL: https://issues.apache.org/jira/browse/MESOS-6239
 Project: Mesos
  Issue Type: Improvement
Reporter: Aaron Wood
Assignee: Aaron Wood
Priority: Minor


Most of the new warnings/errors come from libprocess/stout, as CXXFLAGS were 
never propagated to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517186#comment-15517186
 ] 

Jie Yu commented on MESOS-6118:
---

Thanks! We definitely didn't handle this case. Will make sure a fix for this 
lands in 1.0.2.

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: linux, slave
> Fix For: 1.1.0, 1.0.2
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process will crash every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6229) Default to using hardened compilation flags

2016-09-23 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517039#comment-15517039
 ] 

Aaron Wood edited comment on MESOS-6229 at 9/23/16 5:48 PM:


Looks like some fixes will need to land before this patch can go in (probably 
many more than the ones below):

/bin/sh ../../libtool  --tag=CXX   --mode=compile g++ -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" 
-DPACKAGE_STRING=\"mesos\ 1.1.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
-DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
-DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 
-DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. -I../../../3rdparty/libprocess  
-DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" 
-I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 
-I../glog-0.3.3/src  -I../http-parser-2.6.2 -I../libev-4.22 
-DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 
-I../../../3rdparty/libprocess/../stout/include  
-I/usr/local/opt/subversion/include/subversion-1 
-I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
-I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall -Werror -Wsign-compare 
-Wformat-security -Wstack-protector -fno-omit-frame-pointer 
-fstack-protector-strong -pie -fPIE -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 
-Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF 
.deps/libprocess_la-reap.Tpo -c -o libprocess_la-reap.lo `test -f 
'src/reap.cpp' || echo '../../../3rdparty/libprocess/'`src/reap.cpp
../../../3rdparty/libprocess/src/profiler.cpp:35:12: error: unused variable 
'PROFILE_FILE' [-Werror,-Wunused-const-variable]
const char PROFILE_FILE[] = "perftools.out";
   ^
In file included from ../../../3rdparty/libprocess/src/profiler.cpp:24:
../../../3rdparty/libprocess/include/process/profiler.hpp:80:8: error: private 
field 'started' is not used [-Werror,-Wunused-private-field]
  bool started;
   ^
2 errors generated.
make[5]: *** [libprocess_la-profiler.lo] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/libprocess_la-logging.Tpo .deps/libprocess_la-logging.Plo
mv -f .deps/libprocess_la-io.Tpo .deps/libprocess_la-io.Plo
libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"1.1.0\" "-DPACKAGE_STRING=\"mesos 1.1.0\"" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
-DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
-DHAVE_LIBZ=1 -I. -I../../../3rdparty/libprocess 
-DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" 
-I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 
-I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 
-I../../../3rdparty/libprocess/../stout/include 
-I/usr/local/opt/subversion/include/subversion-1 
-I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
-I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare 
-Wformat-security -Wstack-protector -fno-omit-frame-pointer 
-fstack-protector-strong -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 
-Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF 
.deps/libprocess_la-reap.Tpo -c ../../../3rdparty/libprocess/src/reap.cpp  
-fno-common -DPIC -o .libs/libprocess_la-reap.o
In file included from ../../../3rdparty/libprocess/src/process.cpp:108:
../../../3rdparty/libprocess/src/encoder.hpp:278:15: error: comparison of 
integers of different signs: 'off_t' (aka 'long long') and 'size_t' (aka 
'unsigned long') [-Werror,-Wsign-compare]
if (index >= length) {
~ ^  ~~
../../../3rdparty/libprocess/src/process.cpp:3501:23: error: comparison of 
integers of different signs: 

[jira] [Commented] (MESOS-6229) Default to using hardened compilation flags

2016-09-23 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517039#comment-15517039
 ] 

Aaron Wood commented on MESOS-6229:
---

Looks like some fixes will need to land before this patch can go in:

/bin/sh ../../libtool  --tag=CXX   --mode=compile g++ -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" 
-DPACKAGE_STRING=\"mesos\ 1.1.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
-DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
-DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 
-DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. -I../../../3rdparty/libprocess  
-DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" 
-I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 
-I../glog-0.3.3/src  -I../http-parser-2.6.2 -I../libev-4.22 
-DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 
-I../../../3rdparty/libprocess/../stout/include  
-I/usr/local/opt/subversion/include/subversion-1 
-I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
-I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall -Werror -Wsign-compare 
-Wformat-security -Wstack-protector -fno-omit-frame-pointer 
-fstack-protector-strong -pie -fPIE -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 
-Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF 
.deps/libprocess_la-reap.Tpo -c -o libprocess_la-reap.lo `test -f 
'src/reap.cpp' || echo '../../../3rdparty/libprocess/'`src/reap.cpp
../../../3rdparty/libprocess/src/profiler.cpp:35:12: error: unused variable 
'PROFILE_FILE' [-Werror,-Wunused-const-variable]
const char PROFILE_FILE[] = "perftools.out";
   ^
In file included from ../../../3rdparty/libprocess/src/profiler.cpp:24:
../../../3rdparty/libprocess/include/process/profiler.hpp:80:8: error: private 
field 'started' is not used [-Werror,-Wunused-private-field]
  bool started;
   ^
2 errors generated.
make[5]: *** [libprocess_la-profiler.lo] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/libprocess_la-logging.Tpo .deps/libprocess_la-logging.Plo
mv -f .deps/libprocess_la-io.Tpo .deps/libprocess_la-io.Plo
libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"1.1.0\" "-DPACKAGE_STRING=\"mesos 1.1.0\"" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
-DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
-DHAVE_LIBZ=1 -I. -I../../../3rdparty/libprocess 
-DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" 
-I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 
-I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 
-I../../../3rdparty/libprocess/../stout/include 
-I/usr/local/opt/subversion/include/subversion-1 
-I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
-I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare 
-Wformat-security -Wstack-protector -fno-omit-frame-pointer 
-fstack-protector-strong -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 
-Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF 
.deps/libprocess_la-reap.Tpo -c ../../../3rdparty/libprocess/src/reap.cpp  
-fno-common -DPIC -o .libs/libprocess_la-reap.o
In file included from ../../../3rdparty/libprocess/src/process.cpp:108:
../../../3rdparty/libprocess/src/encoder.hpp:278:15: error: comparison of 
integers of different signs: 'off_t' (aka 'long long') and 'size_t' (aka 
'unsigned long') [-Werror,-Wsign-compare]
if (index >= length) {
~ ^  ~~
../../../3rdparty/libprocess/src/process.cpp:3501:23: error: comparison of 
integers of different signs: 'int' and 'size_type' (aka 'unsigned long') 
[-Werror,-Wsign-compare]
for (int i 

[jira] [Updated] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Loesche updated MESOS-6237:
-
Summary: Agent Sandbox inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6  (was: Slave Sandbox 
inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6)

> Agent Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible in the 
> Web UI. The problem seems to be that there are no brackets around the IP, so it 
> tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Loesche updated MESOS-6237:
-
Description: 
Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
2199a24c0b7a782a0381aad8cceacbc95ec3d5c9

When using IPs instead of hostnames, the Agent Sandbox is inaccessible in the 
Web UI. The problem seems to be that there are no brackets around the IP, so it 
tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
http://[2001:41d0:1000:ab9::]:5051


  was:
Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
2199a24c0b7a782a0381aad8cceacbc95ec3d5c9

When using IPs instead of hostnames the Agent Sandbox is inaccessible. The 
problem seems to be that there's no brackets around the IP so it tries to 
access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
http://[2001:41d0:1000:ab9::]:5051



> Slave Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible in the 
> Web UI. The problem seems to be that there are no brackets around the IP, so it 
> tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051
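
A minimal sketch of the fix the description suggests (not the actual UI code):

{code}
// IPv6 literals contain ':' and must be bracketed before a ":port"
// suffix is appended.
#include <string>

std::string agentUrl(const std::string& ip, int port)
{
  bool ipv6 = ip.find(':') != std::string::npos;
  std::string host = ipv6 ? "[" + ip + "]" : ip;
  return "http://" + host + ":" + std::to_string(port);
}

// agentUrl("2001:41d0:1000:ab9::", 5051)
//   => "http://[2001:41d0:1000:ab9::]:5051"
{code}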



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6237:

Summary: Slave Sandbox inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6  (was: Agent Sandbox 
inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6)

> Slave Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible. The 
> problem seems to be that there are no brackets around the IP, so it tries to 
> access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Loesche updated MESOS-6237:
-
Summary: Agent Sandbox inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6  (was: Slave Sandbox 
inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6)

> Agent Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible. The 
> problem seems to be that there are no brackets around the IP, so it tries to 
> access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Loesche updated MESOS-6238:
-
Description: 
Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 

make fails when the configure options --enable-ssl and --enable-libevent are given.

Error message:
{noformat}
...
...
../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
process::SocketManager::link_connect(const process::Future&, 
process::network::Socket, const process::UPID&)’:
../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
declared in this scope
   Try ip = url.ip;
 ^
Makefile:997: recipe for target 'libprocess_la-process.lo' failed
make[5]: *** [libprocess_la-process.lo] Error 1
...
...
{noformat}


  was:
make fails when configure options --enable-ssl --enable-libevent were given.

Error message:
{noformat}
...
...
../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
process::SocketManager::link_connect(const process::Future&, 
process::network::Socket, const process::UPID&)’:
../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
declared in this scope
   Try ip = url.ip;
 ^
Makefile:997: recipe for target 'libprocess_la-process.lo' failed
make[5]: *** [libprocess_la-process.lo] Error 1
...
...
{noformat}



> SSL / libevent support broken in IPv6 patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6238
> URL: https://issues.apache.org/jira/browse/MESOS-6238
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 
> make fails when the configure options --enable-ssl and --enable-libevent are given.
> Error message:
> {noformat}
> ...
> ...
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
> process::SocketManager::link_connect(const process::Future&, 
> process::network::Socket, const process::UPID&)’:
> ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
> declared in this scope
>Try ip = url.ip;
>  ^
> Makefile:997: recipe for target 'libprocess_la-process.lo' failed
> make[5]: *** [libprocess_la-process.lo] Error 1
> ...
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517001#comment-15517001
 ] 

Lukas Loesche commented on MESOS-6237:
--

Cc [~bennoe]

> Slave Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible. The 
> problem seems to be that there are no brackets around the IP, so it tries to 
> access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517002#comment-15517002
 ] 

Lukas Loesche commented on MESOS-6238:
--

Cc [~bennoe]

> SSL / libevent support broken in IPv6 patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6238
> URL: https://issues.apache.org/jira/browse/MESOS-6238
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> make fails when the configure options --enable-ssl and --enable-libevent are given.
> Error message:
> {noformat}
> ...
> ...
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
> process::SocketManager::link_connect(const process::Future&, 
> process::network::Socket, const process::UPID&)’:
> ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
> declared in this scope
>Try ip = url.ip;
>  ^
> Makefile:997: recipe for target 'libprocess_la-process.lo' failed
> make[5]: *** [libprocess_la-process.lo] Error 1
> ...
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)
Lukas Loesche created MESOS-6238:


 Summary: SSL / libevent support broken in IPv6 patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6
 Key: MESOS-6238
 URL: https://issues.apache.org/jira/browse/MESOS-6238
 Project: Mesos
  Issue Type: Bug
Reporter: Lukas Loesche


make fails when the configure options --enable-ssl --enable-libevent are given.

Error message:
{noformat}
...
...
../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
process::SocketManager::link_connect(const process::Future<Nothing>&, 
process::network::Socket, const process::UPID&)’:
../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
declared in this scope
   Try ip = url.ip;
 ^
Makefile:997: recipe for target 'libprocess_la-process.lo' failed
make[5]: *** [libprocess_la-process.lo] Error 1
...
...
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Lukas Loesche (JIRA)
Lukas Loesche created MESOS-6237:


 Summary: Slave Sandbox inaccessible when using IPv6 address in 
patch from https://github.com/lava/mesos/tree/bennoe/ipv6
 Key: MESOS-6237
 URL: https://issues.apache.org/jira/browse/MESOS-6237
 Project: Mesos
  Issue Type: Bug
Reporter: Lukas Loesche


Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
2199a24c0b7a782a0381aad8cceacbc95ec3d5c9

When using IP addresses instead of hostnames, the Agent Sandbox is inaccessible. 
The problem seems to be that there are no brackets around the IPv6 address, so 
it tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
http://[2001:41d0:1000:ab9::]:5051
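
For illustration, a minimal sketch of the bracket-wrapping the URL construction 
would need; the helper name is hypothetical and not part of the patch:

{code}
#include <string>

// Hypothetical helper: wrap bare IPv6 literals in brackets before they are
// spliced into a URL; hostnames and IPv4 addresses pass through unchanged.
std::string formatUrlHost(const std::string& host)
{
  // A colon in an unbracketed host can only come from an IPv6 literal.
  if (host.find(':') != std::string::npos && host.front() != '[') {
    return "[" + host + "]";
  }
  return host;
}

// "http://" + formatUrlHost("2001:41d0:1000:ab9::") + ":5051"
// yields the expected http://[2001:41d0:1000:ab9::]:5051.
{code}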




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems

2016-09-23 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516903#comment-15516903
 ] 

Mao Geng commented on MESOS-5909:
-

[~kaysoky] Thanks for shepherding. I addressed your review comments in 
https://reviews.apache.org/r/52048/; can you please take another look? 

> Stout "OsTest.User" test can fail on some systems
> -
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Kapil Arya
>Assignee: Mao Geng
>  Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in a sorted manner 
> (in my case, it's returning "471 100") ... whereas {{id -G}} returns a sorted 
> list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.
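
The description proposes sorting both lists before comparing; a minimal sketch 
of that comparison (helper name hypothetical):

{code}
#include <sys/types.h>

#include <algorithm>
#include <vector>

// getgrouplist() and `id -G` may report the same groups in different orders
// (e.g. "471 100" vs. "100 471"), so sort both sides before comparing.
bool sameGroups(std::vector<gid_t> fromGetgrouplist, std::vector<gid_t> fromId)
{
  std::sort(fromGetgrouplist.begin(), fromGetgrouplist.end());
  std::sort(fromId.begin(), fromId.end());
  return fromGetgrouplist == fromId;
}
{code}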



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516583#comment-15516583
 ] 

Ian Babrou commented on MESOS-6118:
---

Looks like Mesos is confused by the fact that my root mount is its own parent. 
This is probably because the system boots from the network and keeps / in RAM.
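
For illustration, a simplified traversal (not the actual fs.cpp code) that 
treats a self-parented root entry such as {{1 1 0:2 / /}} as the end of the 
chain rather than a cycle:

{code}
#include <map>
#include <set>

// Walk a mount entry's parent chain. A root entry that lists itself as its
// own parent terminates the walk; only a revisited entry is a real cycle.
bool hasCycle(const std::map<int, int>& parentOf, int id)
{
  std::set<int> visited;
  while (true) {
    auto it = parentOf.find(id);
    if (it == parentOf.end() || it->second == id) {
      return false;  // Reached a (possibly self-parented) root.
    }
    if (!visited.insert(id).second) {
      return true;   // Entry seen twice: genuine cycle.
    }
    id = it->second;
  }
}
{code}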

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: linux, slave
> Fix For: 1.1.0, 1.0.2
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process will crash every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516371#comment-15516371
 ] 

Ian Babrou commented on MESOS-6118:
---

I had to rework your patch a bit so it applies on top of master. I then built 
1.0.1 with the resulting fs.cpp:

{noformat}
Sep 23 12:56:49 36com72 mesos-agent[15633]: Failed to perform recovery: Collect 
failed: Unable to unmount volumes for Docker container 
'5ec94354-f785-4d13-b3ef-fb1a37eac007': Failed to get mount table: Cycle found 
in mount table hierarchy through entry '1': 1 1 0:2 / / rw shared:1 - rootfs 
rootfs rw,size=65513288k,nr_inodes=16378322
Sep 23 12:56:49 36com72 mesos-agent[15633]: 17 1 0:17 / /sys 
rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 18 1 0:5 / /proc 
rw,nosuid,nodev,noexec,relatime shared:7 - proc proc rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 19 1 0:6 / /dev rw,nosuid shared:8 
- devtmpfs devtmpfs rw,size=65513304k,nr_inodes=16378326,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 20 17 0:18 / /sys/kernel/security 
rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 21 17 0:16 / /sys/fs/selinux 
rw,relatime shared:4 - selinuxfs selinuxfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 22 19 0:19 / /dev/shm 
rw,nosuid,nodev shared:9 - tmpfs tmpfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 23 19 0:13 / /dev/pts 
rw,nosuid,noexec,relatime shared:10 - devpts devpts 
rw,gid=5,mode=620,ptmxmode=000
Sep 23 12:56:49 36com72 mesos-agent[15633]: 24 1 0:20 / /run rw,nosuid,nodev 
shared:11 - tmpfs tmpfs rw,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 25 24 0:21 / /run/lock 
rw,nosuid,nodev,noexec,relatime shared:12 - tmpfs tmpfs rw,size=5120k
Sep 23 12:56:49 36com72 mesos-agent[15633]: 26 17 0:22 / /sys/fs/cgroup 
ro,nosuid,nodev,noexec shared:5 - tmpfs tmpfs ro,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 27 26 0:23 / /sys/fs/cgroup/systemd 
rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup 
rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
Sep 23 12:56:49 36com72 mesos-agent[15633]: 28 26 0:24 / /sys/fs/cgroup/cpuset 
rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,cpuset
Sep 23 12:56:49 36com72 mesos-agent[15633]: 29 26 0:25 / 
/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:14 - cgroup 
cgroup rw,cpu,cpuacct
Sep 23 12:56:49 36com72 mesos-agent[15633]: 30 26 0:26 / /sys/fs/cgroup/blkio 
rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,blkio
Sep 23 12:56:49 36com72 mesos-agent[15633]: 31 26 0:27 / /sys/fs/cgroup/memory 
rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,memory
Sep 23 12:56:49 36com72 mesos-agent[15633]: 32 26 0:28 / /sys/fs/cgroup/devices 
rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,devices
Sep 23 12:56:49 36com72 mesos-agent[15633]: 33 26 0:29 / /sys/fs/cgroup/freezer 
rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,freezer
Sep 23 12:56:49 36com72 mesos-agent[15633]: 34 26 0:30 / 
/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:19 - 
cgroup cgroup rw,net_cls,net_prio
Sep 23 12:56:49 36com72 mesos-agent[15633]: 35 26 0:31 / 
/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:20 - cgroup 
cgroup rw,perf_event
Sep 23 12:56:49 36com72 mesos-agent[15633]: 36 26 0:32 / /sys/fs/cgroup/hugetlb 
rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,hugetlb
Sep 23 12:56:49 36com72 mesos-agent[15633]: 37 26 0:33 / /sys/fs/cgroup/pids 
rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,pids
Sep 23 12:56:49 36com72 mesos-agent[15633]: 38 18 0:34 / 
/proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 
rw,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
Sep 23 12:56:49 36com72 mesos-agent[15633]: 39 19 0:35 / /dev/hugepages 
rw,relatime shared:24 - hugetlbfs hugetlbfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 40 17 0:8 / /sys/kernel/debug 
rw,relatime shared:25 - debugfs debugfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 41 19 0:15 / /dev/mqueue 
rw,relatime shared:26 - mqueue mqueue rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 42 1 9:127 / /state rw,relatime 
shared:27 - ext4 /dev/md127 rw,stripe=384,data=ordered
Sep 23 12:56:49 36com72 mesos-agent[15633]: 43 1 0:37 / /srv rw,relatime 
shared:28 - nfs4 10.36.14.18:/srv/hosts/36com72 
rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18
Sep 23 12:56:49 36com72 mesos-agent[15633]: 44 1 0:37 / /srv-master rw,relatime 
shared:29 - nfs4 10.36.14.18:/srv 
rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18
Sep 23 12:56:49 36com72 mesos-agent[15633]: 45 38 0:36 / 

[jira] [Issue Comment Deleted] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Babrou updated MESOS-6118:
--
Comment: was deleted

(was: I also experience this issue:

{noformat}
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763520  4995 
slave.cpp:3211] Handling status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 from executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763664  4991 
slave.cpp:6014] Terminating task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763825  5002 
docker.cpp:972] Running docker -H unix:///var/run/docker.sock inspect 
mesos-dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2.c40cea8c-31a9-468f-a183-ed9851cd5aa8
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821267  4987 
status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821296  4987 
status_update_manager.cpp:825] Checkpointing UPDATE for status update 
TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844871  4987 
status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to the agent
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844970  5009 
slave.cpp:3604] Forwarding the update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to master@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845062  5009 
slave.cpp:3498] Status update manager successfully handled status update 
TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845074  5009 
slave.cpp:3514] Sending acknowledgement for status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.864859  4987 
slave.cpp:3686] Received ping from slave-observer(149)@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.955936  4995 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.956001  4995 
status_update_manager.cpp:825] Checkpointing ACK for status update TASK_FAILED 
(UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.982950  4995 
status_update_manager.cpp:528] Cleaning up status update stream for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983119  4995 
slave.cpp:2597] Status update manager successfully handled status update 
acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983131  4995 
slave.cpp:6055] Completing task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667191  4981 
process.cpp:3323] Handling HTTP event for process 'slave(1)' with path: 
'/slave(1)/state'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667413  4983 
http.cpp:270] HTTP GET for /slave(1)/state from 10.10.19.24:33570 with 
User-Agent='Go-http-client/1.1'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.669677  5012 
process.cpp:3323] Handling HTTP event for process 'files' with path: 
'/files/download'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.670250  5005 
process.cpp:1280] Sending file at 

[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516233#comment-15516233
 ] 

Ian Babrou commented on MESOS-6118:
---

I've tried it and it didn't fix the issue for me:

{noformat}
Sep 23 11:52:01 myhost mesos-agent[10627]: F0923 11:52:01.524873 10648 
fs.cpp:140] Check failed: !visitedParents.contains(parentId)
Sep 23 11:52:01 myhost mesos-agent[10627]: *** Check failure stack trace: ***
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a253d  
google::LogMessage::Fail()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a41bd  
google::LogMessage::SendToLog()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a2102  
google::LogMessage::Flush()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a4ba9  
google::LogMessageFatal::~LogMessageFatal()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb07183d  
_ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb0717a5  
_ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb078c5a  
mesos::internal::fs::MountInfoTable::read()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae2c346  
mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae48157  
mesos::internal::slave::DockerContainerizerProcess::___destroy()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb546094  
process::ProcessManager::resume()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5463b7  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b9c20970  (unknown)
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b973f0a4  start_thread
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b947487d  (unknown)
{noformat}

/proc/mounts:

{noformat}
rootfs / rootfs rw,size=65513288k,nr_inodes=16378322 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=65513304k,nr_inodes=16378326,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup 
rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup 
rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup 
rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup 
rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs 
rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
/dev/md127 /state ext4 rw,relatime,stripe=384,data=ordered 0 0
10.10.14.18:/srv/hosts/myhost /srv nfs4 
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18
 0 0
10.10.14.18:/srv /srv-master nfs4 
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18
 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
{noformat}

Build procedure changes:

{noformat}
 build:
	cd $(BUILDDIR)/mesos && \
+	patch -p1 < $(TOP)/rb51620.patch && \
	autoreconf -f -i -Wall,no-obsolete && \
	./bootstrap  

[jira] [Issue Comment Deleted] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Babrou updated MESOS-6118:
--
Comment: was deleted

(was: I've tried it and it didn't fix the issue for me:

{noformat}
Sep 23 11:52:01 myhost mesos-agent[10627]: F0923 11:52:01.524873 10648 
fs.cpp:140] Check failed: !visitedParents.contains(parentId)
Sep 23 11:52:01 myhost mesos-agent[10627]: *** Check failure stack trace: ***
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a253d  
google::LogMessage::Fail()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a41bd  
google::LogMessage::SendToLog()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a2102  
google::LogMessage::Flush()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a4ba9  
google::LogMessageFatal::~LogMessageFatal()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb07183d  
_ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb0717a5  
_ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb078c5a  
mesos::internal::fs::MountInfoTable::read()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae2c346  
mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae48157  
mesos::internal::slave::DockerContainerizerProcess::___destroy()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb546094  
process::ProcessManager::resume()
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5463b7  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b9c20970  (unknown)
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b973f0a4  start_thread
Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b947487d  (unknown)
{noformat}

/proc/mounts:

{noformat}
rootfs / rootfs rw,size=65513288k,nr_inodes=16378322 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=65513304k,nr_inodes=16378326,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup 
rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup 
rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup 
rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup 
rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs 
rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
/dev/md127 /state ext4 rw,relatime,stripe=384,data=ordered 0 0
10.10.14.18:/srv/hosts/myhost /srv nfs4 
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18
 0 0
10.10.14.18:/srv /srv-master nfs4 
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18
 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
{noformat}

Build procedure changes:

{noformat}
 build:
	cd $(BUILDDIR)/mesos && \
+	patch -p1 < $(TOP)/rb51620.patch && \
	autoreconf -f -i -Wall,no-obsolete && \
	./bootstrap   

[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-09-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516215#comment-15516215
 ] 

Alexander Rukletsov commented on MESOS-6184:


Once we transition to a general solution, there will no longer be any need to 
expose {{defaultClone}}. See https://reviews.apache.org/r/51636/.

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere
> Fix For: 1.1.0
>
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, the health check uses a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.  
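
For readers unfamiliar with the mechanism, a standalone illustration of what 
entering another process' namespace involves at the libc level (plain 
{{setns(2)}}, not the stout {{ns::setns}} wrapper):

{code}
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

// Join the network namespace of `pid`. This mirrors what the health
// checker's cloned child does for each namespace before running the check.
int enter_net_namespace(pid_t pid)
{
  char path[64];
  snprintf(path, sizeof(path), "/proc/%d/ns/net", (int)pid);

  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }

  int rc = setns(fd, CLONE_NEWNET);  // Returns 0 on success.
  close(fd);
  return rc;
}
{code}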



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-09-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6184:
---
Summary: Health checks should use a general mechanism to enter namespaces 
of the task.  (was: Change health check to use childHooks to enter the 
namespaces of the container)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere
> Fix For: 1.1.0
>
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, the health check uses a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6184) Change health check to use childHooks to enter the namespaces of the container

2016-09-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6184:
---
 Shepherd: Alexander Rukletsov
 Story Points: 3
   Labels: health-check mesosphere  (was: health-check)
Fix Version/s: 1.1.0

> Change health check to use childHooks to enter the namespaces of the container
> --
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere
> Fix For: 1.1.0
>
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, the health check uses a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-09-23 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516181#comment-15516181
 ] 

Ian Babrou commented on MESOS-6118:
---

I also experience this issue:

{noformat}
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763520  4995 
slave.cpp:3211] Handling status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 from executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763664  4991 
slave.cpp:6014] Terminating task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763825  5002 
docker.cpp:972] Running docker -H unix:///var/run/docker.sock inspect 
mesos-dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2.c40cea8c-31a9-468f-a183-ed9851cd5aa8
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821267  4987 
status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821296  4987 
status_update_manager.cpp:825] Checkpointing UPDATE for status update 
TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844871  4987 
status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to the agent
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844970  5009 
slave.cpp:3604] Forwarding the update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to master@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845062  5009 
slave.cpp:3498] Status update manager successfully handled status update 
TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845074  5009 
slave.cpp:3514] Sending acknowledgement for status update TASK_FAILED (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001 to executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.864859  4987 
slave.cpp:3686] Received ping from slave-observer(149)@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.955936  4995 
status_update_manager.cpp:392] Received status update acknowledgement (UUID: 
084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.956001  4995 
status_update_manager.cpp:825] Checkpointing ACK for status update TASK_FAILED 
(UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.982950  4995 
status_update_manager.cpp:528] Cleaning up status update stream for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983119  4995 
slave.cpp:2597] Status update manager successfully handled status update 
acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 
20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983131  4995 
slave.cpp:6055] Completing task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667191  4981 
process.cpp:3323] Handling HTTP event for process 'slave(1)' with path: 
'/slave(1)/state'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667413  4983 
http.cpp:270] HTTP GET for /slave(1)/state from 10.10.19.24:33570 with 
User-Agent='Go-http-client/1.1'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.669677  5012 
process.cpp:3323] Handling HTTP event for process 'files' with path: 
'/files/download'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.670250  5005 
process.cpp:1280] Sending file at 

[jira] [Updated] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2016-09-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6236:
---
Description: 
Currently there is no standard way in Mesos to launch a child process in a 
different namespace (e.g. {{net}}, {{mnt}}). A user may leverage {{Subprocess}} 
and provide their own {{clone}} callback, but this approach is error-prone.

One possible solution is to implement a {{Subprocess}} child hook. In 
[MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we 
introduced a child hook framework in subprocess and implemented three child 
hooks: {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing another 
child hook, {{SETNS}}, so that other components (e.g., health check) can call 
it to enter the namespaces of a specific process.

  was:In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have 
introduced a child hook framework in subprocess and implemented three child 
hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to 
introduce another child hook {{SETNS}} so that other components (e.g., health 
check) can call it to enter the namespaces of a specific process.


> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently there is no standard way in Mesos to launch a child process in a 
> different namespace (e.g. {{net}}, {{mnt}}). A user may leverage 
> {{Subprocess}} and provide their own {{clone}} callback, but this approach is 
> error-prone.
> One possible solution is to implement a {{Subprocess}} child hook. In 
> [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we 
> introduced a child hook framework in subprocess and implemented three child 
> hooks: {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing 
> another child hook, {{SETNS}}, so that other components (e.g., health check) 
> can call it to enter the namespaces of a specific process.
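
A hedged sketch of what the proposed hook could look like at a call site; 
{{ChildHook::SETNS}} is an assumed name, chosen by analogy with the existing 
hooks, and is not yet a real API:

{code}
// Assumed API sketch only; the eventual ChildHook interface may differ.
std::vector<process::Subprocess::ChildHook> hooks;

foreach (const std::string& ns, namespaces) {
  // Hypothetical factory, analogous to ChildHook::CHDIR() and SETSID():
  // enter the given namespace of `taskPid` in the child before exec.
  hooks.push_back(process::Subprocess::ChildHook::SETNS(taskPid, ns));
}

// The hooks would then be passed to process::subprocess(...), replacing
// custom clone callbacks like the one quoted in MESOS-6184.
{code}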



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2016-09-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6236:
---
Summary: Launch subprocesses associated with specified namespaces.  (was: 
Introduce SETNS child hook in subprocess)

> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have 
> introduced a child hook framework in subprocess and implemented three child 
> hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to 
> introduce another child hook {{SETNS}} so that other components (e.g., health 
> check) can call it to enter the namespaces of a specific process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6236) Introduce SETNS child hook in subprocess

2016-09-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6236:
---
 Story Points: 8
   Labels: mesosphere  (was: )
Fix Version/s: 1.1.0

> Introduce SETNS child hook in subprocess
> 
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have 
> introduced a child hook framework in subprocess and implemented three child 
> hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to 
> introduce another child hook {{SETNS}} so that other components (e.g., health 
> check) can call it to enter the namespaces of a specific process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)