[jira] [Comment Edited] (MESOS-7176) Add versioning support to network/cni isolator

2018-07-19 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374228#comment-16374228
 ] 

Qian Zhang edited comment on MESOS-7176 at 7/20/18 1:19 AM:


According to the [CNI 
spec|https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions],
 one of the major changes introduced in CNI spec 0.3.0 is a richer result type 
(see 
[https://github.com/containernetworking/cni/blob/spec-v0.3.0/SPEC.md#result|https://github.com/containernetworking/cni/blob/spec-v0.3.0/SPEC.md#result]),
 which differs from the result type in CNI spec 0.2.0. The CNI isolator in 
Mesos currently uses CNI spec 0.2.0; see 
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.proto#L63:L67]
 for details.

As a result, the CNI isolator currently cannot support a CNI network 
configuration whose version is 0.3.0 or later. If the isolator invokes a CNI 
plugin (assuming the plugin also supports CNI spec 0.3.0+) with such a 
configuration as its input (see the example below), the plugin returns a 
result that conforms to the same CNI spec version as the input configuration 
(0.3.0 in the example below). The isolator, however, always parses the result 
as CNI spec 0.2.0 (see 
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/mesos/isolators/network/cni/spec.cpp#L46:L59]
 for details), so parsing fails.
{code:java}
{
  "cniVersion": "0.3.0",
  "name": "dbnet",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "dhcp"
  }
}{code}
So I think we should improve the CNI isolator to support CNI spec 0.3.0 as 
well, and parse the result returned by the CNI plugin according to the CNI 
spec version of that result.
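
The version-based parsing proposed above could be sketched roughly as follows. This is a minimal illustration, not Mesos code: {{ResultFormat}}, {{parseVersion}} and {{selectResultFormat}} are hypothetical names, and the sketch assumes that every version from 0.3.0 onward uses the richer result format.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

// Hypothetical names for illustration; these are not Mesos identifiers.
enum class ResultFormat { LEGACY_0_2, RICH_0_3 };

using Version = std::tuple<int, int, int>;

// Parses a "major.minor.patch" string. Returns false on malformed input.
bool parseVersion(const std::string& text, Version* out)
{
  assert(out != nullptr);
  int a = 0, b = 0, c = 0;
  char trailing = '\0';
  // Exactly three conversions must succeed; a fourth means trailing junk.
  if (std::sscanf(text.c_str(), "%d.%d.%d%c", &a, &b, &c, &trailing) != 3) {
    return false;
  }
  *out = std::make_tuple(a, b, c);
  return true;
}

// Chooses a parser for a plugin result from the result's `cniVersion`.
// Assumption: every version from 0.3.0 onward uses the richer format.
bool selectResultFormat(const std::string& cniVersion, ResultFormat* format)
{
  assert(format != nullptr);
  Version version;
  if (!parseVersion(cniVersion, &version)) {
    return false;
  }
  *format = version >= std::make_tuple(0, 3, 0) ? ResultFormat::RICH_0_3
                                                : ResultFormat::LEGACY_0_2;
  return true;
}
```

A caller would read {{cniVersion}} from the returned result, call {{selectResultFormat}}, and then hand the payload to the matching parser; a malformed version is reported as an error instead of silently falling back to 0.2.0.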



> Add versioning support to network/cni isolator
> --
>
> Key: MESOS-7176
> URL: https://issues.apache.org/jira/browse/MESOS-7176
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Deepak Goel
>Priority: Major
>
> Currently the network/cni isolator supports CNI spec version 0.2. CNI spec 
> version 0.3 has already been ratified and introduces new features such 
> as CNI service chaining and CNI plugin capabilities. However, CNI spec 
> version 0.3 is incompatible with CNI spec 0.2. Hence we need to introduce 
> versioning support in the `network/cni` isolator in order to keep it backward 
> compatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-5818) Port libprocess reap_tests.cpp

2018-07-19 Thread Andrew Schwartzmeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549979#comment-16549979
 ] 

Andrew Schwartzmeyer commented on MESOS-5818:
-

These are annoying because they use the {{Fork}} and {{Exec}} constructs which 
we chose not to port.

> Port libprocess reap_tests.cpp
> --
>
> Key: MESOS-5818
> URL: https://issues.apache.org/jira/browse/MESOS-5818
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrew Schwartzmeyer
>Assignee: Eric Mumau
>Priority: Major
>  Labels: libprocess, mesosphere, windows
>






[jira] [Assigned] (MESOS-5819) Port libprocess sequence_tests.cpp

2018-07-19 Thread Andrew Schwartzmeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5819:
---

Assignee: Andrew Schwartzmeyer  (was: Eric Mumau)

> Port libprocess sequence_tests.cpp
> --
>
> Key: MESOS-5819
> URL: https://issues.apache.org/jira/browse/MESOS-5819
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Major
>  Labels: libprocess, mesosphere, windows
>






[jira] [Created] (MESOS-9098) `os::clone` returns `Failed to clone: Success` on error.

2018-07-19 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9098:
--

 Summary: `os::clone` returns `Failed to clone: Success` on error.
 Key: MESOS-9098
 URL: https://issues.apache.org/jira/browse/MESOS-9098
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Affects Versions: 1.7.0
Reporter: Chun-Hung Hsiao


{{os::clone}} in stout is implemented in such a way that when {{::clone}} 
fails, it calls {{::munmap}} to free the allocated stack memory, which 
overwrites {{errno}} and causes it to return a {{Failed to clone: Success}} error:
[https://github.com/apache/mesos/blob/master/3rdparty/stout/include/stout/os/linux.hpp#L165]
We should preserve {{errno}} before calling {{::munmap}}, and return 
{{::munmap}}'s {{errno}} only if {{::clone}}'s {{errno}} is not zero.
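
The idea of preserving {{errno}} across the cleanup call can be sketched as below. This is only an illustration and not the stout implementation: {{cleanupPreservingErrno}}, {{failingCall}} and {{clobberingCleanup}} are hypothetical stand-ins, and the cleanup here overwrites {{errno}} with an arbitrary value just to make the effect visible.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Runs `cleanup` while keeping the `errno` value that an earlier failed
// call left behind. Returns the preserved value.
int cleanupPreservingErrno(const std::function<void()>& cleanup)
{
  assert(static_cast<bool>(cleanup));
  const int saved = errno;  // Capture before the cleanup can overwrite it.
  cleanup();
  errno = saved;            // Restore so the caller reports the real cause.
  return saved;
}

// Hypothetical stand-in for a call that fails and sets `errno`.
int failingCall()
{
  errno = ENOMEM;
  return -1;
}

// Hypothetical stand-in for a cleanup call that overwrites `errno`.
void clobberingCleanup()
{
  errno = EINVAL;
}
```

After {{failingCall()}} fails, running the cleanup through {{cleanupPreservingErrno}} leaves {{errno}} at the value set by the failed call, so an error message built from it names the original failure.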





[jira] [Commented] (MESOS-9097) `libwinio_loop` must be initialized before `Socket` constructor is called

2018-07-19 Thread Andrew Schwartzmeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549836#comment-16549836
 ] 

Andrew Schwartzmeyer commented on MESOS-9097:
-

https://reviews.apache.org/r/67976/
https://reviews.apache.org/r/67977/

> `libwinio_loop` must be initialized before `Socket` constructor is called
> -
>
> Key: MESOS-9097
> URL: https://issues.apache.org/jira/browse/MESOS-9097
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.7.0
> Environment: Windows with {{-DENABLE_LIBWINIO=ON}}
>Reporter: Andrew Schwartzmeyer
>Assignee: Akash Gupta
>Priority: Major
>  Labels: libprocess, windows
>
> When building with {{-DENABLE_LIBWINIO}}, initializing the Windows event loop 
> (specifically the pointer {{process::libwinio_loop}}) becomes a prerequisite 
> to creating a {{Socket}}. If it has not been initialized, then when the 
> {{Socket}} constructor calls {{prepare_async()}}, a null pointer is 
> dereferenced, leading to a hang on Windows.
> This was discovered in the simple program {{test-linkee}} where a {{Socket}} 
> is created and used, but the entire libprocess event loop is unused. This is 
> temporarily fixed by calling {{process::initialize()}} early in 
> {{test-linkee}}, but this should probably not be required. Instead, 
> {{prepare_async()}} (or any use of {{libwinio_loop}}) should probably 
> auto-initialize the event loop if required.
> For now, I am adding fatal checks before a null pointer dereference.





[jira] [Commented] (MESOS-9095) Consider including public protobuf definitions in generated jar

2018-07-19 Thread Tim Harper (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549695#comment-16549695
 ] 

Tim Harper commented on MESOS-9095:
---

Thank you for filing this, Benjamin. This will be really helpful.

Currently, Marathon does what you say (we copy the Proto sources into our own 
code base, and check in the generated code).

> Consider including public protobuf definitions in generated jar
> ---
>
> Key: MESOS-9095
> URL: https://issues.apache.org/jira/browse/MESOS-9095
> Project: Mesos
>  Issue Type: Improvement
>  Components: java api
>Reporter: Benjamin Bannier
>Priority: Major
>
> We currently do not package public proto sources alongside other resources in 
> the jar. This is inconsistent with what we do e.g., for packages or {{install 
> rules}} on the C++ side.
> Frameworks seem to work around this by forking required proto sources into 
> their own source code, or (slightly better) fetching them from 
> potentially poorly versioned internet resources. Both approaches can lead to 
> complicated dependencies between the jar in use and the proto sources.
> We should include them in the jar we publish, e.g., by declaring them as 
> {{resources}}.





[jira] [Assigned] (MESOS-9097) `libwinio_loop` must be initialized before `Socket` constructor is called

2018-07-19 Thread Andrew Schwartzmeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-9097:
---

Assignee: Akash Gupta

> `libwinio_loop` must be initialized before `Socket` constructor is called
> -
>
> Key: MESOS-9097
> URL: https://issues.apache.org/jira/browse/MESOS-9097
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.7.0
> Environment: Windows with {{-DENABLE_LIBWINIO=ON}}
>Reporter: Andrew Schwartzmeyer
>Assignee: Akash Gupta
>Priority: Major
>  Labels: libprocess, windows
>
> When building with {{-DENABLE_LIBWINIO}}, initializing the Windows event loop 
> (specifically the pointer {{process::libwinio_loop}}) becomes a prerequisite 
> to creating a {{Socket}}. If it has not been initialized, then when the 
> {{Socket}} constructor calls {{prepare_async()}}, a null pointer is 
> dereferenced, leading to a hang on Windows.
> This was discovered in the simple program {{test-linkee}} where a {{Socket}} 
> is created and used, but the entire libprocess event loop is unused. This is 
> temporarily fixed by calling {{process::initialize()}} early in 
> {{test-linkee}}, but this should probably not be required. Instead, 
> {{prepare_async()}} (or any use of {{libwinio_loop}}) should probably 
> auto-initialize the event loop if required.
> For now, I am adding fatal checks before a null pointer dereference.





[jira] [Created] (MESOS-9097) `libwinio_loop` must be initialized before `Socket` constructor is called

2018-07-19 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-9097:
---

 Summary: `libwinio_loop` must be initialized before `Socket` 
constructor is called
 Key: MESOS-9097
 URL: https://issues.apache.org/jira/browse/MESOS-9097
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.7.0
 Environment: Windows with {{-DENABLE_LIBWINIO=ON}}
Reporter: Andrew Schwartzmeyer


When building with {{-DENABLE_LIBWINIO}}, initializing the Windows event loop 
(specifically the pointer {{process::libwinio_loop}}) becomes a prerequisite to 
creating a {{Socket}}. If it has not been initialized, then when the {{Socket}} 
constructor calls {{prepare_async()}}, a null pointer is dereferenced, leading 
to a hang on Windows.

This was discovered in the simple program {{test-linkee}} where a {{Socket}} is 
created and used, but the entire libprocess event loop is unused. This is 
temporarily fixed by calling {{process::initialize()}} early in 
{{test-linkee}}, but this should probably not be required. Instead, 
{{prepare_async()}} (or any use of {{libwinio_loop}}) should probably 
auto-initialize the event loop if required.

For now, I am adding fatal checks before a null pointer dereference.
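
One way the suggested auto-initialization could look is a lazily created loop behind an accessor. This is a minimal sketch and not libprocess code: {{EventLoop}}, {{eventLoop()}} and {{prepareAsync()}} are hypothetical stand-ins for the real types and functions, and the sketch relies on C++11 guaranteeing thread-safe initialization of function-local statics.

```cpp
#include <cassert>

// Hypothetical stand-in for the Windows event loop type.
struct EventLoop
{
  int registered = 0;
  void registerHandle() { ++registered; }
};

// Returns the process-wide loop, creating it on first use. The
// function-local static is initialized exactly once, even when several
// threads race to make the first call.
EventLoop* eventLoop()
{
  static EventLoop* loop = new EventLoop();
  return loop;
}

// Hypothetical stand-in for `prepare_async()`: it goes through the
// accessor, so it can never observe an uninitialized loop pointer.
void prepareAsync()
{
  EventLoop* loop = eventLoop();
  assert(loop != nullptr);
  loop->registerHandle();
}
```

With this shape, a small program that only creates a socket would not need an explicit initialization call first, because the first use of the loop creates it.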





[jira] [Created] (MESOS-9096) Consider introducing a linter to check changes to tag numbers in public protos

2018-07-19 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-9096:
---

 Summary: Consider introducing a linter to check changes to tag 
numbers in public protos
 Key: MESOS-9096
 URL: https://issues.apache.org/jira/browse/MESOS-9096
 Project: Mesos
  Issue Type: Improvement
  Components: build
Reporter: Benjamin Bannier


Right now, detecting breaking changes to proto messages where a tag number 
changes requires manual inspection. It should be possible to write a 
proto linter that detects such changes.

It could implement the following flow:
* check if the proto is public, e.g., in some public include path
* check that the diff contains no changes to tag numbers (same field name, 
similar location).

We should also check whether such tools already exist and we could add them.
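
The tag-number check in the flow above could be sketched as follows. This is a rough illustration under stated assumptions, not an existing tool: {{fieldTags}} and {{changedTags}} are hypothetical names, the regular expression only recognizes simple {{name = number;}} declarations (so it also picks up enum values), and fields are keyed by name alone, which ignores the message they belong to.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Collects `name -> tag number` pairs from proto source text.
// Limitation: names are not qualified by their enclosing message.
std::map<std::string, int> fieldTags(const std::string& proto)
{
  static const std::regex field(R"((\w+)\s*=\s*(\d+)\s*(?:;|\[))");
  std::map<std::string, int> tags;
  for (std::sregex_iterator it(proto.begin(), proto.end(), field), end;
       it != end; ++it) {
    assert(it->size() == 3);  // Whole match plus the two capture groups.
    tags[(*it)[1].str()] = std::stoi((*it)[2].str());
  }
  return tags;
}

// Returns the names whose tag number differs between two versions of a
// proto file. Fields that were added or removed are not reported.
std::vector<std::string> changedTags(
    const std::string& before, const std::string& after)
{
  const std::map<std::string, int> oldTags = fieldTags(before);
  const std::map<std::string, int> newTags = fieldTags(after);
  std::vector<std::string> changed;
  for (const auto& entry : oldTags) {
    const auto found = newTags.find(entry.first);
    if (found != newTags.end() && found->second != entry.second) {
      changed.push_back(entry.first);
    }
  }
  return changed;
}
```

A real linter would more likely work on parsed descriptors than on regular expressions, which would also cover the "similar location" part of the check; the sketch only shows the comparison step.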





[jira] [Assigned] (MESOS-2633) Move implementations of Framework struct functions out of master.hpp

2018-07-19 Thread Alexander Rukletsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-2633:
--

Assignee: (was: Isabel Jimenez)

> Move implementations of Framework struct functions out of master.hpp
> 
>
> Key: MESOS-2633
> URL: https://issues.apache.org/jira/browse/MESOS-2633
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Trivial
>  Labels: master, newbie, tech-debt, trivial
>
> To help reduce compile time and keep the header easy to read, let's move the 
> implementations of the Framework struct functions out of master.hpp





[jira] [Created] (MESOS-9095) Consider including public protobuf definitions in generated jar

2018-07-19 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-9095:
---

 Summary: Consider including public protobuf definitions in 
generated jar
 Key: MESOS-9095
 URL: https://issues.apache.org/jira/browse/MESOS-9095
 Project: Mesos
  Issue Type: Improvement
  Components: java api
Reporter: Benjamin Bannier


We currently do not package public proto sources alongside other resources in 
the jar. This is inconsistent with what we do e.g., for packages or {{install 
rules}} on the C++ side.

Frameworks seem to work around this by forking required proto sources into 
their own source code, or (slightly better) fetching them from potentially 
poorly versioned internet resources. Both approaches can lead to complicated 
dependencies between the jar in use and the proto sources.

We should include them in the jar we publish, e.g., by declaring them as 
{{resources}}.





[jira] [Commented] (MESOS-9094) On macOS libprocess_tests fail to link when compiling with gRPC

2018-07-19 Thread Jan Schlicht (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548922#comment-16548922
 ] 

Jan Schlicht commented on MESOS-9094:
-

cc [~chhsia0]. Found https://grpc.io/grpc/cpp/classgrpc_1_1_time_point.html 
which seems to be related.

> On macOS libprocess_tests fail to link when compiling with gRPC
> ---
>
> Key: MESOS-9094
> URL: https://issues.apache.org/jira/browse/MESOS-9094
> Project: Mesos
>  Issue Type: Bug
> Environment: macOS 10.13.6 with clang 6.0.1.
>Reporter: Jan Schlicht
>Priority: Major
> Fix For: 1.7.0
>
>
> Seems like this was introduced with commit 
> {{a211b4cadf289168464fc50987255d883c226e89}}. Linking {{libprocess-tests}} on 
> macOS with gRPC enabled fails with
> {noformat}
> Undefined symbols for architecture x86_64:
>   
> "grpc::TimePoint std::__1::chrono::duration > > 
> >::you_need_a_specialization_of_TimePoint()", referenced from:
>   process::Future > 
> process::grpc::client::Runtime::call,
>  std::__1::default_delete > > 
> (tests::PingPong::Stub::*)(grpc::ClientContext*, tests::Ping const&, 
> grpc::CompletionQueue*), tests::Ping, tests::Pong, 
> 0>(process::grpc::client::Connection const&, 
> std::__1::unique_ptr, 
> std::__1::default_delete > > 
> (tests::PingPong::Stub::*&&)(grpc::ClientContext*, tests::Ping const&, 
> grpc::CompletionQueue*), tests::Ping&&, process::grpc::client::CallOptions 
> const&)::'lambda'(tests::Ping const&, bool, 
> grpc::CompletionQueue*)::operator()(tests::Ping const&, bool, 
> grpc::CompletionQueue*) const in libprocess_tests-grpc_tests.o
> ld: symbol(s) not found for architecture x86_64
> clang-6.0: error: linker command failed with exit code 1 (use -v to see 
> invocation)
> {noformat}





[jira] [Created] (MESOS-9094) On macOS libprocess_tests fail to link when compiling with gRPC

2018-07-19 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-9094:
---

 Summary: On macOS libprocess_tests fail to link when compiling 
with gRPC
 Key: MESOS-9094
 URL: https://issues.apache.org/jira/browse/MESOS-9094
 Project: Mesos
  Issue Type: Bug
 Environment: macOS 10.13.6 with clang 6.0.1.
Reporter: Jan Schlicht
 Fix For: 1.7.0


Seems like this was introduced with commit 
{{a211b4cadf289168464fc50987255d883c226e89}}. Linking {{libprocess-tests}} on 
macOS with gRPC enabled fails with
{noformat}
Undefined symbols for architecture x86_64:
  "grpc::TimePoint > > 
>::you_need_a_specialization_of_TimePoint()", referenced from:
  process::Future > 
process::grpc::client::Runtime::call,
 std::__1::default_delete > > 
(tests::PingPong::Stub::*)(grpc::ClientContext*, tests::Ping const&, 
grpc::CompletionQueue*), tests::Ping, tests::Pong, 
0>(process::grpc::client::Connection const&, 
std::__1::unique_ptr, 
std::__1::default_delete > > 
(tests::PingPong::Stub::*&&)(grpc::ClientContext*, tests::Ping const&, 
grpc::CompletionQueue*), tests::Ping&&, process::grpc::client::CallOptions 
const&)::'lambda'(tests::Ping const&, bool, 
grpc::CompletionQueue*)::operator()(tests::Ping const&, bool, 
grpc::CompletionQueue*) const in libprocess_tests-grpc_tests.o
ld: symbol(s) not found for architecture x86_64
clang-6.0: error: linker command failed with exit code 1 (use -v to see 
invocation)
{noformat}


