[jira] [Created] (MESOS-5580) Implement authn/authz for the network/cni isolator

2016-06-08 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5580:


 Summary: Implement authn/authz for the network/cni isolator
 Key: MESOS-5580
 URL: https://issues.apache.org/jira/browse/MESOS-5580
 Project: Mesos
  Issue Type: Task
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Currently any framework can launch containers on any CNI network, irrespective 
of its role and principal. We need to perform authn/authz in the network/cni 
isolator (or Master) to make sure that only roles/principals specified by the 
operator can launch containers on a given network. 
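
As a rough illustration of the check described above, here is a minimal sketch in Python. The ACL layout, the network names, and the `may_join_network` helper are all hypothetical; Mesos does not define them, and the real check would live in the isolator or the master.

```python
# Hypothetical operator-supplied ACL: CNI network name -> allowed roles/principals.
NETWORK_ACLS = {
    "net-prod": {"roles": {"prod"}, "principals": {"marathon"}},
    "net-dev": {"roles": {"dev", "test"}, "principals": {"chronos", "marathon"}},
}


def may_join_network(network, role, principal, acls=NETWORK_ACLS):
    """Return True only if both the role and the principal are allowed."""
    acl = acls.get(network)
    if acl is None:
        # Assumed default for this sketch: deny networks without an ACL entry.
        return False
    return role in acl["roles"] and principal in acl["principals"]
```

Denying networks that have no ACL entry is an assumption of this sketch; an operator might prefer the opposite default.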



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5579) Support static IP address allocation with `DockerContainerizer`

2016-06-08 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5579:


 Summary: Support static IP address allocation with 
`DockerContainerizer`
 Key: MESOS-5579
 URL: https://issues.apache.org/jira/browse/MESOS-5579
 Project: Mesos
  Issue Type: Task
 Environment: Linux
Reporter: Avinash Sridharan


`docker run` supports the `--ip` option to allocate a specific IPv4 address to 
the container. Also, the `NetworkInfo` protobuf has an `ipaddress` field that 
allows frameworks to specify an IP address for the container. The docker executor 
should therefore invoke the `docker run` command with the `--ip` option whenever 
the `ipaddress` field of the `NetworkInfo` is set, allowing frameworks to try 
and assign a static IP address for their services.
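
The proposed behaviour can be sketched as a small argument builder. This is only an illustration in Python, not the docker executor's code: `docker_run_args` and its parameters are hypothetical names, and the sketch assumes the container is attached to a user-defined Docker network, since Docker rejects `--ip` on the default networks.

```python
def docker_run_args(image, network, ip_address=None):
    """Build a `docker run` argv, adding --ip only when an address was requested."""
    args = ["docker", "run", "--net", network]
    if ip_address:
        # Corresponds to the framework having set the `ipaddress` field.
        args += ["--ip", ip_address]
    args.append(image)
    return args
```

When the field is unset the command line is unchanged, so existing frameworks would see no difference.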





[jira] [Updated] (MESOS-5578) Support static address allocation in CNI

2016-06-08 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5578:
-
Story Points: 1

> Support static address allocation in CNI
> 
>
> Key: MESOS-5578
> URL: https://issues.apache.org/jira/browse/MESOS-5578
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently a framework can't specify a static IP address for the container 
> when using the network/cni isolator.
> The `ipaddress` field in the `NetworkInfo` protobuf was designed for this 
> specific purpose, but since the CNI spec does not specify a means to allocate 
> an IP address to the container, the `network/cni` isolator cannot honor this 
> field even when it is filled in by the framework.
> Creating this ticket to act as a placeholder to track this limitation. As 
> and when the CNI spec allows us to specify a static IP address for the 
> container, we can resolve this ticket. 





[jira] [Created] (MESOS-5578) Support static address allocation in CNI

2016-06-08 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5578:


 Summary: Support static address allocation in CNI
 Key: MESOS-5578
 URL: https://issues.apache.org/jira/browse/MESOS-5578
 Project: Mesos
  Issue Type: Task
  Components: containerization
Affects Versions: 1.0.0
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Currently a framework can't specify a static IP address for the container when 
using the network/cni isolator.

The `ipaddress` field in the `NetworkInfo` protobuf was designed for this 
specific purpose, but since the CNI spec does not specify a means to allocate an 
IP address to the container, the `network/cni` isolator cannot honor this field 
even when it is filled in by the framework.

Creating this ticket to act as a placeholder to track this limitation. As and 
when the CNI spec allows us to specify a static IP address for the container, 
we can resolve this ticket. 





[jira] [Updated] (MESOS-5578) Support static address allocation in CNI

2016-06-08 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5578:
-
Labels: mesosphere  (was: )

> Support static address allocation in CNI
> 
>
> Key: MESOS-5578
> URL: https://issues.apache.org/jira/browse/MESOS-5578
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently a framework can't specify a static IP address for the container 
> when using the network/cni isolator.
> The `ipaddress` field in the `NetworkInfo` protobuf was designed for this 
> specific purpose, but since the CNI spec does not specify a means to allocate 
> an IP address to the container, the `network/cni` isolator cannot honor this 
> field even when it is filled in by the framework.
> Creating this ticket to act as a placeholder to track this limitation. As 
> and when the CNI spec allows us to specify a static IP address for the 
> container, we can resolve this ticket. 





[jira] [Created] (MESOS-5577) Modules using replicated log state API require zookeeper headers

2016-06-08 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5577:


 Summary: Modules using replicated log state API require zookeeper 
headers
 Key: MESOS-5577
 URL: https://issues.apache.org/jira/browse/MESOS-5577
 Project: Mesos
  Issue Type: Bug
  Components: modules
Affects Versions: 1.0.0
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan
 Fix For: 1.0.0


The state API uses zookeeper client headers and hence the bundled zookeeper 
headers need to be installed during Mesos installation. 





[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
Description: 
We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |

Master 2 sends a series of messages to the recently-restarted Master 5.  The 
first message is dropped, but subsequent messages are not dropped.

This appears to be due to a stale link between the masters.  Before leader 
election, the replicated log actors create a network watcher, which adds links 
to masters that join the ZK group:
https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159

This link does not appear to break (Master 2 -> 5) when Master 5 goes down, 
perhaps due to how the network partition was induced (in the hypervisor layer, 
rather than in the VM itself).

When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
observe the [expected log 
message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494]

Instead, we see a log line in Master 2:
{code}
process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
{code}

The broken link is removed by the libprocess {{socket_manager}} and the 
following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket.
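
The failure mode described above (a stale link is only discovered on the first send, which is lost, while the next send goes through on a fresh socket) can be modelled with a toy example. This Python sketch is not libprocess code; `ToySocketManager` and its methods are hypothetical and only mimic the observed behaviour.

```python
class ToySocketManager:
    """Toy model: a cached link may be stale, and that is only noticed on send."""

    def __init__(self):
        self.links = {}  # peer -> True if the link is healthy, False if stale

    def link(self, peer):
        self.links.setdefault(peer, True)

    def partition(self, peer):
        # The peer disappears without the local side being told.
        if peer in self.links:
            self.links[peer] = False

    def send(self, peer, message):
        if self.links.get(peer) is False:
            # The send fails, the broken link is removed, the message is lost.
            del self.links[peer]
            return False
        # Reuse a healthy link, or open a new socket if none is cached.
        self.links[peer] = True
        return True
```

In this model, a sender that retries (or relinks before sending) would not lose the first message; whether that is the right fix for Mesos is left open here.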

  was:
We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |

Master 1 sends a series of messages to the recently-restarted Master 5.  The 
first message is dropped, but subsequent messages are not dropped.

This appears to be due to a stale link between the masters.  Before leader 
election, the replicated log actors create a network watcher, which adds links 
to masters that join the ZK group:
https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159

This link does not appear to break (Master 2 -> 5) when Master 5 goes down, 
perhaps due to how the network partition was induced (in the hypervisor layer, 
rather than in the VM itself).

When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
observe the [expected log 
message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494]

Instead, we see a log line in Master 2:
{code}
process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
{code}

The broken link is removed by the libprocess {{socket_manager}} and the 
following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket.


> Masters 

[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
Description: 
We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |

Master 1 sends a series of messages to the recently-restarted Master 5.  The 
first message is dropped, but subsequent messages are not dropped.

This appears to be due to a stale link between the masters.  Before leader 
election, the replicated log actors create a network watcher, which adds links 
to masters that join the ZK group:
https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159

This link does not appear to break (Master 2 -> 5) when Master 5 goes down, 
perhaps due to how the network partition was induced (in the hypervisor layer, 
rather than in the VM itself).

When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
observe the [expected log 
message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494]

Instead, we see a log line in Master 2:
{code}
process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
{code}

The broken link is removed by the libprocess {{socket_manager}} and the 
following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket.

  was:
We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |

Master 1 sends a series of messages to the recently-restarted Master 5.  The 
first message is dropped, but subsequent messages are not dropped.

This appears to be due to a stale link between the masters.  Before leader 
election, the replicated log actors create a network watcher, which adds links 
to masters that join the ZK group:
https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159

This link does not appear to break (Master 1 -> 5) when Master 5 goes down, 
perhaps due to how the network partition was induced (in the hypervisor layer, 
rather than in the VM itself).

When Master 1 tries to send a {{PromiseRequest}} to Master 5, we do not 
observe the [expected log 
message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494]

Instead, we see a log line in Master 1:
{code}
process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
{code}

The broken link is removed by the libprocess {{socket_manager}} and the 
following {{WriteRequest}} from Master 1 to Master 5 succeeds via a new socket.


> Masters 

[jira] [Comment Edited] (MESOS-5143) LostSlaveMessage should not be broadcasted.

2016-06-08 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275029#comment-15275029
 ] 

Anindya Sinha edited comment on MESOS-5143 at 6/9/16 1:31 AM:
--

RR published:
https://reviews.apache.org/r/48453/
https://reviews.apache.org/r/47082/


was (Author: anindya.sinha):
RR published:
https://reviews.apache.org/r/47082/

> LostSlaveMessage should not be broadcasted.
> ---
>
> Key: MESOS-5143
> URL: https://issues.apache.org/jira/browse/MESOS-5143
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Anindya Sinha
>
> Currently a {{LostSlaveMessage}} (in v1 it's a type of {{Event::Failure}}) is 
> broadcast to all registered frameworks in the cluster whenever a slave is 
> lost.
> This is unnecessary and kind of breaks the Mesos abstraction: frameworks are 
> given a slice of the cluster, not the entirety. They know about the slice 
> when offers are extended to them, so we shouldn't inform all of them whenever 
> any agent goes away.
> This message should instead be narrowcast to the frameworks that have a 
> stake in this agent: running tasks, pending offers, reservations, persistent 
> volumes, etc.
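
The narrowcast selection described in the ticket can be sketched as a simple filter. This is an illustration in Python, not the master's code: `frameworks_to_notify` and the shape of the `frameworks` mapping (framework id to stake kind to set of agent ids) are assumptions made for the example.

```python
def frameworks_to_notify(agent_id, frameworks):
    """Return the ids of frameworks that hold any kind of stake in the agent.

    `frameworks` maps a framework id to a dict of stake kinds (for example
    "tasks", "offers", "reservations", "volumes"), each a set of agent ids.
    """
    return sorted(
        framework_id
        for framework_id, stakes in frameworks.items()
        if any(agent_id in agents for agents in stakes.values())
    )
```

Frameworks with no tasks, offers, reservations or volumes on the lost agent are simply left out of the result.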





[jira] [Assigned] (MESOS-5575) Attempting to Parse PID logging is too verbose

2016-06-08 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-5575:
-

Assignee: Yan Xu

> Attempting to Parse PID logging is too verbose
> --
>
> Key: MESOS-5575
> URL: https://issues.apache.org/jira/browse/MESOS-5575
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Minor
>
> When you crank up the mesos log level to VLOG(2) the logs get flooded with 
> “Attempting to parse PID” messages.
> This line is logged whenever you create a PID/UPID from a string and in all 
> successful cases. Compared to other VLOG(2) logs this is less informative and 
> more frequent.
> We should change it to VLOG(3).





[jira] [Assigned] (MESOS-4952) Annoying image provisioner logging for when images are not used.

2016-06-08 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-4952:
-

Assignee: Yan Xu

> Annoying image provisioner logging for when images are not used.
> 
>
> Key: MESOS-4952
> URL: https://issues.apache.org/jira/browse/MESOS-4952
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Minor
>
> {{Provisioner::destroy()}} logs this message even when images are not used in 
> the Mesos cluster:
> {noformat:title=}
> Ignoring destroy request for unknown container 
> 597f511e-479d-4632-a3b9-43b1e368c744
> {noformat}
> See 
> [code|https://github.com/apache/mesos/blob/37958fd70de1998e6c29b643abd4f43dd1ef4c79/src/slave/containerizer/mesos/provisioner/provisioner.cpp#L306].
> This can be surprising and annoying to people who are not actually using this 
> feature. The container is totally valid; it's just not using images.
> Let's at least tune it down to VLOG(1).





[jira] [Updated] (MESOS-5574) Missing dependency on libdl

2016-06-08 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-5574:
--
Component/s: build

> Missing dependency on libdl
> ---
>
> Key: MESOS-5574
> URL: https://issues.apache.org/jira/browse/MESOS-5574
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.0.0
> Environment: CentOS5, devtoolset-2, gcc version 4.8.2 20140120 (Red 
> Hat 4.8.2-15) (GCC)
>Reporter: Ian Downes
>
> {noformat}
> $ make
> ...
> Making all in src
> make[1]: Entering directory `/home/idownes/workspace/mesos/build/src'
> make  all-am
> make[2]: Entering directory `/home/idownes/workspace/mesos/build/src'
> /bin/sh ../libtool --tag=CXX   --mode=link g++ -pthread -g1 -O0 
> -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed  -o mesos-local 
> local/mesos_local-main.o libmesos.la  -lz -lsvn_delta-1 -lsvn_subr-1 
> -lsasl2 -lcurl -lapr-1 -lz  -lrt
> libtool: link: g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
> -Wl,--as-needed -o .libs/mesos-local local/mesos_local-main.o  
> ./.libs/libmesos.so /usr/lib64/libsvn_delta-1.so /usr/lib64/libsvn_subr-1.so 
> /usr/lib64/libaprutil-1.so -lcrypt -lexpat -ldb-4.7 -lsasl2 -lcurl 
> /usr/lib64/libapr-1.so -lpthread -lz -lrt -pthread -Wl,-rpath -Wl,/usr/lib64
> ./.libs/libmesos.so: error: undefined reference to 'dlopen'
> ./.libs/libmesos.so: error: undefined reference to 'dlerror'
> ./.libs/libmesos.so: error: undefined reference to 'dlclose'
> ./.libs/libmesos.so: error: undefined reference to 'dlsym'
> collect2: error: ld returned 1 exit status
> make[2]: *** [mesos-local] Error 1
> make[2]: Leaving directory `/home/idownes/workspace/mesos/build/src'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/home/idownes/workspace/mesos/build/src'
> make: *** [all-recursive] Error 1
> {noformat}
> Builds correctly when libdl inclusion is forced:
> {noformat}
> $ LDFLAGS='-ldl' ../configure
> {noformat}





[jira] [Created] (MESOS-5574) Missing dependency on libdl

2016-06-08 Thread Ian Downes (JIRA)
Ian Downes created MESOS-5574:
-

 Summary: Missing dependency on libdl
 Key: MESOS-5574
 URL: https://issues.apache.org/jira/browse/MESOS-5574
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.0.0
 Environment: CentOS5, devtoolset-2, gcc version 4.8.2 20140120 (Red 
Hat 4.8.2-15) (GCC)
Reporter: Ian Downes


{noformat}
$ make
...
Making all in src
make[1]: Entering directory `/home/idownes/workspace/mesos/build/src'
make  all-am
make[2]: Entering directory `/home/idownes/workspace/mesos/build/src'
/bin/sh ../libtool --tag=CXX   --mode=link g++ -pthread -g1 -O0 
-Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed  -o mesos-local 
local/mesos_local-main.o libmesos.la  -lz -lsvn_delta-1 -lsvn_subr-1 
-lsasl2 -lcurl -lapr-1 -lz  -lrt
libtool: link: g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
-Wl,--as-needed -o .libs/mesos-local local/mesos_local-main.o  
./.libs/libmesos.so /usr/lib64/libsvn_delta-1.so /usr/lib64/libsvn_subr-1.so 
/usr/lib64/libaprutil-1.so -lcrypt -lexpat -ldb-4.7 -lsasl2 -lcurl 
/usr/lib64/libapr-1.so -lpthread -lz -lrt -pthread -Wl,-rpath -Wl,/usr/lib64
./.libs/libmesos.so: error: undefined reference to 'dlopen'
./.libs/libmesos.so: error: undefined reference to 'dlerror'
./.libs/libmesos.so: error: undefined reference to 'dlclose'
./.libs/libmesos.so: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status
make[2]: *** [mesos-local] Error 1
make[2]: Leaving directory `/home/idownes/workspace/mesos/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/idownes/workspace/mesos/build/src'
make: *** [all-recursive] Error 1
{noformat}

Builds correctly when libdl inclusion is forced:
{noformat}
$ LDFLAGS='-ldl' ../configure
{noformat}






[jira] [Commented] (MESOS-5491) Implement GET_AGENTS Call in v1 master API.

2016-06-08 Thread zhou xing (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321440#comment-15321440
 ] 

zhou xing commented on MESOS-5491:
--

One review request submitted: https://reviews.apache.org/r/48438/

> Implement GET_AGENTS Call in v1 master API.
> ---
>
> Key: MESOS-5491
> URL: https://issues.apache.org/jira/browse/MESOS-5491
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: zhou xing
>






[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321408#comment-15321408
 ] 

Fan Du commented on MESOS-5545:
---

[~brugidou] Thanks for sharing, I will look into it!

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the 
> cluster, for example, rack topology. Meanwhile, many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to the Mesos resource topology.





[jira] [Commented] (MESOS-4279) Docker executor truncates task's output when the task is killed.

2016-06-08 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321312#comment-15321312
 ] 

Benjamin Mahler commented on MESOS-4279:


I posted two fixes related to this ticket.

The first is to send terminal status updates in the same manner as the command 
executor:
https://reviews.apache.org/r/48428/

The second is to eliminate the killing of the 'docker run' subprocess, which 
breaks the log redirection:
https://reviews.apache.org/r/48429/

Let me know if you have any feedback, [~jieyu] kindly agreed to review.

> Docker executor truncates task's output when the task is killed.
> 
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.2, 0.28.1
>Reporter: Martin Bydzovsky
>Assignee: Benjamin Mahler
>Priority: Critical
>  Labels: docker, mesosphere
> Fix For: 1.0.0
>
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> came across the following issue:
> (it was already discussed on 
> https://github.com/mesosphere/marathon/issues/2876 and the guys from mesosphere 
> got to the point that it's probably a docker containerizer problem...)
> To sum it up:
> When I deploy a simple python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> def sigterm_handler(_signo, _stack_frame):
>     # Log the signal, simulate 2 seconds of cleanup, then exit cleanly.
>     print("got %i" % _signo)
>     print(datetime.datetime.now().time())
>     sys.stdout.flush()
>     sleep(2)
>     print(datetime.datetime.now().time())
>     print("ending")
>     sys.stdout.flush()
>     sys.exit(0)
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print("Hello")
>     i = 0
>     while True:
>         i += 1
>         print(datetime.datetime.now().time())
>         print("Iteration #%i" % i)
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print("Goodbye")
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result: the task receives SIGTERM and 
> dies peacefully (during my script-specified 2-second period).
> But when I wrap this python script in a docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application via Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api"
>     },
>     forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task during restart (issued from marathon) dies immediately without 
> having a chance to do any cleanup.





[jira] [Updated] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.

2016-06-08 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5573:
--
Labels: mesosphere newbie  (was: mesosphere)

> Executor Driver does not invoke the `disconnected` callback upon 
> disconnection with the agent.
> --
>
> Key: MESOS-5573
> URL: https://issues.apache.org/jira/browse/MESOS-5573
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere, newbie
>
> The executor driver must invoke the {{disconnected}} callback upon 
> disconnecting from the agent, i.e. if the agent process restarts, as per the 
> documentation:
> https://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html#disconnected(org.apache.mesos.ExecutorDriver)
> This does not seem to be done currently. Also, this 
> callback should only be invoked for frameworks with checkpointing enabled, as 
> for non-checkpointed frameworks the executor is shut down upon a disconnection.
> There might already be a JIRA for this, but I was not able to spot one.





[jira] [Created] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.

2016-06-08 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-5573:
-

 Summary: Executor Driver does not invoke the `disconnected` 
callback upon disconnection with the agent.
 Key: MESOS-5573
 URL: https://issues.apache.org/jira/browse/MESOS-5573
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


The executor driver must invoke the {{disconnected}} callback upon 
disconnecting from the agent, i.e. if the agent process restarts, as per the 
documentation:

https://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html#disconnected(org.apache.mesos.ExecutorDriver)

This does not seem to be done currently. Also, this 
callback should only be invoked for frameworks with checkpointing enabled, as 
for non-checkpointed frameworks the executor is shut down upon a disconnection.

There might already be a JIRA for this, but I was not able to spot one.
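
The intended dispatch can be sketched in a few lines of Python. This is a toy model rather than the driver implementation: `ToyExecutorDriver`, `RecordingExecutor` and `agent_disconnected` are hypothetical names, and only the `disconnected` and `shutdown` callbacks correspond to the Executor interface.

```python
class RecordingExecutor:
    """Stand-in executor that records which callbacks were invoked."""

    def __init__(self):
        self.calls = []

    def disconnected(self, driver):
        self.calls.append("disconnected")

    def shutdown(self, driver):
        self.calls.append("shutdown")


class ToyExecutorDriver:
    def __init__(self, executor, checkpoint):
        self.executor = executor
        self.checkpoint = checkpoint  # True if the framework checkpoints

    def agent_disconnected(self):
        if self.checkpoint:
            # Checkpointing framework: tell the executor and keep running.
            self.executor.disconnected(self)
        else:
            # Non-checkpointed framework: the executor is shut down instead.
            self.executor.shutdown(self)
```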





[jira] [Commented] (MESOS-4672) Implement aufs based provisioner backend.

2016-06-08 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321213#comment-15321213
 ] 

Jie Yu commented on MESOS-4672:
---

commit d98de34efc9d8fa891c5d29e5fbdb22382e10d64
Author: Shuai Lin 
Date:   Wed Jun 8 11:54:40 2016 -0700

Fixed compilation on OS X of aufs tests.

Review: https://reviews.apache.org/r/48388/

> Implement aufs based provisioner backend.
> -
>
> Key: MESOS-4672
> URL: https://issues.apache.org/jira/browse/MESOS-4672
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Shuai Lin
> Fix For: 1.0.0
>
>
> Overlay fs support wasn't merged until kernel 3.18. Docker's default 
> storage backend for ubuntu 14.04 is aufs. We should consider adding an aufs 
> based backend for unified containerizer as well to efficiently provide a 
> union fs (instead of relying on copy backend which is not space efficient).





[jira] [Commented] (MESOS-3014) Support network egress bandwidth as a first-class resource

2016-06-08 Thread Tom Ganem (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321212#comment-15321212
 ] 

Tom Ganem commented on MESOS-3014:
--

I can see this issue is not a priority for anyone right now, but it is of 
extreme interest to me. The company I work for specializes in high-speed file 
transfers. We have recently been brainstorming about creating a Mesos framework 
that would launch short-lived transfer processes on the Mesos cluster. If 
outbound network bandwidth were offered as a resource, we could ensure that each 
transfer process we launch has the desired amount of bandwidth without 
affecting other processes/containers.

> Support network egress bandwidth as a first-class resource
> --
>
> Key: MESOS-3014
> URL: https://issues.apache.org/jira/browse/MESOS-3014
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Adam B
>  Labels: isolation, network, resource
>
> Mesos 0.23.0 introduced `--egress_rate_limit_per_container=100MB` for 
> statically configuring a fixed, per-container limit on outbound network 
> bandwidth, but if this were instead a standard resource type (outbound 
> network bandwidth) with a fixed total, then different subsets could be 
> reserved/claimed by different frameworks/tasks. This would allow us to adjust 
> the per-container limit depending on how many containers are running, or give 
> some containers more bandwidth than others.
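
As a rough illustration of the arithmetic behind the quoted description (a fixed egress total divided among containers, possibly unevenly), here is a minimal sketch. The function name, the weight-based split, and the units are assumptions for illustration only; nothing here is an existing Mesos API.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper, not a Mesos API: splits a fixed egress total
// (e.g., in MB per second) among containers in proportion to their weights.
std::map<std::string, uint64_t> divideEgress(
    uint64_t total,
    const std::map<std::string, uint64_t>& weights)
{
  std::map<std::string, uint64_t> limits;

  uint64_t sum = 0;
  for (const auto& entry : weights) {
    sum += entry.second;
  }

  if (sum == 0) {
    return limits; // No containers (or all weights zero): nothing to hand out.
  }

  for (const auto& entry : weights) {
    // Integer division rounds down, so the shares never exceed the total.
    limits[entry.first] = total * entry.second / sum;
  }

  return limits;
}
```

With a total of 100 and weights of 1 and 3, the two containers would be limited to 25 and 75 respectively; the limits change whenever a container starts or stops.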





[jira] [Updated] (MESOS-5572) Change Operator API RPC handlers return type to http::Response

2016-06-08 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5572:
--
Shepherd: Vinod Kone

> Change Operator API RPC handlers return type to http::Response
> --
>
> Key: MESOS-5572
> URL: https://issues.apache.org/jira/browse/MESOS-5572
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: haosdent
>Assignee: haosdent
>  Labels: http
>
> As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to 
> change the return type of RPC handlers to
> {{http::Response}} to make it more flexible.





[jira] [Assigned] (MESOS-5572) Change Operator API RPC handlers return type to http::Response

2016-06-08 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-5572:
---

Assignee: haosdent

> Change Operator API RPC handlers return type to http::Response
> --
>
> Key: MESOS-5572
> URL: https://issues.apache.org/jira/browse/MESOS-5572
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: haosdent
>Assignee: haosdent
>  Labels: http
>
> As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to 
> change the return type of RPC handlers to
> {{http::Response}} to make it more flexible.





[jira] [Created] (MESOS-5572) Change Operator API RPC handlers return type to http::Response

2016-06-08 Thread haosdent (JIRA)
haosdent created MESOS-5572:
---

 Summary: Change Operator API RPC handlers return type to 
http::Response
 Key: MESOS-5572
 URL: https://issues.apache.org/jira/browse/MESOS-5572
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: haosdent


As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to change 
the return type of RPC handlers to
{{http::Response}} to make it more flexible.





[jira] [Updated] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match

2016-06-08 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-5571:
--
Shepherd: Vinod Kone

> Scheduler JNI throws exception when the major versions of JAR and libmesos 
> don't match
> --
>
> Key: MESOS-5571
> URL: https://issues.apache.org/jira/browse/MESOS-5571
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Blocker
> Fix For: 1.0.0
>
>
> In 
> [convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153]
>  we compare the major versions of the native library and the jar. This makes 
> upgrading frameworks unnecessarily hard because you would have to deploy 
> Mesos and frameworks in lockstep.
> Backwards-incompatible changes would warrant a major version bump but not
> vice versa. Plus, it's more standard to express and check dependency
> versions outside of the code but through package metadata.
> The proposed solution is to remove this major version check altogether.





[jira] [Assigned] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match

2016-06-08 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-5571:
-

Assignee: Yan Xu

> Scheduler JNI throws exception when the major versions of JAR and libmesos 
> don't match
> --
>
> Key: MESOS-5571
> URL: https://issues.apache.org/jira/browse/MESOS-5571
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Blocker
> Fix For: 1.0.0
>
>
> In 
> [convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153]
>  we compare the major versions of the native library and the jar. This makes 
> upgrading frameworks unnecessarily hard because you would have to deploy 
> Mesos and frameworks in lockstep.
> Backwards-incompatible changes would warrant a major version bump but not
> vice versa. Plus, it's more standard to express and check dependency
> versions outside of the code but through package metadata.
> The proposed solution is to remove this major version check altogether.





[jira] [Updated] (MESOS-5280) Inconsistent error checking in DRF sorter.

2016-06-08 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-5280:
--
Description: 
There exist a few different error handling styles in the sorter.

h2. Hard checks
e.g., 
[DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62]
{code}
CHECK(weights.contains(name));
{code}

h2. No-op if it results in an error condition.
e.g., 
[DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]:
{code}
set::iterator it = find(name);

if (it != clients.end()) { // TODO(benh): This should really be a CHECK.
...
}
{code}

The problem:

- Silent no-ops are not ideal. (Implicitness makes it hard to debug things and 
we have run into one instance of this).
- Hard CHECKs on invalid arguments are often too harsh.
- Not checking preconditions can lead to subtle bugs.
- We should check errors consistently.

  was:
There exist a few different error handling styles in the sorter.

h2. Hard checks
e.g., 
[DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62]
{code}
CHECK(weights.contains(name));
{code}

h2. No-op if it results in an error condition.
e.g., 
[DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]:
{code}
set::iterator it = find(name);

if (it != clients.end()) { // TODO(benh): This should really be a CHECK.
...
}
{code}

IMO there should never be silent no-ops. Short of CHECK, we should return an 
error if it's indeed an error. If either path of the branch is valid and one is 
a  noop, we should log the noop branch or return a 'bool' so the caller can 
distinguish the two.

Implicitness makes it hard to debug things and we have run into one instance of 
this.

My proposal is to use CHECKs consistently in sorter.


> Inconsistent error checking in DRF sorter.
> --
>
> Key: MESOS-5280
> URL: https://issues.apache.org/jira/browse/MESOS-5280
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> There exist a few different error handling styles in the sorter.
> h2. Hard checks
> e.g., 
> [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62]
> {code}
> CHECK(weights.contains(name));
> {code}
> h2. No-op if it results in an error condition.
> e.g., 
> [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]:
> {code}
> set::iterator it = find(name);
> if (it != clients.end()) { // TODO(benh): This should really be a CHECK.
> ...
> }
> {code}
> The problem:
> - Silent no-ops are not ideal. (Implicitness makes it hard to debug things 
> and we have run into one instance of this).
> - Hard CHECKs on invalid arguments are often too harsh.
> - Not checking preconditions can lead to subtle bugs.
> - We should check errors consistently.





[jira] [Commented] (MESOS-5280) Inconsistent error checking in DRF sorter.

2016-06-08 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320999#comment-15320999
 ] 

Yan Xu commented on MESOS-5280:
---

Derived from the discussion [here|https://reviews.apache.org/r/47259/]:

IMO we should:
- CHECK internal invariants.
- Return error/none/false for invalid arguments or no-ops. If some of these 
things should never happen, they should be CHECKs in the caller (the allocator) 
because they are its internal invariants.
- The sorter should never assume the allocator would call its methods in the 
expected way/order.  (e.g., 
[here|https://github.com/apache/mesos/blob/6ce476461f0fedfb4ed4e40c15f25bb79a39b0f3/src/master/allocator/sorter/drf/sorter.cpp#L242]
 the method has no safeguards at all).
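
A minimal sketch of the convention proposed in the list above, using a simplified, hypothetical stand-in for the sorter (this is not the Mesos DRFSorter). To keep the example self-contained, `assert` is used where the Mesos code would use glog's `CHECK`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a sorter; not the Mesos DRFSorter.
class TinySorter
{
public:
  // Invalid arguments are reported to the caller instead of crashing
  // or being silently ignored.
  bool add(const std::string& name, double weight)
  {
    if (weight <= 0.0 || allocations.count(name) > 0) {
      return false;
    }

    weights[name] = weight;
    allocations[name] = 0.0;
    return true;
  }

  // Returns false for an unknown client so the caller can tell a no-op
  // apart from a successful update.
  bool allocated(const std::string& name, double amount)
  {
    auto it = allocations.find(name);
    if (it == allocations.end()) {
      return false;
    }

    // Internal invariant: every tracked client also has a weight.
    // A violation here is a bug in this class, so aborting is appropriate.
    assert(weights.count(name) == 1);

    it->second += amount;
    return true;
  }

private:
  std::map<std::string, double> weights;
  std::map<std::string, double> allocations;
};
```

A caller such as the allocator, which knows that a client must exist at a given point, can then wrap the call in its own check, e.g. `assert(sorter.allocated(name, amount));`, which keeps that invariant where it belongs.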

/cc [~jvanremoortere]

> Inconsistent error checking in DRF sorter.
> --
>
> Key: MESOS-5280
> URL: https://issues.apache.org/jira/browse/MESOS-5280
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> There exist a few different error handling styles in the sorter.
> h2. Hard checks
> e.g., 
> [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62]
> {code}
> CHECK(weights.contains(name));
> {code}
> h2. No-op if it results in an error condition.
> e.g., 
> [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]:
> {code}
> set::iterator it = find(name);
> if (it != clients.end()) { // TODO(benh): This should really be a CHECK.
> ...
> }
> {code}
> IMO there should never be silent no-ops. Short of CHECK, we should return an 
> error if it's indeed an error. If either path of the branch is valid and one 
> is a  noop, we should log the noop branch or return a 'bool' so the caller 
> can distinguish the two.
> Implicitness makes it hard to debug things and we have run into one instance 
> of this.
> My proposal is to use CHECKs consistently in sorter.





[jira] [Created] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match

2016-06-08 Thread Yan Xu (JIRA)
Yan Xu created MESOS-5571:
-

 Summary: Scheduler JNI throws exception when the major versions of 
JAR and libmesos don't match
 Key: MESOS-5571
 URL: https://issues.apache.org/jira/browse/MESOS-5571
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 1.0.0
Reporter: Yan Xu
Priority: Blocker
 Fix For: 1.0.0


In 
[convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153]
 we compare the major versions of the native library and the jar. This makes 
upgrading frameworks unnecessarily hard because you would have to deploy Mesos 
and frameworks in lockstep.

Backwards-incompatible changes would warrant a major version bump but not
vice versa. Plus, it's more standard to express and check dependency
versions outside of the code but through package metadata.

The proposed solution is to remove this major version check altogether.
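
For context, a strict major-version comparison of the kind described above might look roughly like the following. This is a sketch under assumptions: the helper names are hypothetical and this is not the code in convert.cpp.

```cpp
#include <string>

// Hypothetical helper: returns the text before the first '.' of a version
// string such as "1.0.0" (or the whole string if there is no '.').
std::string majorVersion(const std::string& version)
{
  return version.substr(0, version.find('.'));
}

// A strict check: any difference in major version is treated as
// incompatible, regardless of whether the API actually changed.
bool sameMajorVersion(const std::string& jar, const std::string& native)
{
  return majorVersion(jar) == majorVersion(native);
}
```

Under such a check, a framework shipping a 0.28.x JAR would be rejected by a 1.0.0 native library even if every call it makes still works, which is the lockstep deployment problem this ticket describes.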





[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers

2016-06-08 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320854#comment-15320854
 ] 

Vinod Kone commented on MESOS-5567:
---

There is no "\n" at the end of the chunk as you said. Maybe the doc is not 
clear. Would you mind sending a PR/review to fix it? I'll be happy to shepherd 
and commit it.

> Scheduler HTTP API cuts JSON buffers
> 
>
> Key: MESOS-5567
> URL: https://issues.apache.org/jira/browse/MESOS-5567
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.28.1
> Environment: Ubuntu 14.04 latest
>Reporter: Tobias Mueller
>
> According to the docs at 
> http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed 
> that the message format would only contain "full" (meaning parseable) JSON 
> messages. In fact, in some cases I'm seeing split JSON messages, where the 
> next chunk just continues the first part:
> {noformat}
> 1983
> {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
> 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
> {noformat}
> I use the standard Node.js (4.4.5) http-client.





[jira] [Assigned] (MESOS-5537) http v1 SUBSCRIBED scheduler event always has nil http_interval_seconds

2016-06-08 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-5537:
-

Assignee: Anand Mazumdar

> http v1 SUBSCRIBED scheduler event always has nil http_interval_seconds
> ---
>
> Key: MESOS-5537
> URL: https://issues.apache.org/jira/browse/MESOS-5537
> Project: Mesos
>  Issue Type: Bug
>Reporter: James DeFelice
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> I'm writing a controller in Go to monitor heartbeats. I'd like to use the 
> interval as communicated by the master, which should be specified in the 
> SUBSCRIBED event. But it's not.
> {code}
> 2016/06/03 18:34:04 {Type:SUBSCRIBED 
> Subscribed:_Subscribed{FrameworkID:{Value:ffdb6d6e-0167-4fa2-98f9-2c3f8157fc25-0004,},HeartbeatIntervalSeconds:nil,}
>  Offers:nil Rescind:nil Update:nil Message:nil Failure:nil Error:nil}
> {code}
> {code}
> $ dpkg -l |grep -e mesos
> ii  mesos   0.28.0-2.0.16.ubuntu1404 
> amd64Cluster resource manager with efficient resource isolation
> {code}
> I *am* seeing HEARTBEAT events. Just not seeing the interval specified in the 
> SUBSCRIBED event.





[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320739#comment-15320739
 ] 

Joris Van Remoortere commented on MESOS-5545:
-

[~fan.du] I would like to; however, this is currently not high enough on my 
priority list. I'm passionate about this subject, which is why I've brought it 
up before :-)

We should see in the community meeting if there is some consensus on a timeline.

If the automation aspect is what is most important to you, then I would focus 
on a good interface between Mesos and the modules / tools you want to build to 
source the information.
We likely won't get much traction dragging specific strategies into the Mesos 
project. Rather, we should take the approach of ensuring the interfaces / 
primitives work well for a variety of strategies and tools.

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about 
> the cluster, such as rack topology, while many data center applications 
> have rack awareness features to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to Mesos resources.





[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers

2016-06-08 Thread Tobias Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320705#comment-15320705
 ] 

Tobias Mueller commented on MESOS-5567:
---

Thanks for the fast reply. That's what I think as well; the strange thing is 
that this occurs in maybe 2% of the events I'm seeing. The other 98% are "one 
JSON per line" events.

Maybe it'd make sense to revise the docs at "RecordIO response format" a 
little, because I'm not seeing the line lengths as described, only the overall 
length at the beginning. Furthermore, when I receive a chunked JSON, there is 
no \n at the end of the chunk.

Excerpt:
{noformat} 
128\n
{"type": "SUBSCRIBED","subscribed": {"framework_id": 
{"value":"12220-3440-12532-2345"},...}104\n
{"framework_id": {"value": "12220-3440-12532-2345"},...{"value" : 
"12220-3440-12532-O12"},}208\n
...
{noformat} 

> Scheduler HTTP API cuts JSON buffers
> 
>
> Key: MESOS-5567
> URL: https://issues.apache.org/jira/browse/MESOS-5567
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.28.1
> Environment: Ubuntu 14.04 latest
>Reporter: Tobias Mueller
>
> According to the docs at 
> http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed 
> that the message format would only contain "full" (meaning parseable) JSON 
> messages. In fact, in some cases I'm seeing split JSON messages, where the 
> next chunk just continues the first part:
> {noformat}
> 1983
> {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
> 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
> {noformat}
> I use the standard Node.js (4.4.5) http-client.





[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers

2016-06-08 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320676#comment-15320676
 ] 

Vinod Kone commented on MESOS-5567:
---

I think what you are seeing is HTTP chunked transfer encoding in play. Your 
application/library needs to keep reading more data until it reads the number 
of bytes specified at the beginning (in your example it is 1983 bytes) to read 
the full JSON for an event.
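
To make this concrete, here is a minimal sketch of an incremental decoder for the "length, newline, record" framing discussed in this thread. The class name and interface are hypothetical; the sketch assumes each record is preceded by its length in bytes as a decimal number followed by "\n", and that records may be split across arbitrary HTTP chunk boundaries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical incremental decoder for "<length>\n<record>" framing.
// It buffers input across chunk boundaries and only emits complete records.
class RecordDecoder
{
public:
  // Feeds one chunk of the response body; returns the records completed
  // by this chunk (possibly none).
  std::vector<std::string> feed(const std::string& chunk)
  {
    buffer += chunk;
    std::vector<std::string> records;

    while (true) {
      std::size_t newline = buffer.find('\n');
      if (newline == std::string::npos) {
        break; // Length prefix not complete yet.
      }

      // Assumes the prefix is a well-formed decimal number; std::stoul
      // throws on malformed input.
      std::size_t length = std::stoul(buffer.substr(0, newline));

      if (buffer.size() - newline - 1 < length) {
        break; // Record body still incomplete; wait for more data.
      }

      records.push_back(buffer.substr(newline + 1, length));
      buffer.erase(0, newline + 1 + length);
    }

    return records;
  }

private:
  std::string buffer;
};
```

The caller passes every chunk it receives to `feed()` and only parses the returned strings as JSON, so a record that arrives in two chunks (as in the 1983-byte example) is parsed once, after the second chunk completes it.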

> Scheduler HTTP API cuts JSON buffers
> 
>
> Key: MESOS-5567
> URL: https://issues.apache.org/jira/browse/MESOS-5567
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.28.1
> Environment: Ubuntu 14.04 latest
>Reporter: Tobias Mueller
>
> According to the docs at 
> http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed 
> that the message format would only contain "full" (meaning parseable) JSON 
> messages. In fact, in some cases I'm seeing split JSON messages, where the 
> next chunk just continues the first part:
> {noformat}
> 1983
> {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
> 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
> {noformat}
> I use the standard Node.js (4.4.5) http-client.





[jira] [Commented] (MESOS-5566) Make "state" field of TaskStatus optional

2016-06-08 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320666#comment-15320666
 ] 

Vinod Kone commented on MESOS-5566:
---

In retrospect, I think the bad idea was having `SchedulerDriver::reconcile()` 
take TaskStatus, rather than `TaskStatus` having a required `state` field. The 
v1 scheduler API fixes this by not using TaskStatus for the RECONCILE call.

> Make "state" field of TaskStatus optional
> -
>
> Key: MESOS-5566
> URL: https://issues.apache.org/jira/browse/MESOS-5566
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, newbie
>
> The {{SchedulerDriver}} interface uses a vector of {{TaskStatus}} for task 
> reconciliation: the framework sends a list of taskIDs (with optional agent 
> IDs), and the master replies with their current task statuses. Right now, 
> frameworks also need to specify the {{state}} field of the input 
> {{TaskStatus}}, because {{state}} is a {{required}} field. It would be 
> cleaner to make {{state}} optional, because otherwise this makes the 
> interface confusing.
> After doing this, we should remove the places where we set the {{state}} 
> field when making reconciliation requests, e.g., in various test cases.
> See discussion in https://reviews.apache.org/r/48250/





[jira] [Updated] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.

2016-06-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5565:
--
Fix Version/s: 1.0.0

> Add logging when Offer::Operation::Launch message has no tasks.
> ---
>
> Key: MESOS-5565
> URL: https://issues.apache.org/jira/browse/MESOS-5565
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Priority: Minor
>  Labels: newbie
> Fix For: 1.0.0
>
>
> Currently, when an {{Offer::Operation::Launch}} message has no tasks specified, 
> Mesos treats the request as implicitly declining all offers. This can 
> be very counter-intuitive for framework developers since we do not have any 
> logging on the master around this behavior. It would be good to add some 
> logging on the master to inform framework developers that all the offers 
> have been implicitly declined.
> {code}
> if (operation.type() == Offer::Operation::LAUNCH) {
>   if (operation.launch().task_infos().size() > 0) {
>     ++metrics->messages_launch_tasks;
>   } else {
>     ++metrics->messages_decline_offers;
>   }
> }
> {code}





[jira] [Comment Edited] (MESOS-5570) Improve CHANGELOG and upgrades.md

2016-06-08 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320602#comment-15320602
 ] 

Joerg Schad edited comment on MESOS-5570 at 6/8/16 2:08 PM:


Some prework

Included missing changes for 1.0RC
https://reviews.apache.org/r/48272/

Used tense consistently in CHANGELOG
https://reviews.apache.org/r/48271/


was (Author: js84):
Included missing changes for 1.0RC
https://reviews.apache.org/r/48272/

Used tense consistently in CHANGELOG
https://reviews.apache.org/r/48271/

> Improve CHANGELOG and upgrades.md
> -
>
> Key: MESOS-5570
> URL: https://issues.apache.org/jira/browse/MESOS-5570
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
> Fix For: 1.0.0
>
>
> Currently we have a lot of data duplication between the CHANGELOG and 
> upgrades.md. We should try to improve this and potentially make the CHANGELOG 
> a markdown file as well. For inspiration see the Hadoop changelog: 
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md





[jira] [Commented] (MESOS-5570) Improve CHANGELOG and upgrades.md

2016-06-08 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320602#comment-15320602
 ] 

Joerg Schad commented on MESOS-5570:


Included missing changes for 1.0RC
https://reviews.apache.org/r/48272/

Used tense consistently in CHANGELOG
https://reviews.apache.org/r/48271/

> Improve CHANGELOG and upgrades.md
> -
>
> Key: MESOS-5570
> URL: https://issues.apache.org/jira/browse/MESOS-5570
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
> Fix For: 1.0.0
>
>
> Currently we have a lot of data duplication between the CHANGELOG and 
> upgrades.md. We should try to improve this and potentially make the CHANGELOG 
> a markdown file as well. For inspiration see the Hadoop changelog: 
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md





[jira] [Created] (MESOS-5570) Improve CHANGELOG and upgrades.md

2016-06-08 Thread Joerg Schad (JIRA)
Joerg Schad created MESOS-5570:
--

 Summary: Improve CHANGELOG and upgrades.md
 Key: MESOS-5570
 URL: https://issues.apache.org/jira/browse/MESOS-5570
 Project: Mesos
  Issue Type: Documentation
Reporter: Joerg Schad
Assignee: Joerg Schad
 Fix For: 1.0.0


Currently we have a lot of data duplication between the CHANGELOG and 
upgrades.md. We should try to improve this and potentially make the CHANGELOG a 
markdown file as well. For inspiration see the Hadoop changelog: 
https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md






[jira] [Assigned] (MESOS-5515) Implement READ_FILE Call in v1 agent API.

2016-06-08 Thread zhou xing (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhou xing reassigned MESOS-5515:


Assignee: zhou xing

> Implement READ_FILE Call in v1 agent API.
> -
>
> Key: MESOS-5515
> URL: https://issues.apache.org/jira/browse/MESOS-5515
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: zhou xing
>






[jira] [Created] (MESOS-5569) Document supported releases more prominently

2016-06-08 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5569:
--

 Summary: Document supported releases more prominently
 Key: MESOS-5569
 URL: https://issues.apache.org/jira/browse/MESOS-5569
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, project website
Reporter: Neil Conway


{noformat}
It would be great to make this information more prominent on the
website, especially once 1.0.0 is released. For example, we could list
the supported releases on https://mesos.apache.org/downloads/, along
with a link to the versioning document.
{noformat}






[jira] [Updated] (MESOS-5568) SlaveTest.KillTaskUnregisteredExecutor is slow

2016-06-08 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5568:
---
Description: 
{noformat}
[--] 1 test from SlaveTest
[ RUN  ] SlaveTest.KillTaskUnregisteredExecutor
[   OK ] SlaveTest.KillTaskUnregisteredExecutor (5128 ms)
[--] 1 test from SlaveTest (5129 ms total)
{noformat}

I'm guessing this could be fixed by tweaking {{executor_shutdown_grace_period}}.

  was:I'm guessing this could be fixed by tweaking 
{{executor_shutdown_grace_period}}.


> SlaveTest.KillTaskUnregisteredExecutor is slow
> --
>
> Key: MESOS-5568
> URL: https://issues.apache.org/jira/browse/MESOS-5568
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> [--] 1 test from SlaveTest
> [ RUN  ] SlaveTest.KillTaskUnregisteredExecutor
> [   OK ] SlaveTest.KillTaskUnregisteredExecutor (5128 ms)
> [--] 1 test from SlaveTest (5129 ms total)
> {noformat}
> I'm guessing this could be fixed by tweaking 
> {{executor_shutdown_grace_period}}.





[jira] [Created] (MESOS-5568) SlaveTest.KillTaskUnregisteredExecutor is slow

2016-06-08 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5568:
--

 Summary: SlaveTest.KillTaskUnregisteredExecutor is slow
 Key: MESOS-5568
 URL: https://issues.apache.org/jira/browse/MESOS-5568
 Project: Mesos
  Issue Type: Bug
  Components: tests
Reporter: Neil Conway


I'm guessing this could be fixed by tweaking {{executor_shutdown_grace_period}}.





[jira] [Updated] (MESOS-5567) Scheduler HTTP API cuts JSON buffers

2016-06-08 Thread Tobias Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tobias Mueller updated MESOS-5567:
--
Description: 
According to the docs at 
http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that 
the message format would only contain "full" (meaning parseable) JSON messages. 
In fact, I'm sometimes seeing split JSON messages, where the next chunk just 
continues the first part:

{noformat}
1983
{"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
{noformat}

I use the standard Node.js (4.4.5) http-client.

  was:
According to the docs at 
http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that 
the message format would only contain "full" (meaning parseable) JSON messages. 
In fact, I'm sometimes seeing split JSON messages, where the next chunk is prefixed 
with a string like `4f7e-0020`:

{noformat}
1983
{"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
{noformat}

I use the standard Node.js (4.4.5) http-client.


> Scheduler HTTP API cuts JSON buffers
> 

[jira] [Created] (MESOS-5567) Scheduler HTTP API cuts JSON buffers

2016-06-08 Thread Tobias Mueller (JIRA)
Tobias Mueller created MESOS-5567:
-

 Summary: Scheduler HTTP API cuts JSON buffers
 Key: MESOS-5567
 URL: https://issues.apache.org/jira/browse/MESOS-5567
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Affects Versions: 0.28.1
 Environment: Ubuntu 14.04 latest
Reporter: Tobias Mueller


According to the docs at 
http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that 
the message format would only contain "full" (meaning parseable) JSON messages. 
In fact, I'm sometimes seeing split JSON messages, where the next chunk is prefixed 
with a string like `4f7e-0020`:

{noformat}
1983
{"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc
4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}
{noformat}

I use the standard Node.js (4.4.5) http-client.
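
The output above is consistent with length-prefixed framing: a line holding the record length in bytes (here `1983`), followed by the record itself, with HTTP chunk boundaries falling anywhere inside a record. A minimal sketch of a client-side decoder that buffers chunks until a whole record is available follows. It is written in Python rather than Node.js, and the class name `RecordIODecoder` and its `feed` method are hypothetical names chosen for illustration, not part of any Mesos client library.

```python
import json


class RecordIODecoder:
    """Hypothetical helper: reassembles b"<length>\\n<payload>" records
    from arbitrarily split chunks of a streaming HTTP response."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add one chunk; return the list of JSON records completed so far."""
        self._buffer += chunk
        records = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break  # the length line itself is still incomplete
            length = int(self._buffer[:newline].decode("ascii"))
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # payload is split; wait for the next chunk
            payload = self._buffer[start:end]
            records.append(json.loads(payload.decode("utf-8")))
            self._buffer = self._buffer[end:]
        return records
```

Each chunk received from the HTTP client would be passed to `feed`, and only the returned records would be handed to the JSON-consuming code, so a record cut in the middle of a value such as `14c991dc4f7e-0020` is never parsed on its own.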





[jira] [Created] (MESOS-5566) Make "state" field of TaskStatus optional

2016-06-08 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5566:
--

 Summary: Make "state" field of TaskStatus optional
 Key: MESOS-5566
 URL: https://issues.apache.org/jira/browse/MESOS-5566
 Project: Mesos
  Issue Type: Improvement
  Components: general
Reporter: Neil Conway
Priority: Minor


The {{SchedulerDriver}} interface uses a vector of {{TaskStatus}} for task 
reconciliation: the framework sends a list of task IDs (with optional agent 
IDs), and the master replies with their current task statuses. Right now, 
frameworks must also set the {{state}} field of each input {{TaskStatus}}, 
only because {{state}} is a {{required}} field. Having to supply a value that 
exists just to satisfy the message definition makes the interface confusing, 
so it would be cleaner to make {{state}} optional.

After doing this, we should remove the places where we set the {{state}} field 
when making reconciliation requests, e.g., in various test cases.

See discussion in https://reviews.apache.org/r/48250/
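
To illustrate the proposal, here is a minimal sketch of what a reconciliation request could look like once {{state}} is optional. The `TaskStatus` dataclass and the `reconciliation_request` helper are hypothetical stand-ins written in Python for illustration; they are not the Mesos protobuf or the {{SchedulerDriver}} API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskStatus:
    """Hypothetical stand-in for the TaskStatus message (not the Mesos protobuf)."""
    task_id: str
    agent_id: Optional[str] = None
    # Assumption for this sketch: `state` is optional and may stay unset.
    state: Optional[str] = None


def reconciliation_request(task_ids: List[str],
                           agent_ids: Optional[Dict[str, str]] = None) -> List[TaskStatus]:
    """Build the input statuses from task IDs and optional agent IDs only."""
    agent_ids = agent_ids or {}
    return [TaskStatus(task_id=t, agent_id=agent_ids.get(t)) for t in task_ids]
```

With an optional field, callers no longer need to pick an arbitrary placeholder state for each task they ask about.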





[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320245#comment-15320245
 ] 

Fan Du commented on MESOS-5545:
---

[~jvanremoortere] Thanks for your constructive advice and suggestions!

Yes, this will be a long road, but it's fun to experiment with the idea. :)
How about we sync up at the next community meeting on 6/16?

To be honest, it's not attributes themselves that I dislike, but that they 
cannot be set automatically and so require tedious manual maintenance.
I will update my design doc to enhance the current attributes with these goals:
a. Automatically probe rack topology, with modular plugins for popular network 
types, e.g. Ethernet, InfiniBand etc.
b. Use rack topology information to arrange agents on a per-rack basis.
c. Design a common, friendly attribute scheme for frameworks to interpret.
d. Use ACLs to enforce security.
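
As a rough illustration of goal (b), the sketch below groups agents by a rack attribute. It is written in Python for brevity, and everything in it is an assumption made for illustration: the `rack` attribute name, the dictionary layout of an agent, and the `group_agents_by_rack` helper are hypothetical and not taken from the design doc or from Mesos.

```python
from collections import defaultdict
from typing import Dict, List


def group_agents_by_rack(agents: List[dict]) -> Dict[str, List[str]]:
    """Map each rack label to the hostnames of the agents placed in it.

    Agents without a `rack` attribute are collected under "unknown".
    """
    racks = defaultdict(list)
    for agent in agents:
        rack = agent.get("attributes", {}).get("rack", "unknown")
        racks[rack].append(agent["hostname"])
    return dict(racks)


# Hypothetical agents; in the proposal the rack label would be probed
# automatically rather than written by hand.
example_agents = [
    {"hostname": "agent-1", "attributes": {"rack": "r1"}},
    {"hostname": "agent-2", "attributes": {"rack": "r2"}},
    {"hostname": "agent-3", "attributes": {"rack": "r1"}},
    {"hostname": "agent-4", "attributes": {}},
]
```

Calling `group_agents_by_rack(example_agents)` yields one list of hostnames per rack, which is the per-rack view a framework could use for placement.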

By the way, could you shepherd this ticket? Then we can work side by side.
Thanks!


> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the 
> cluster, for example, rack topology, while many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to Mesos resource topology.





[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320125#comment-15320125
 ] 

Fan Du commented on MESOS-5545:
---

[~adam-mesos] Thanks for sharing your thoughts here; they are profound and 
impressive!

Mesos performs the lower-level resource scheduling, so exporting the network 
topology falls within Mesos's role. It's up to a framework scheduler like 
[Firmament|https://github.com/camsas/firmament] to make more sophisticated 
scheduling decisions based on a qualitative approach.

I will think more about this, and I'm happy to discuss it with you if anything 
interesting comes to mind.

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the 
> cluster, for example, rack topology, while many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to Mesos resource topology.





[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320093#comment-15320093
 ] 

Fan Du commented on MESOS-5545:
---

[~avin...@mesosphere.io] Thanks for the comments, apparently you did your LLDP 
homework :)

The topology here only refers to the access layer, that is, the switch the agent 
is directly connected to. lldptool takes care of parsing LLDP packets in 
various ways, so to the best of my knowledge this does not touch libprocess.

You are right that LLDP is bounded by the next bridge, i.e. it travels only one 
hop. For the scenario where Open vSwitch is involved and Mesos runs inside a KVM 
guest, I can think of two approaches:
1. Only the LLDP packets sent by the OVS bridge matter so far, because the OVS 
bridge is now the access bridge and the lldpad daemon broadcasts LLDP packets.
2. After commit 
[784b58a3|https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=784b58a327ad16967ab64bbfa558df81980d31e9],
 sys knobs can be tweaked to forward LLDP packets.

I don't have any comments about using labels/attributes for the time being; I 
will work out something more appealing based on them.
I will let you know my thoughts!

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the 
> cluster, for example, rack topology, while many data center applications 
> have a rack awareness feature to provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to Mesos resource topology.


