[jira] [Created] (MESOS-5580) Implement authn/authz for the network/cni isolator
Avinash Sridharan created MESOS-5580: Summary: Implement authn/authz for the network/cni isolator Key: MESOS-5580 URL: https://issues.apache.org/jira/browse/MESOS-5580 Project: Mesos Issue Type: Task Environment: Linux Reporter: Avinash Sridharan Assignee: Avinash Sridharan Currently any framework can launch containers on any CNI network irrespective of its role and principal. We need to perform authn/authz in the network/cni isolator (or Master) to make sure that only roles/principals specified by the operator can launch containers on a given network. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
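The check the ticket asks for could look roughly like the sketch below. It is not the Mesos implementation: `NetworkAcl`, `may_join`, the ACL layout, and the default-deny behaviour for networks without an entry are all hypothetical, and it assumes a container is admitted if either its principal or its role is listed for the network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NetworkAcl:
    """Hypothetical operator-supplied ACL for a single CNI network."""
    principals: frozenset = field(default_factory=frozenset)
    roles: frozenset = field(default_factory=frozenset)


def may_join(acls, network, principal, role):
    """Return True if the framework may launch a container on `network`.

    Assumption: a network without an ACL entry is denied to everyone.
    """
    acl = acls.get(network)
    if acl is None:
        return False
    return principal in acl.principals or role in acl.roles
```

With an ACL that lists principal `marathon` and role `web` for `prod-net`, a framework running as `marathon` would be admitted regardless of role, while a framework with an unlisted principal and role would be rejected.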
[jira] [Created] (MESOS-5579) Support static IP address allocation with `DockerContainerizer`
Avinash Sridharan created MESOS-5579: Summary: Support static IP address allocation with `DockerContainerizer` Key: MESOS-5579 URL: https://issues.apache.org/jira/browse/MESOS-5579 Project: Mesos Issue Type: Task Environment: Linux Reporter: Avinash Sridharan Docker run supports the `--ip` option to allocate a specific IPv4 address to the container. Also, the `NetworkInfo` protobuf has an `ipaddress` field that allows frameworks to specify an IP address for the container. The docker executor should therefore invoke the `docker run` command with the `--ip` option whenever the `ipaddress` field of the `NetworkInfo` is set, allowing frameworks to try to assign a static IP address for their services.
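A minimal sketch of the proposed behaviour, assuming the executor assembles the `docker run` command line as a list of arguments. The function name `docker_run_argv` and its parameters are hypothetical and do not come from the Mesos code base. Docker generally honors `--ip` only on user-defined networks, so the sketch also passes a network name when one is given.

```python
def docker_run_argv(image, network=None, ip_address=None):
    """Build a `docker run` argument list (hypothetical helper).

    `--ip` is appended only when the framework supplied an address,
    mirroring "whenever the `ipaddress` field is set".
    """
    argv = ["docker", "run"]
    if network is not None:
        argv += ["--net", network]
    if ip_address is not None:
        argv += ["--ip", ip_address]
    argv.append(image)
    return argv
```

When `ip_address` is left unset the command is unchanged, so frameworks that do not fill in the field would see no difference.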
[jira] [Updated] (MESOS-5578) Support static address allocation in CNI
[ https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5578: - Story Points: 1 > Support static address allocation in CNI > > > Key: MESOS-5578 > URL: https://issues.apache.org/jira/browse/MESOS-5578 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Currently a framework can't specify a static IP address for the container > when using the network/cni isolator. > The `ipaddress` field in the `NetworkInfo` protobuf was designed for this > specific purpose but since the CNI spec does not specify a means to allocate > an IP address to the container the `network/cni` isolator cannot honor this > field even when it is filled in by the framework. > Creating this ticket to act as a place holder to track this limitation. As > and when the CNI spec allows us to specify a static IP address for the > container, we can resolve this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5578) Support static address allocation in CNI
Avinash Sridharan created MESOS-5578: Summary: Support static address allocation in CNI Key: MESOS-5578 URL: https://issues.apache.org/jira/browse/MESOS-5578 Project: Mesos Issue Type: Task Components: containerization Affects Versions: 1.0.0 Environment: Linux Reporter: Avinash Sridharan Assignee: Avinash Sridharan Currently a framework can't specify a static IP address for the container when using the network/cni isolator. The `ipaddress` field in the `NetworkInfo` protobuf was designed for this specific purpose but since the CNI spec does not specify a means to allocate an IP address to the container the `network/cni` isolator cannot honor this field even when it is filled in by the framework. Creating this ticket to act as a place holder to track this limitation. As and when the CNI spec allows us to specify a static IP address for the container, we can resolve this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5578) Support static address allocation in CNI
[ https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-5578: - Labels: mesosphere (was: ) > Support static address allocation in CNI > > > Key: MESOS-5578 > URL: https://issues.apache.org/jira/browse/MESOS-5578 > Project: Mesos > Issue Type: Task > Components: containerization >Affects Versions: 1.0.0 > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Currently a framework can't specify a static IP address for the container > when using the network/cni isolator. > The `ipaddress` field in the `NetworkInfo` protobuf was designed for this > specific purpose but since the CNI spec does not specify a means to allocate > an IP address to the container the `network/cni` isolator cannot honor this > field even when it is filled in by the framework. > Creating this ticket to act as a place holder to track this limitation. As > and when the CNI spec allows us to specify a static IP address for the > container, we can resolve this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5577) Modules using replicated log state API require zookeeper headers
Avinash Sridharan created MESOS-5577: Summary: Modules using replicated log state API require zookeeper headers Key: MESOS-5577 URL: https://issues.apache.org/jira/browse/MESOS-5577 Project: Mesos Issue Type: Bug Components: modules Affects Versions: 1.0.0 Reporter: Avinash Sridharan Assignee: Avinash Sridharan Fix For: 1.0.0 The state API uses zookeeper client headers and hence the bundled zookeeper headers need to be installed during Mesos installation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition
[ https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Wu updated MESOS-5576: - Description: We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |
Master 2 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped. This appears to be due to a stale link between the masters.
Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group: https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159 This link does not appear to break (Master 2 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself). When Master 2 tries to send an {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494] Instead, we see a log line in Master 2: {code} process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected {code} The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket. was: We observed the following situation in a cluster of five masters: || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 || | 0 | Follower | Follower | Follower | Follower | Leader | | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network || | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership | | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down | | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down | | 5 | Leader | Follower | Follower | Follower | Still down | | 6 | Leader | Follower | Follower | Follower | Comes back up | | 7 | Leader | Follower | Follower | Follower | Follower | | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower | | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower | | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the 
message! || | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader || | 12 | Still down | Leader | Follower | Follower | Follower | Master 1 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped. This appears to be due to a stale link between the masters. Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group: https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159 This link does not appear to break (Master 2 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself). When Master 2 tries to send an {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494] Instead, we see a log line in Master 2: {code} process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected {code} The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket. > Masters
[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition
[ https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Wu updated MESOS-5576: - Description: We observed the following situation in a cluster of five masters:
|| Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
| 0 | Follower | Follower | Follower | Follower | Leader |
| 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
| 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
| 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
| 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
| 5 | Leader | Follower | Follower | Follower | Still down |
| 6 | Leader | Follower | Follower | Follower | Comes back up |
| 7 | Leader | Follower | Follower | Follower | Follower |
| 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
| 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
| 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
| 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
| 12 | Still down | Leader | Follower | Follower | Follower |
Master 1 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped. This appears to be due to a stale link between the masters.
Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group: https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159 This link does not appear to break (Master 2 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself). When Master 2 tries to send an {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494] Instead, we see a log line in Master 2: {code} process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected {code} The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket. was: We observed the following situation in a cluster of five masters: || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 || | 0 | Follower | Follower | Follower | Follower | Leader | | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network || | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership | | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down | | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down | | 5 | Leader | Follower | Follower | Follower | Still down | | 6 | Leader | Follower | Follower | Follower | Comes back up | | 7 | Leader | Follower | Follower | Follower | Follower | | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower | | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower | | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the 
message! || | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader || | 12 | Still down | Leader | Follower | Follower | Follower | Master 1 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped. This appears to be due to a stale link between the masters. Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group: https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159 This link does not appear to break (Master 1 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself). When Master 1 tries to send an {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494] Instead, we see a log line in Master 1: {code} process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected {code} The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 1 to Master 5 succeeds via a new socket. > Masters
[jira] [Comment Edited] (MESOS-5143) LostSlaveMessage should not be broadcasted.
[ https://issues.apache.org/jira/browse/MESOS-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275029#comment-15275029 ]
Anindya Sinha edited comment on MESOS-5143 at 6/9/16 1:31 AM:
RR published:
https://reviews.apache.org/r/48453/
https://reviews.apache.org/r/47082/
was (Author: anindya.sinha):
RR published: https://reviews.apache.org/r/47082/
> LostSlaveMessage should not be broadcasted.
> ---
>
> Key: MESOS-5143
> URL: https://issues.apache.org/jira/browse/MESOS-5143
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Yan Xu
> Assignee: Anindya Sinha
>
> Currently a {{LostSlaveMessage}} (in v1 it's a type of {{Event::Failure}}) is
> broadcast to all registered frameworks in the cluster whenever a slave is
> lost.
> This is unnecessary and kind of breaks the Mesos abstraction: Frameworks are
> given a slice of the cluster, not the entirety. They know about the slice
> when offers are extended to them, so we shouldn't inform all of them when all
> agents go away.
> This message should instead be narrowcast to all frameworks that have a
> stake in this agent: running tasks, pending offers, reservations, persistent
> volumes, etc.
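The narrowcast selection described in the ticket can be sketched as follows. The data layout (a mapping from framework id to the agent ids it touches, grouped by kind of stake) and the names `STAKE_KINDS` and `frameworks_with_stake` are hypothetical; the master's real bookkeeping is structured differently.

```python
# Kinds of stake named in the ticket: running tasks, pending offers,
# reservations and persistent volumes.
STAKE_KINDS = ("tasks", "offers", "reservations", "volumes")


def frameworks_with_stake(agent_id, frameworks):
    """Return the sorted ids of frameworks that should hear about a lost agent.

    `frameworks` maps a framework id to a dict from stake kind to the set
    of agent ids on which the framework holds that kind of stake.
    """
    return sorted(
        framework_id
        for framework_id, stakes in frameworks.items()
        if any(agent_id in stakes.get(kind, ()) for kind in STAKE_KINDS)
    )
```

Only the frameworks returned here would receive the `LostSlaveMessage`; every other registered framework would stay unaware of the agent, which matches the "slice of the cluster" abstraction the ticket describes.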
[jira] [Assigned] (MESOS-5575) Attempting to Parse PID logging is too verbose
[ https://issues.apache.org/jira/browse/MESOS-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-5575: - Assignee: Yan Xu > Attempting to Parse PID logging is too verbose > -- > > Key: MESOS-5575 > URL: https://issues.apache.org/jira/browse/MESOS-5575 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Yan Xu >Assignee: Yan Xu >Priority: Minor > > When you crank up the mesos log level to VLOG(2) the logs get flooded with > “Attempting to parse PID” messages. > This line is logged whenever you create a PID/UPID from a string and in all > successful cases. Compared to other VLOG(2) logs this is less informative and > more frequent. > We should change it to VLOG(3). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4952) Annoying image provisioner logging for when images are not used.
[ https://issues.apache.org/jira/browse/MESOS-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-4952: - Assignee: Yan Xu > Annoying image provisioner logging for when images are not used. > > > Key: MESOS-4952 > URL: https://issues.apache.org/jira/browse/MESOS-4952 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Yan Xu >Assignee: Yan Xu >Priority: Minor > > {{Provisioner::destroy()}} logs this message even when images are not used in > the Mesos cluster: > {noformat:title=} > Ignoring destroy request for unknown container > 597f511e-479d-4632-a3b9-43b1e368c744 > {noformat} > See > [code|https://github.com/apache/mesos/blob/37958fd70de1998e6c29b643abd4f43dd1ef4c79/src/slave/containerizer/mesos/provisioner/provisioner.cpp#L306]. > This can be surprising and annoying to people who are not actually using this > feature and the container is totally valid, it's just not using images. > Let's at least tune it down to VLOG(1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5574) Missing dependency on libdl
[ https://issues.apache.org/jira/browse/MESOS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Downes updated MESOS-5574: -- Component/s: build
> Missing dependency on libdl
> ---
>
> Key: MESOS-5574
> URL: https://issues.apache.org/jira/browse/MESOS-5574
> Project: Mesos
> Issue Type: Bug
> Components: build
> Affects Versions: 1.0.0
> Environment: CentOS5, devtoolset-2, gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)
> Reporter: Ian Downes
>
> {noformat}
> $ make
> ...
> Making all in src
> make[1]: Entering directory `/home/idownes/workspace/mesos/build/src'
> make all-am
> make[2]: Entering directory `/home/idownes/workspace/mesos/build/src'
> /bin/sh ../libtool --tag=CXX --mode=link g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed -o mesos-local local/mesos_local-main.o libmesos.la -lz -lsvn_delta-1 -lsvn_subr-1 -lsasl2 -lcurl -lapr-1 -lz -lrt
> libtool: link: g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed -o .libs/mesos-local local/mesos_local-main.o ./.libs/libmesos.so /usr/lib64/libsvn_delta-1.so /usr/lib64/libsvn_subr-1.so /usr/lib64/libaprutil-1.so -lcrypt -lexpat -ldb-4.7 -lsasl2 -lcurl /usr/lib64/libapr-1.so -lpthread -lz -lrt -pthread -Wl,-rpath -Wl,/usr/lib64
> ./.libs/libmesos.so: error: undefined reference to 'dlopen'
> ./.libs/libmesos.so: error: undefined reference to 'dlerror'
> ./.libs/libmesos.so: error: undefined reference to 'dlclose'
> ./.libs/libmesos.so: error: undefined reference to 'dlsym'
> collect2: error: ld returned 1 exit status
> make[2]: *** [mesos-local] Error 1
> make[2]: Leaving directory `/home/idownes/workspace/mesos/build/src'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/home/idownes/workspace/mesos/build/src'
> make: *** [all-recursive] Error 1
> {noformat}
> Builds correctly when libdl inclusion is forced:
> {noformat}
> $ LDFLAGS='-ldl' ../configure
> {noformat}
[jira] [Created] (MESOS-5574) Missing dependency on libdl
Ian Downes created MESOS-5574: - Summary: Missing dependency on libdl Key: MESOS-5574 URL: https://issues.apache.org/jira/browse/MESOS-5574 Project: Mesos Issue Type: Bug Affects Versions: 1.0.0 Environment: CentOS5, devtoolset-2, gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) Reporter: Ian Downes
{noformat}
$ make
...
Making all in src
make[1]: Entering directory `/home/idownes/workspace/mesos/build/src'
make all-am
make[2]: Entering directory `/home/idownes/workspace/mesos/build/src'
/bin/sh ../libtool --tag=CXX --mode=link g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed -o mesos-local local/mesos_local-main.o libmesos.la -lz -lsvn_delta-1 -lsvn_subr-1 -lsasl2 -lcurl -lapr-1 -lz -lrt
libtool: link: g++ -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed -o .libs/mesos-local local/mesos_local-main.o ./.libs/libmesos.so /usr/lib64/libsvn_delta-1.so /usr/lib64/libsvn_subr-1.so /usr/lib64/libaprutil-1.so -lcrypt -lexpat -ldb-4.7 -lsasl2 -lcurl /usr/lib64/libapr-1.so -lpthread -lz -lrt -pthread -Wl,-rpath -Wl,/usr/lib64
./.libs/libmesos.so: error: undefined reference to 'dlopen'
./.libs/libmesos.so: error: undefined reference to 'dlerror'
./.libs/libmesos.so: error: undefined reference to 'dlclose'
./.libs/libmesos.so: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status
make[2]: *** [mesos-local] Error 1
make[2]: Leaving directory `/home/idownes/workspace/mesos/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/idownes/workspace/mesos/build/src'
make: *** [all-recursive] Error 1
{noformat}
Builds correctly when libdl inclusion is forced:
{noformat}
$ LDFLAGS='-ldl' ../configure
{noformat}
[jira] [Commented] (MESOS-5491) Implement GET_AGENTS Call in v1 master API.
[ https://issues.apache.org/jira/browse/MESOS-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321440#comment-15321440 ] zhou xing commented on MESOS-5491: -- One review request submitted: https://reviews.apache.org/r/48438/ > Implement GET_AGENTS Call in v1 master API. > --- > > Key: MESOS-5491 > URL: https://issues.apache.org/jira/browse/MESOS-5491 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: zhou xing > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321408#comment-15321408 ]
Fan Du commented on MESOS-5545:
[~brugidou] Thanks for sharing, I will look into it!
> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
> Issue Type: Story
> Components: hadoop, master
> Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
> Resources managed by the Mesos master carry no topology information about
> the cluster, such as rack topology, while many data center applications
> have a rack awareness feature to provide data locality, fault tolerance and
> intelligent task placement. This ticket investigates how to add rack
> awareness to the Mesos resource topology.
[jira] [Commented] (MESOS-4279) Docker executor truncates task's output when the task is killed.
[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321312#comment-15321312 ]
Benjamin Mahler commented on MESOS-4279:
I posted two fixes related to this ticket.
The first is to send terminal status updates in the same manner as the command executor: https://reviews.apache.org/r/48428/
The second is to eliminate the killing of the 'docker run' subprocess, which breaks the log redirection: https://reviews.apache.org/r/48429/
Let me know if you have any feedback, [~jieyu] kindly agreed to review.
> Docker executor truncates task's output when the task is killed.
>
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Affects Versions: 0.25.0, 0.26.0, 0.27.2, 0.28.1
> Reporter: Martin Bydzovsky
> Assignee: Benjamin Mahler
> Priority: Critical
> Labels: docker, mesosphere
> Fix For: 1.0.0
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I
> ran into the following issue:
> (it was already discussed on
> https://github.com/mesosphere/marathon/issues/2876 and guys from mesosphere
> got to a point that it's probably a docker containerizer problem...)
> To sum it up:
> When I deploy a simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result: the task receives SIGTERM
> and dies peacefully (within the 2-second period specified in my script).
> But when I wrap this Python script in a Docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api"
>     },
>     forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task dies immediately during the restart (issued from Marathon) without
> having a chance to do any cleanup.
[jira] [Updated] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.
[ https://issues.apache.org/jira/browse/MESOS-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Mazumdar updated MESOS-5573: -- Labels: mesosphere newbie (was: mesosphere)
> Executor Driver does not invoke the `disconnected` callback upon
> disconnection with the agent.
> --
>
> Key: MESOS-5573
> URL: https://issues.apache.org/jira/browse/MESOS-5573
> Project: Mesos
> Issue Type: Bug
> Reporter: Anand Mazumdar
> Labels: mesosphere, newbie
>
> The executor driver must invoke the {{disconnected}} callback upon
> disconnecting from the agent (i.e., if the agent process restarts), as per
> the documentation:
> https://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html#disconnected(org.apache.mesos.ExecutorDriver)
> This does not appear to happen currently. Also, this callback should only
> be invoked for frameworks with checkpointing enabled, because for
> non-checkpointed frameworks the executor is shut down upon a disconnection.
> There might already be a JIRA for this, but I was not able to spot one.
[jira] [Created] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.
Anand Mazumdar created MESOS-5573: - Summary: Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent. Key: MESOS-5573 URL: https://issues.apache.org/jira/browse/MESOS-5573 Project: Mesos Issue Type: Bug Reporter: Anand Mazumdar The executor driver must invoke the {{disconnected}} callback upon disconnecting from the agent (i.e., if the agent process restarts), as per the documentation: https://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html#disconnected(org.apache.mesos.ExecutorDriver) This does not appear to happen currently. Also, this callback should only be invoked for frameworks with checkpointing enabled, because for non-checkpointed frameworks the executor is shut down upon a disconnection. There might already be a JIRA for this, but I was not able to spot one.
[jira] [Commented] (MESOS-4672) Implement aufs based provisioner backend.
[ https://issues.apache.org/jira/browse/MESOS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321213#comment-15321213 ]
Jie Yu commented on MESOS-4672:
commit d98de34efc9d8fa891c5d29e5fbdb22382e10d64
Author: Shuai Lin
Date: Wed Jun 8 11:54:40 2016 -0700
Fixed compilation on OS X of aufs tests.
Review: https://reviews.apache.org/r/48388/
> Implement aufs based provisioner backend.
> -
>
> Key: MESOS-4672
> URL: https://issues.apache.org/jira/browse/MESOS-4672
> Project: Mesos
> Issue Type: Bug
> Reporter: Jie Yu
> Assignee: Shuai Lin
> Fix For: 1.0.0
>
> Overlay fs support was not merged until kernel 3.18. Docker's default
> storage backend for Ubuntu 14.04 is aufs. We should consider adding an aufs
> based backend for the unified containerizer as well to efficiently provide a
> union fs (instead of relying on the copy backend, which is not space efficient).
[jira] [Commented] (MESOS-3014) Support network egress bandwidth as a first-class resource
[ https://issues.apache.org/jira/browse/MESOS-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321212#comment-15321212 ] Tom Ganem commented on MESOS-3014: -- So I can see this issue is not a priority to anyone right now, but it is of extreme interest to me. The company I work for specializes in high speed file transfers. We have recently been brainstorming about creating a mesos framework that would launch short-lived transfer processes on the mesos cluster. If outbound network bandwidth was offered as a resource, we could ensure that each transfer process we launch would have a desired amount of bandwidth while not affecting other processes/containers. > Support network egress bandwidth as a first-class resource > -- > > Key: MESOS-3014 > URL: https://issues.apache.org/jira/browse/MESOS-3014 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Adam B > Labels: isolation, network, resource > > Mesos 0.23.0 introduced `--egress_rate_limit_per_container=100MB` for > statically configuring a fixed, per-container limit on outbound network > bandwidth, but if this were instead a standard resource type (outbound > network bandwidth) with a fixed total, then different subsets could be > reserved/claimed by different frameworks/tasks. This would allow us to adjust > the per-container limit depending on how many containers are running, or give > some containers more bandwidth than others. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
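To illustrate the difference between a fixed per-container limit and a first-class resource with a fixed total, here is a small accounting sketch. `EgressPool` and its methods are hypothetical and only model the bookkeeping; they say nothing about how Mesos would enforce the limits.

```python
class EgressPool:
    """Hypothetical pool of outbound bandwidth with a fixed total (in Mbps)."""

    def __init__(self, total_mbps):
        self.total = total_mbps
        self.claims = {}  # container id -> claimed Mbps

    def available(self):
        """Bandwidth not yet claimed by any container."""
        return self.total - sum(self.claims.values())

    def claim(self, container_id, mbps):
        """Reserve `mbps` for a container; return False if the pool is short."""
        if mbps <= 0:
            raise ValueError("mbps must be positive")
        if mbps > self.available():
            return False
        self.claims[container_id] = self.claims.get(container_id, 0) + mbps
        return True

    def release(self, container_id):
        """Return a container's bandwidth to the pool."""
        self.claims.pop(container_id, None)
```

Because claims are per container, one container can hold most of the pool while others take small shares, which a single static `--egress_rate_limit_per_container` value cannot express.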
[jira] [Updated] (MESOS-5572) Change Operator API RPC handlers return type to http::Response
[ https://issues.apache.org/jira/browse/MESOS-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5572: -- Shepherd: Vinod Kone > Change Operator API RPC handlers return type to http::Response > -- > > Key: MESOS-5572 > URL: https://issues.apache.org/jira/browse/MESOS-5572 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Reporter: haosdent >Assignee: haosdent > Labels: http > > As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to > change the return type of RPC handlers to > {{http::Response}} to make them more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5572) Change Operator API RPC handlers return type to http::Response
[ https://issues.apache.org/jira/browse/MESOS-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-5572: --- Assignee: haosdent > Change Operator API RPC handlers return type to http::Response > -- > > Key: MESOS-5572 > URL: https://issues.apache.org/jira/browse/MESOS-5572 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Reporter: haosdent >Assignee: haosdent > Labels: http > > As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to > change the return type of RPC handlers to > {{http::Response}} to make them more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5572) Change Operator API RPC handlers return type to http::Response
haosdent created MESOS-5572: --- Summary: Change Operator API RPC handlers return type to http::Response Key: MESOS-5572 URL: https://issues.apache.org/jira/browse/MESOS-5572 Project: Mesos Issue Type: Improvement Components: HTTP API Reporter: haosdent As discussed in http://search-hadoop.com/m/0Vlr6Uz9otVwkdv , we need to change the return type of RPC handlers to {{http::Response}} to make them more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match
[ https://issues.apache.org/jira/browse/MESOS-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-5571: -- Shepherd: Vinod Kone > Scheduler JNI throws exception when the major versions of JAR and libmesos > don't match > -- > > Key: MESOS-5571 > URL: https://issues.apache.org/jira/browse/MESOS-5571 > Project: Mesos > Issue Type: Bug > Components: java api >Affects Versions: 1.0.0 >Reporter: Yan Xu >Assignee: Yan Xu >Priority: Blocker > Fix For: 1.0.0 > > > In > [convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153] > we compare the major versions of the native library and the jar. This makes > upgrading frameworks unnecessarily hard because you would have to deploy > Mesos and frameworks in lockstep. > Backwards-incompatible changes would warrant a major version bump but not > vice versa. Plus it's more standard to express and check dependency > versions outside of the code, through package metadata. > The proposed solution is to remove this major version check altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
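To make the lockstep problem concrete, here is a small Python sketch of a strict major-version comparison. It is not the JNI code in convert.cpp; the function names and version strings are hypothetical and only illustrate why such a check rejects a framework that has not yet upgraded its JAR.

```python
def major(version):
    """Return the major component of a dotted version string such as '1.0.0'."""
    return int(version.split(".")[0])


def strict_major_check(jar_version, lib_version):
    """Mimic a strict check: accept only when the major versions are equal."""
    return major(jar_version) == major(lib_version)


# Same major version on both sides: accepted.
same_major = strict_major_check("0.28.1", "0.28.2")

# A framework still bundling a 0.28.x JAR against a 1.0.0 native library:
# rejected by the strict check, regardless of actual API compatibility.
across_major = strict_major_check("0.28.1", "1.0.0")
```

Under a check like this, upgrading the native library across a major version forces every framework to ship a new JAR at the same time, which is the deployment coupling the ticket proposes to remove.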
[jira] [Assigned] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match
[ https://issues.apache.org/jira/browse/MESOS-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-5571: - Assignee: Yan Xu > Scheduler JNI throws exception when the major versions of JAR and libmesos > don't match > -- > > Key: MESOS-5571 > URL: https://issues.apache.org/jira/browse/MESOS-5571 > Project: Mesos > Issue Type: Bug > Components: java api >Affects Versions: 1.0.0 >Reporter: Yan Xu >Assignee: Yan Xu >Priority: Blocker > Fix For: 1.0.0 > > > In > [convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153] > we compare the major versions of the native library and the jar. This makes > upgrading frameworks unnecessarily hard because you would have to deploy > Mesos and frameworks in lockstep. > Backwards-incompatible changes would warrant a major version bump but not > vice versa. Plus it's more standard to express and check dependency > versions outside of the code, through package metadata. > The proposed solution is to remove this major version check altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5280) Inconsistent error checking in DRF sorter.
[ https://issues.apache.org/jira/browse/MESOS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-5280: -- Description: There exist a few different error handling styles in the sorter. h2. Hard checks e.g., [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62] {code} CHECK(weights.contains(name)); {code} h2. No-op if it results in an error condition. e.g., [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]: {code} set::iterator it = find(name); if (it != clients.end()) { // TODO(benh): This should really be a CHECK. ... } {code} The problem: - Silent no-ops are not ideal. (Implicitness makes it hard to debug things and we have run into one instance of this). - Hard CHECKs on invalid arguments are often too harsh. - Not checking preconditions can lead to subtle bugs. - We should check errors consistently. was: There exist a few different error handling styles in the sorter. h2. Hard checks e.g., [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62] {code} CHECK(weights.contains(name)); {code} h2. No-op if it results in an error condition. e.g., [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]: {code} set ::iterator it = find(name); if (it != clients.end()) { // TODO(benh): This should really be a CHECK. ... } {code} IMO there should never be silent no-ops. Short of CHECK, we should return an error if it's indeed an error. If either path of the branch is valid and one is a noop, we should log the noop branch or return a 'bool' so the caller can distinguish the two. Implicitness makes it hard to debug things and we have run into one instance of this. 
My proposal is to use CHECKs consistently in sorter. > Inconsistent error checking in DRF sorter. > -- > > Key: MESOS-5280 > URL: https://issues.apache.org/jira/browse/MESOS-5280 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Yan Xu >Assignee: Yan Xu > > There exist a few different error handling styles in the sorter. > h2. Hard checks > e.g., > [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62] > {code} > CHECK(weights.contains(name)); > {code} > h2. No-op if it results in an error condition. > e.g., > [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]: > {code} > set ::iterator it = find(name); > if (it != clients.end()) { // TODO(benh): This should really be a CHECK. > ... > } > {code} > The problem: > - Silent no-ops are not ideal. (Implicitness makes it hard to debug things > and we have run into one instance of this). > - Hard CHECKs on invalid arguments are often too harsh. > - Not checking preconditions can lead to subtle bugs. > - We should check errors consistently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5280) Inconsistent error checking in DRF sorter.
[ https://issues.apache.org/jira/browse/MESOS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320999#comment-15320999 ] Yan Xu commented on MESOS-5280: --- Derived from the discussion [here|https://reviews.apache.org/r/47259/]: IMO we should: - CHECK internal invariants. - Return error/none/false for invalid arguments or no-ops. If some of these things should never happen, they should be CHECKs in the caller (the allocator) because they are its internal invariants. - The sorter should never assume the allocator would call its methods in the expected way/order. (e.g., [here|https://github.com/apache/mesos/blob/6ce476461f0fedfb4ed4e40c15f25bb79a39b0f3/src/master/allocator/sorter/drf/sorter.cpp#L242] the method has no safeguards at all). /cc [~jvanremoortere] > Inconsistent error checking in DRF sorter. > -- > > Key: MESOS-5280 > URL: https://issues.apache.org/jira/browse/MESOS-5280 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Yan Xu >Assignee: Yan Xu > > There exist a few different error handling styles in the sorter. > h2. Hard checks > e.g., > [DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62] > {code} > CHECK(weights.contains(name)); > {code} > h2. No-op if it results in an error condition. > e.g., > [DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]: > {code} > set::iterator it = find(name); > if (it != clients.end()) { // TODO(benh): This should really be a CHECK. > ... > } > {code} > IMO there should never be silent no-ops. Short of CHECK, we should return an > error if it's indeed an error. If either path of the branch is valid and one > is a noop, we should log the noop branch or return a 'bool' so the caller > can distinguish the two. 
> Implicitness makes it hard to debug things and we have run into one instance > of this. > My proposal is to use CHECKs consistently in sorter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
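The convention proposed in the comment above (assert internal invariants, return a value for invalid arguments or no-ops, and let the caller assert what it expects) can be sketched in a few lines of Python. The `Sorter` class below is a toy stand-in written for this note, not the Mesos `DRFSorter`, and its method names and fields are assumptions.

```python
class Sorter:
    """Toy stand-in for a sorter; not the Mesos DRFSorter."""

    def __init__(self):
        self.weights = {}
        self.allocations = {}

    def add(self, name, weight=1.0):
        # Invalid arguments are reported to the caller, not asserted on.
        if name in self.weights or weight <= 0:
            return False
        self.weights[name] = weight
        self.allocations[name] = 0.0
        return True

    def allocated(self, name, amount):
        # Unknown client: return False instead of silently doing nothing,
        # so the caller can tell the no-op apart from a real update.
        if name not in self.weights:
            return False
        # Internal invariant: every known client has an allocation entry.
        assert name in self.allocations, "sorter state is inconsistent"
        self.allocations[name] += amount
        return True


sorter = Sorter()
added = sorter.add("framework-1")
# The caller (the allocator, in the ticket's terms) asserts its own invariant.
assert sorter.allocated("framework-1", 2.0)
# A call for a client that was never added is visible as False.
unknown = sorter.allocated("framework-2", 1.0)
```

The point of the split is that the sorter never crashes on bad input from its caller, while a caller that believes "this client must exist" still fails loudly at its own call site.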
[jira] [Created] (MESOS-5571) Scheduler JNI throws exception when the major versions of JAR and libmesos don't match
Yan Xu created MESOS-5571: - Summary: Scheduler JNI throws exception when the major versions of JAR and libmesos don't match Key: MESOS-5571 URL: https://issues.apache.org/jira/browse/MESOS-5571 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 1.0.0 Reporter: Yan Xu Priority: Blocker Fix For: 1.0.0 In [convert.cpp|https://github.com/apache/mesos/blob/master/src/java/jni/convert.cpp#L153] we compare the major versions of the native library and the jar. This makes upgrading frameworks unnecessarily hard because you would have to deploy Mesos and frameworks in lockstep. Backwards-incompatible changes would warrant a major version bump but not vice versa. Plus it's more standard to express and check dependency versions outside of the code, through package metadata. The proposed solution is to remove this major version check altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers
[ https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320854#comment-15320854 ] Vinod Kone commented on MESOS-5567: --- There is no "\n" at the end of the chunk as you said. Maybe the doc is not clear. Would you mind sending a PR/review to fix it? I'll be happy to shepherd and commit it. > Scheduler HTTP API cuts JSON buffers > > > Key: MESOS-5567 > URL: https://issues.apache.org/jira/browse/MESOS-5567 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.28.1 > Environment: Ubuntu 14.04 latest >Reporter: Tobias Mueller > > According to the docs at > http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed > that the message format would only contain "full" (meaning parseable) JSON > messages. In fact, I'm partially seeing splitted JSONs, where the next chunk > is just continuing the first part: > {noformat} > 1983 > {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"
*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc > 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} > {noformat} > I use the standard Node.js (4.4.5) http-client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5537) http v1 SUBSCRIBED scheduler event always has nil http_interval_seconds
[ https://issues.apache.org/jira/browse/MESOS-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar reassigned MESOS-5537: - Assignee: Anand Mazumdar > http v1 SUBSCRIBED scheduler event always has nil http_interval_seconds > --- > > Key: MESOS-5537 > URL: https://issues.apache.org/jira/browse/MESOS-5537 > Project: Mesos > Issue Type: Bug >Reporter: James DeFelice >Assignee: Anand Mazumdar >Priority: Blocker > Labels: mesosphere > Fix For: 1.0.0 > > > I'm writing a controller in Go to monitor heartbeats. I'd like to use the > interval as communicated by the master, which should be specified in the > SUBSCRIBED event. But it's not. > {code} > 2016/06/03 18:34:04 {Type:SUBSCRIBED > Subscribed:_Subscribed{FrameworkID:{Value:ffdb6d6e-0167-4fa2-98f9-2c3f8157fc25-0004,},HeartbeatIntervalSeconds:nil,} > Offers:nil Rescind:nil Update:nil Message:nil Failure:nil Error:nil} > {code} > {code} > $ dpkg -l |grep -e mesos > ii mesos 0.28.0-2.0.16.ubuntu1404 > amd64 Cluster resource manager with efficient resource isolation > {code} > I *am* seeing HEARTBEAT events. Just not seeing the interval specified in the > SUBSCRIBED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
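Until the interval is populated, a heartbeat monitor needs some way to cope with a missing value. The Python sketch below shows one possible shape, under stated assumptions: the class name, the `missed_allowed` parameter and the 15-second fallback are choices made for this illustration and are not taken from Mesos or from the reporter's Go controller.

```python
import time

# Arbitrary fallback chosen for illustration; not read from Mesos.
FALLBACK_INTERVAL_SECONDS = 15.0


class HeartbeatMonitor:
    """Track HEARTBEAT events and report when the stream looks stale."""

    def __init__(self, subscribed_interval, missed_allowed=2, clock=time.monotonic):
        # Use the master-provided interval when present, else the fallback.
        self.interval = (
            subscribed_interval
            if subscribed_interval is not None
            else FALLBACK_INTERVAL_SECONDS
        )
        self.missed_allowed = missed_allowed
        self.clock = clock
        self.last_seen = clock()

    def on_heartbeat(self):
        """Record the arrival time of a HEARTBEAT event."""
        self.last_seen = self.clock()

    def is_stale(self):
        # Stale once more than `missed_allowed` intervals pass in silence.
        silence = self.clock() - self.last_seen
        return silence > self.interval * self.missed_allowed
```

Passing the clock in as a parameter keeps the monitor testable without sleeping; a caller would construct it from the SUBSCRIBED event and call `on_heartbeat()` for every HEARTBEAT it receives.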
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320739#comment-15320739 ] Joris Van Remoortere commented on MESOS-5545: - [~fan.du] I would like to; however, this is currently not high enough on my priority list. I'm passionate about this subject, which is why I've brought it up before :-) We should see in the community meeting if there is some consensus on a timeline. If the automation aspect is what is most important to you, then I would focus on a good interface between Mesos and the modules / tools you want to build to source the information. We likely won't get much traction dragging specific strategies into the Mesos project. Rather, we should take the approach of ensuring the interfaces / primitives work well for a variety of strategies and tools. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by the Mesos master carry no topology information about the > cluster, for example, rack topology, while many data center applications > have a rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket investigates how to add rack > awareness to the Mesos resource topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers
[ https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320705#comment-15320705 ] Tobias Mueller commented on MESOS-5567: --- Thanks for the fast reply. That's what I think as well, strange thing is that this occurs maybe in 2% of the events I'm seeing. The other 98% are "one JSON per line" events. Maybe it'd make sense to revise the docs at "RecordIO response format" a little, because I'm not seeing the line lengths as described, only the overall length at the beginning. Furthermore, when I receive a chunked JSON, there is no \n at the end of the chunk. Excerpt: {noformat} 128\n {"type": "SUBSCRIBED","subscribed": {"framework_id": {"value":"12220-3440-12532-2345"},...}104\n {"framework_id": {"value": "12220-3440-12532-2345"},...{"value" : "12220-3440-12532-O12"},}208\n ... {noformat} > Scheduler HTTP API cuts JSON buffers > > > Key: MESOS-5567 > URL: https://issues.apache.org/jira/browse/MESOS-5567 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.28.1 > Environment: Ubuntu 14.04 latest >Reporter: Tobias Mueller > > According to the docs at > http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed > that the message format would only contain "full" (meaning parseable) JSON > messages. 
In fact, I'm partially seeing splitted JSONs, where the next chunk > is just continuing the first part: > {noformat} > 1983 > {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc > 
4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} > {noformat} > I use the standard Node.js (4.4.5) http-client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5567) Scheduler HTTP API cuts JSON buffers
[ https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320676#comment-15320676 ] Vinod Kone commented on MESOS-5567: --- I think what you are seeing is HTTP chunked transfer encoding in play. Your application/library needs to keep reading more data until it reads the number of bytes specified at the beginning (in your example it is 1983 bytes) to read the full JSON for an event . > Scheduler HTTP API cuts JSON buffers > > > Key: MESOS-5567 > URL: https://issues.apache.org/jira/browse/MESOS-5567 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.28.1 > Environment: Ubuntu 14.04 latest >Reporter: Tobias Mueller > > According to the docs at > http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed > that the message format would only contain "full" (meaning parseable) JSON > messages. In fact, I'm partially seeing splitted JSONs, where the next chunk > is just continuing the first part: > {noformat} > 1983 > 
{"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc > 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} > {noformat} 
> I use the standard Node.js (4.4.5) http-client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
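The advice in the comment above (keep reading until the number of bytes announced by the length prefix has arrived) amounts to a small incremental decoder. Here is a minimal Python sketch, assuming the stream is a sequence of records of the form `<length>\n<payload>` as described in this thread; the class name and the sample HEARTBEAT event are illustrative, and the reporter's client is Node.js, not Python.

```python
import json


class RecordIODecoder:
    """Incrementally decode '<length>\\n<payload>' records from byte chunks."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk of bytes and return the list of complete records."""
        self._buffer += chunk
        records = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break  # length prefix not complete yet
            length = int(self._buffer[:newline].decode("ascii"))
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # payload not complete yet; wait for more data
            records.append(self._buffer[start:end])
            self._buffer = self._buffer[end:]
        return records


decoder = RecordIODecoder()
event = b'{"type":"HEARTBEAT"}'
stream = str(len(event)).encode("ascii") + b"\n" + event
# Deliver the record in two pieces, as a chunked HTTP response might.
first = decoder.feed(stream[:10])
second = decoder.feed(stream[10:])
events = [json.loads(record) for record in second]
```

Because the decoder only looks at the length prefix, it does not matter where the HTTP layer cuts the data: a record split across chunks is held back until complete, and several records arriving in one chunk are all returned from a single `feed` call.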
[jira] [Commented] (MESOS-5566) Make "state" field of TaskStatus optional
[ https://issues.apache.org/jira/browse/MESOS-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320666#comment-15320666 ] Vinod Kone commented on MESOS-5566: --- I think `SchedulerDriver::reconcile()` using TaskStatus was a bad idea in retrospect, rather than `TaskStatus` having a required `state` field. The v1 scheduler API fixes this by not using TaskStatus for RECONCILE call. > Make "state" field of TaskStatus optional > - > > Key: MESOS-5566 > URL: https://issues.apache.org/jira/browse/MESOS-5566 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere, newbie > > The {{SchedulerDriver}} interface uses a vector of {{TaskStatus}} for task > reconciliation: the framework sends a list of taskIDs (with optional agent > IDs), and the master replies with their current task statuses. Right now, > frameworks also need to specify the {{state}} field of the input > {{TaskStatus}}, because {{state}} is a {{required}} field. It would be > cleaner to make {{state}} optional, because otherwise this makes the > interface confusing. > After doing this, we should remove the places where we set the {{state}} > field when making reconciliation requests, e.g., in various test cases. > See discussion in https://reviews.apache.org/r/48250/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.
[ https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-5565: -- Fix Version/s: 1.0.0 > Add logging when Offer::Operation::Launch message has no tasks. > --- > > Key: MESOS-5565 > URL: https://issues.apache.org/jira/browse/MESOS-5565 > Project: Mesos > Issue Type: Improvement >Reporter: Anand Mazumdar >Priority: Minor > Labels: newbie > Fix For: 1.0.0 > > > Currently, when an {{Offer::Accept::Launch}} message has no tasks specified, > Mesos treats such requests as implicitly declining all offers. This can > be very counter-intuitive for framework developers since we do not have any > logging on the Master around this behavior. It would be good to add some > logging on the master to apprise the framework developers that all the offers > have been implicitly declined. > {code} > if (operation.type() == Offer::Operation::LAUNCH) { > if (operation.launch().task_infos().size() > 0) { > ++metrics->messages_launch_tasks; > } else { > ++metrics->messages_decline_offers; > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
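As a rough illustration of the requested behaviour, the Python sketch below mirrors the branch in the quoted snippet and adds a warning on the empty-launch path. It is not master code: the function name, the metrics dictionary and the log wording are assumptions made for this note.

```python
import logging

logger = logging.getLogger("master")


def count_launch(task_count, framework_id, metrics):
    """Update counters for a LAUNCH operation and log implicit declines."""
    if task_count > 0:
        metrics["messages_launch_tasks"] += 1
    else:
        metrics["messages_decline_offers"] += 1
        # The addition the ticket asks for: make the implicit decline visible.
        logger.warning(
            "LAUNCH operation from framework %s has no tasks; "
            "treating it as an implicit decline of all offers",
            framework_id,
        )


metrics = {"messages_launch_tasks": 0, "messages_decline_offers": 0}
count_launch(0, "framework-1", metrics)  # logs a warning
count_launch(2, "framework-1", metrics)  # counted as a launch, no log
```

Logging at warning level on the decline branch leaves the counters unchanged while giving framework developers something to find in the master log.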
[jira] [Comment Edited] (MESOS-5570) Improve CHANGELOG and upgrades.md
[ https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320602#comment-15320602 ] Joerg Schad edited comment on MESOS-5570 at 6/8/16 2:08 PM: Some prework Included missing changes for 1.0RC https://reviews.apache.org/r/48272/ Used tense consistently in CHANGELOG https://reviews.apache.org/r/48271/ was (Author: js84): Included missing changes for 1.0RC https://reviews.apache.org/r/48272/ Used tense consistently in CHANGELOG https://reviews.apache.org/r/48271/ > Improve CHANGELOG and upgrades.md > - > > Key: MESOS-5570 > URL: https://issues.apache.org/jira/browse/MESOS-5570 > Project: Mesos > Issue Type: Documentation >Reporter: Joerg Schad >Assignee: Joerg Schad > Fix For: 1.0.0 > > > Currently we have a lot of data duplication between the CHANGELOG and > upgrades.md. We should try to improve this and potentially make the CHANGELOG > a markdown file as well. For inspiration see the Hadoop changelog: > https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5570) Improve CHANGELOG and upgrades.md
[ https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320602#comment-15320602 ] Joerg Schad commented on MESOS-5570: Included missing changes for 1.0RC https://reviews.apache.org/r/48272/ Used tense consistently in CHANGELOG https://reviews.apache.org/r/48271/ > Improve CHANGELOG and upgrades.md > - > > Key: MESOS-5570 > URL: https://issues.apache.org/jira/browse/MESOS-5570 > Project: Mesos > Issue Type: Documentation >Reporter: Joerg Schad >Assignee: Joerg Schad > Fix For: 1.0.0 > > > Currently we have a lot of data duplication between the CHANGELOG and > upgrades.md. We should try to improve this and potentially make the CHANGELOG > a markdown file as well. For inspiration see the Hadoop changelog: > https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5570) Improve CHANGELOG and upgrades.md
Joerg Schad created MESOS-5570: -- Summary: Improve CHANGELOG and upgrades.md Key: MESOS-5570 URL: https://issues.apache.org/jira/browse/MESOS-5570 Project: Mesos Issue Type: Documentation Reporter: Joerg Schad Assignee: Joerg Schad Fix For: 1.0.0 Currently we have a lot of data duplication between the CHANGELOG and upgrades.md. We should try to improve this and potentially make the CHANGELOG a markdown file as well. For inspiration see the Hadoop changelog: https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5515) Implement READ_FILE Call in v1 agent API.
[ https://issues.apache.org/jira/browse/MESOS-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou xing reassigned MESOS-5515: Assignee: zhou xing > Implement READ_FILE Call in v1 agent API. > - > > Key: MESOS-5515 > URL: https://issues.apache.org/jira/browse/MESOS-5515 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: zhou xing > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5569) Document supported releases more prominently
Neil Conway created MESOS-5569: -- Summary: Document supported releases more prominently Key: MESOS-5569 URL: https://issues.apache.org/jira/browse/MESOS-5569 Project: Mesos Issue Type: Documentation Components: documentation, project website Reporter: Neil Conway {noformat} It would be great to make this information more prominent on the website, especially once 1.0.0 is released. For example, we could list the supported releases on https://mesos.apache.org/downloads/, along with a link to the versioning document. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5568) SlaveTest.KillTaskUnregisteredExecutor is slow
[ https://issues.apache.org/jira/browse/MESOS-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5568: --- Description: {noformat} [--] 1 test from SlaveTest [ RUN ] SlaveTest.KillTaskUnregisteredExecutor [ OK ] SlaveTest.KillTaskUnregisteredExecutor (5128 ms) [--] 1 test from SlaveTest (5129 ms total) {noformat} I'm guessing this could be fixed by tweaking {{executor_shutdown_grace_period}}. was:I'm guessing this could be fixed by tweaking {{executor_shutdown_grace_period}}. > SlaveTest.KillTaskUnregisteredExecutor is slow > -- > > Key: MESOS-5568 > URL: https://issues.apache.org/jira/browse/MESOS-5568 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Neil Conway > Labels: mesosphere > > {noformat} > [--] 1 test from SlaveTest > [ RUN ] SlaveTest.KillTaskUnregisteredExecutor > [ OK ] SlaveTest.KillTaskUnregisteredExecutor (5128 ms) > [--] 1 test from SlaveTest (5129 ms total) > {noformat} > I'm guessing this could be fixed by tweaking > {{executor_shutdown_grace_period}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5568) SlaveTest.KillTaskUnregisteredExecutor is slow
Neil Conway created MESOS-5568: -- Summary: SlaveTest.KillTaskUnregisteredExecutor is slow Key: MESOS-5568 URL: https://issues.apache.org/jira/browse/MESOS-5568 Project: Mesos Issue Type: Bug Components: tests Reporter: Neil Conway I'm guessing this could be fixed by tweaking {{executor_shutdown_grace_period}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5567) Scheduler HTTP API cuts JSON buffers
[ https://issues.apache.org/jira/browse/MESOS-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tobias Mueller updated MESOS-5567: -- Description: According to the docs at http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that the message format would only contain "full" (meaning parseable) JSON messages. In fact, I'm partially seeing splitted JSONs, where the next chunk is just continuing the first part: {noformat} 1983 {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc 
4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} {noformat} I use the standard Node.js (4.4.5) http-client. was: According to the docs at http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that the message format would only contain "full" (meaning parseable) JSON messages. In fact, I'm partially seeing splitted JSONs, where the next chunk is prefixed with a string like `4f7e-0020`: {noformat} 1983 {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value
":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} {noformat} I use the standard Node.js (4.4.5) http-client. > Scheduler HTTP API cuts JSON buffers >
[jira] [Created] (MESOS-5567) Scheduler HTTP API cuts JSON buffers
Tobias Mueller created MESOS-5567: - Summary: Scheduler HTTP API cuts JSON buffers Key: MESOS-5567 URL: https://issues.apache.org/jira/browse/MESOS-5567 Project: Mesos Issue Type: Bug Components: HTTP API Affects Versions: 0.28.1 Environment: Ubuntu 14.04 latest Reporter: Tobias Mueller According to the docs at http://mesos.apache.org/documentation/latest/scheduler-http-api/ I assumed that the message format would only contain "full" (meaning parseable) JSON messages. In fact, I'm partially seeing splitted JSONs, where the next chunk is prefixed with a string like `4f7e-0020`: {noformat} 1983 {"offers":{"offers":[{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S0"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.102","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1055"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.102","ip":"172.17.10.102","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S2"},"framework_id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-0020"},"hostname":"172.17.10.101","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1056"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.101","ip":"172.17.10.101","port":5051},"path":"\/slave(1)","scheme":"http"}},{"agent_id":{"value":"2a0fb93e-6125-4bcd-b19a-0be41b9d7bbe-S1"},"framework_id":{"va
lue":"f7c62096-7fd3-446b-98df-14c991dc 4f7e-0020"},"hostname":"172.17.10.103","id":{"value":"f7c62096-7fd3-446b-98df-14c991dc4f7e-O1057"},"resources":[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":1985.0},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":35164.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"172.17.10.103","ip":"172.17.10.103","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"} {noformat} I use the standard Node.js (4.4.5) http-client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
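A note on the framing seen above: the leading `1983` looks like a RecordIO length prefix, and the scheduler HTTP API documentation describes the event stream as RecordIO, i.e. each record is preceded by its length in bytes and a newline. Under that assumption, an HTTP chunk boundary can fall anywhere inside a record, so the client has to buffer until the announced number of bytes has arrived. A minimal sketch of such a buffer follows; the class name `RecordIODecoder` and its `feed` method are hypothetical and not part of any Mesos client library.

```python
import json
from typing import List


class RecordIODecoder:
    """Reassembles '<length>\\n<record>' frames from arbitrary byte chunks.

    Hypothetical helper: assumes the RecordIO framing described in the
    scheduler HTTP API documentation (decimal byte length, newline, record).
    """

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add one HTTP chunk and return every record completed so far."""
        self._buffer += chunk
        records: List[bytes] = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break  # the length prefix itself is still incomplete
            length = int(self._buffer[:newline])
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # the record body has not fully arrived yet
            records.append(self._buffer[start:end])
            self._buffer = self._buffer[end:]
        return records


if __name__ == "__main__":
    decoder = RecordIODecoder()
    body = b'{"type":"HEARTBEAT"}'
    framed = str(len(body)).encode() + b"\n" + body
    # Simulate a chunk boundary in the middle of the record.
    for piece in (framed[:9], framed[9:]):
        for record in decoder.feed(piece):
            print(json.loads(record)["type"])
```

Only complete records are handed to `json.loads`, so a record split across two chunks, as in the report, is parsed once after the second chunk arrives.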
[jira] [Created] (MESOS-5566) Make "state" field of TaskStatus optional
Neil Conway created MESOS-5566: -- Summary: Make "state" field of TaskStatus optional Key: MESOS-5566 URL: https://issues.apache.org/jira/browse/MESOS-5566 Project: Mesos Issue Type: Improvement Components: general Reporter: Neil Conway Priority: Minor The {{SchedulerDriver}} interface uses a vector of {{TaskStatus}} for task reconciliation: the framework sends a list of taskIDs (with optional agent IDs), and the master replies with their current task statuses. Right now, frameworks also need to specify the {{state}} field of the input {{TaskStatus}}, because {{state}} is a {{required}} field. It would be cleaner to make {{state}} optional, because otherwise this makes the interface confusing. After doing this, we should remove the places where we set the {{state}} field when making reconciliation requests, e.g., in various test cases. See discussion in https://reviews.apache.org/r/48250/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320245#comment-15320245 ] Fan Du commented on MESOS-5545: --- [~jvanremoortere] Thanks for your constructive advice and suggestions! Yes, this will be a long road, but it's fun to experiment with the idea. :) How about we sync up at the next community meeting on 6/16? It's not the attributes themselves that I dislike, but that they have to be maintained by hand, which is tedious, instead of being populated automatically. I will update my design doc to enhance the current attributes with these goals: a. Automatically probe the rack topology, with modular plugins for popular network types, e.g. Ethernet, InfiniBand, etc. b. Use the rack topology information to arrange agents on a per-rack basis. c. Design a common, friendly attribute scheme for frameworks to interpret. d. Use ACLs to enforce security. By the way, could you shepherd this ticket? We could then work side by side. Thanks! > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
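To illustrate goals (b) and (c) in the comment above, here is a rough sketch of how a framework might group agents per rack once each agent carries a rack attribute. Everything in it is an assumption for illustration: the attribute name `rack`, the `unknown` fallback bucket, the host names, and the shape of the agent dictionaries (a `hostname` plus an `attributes` mapping) are not taken from the design doc.

```python
from collections import defaultdict
from typing import Dict, List


def group_agents_by_rack(agents: List[dict],
                         attribute: str = "rack") -> Dict[str, List[str]]:
    """Group agent hostnames by the value of a rack attribute.

    Hypothetical helper: agents without the attribute land in "unknown".
    """
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent in agents:
        rack = agent.get("attributes", {}).get(attribute, "unknown")
        racks[rack].append(agent["hostname"])
    return dict(racks)


if __name__ == "__main__":
    # Made-up agents; in practice these values would come from the cluster.
    agents = [
        {"hostname": "agent-1", "attributes": {"rack": "r1"}},
        {"hostname": "agent-2", "attributes": {"rack": "r2"}},
        {"hostname": "agent-3", "attributes": {"rack": "r1"}},
        {"hostname": "agent-4"},
    ]
    for rack, hosts in sorted(group_agents_by_rack(agents).items()):
        print(rack, hosts)
```

With automatic probing, the attribute would be filled in by the agent rather than by the operator; the grouping step on the framework side would stay the same.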
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320125#comment-15320125 ] Fan Du commented on MESOS-5545: --- [~adam-mesos] Thanks for sharing your thoughts here; they are profound and impressive! Since Mesos performs the lower-level resource scheduling, exporting the network topology falls within Mesos's role. It's up to a framework scheduler like [Firmament|https://github.com/camsas/firmament] to make more sophisticated scheduling decisions based on a qualitative approach. I will think more about this and am happy to discuss it with you if anything interesting comes to mind. > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources
[ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320093#comment-15320093 ] Fan Du commented on MESOS-5545: --- [~avin...@mesosphere.io] Thanks for the comments; apparently you did your LLDP homework :) The topology here only refers to the access layer, that is, the switch the agent is directly connected to. lldptool will take care of parsing LLDP packets in various ways, so to the best of my knowledge this will not involve libprocess. You are right that LLDP is bounded by the next bridge, i.e. it only travels one hop. In the scenario where Open vSwitch is involved and Mesos runs inside a KVM guest, I can think of two approaches: 1. Only the LLDP packets sent by the OVS bridge matter, because the OVS bridge is now the access bridge, and the lldpad daemon will broadcast LLDP packets. 2. After commit [784b58a3|https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=784b58a327ad16967ab64bbfa558df81980d31e9], sys knobs could be tweaked to forward LLDP packets. I don't have any comments about using labels/attributes for the time being; I will work out something more appealing based on them. I will let you know my thoughts! > Add rack awareness support for Mesos resources > -- > > Key: MESOS-5545 > URL: https://issues.apache.org/jira/browse/MESOS-5545 > Project: Mesos > Issue Type: Story > Components: hadoop, master >Reporter: Fan Du > Attachments: RackAwarenessforMesos-Lite.pdf > > > Resources managed by Mesos master have no topology information of the > cluster, for example, rack topology. While lots of data center applications > have rack awareness feature to provide data locality, fault tolerance and > intelligent task placement. This ticket tries to investigate how to add rack > awareness for Mesos resources topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)