RE: High performance, low latency framework over mesos

2017-03-14 Thread Assaf_Waizman
Thanks Benjamin,

I looked into the logs and it seems the delay is between the master and the 
scheduler:
Master log:
I0314 18:23:59.409423 39743 master.cpp:3776] Processing ACCEPT call for offers: 
[ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O25 ] on agent 
edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 
(hadoop-master) for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos 
BM Scheduler)
W0314 18:23:59.410166 39743 validation.cpp:1064] Executor 'MesosBMExecutorId' 
for task '8' uses less CPUs (None) than the minimum required (0.01). Please 
update your executor, as this will be mandatory in future releases.
W0314 18:23:59.410221 39743 validation.cpp:1076] Executor 'MesosBMExecutorId' 
for task '8' uses less memory (None) than the minimum required (32MB). Please 
update your executor, as this will be mandatory in future releases.
I0314 18:23:59.410292 39743 master.cpp:9053] Adding task 8 with resources 
cpus(*)(allocated: *):0.01 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at 
slave(1)@10.201.98.16:5051 (hadoop-master)
I0314 18:23:59.410331 39743 master.cpp:4426] Launching task 8 of framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler) with resources 
cpus(*)(allocated: *):0.01 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at 
slave(1)@10.201.98.16:5051 (hadoop-master)
I0314 18:23:59.411258 39738 hierarchical.cpp:807] Updated allocation of 
framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- on agent 
edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 from cpus(*)(allocated: *):0.01 to 
cpus(*)(allocated: *):0.01
I0314 18:23:59.415060 39723 master.cpp:6992] Sending 1 offers to framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler)
I0314 18:23:59.420624 39757 master.cpp:6154] Status update TASK_FINISHED (UUID: 
583ea071-de66-4050-9513-8ff432da605f) for task 8 of framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- from agent 
edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 
(hadoop-master)
I0314 18:23:59.420671 39757 master.cpp:6222] Forwarding status update 
TASK_FINISHED (UUID: 583ea071-de66-4050-9513-8ff432da605f) for task 8 of 
framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-
I0314 18:23:59.420819 39757 master.cpp:8302] Updating the state of task 8 of 
framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (latest state: 
TASK_FINISHED, status update state: TASK_FINISHED)
I0314 18:23:59.425354 39742 master.cpp:6992] Sending 1 offers to framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler)
I0314 18:23:59.459801 39741 http.cpp:420] HTTP POST for 
/master/api/v1/scheduler from 127.0.0.1:36100
W0314 18:23:59.459889 39741 master.cpp:3634] Implicitly declining offers: [ 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O26 ] in ACCEPT call for framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- as the launch operation specified no 
tasks
I0314 18:23:59.460055 39741 master.cpp:3776] Processing ACCEPT call for offers: 
[ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O26 ] on agent 
edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051 
(hadoop-master) for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos 
BM Scheduler)
I0314 18:23:59.460435 39741 http.cpp:420] HTTP POST for 
/master/api/v1/scheduler from 127.0.0.1:36100
I0314 18:23:59.460484 39741 master.cpp:5092] Processing ACKNOWLEDGE call 
583ea071-de66-4050-9513-8ff432da605f for task 8 of framework 
afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler) on agent 
edbbafb6-4f7b-4da2-8782-8e01461906dc-S0

From the master log you can see that the time from the master receiving the 
offer accept until it forwards the TASK_FINISHED status to the scheduler is 
~11ms.
For some reason the scheduler's acknowledgement arrived ~40ms later, although 
I’m sending it immediately upon the status update. Moreover, I can’t explain 
why the time between receiving an offer and accepting it (even with an empty 
accept) takes ~40ms. I think this might be related to the problem.
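
For anyone who wants to recompute these intervals from the timestamps, here is a minimal sketch in Java. The class and method names (GlogDeltas, timeOf, microsBetween) are hypothetical, the log lines are shortened to their prefix plus a label, and it assumes both lines fall on the same day:

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper, not part of Mesos: computes the gap between two
// glog-style lines such as "I0314 18:23:59.409423 39743 master.cpp:3776] ...".
final class GlogDeltas {
    // The second whitespace-separated field is the wall-clock time
    // with microsecond precision.
    static LocalTime timeOf(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return LocalTime.parse(fields[1]);
    }

    // Microseconds elapsed from the earlier line to the later line.
    // Assumes both lines were logged on the same day (no midnight wrap).
    static long microsBetween(String earlier, String later) {
        return Duration.between(timeOf(earlier), timeOf(later)).toNanos() / 1_000;
    }

    public static void main(String[] args) {
        // Timestamps copied from the master log above; messages shortened.
        String accept  = "I0314 18:23:59.409423 39743 master.cpp:3776] Processing ACCEPT call";
        String forward = "I0314 18:23:59.420671 39757 master.cpp:6222] Forwarding status update";
        String ack     = "I0314 18:23:59.460484 39741 master.cpp:5092] Processing ACKNOWLEDGE call";

        System.out.println("ACCEPT -> forward TASK_FINISHED: "
            + microsBetween(accept, forward) + " us");   // 11248 us
        System.out.println("forward -> ACKNOWLEDGE:          "
            + microsBetween(forward, ack) + " us");      // 39813 us
    }
}
```

This only measures what the master logged; it says nothing about where the ~40ms is spent between the master sending the event and the scheduler's call arriving back.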

My scheduler is written in Java; it implements 
org.apache.mesos.v1.scheduler.Scheduler and is passed to 
org.apache.mesos.v1.scheduler.V1Mesos.
This is my scheduler impl for processUpdate and processOffers (called upon 
‘received’ with Protos.Event.UPDATE/OFFERS respectively):
private void processOffers(Mesos mesos, List<Offer> offers) {
    for (Offer offer : offers) {
        Offer.Operation.Launch.Builder launch =
            Offer.Operation.Launch.newBuilder();

        // Sum the CPUs available in this offer.
        double offerCpus = 0;
        for (Resource resource : offer.getResourcesList()) {
            if (resource.getName().equals("cpus")) {
                offerCpus += resource.getScalar().getValue();
            }
        }

        LOGGER.info("Received offer " + offer.getId().getValue()
            + " from agent: " + offer.getHostname()
            + " [cpus: " + offerCpus + "]");

        while (!pendingTasks.isEmpty()) {
            Task task = pendingTasks.peek();
            if (task.requiredCpus <= offerCpus) {
   

[RESULT][VOTE] Release Apache Mesos 1.1.1 (rc2)

2017-03-14 Thread Alex Rukletsov
 Hi folks,

The vote for Mesos 1.1.1 (rc2) has passed with the following votes.

+1 (Binding)
--
*** AlexR
*** Till Tönshoff
*** Vinod Kone

There were no 0 or -1 votes.

Please find the release at:
https://dist.apache.org/repos/dist/release/mesos/1.1.1

It is recommended to use a mirror to download the release:
http://www.apache.org/dyn/closer.cgi

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.1

The mesos-1.1.1.jar has been released to:
https://repository.apache.org

The website (http://mesos.apache.org) will be updated shortly to reflect
this release.

Thanks,
Alex & Till


Re: [VOTE] Release Apache Mesos 1.1.1 (rc2)

2017-03-14 Thread Alex Rukletsov
The vote has been up for more than two weeks now and there are no -1's. I'll
go ahead and vote myself:

+1 (binding)

Tested on internal CI with several known issues.

On Tue, Mar 7, 2017 at 6:08 PM, Till Toenshoff  wrote:

> +1
>
> Tested on:
> - macOS 10.12.4 Beta (16E175b): ok
> - centos 6: mostly ok, MESOS-4736
> - centos 7: internal CI issues on capabilities tests, otherwise fine
> - debian 8: mostly ok, MESOS-7213
> - fedora 23: ok
> - ubuntu 12.04: mostly ok, MESOS-7218
> - ubuntu 14.04: mostly ok, MESOS-7218
> - ubuntu 16.04: mostly ok, MESOS-7218
>
>
> On Mar 4, 2017, at 1:09 AM, Vinod Kone  wrote:
>
> +1 (binding)
>
> Since the perf issue I reported earlier doesn't seem to be a blocker.
>
> On Fri, Mar 3, 2017 at 12:14 AM, Alex Rukletsov wrote:
>
>> Was this perf issue introduced by one of the fixes included in 1.1.1-rc2?
>> If not, I would suggest we vote for 1.1.1-rc2 and back port the perf fix
>> into 1.1.2. IIUC, time based patch releases should *not be worse*, hence
>> if the perf issue was already in 1.1.0 it is *fine* to fix it in 1.1.2. I
>> would like to avoid postponing already belated 1.1.1 for even longer.
>>
>> On Wed, Mar 1, 2017 at 8:02 PM, Vinod Kone  wrote:
>>
>> > Tested on ASF CI.
>> >
>> > Saw 2 configurations fail with
>> > https://issues.apache.org/jira/browse/MESOS-7160
>> >
>> > I think @jpeach and @bbannier were looking into this. Not sure about the
>> > severity of the issue, so withholding my vote.
>> >
>> >
>> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
>> >
>> >- refs/tags/1.1.1-rc2
>> >
>> > Configuration Matrix (results listed as gcc / clang)
>> >
>> > centos:7
>> >   --verbose --enable-libevent --enable-ssl
>> >     autotools: Success / Not run
>> >     cmake:     Success / Not run
>> >   --verbose
>> >     autotools: Success / Not run
>> >     cmake:     Success / Not run
>> >
>> > ubuntu:14.04
>> >   --verbose --enable-libevent --enable-ssl
>> >     autotools: Success / Failed
>> >     cmake:     Success / Success
>> >   --verbose
>> >     autotools: Success / Failed
>> >