Re: java driver/shutdown call

2018-01-12 Thread Anand Mazumdar
Yes, it's a newer interface that still allows you to switch between the v1
(new) API and the old API.

-anand

On Fri, Jan 12, 2018 at 3:28 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:

> Are you suggesting
>
> *send(new Call(METHOD, Param1, ...)) *
>
> instead of
>
> *driver.method(Param1, ...)*
>
> *?*
>
> On Fri, Jan 12, 2018 at 10:59 AM, Anand Mazumdar <mazumdar.an...@gmail.com> wrote:
>
>> Mohit,
>>
>> You can use the V1Mesos class, which uses the v1 API internally and lets
>> you send the 'SHUTDOWN' call. We also have a V0Mesos class that uses the
>> old scheduler driver internally.
>>
>> -anand
>>
>> On Wed, Jan 10, 2018 at 2:53 PM, Mohit Jaggi <mohit.ja...@uber.com>
>> wrote:
>>
>>> Thanks Vinod. Is there a V1SchedulerDriver.java file? I see
>>> https://github.com/apache/mesos/tree/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/v1/scheduler
>>> but it does not have a V1 driver.
>>>
>>> On Fri, Jan 5, 2018 at 3:59 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>>
>>>> That's right. It is only available for v1 schedulers.
>>>>
>>>> On Fri, Jan 5, 2018 at 3:38 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>> wrote:
>>>>
>>>>> Folks,
>>>>> I am trying to change Apache Aurora's code to call SHUTDOWN instead of
>>>>> KILL. However, it seems that the SchedulerDriver class in Mesos does not
>>>>> have a shutdownExecutor() call.
>>>>>
>>>>> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/SchedulerDriver.java
>>>>>
>>>>> Mohit.
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Anand Mazumdar
>>
>
>


-- 
Anand Mazumdar


Re: java driver/shutdown call

2018-01-12 Thread Anand Mazumdar
Mohit,

You can use the V1Mesos class, which uses the v1 API internally and lets you
send the 'SHUTDOWN' call. We also have a V0Mesos class that uses the old
scheduler driver internally.
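
For illustration, here is a minimal sketch of what a v1 'SHUTDOWN' call looks
like when written as the JSON body of the scheduler HTTP API. It is in Python
only to keep it self-contained; the helper name and the placeholder IDs are
hypothetical, and with V1Mesos you would build the equivalent protobuf Call
instead.

```python
import json


def build_shutdown_call(framework_id, executor_id, agent_id):
    """Return the JSON body of a v1 scheduler SHUTDOWN call as a dict."""
    return {
        "framework_id": {"value": framework_id},
        "type": "SHUTDOWN",
        # SHUTDOWN targets one executor on one agent.
        "shutdown": {
            "executor_id": {"value": executor_id},
            "agent_id": {"value": agent_id},
        },
    }


# Placeholder IDs, for illustration only.
call = build_shutdown_call("framework-1", "executor-1", "agent-1")
print(json.dumps(call, indent=2))
```

The old SchedulerDriver has no counterpart to this call, which is why the
v1 path is needed.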

-anand

On Wed, Jan 10, 2018 at 2:53 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:

> Thanks Vinod. Is there a V1SchedulerDriver.java file? I see
> https://github.com/apache/mesos/tree/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/v1/scheduler
> but it does not have a V1 driver.
>
> On Fri, Jan 5, 2018 at 3:59 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> That's right. It is only available for v1 schedulers.
>>
>> On Fri, Jan 5, 2018 at 3:38 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:
>>
>>> Folks,
>>> I am trying to change Apache Aurora's code to call SHUTDOWN instead of
>>> KILL. However, it seems that the SchedulerDriver class in Mesos does not
>>> have a shutdownExecutor() call.
>>>
>>> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/SchedulerDriver.java
>>>
>>> Mohit.
>>>
>>
>>
>


-- 
Anand Mazumdar


Re: [VOTE] Release Apache Mesos 1.4.1 (rc1)

2017-11-14 Thread Anand Mazumdar
ache.org/view/M-R/view/Mesos/job/Mesos-Rel
>> ease/43/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--
>> verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A1
>> 4.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>
>>
>> On Thu, Nov 9, 2017 at 6:27 PM, Kapil Arya <ka...@mesosphere.io> wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.4.1.
>> >
>> > 1.4.1 includes the following:
>> > 
>> > 
>> > * [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos `state` endpoint.
>> > * [MESOS-7921] - ProcessManager::resume sometimes crashes accessing EventQueue.
>> > * [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
>> > * [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing available namespace.
>> > * [MESOS-7969] - Handle cgroups v2 hierarchy when parsing /proc/self/cgroups.
>> > * [MESOS-7980] - Stout fails to compile with libc >= 2.26.
>> > * [MESOS-8051] - Killing TASK_GROUP fail to kill some tasks.
>> > * [MESOS-8080] - The default executor does not propagate missing task exit status correctly.
>> > * [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
>> > * [MESOS-8135] - Masters can lose track of tasks' executor IDs.
>> > * [MESOS-8169] - Incorrect master validation forces executor IDs to be globally unique.
>> >
>> >
>> > The CHANGELOG for the release is available at:
>> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.4.1-rc1
>> >
>> > ----
>> >
>> > The candidate for Mesos 1.4.1 release is available at:
>> > https://dist.apache.org/repos/dist/dev/mesos/1.4.1-rc1/mesos-1.4.1.tar.gz
>> >
>> > The tag to be voted on is 1.4.1-rc1:
>> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.4.1-rc1
>> >
>> > The MD5 checksum of the tarball can be found at:
>> > https://dist.apache.org/repos/dist/dev/mesos/1.4.1-rc1/mesos-1.4.1.tar.gz.md5
>> >
>> > The signature of the tarball can be found at:
>> > https://dist.apache.org/repos/dist/dev/mesos/1.4.1-rc1/mesos-1.4.1.tar.gz.asc
>> >
>> > The PGP key used to sign the release is here:
>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >
>> > The JAR is in a staging repository here:
>> > https://repository.apache.org/content/repositories/orgapachemesos-1216
>> >
>> > Please vote on releasing this package as Apache Mesos 1.4.1!
>> >
>> > The vote is open until Monday, November 13, 2017, 11:59 PM EST and passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Mesos 1.4.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > Thanks,
>> > Anand and Kapil
>> >
>>
>
>


-- 
Anand Mazumdar


Re: [VOTE] Release Apache Mesos 1.4.0 (rc5)

2017-09-18 Thread Anand Mazumdar
+1 (binding)

make check passed on Ubuntu 16.04

-anand

On Fri, Sep 15, 2017 at 2:12 PM, Kapil Arya  wrote:

> +1 (binding)
>
> Internal CI with Centos 6/7, Fedora 23, Debian 8, and Ubuntu 12/14/16.
>
> On Fri, Sep 15, 2017 at 5:08 PM, Vinod Kone  wrote:
>
>> Ok. Looks like a test issue per https://reviews.apache.org/r/60467/
>>
>> +1(binding)
>>
>> On Fri, Sep 15, 2017 at 12:16 PM, Michael Park  wrote:
>>
>>> Vinod, regarding MESOS-7729:
>>>
>>> I found MESOS-6345 related to persistent volume framework, which leads me
>>> to believe that this is not new.
>>>
>>> Thanks,
>>>
>>> MPark
>>>
>>> On Tue, Sep 12, 2017 at 12:01 PM Vinod Kone wrote:
>>>
>>>> Tested this on ASF CI.
>>>>
>>>> Saw 3 flaky tests.
>>>>
>>>> https://issues.apache.org/jira/browse/MESOS-7729
>>>> https://issues.apache.org/jira/browse/MESOS-7971
>>>> https://issues.apache.org/jira/browse/MESOS-7972
>>>>
>>>> The first one was a known (since 1.4.0) flaky test with a double free
>>>> corruption. @Kapil and @MPark can you verify that this is an issue with
>>>> the test and not the source code? Once verified, I'll give a +1.
>>>>
>>>> *Revision*: b3fd2e7ab26e118222fe18af4b92c53a3c01e6cc
>>>>
>>>>    - refs/tags/1.4.0-rc5

>>>> Configuration Matrix
>>>>
>>>> OS            Configuration                             Build      gcc      clang
>>>> centos:7      --verbose --enable-libevent --enable-ssl  autotools  Success  Not run
>>>> centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
>>>> centos:7      --verbose                                 autotools  Failed   Not run
>>>> centos:7      --verbose                                 cmake      Success  Not run
>>>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Success  Success
>>>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Success  Success
>>>> ubuntu:14.04  --verbose                                 autotools  Success  Success
>>>> ubuntu:14.04  --verbose                                 cmake      Failed

Mesos 1.4.0

2017-07-27 Thread Anand Mazumdar
Hello everyone,

It's about time for Mesos 1.4.0 (somewhat late, though: 1.3 rc1 was cut on
5/5). Kapil will be the primary release manager and I will be the
co-release manager.

We expect to cut rc1 in the coming couple of weeks. Here's how you can help:
- Set *Target Version = "1.4.0"* for anything that needs to go into this
release. Anything not critical can wait for Mesos 1.5.
- Upgrade release blockers to *"Blocker" priority*. Use "Critical" for any
issues that would be painful (but possible) to ship Mesos 1.4 without.

Mesos 1.4 release dashboard:
https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12331513

-anand


Re: Plan for upgrading protobuf==3.2.0 in Mesos

2017-05-26 Thread Anand Mazumdar
We recently committed this [1] and it will be part of the *next major
release* (1.4.0). Also, we upgraded to the newer protobuf release 3.3.0.

For Mesos developers, this means that we can use proto3 features like arena
allocation [2], maps [3] etc. Note that we still need to use the proto2
syntax version for backward compatibility.

Thanks Zhitao for the contributions!

[1] https://issues.apache.org/jira/browse/MESOS-7228
[2] https://issues.apache.org/jira/browse/MESOS-5783
[3] https://developers.google.com/protocol-buffers/docs/proto#maps

-anand


On Thu, Apr 27, 2017 at 10:28 AM, Anand Mazumdar <an...@apache.org> wrote:

> + dev
>
> Bumping up the thread to ensure it's not missed.
>
> -anand
>
> On Tue, Apr 25, 2017 at 11:01 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> > Dear framework owners and users,
> >
> > We are working on upgrading the protobuf library in Mesos to 3.2.0 in
> > https://issues.apache.org/jira/browse/MESOS-7228, to overcome some protobuf
> > limitation on message size as well as preparing for further improvement. We
> > aim to release this with the upcoming Mesos 1.3.0.
> >
> > Because we upgraded the protoc compiler in this process, all generated java
> > and python code may not be compatible with protobuf 2.6.1 (the previous
> > dependency), and we ask you to upgrade the protobuf dependency to 3.2.0 when
> > you upgrade your framework dependency to 1.3.0.
> >
> > For java, a snapshot maven artifact has been prepared (by Anand Mazumdar's
> > courtesy) at
> > https://repository.apache.org/content/repositories/snapshots/org/apache/mesos/mesos/1.3.0-SNAPSHOT/
> > . Please feel free to play out with it and let us know if you run into any
> > issues.
> >
> > Note that the binary upgrade process should still be compatible: any java or
> > based framework (scheduler or executor) should still work out of box with
> > Mesos 1.3.0 once released. It is suggested to get your cluster upgraded to
> > 1.3.0 first, then come back and upgrade your executors and schedulers.
> >
> > We understand this may expose inconvenience around updating the protobuf
> > dependency, so please let us know if you have any concern or further
> > questions.
> >
> > --
> >
> > Cheers,
> >
> > Zhitao Li and Anand Mazumdar,
>


Re: [VOTE] Release Apache Mesos 1.0.4 (rc2)

2017-05-03 Thread Anand Mazumdar
+1 (binding)

make check passed on Ubuntu 16.04 with clang 3.6

-anand

On Wed, May 3, 2017 at 10:01 AM, Vinod Kone  wrote:

> +1 (binding)
>
> *Revision*: 4154f66d6c6dde8fd2cf2bbf0bfa155f24ac55d4
>
>- refs/tags/1.0.4-rc2
>
> Configuration Matrix
>
> OS            Configuration                             Build      gcc      clang
> centos:7      --verbose --enable-libevent --enable-ssl  autotools  Success  Not run
> centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
> centos:7      --verbose                                 autotools  Success  Not run
> centos:7      --verbose                                 cmake      Success  Not run
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Success  Success
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Success  Success
> ubuntu:14.04  --verbose                                 autotools  Success  Success
> ubuntu:14.04  --verbose                                 cmake      Success  Success
>
> On Tue, May 2, 2017 at 4:03 PM, Benjamin Mahler wrote:
>
>> +1 make check passes on macOS 10.12.4 with clang
>>
>> On Tue, May 2, 2017 at 12:04 PM, Vinod Kone  wrote:
>>
>> > Hi all,
>> >
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.0.4.
>> >
>> >
>> > 1.0.4 includes the following:
>> >
>> > 
>> > 
>> >
>> > * [MESOS-2537] - AC_ARG_ENABLED checks are broken
>> > * [MESOS-6606] - Reject optimized builds with libcxx before 3.9
>> > * [MESOS-7008] - Quota not recovered from registry in empty cluster.
>> > * [MESOS-7265] - Containerizer startup may cause sensitive data to leak into sandbox logs.
>> > * [MESOS-7366] - Agent sandbox gc could accidentally delete the entire persistent volume content.
>> > * [MESOS-7383] - Docker executor logs possibly sensitive parameters.
>> > * [MESOS-7422] - Docker 

Re: Plan for upgrading protobuf==3.2.0 in Mesos

2017-04-27 Thread Anand Mazumdar
+ dev

Bumping up the thread to ensure it's not missed.

-anand

On Tue, Apr 25, 2017 at 11:01 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> Dear framework owners and users,
>
> We are working on upgrading the protobuf library in Mesos to 3.2.0 in
> https://issues.apache.org/jira/browse/MESOS-7228, to overcome some protobuf
> limitation on message size as well as preparing for further improvement. We
> aim to release this with the upcoming Mesos 1.3.0.
>
> Because we upgraded the protoc compiler in this process, all generated java
> and python code may not be compatible with protobuf 2.6.1 (the previous
> dependency), and we ask you to upgrade the protobuf dependency to 3.2.0 when
> you upgrade your framework dependency to 1.3.0.
>
> For java, a snapshot maven artifact has been prepared (by Anand Mazumdar's
> courtesy) at
> https://repository.apache.org/content/repositories/snapshots/org/apache/mesos/mesos/1.3.0-SNAPSHOT/
> . Please feel free to play out with it and let us know if you run into any
> issues.
>
> Note that the binary upgrade process should still be compatible: any java or
> based framework (scheduler or executor) should still work out of box with
> Mesos 1.3.0 once released. It is suggested to get your cluster upgraded to
> 1.3.0 first, then come back and upgrade your executors and schedulers.
>
> We understand this may expose inconvenience around updating the protobuf
> dependency, so please let us know if you have any concern or further
> questions.
>
> --
>
> Cheers,
>
> Zhitao Li and Anand Mazumdar,


[Design doc][RFC] Agent Lifecycle Management

2017-04-25 Thread Anand Mazumdar
Hello everyone,

We are working on adding support for agent lifecycle management [1] that
will provide a feedback mechanism for frameworks in case of agent node
failures. The existing agent lost [2] signal is not sufficient for
frameworks to ascertain that a given agent node isn't coming back.

Here is a link to the design doc:
https://docs.google.com/document/d/1XvP0acT8xadSev8UG2BXtsPlEh0Rb7R3WV3s-TnTeqg

Please feel free to provide any feedback via comments on the doc.

[1] JIRA Epic: https://issues.apache.org/jira/browse/MESOS-7426

[2]
https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler/scheduler.proto#L151

-anand


Re: High performance, low latency framework over mesos

2017-03-20 Thread Anand Mazumdar
Thanks for the detailed timeline of the logs. The scheduler library does
not create a new connection per call. I have a suspicion that the slowness
might be related to https://issues.apache.org/jira/browse/MESOS-6405 instead.
To confirm this, can you change your scheduler to use the `V0Mesos` class (
https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/v1/scheduler/V0Mesos.java)
and build an updated timeline of logs based on that?
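
In case it helps with building that timeline, here is a minimal sketch that
turns glog-style lines into millisecond offsets from the first line. The
sample lines are abbreviated from the master log quoted further down; the
function names are hypothetical, and the year is an assumption because glog
does not record it.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.microseconds, thread id, file:line]
GLOG_LINE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) \d+ \S+\] (.*)$")


def parse_line(line, year=2017):
    """Return (timestamp, message) for one glog line.

    glog does not log the year, so `year` is an assumption supplied by the caller.
    """
    match = GLOG_LINE.match(line)
    if match is None:
        raise ValueError("not a glog line: %r" % line)
    stamp = datetime.strptime(
        "%d%s %s" % (year, match.group(1), match.group(2)),
        "%Y%m%d %H:%M:%S.%f")
    return stamp, match.group(3)


def timeline(lines):
    """Return (milliseconds since the first line, message) for each line."""
    events = [parse_line(line) for line in lines]
    start = events[0][0]
    return [((stamp - start).total_seconds() * 1000.0, message)
            for stamp, message in events]


master_log = [
    "I0314 18:23:59.409423 39743 master.cpp:3776] Processing ACCEPT call",
    "I0314 18:23:59.410331 39743 master.cpp:4426] Launching task 8 of framework",
    "I0314 18:23:59.420624 39757 master.cpp:6154] Status update TASK_FINISHED",
]

for offset_ms, message in timeline(master_log):
    print("%8.3f ms  %s" % (offset_ms, message))
```

Merging scheduler, master and agent lines into one sorted list before calling
`timeline` only gives meaningful offsets if the clocks on those hosts are in
sync.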

-anand




On Mon, Mar 20, 2017 at 6:42 AM, <assaf_waiz...@amat.com> wrote:

> Hi,
>
>
>
> I added logs to my scheduler and executor and traced a specific task (task
> 1) timeline (the behavior is similar for all tasks).
>
> It seems most of the time (~30ms, lines 4-5 below) is ‘lost’ between
> the scheduler submitting the task (i.e. sending an ACCEPT call with the
> submitted task) and the master getting the HTTP POST of this call (does the
> scheduler library create a new connection per call?).
>
> After the master gets the ACCEPT call it takes another ~10ms for the master →
> slave → executor → slave → master round trip (but maybe that's reasonable
> for 4 IPC network calls).
>
>
>
> Later on, I also see that the scheduler's ACKNOWLEDGE call for the task 1
> status update (i.e. sending an ACKNOWLEDGE call with the uuid received in the
> status update) is also seen in the master only ~35ms (lines 18-19 below)
> after the call. I’m starting to conclude that each call using the scheduler
> library (which actually involves an HTTP POST) takes ~40ms.
>
>
>
> To sum it up, it seems that the main factor for the high latency I get
> here is the HTTP POST mechanism in the scheduler library. Is there a way to
> improve it? Is it possible to keep a scheduler → master connection always
> connected?
>
>
>
>
>
> Thanks.
>
>
>
> *From:* Benjamin Mahler [mailto:bmah...@apache.org]
> *Sent:* Thursday, March 16, 2017 3:12 AM
> *To:* user
> *Cc:* Anand Mazumdar
>
> *Subject:* Re: High performance, low latency framework over mesos
>
>
>
> The breakdown of the 10ms between launching on the agent and getting
> TASK_FINISHED from the agent could be seen by looking at the agent logs. Do
> you have them?
>
>
>
> The 40ms it takes from forwarding the update to receiving the
> acknowledgement seems excessive, since the scheduler has to do a lot less
> work than the agent and that only takes 10ms. It would be great to have the
> scheduler do some logging so that we can see if there is network latency
> here or there is inefficient processing in the v1 scheduler library.
>
>
>
> As a next step, I would suggest building the complete timeline of logs,
> which includes scheduler (you likely need to do some logging here), master
> and agent logs.
>
>
>
> On Tue, Mar 14, 2017 at 8:43 AM, <assaf_waiz...@amat.com> wrote:
>
> Thanks Benjamin,
>
>
>
> I looked into the logs and it seems the delay is between the master and
> the scheduler:
>
> Master log:
>
> I0314 *18:23:59.409423* 39743 master.cpp:3776] *Processing ACCEPT call*
> for offers: [ afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-O25 ] on agent
> edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 at slave(1)@10.201.98.16:5051
> (hadoop-master) for framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa-
> (Mesos BM Scheduler)
>
> W0314 18:23:59.410166 39743 validation.cpp:1064] Executor
> 'MesosBMExecutorId' for task '8' uses less CPUs (None) than the minimum
> required (0.01). Please update your executor, as this will be mandatory in
> future releases.
>
> W0314 18:23:59.410221 39743 validation.cpp:1076] Executor
> 'MesosBMExecutorId' for task '8' uses less memory (None) than the minimum
> required (32MB). Please update your executor, as this will be mandatory in
> future releases.
>
> I0314 18:23:59.410292 39743 master.cpp:9053] Adding task 8 with resources
> cpus(*)(allocated: *):0.01 on agent edbbafb6-4f7b-4da2-8782-8e01461906dc-S0
> at slave(1)@10.201.98.16:5051 (hadoop-master)
>
> I0314 *18:23:59.410331* 39743 master.cpp:4426] *Launching task 8 of
> framework* afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler)
> with resources cpus(*)(allocated: *):0.01 on agent 
> edbbafb6-4f7b-4da2-8782-8e01461906dc-S0
> at slave(1)@10.201.98.16:5051 (hadoop-master)
>
> I0314 18:23:59.411258 39738 hierarchical.cpp:807] Updated allocation of
> framework afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- on agent
> edbbafb6-4f7b-4da2-8782-8e01461906dc-S0 from cpus(*)(allocated: *):0.01
> to cpus(*)(allocated: *):0.01
>
> I0314 18:23:59.415060 39723 master.cpp:6992] Sending 1 offers to framework
> afd6b67b-cac0-4b9f-baf6-2a456f4e84fa- (Mesos BM Scheduler)
>
> I0314 *18:23:59.420624* 39757 master.cpp:6154] *Status update
> TASK_FINISHED* (UUID: 583ea071-de66-4050

[Proposal] Media type for streaming requests/responses

2017-01-07 Thread Anand Mazumdar
Hello All,

We recently added support for request streaming as part of the Debugging
epic (MESOS-6460). As a follow up on that, we want your suggestions and
feedback via comments on the proposal draft [1] around the media type to
use for the 'Content-Type' header for streaming requests/responses.

[1] http://bit.ly/2iovQVe

-anand


[RESULT][VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-12-05 Thread Anand Mazumdar
Hi all,

The vote for Mesos 0.28.3 (rc1) has passed with the
following votes.

+1 (Binding)
--
Alex Rukletsov
Vinod Kone
Benjamin Mahler

+1 (Non-binding)
--
Greg Mann

There were no 0 or -1 votes.

Please find the release at:
https://dist.apache.org/repos/dist/release/mesos/0.28.3

It is recommended to use a mirror to download the release:
http://www.apache.org/dyn/closer.cgi

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.3

The mesos-0.28.3.jar has been released to:
https://repository.apache.org

The website (http://mesos.apache.org) will be updated shortly to reflect
this release.

Thanks,
Anand & Joseph


[VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-11-23 Thread Anand Mazumdar
Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.28.3.


0.28.3 includes the following:


** Bug
  * [MESOS-2043] - Framework auth fail with timeout error and never
get authenticated
  * [MESOS-4638] - Versioning preprocessor macros.
  * [MESOS-5073] - Mesos allocator leaks role sorter and quota role sorters.
  * [MESOS-5330] - Agent should backoff before connecting to the master.
  * [MESOS-5390] - v1 Executor Protos not included in maven jar
  * [MESOS-5543] - /dev/fd is missing in the Mesos containerizer environment.
  * [MESOS-5571] - Scheduler JNI throws exception when the major
versions of JAR and libmesos don't match.
  * [MESOS-5576] - Masters may drop the first message they send
between masters after a network partition.
  * [MESOS-5673] - Port mapping isolator may cause segfault if it bind
mount root does not exist.
  * [MESOS-5691] - SSL downgrade support will leak sockets in CLOSE_WAIT status.
  * [MESOS-5698] - Quota sorter not updated for resource changes at agent.
  * [MESOS-5723] - SSL-enabled libprocess will leak incoming links to forks.
  * [MESOS-5740] - Consider adding `relink` functionality to libprocess.
  * [MESOS-5748] - Potential segfault in `link` when linking to a
remote process.
  * [MESOS-5763] - Task stuck in fetching is not cleaned up after
--executor_registration_timeout.
  * [MESOS-5913] - Stale socket FD usage when using libevent + SSL.
  * [MESOS-5927] - Unable to run "scratch" Dockerfiles with Unified
Containerizer.
  * [MESOS-5943] - Incremental http parsing of URLs leads to decoder error.
  * [MESOS-5986] - SSL Socket CHECK can fail after socket receives EOF.
  * [MESOS-6104] - Potential FD double close in libevent's
implementation of `sendfile`.
  * [MESOS-6142] - Frameworks may RESERVE for an arbitrary role.
  * [MESOS-6152] - Resource leak in libevent_ssl_socket.cpp.
  * [MESOS-6233] - Master CHECK fails during recovery while relinking
to other masters.
  * [MESOS-6234] - Potential socket leak during Zookeeper network changes.
  * [MESOS-6246] - Libprocess links will not generate an ExitedEvent
if the socket creation fails.
  * [MESOS-6299] - Master doesn't remove task from pending when it is invalid.
  * [MESOS-6457] - Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
  * [MESOS-6502] - _version uses incorrect
MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.
  * [MESOS-6527] - Memory leak in the libprocess request decoder.
  * [MESOS-6621] - SSL downgrade path will CHECK-fail when using both
temporary and persistent sockets


The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.3-rc1


The candidate for Mesos 0.28.3 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/mesos-0.28.3.tar.gz

The tag to be voted on is 0.28.3-rc1:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.28.3-rc1

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/mesos-0.28.3.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/mesos-0.28.3.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1170

Please vote on releasing this package as Apache Mesos 0.28.3!

The vote is open until Sat Nov 26 14:59:10 PST 2016 and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.28.3
[ ] -1 Do not release this package because ...

Thanks,
Anand & Joseph


Re: Mesos V1 Operator HTTP API - Java Proto Classes

2016-11-16 Thread Anand Mazumdar
We wanted to move the project away from officially supporting anything
other than C++ and discuss further whether we should be responsible for
publishing to the various language-specific channels. However, for the time
being, we decided to include the v1 protobufs in the mesos JAR itself
(it already contains the v1 Scheduler/Executor protos).

Please file an issue as Zameer pointed out.

-anand

On Wed, Nov 16, 2016 at 8:34 AM, Zameer Manji  wrote:

> I think this is a bug, I feel the jar should include all v1 protobuf files.
>
> Vijay, I encourage you to file a ticket.
>
> On Tue, Nov 15, 2016 at 8:04 PM, Vijay Srinivasaraghavan <
> vijikar...@yahoo.com.invalid> wrote:
>
>> I believe the HTTP API will use the same underlying message format (proto
>> def) and hence the request/response value objects (java) needs to be
>> auto-generated from the proto files for it to be used in Jersey based java
>> rest client?
>>
>> On Tuesday, November 15, 2016 12:37 PM, Tomek Janiszewski <
>> jani...@gmail.com> wrote:
>>
>>
>>  I suspect jar is deprecated and includes only old API used by mesoslib. The
>> goal is to create HTTP API and stop supporting native libs (jars, so, etc).
>> I think you shouldn't use that jar in your project.
>>
>> On Tue, Nov 15, 2016 at 20:38, Vijay Srinivasaraghavan <
>> vijikar...@yahoo.com> wrote:
>>
>> > Hello,
>> >
>> > I am writing a rest client for "operator APIs" and found that some of the
>> > protobuf java classes (like "include/mesos/v1/quota/quota.proto",
>> > "include/mesos/v1/master/master.proto") are not included in the mesos jar
>> > file. While investigating, I have found that the "Make" file does not
>> > include these proto definition files.
>> >
>> > I have updated the Make file and added the protos that I am interested in
>> > and built a new jar file. Is there any reason why these proto definitions
>> > are not included in the original build apart from the reason that the APIs
>> > are still evolving?
>> >
>> > Regards
>> > Vijay
>> >
>>
>> --
>> Zameer Manji
>>
>


[HTTP API] Client Libraries

2016-07-06 Thread Anand Mazumdar
Hi,

We recently committed documentation around available client libraries for the
Scheduler/Executor HTTP APIs.

Link to doc:
https://github.com/apache/mesos/blob/master/docs/api-client-libraries.md

It would be great if folks can send a PR or review to add more implementations
that they maintain/use.

-anand

Re: Are you using New HTTP API Yet ?

2016-05-19 Thread Anand Mazumdar
Hi Chris,

Currently, we don’t have any documentation listing all the library
implementations. I filed MESOS-5419 to address this.

Since you guys code in Scala, you might want to have a look at Mesos RxJava:
https://github.com/mesosphere/mesos-rxjava


-anand

> On May 19, 2016, at 1:40 PM, Chris Baker  wrote:
> 
> We are moving one of our frameworks to using the HTTP API. We code in Scala, 
> and we had originally looked at using the Jesos because I had thought it was 
> using the HTTP API, but apparently it is not. Neither is pesos (python). The 
> only one that I've been able to find is mesos-go. 
> 
> Is there a list of API wrappers available? Is it just mesos-go right now? 
> 
> On Wed, May 11, 2016 at 2:10 PM Vladimir Vivien wrote:
> Is anyone using the new Mesos HTTP Scheduler/Executor APIs to create
> frameworks? If so:
> - what language?
> - are you using an existing binding as API wrapper (which one)?
> - or using your own custom built API wrapper?
> - do you prefer old bindings vs newer http-based api?
> - any links discussing your impl that can be shared?
> 
> Thanks for your help.
> -- 
> Vladimir Vivien



Re: Framework taking default resources even though a role is specified

2016-04-15 Thread Anand Mazumdar
FWIW, we recently fixed `mesos-execute` (command scheduler) to add support for 
roles. It should be available in the next release (0.29).

https://issues.apache.org/jira/browse/MESOS-4744 


-anand

> On Apr 15, 2016, at 11:41 AM, June Taylor  wrote:
> 
> Ken,
> 
> Thanks for your reply.
> 
> Is there a way to ensure a framework only receives the reserved resources?
> 
> I would go ahead and take everything out of the * role, however, the 
> 'mesos-execute' command doesn't support specifying a role, so that's the only 
> way we can currently get mesos-execute to co-exist with pyspark.
> 
> Any other thoughts from the group?
> 
> 
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
> 
> On Fri, Apr 15, 2016 at 11:54 AM, Ken Sipe wrote:
> The framework with role “production” will receive production resources and
> * resources
> All other frameworks (assuming no role) will only receive * resources
> 
> ken
> 
> > On Apr 15, 2016, at 11:38 AM, June Taylor wrote:
> >
> > We have a small cluster with 3 nodes in the * resource role default, and 3 
> > nodes in a "production" resource role.
> >
> > Starting up a framework which requests "production" properly executes on 
> > the expected nodes, however, today we noticed that this job also started up 
> > executors under the * resource role as well.
> >
> > We expect these tasks to only go on nodes with the "production" resource 
> > role. Can you advise further?
> >
> > Thanks,
> > June Taylor
> > System Administrator, Minnesota Population Center
> > University of Minnesota
> 
> 



Re: Error on Teardown attempt: Framework is not connected via HTTP

2016-04-15 Thread Anand Mazumdar
The `py-spark` framework looks to be driver based, i.e. it uses the
`MesosSchedulerDriver` underneath. You would need to use the `/teardown`
endpoint, which takes the `frameworkId` as a parameter (sent form-encoded in
the POST body), for tearing it down. For more details, see:
http://mesos.apache.org/documentation/latest/endpoints/master/teardown/

The `TEARDOWN` call to the `/api/v1/scheduler` endpoint only works if your
framework is using the new Scheduler API. Hope this helps.
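
A minimal sketch of the `/teardown` request, assuming the master is reachable
at `cluster:5050` as in your command and that `frameworkId` is sent as
form-encoded POST data. The helper name is hypothetical, and the request is
only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_teardown_request(master, framework_id):
    """Build (but do not send) a POST to the master's /teardown endpoint."""
    # Assumption: the endpoint reads `frameworkId` from the form-encoded body.
    body = urlencode({"frameworkId": framework_id}).encode("ascii")
    return Request("http://%s/master/teardown" % master, data=body, method="POST")


request = build_teardown_request(
    "cluster:5050", "0c540ad0-a050-4c20-82df-7bd14ce95f51-0090")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing `request` to `urllib.request.urlopen` would actually send it, so only
do that once you are sure the framework ID is the one you want to remove.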

-anand

> On Apr 15, 2016, at 12:56 PM, June Taylor  wrote:
> 
> We're getting the highlighted error message returned when attempting to tear 
> down a framework on our cluster:
> 
> june@cluster:~$ mesos frameworks
>  ID                                         NAME           HOST     ACTIVE  TASKS   CPU MEM DISK
>  0c540ad0-a050-4c20-82df-7bd14ce95f51-0090  pyspark-shell  cluster  True    4115.0  450560.0  0.0
> 
> 
> june@cluster:~$ curl -XPOST http://cluster:5050/api/v1/scheduler -d '{ "framework_id": { "value": "0c540ad0-a050-4c20-82df-7bd14ce95f51-0090" }, "type": "TEARDOWN"}' -H Content-Type:application/json
> Framework is not connected via HTTP
> 
> We cannot get this framework to shut down. I'm not sure why we're getting 
> this type of error message, as the same POST command has worked against other 
> framework IDs in the past.
> 
> Your thoughts are much appreciated.
> 
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota



Re: Native Lib vs. Rest API

2015-12-03 Thread Anand Mazumdar
One minor clarification:

> * AuthN support for HTTP API: 
> https://issues.apache.org/jira/browse/MESOS-3923 
> 
> ** Required in order to be able to use persistent volumes or dynamic 
> reservations (added in 0.25.0)


You should be able to use persistent volumes/dynamic reservations via the HTTP
Scheduler API with AuthN disabled on the master (via the `--no-authenticate`
flag).

-anand

> On Dec 3, 2015, at 5:05 PM, Ben Whitehead  wrote:
> 
> There is an example framework in the repo that creates some `sleep` tasks 
> based on resource offers.
> 
> Regarding status there are still a number of items that require 
> implementation from mesos before they can be supported by mesos-rxjava.
> 
> * AuthN support for HTTP API: 
> https://issues.apache.org/jira/browse/MESOS-3923 
> 
> ** Required in order to be able to use persistent volumes or dynamic 
> reservations (added in 0.25.0)
> * Master Redirection: https://issues.apache.org/jira/browse/MESOS-3832 
> 
> 
> Custom Executors still require the use of libmesos.
> 
> The full set of JIRA issues for the HTTP APIs can be seen here: 
> https://issues.apache.org/jira/browse/MESOS-3302 
> 
> 
> Regarding functionality, due to the new design of the Mesos HTTP API some 
> things that libmesos took care of for the user before now have to be taken 
> care of by the user.  For example, reconnecting to the master if the 
> connection is lost. Users have to ACK task status updates that have a 
> specified UUID. One thing that only exists in the new HTTP API is the idea of 
> inverse offers (https://issues.apache.org/jira/browse/MESOS-1474) to 
> facilitate maintenance on mesos agents.
> 
> I also don't yet have a jar published to maven central or sonatype (snapshot 
> hopefully in the next week or so).
> 
> The other change that isn't really about functionality is the paradigm 
> change. The new HTTP APIs are modeled as an event stream rather than actor 
> messages (translated to callbacks in the current libmesos api).
> 
> Hope this answers your question.
> 
> --Ben Whitehead
> 
> On Thu, Dec 3, 2015 at 3:37 PM, Charles Allen wrote:
> @Ben : Is the status of the mesos-rxjava MORE or LESS functional than the 
> legacy mesos .jar with the native library?
> 
> On Thu, Dec 3, 2015 at 12:05 PM Ben Whitehead wrote:
> Hi John,
> 
> If you're using Java there is already a prototype client atop the new 
> Scheduler HTTP API using RxJava: https://github.com/mesosphere/mesos-rxjava 
> Happy to provide more info if interested.
> 
> --Ben Whitehead
> 
> On Thu, Dec 3, 2015 at 11:52 AM, John Omernik wrote:
> Thank you!
> 
> On Thu, Dec 3, 2015 at 1:40 PM, Vinod Kone wrote:
> Yes, that's the plan.
> 
> Here are the related epics tracking the work: MESOS-2288 and MESOS-3302 
> 
> The user doc for the scheduler API is 
> https://github.com/apache/mesos/blob/master/docs/scheduler-http-api.md 
> 
> 
> On Thu, Dec 3, 2015 at 11:34 AM, John Omernik wrote:
> Somewhere in the back of my brain I thought I read something about a 
> migration away from using the mesos native lib and going to a more generic 
> API approach to support better portability and less reliance on the lib. 
> 
> I read about this before I understood things well (or as well as I do now, I 
> should say). Am I misremembering reading about this? I can't find any 
> stories/documentation on this. If I am correct on this, can someone point me 
> to a JIRA or a discussion on how this is supposed to work? I.e. is the goal 
> to migrate all frameworks off the native library to deprecate it etc?
> 
> Thanks, sorry for the weird questions. 
> 
> John
> 
> 
> 
> 



Re: many outstanding chronos offers

2015-11-05 Thread Anand Mazumdar
Can you try setting the `--offer_timeout` flag on the Mesos master to some small 
value, e.g. 5 minutes? The default behavior is that a framework can keep 
hoarding the offered resources forever.

http://mesos.apache.org/documentation/latest/configuration/ 


-anand

> On Nov 5, 2015, at 5:15 PM, craig w  wrote:
> 
> I'm running Mesos 0.24.1, Marathon 0.11.1 and Chronos 2.4.0.
> 
> It seems I'm unable to launch a new app in marathon b/c resources aren't 
> available to it. I say that b/c the "offers" ui page, which are "outstanding 
> offers" (https://issues.apache.org/jira/browse/MESOS-3817), shows most 
> resources have been offered to chronos which hasn't accepted/declined them.
> 
> Is there some way to prevent this from happening? How do I get the offers to 
> be "unstuck"? Perhaps, stop chronos for a bit?
> 
> Thanks



Re: How does mesos determine how much memory on a node is available for offer?

2015-09-02 Thread Anand Mazumdar
If you don't specify the resources via the "--resources" flag when you start 
your agent, it picks up the default values. (Example: 
--resources="cpus:4;mem:1024;disk:2")

The default value for memory is here: 
https://github.com/apache/mesos/blob/master/src/slave/constants.cpp#L46 


-anand
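
To make the two cases above concrete, here is a small Python sketch. Both helpers are hypothetical: `parse_resources` only understands the scalar `name:value;...` form shown in the example, and `default_mem_mb` encodes an assumption about the agent's auto-detection (offer half of the machine's memory below 2 GB, otherwise everything except 1 GB). That rule is not stated in this thread and should be checked against the agent source for your Mesos version.

```python
def parse_resources(flag_value):
    """Parse a scalar-only --resources value such as "cpus:4;mem:1024;disk:2"."""
    resources = {}
    for item in flag_value.split(";"):
        name, _, value = item.partition(":")
        resources[name.strip()] = float(value)
    return resources


def default_mem_mb(total_mb):
    """Assumed auto-detection rule (an assumption, not confirmed by the thread)."""
    if total_mb >= 2048:
        return total_mb - 1024  # leave 1 GB for the system
    return total_mb // 2        # smaller machines: offer half


print(parse_resources("cpus:4;mem:1024;disk:2"))
print(default_mem_mb(2005))
```

Under that assumed rule, a machine reporting 2005 MB in total would offer 1002 MB, which is the figure quoted in the question below.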

> On Sep 2, 2015, at 6:12 PM, F21  wrote:
> 
> I have 3 CoreOS nodes running in vagrant. Mesos is run natively (not in 
> docker containers).
> 
> There is 1 master/slave and 2 slaves.
> 
> If I ssh into one of my slaves and run free -m, I see:
> 
> Total: 2005
> Used 1342
> Free 662
> Shares 273
> Buffers 13
> Cached 1210
> 
> In the mesos web-ui, I see that the slave  has 1002 MB of memory to offer.
> 
> How is this 1002 MB determined (I am running the masters and slaves with 
> stock defaults and no customizations)?
> 
> Is the 1002MB included in the used memory (1342)? If so, why is the 662MB 
> free? It seems to be a waste and I am sure it should be able to offer another 
> 500MB making the total 1502MB.



Re: [VOTE] Release Apache Mesos 0.24.0 (rc1)

2015-08-28 Thread Anand Mazumdar
Dario,

Thanks for the detailed explanation and for trying out the new API. However, 
this is not a bug. The output from CURL is the encoding used by Mesos for the 
events stream. From the user doc 
https://github.com/apache/mesos/blob/master/docs/scheduler_http_api.md:

"Master encodes each Event in RecordIO format, i.e., string representation of 
length of the event in bytes followed by JSON or binary Protobuf (possibly 
compressed) encoded event. Note that the value of length will never be '0' and 
the size of the length will be the size of unsigned integer (i.e., 64 bits). 
Also, note that the RecordIO encoding should be decoded by the scheduler 
whereas the underlying HTTP chunked encoding is typically invisible at the 
application (scheduler) layer."

If you run CURL with tracing enabled, i.e. --trace, the output would be something 
similar to this:

<= Recv header, 2 bytes (0x2)
0000: 0d 0a                                           ..
<= Recv data, 115 bytes (0x73)
0000: 36 64 0d 0a 31 30 35 0a 7b 22 73 75 62 73 63 72 6d..105.{"subscr
0010: 69 62 65 64 22 3a 7b 22 66 72 61 6d 65 77 6f 72 ibed":{"framewor
0020: 6b 5f 69 64 22 3a 7b 22 76 61 6c 75 65 22 3a 22 k_id":{"value":"
0030: 32 30 31 35 30 38 32 35 2d 31 30 33 30 31 38 2d 20150825-103018-
0040: 33 38 36 33 38 37 31 34 39 38 2d 35 30 35 30 2d 3863871498-5050-
0050: 31 31 38 35 2d 30 30 31 30 22 7d 7d 2c 22 74 79 1185-0010"}},"ty
0060: 70 65 22 3a 22 53 55 42 53 43 52 49 42 45 44 22 pe":"SUBSCRIBED"
0070: 7d 0d 0a                                        }..
others

In the output above, the chunks are correctly delimited by 'CRLF' (0d 0a) as 
per the HTTP RFC. As mentioned earlier, the output that you observe on stdout 
with CURL is the RecordIO encoding used for the events stream (and is not 
related to the RFC):

event = event-size LF
 event-data

Looking forward to more bug reports as you try out the new API !

-anand
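
A scheduler-side decoder for the `event-size LF event-data` framing described above might look like the following Python sketch. The class name `RecordIODecoder` and its `feed` method are made up for illustration and are not part of any Mesos client library; the sketch assumes the HTTP layer has already removed the chunked encoding and hands over raw bytes in arbitrary pieces.

```python
import json


class RecordIODecoder:
    """Incrementally split a byte stream into "<length>\\n<data>" records."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Add bytes to the buffer and return every record that is now complete."""
        self._buf += data
        records = []
        while True:
            newline = self._buf.find(b"\n")
            if newline < 0:
                break                      # length line not complete yet
            length = int(self._buf[:newline])
            start = newline + 1
            end = start + length
            if len(self._buf) < end:
                break                      # record body not complete yet
            records.append(self._buf[start:end])
            self._buf = self._buf[end:]
        return records


decoder = RecordIODecoder()
first = decoder.feed(b'20\n{"type":"HEART')            # partial event
second = decoder.feed(b'BEAT"}20\n{"type":"HEARTBEAT"}')
print(first)
print([json.loads(r)["type"] for r in second])
```

Because the decoder buffers incomplete input, it does not matter where the transport happens to cut the stream; an event is only handed back once all of its bytes have arrived.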

 On Aug 28, 2015, at 12:56 AM, Dario Rexin dario.re...@me.com wrote:
 
 -1 (non-binding)
 
 I found a breaking bug in the new HTTP API. The messages do not conform to 
 the HTTP standard for chunked transfer encoding. In RFC 2616 Sec. 3 
 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html) a chunk is defined 
 as:
 
 chunk = chunk-size [ chunk-extension ] CRLF
 chunk-data CRLF
 
 The HTTP API currently sends a chunk as:
 
 chunk = chunk-size LF
 chunk-data
 
 A standards-conformant HTTP client like curl can’t correctly interpret the data 
 as a complete chunk. In curl it currently looks like this:
 
 104
 {subscribed:{framework_id:{value:20150820-114552-16777343-5050-43704-}},type:SUBSCRIBED}20
 {type:HEARTBEAT”}666
 …. waiting …
 {offers:{offers:[{agent_id:{value:20150820-114552-16777343-5050-43704-S0},framework_id:{value:20150820-114552-16777343-5050-43704-},hostname:localhost,id:{value:20150820-114552-16777343-5050-43704-O0},resources:[{name:cpus,role:*,scalar:{value:8},type:SCALAR},{name:mem,role:*,scalar:{value:15360},type:SCALAR},{name:disk,role:*,scalar:{value:2965448},type:SCALAR},{name:ports,ranges:{range:[{begin:31000,end:32000}]},role:*,type:RANGES}],url:{address:{hostname:localhost,ip:127.0.0.1,port:5051},path:\/slave(1),scheme:http}}]},type:OFFERS”}20
 … waiting …
 {type:HEARTBEAT”}20
 … waiting …
 
 It will receive a couple of messages after successful registration with the 
 master and the last thing printed is a number (in this case 666). Then after 
 some time it will print the first offers message followed by the number 20. 
 The explanation for this behavior is, that curl can’t interpret the data it 
 gets from Mesos as a complete chunk and waits for the missing data. So it 
 prints what it thinks is a chunk (a message followed by the size of the next 
 message) and keeps the rest of the message until another message arrives and 
 so on. The fix for this is to terminate both lines, the message size and the 
 message data, with CRLF.
 
 Cheers,
 Dario



Re: [VOTE] Release Apache Mesos 0.24.0 (rc1)

2015-08-28 Thread Anand Mazumdar
Dario,

Can you shed a bit more light on what you still find puzzling about the CURL 
behavior after my explanation ? 

PS: A single HTTP chunk can have 0 or more Mesos (Scheduler API) Events. So in 
your example, the first chunk had complete information about the first “event”, 
followed by partial information about the subsequent event from another chunk.

As for the benefit of using RecordIO format here, how else do you think we 
could have demarcated two events in the response?

-anand
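
The point that chunk boundaries and event boundaries are independent can be illustrated with a short Python sketch. `dechunk` is a hypothetical, simplified decoder for the HTTP/1.1 chunked framing (hexadecimal chunk-size, CRLF, chunk-data, CRLF) that ignores chunk extensions and trailers; the sample bytes are constructed for the example and are not captured from a real master.

```python
def dechunk(raw):
    """Return the payload of each chunk in a complete chunked body."""
    payloads = []
    pos = 0
    while pos < len(raw):
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol], 16)    # chunk-size is hexadecimal
        start = eol + 2
        payloads.append(raw[start:start + size])
        pos = start + size + 2           # skip the CRLF after chunk-data
    return payloads


event = b'{"type":"HEARTBEAT"}'
stream = (b"20\n" + event) * 2           # two RecordIO-framed events, 46 bytes

# Cut the stream at byte 30: the first chunk carries one whole event plus
# the beginning of the next one, the second chunk carries the remainder.
parts = [stream[:30], stream[30:]]
wire = b"".join(b"%x\r\n" % len(p) + p + b"\r\n" for p in parts)

payloads = dechunk(wire)
print(payloads)
print(b"".join(payloads) == stream)
```

Only after the chunk payloads are concatenated does the RecordIO length prefix tell the reader where each event starts and ends.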


 On Aug 28, 2015, at 10:01 AM, dario.re...@me.com wrote:
 
 Anand,
 
 thanks for the explanation. I'm still a little puzzled why curl behaves so 
 strangely. I will check how other clients behave as soon as I have a chance.
 
 Vinod,
 
 what exactly is the benefit of using recordio here? Doesn't it make the 
 content-type somewhat wrong? If I send 'Accept: application/json' and receive 
 'Content-Type: application/json', I actually expect to receive only json in 
 the message.
 
 Thanks,
 Dario
 
 On 28.08.2015, at 18:13, Vinod Kone vinodk...@apache.org wrote:
 
 I'm happy to add the \n after the event (note it's different from chunk) 
 if that makes CURL play nicer. I'm not sure about the \r part though? Is 
 that a nice to have or does it have some other benefit?
 
 The design doc is not set in stone since this has not been released yet. 
 So definitely want to do the right/easy thing.
 
 On Fri, Aug 28, 2015 at 7:53 AM, Anand Mazumdar an...@mesosphere.io wrote:
 Dario,
 
 Thanks for the detailed explanation and for trying out the new API. However, 
 this is not a bug. The output from CURL is the encoding used by Mesos for 
 the events stream. From the user doc 
 https://github.com/apache/mesos/blob/master/docs/scheduler_http_api.md:
 
 "Master encodes each Event in RecordIO format, i.e., string representation 
 of length of the event in bytes followed by JSON or binary Protobuf 
 (possibly compressed) encoded event. Note that the value of length will 
 never be '0' and the size of the length will be the size of unsigned integer 
 (i.e., 64 bits). Also, note that the RecordIO encoding should be decoded by 
 the scheduler whereas the underlying HTTP chunked encoding is typically 
 invisible at the application (scheduler) layer."
 
 If you run CURL with tracing enabled, i.e. --trace, the output would be 
 something similar to this:
 
 <= Recv header, 2 bytes (0x2)
 0000: 0d 0a                                           ..
 <= Recv data, 115 bytes (0x73)
 0000: 36 64 0d 0a 31 30 35 0a 7b 22 73 75 62 73 63 72 6d..105.{"subscr
 0010: 69 62 65 64 22 3a 7b 22 66 72 61 6d 65 77 6f 72 ibed":{"framewor
 0020: 6b 5f 69 64 22 3a 7b 22 76 61 6c 75 65 22 3a 22 k_id":{"value":"
 0030: 32 30 31 35 30 38 32 35 2d 31 30 33 30 31 38 2d 20150825-103018-
 0040: 33 38 36 33 38 37 31 34 39 38 2d 35 30 35 30 2d 3863871498-5050-
 0050: 31 31 38 35 2d 30 30 31 30 22 7d 7d 2c 22 74 79 1185-0010"}},"ty
 0060: 70 65 22 3a 22 53 55 42 53 43 52 49 42 45 44 22 pe":"SUBSCRIBED"
 0070: 7d 0d 0a                                        }..
 others
 
 In the output above, the chunks are correctly delimited by 'CRLF' (0d 0a) as 
 per the HTTP RFC. As mentioned earlier, the output that you observe on 
 stdout with CURL is the RecordIO encoding used for the events stream (and 
 is not related to the RFC):
 
 event = event-size LF
  event-data
 
 Looking forward to more bug reports as you try out the new API !
 
 -anand
 
 On Aug 28, 2015, at 12:56 AM, Dario Rexin dario.re...@me.com wrote:
 
 -1 (non-binding)
 
 I found a breaking bug in the new HTTP API. The messages do not conform to 
 the HTTP standard for chunked transfer encoding. In RFC 2616 Sec. 3 
 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html) a chunk is defined 
 as:
 
 chunk = chunk-size [ chunk-extension ] CRLF
 chunk-data CRLF
 
 The HTTP API currently sends a chunk as:
 
 chunk = chunk-size LF
 chunk-data
 
 A standards-conformant HTTP client like curl can’t correctly interpret the data 
 as a complete chunk. In curl it currently looks like this:
 
 104
 {subscribed:{framework_id:{value:20150820-114552-16777343-5050-43704-}},type:SUBSCRIBED}20
 {type:HEARTBEAT”}666
 …. waiting …
 {offers:{offers:[{agent_id:{value:20150820-114552-16777343-5050-43704-S0},framework_id:{value:20150820-114552-16777343-5050-43704-},hostname:localhost,id:{value:20150820-114552-16777343-5050-43704-O0},resources:[{name:cpus,role:*,scalar:{value:8},type:SCALAR},{name:mem,role:*,scalar:{value:15360},type:SCALAR},{name:disk,role:*,scalar:{value:2965448},type:SCALAR},{name:ports,ranges:{range:[{begin:31000,end:32000}]},role:*,type:RANGES}],url:{address:{hostname:localhost,ip:127.0.0.1,port:5051},path:\/slave(1),scheme:http}}]},type:OFFERS”}20
 … waiting …
 {type:HEARTBEAT”}20
 … waiting …
 
 It will receive a couple of messages after successful

Re: [VOTE] Release Apache Mesos 0.24.0 (rc1)

2015-08-28 Thread Anand Mazumdar
Dario,

Most HTTP libraries/parsers (including the one that Mesos uses internally) 
provide a way to specify a default size for each chunk. If a Mesos Event is too 
big, it gets split across several chunks; conversely, several small events can 
end up in a single chunk.

-anand
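
As a rough illustration of that last point, the Python sketch below frames the same event stream with two different chunk sizes. `to_chunks` is a hypothetical helper, and the sizes 16 and 64 are arbitrary; this is not how Mesos or its HTTP library actually chooses chunk sizes, only a demonstration that the framing is driven by byte counts rather than by event boundaries.

```python
def to_chunks(stream, chunk_size):
    """Frame a byte stream as HTTP chunks of at most chunk_size bytes each."""
    chunks = []
    for i in range(0, len(stream), chunk_size):
        piece = stream[i:i + chunk_size]
        # chunk = chunk-size (hex) CRLF chunk-data CRLF
        chunks.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    return chunks


event = b'{"type":"HEARTBEAT"}'
stream = (b"20\n" + event) * 2      # two RecordIO-framed events, 46 bytes

small = to_chunks(stream, 16)       # each event is spread over several chunks
large = to_chunks(stream, 64)       # both events travel in a single chunk
print(len(small), len(large))
```

In both cases the receiver reassembles the same 46 bytes, so the RecordIO length prefixes remain the only reliable way to find event boundaries.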

 On Aug 28, 2015, at 11:51 AM, dario.re...@me.com wrote:
 
 Anand,
 
 in the example from my first mail you can see that curl prints the size of a 
 message and then waits for the next message and only when it receives that 
 message it will print the prior message plus the size of the next message, 
 but not the actual message.
 
 What's the benefit of encoding multiple messages in a single chunk? You could 
 simply create a single chunk per event.
 
 Cheers,
 Dario
 
 On 28.08.2015, at 19:43, Anand Mazumdar an...@mesosphere.io wrote:
 
 Dario,
 
 Can you shed a bit more light on what you still find puzzling about the CURL 
 behavior after my explanation ? 
 
 PS: A single HTTP chunk can have 0 or more Mesos (Scheduler API) Events. So 
 in your example, the first chunk had complete information about the first 
 “event”, followed by partial information about the subsequent event from 
 another chunk.
 
 As for the benefit of using RecordIO format here, how else do you think we 
 could have demarcated two events in the response?
 
 -anand
 
 
 On Aug 28, 2015, at 10:01 AM, dario.re...@me.com wrote:
 
 Anand,
 
 thanks for the explanation. I'm still a little puzzled why curl behaves so 
 strangely. I will check how other clients behave as soon as I have a chance.
 
 Vinod,
 
 what exactly is the benefit of using recordio here? Doesn't it make the 
 content-type somewhat wrong? If I send 'Accept: application/json' and 
 receive 'Content-Type: application/json', I actually expect to receive only 
 json in the message.
 
 Thanks,
 Dario
 
 On 28.08.2015, at 18:13, Vinod Kone vinodk...@apache.org wrote:
 
 I'm happy to add the \n after the event (note it's different from chunk) 
 if that makes CURL play nicer. I'm not sure about the \r part though? Is 
 that a nice to have or does it have some other benefit?
 
 The design doc is not set in stone since this has not been released 
 yet. So definitely want to do the right/easy thing.
 
 On Fri, Aug 28, 2015 at 7:53 AM, Anand Mazumdar an...@mesosphere.io wrote:
 Dario,
 
 Thanks for the detailed explanation and for trying out the new API. 
 However, this is not a bug. The output from CURL is the encoding used by 
 Mesos for the events stream. From the user doc 
 https://github.com/apache/mesos/blob/master/docs/scheduler_http_api.md:
 
 "Master encodes each Event in RecordIO format, i.e., string representation 
 of length of the event in bytes followed by JSON or binary Protobuf 
 (possibly compressed) encoded event. Note that the value of length will 
 never be '0' and the size of the length will be the size of unsigned 
 integer (i.e., 64 bits). Also, note that the RecordIO encoding should be 
 decoded by the scheduler whereas the underlying HTTP chunked encoding is 
 typically invisible at the application (scheduler) layer."
 
 If you run CURL with tracing enabled, i.e. --trace, the output would be 
 something similar to this:
 
 <= Recv header, 2 bytes (0x2)
 0000: 0d 0a                                           ..
 <= Recv data, 115 bytes (0x73)
 0000: 36 64 0d 0a 31 30 35 0a 7b 22 73 75 62 73 63 72 6d..105.{"subscr
 0010: 69 62 65 64 22 3a 7b 22 66 72 61 6d 65 77 6f 72 ibed":{"framewor
 0020: 6b 5f 69 64 22 3a 7b 22 76 61 6c 75 65 22 3a 22 k_id":{"value":"
 0030: 32 30 31 35 30 38 32 35 2d 31 30 33 30 31 38 2d 20150825-103018-
 0040: 33 38 36 33 38 37 31 34 39 38 2d 35 30 35 30 2d 3863871498-5050-
 0050: 31 31 38 35 2d 30 30 31 30 22 7d 7d 2c 22 74 79 1185-0010"}},"ty
 0060: 70 65 22 3a 22 53 55 42 53 43 52 49 42 45 44 22 pe":"SUBSCRIBED"
 0070: 7d 0d 0a                                        }..
 others
 
 In the output above, the chunks are correctly delimited by 'CRLF' (0d 0a) 
 as per the HTTP RFC. As mentioned earlier, the output that you observe on 
 stdout with CURL is the RecordIO encoding used for the events stream (and 
 is not related to the RFC):
 
 event = event-size LF
  event-data
 
 Looking forward to more bug reports as you try out the new API !
 
 -anand
 
 On Aug 28, 2015, at 12:56 AM, Dario Rexin dario.re...@me.com wrote:
 
 -1 (non-binding)
 
 I found a breaking bug in the new HTTP API. The messages do not conform 
 to the HTTP standard for chunked transfer encoding. In RFC 2616 Sec. 3 
 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html) a chunk is 
 defined as:
 
 chunk = chunk-size [ chunk-extension ] CRLF
 chunk-data CRLF
 
 The HTTP API currently sends a chunk as:
 
 chunk = chunk-size LF
 chunk-data
 
 A standards-conformant HTTP client like curl can’t correctly interpret the 
 data