Re: Allowing both CommandInfo and ExecutorInfo on TaskInfo

2016-11-02 Thread David Greenberg
I have worked with executors that perform both conditional & unconditional
execution of a process graph, configurable by the framework. I think that
it would be hard to standardize.
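
A minimal sketch of such a process graph follows. The `Process` type, the `run_graph` helper, and the `always` flag are hypothetical names chosen for illustration; they are not part of Mesos or of any particular executor, and commands are simulated by a lookup table rather than actually launched.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Process:
    name: str
    deps: tuple = ()       # names of processes that must be visited first
    always: bool = False   # True: run unconditionally; False: only if all deps succeeded


def run_graph(processes, outcomes):
    """Walk the graph in dependency order and return the names that ran.

    `outcomes` maps a process name to True (success) or False (failure) and
    stands in for really launching the command; missing names count as success.
    """
    by_name = {p.name: p for p in processes}
    order = TopologicalSorter({p.name: set(p.deps) for p in processes}).static_order()
    succeeded, ran = set(), []
    for name in order:
        proc = by_name[name]
        if proc.always or all(d in succeeded for d in proc.deps):
            ran.append(name)
            if outcomes.get(name, True):
                succeeded.add(name)
    return ran


graph = [
    Process("fetch"),
    Process("build", deps=("fetch",)),
    Process("cleanup", deps=("build",), always=True),
]
```

If `fetch` fails, `build` is skipped but `cleanup` still runs. That mix of conditional and unconditional edges is the part that does not fit into a single command.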

On Wed, Nov 2, 2016 at 2:49 PM Zameer Manji  wrote:

> Joris,
>
> You make a good point. However, I'm not convinced that `CommandInfo` should
> be the well defined construct that people use. Can you please describe
> different custom executors, and the overlap between them and how
> CommandInfo will reduce that overlap? I'm having a hard time seeing where
> CommandInfo will solve all of their cases.
>
> Consider the case of Thermos (Aurora's executor): it could never use a
> `CommandInfo` struct because it executes a process graph instead of a
> single command.
>
> If the project wants to go down this path, I think generalizing
> `CommandInfo` so that it could capture more cases (i.e. multiple commands or
> a graph of commands) would be a better first step.
>
> What do you think?
>
> On Wed, Oct 26, 2016 at 10:38 AM, Joris Van Remoortere <
> jo...@mesosphere.io>
> wrote:
>
> > I do think it would be valuable to have a more well defined contract
> > between frameworks and custom executors.
> >
> > As Zameer pointed out, a specific framework and its accompanying custom
> > executor can decide to do that in the data bytes; however, if we started
> > building out a few different flavors of executors, then it would be great
> > for there to be a standard way to pass command information to them.
> >
> > The current model works well in a 1-1 mapping between framework and
> > executor binaries. In a world where that is 1-N it means all N executors
> > have to use the same method of passing the command.
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Mon, Oct 17, 2016 at 4:25 PM, Zameer Manji  wrote:
> >
> > > I'm not convinced this is a valid use case.
> > >
> > > Mesos is supposed to be a generic kernel for launching "tasks", whatever
> > > they might be.
> > >
> > > In some cases it is useful to launch an executable, in other cases it might
> > > be useful to launch a series of executables, and in some other cases it
> > > might be useful to spawn a thread to do some work. Whatever that might be,
> > > it doesn't matter to Mesos, and the executor and framework are free to
> > > establish a contract in `ExecutorInfo.data`, completely independent of the
> > > Mesos API.
> > >
> > > I think formalizing this contract between executors and frameworks via
> > > CommandInfo is going to introduce more problems than it solves. If
> > > the CommandInfo struct is useful, frameworks and executors can just stuff
> > > that into ExecutorInfo.data; however, it's not something that they need to
> > > adhere to.
> > >
> > > What's the underlying motivation for this?
> > >
> > >
> > >
> > > On Thu, Oct 13, 2016 at 10:40 AM, haosdent  wrote:
> > >
> > > > For a command task, would its `ExecutorInfo` be set to `CommandExecutor`
> > > > as well?
> > > >
> > > > Some tickets may relate to this.
> > > >
> > > > [1]: https://issues.apache.org/jira/browse/MESOS-2330
> > > > [2]: https://issues.apache.org/jira/browse/MESOS-527
> > > > [3]: https://issues.apache.org/jira/browse/MESOS-5198
> > > >
> > > > On Fri, Oct 14, 2016 at 1:00 AM, Vinod Kone 
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We are contemplating whether to allow both CommandInfo and ExecutorInfo
> > > > > on TaskInfo (MESOS-6294).
> > > > > Currently we only allow one or the other. The motivation is to allow
> > > > > custom executors a more structured way to pass information (e.g., the
> > > > > command) about a Task. Right now custom executors have to get this data
> > > > > via `TaskInfo.data`, which is not ideal.
> > > > >
> > > > > Are there any custom executors out there that crash if they get Tasks
> > > > > with CommandInfo set?
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Vinod
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Haosdent Huang
> > > >
> > > > --
> > > > Zameer Manji
> > > >
> > >
> >
> > --
> > Zameer Manji
> >
>


Voting for MesosCon Hangzhou Talks! Due September 26

2016-09-20 Thread David Greenberg
Please visit the following link and submit your responses. We'll be
tallying up results on Monday, so you have one week to vote and make your
voice heard!

https://docs.google.com/forms/d/e/1FAIpQLSdW7EDrMo_5fps8imeQFzDxLAcA0pCbOZz3MZdGjdcVJjC0LQ/viewform

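Please open the link below and vote. You have the coming week to consider carefully, vote, and give feedback; we will tally the final voting results next Monday.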

https://gdgdocs.org/forms/d/e/1FAIpQLSdW7EDrMo_5fps8imeQFzDxLAcA0pCbOZz3MZdGjdcVJjC0LQ/viewform


Special thanks to haosdent, who has been providing much better translations
than I can!

Best regards,
David, Kiersten, and Artem - MesosCon co-chairs


Re: To Mesos enthusiasts in China, regarding this year's MesosCon in Hangzhou

2016-09-19 Thread David Greenberg
Specifically, we want to make sure that everyone will be able to access the
forms if we put them on Google forms/docs.

On Mon, Sep 19, 2016 at 7:04 PM Hechen Gao <hechen@autodesk.com> wrote:

> Hey David,
>
> I would love to contribute to your survey about the MesosCon, please count
> me in.
>
> Best regards,
> *Hechen Gao*
> Senior Software Engineer, Cloud Platforms - Engineering Core Services
>
> *Autodesk, Inc.*
> The Landmark @ One Market, Suite 500
> San Francisco, CA  94105
> www.autodesk.com
>
>
> On Sep 19, 2016, at 5:57 PM, tommy xiao <xia...@gmail.com> wrote:
>
> +1
>
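> On September 20, 2016 at 8:22 AM, David Greenberg <dsg123456...@gmail.com> wrote:
>
> As a chair of this MesosCon, I hope that you will get to hear the talks and
> presentations you want at this year's MesosCon in Hangzhou. To that end, we
> are preparing to send out a Google Forms survey that will help us better
> decide which talks and presentations to include. We hope you will take part
> in the survey. Your opinions are important to us.
>
>
> Respectfully,
> David Greenberg, co-chair of MesosCon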
>
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>
>
>


To Mesos enthusiasts in China, regarding this year's MesosCon in Hangzhou

2016-09-19 Thread David Greenberg
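As a chair of this MesosCon, I hope that you will get to hear the talks and
presentations you want at this year's MesosCon in Hangzhou. To that end, we
are preparing to send out a Google Forms survey that will help us better
decide which talks and presentations to include. We hope you will take part
in the survey. Your opinions are important to us.


Respectfully,
David Greenberg, co-chair of MesosCon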


A question for those in China

2016-09-19 Thread David Greenberg
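We would like to learn your preferences for the talks at MesosCon Hangzhou. Can you use Google Forms, or would you prefer something different? We are currently preparing the survey.

Best regards,

David Greenberg, co-chair of MesosCon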


Re: Do all the topics in MesosCon Asia share in English

2016-08-17 Thread David Greenberg
I believe that we will have translators in some or all tracks, so that
everyone will be able to understand regardless of the speaker's language.
As we get closer to the event, we'll have more details about the
simultaneous translation.

Best regards,
David Greenberg, co-chair of MesosCon Asia
On Wed, Aug 17, 2016 at 11:00 AM haosdent <haosd...@gmail.com> wrote:

> Hi, MesosCon Asia is accepting submissions now, which close on September
> 9th. Are all the topics supposed to be presented in English?
>
> Or are talks in Chinese accepted as well?
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: tutorial for mesos developer

2016-08-09 Thread David Greenberg
There's a book about this as well, Building Applications on Mesos
(disclaimer: I'm the author). The book goes into great detail about how to
write frameworks, including techniques, pitfalls, and strategies for
building highly scalable systems. Here's a link to it:
http://shop.oreilly.com/product/0636920039952.do

Best regards,
David

On Tue, Aug 9, 2016 at 4:25 PM Vinod Kone  wrote:

> Try this: https://github.com/mesosphere/RENDLER
>
> On Tue, Aug 9, 2016 at 3:21 PM, Rongqing Tu  wrote:
>
> > Hi,
> > Is there any document that explains how to develop a framework on Mesos,
> > besides the development guide? After building, installing, and running the
> > Mesos master and slaves, I would like to create a simple framework on the
> > system. I am reading some of the sample code in examples, but are there any
> > instructions on how to compile my own framework source code?
> >
> > Thanks very much.
> > Ron
> >
>


Re: Vote on #MesosCon proposals, deadline Friday March 25

2016-04-04 Thread David Greenberg
We have had a couple delays, but we'll be ready with the official
announcement later this week.

Sorry for the delay! We know we're slightly behind our original schedule.
On Mon, Apr 4, 2016 at 9:29 AM Tomek Janiszewski <jani...@gmail.com> wrote:

> @David Do you have results?
>
> On Mon, Mar 21, 2016 at 2:39 PM David Greenberg <dsg123456...@gmail.com>
> wrote:
>
> > No, sorry--we'll collect the votes on Friday.
> >
> > Thanks,
> > David
> >
> > On Sun, Mar 20, 2016 at 9:01 PM Darren Haas <dh...@apple.com> wrote:
> >
> >> Hi David,
> >>
> >> We could always start the ranking using shuf. :) Is it possible to show
> >> the current votes during the ranking?
> >>
> >> Thanks,
> >> Darren
> >>
> >>
> >>
> >>
> >> Sent from my iPhone
> >> On Mar 19, 2016, at 4:45 PM, David Greenberg <dsg123456...@gmail.com>
> >> wrote:
> >>
> >> Hi Jay,
> >>
> >> Thanks for your feedback! The reason we're asking you to rank the
> >> topics is that this will allow us to better understand everyone's relative
> >> preferences. Next, we'll use standard voting algorithms to determine the
> >> schedule, to ensure most people get as many of the talks they want as
> >> possible. We hope you enjoy the program we come up with :)
> >>
> >> Thanks,
> >> David
> >>
> >> On Sat, Mar 19, 2016 at 12:39 AM Jay JN Guo <guojian...@cn.ibm.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Thank you for this good work and I'm already looking forward to this
> >>> MesosCon.
> >>>
> >>> One minor suggestion, though: Accept/Reject on a scale of 10 is a
> >>> bit intimidating. Personally, I only have three feelings toward a topic:
> >>> will go/maybe/not interested, whereas quantifying these feelings on a
> >>> scale of 10 for 154 topics is just too much. Maybe we could simplify the
> >>> form in the future. We could take the OpenStack summit voting form as an
> >>> example.
> >>>
> >>> Cheers,
> >>> /J
> >>>
> >>> - Original message -
> >>> From: Kiersten Gaffney <kiers...@mesosphere.io>
> >>> To: dev@mesos.apache.org, u...@mesos.apache.org
> >>> Cc: David Greenberg <dsg123456...@gmail.com>, Dave Lester <
> >>> d...@davelester.org>, Kiersten Gaffney <kiers...@mesosphere.io>
> >>> Subject: Vote on #MesosCon proposals, deadline Friday March 25
> >>> Date: Sat, Mar 19, 2016 8:11 AM
> >>>
> >>>
> >>> Please take a few minutes the next few days and review what members of
> >>> the
> >>> community have submitted!
> >>>
> >>> Voting forms close Friday, March 25, 2016, 11:55 PST
> >>>
> >>> A total of 154 proposals were submitted in time for #MesosCon review,
> up
> >>> significantly from 63 submitted for last year’s conference. Similar to
> >>> last
> >>> year, the MesosCon program committee is opening these proposals up for
> >>> community review/feedback to better-inform our decisions about what
> >>> should
> >>> be included in the program.
> >>>
> >>> In order to make it easier to review a subset of the proposals, we’ve
> >>> segmented them based upon two loose themes: Developer and Users.
> >>>
> >>> Developers: http://bit.ly/1RpZPvj
> >>>
> >>> Talks on how frameworks can be used, developed, and integrate with
> Mesos.
> >>>
> >>> Users: http://bit.ly/1Mspaxp
> >>>
> >>> A combination of talks that are use cases (how company x uses Mesos),
> and
> >>> operations-focused (how we deploy x, use Docker, etc).
> >>>
> >>> The forms above also include an opportunity to indicate which sessions
> >>> you
> >>> didn't see proposed but would like to attend.
> >>>
> >>> Thanks in advance for your participation!
> >>>
> >>> Kiersten, Dave, and David (Program Committee)
> >>>
> >>>
>


Re: Vote on #MesosCon proposals, deadline Friday March 25

2016-03-21 Thread David Greenberg
No, sorry--we'll collect the votes on Friday.

Thanks,
David

On Sun, Mar 20, 2016 at 9:01 PM Darren Haas <dh...@apple.com> wrote:

> Hi David,
>
> We could always start the ranking using shuf. :) Is it possible to show
> the current votes during the ranking?
>
> Thanks,
> Darren
>
>
>
>
> Sent from my iPhone
> On Mar 19, 2016, at 4:45 PM, David Greenberg <dsg123456...@gmail.com>
> wrote:
>
> Hi Jay,
>
> Thanks for your feedback! The reason we're asking you to rank the
> topics is that this will allow us to better understand everyone's relative
> preferences. Next, we'll use standard voting algorithms to determine the
> schedule, to ensure most people get as many of the talks they want as
> possible. We hope you enjoy the program we come up with :)
>
> Thanks,
> David
>
> On Sat, Mar 19, 2016 at 12:39 AM Jay JN Guo <guojian...@cn.ibm.com> wrote:
>
>> Hi,
>>
>> Thank you for this good work and I'm already looking forward to this
>> MesosCon.
>>
>> One minor suggestion, though: Accept/Reject on a scale of 10 is a
>> bit intimidating. Personally, I only have three feelings toward a topic:
>> will go/maybe/not interested, whereas quantifying these feelings on a
>> scale of 10 for 154 topics is just too much. Maybe we could simplify the
>> form in the future. We could take the OpenStack summit voting form as an
>> example.
>>
>> Cheers,
>> /J
>>
>> - Original message -
>> From: Kiersten Gaffney <kiers...@mesosphere.io>
>> To: dev@mesos.apache.org, u...@mesos.apache.org
>> Cc: David Greenberg <dsg123456...@gmail.com>, Dave Lester <
>> d...@davelester.org>, Kiersten Gaffney <kiers...@mesosphere.io>
>> Subject: Vote on #MesosCon proposals, deadline Friday March 25
>> Date: Sat, Mar 19, 2016 8:11 AM
>>
>>
>> Please take a few minutes the next few days and review what members of the
>> community have submitted!
>>
>> Voting forms close Friday, March 25, 2016, 11:55 PST
>>
>> A total of 154 proposals were submitted in time for #MesosCon review, up
>> significantly from 63 submitted for last year’s conference. Similar to
>> last
>> year, the MesosCon program committee is opening these proposals up for
>> community review/feedback to better-inform our decisions about what should
>> be included in the program.
>>
>> In order to make it easier to review a subset of the proposals, we’ve
>> segmented them based upon two loose themes: Developer and Users.
>>
>> Developers: http://bit.ly/1RpZPvj
>>
>> Talks on how frameworks can be used, developed, and integrate with Mesos.
>>
>> Users: http://bit.ly/1Mspaxp
>>
>> A combination of talks that are use cases (how company x uses Mesos), and
>> operations-focused (how we deploy x, use Docker, etc).
>>
>> The forms above also include an opportunity to indicate which sessions you
>> didn't see proposed but would like to attend.
>>
>> Thanks in advance for your participation!
>>
>> Kiersten, Dave, and David (Program Committee)
>>
>>


Re: Vote on #MesosCon proposals, deadline Friday March 25

2016-03-19 Thread David Greenberg
Hi Jay,

Thanks for your feedback! The reason we're asking you to rank the
topics is that this will allow us to better understand everyone's relative
preferences. Next, we'll use standard voting algorithms to determine the
schedule, to ensure most people get as many of the talks they want as
possible. We hope you enjoy the program we come up with :)

Thanks,
David

On Sat, Mar 19, 2016 at 12:39 AM Jay JN Guo <guojian...@cn.ibm.com> wrote:

> Hi,
>
> Thank you for this good work and I'm already looking forward to this
> MesosCon.
>
> One minor suggestion, though: Accept/Reject on a scale of 10 is a
> bit intimidating. Personally, I only have three feelings toward a topic:
> will go/maybe/not interested, whereas quantifying these feelings on a
> scale of 10 for 154 topics is just too much. Maybe we could simplify the
> form in the future. We could take the OpenStack summit voting form as an
> example.
>
> Cheers,
> /J
>
> - Original message -
> From: Kiersten Gaffney <kiers...@mesosphere.io>
> To: dev@mesos.apache.org, u...@mesos.apache.org
> Cc: David Greenberg <dsg123456...@gmail.com>, Dave Lester <
> d...@davelester.org>, Kiersten Gaffney <kiers...@mesosphere.io>
> Subject: Vote on #MesosCon proposals, deadline Friday March 25
> Date: Sat, Mar 19, 2016 8:11 AM
>
>
> Please take a few minutes the next few days and review what members of the
> community have submitted!
>
> Voting forms close Friday, March 25, 2016, 11:55 PST
>
> A total of 154 proposals were submitted in time for #MesosCon review, up
> significantly from 63 submitted for last year’s conference. Similar to last
> year, the MesosCon program committee is opening these proposals up for
> community review/feedback to better-inform our decisions about what should
> be included in the program.
>
> In order to make it easier to review a subset of the proposals, we’ve
> segmented them based upon two loose themes: Developer and Users.
>
> Developers: http://bit.ly/1RpZPvj
>
> Talks on how frameworks can be used, developed, and integrate with Mesos.
>
> Users: http://bit.ly/1Mspaxp
>
> A combination of talks that are use cases (how company x uses Mesos), and
> operations-focused (how we deploy x, use Docker, etc).
>
> The forms above also include an opportunity to indicate which sessions you
> didn't see proposed but would like to attend.
>
> Thanks in advance for your participation!
>
> Kiersten, Dave, and David (Program Committee)
>
>


Re: [proposal] Exposing Multiple Isolated Disks to Frameworks

2015-11-02 Thread David Greenberg
Sorry for not directly linking! Yes, that is the doc. I'll update the
ticket to include it in the description.

On Thu, Oct 29, 2015 at 1:00 PM David Greenberg <dsg123456...@gmail.com>
wrote:

> Hello Everyone,
> At the MesosCon Dublin Hackathon, we started working on MESOS-191. The
> goal of this issue is to enable database and persistent disk frameworks to
> make use of isolated and high-performance disks, to enable frameworks like
> HDFS, Cotton, and Kafka to achieve production-level performance.
>
> First, we captured all of the potential applications of the feature
> through user stories.
> Then, we demonstrated the change we'll make to the Mesos API (this is only
> a protobuf change, no new RPCs are needed).
> Finally, we explicitly point out features that we're not going to include
> in the v1 implementation, but are worthwhile to be added in later
> iterations.
>
> Please look over the document and contribute your feedback.
>
> Thank you!
> David, Jie, Joris, and Michael
>


Re: [Breaking bug fix] Binary in state endpoints

2015-11-02 Thread David Greenberg
Why not base64 encode the field? We use that field in our frameworks, and
some of our platform tools would benefit from being able to read that data.
Base64 seems like a compromise with minimal complexity addition. It also
removes the potential for parse errors, doesn't rule out future
applications from using the data stored there (as specialized frameworks
use that field), and doesn't incur a message size overhead in the (I
presume) majority of frameworks not using that field.
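
A minimal sketch of the base64 idea, assuming a hypothetical `task_to_json` serializer (the real state endpoints are produced by Mesos itself, so the function names and the JSON shape here are illustrative only):

```python
import base64
import json


def task_to_json(name, data):
    """Serialize a task summary, base64-encoding the opaque `data` bytes."""
    return json.dumps({
        "name": name,
        "data": base64.b64encode(data).decode("ascii"),
    })


def data_from_json(blob):
    """Recover the original bytes from the JSON produced above."""
    return base64.b64decode(json.loads(blob)["data"])


raw = b"\xff\xfe\x00binary"   # not valid UTF-8, so dumping it as a string is unsafe
blob = task_to_json("task-1", raw)
```

The encoded field is plain ASCII, so any JSON parser accepts it, and tools that care about the payload can decode it back to the exact original bytes.
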
On Mon, Nov 2, 2015 at 4:28 PM Guangya Liu  wrote:

> +1 to removing the field directly. One comment is that the upgrade document
> may need to be updated.
>
> From my understanding, the data is binary, and I have not seen much need
> for retrieving binary data.
>
> Thanks!
>
> On Sat, Oct 24, 2015 at 5:33 AM, Joseph Wu  wrote:
>
> > Hello,
> >
> > The state endpoints, on master and agent, currently serialize two binary
> > data fields in the ExecutorInfo and TaskInfo objects.  These fields are
> set
> > by frameworks; and Mesos does not inspect their values.
> >
> > The data fields can be found in the state JSON blobs:
> > /master/state -> frameworks[*].executors[*].data
> > /slave/state ->
> > frameworks[*].(executors|completed_executors)[*].(tasks|queued_tasks|completed_tasks)[*].data
> >
> > *Problem:*
> > The state endpoints are JSON-ified in a non-standard way (i.e. not via
> our
> > normal Protobuf-to-json methods).  When we serialize the binary "data"
> > fields, the binary is dumped as a string, as is.  The resulting JSON may
> > not be valid if the binary data includes random bytes (i.e. not unicode).
> > Most JSON parsers will error on the state endpoints in this case.
> >
> > *Proposed solution *(and breaking change)*:*
> > Simple -- remove the "data" fields from the state endpoints.  (And only
> > from the state endpoints.  The ExecutorInfo and TaskInfo objects will not
> > change.)
> >
> > *Question:*
> > We believe that frameworks/tools do not rely on retrieving the "data"
> > fields from the state endpoints.
> >
> > Is there any framework/tool that retrieves the "data" field from the
> state
> > endpoints?
> > And if so, is it critical to how the framework/tool works?
> >
> > More details here: https://issues.apache.org/jira/browse/MESOS-3771
> >
> > Thanks,
> > ~Joseph
> >
>


Re: [Breaking bug fix] Binary in state endpoints

2015-11-02 Thread David Greenberg
In that case, I rescind my objection. With memory use as the problem and
labels as an alternative, removing the field works fine for us. Thanks!
On Mon, Nov 2, 2015 at 7:03 PM Benjamin Mahler <benjamin.mah...@gmail.com>
wrote:

> Sorry for the confusion, the motivation to remove 'data' is for memory
> scalability reasons (the ability to express binary fields is orthogonal and
> is not the reason to remove 'data').
>
> We can get into a really bad state in large clusters if frameworks are
> putting non-trivial amounts of 'data' in TaskInfos and ExecutorInfos. If
> it's too large for the master to hold in memory, the master will
> continually OOM and it becomes impossible to right your cluster. See
> https://issues.apache.org/jira/browse/MESOS-1746 for some history of
> stripping binary data, starting with TaskStatus.
>
> Labels were introduced to aid tooling, can you use labels? I realize they
> are not in ExecutorInfo yet.
>
> On Mon, Nov 2, 2015 at 6:23 PM, David Greenberg <dsg123456...@gmail.com>
> wrote:
>
> > Why not base64 encode the field? We use that field in our frameworks, and
> > some of our platform tools would benefit from being able to read that
> data.
> > Base64 seems like a compromise with minimal complexity addition. It also
> > removes the potential for parse errors, doesn't rule out future
> > applications from using the data stored there (as specialized frameworks
> > use that field), and doesn't incur a message size overhead in the (I
> > presume) majority of frameworks not using that field.
> > On Mon, Nov 2, 2015 at 4:28 PM Guangya Liu <gyliu...@gmail.com> wrote:
> >
> > > +1 to removing the field directly. One comment is that the upgrade document
> > > may need to be updated.
> > >
> > > From my understanding, the data is binary, and I have not seen much need
> > > for retrieving binary data.
> > >
> > > Thanks!
> > >
> > > On Sat, Oct 24, 2015 at 5:33 AM, Joseph Wu <jos...@mesosphere.io>
> wrote:
> > >
> > > > Hello,
> > > >
> > > > The state endpoints, on master and agent, currently serialize two
> > binary
> > > > data fields in the ExecutorInfo and TaskInfo objects.  These fields
> are
> > > set
> > > > by frameworks; and Mesos does not inspect their values.
> > > >
> > > > The data fields can be found in the state JSON blobs:
> > > > /master/state -> frameworks[*].executors[*].data
> > > > /slave/state ->
> > > > frameworks[*].(executors|completed_executors)[*].(tasks|queued_tasks|completed_tasks)[*].data
> > > >
> > > > *Problem:*
> > > > The state endpoints are JSON-ified in a non-standard way (i.e. not
> via
> > > our
> > > > normal Protobuf-to-json methods).  When we serialize the binary
> "data"
> > > > fields, the binary is dumped as a string, as is.  The resulting JSON
> > may
> > > > not be valid if the binary data includes random bytes (i.e. not
> > unicode).
> > > > Most JSON parsers will error on the state endpoints in this case.
> > > >
> > > > *Proposed solution *(and breaking change)*:*
> > > > Simple -- remove the "data" fields from the state endpoints.  (And
> only
> > > > from the state endpoints.  The ExecutorInfo and TaskInfo objects will
> > not
> > > > change.)
> > > >
> > > > *Question:*
> > > > We believe that frameworks/tools do not rely on retrieving the "data"
> > > > fields from the state endpoints.
> > > >
> > > > Is there any framework/tool that retrieves the "data" field from the
> > > state
> > > > endpoints?
> > > > And if so, is it critical to how the framework/tool works?
> > > >
> > > > More details here: https://issues.apache.org/jira/browse/MESOS-3771
> > > >
> > > > Thanks,
> > > > ~Joseph
> > > >
> > >
> >
>


[proposal] Exposing Multiple Isolated Disks to Frameworks

2015-10-29 Thread David Greenberg
Hello Everyone,
At the MesosCon Dublin Hackathon, we started working on MESOS-191. The goal
of this issue is to enable database and persistent disk frameworks to make
use of isolated and high-performance disks, to enable frameworks like HDFS,
Cotton, and Kafka to achieve production-level performance.

First, we captured all of the potential applications of the feature through
user stories.
Then, we demonstrated the change we'll make to the Mesos API (this is only
a protobuf change, no new RPCs are needed).
Finally, we explicitly point out features that we're not going to include
in the v1 implementation, but are worthwhile to be added in later
iterations.

Please look over the document and contribute your feedback.

Thank you!
David, Jie, Joris, and Michael


Re: Framework testing in Mesos

2014-10-12 Thread David Greenberg
For our frameworks, we don't tend to do much automated testing of the Mesos
interface--instead, we construct the framework state, then send it a
message, since our callbacks take the state of the framework + the event
as the argument. This way, we don't need to have mesos running, and we can
trim away large amounts of code necessary to connect to mesos but
unnecessary for the actual feature under test. We've also been
experimenting with simulation testing by mocking out the mesos APIs. These
techniques are mostly effective when you can pretend that the executors
you're using don't communicate much, or when they're trivial to mock.
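
A minimal sketch of this style of test, assuming the framework state is a plain dictionary and events are small dictionaries (`handle_event`, the event shapes, and the state layout are hypothetical, not a Mesos API):

```python
def handle_event(state, event):
    """Pure callback: takes the framework state plus one event, returns new state."""
    tasks = dict(state["tasks"])
    if event["type"] == "status_update":
        tasks[event["task_id"]] = event["status"]
    elif event["type"] == "slave_lost":
        for task_id, slave_id in state["placement"].items():
            if slave_id == event["slave_id"]:
                tasks[task_id] = "TASK_LOST"
    return {**state, "tasks": tasks}


def test_slave_lost_marks_tasks_lost():
    # Construct the framework state directly; no Mesos master or slave is needed.
    state = {
        "tasks": {"t1": "TASK_RUNNING", "t2": "TASK_RUNNING"},
        "placement": {"t1": "s1", "t2": "s2"},
    }
    new_state = handle_event(state, {"type": "slave_lost", "slave_id": "s1"})
    assert new_state["tasks"] == {"t1": "TASK_LOST", "t2": "TASK_RUNNING"}
    assert state["tasks"]["t1"] == "TASK_RUNNING"  # the input state is left untouched
```

Because the callback is a plain function of state and event, the real scheduler callbacks can stay thin wrappers that translate Mesos messages into events.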

On Sun, Oct 12, 2014 at 9:42 AM, Dharmesh Kakadia dhkaka...@gmail.com
wrote:

 Hi,

 I am working on a tiny experimental framework for Mesos. I was wondering
 what the recommended way of writing test cases for framework testing is. I
 looked at several existing frameworks, but it's still not clear to me. I
 understand that I might be able to test executor functionality in isolation
 through normal test cases, but testing the framework as a whole is what I am
 unclear about.

 Suggestions? Is that a non-goal? How do other framework developers go
 about it?

 Also, on a related note, is there a better way to debug frameworks
 than sifting through logs?

 Thanks,
 Dharmesh





Re: Mesos language bindings in the wild

2014-07-11 Thread David Greenberg
I wrote a Clojure binding that uses reflection (
https://github.com/dgrnbrg/clj-mesos) -- I think that most of the dynamic
langs could use something like this to reduce the pain of building a
binding against a specific version. I ended up writing a simple rule-system
to generate the appropriate static/dynamic marshaller for each protobuf,
based on its naming convention and type.
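
The naming-convention rule can be sketched roughly as follows, in Python rather than Clojure. This is not clj-mesos code: it assumes the Java protobuf builder convention (`setFoo` for singular fields, `addAllFoo` for repeated ones), and the helper names and the field description format are made up for illustration.

```python
def camel(field_name):
    """Convert a protobuf field name such as framework_id to FrameworkId."""
    return "".join(part.capitalize() for part in field_name.split("_"))


def builder_method(field_name, repeated):
    """Pick the builder method name from the field's name and cardinality."""
    prefix = "addAll" if repeated else "set"
    return prefix + camel(field_name)


def marshal_plan(fields, values):
    """Return (method_name, value) pairs for the fields present in `values`."""
    return [
        (builder_method(name, repeated), values[name])
        for name, repeated in fields
        if name in values
    ]


# Hypothetical field description: (name, is_repeated)
task_fields = [("task_id", False), ("slave_id", False), ("resources", True)]
plan = marshal_plan(task_fields, {"task_id": "t1", "resources": ["cpus", "mem"]})
```

A reflective binding would then look up each method name on the builder class at runtime instead of being compiled against one specific Mesos version.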


On Fri, Jul 11, 2014 at 8:48 PM, Tim St Clair tstcl...@redhat.com wrote:

 +1, esp re: Go.

 Test harness for language bindings will be pretty important.

 Cheers,
 Tim

 --

 *From: *Niklas Nielsen nik...@mesosphere.io
 *To: *dev dev@mesos.apache.org
 *Cc: *u...@mesos.apache.org
 *Sent: *Thursday, July 10, 2014 5:57:49 PM
 *Subject: *Re: Mesos language bindings in the wild


 I just wanted to clarify - native, meaning _no_ dependency on libmesos and
 native to its language (only Go, only Python and so on), i.e. using the
 low-level API.

 Sorry for the confusion,
 Niklas


 On 10 July 2014 15:55, Dominic Hamon dha...@twopensource.com wrote:

 In my dream world, we wouldn't need any native bindings. I can imagine
 having example frameworks or starter frameworks that use the low-level API
 (the wire protocol with protocol buffers for message passing), but nothing
 like we have that needs C or JNI, etc.




 On Thu, Jul 10, 2014 at 3:26 PM, Niklas Nielsen nik...@mesosphere.io
 wrote:

  Hi all,
 
  I wanted to start a discussion around the language bindings in the wild
  (Go, Haskell, native Python, Java and so on) and possibly get to a
  strategy where we start bringing those into Mesos proper. As most things
  point toward, it will probably make sense to focus on the native
  bindings leveraging the low-level API. To name one candidate to start
  with, we are especially interested in getting Go native support in Mesos
  proper (and in a solid state). So Vladimir, we'd be super thrilled to start
  collaborating with you on your current work.
 
  We are interested to hear what thoughts you all might have on this.
 
  Thanks,
  Niklas
 





 --
 Cheers,
 Timothy St. Clair
 Red Hat Inc.



Re: Trying to get task reconciliation to work

2014-04-18 Thread David Greenberg
Piggybacking onto this thread with a follow up question: what happens if
you ask the master to reconcile some tasks that weren't launched by your
framework? Will you get messages that express those tasks were unknown,
lost, or will nothing respond?

On Thursday, April 17, 2014, Sharma Podila spod...@netflix.com wrote:

 No problem, I have a better understanding now.
 And it was useful to see the three items you listed explicitly.


 On Thu, Apr 17, 2014 at 2:39 PM, Benjamin Mahler 
 benjamin.mah...@gmail.com wrote:

 Good to see you were playing around with reconciliation, we should have
 made the current semantics more clear. Especially in light of the fact that
 it's not implemented fully until one uses a strict registrar (likely
 0.20.0).

 Think of reconciliation as the fallback mechanism to ensure that state is
 consistent, it's not designed to be something to inform you of things you
 were already told (in this case, that the tasks were running). Although we
 could consider sending updates even when task state remains the same.


 For the purpose of this conversation, let's say we're in the 0.20.0 world,
 operating with the registrar. And let's assume your goal is to build a
 highly available framework (I will be documenting how to do this for
 0.20.0):

 (1) *When you receive a status update, you must persist this information
 before returning from the statusUpdate() callback*. Once you return from
 the callback, the driver will acknowledge the slave directly. Slaves will
 retry status update delivery *until* the acknowledgement is received from
 the scheduler driver in order to ensure that the framework processed the
 update.

 (2) *When you receive a slave lost signal, it means that your tasks
 that were running on that slave are in state TASK_LOST*, and any
 reconciliation you perform for these tasks will result in a reply of
 TASK_LOST. Most of the time we'll deliver these TASK_LOST automatically,
 but with a confluence of Master *and* Slave failovers, we are unaware of
 which tasks were running on the slave as we do not persist this information
 in the Master.

 (3) To guarantee that you have a consistent view of task states, *you
 must also periodically reconcile task state against the Master*. This is
 only because the delivery of the slave lost signal in (2) is not reliable
 (the Master could fail over after removing a slave but before telling
 frameworks that the slave was lost).

 You'll notice that this model forces one to serially persist all status
 update changes. We are planning to expose mechanisms to allow batch
 acknowledgement of status updates in the lower-level API that benh has
 given talks about. With a lower-level API, it is possible to build more
 powerful libraries that hide much of these details!

 You'll also perhaps notice that only (1) and (3) are strictly required for
 consistency, but (2) is highly recommended as the vast majority of the time
 the slave lost signal will be delivered and you can take action quickly,
 without having to rely on periodic reconciliation.

 Please let me know if anything here was not clear!


 On Thu, Apr 17, 2014 at 1:47 PM, Sharma Podila spod...@netflix.com wrote:

 Should've looked at the code before sending the previous email...
  master/main.cpp confirmed what I needed to know. It doesn't look like I
 will be able to use reconcileTasks the way I thought I could. Effectively,
 a lack of callback could either mean that the master agrees with the
 requested reconcile task state, or that the task and/or slave is currently
 unknown. Which makes it an unreliable source of data. I understand this is
 expected to improve later by leveraging the registrar, but, I suspect
 there's more to it.

 I take it then that individual frameworks need to have their own
 mechanisms to ascertain the state of their tasks.


 On Thu, Apr 17, 2014 at 12:53 PM, Sharma Podila spod...@netflix.com wrote:

 Hello




Re: Trying to get task reconciliation to work

2014-04-18 Thread David Greenberg
So task reconciliation will always tell me if a task is finished when the
slave is still running, and it will give me TASK_LOST if the slave or task
is unknown to the master? If so, these semantics are very convenient for
frameworks that fail to failover in a timely manner, and then ask for tasks
that belonged to their previous FrameworkID.


On Fri, Apr 18, 2014 at 1:55 PM, Benjamin Mahler
benjamin.mah...@gmail.com wrote:

 Vinod, David is asking about tasks that belong to the framework in that
 they were launched by it, in which case your answer is not correct. We
 don't keep track of tasks so we don't know whether the task belongs to
 the framework in this sense.

 David, you will either receive TASK_LOST or nothing (if the slave for
 the task is in a transient state).

 This is determined more so by the SlaveID than the TaskID as the Master
 does not persistently track tasks.

 (a) If you're asking about an unknown slave, you will get TASK_LOST.
 (b) If you're asking about a known slave and an unknown task, you will get
 TASK_LOST.
 (c) If you're asking about a known slave and a known task with a different
 state, you will be sent the latest state.
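
The cases (a) to (c) can be written down as a small decision function. This is a toy model of the semantics described in this message, not the Master's actual code: `reconcile_reply`, `master_view`, and `transitioning` are names invented for the example, and the "no reply" results (a slave in a transient state, or a task whose state already matches) are modelled as `None`.

```python
def reconcile_reply(master_view, slave_id, task_id, claimed_state,
                    transitioning=frozenset()):
    """Return the state the Master would send back, or None for no reply.

    master_view maps slave_id -> {task_id: latest_state}.
    transitioning is the set of slaves currently in a transient state.
    """
    if slave_id in transitioning:
        return None              # slave in a transient state: no reply
    if slave_id not in master_view:
        return "TASK_LOST"       # (a) unknown slave
    tasks = master_view[slave_id]
    if task_id not in tasks:
        return "TASK_LOST"       # (b) known slave, unknown task
    latest = tasks[task_id]
    if latest != claimed_state:
        return latest            # (c) known task, different state
    return None                  # states already agree: nothing is sent
```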

 If you consider these semantics, you'll realize that you may receive
 TASK_LOST if you try to reconcile your task that finished correctly. This
 is why I mentioned the need to persist updates in (1) above. Let's say you
 receive a terminal update of TASK_FINISHED and then you still try to
 reconcile against a failed over Master. This new Master will reply with
 TASK_LOST because it is unaware of the task/slave. So, you will always
 receive your valid terminal update before getting a TASK_LOST from
 reconciliation.
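
One way a framework might act on this ordering guarantee is to refuse to overwrite a terminal state it has already persisted. The sketch below assumes a dict-like `store` mapping task IDs to their last persisted state; `apply_update` and `TERMINAL_STATES` are hypothetical names for this example only.

```python
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


def apply_update(store, task_id, new_state):
    """Record new_state unless a terminal state is already persisted.

    Returns True when the store was changed, False when the update was
    ignored (for example a late TASK_LOST from reconciliation).
    """
    current = store.get(task_id)
    if current in TERMINAL_STATES:
        return False
    store[task_id] = new_state
    return True
```

With this guard, a TASK_LOST that arrives from a failed-over Master after TASK_FINISHED was persisted leaves the stored state untouched.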


 On Fri, Apr 18, 2014 at 10:46 AM, Vinod Kone vinodk...@gmail.com wrote:

 If a framework asks to reconcile a task that doesn't belong to it there
 would be no response from the master. This is nice because it avoids
 information leak between frameworks.


 On Fri, Apr 18, 2014 at 5:04 AM, David Greenberg dsg123456...@gmail.com
 wrote:

  Piggybacking onto this thread with a follow up question: what happens if
  you ask the master to reconcile some tasks that weren't launched by your
  framework? Will you get messages that express those tasks were unknown,
  lost, or will nothing respond?
 
 
  On Thursday, April 17, 2014, Sharma Podila spod...@netflix.com wrote:
 
  No problem, I have a better understanding now.
  And it was useful to see the three items you listed explicitly.
 
 
  On Thu, Apr 17, 2014 at 2:39 PM, Benjamin Mahler 
  benjamin.mah...@gmail.com wrote:
 
  Good to see you were playing around with reconciliation, we should have
  made the current semantics more clear. Especially in light of the fact that
  it's not implemented fully until one uses a strict registrar (likely
  0.20.0).
 
  Think of reconciliation as the fallback mechanism to ensure that state is
  consistent, it's not designed to be something to inform you of things you
  were already told (in this case, that the tasks were running). Although we
  could consider sending updates even when task state remains the same.
 
 

Integrating leader election with framework design

2014-04-16 Thread David Greenberg
Hello Mesos devs,
I'm trying to integrate leader election into the framework I already wrote.
The framework uses a shared database, and will be fine as long as at most
one copy is running at a given time. I am using Curator to provide the
leader election. What I'm not sure about is how to handle when the curator
goes into a SUSPENDED state--I'd like to stop the driver, and restart it
once the connection RECONNECTs, but I'm not sure if I can do that without
creating a whole new MesosSchedulerDriver. Is this the right way to go
about it? Should I just ignore SUSPENDED? What do other frameworks do?


What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-16 Thread David Greenberg
I don't recall the exact timeout of framework IDs, but what I'm wondering
is what happens if a scheduler tries to failover, but the failover grace
period has elapsed? Does it fail to register, or does it successfully
register and all the old executors are just gone?


Question about LOST status on custom executor

2014-04-07 Thread David Greenberg
I'm working on porting my executor from the CommandExecutor to a custom
executor, in order to take advantage of other features of Mesos. I started
by changing the TaskInfo in the scheduler to define ExecutorInfo instead of
CommandInfo, where the ExecutorInfo's command is the same as the original
CommandInfo. I gave the executor a random ID.

I can see that the executor successfully starts and seems to connect to
Mesos. After a few moments (10s - 100s of ms), the executor fails with the
LOST status.

Am I responsible for explicitly managing the TaskState lifecycle of the
executor? That is, do I need to immediately send the TASK_STARTING status
update, and then send the TASK_RUNNING update once the task has begun? Are
there any heartbeats that I'm responsible for?

Thanks,
David


Re: Question about LOST status on custom executor

2014-04-07 Thread David Greenberg
So, I don't need to notify about STARTING? But I should inform RUNNING,
FINISHED, and FAILED?


On Mon, Apr 7, 2014 at 4:54 PM, Benjamin Mahler
benjamin.mah...@gmail.com wrote:

 Why is your executor failing? When you say failing, is your executor
 crashing or simply exiting after doing the required work?

 You will need to manage the task status lifecycle. If your executor is
 holding non-terminal tasks and it exits, the slave will report these tasks
 as LOST since it does not know whether the tasks were run to completion.
 Your executor will at the very least need to report when things are
 FINISHED or FAILED.

 It's also good practice to report once things are RUNNING to keep your
 scheduler well informed.

 Hope this helps,
 Ben
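
The lifecycle Ben describes could look roughly like the sketch below. It is not written against the real executor driver: `run_task`, `work`, and `send_update` are hypothetical names, with `send_update` standing in for whatever call the executor uses to deliver a status update.

```python
def run_task(task_id, work, send_update):
    """Run work() and report the task's lifecycle through send_update.

    Returns True if the work completed, False if it raised.
    """
    # Good practice: tell the scheduler once the task is actually running.
    send_update(task_id, "TASK_RUNNING")
    try:
        work()
    except Exception:
        # Always report a terminal state before the executor exits, so the
        # slave does not have to report the task as LOST.
        send_update(task_id, "TASK_FAILED")
        return False
    send_update(task_id, "TASK_FINISHED")
    return True
```

The point of the design is that the executor never exits while it still holds a task in a non-terminal state.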


 On Mon, Apr 7, 2014 at 11:35 AM, David Greenberg dsg123456...@gmail.com
 wrote:

  I'm working on porting my executor from the CommandExecutor to a custom
  executor, in order to take advantage of other features of Mesos. I started
  by changing the TaskInfo in the scheduler to define ExecutorInfo instead of
  CommandInfo, where the ExecutorInfo's command is the same as the original
  CommandInfo. I gave the executor a random ID.
 
  I can see that the executor successfully starts and seems to connect to
  Mesos. After a few moments (10s - 100s of ms), the executor fails with the
  LOST status.
 
  Am I responsible for explicitly managing the TaskState lifecycle of the
  executor? That is, do I need to immediately send the TASK_STARTING status
  update, and then send the TASK_RUNNING update once the task has begun? Are
  there any heartbeats that I'm responsible for?
 
  Thanks,
  David
 



What happens when I call reconcileTasks and database divergence

2014-03-07 Thread David Greenberg
I am trying to figure out how to use reconcileTasks to ensure that my DB of
tasks is synchronized with Mesos's tasks. Right now, I first commit the
fact that I ran a task to the DB, then I launchTasks. My concern is that
when I use reconcileTasks to ensure the DB state matches the Mesos state,
the launchTasks could've failed, and I'm not sure how the application can
discover that the task it thought it submitted was never submitted.

How do other frameworks deal with synchronizing their state with the Mesos
state?
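
The commit-then-launch ordering described above can be sketched as follows. This is only an illustration of the pattern in the question, not an answer from the Mesos developers: `launch_with_intent`, `tasks_to_reconcile`, and the dict-based `db` are hypothetical, and `launch` stands in for the real launch call.

```python
def launch_with_intent(db, task_id, slave_id, launch):
    """Record the intent to run a task, then attempt the launch."""
    # Commit first: if the process dies or the launch fails after this
    # line, the DB still holds a record that can be reconciled later.
    db[task_id] = {"slave": slave_id, "state": "TASK_STAGING"}
    launch(task_id, slave_id)


def tasks_to_reconcile(db):
    """Return IDs of tasks that never progressed past the recorded intent."""
    return sorted(task_id for task_id, record in db.items()
                  if record["state"] == "TASK_STAGING")
```

Tasks returned by `tasks_to_reconcile()` are the ones whose launch may never have reached Mesos, so they are the natural candidates to pass to reconcileTasks.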


[jira] [Commented] (MESOS-426) Python-based frameworks use old API and are broken

2013-08-08 Thread David Greenberg (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734242#comment-13734242 ]

David Greenberg commented on MESOS-426:
---

I've posted a review for this here: https://reviews.apache.org/r/13367/

Hopefully I've made it so that you can review it.

 Python-based frameworks use old API and are broken
 --

                 Key: MESOS-426
                 URL: https://issues.apache.org/jira/browse/MESOS-426
             Project: Mesos
          Issue Type: Bug
          Components: framework, python-api
    Affects Versions: 0.9.0
            Reporter: David Greenberg
            Assignee: David Greenberg
         Attachments: mesos_changes.p1


 If you try to use mesos-submit or torque with Mesos 0.9.0+, you get 
 exceptions due to API mismatches in these frameworks' expectations of the 
 Python API.
 Steps to reproduce: try running mesos-submit mymaster echo hi, note the 
 stacktraces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira