Re: Prepping for next release

2015-09-02 Thread Kevin Sweeney
I'd be in favor of setting that flag to Java 7 as well - just because
classes are compiled in Java 6 format doesn't mean the standard library
classes they reference will be available on Java 6 - your compiler
classpath contains Java 7's rt.jar, which contains classes that don't exist
in Java 6's rt.jar.
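
To make the distinction in this thread concrete, here is a minimal sketch of what
the `file` check quoted below looks at: the major version stored in the class file
header (50 for Java 6, 51 for Java 7). The class and method names are made up for
illustration. It inspects only the bytecode format, so it cannot tell whether the
class calls library methods that exist only in Java 7's rt.jar.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative helper (hypothetical name): reads the class file format version. */
public final class ClassFileVersion {
    private ClassFileVersion() {}

    /** Returns the major version from the first 8 bytes of a class file. */
    public static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // Every class file starts with the magic number 0xCAFEBABE.
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file: bad magic number");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version
    }

    /** Maps a few well-known major versions to the Java release that introduced them. */
    public static String describe(int major) {
        switch (major) {
            case 50: return "Java 6";
            case 51: return "Java 7";
            case 52: return "Java 8";
            default: return "major version " + major;
        }
    }

    public static void main(String[] args) throws IOException {
        // Header bytes of a class compiled with -target 1.6: magic, minor 0, major 50.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50};
        System.out.println(describe(majorVersion(new ByteArrayInputStream(header))));
    }
}
```

A class can report major version 50 here and still fail on a Java 6 runtime with
NoSuchMethodError or NoClassDefFoundError if it was compiled against Java 7's
rt.jar, which is the concern raised above.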

On Tue, Sep 1, 2015 at 5:08 PM, Vinod Kone <vinodk...@apache.org> wrote:

> Actually looking at the RC1 jar more closely, it looks like the classes
> are built for 1.6 (our pom file
> <https://github.com/apache/mesos/blob/master/src/java/mesos.pom.in#L117>
> actually sets this via the maven compiler plugin).
>
> $ file ~/Downloads/Executor.class
>
> /Users/vinod/Downloads/Executor.class: compiled Java class data, version
> 50.0 (Java 1.6)
>
> The confusing part (for me) is that the jar's manifest says "Build-Jdk:
> 1.7.0_60" but AFAICT that just means JDK7 was used to build the JAR. It
> has nothing to do with the version of the generated byte code.
>
> So, I think we are OK here.
>
>
> On Tue, Sep 1, 2015 at 5:03 PM, Kevin Sweeney <kevi...@apache.org> wrote:
>
>> I'm generally in favor of dropping support for JDK6 as it's been
>> end-of-life for years.
>>
>> On Tue, Sep 1, 2015 at 4:46 PM, Vinod Kone <vinodk...@apache.org> wrote:
>>
>>> +user
>>>
>>> So looks like this issue is related to JDK6 and not my maven password
>>> settings.
>>>
>>> Related ASF ticket: https://issues.apache.org/jira/browse/BUILDS-85
>>>
>>> The reason it worked for me, when I tagged RC1, was because I also
>>> pointed my maven to use JDK7.
>>>
>>> So we have a couple of options here:
>>>
>>> #1) (Easy) Do the same thing with RC2 as we did for RC1. This does mean the
>>> artifacts we upload to nexus will be compiled with JDK7. IIUC, if any JVM
>>> based frameworks are still on JDK6 they can't link in the new artifacts?
>>>
>>> #2) (Harder) As mentioned in the ticket, have maven compile the Mesos jar
>>> with JDK6 but use JDK7 when uploading. Not sure how easy it is to adapt our
>>> Mesos build tool chain for this. Does anyone have expertise in this area?
>>>
>>> Thoughts?
>>>
>>>
>>> On Tue, Aug 18, 2015 at 3:14 PM, Vinod Kone <vinodk...@apache.org>
>>> wrote:
>>>
>>>> I re-encrypted the maven passwords and that seemed to have done the
>>>> trick. Thanks Adam!
>>>>
>>>> On Tue, Aug 18, 2015 at 1:59 PM, Adam Bordelon <a...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Update your ~/.m2/settings.xml?
>>>>> Also check that the output of `gpg --list-keys` and `--list-sigs`
>>>>> matches
>>>>> the keypair you expect
>>>>>
>>>>> On Tue, Aug 18, 2015 at 1:48 PM, Vinod Kone <vinodk...@apache.org>
>>>>> wrote:
>>>>>
>>>>> > I definitely had to create a new gpg key because my previous one
>>>>> > expired! I uploaded it to id.apache and our SVN repo containing KEYS.
>>>>> >
>>>>> > Do I need to do anything specific for maven?
>>>>> >
>>>>> > On Tue, Aug 18, 2015 at 1:25 PM, Adam Bordelon <a...@mesosphere.io>
>>>>> > wrote:
>>>>> >
>>>>> > > Haven't seen that one. Are you sure you've got your gpg key
>>>>> > > properly set up with Maven?
>>>>> > >
>>>>> > > On Tue, Aug 18, 2015 at 1:13 PM, Vinod Kone <vinodk...@apache.org>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > I'm getting the following error when running ./support/tag.sh.
>>>>> > > > Have any of the recent release managers seen this one before?
>>>>> > > >
>>>>> > > > [ERROR] Failed to execute goal
>>>>> > > > org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy
>>>>> > > > (default-deploy) on project mesos: Failed to deploy artifacts:
>>>>> > > > Could not transfer artifact org.apache.mesos:mesos:jar:0.24.0-rc1
>>>>> > > > from/to apache.releases.https (
>>>>> > > > https://repository.apache.org/service/local/staging/deploy/maven2):
>>>>> > >

Re: Prepping for next release

2015-09-01 Thread Kevin Sweeney
I'm generally in favor of dropping support for JDK6 as it's been
end-of-life for years.

On Tue, Sep 1, 2015 at 4:46 PM, Vinod Kone  wrote:

> +user
>
> So looks like this issue is related to JDK6 and not my maven password
> settings.
>
> Related ASF ticket: https://issues.apache.org/jira/browse/BUILDS-85
>
> The reason it worked for me, when I tagged RC1, was because I also pointed
> my maven to use JDK7.
>
> So we have a couple of options here:
>
> #1) (Easy) Do the same thing with RC2 as we did for RC1. This does mean the
> artifacts we upload to nexus will be compiled with JDK7. IIUC, if any JVM
> based frameworks are still on JDK6 they can't link in the new artifacts?
>
> #2) (Harder) As mentioned in the ticket, have maven compile the Mesos jar with
> JDK6 but use JDK7 when uploading. Not sure how easy it is to adapt our
> Mesos build tool chain for this. Does anyone have expertise in this area?
>
> Thoughts?
>
>
> On Tue, Aug 18, 2015 at 3:14 PM, Vinod Kone  wrote:
>
>> I re-encrypted the maven passwords and that seemed to have done the
>> trick. Thanks Adam!
>>
>> On Tue, Aug 18, 2015 at 1:59 PM, Adam Bordelon 
>> wrote:
>>
>>> Update your ~/.m2/settings.xml?
>>> Also check that the output of `gpg --list-keys` and `--list-sigs` matches
>>> the keypair you expect
>>>
>>> On Tue, Aug 18, 2015 at 1:48 PM, Vinod Kone 
>>> wrote:
>>>
>>> > I definitely had to create a new gpg key because my previous one
>>> > expired! I uploaded it to id.apache and our SVN repo containing KEYS.
>>> >
>>> > Do I need to do anything specific for maven?
>>> >
>>> > On Tue, Aug 18, 2015 at 1:25 PM, Adam Bordelon wrote:
>>> >
>>> > > Haven't seen that one. Are you sure you've got your gpg key properly
>>> > > set up with Maven?
>>> > >
>>> > > On Tue, Aug 18, 2015 at 1:13 PM, Vinod Kone wrote:
>>> > >
>>> > > > I'm getting the following error when running ./support/tag.sh.
>>> > > > Have any of the recent release managers seen this one before?
>>> > > >
>>> > > > [ERROR] Failed to execute goal
>>> > > > org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy
>>> > > > (default-deploy) on project mesos: Failed to deploy artifacts:
>>> > > > Could not transfer artifact org.apache.mesos:mesos:jar:0.24.0-rc1
>>> > > > from/to apache.releases.https (
>>> > > > https://repository.apache.org/service/local/staging/deploy/maven2):
>>> > > > java.lang.RuntimeException: Could not generate DH keypair: Prime
>>> > > > size must be multiple of 64, and can only range from 512 to 1024
>>> > > > (inclusive) -> [Help 1]
>>> > > >
>>> > > > On Mon, Aug 17, 2015 at 11:23 AM, Vinod Kone wrote:
>>> > > >
>>> > > > > Update:
>>> > > > >
>>> > > > > There are 3 outstanding tickets (all related to flaky tests) that
>>> > > > > we are trying to resolve. Any help fixing those (esp. MESOS-3050)
>>> > > > > would be appreciated!
>>> > > > >
>>> > > > > Planning to cut an RC as soon as they are fixed (assuming no new
>>> > > > > ones crop up).
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > On Fri, Aug 14, 2015 at 7:50 AM, James DeFelice
>>> > > > > <james.defel...@gmail.com> wrote:
>>> > > > >
>>> > > > >> Awesome - thanks so much!
>>> > > > >>
>>> > > > >> On Fri, Aug 14, 2015 at 9:37 AM, Bernd Mathiske
>>> > > > >> <be...@mesosphere.io> wrote:
>>> > > > >>
>>> > > > >> > I just committed it. Thanks, James!
>>> > > > >> >
>>> > > > >> > > On Aug 13, 2015, at 9:53 PM, James DeFelice
>>> > > > >> > > <james.defel...@gmail.com> wrote:
>>> > > > >> > >
>>> > > > >> > > Hi Vinod,
>>> > > > >> > >
>>> > > > >> > > Would *really* like to see
>>> > > > >> > > https://issues.apache.org/jira/browse/MESOS-2841
>>> > > > >> > > in 0.24.0. Currently in review.
>>> > > > >> > >
>>> > > > >> > > Any chance that can make it in?
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > > On Wed, Aug 12, 2015 at 1:16 PM, Vinod Kone
>>> > > > >> > > <vinodk...@apache.org> wrote:
>>> > > > >> > >
>>> > > > >> > >> Removed the target versions for all unresolved tickets (except
>>> > > > >> > >> for HTTP scheduler API ones) targeted for 0.24.0
>>> > > > >> > >> 
>>> > > > >> > >>
>>> > > > >> > >> Hoping to cut an RC tomorrow.
>>> > > > >> > >>
>>> > > > >> > >> On Wed, Aug 5, 2015 at 11:31 AM, Vinod Kone
>>> > > > >> > >> <vinodk...@gmail.com> wrote:
>>> > > > >> > >>
>>> > > > >> > >>> Hi,
>>> > > > >> > >>>
>>> > > > >> > >>> The tracking ticket for the 0.24.0 release is
>>> > > > >> > >>> https://issues.apache.org/jira/browse/MESOS-2562
>>> > > > >> > >>>
>>> > > > >> > >>> The main feature of this release is going to be v1 (beta)

Re: Custom python executor with Docker

2015-08-11 Thread Kevin Sweeney
Apache Aurora [1] uses a custom Python executor and supports Docker via the
containerizer. There's just one problem - the container has to have a
Python2.7 runtime inside that can run the executor PEX file [2]. So if
you're okay with that restriction you're in business (and you can use the
Aurora configuration DSL to describe setup/teardown steps).

[1] https://aurora.apache.org
[2] https://pex.readthedocs.org/en/latest/

On Tue, Aug 11, 2015 at 4:42 PM, Tim Chen t...@mesosphere.io wrote:

 So currently there is a review out for pre-hooks (
 https://reviews.apache.org/r/36185/) before a docker container launches.

 We can also add a post hook, but we'd like to see if the specified hook
 satisfies what you guys are looking for.

 Tim

 On Tue, Aug 11, 2015 at 4:28 PM, Tom Fordon tom.for...@gmail.com wrote:

 We ended up implementing a solution where we did the pre/post steps as
 separate mesos tasks and added logic to our scheduler to ensure they were
 run on the same machine.  If anybody knows of a standard / openly available
 DockerExecutor like what is described below, my team would be greatly
 interested.


 On Fri, Aug 7, 2015 at 4:01 AM, Kapil Malik kma...@adobe.com wrote:

 Hi,



 We have a similar usecase while running multi-user workloads on mesos.
 Users provide docker images encapsulating application logic, which we (we =
 say some “Central API”) schedule on Chronos / Marathon. However, we need to
 run some standard pre / post steps for every docker submitted by users. We
 have the following options –



 1.   Ask every user to embed their logic inside a pre-defined
 docker template which will perform pre/post steps.

 -> This is error prone, makes us dependent on whether the users followed
 the template, and is not very popular with users either.



 2.   Extend every user docker (FROM ) and find a way to add
 pre-post steps in our docker. Refer to this docker when scheduling on chronos
 / marathon.

 -> Building new dockers does not scale as users and applications grow



 3.   Write a custom executor which will perform the pre-post steps
 and manage the user docker lifetime.

 -> Deals with user docker lifetime and is obviously complex.



 Is there a standard / openly available DockerExecutor which manages the
 docker lifetime and which I can extend to build my custom executor? This
 way I will be concerned only with my custom logic (pre/post steps) and
 still get benefits of a standard way to manage docker containers.



 Btw, thanks for the meaningful discussion below, it is very helpful.



 Thanks and regards,



 Kapil Malik | kma...@adobe.com | 33430 / 8800836581



 *From:* James DeFelice [mailto:james.defel...@gmail.com]
 *Sent:* 09 April 2015 18:12
 *To:* user@mesos.apache.org
 *Subject:* Re: Custom python executor with Docker



 If you can run the pre/post steps in a container then I'd recommend
 building a Docker image that includes your pre/post step scripting + your
 algorithm and launching it using the built-in mesos Docker containerizer.
 It's much simpler than managing the lifetime of the Docker container
 yourself.



 On Thu, Apr 9, 2015 at 8:29 AM, Tom Fordon tom.for...@gmail.com wrote:

 Thanks for all the responses, I really appreciate the help.  Let me try
 to state my problem more clearly



 Our project is performing file-based data processing.  I would like to
 keep the actual algorithm as contained as possible since we are in an R&D
 setting and will be getting untested code.  We have some pre/post steps
 that need to be run on the same box as the actual algorithm:
 downloading/uploading files and database calls.



 We can run the pre/post steps and algorithm within the same container.
 The algorithm will be a little less contained, but it will work.



 Docker letting you specify a cgroup parent is really exciting.  If I
 invoke a docker container with the executor as the cgroup-parent are there
 any other steps I need to perform?  Would I need to do anything special to
 make mesos aware of the resource usage, or is that handled since the docker
 process would be in the executor's cgroup?



 Thanks again,

 Tom



 On Tue, Apr 7, 2015 at 8:10 PM, Timothy Chen tnac...@gmail.com wrote:

 Hi Tom(s),

 Tom Arnfeld is right, if you want to launch your own docker container
 in your custom executor you will have to handle all the issues
 yourself and will not be able to use the Docker containerizer at all.

 Alternatively, you can actually launch your custom executor in a
 Docker container by Mesos, by specifying the ContainerInfo in the
 ExecutorInfo.
 What this means is that your custom executor is already running in a
 docker container, and you can do your custom logic afterwards. This
 does mean you can't simply launch multiple containers in the
 executor anymore.

 If there is something you want to do that doesn't fit these, let us know
 what you're trying to achieve and we can see what we can do.

 Tim

 On Tue, Apr 7, 2015 at 4:15 PM, Tom Arnfeld t...@duedil.com wrote:

  It's 

Re: mesosphere.io broken?

2015-06-17 Thread Kevin Sweeney
Are you sure that's the canonical domain? https://downloads.mesosphere.com
appears to present a certificate for *.mesosphere.io.

On Wed, Jun 17, 2015 at 2:05 PM, Cody Maloney c...@mesosphere.io wrote:

 Thanks for posting this. mesosphere.io should be back up now, and
 mesosphere.io/downloads is now working.

 I would note that the mesosphere website and downloads now live at the domain
 'mesosphere.com'. Normally mesosphere.io redirects, but that bit of
 infrastructure unfortunately broke.

 Cody

 On Wed, Jun 17, 2015 at 7:46 AM Marco Massenzio ma...@mesosphere.io
 wrote:

 Just to add some color to the Elastic Mesos thing, we're working with
 Google to enable deploying a complete DCOS cluster on GCP using their brand
 new Deployment Manager (v2) via the Click-to-Deploy framework.

 We have these working on an experimental basis: we need to conduct a
 bit more testing and work on a couple of rough edges before we can
 release them as a beta for people to have a good user experience.

 I must say it's pretty exciting to click a button and see shortly
 afterwards a full Mesos Cluster come to life on Google Cloud, so I'm really
 itching to get the templates in a state where they can be used by other
 folks!



 *Marco Massenzio*
 *Distributed Systems Engineer*

 On Wed, Jun 17, 2015 at 4:30 AM, Alex Rukletsov a...@mesosphere.com
 wrote:

 For downloads, use https://mesosphere.com/downloads/
 Elastic Mesos has been decommissioned, use
 https://google.mesosphere.com/ or https://digitalocean.mesosphere.com/
 but keep in mind they will be decommissioned soon (~1 month) as well.
 However, if you want to try DCOS installation on AWS, check
 https://mesosphere.com/product/

 On Wed, Jun 17, 2015 at 12:51 PM, Brian Candler b.cand...@pobox.com
 wrote:

 Looking for Mesos .deb packages, on Google I find links to
 http://mesosphere.io/downloads/
 http://elastic.mesosphere.io/
 but these are giving 503 Service Unavailable errors.

 Is there a problem, or have these sites gone / migrated away?






Re: Mesos Security Recommendations

2015-06-04 Thread Kevin Sweeney
Jeff, have you successfully run stunnel with a Mesos cluster? I'd
anticipate it to be a bit difficult due to the way that slaves dynamically
discover masters via zookeeper. If I remember correctly, with stunnel you
need to configure all the tunnels beforehand, which would mean that every
master would need to enumerate every possible slave beforehand, and
vice-versa.

IMO that fairly severely limits the reliability of the system.

By the way, is there a design doc for how TLS between slave and master is
going to be implemented in 0.23.0?

On Thu, Jun 4, 2015 at 4:30 PM, Jeff Schroeder jeffschroe...@computer.org
wrote:

 For securing insecure network communication you can use something like
 stunnel, then point the app at the local stunnel. It would mean jumping
 through a fair number of hoops to configure it all with your config
 management system, but it is totally doable.


 On Thursday, June 4, 2015, John Webb webbj1...@hotmail.com wrote:

 All,

 I'm looking for some recommendations on how to encrypt Mesos Slave &
 Framework communication to the Mesos Master until Mesos v0.23 is released
 which will include SSL support. I'm concerned about having the slave &
 framework user/password being sent across our network in clear text.

 I would especially like to hear from people who are actually running Mesos in
 a production environment.

 Thanks,
 John Webb



 --
 Text by Jeff, typos by iPhone



Re: Upcoming change to the Scheduler API

2015-02-13 Thread Kevin Sweeney
Regarding the backwards-compatibility concern, would it make sense to add a
TaskStatusID field to the existing TaskStatus message instead of changing
the Scheduler signature?

On Friday, February 13, 2015, Benjamin Mahler benjamin.mah...@gmail.com
wrote:

 Hi all,

 As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a
 scalability concern with the reconciliation API. Performing an implicit
 reconciliation results in a status update being sent for each task in the
 cluster. For large clusters in the tens of thousands of slaves, this can
 begin to approach hundreds of thousands of status updates.

 With the current design of the driver, status updates must be persisted
 before the scheduler returns from the 'statusUpdate' callback, as the
 driver sends an acknowledgement implicitly once the call completes. This
 design forces the scheduler to synchronously process individual status
 updates.

 To remedy the issue, we're looking to introduce the ability to optionally
 specify whether the implicit acknowledgements are provided (during
 construction of the scheduler driver). If disabled, then the scheduler must
 send acknowledgments through a new 'acknowledgeStatusUpdate' call on the
 driver. Having explicit acknowledgements allows schedulers to process them
 asynchronously outside of the driver thread, and allows them to process
 updates in batch (e.g. 1:N storage operation:status updates).

 As part of the change, the underlying UUID of the status update needs to be
 exposed to the scheduler, which requires an update to the signature of
 'statusUpdate'. What this means is that when schedulers include the new
 headers/JAR/egg, they need to adjust their code to accept the new uuid
 argument, regardless of whether implicit acknowledgements are desired (to
 my knowledge, there is no way to expose the uuid without requiring
 schedulers to update their code, because of Java's interface semantics).

 I'd like to get this change landed for 0.22.0 to make reconciliation usable
 for large clusters. The patches are up on MESOS-2347. I've outlined the
 compatibility details and upgrade steps in
 https://reviews.apache.org/r/30978/

 Please share any high level feedback or concerns!

 Ben



-- 
Sent from Gmail Mobile
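
The explicit acknowledgement flow Ben describes (enqueue updates on the driver
thread, then persist and acknowledge them in batches elsewhere) could be sketched
roughly as below. This is only an illustration under stated assumptions: Update,
Driver and Storage are hypothetical stand-ins defined in the sketch, not the real
Mesos types, and only the method name acknowledgeStatusUpdate is taken from the
proposal.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of explicit, batched status update acknowledgements.
 * Update, Driver and Storage are hypothetical stand-ins, not Mesos classes.
 */
public final class BatchedAcknowledger {
    /** Hypothetical status update: a task id plus the update's UUID. */
    public static final class Update {
        public final String taskId;
        public final String uuid;

        public Update(String taskId, String uuid) {
            this.taskId = taskId;
            this.uuid = uuid;
        }
    }

    /** Hypothetical stand-in for the driver's explicit acknowledgement call. */
    public interface Driver {
        void acknowledgeStatusUpdate(Update update);
    }

    /** Hypothetical durable store that can persist many updates in one write. */
    public interface Storage {
        void persistAll(List<Update> batch);
    }

    private final Driver driver;
    private final Storage storage;
    private final List<Update> pending = new ArrayList<Update>();

    public BatchedAcknowledger(Driver driver, Storage storage) {
        this.driver = driver;
        this.storage = storage;
    }

    /** Called on the driver thread: only enqueue, so the callback returns quickly. */
    public synchronized void statusUpdate(Update update) {
        pending.add(update);
    }

    /** Called off the driver thread: one storage write, then one ack per update. */
    public int flush() {
        List<Update> batch;
        synchronized (this) {
            batch = new ArrayList<Update>(pending);
            pending.clear();
        }
        if (batch.isEmpty()) {
            return 0;
        }
        storage.persistAll(batch); // persist first, so nothing is acked before it is durable
        for (Update update : batch) {
            driver.acknowledgeStatusUpdate(update);
        }
        return batch.size();
    }
}
```

Persisting before acknowledging keeps the guarantee that the implicit mode gives
(an update is never acknowledged before it is stored) while turning N storage
writes into one.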


Re: Mesos language bindings in the wild

2014-07-18 Thread Kevin Sweeney
Piggybacking on this thread, Bill Farner and I (from Aurora) have been
working on pure JVM bindings for Mesos.

The code is here [1]. It's currently in a pretty rough state but we have a
demo scheduler that connects, authenticates, registers and launches Hello
World tasks.

We'd love to get feedback, pull requests, etc. Once it's in a reasonable
state we hope to use it in Aurora (and hope to maintain it as a separate
library the whole community can use).

[1] https://github.com/kevints/mesos-framework-api


On Thu, Jul 17, 2014 at 12:11 PM, Tim St Clair tstcl...@redhat.com wrote:



 --

 *From: *Niklas Nielsen nik...@mesosphere.io
 *To: *user@mesos.apache.org
 *Sent: *Thursday, July 17, 2014 1:53:49 PM

 *Subject: *Re: Mesos language bindings in the wild

 -1 for git submodules. I am really not keen on those; worked with them
 while working on Chromium and it was, to be frank, a mess to handle, update
 and maintain.

 Yeah... it can become unwieldy.


 I am rooting for separate repos. Maybe worth a non-binding vote?

 +1 (provided shared testing).



 Niklas


 On 17 July 2014 11:45, Tim St Clair tstcl...@redhat.com wrote:

 Inline -

 --

 *From: *Vladimir Vivien vladimir.viv...@gmail.com
 *To: *user@mesos.apache.org
 *Sent: *Tuesday, July 15, 2014 1:34:37 PM

 *Subject: *Re: Mesos language bindings in the wild

 Hi all,
  Apologies for being super late to this thread.  To answer Niklas' point
 at the start of the thread: Yes, I am thrilled to contribute in any way I
 can.  The project is moving forward and making progress (slower than I
 want, but progress regardless).

 Going Native
 Implementing a native client for Mesos is an arduous process right now
 since there's little doc to guide developers.  Once I went through C++ code
 and a few emails, it became easy (even easier than I thought).  If the push
 is for more native client, at some point we will need basic internals to be
 documented.

 Mesos-Certified
 Maybe a Mesos test suite can be used to certify native clients.  There
 are tons of unit tests in the code that already validate the source code.
  Maybe some of that test logic can be pulled out / copied into a small
 stand-alone mesos test server that clients can communicate with to run a
 test suite (just an idea).  This along with some documentation would help
 with quality of native clients.


 +1.


 In or Out of Core
 Having native clients source hosted in core would be great since all code
 would be in one location. Go code can certainly co-exist as a subproject in
 Mesos.  Go's build workflow can be driven by Make. Go's dependency
 management can work with repo subdirectories (at least according to 'go
 help importpath', I haven't tested that myself).  But, as Tom pointed out,
 the thing that raises a flag for me is project velocity.  If an author wants
 to move faster or slower than Mesos release cycles, there's no way to do so
 once the code is part of core.

 Anyway, I have gone on long enough.   Looking forward to feedback.


 I usually don't tread here, but perhaps a git-submodule works in this
 narrow case.
 Thoughts?



 On Tue, Jul 15, 2014 at 10:07 AM, Tim St Clair tstcl...@redhat.com
 wrote:

 Tom -

 I understand the desire to create bindings outside the core.  The point
 I was trying to make earlier around version semantics and testing was to
 'Hedge' the risk.  It basically creates a contract between core &
 framework+bindings writers.

 No one ever intends to break compatibility, but it happens all the time
 and usually in some very subtle ways at first.  A great example of this is
 a patch I recently submitted to Mesos where the cgroup code was writing an
 extra endln out.  Earlier versions of the kernel had no issue with this,
 but recent modifications would cause the cgroup code to fail.  Very subtle,
 and boom-goes-the-dynamite.

 Below was an email I sent a while back, that outlines a possible
 hedge/contract.  Please let me know what you think.

 --
 
  Greetings!
 
  I've conversed with folks about the idea of having a more formalized
  release and branching strategy, such that others who are downstream can
  rely on certain version semantics when planning upgrades, etc.  This
  becomes doubly important as we start to trend towards a 1.0 release, and
  folks will depend heavily on it for their core infrastructure, and APIs
  (Frameworks, and EC).
 
  Therefore, I wanted to propose a more formalized branching and release
  strategy, and see what others think.  I slightly modified this pattern
  from the Condor & Kernel projects, which have well established processes.
 
  --
  Basic Idea:
 
  1.) Create 2 Main Branches (Stable/Devel-Master based)
  2.) Devel releases are cadence/time based and lightly tested.
  3.) Stable series only accepts bug fixes.  Merge path for all bug fixes
  deemed worthy, are through the stable series up to master.
  4.) @ some point devel 

Re: Framework unregistered

2014-06-27 Thread Kevin Sweeney
It would be good to call out explicitly in the SchedulerDriver docs that
stop() is a cluster-wide framework shutdown, possibly renaming that method
to something like killAllTasksAndUnregisterFramework to note this. The
current JavaDoc at
http://mesos.apache.org/api/latest/java/org/apache/mesos/SchedulerDriver.html#stop()
doesn't point out the danger and as a framework developer, I'd assume that
if I'm required to call start() (even though I failed over) I should call
stop() as well (as opposed to stop(true)).
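
As an illustration of the renaming idea, a framework could wrap the driver so
that the two behaviours have distinct, explicit names. This is a sketch, not
anything in Mesos itself: the Driver interface is a hypothetical stand-in for
the single stop(boolean failover) method, the wrapper and its method names are
invented here, and it assumes that stop() behaves like stop(false).

```java
/** Sketch of a wrapper that gives the two stop behaviours explicit names. */
public final class ExplicitStop {
    /** Hypothetical stand-in for the one SchedulerDriver method used here. */
    public interface Driver {
        void stop(boolean failover);
    }

    private final Driver driver;

    public ExplicitStop(Driver driver) {
        this.driver = driver;
    }

    /** Stops this driver but leaves the framework registered so another scheduler can take over. */
    public void stopForFailover() {
        driver.stop(true);
    }

    /** Unregisters the framework; assumed to shut down all of its tasks and executors. */
    public void unregisterFrameworkAndKillTasks() {
        driver.stop(false);
    }

    public static void main(String[] args) {
        Driver logging = new Driver() {
            @Override
            public void stop(boolean failover) {
                System.out.println("stop(failover=" + failover + ")");
            }
        };
        // A scheduler being redeployed would normally want this variant.
        new ExplicitStop(logging).stopForFailover();
    }
}
```

With a wrapper like this, the destructive call has to be spelled out at the
call site, which is the property the suggested rename is after.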




On Fri, Jun 27, 2014 at 12:21 PM, Vinod Kone vinodk...@gmail.com wrote:

 Perhaps we should call this out explicitly when we back port and do bug
 fix releases (0.18.0 and 0.19.0) and urge people to upgrade lest this gets
 drowned out in the noise.


 On Fri, Jun 27, 2014 at 11:40 AM, Benjamin Hindman 
 benjamin.hind...@gmail.com wrote:

 Thanks for the bug report Whitney, this looks like a long standing bug
 that apparently is rarely exercised. Here is the JIRA ticket to follow for
 the fix: https://issues.apache.org/jira/browse/MESOS-1550

 For posterity, when the MesosSchedulerDriver instance gets cleaned up by
 the JVM garbage collector we also delete any underlying C++ objects that we
 created but not before we call 'MesosSchedulerDriver.stop'. -- Bug! We
 should never call stop, as that's what sends the 'unregister' request to
 the master.

 Short term fix: don't bother nulling out your instance of the
 MesosSchedulerDriver so that the garbage collector doesn't clean it up.
 (This is likely the common pattern and thus why this bug has lasted as long
 as it has.)


 On Fri, Jun 27, 2014 at 6:40 AM, Whitney Sorenson wsoren...@hubspot.com
 wrote:

 We've been running our Java framework for  6 mos. now and today, for
 what I can tell is the first time, mesos shut down our framework:

 I0627 09:07:05.740335  4753 master.cpp:1034] Asked to unregister
 framework sy3x2
 I0627 09:07:05.740466  4753 master.cpp:2688] Removing framework sy3x2

 All executors running our framework promptly shut down all tasks.

 This happened during a deployment of our framework, in which the
 existing framework shuts down, generally with a driver.abort() call
 followed by the process exiting, which normally (and today) results in the
 log entries:

 I0627 09:07:04.926462  4755 master.cpp:1079] Deactivating framework sy3x2
 I0627 09:07:04.926609  4755 hierarchical_allocator_process.hpp:408]
 Deactivated framework sy3x2

 To complete the deployment, a new framework process starts and shortly
 calls driver.start(). We pass a very large framework timeout parameter in
 order to ensure this never happens:

 I0627 09:51:49.545934  4751 master.cpp:617] Giving framework sy3x2
 1.65343915343915weeks to failover

 I have 2 questions:

 - How/why did the framework unregister? There are 0 calls to
 driver.stop() (after looking at SchedulerDriver again, I'm assuming this
 would accomplish the above) in our codebase (
 https://github.com/HubSpot/Singularity)

 - As a user, I don't think I'm even interested in this functionality
 being in Mesos. I've always figured setting a high framework timeout meant
 I was paying a cost that if I ever wanted to really shutdown my framework,
 I'd either have to wait 1.6 weeks, do some manual zookeeper manipulation,
 or simply start a new Mesos cluster - all of which are acceptable tradeoffs
 to me to avoid the possibility that Mesos shuts down the world. Assuming
 some frameworks still need this unregister functionality and at the same
 time - high framework timeouts - can we add a switch such that the
 framework can say whether or not it can be unregistered before framework
 timeout occurs?

 We are running 0.18.0.

 Thanks!

 -Whitney