Re: Auto scaling spark driver on Mesos!

2017-02-27 Thread Michael Gummelt
I assume you've looked into dynamic allocation.  What do you need that
isn't provided by dynamic allocation?

On Mon, Feb 27, 2017 at 4:11 AM, David J. Palaitis <
david.j.palai...@gmail.com> wrote:

> by using a combination of Spark's dynamic allocation,
> http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
> and a framework scheduler like Cook,
> https://github.com/twosigma/Cook/tree/master/spark, you can achieve the
> desired auto-scaling effect without the overhead of managing
> roles/constraints in Mesos.  I'd be happy to discuss this in more detail
> if you decide to give it a try.
>
> On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta 
> wrote:
>
>> Hi,
>>
>> We want to move to auto-scaling of the Spark driver, wherein more
>> resources are added to the pool available to the "spark driver" as
>> required. The requirement can increase or decrease with the number of
>> SQL queries issued over the REST server, or the number of queries from
>> multiple users over the Thrift server on Spark (HiveServer2).
>>
>> *Existing approach with a static number of resources:*
>> We have a very large pool of resources, but my "spark driver" is
>> allocated a limited amount of "static" resources, which we achieve as
>> follows:
>>
>>    1. While running the application, I tag machines in Mesos with the
>>    name of my application, so that offers are made accordingly.
>>    2. My application runs with a constraint for the machines tagged
>>    above, using the "spark.mesos.constraints" configuration, so that it
>>    only accepts offers made by these tagged machines and doesn't eat up
>>    all the resources in my very large cluster.
>>    3. The application launches executors on these accepted offers, and
>>    they are used for computation as defined by the Spark job, or as and
>>    when queries are fired over the HTTP/Thrift server.
>>
>> *Approach for auto scaling:*
>> Auto-scaling the driver helps us in many ways and lets us use our
>> resources more efficiently.
>> To enable auto-scaling, wherein my Spark application receives more
>> resource offers once it has consumed all the resources available to it,
>> the workflow would be as follows:
>>
>>    1. Run a daemon to monitor my app on Mesos.
>>    2. Keep adding/removing machines for the application by
>>    tagging/untagging them, based on the resource usage metrics for my
>>    application on Mesos.
>>    3. Scale up/down via the tagging and untagging in step 2, taking
>>    "some buffer" into account.
>>
>> I wanted to get your opinions on this "*Approach for auto scaling*".
>> Is it the right way to solve auto-scaling of the Spark driver?
>> Note that tagging/untagging machines is also how we limit/manage
>> resources in our big cluster.
>>
>> Thanks,
>> Ashish
>>
>
>


-- 
Michael Gummelt
Software Engineer
Mesosphere


[VOTE] Release Apache Mesos 1.1.1 (rc2)

2017-02-27 Thread Alex Rukletsov
Hi all,

Please vote on releasing the following candidate as Apache Mesos 1.1.1.

1.1.1 includes the following:

** Bug
  * [MESOS-6002] - The whiteout file cannot be removed correctly using aufs
backend.
  * [MESOS-6010] - Docker registry puller shows decode error "No response
decoded".
  * [MESOS-6142] - Frameworks may RESERVE for an arbitrary role.
  * [MESOS-6360] - The handling of whiteout files in provisioner is not
correct.
  * [MESOS-6411] - Add documentation for CNI port-mapper plugin.
  * [MESOS-6526] - `mesos-containerizer launch --environment` exposes
executor env vars in `ps`.
  * [MESOS-6571] - Add "--task" flag to mesos-execute.
  * [MESOS-6597] - Include v1 Operator API protos in generated JAR and
python packages.
  * [MESOS-6606] - Reject optimized builds with libcxx before 3.9.
  * [MESOS-6621] - SSL downgrade path will CHECK-fail when using both
temporary and persistent sockets.
  * [MESOS-6624] - Master WebUI does not work on Firefox 45.
  * [MESOS-6676] - Always re-link with scheduler during re-registration.
  * [MESOS-6848] - The default executor does not exit if a single task pod
fails.
  * [MESOS-6852] - Nested container's launch command is not set correctly
in docker/runtime isolator.
  * [MESOS-6917] - Segfault when the executor sets an invalid UUID when
sending a status update.
  * [MESOS-7008] - Quota not recovered from registry in empty cluster.
  * [MESOS-7133] - mesos-fetcher fails with openssl-related output.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.1-rc2


The candidate for Mesos 1.1.1 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz

The tag to be voted on is 1.1.1-rc2:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.1-rc2

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1182
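If you'd like to script the artifact checks, here is a rough Python sketch
(it assumes `gpg` is on the PATH and that the key from the KEYS file above
has already been imported; it makes no assumption about the exact layout of
the .md5 file):

    import hashlib
    import re
    import subprocess
    import urllib.request

    URL = ("https://dist.apache.org/repos/dist/dev/mesos/"
           "1.1.1-rc2/mesos-1.1.1.tar.gz")

    # Fetch the tarball, its MD5 file, and its detached signature.
    for suffix in ("", ".md5", ".asc"):
        urllib.request.urlretrieve(URL + suffix, "mesos-1.1.1.tar.gz" + suffix)

    # Compare the local MD5 against the published one; strip everything
    # but hex digits so the .md5 file's exact layout doesn't matter.
    md5 = hashlib.md5()
    with open("mesos-1.1.1.tar.gz", "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    published = re.sub(r"[^0-9a-f]", "",
                       open("mesos-1.1.1.tar.gz.md5").read().lower())
    assert md5.hexdigest() in published, "MD5 mismatch"

    # Verify the detached PGP signature (KEYS must be imported first).
    subprocess.run(["gpg", "--verify", "mesos-1.1.1.tar.gz.asc",
                    "mesos-1.1.1.tar.gz"], check=True)
    print("checksum and signature OK")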

Please vote on releasing this package as Apache Mesos 1.1.1!

The vote is open until Thu Mar  2 23:59:59 CET 2017 and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 1.1.1
[ ] -1 Do not release this package because ...

Thanks,
Till & Alex


Re: Auto scaling spark driver on Mesos!

2017-02-27 Thread David J. Palaitis
by using a combination of Spark's dynamic allocation,
http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
and a framework scheduler like Cook,
https://github.com/twosigma/Cook/tree/master/spark, you can achieve the
desired auto-scaling effect without the overhead of managing
roles/constraints in Mesos.  I'd be happy to discuss this in more detail if
you decide to give it a try.
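
as a starting point, here is a minimal sketch of the dynamic allocation
settings in PySpark, per the job-scheduling docs linked above (the app name
and executor bounds below are placeholders, and the external shuffle
service must also be running on each agent):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("sql-over-thrift")                 # placeholder app name
            # Core switches for dynamic allocation:
            .set("spark.dynamicAllocation.enabled", "true")
            .set("spark.shuffle.service.enabled", "true")  # shuffle service must run on each agent
            # Placeholder scaling bounds and idle timeout:
            .set("spark.dynamicAllocation.minExecutors", "2")
            .set("spark.dynamicAllocation.maxExecutors", "50")
            .set("spark.dynamicAllocation.executorIdleTimeout", "60s"))
    sc = SparkContext(conf=conf)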

On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta 
wrote:

> Hi,
>
> We want to move to auto-scaling of the Spark driver, wherein more
> resources are added to the pool available to the "spark driver" as
> required. The requirement can increase or decrease with the number of SQL
> queries issued over the REST server, or the number of queries from
> multiple users over the Thrift server on Spark (HiveServer2).
>
> *Existing approach with a static number of resources:*
> We have a very large pool of resources, but my "spark driver" is allocated
> a limited amount of "static" resources, which we achieve as follows:
>
>    1. While running the application, I tag machines in Mesos with the
>    name of my application, so that offers are made accordingly.
>    2. My application runs with a constraint for the machines tagged above,
>    using the "spark.mesos.constraints" configuration, so that it only
>    accepts offers made by these tagged machines and doesn't eat up all the
>    resources in my very large cluster.
>    3. The application launches executors on these accepted offers, and
>    they are used for computation as defined by the Spark job, or as and
>    when queries are fired over the HTTP/Thrift server.
>
> *Approach for auto scaling:*
> Auto-scaling the driver helps us in many ways and lets us use our
> resources more efficiently.
> To enable auto-scaling, wherein my Spark application receives more
> resource offers once it has consumed all the resources available to it,
> the workflow would be as follows:
>
>    1. Run a daemon to monitor my app on Mesos.
>    2. Keep adding/removing machines for the application by
>    tagging/untagging them, based on the resource usage metrics for my
>    application on Mesos.
>    3. Scale up/down via the tagging and untagging in step 2, taking "some
>    buffer" into account.
>
> I wanted to get your opinions on this "*Approach for auto scaling*".
> Is it the right way to solve auto-scaling of the Spark driver?
> Note that tagging/untagging machines is also how we limit/manage
> resources in our big cluster.
>
> Thanks,
> Ashish
>


Auto scaling spark driver on Mesos!

2017-02-27 Thread Ashish Mehta
Hi,

We want to move to auto-scaling of the Spark driver, wherein more resources
are added to the pool available to the "spark driver" as required. The
requirement can increase or decrease with the number of SQL queries issued
over the REST server, or the number of queries from multiple users over the
Thrift server on Spark (HiveServer2).

*Existing approach with a static number of resources:*
We have a very large pool of resources, but my "spark driver" is allocated
a limited amount of "static" resources, which we achieve as follows:

   1. While running the application, I tag machines in Mesos with the name
   of my application, so that offers are made accordingly.
   2. My application runs with a constraint for the machines tagged above,
   using the "spark.mesos.constraints" configuration, so that it only
   accepts offers made by these tagged machines and doesn't eat up all the
   resources in my very large cluster (see the sketch after this list).
   3. The application launches executors on these accepted offers, and they
   are used for computation as defined by the Spark job, or as and when
   queries are fired over the HTTP/Thrift server.
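
To illustrate step 2, a minimal sketch of the constraint configuration in
PySpark (the attribute name "app:my-app" and the master URL are
placeholders; the tagged agents are assumed to have been started with a
matching --attributes=app:my-app flag):

    from pyspark import SparkConf, SparkContext

    # Agents tagged for this application are assumed to carry an
    # "app:my-app" attribute (e.g. started with --attributes=app:my-app).
    # The constraint below makes Spark decline offers from any other agent.
    conf = (SparkConf()
            .setMaster("mesos://mesos-master.example.com:5050")  # placeholder master URL
            .setAppName("my-app")
            .set("spark.mesos.constraints", "app:my-app"))
    sc = SparkContext(conf=conf)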

*Approach for auto scaling:*
Auto-scaling the driver helps us in many ways and lets us use our resources
more efficiently.
To enable auto-scaling, wherein my Spark application receives more resource
offers once it has consumed all the resources available to it, the workflow
would be as follows:

   1. Run a daemon to monitor my app on Mesos.
   2. Keep adding/removing machines for the application by
   tagging/untagging them, based on the resource usage metrics for my
   application on Mesos (a rough sketch of such a daemon follows this list).
   3. Scale up/down via the tagging and untagging in step 2, taking "some
   buffer" into account.

I wanted to get your opinions on this "*Approach for auto scaling*".
Is it the right way to solve auto-scaling of the Spark driver?
Note that tagging/untagging machines is also how we limit/manage resources
in our big cluster.

Thanks,
Ashish