Hi,

We want to move to auto-scaling of the Spark driver, wherein more resources
are added to the pool available to the "Spark driver" based on demand. The
demand can increase/decrease with the number of SQL queries being run over
the REST server, or the number of queries from multiple users over the
Thrift server on Spark (HiveServer2).

*Existing approach with static number of resources:*
We have a very large pool of resources, but my "Spark driver" is allocated a
limited amount of "static" resources, which we achieve as follows:

   1. While running the application, I tag machines in Mesos with the name
   of my application, so that offers are made accordingly.
   2. My application is run with a constraint on the above tagged machines
   using the "spark.mesos.constraints" configuration, so that the application
   only accepts offers made by these tagged machines and doesn't eat up all
   the resources in my very large cluster (see the sketch after this list).
   3. The application launches executors on these accepted offers, and they
   are used for computation as defined by the Spark job, or as and when
   queries are fired over the HTTP/Thrift server.

*Approach for auto scaling:*
Auto-scaling the driver helps us in many ways and lets us use resources more
efficiently.
To enable auto-scaling, wherein my Spark application gets more resource
offers once it has consumed all the resources currently available to it, the
workflow will be as follows:

   1. Run a daemon to monitor my app on Mesos.
   2. Keep adding/removing machines for the application by tagging/untagging
   them, based on the resource usage metrics for my application on Mesos.
   3. Scale up/down via the tagging/untagging in Step 2, taking "some
   buffer" into account (a rough sketch of such a daemon follows this list).

I wanted to get your opinions on the "*Approach for auto scaling*" above.
Is this the right approach to auto-scaling the Spark driver?
Also, tagging/untagging machines is something we already do to limit/manage
the resources in our big cluster.

Thanks,
Ashish
