Re: Auto scaling spark driver on Mesos!
I assume you've looked into dynamic allocation. What do you need that isn't provided by dynamic allocation?

On Mon, Feb 27, 2017 at 4:11 AM, David J. Palaitis <david.j.palai...@gmail.com> wrote:

> By using a combination of Spark's dynamic allocation,
> http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
> and a framework scheduler like Cook, https://github.com/twosigma/Cook/tree/master/spark,
> you can achieve the desired auto-scaling effect without the overhead of
> managing roles/constraints in Mesos. I'd be happy to discuss this in more
> detail if you decide to give it a try.
>
> On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta wrote:
>
>> Hi,
>>
>> We want to move to auto-scaling of the Spark driver, where more resources
>> are added to the pool available to the driver based on demand. Demand can
>> increase or decrease with the number of SQL queries arriving over the REST
>> server, or with the number of concurrent users on the Thrift server over
>> Spark (HiveServer2).
>>
>> Existing approach with a static number of resources:
>>
>> We have a very large pool of resources, but the driver is allocated a
>> limited, static amount, which we achieve as follows:
>>
>> 1. While running the application, we tag machines in Mesos with the name
>>    of the application, so that offers are made accordingly.
>> 2. The application runs with a constraint on the tagged machines, via the
>>    "spark.mesos.constraints" configuration, so that it only accepts offers
>>    from those machines and doesn't eat up all the resources in our very
>>    large cluster.
>> 3. The application launches executors on the accepted offers, which do the
>>    computation defined by the Spark job, or serve queries as they arrive
>>    over the HTTP/Thrift server.
>>
>> Approach for auto scaling:
>>
>> Auto-scaling the driver helps us in many ways and lets us use resources
>> more efficiently. To enable it, so that the application receives more
>> resource offers once it has consumed everything available, the workflow
>> would be:
>>
>> 1. Run a daemon that monitors the application on Mesos.
>> 2. Add or remove machines for the application by tagging/untagging them,
>>    driven by the application's resource usage metrics on Mesos.
>> 3. Scale up or down based on step 2, keeping some buffer of spare
>>    capacity in reserve.
>>
>> I wanted to know your opinion on this approach for auto scaling. Is it
>> the right way to auto-scale the Spark driver? Tagging/untagging machines
>> is how we limit and manage resources in our big cluster.
>>
>> Thanks,
>> Ashish

--
Michael Gummelt
Software Engineer
Mesosphere
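For reference, dynamic allocation is driven by a handful of Spark properties described in the job-scheduling docs linked in this thread. A minimal spark-defaults.conf sketch might look like the following; the property names are from the Spark documentation, but the values are purely illustrative, and on Mesos the external shuffle service must additionally be running on each agent for this to work:

```
# spark-defaults.conf -- illustrative values, not a recommendation
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         2
spark.dynamicAllocation.maxExecutors         20
spark.dynamicAllocation.executorIdleTimeout  60s
```

With this in place, Spark itself requests and releases executors as the query load rises and falls, which is what makes the manual tagging/untagging scheme unnecessary.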
[VOTE] Release Apache Mesos 1.1.1 (rc2)
Hi all,

Please vote on releasing the following candidate as Apache Mesos 1.1.1.

1.1.1 includes the following:

** Bug
  * [MESOS-6002] - The whiteout file cannot be removed correctly using aufs backend.
  * [MESOS-6010] - Docker registry puller shows decode error "No response decoded".
  * [MESOS-6142] - Frameworks may RESERVE for an arbitrary role.
  * [MESOS-6360] - The handling of whiteout files in provisioner is not correct.
  * [MESOS-6411] - Add documentation for CNI port-mapper plugin.
  * [MESOS-6526] - `mesos-containerizer launch --environment` exposes executor env vars in `ps`.
  * [MESOS-6571] - Add "--task" flag to mesos-execute.
  * [MESOS-6597] - Include v1 Operator API protos in generated JAR and python packages.
  * [MESOS-6606] - Reject optimized builds with libcxx before 3.9.
  * [MESOS-6621] - SSL downgrade path will CHECK-fail when using both temporary and persistent sockets.
  * [MESOS-6624] - Master WebUI does not work on Firefox 45.
  * [MESOS-6676] - Always re-link with scheduler during re-registration.
  * [MESOS-6848] - The default executor does not exit if a single task pod fails.
  * [MESOS-6852] - Nested container's launch command is not set correctly in docker/runtime isolator.
  * [MESOS-6917] - Segfault when the executor sets an invalid UUID when sending a status update.
  * [MESOS-7008] - Quota not recovered from registry in empty cluster.
  * [MESOS-7133] - mesos-fetcher fails with openssl-related output.
The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.1-rc2

The candidate for Mesos 1.1.1 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz

The tag to be voted on is 1.1.1-rc2:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.1-rc2

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.1-rc2/mesos-1.1.1.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1182

Please vote on releasing this package as Apache Mesos 1.1.1!

The vote is open until Thu Mar 2 23:59:59 CET 2017 and passes if a majority of
at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 1.1.1
[ ] -1 Do not release this package because ...

Thanks,
Till & Alex
Re: Auto scaling spark driver on Mesos!
By using a combination of Spark's dynamic allocation,
http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
and a framework scheduler like Cook, https://github.com/twosigma/Cook/tree/master/spark,
you can achieve the desired auto-scaling effect without the overhead of
managing roles/constraints in Mesos. I'd be happy to discuss this in more
detail if you decide to give it a try.

On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta wrote:

> Hi,
>
> We want to move to auto-scaling of the Spark driver, where more resources
> are added to the pool available to the driver based on demand. Demand can
> increase or decrease with the number of SQL queries arriving over the REST
> server, or with the number of concurrent users on the Thrift server over
> Spark (HiveServer2).
>
> Existing approach with a static number of resources:
>
> We have a very large pool of resources, but the driver is allocated a
> limited, static amount, which we achieve as follows:
>
> 1. While running the application, we tag machines in Mesos with the name
>    of the application, so that offers are made accordingly.
> 2. The application runs with a constraint on the tagged machines, via the
>    "spark.mesos.constraints" configuration, so that it only accepts offers
>    from those machines and doesn't eat up all the resources in our very
>    large cluster.
> 3. The application launches executors on the accepted offers, which do the
>    computation defined by the Spark job, or serve queries as they arrive
>    over the HTTP/Thrift server.
>
> Approach for auto scaling:
>
> Auto-scaling the driver helps us in many ways and lets us use resources
> more efficiently. To enable it, so that the application receives more
> resource offers once it has consumed everything available, the workflow
> would be:
>
> 1. Run a daemon that monitors the application on Mesos.
> 2. Add or remove machines for the application by tagging/untagging them,
>    driven by the application's resource usage metrics on Mesos.
> 3. Scale up or down based on step 2, keeping some buffer of spare
>    capacity in reserve.
>
> I wanted to know your opinion on this approach for auto scaling. Is it
> the right way to auto-scale the Spark driver? Tagging/untagging machines
> is how we limit and manage resources in our big cluster.
>
> Thanks,
> Ashish
Auto scaling spark driver on Mesos!
Hi,

We want to move to auto-scaling of the Spark driver, where more resources are
added to the pool available to the driver based on demand. Demand can increase
or decrease with the number of SQL queries arriving over the REST server, or
with the number of concurrent users on the Thrift server over Spark
(HiveServer2).

Existing approach with a static number of resources:

We have a very large pool of resources, but the driver is allocated a limited,
static amount, which we achieve as follows:

1. While running the application, we tag machines in Mesos with the name of
   the application, so that offers are made accordingly.
2. The application runs with a constraint on the tagged machines, via the
   "spark.mesos.constraints" configuration, so that it only accepts offers
   from those machines and doesn't eat up all the resources in our very large
   cluster.
3. The application launches executors on the accepted offers, which do the
   computation defined by the Spark job, or serve queries as they arrive over
   the HTTP/Thrift server.

Approach for auto scaling:

Auto-scaling the driver helps us in many ways and lets us use resources more
efficiently. To enable it, so that the application receives more resource
offers once it has consumed everything available, the workflow would be:

1. Run a daemon that monitors the application on Mesos.
2. Add or remove machines for the application by tagging/untagging them,
   driven by the application's resource usage metrics on Mesos.
3. Scale up or down based on step 2, keeping some buffer of spare capacity in
   reserve.

I wanted to know your opinion on this approach for auto scaling. Is it the
right way to auto-scale the Spark driver? Tagging/untagging machines is how we
limit and manage resources in our big cluster.

Thanks,
Ashish
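The tag/untag workflow described here can be sketched as the decision function such a monitoring daemon might evaluate on each tick. This is a hypothetical sketch, not any real Mesos or Spark API: `scaling_decision` and all of its parameter names are invented for illustration, and it assumes a homogeneous pool measured in CPUs only.

```python
import math

def scaling_decision(used_cpus: float, tagged_cpus: float,
                     buffer_cpus: float, machine_cpus: float) -> int:
    """Decide how many machines to tag (+n) or untag (-n) for the app.

    used_cpus:    CPUs the application currently consumes (from Mesos metrics)
    tagged_cpus:  total CPUs on machines currently tagged for the application
    buffer_cpus:  spare headroom ("some buffer") to keep available at all times
    machine_cpus: CPUs contributed by one machine (pool assumed homogeneous)
    """
    free = tagged_cpus - used_cpus
    if free < buffer_cpus:
        # Headroom fell below the buffer: tag enough machines to restore it.
        return math.ceil((buffer_cpus - free) / machine_cpus)
    if free - machine_cpus >= buffer_cpus:
        # The buffer survives even after releasing machines: untag the surplus.
        return -int((free - buffer_cpus) // machine_cpus)
    return 0
```

For example, with a 4-CPU buffer and 8-CPU machines, 30 CPUs used out of 32 tagged yields +1 (tag one machine), while 10 used out of 40 tagged yields -3 (untag three). In practice you would also want hysteresis (e.g. requiring the condition to hold for several ticks) so that a bursty query load does not cause tag/untag thrashing.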