Re: Best way to determine # of workers
I think the SparkListener is about as close as it gets. That way I can start up the instance (AWS, OpenStack, VMware, etc.) and simply wait until the SparkListener indicates that the executors are online before starting. Thanks for the advice.

Aaron

On Fri, Mar 25, 2016 at 10:54 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> You may want to use SparkListener [1] (as the webui does) and listen to
> SparkListenerExecutorAdded and SparkListenerExecutorRemoved.
>
> [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener
>
> Pozdrawiam,
> Jacek Laskowski
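The waiting logic described here can be sketched without a Spark dependency. The class and method names below are illustrative stand-ins, not Spark's API: a real (Scala/Java) version would extend org.apache.spark.scheduler.SparkListener, override onExecutorAdded/onExecutorRemoved, and be registered with sc.addSparkListener(...).

```python
import threading

class ExecutorCountListener:
    """Latch that opens once `target` executors have registered."""

    def __init__(self, target):
        self.target = target
        self.live = 0
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def on_executor_added(self):
        # Stand-in for handling a SparkListenerExecutorAdded event.
        with self._lock:
            self.live += 1
            if self.live >= self.target:
                self._ready.set()

    def on_executor_removed(self):
        # Stand-in for handling a SparkListenerExecutorRemoved event.
        with self._lock:
            self.live -= 1

    def await_ready(self, timeout_seconds):
        # Block until `target` executors have registered, or time out.
        return self._ready.wait(timeout_seconds)
```

With a real listener registered on the driver, the startup script would call await_ready(...) after growing the cluster and only then submit the jobs.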
Re: Best way to determine # of workers
Hi,

You may want to use SparkListener [1] (as the webui does) and listen to
SparkListenerExecutorAdded and SparkListenerExecutorRemoved.

[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Thu, Mar 24, 2016 at 3:24 PM, Aaron Jackson <ajack...@pobox.com> wrote:
> Well that's unfortunate; it just means I have to scrape the webui for that
> information. As to why: I have a cluster that is being increased in size to
> accommodate the processing requirements of a large set of jobs. It's useful
> to know when the new workers have joined the Spark cluster. In my specific
> case, I may be growing the cluster size by a hundred nodes, and if I fail to
> wait for that initialization to complete, the cluster will not have enough
> memory to run my jobs.
>
> Aaron
Re: Best way to determine # of workers
Here is the doc for defaultParallelism:

    /** Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD). */
    def defaultParallelism: Int = {

What if the user changes parallelism?

Cheers

On Fri, Mar 25, 2016 at 5:33 AM, manasdebashiskar <poorinsp...@gmail.com> wrote:
> There is a sc.sparkDefaultParallelism parameter that I use to dynamically
> maintain elasticity in my application. Depending upon your scenario this
> might be enough.
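Used as a cluster-size proxy, the idea amounts to polling until the reported parallelism reaches the expected value. In this sketch, `read_parallelism` is a stand-in for calling sc.defaultParallelism on a live SparkContext; note the caveat above, since a user-set spark.default.parallelism would stop the value from tracking core count:

```python
import time

def wait_for_parallelism(read_parallelism, target, poll_seconds=0.0, max_polls=1000):
    """Poll until read_parallelism() reaches `target`; give up after max_polls."""
    for _ in range(max_polls):
        if read_parallelism() >= target:
            return True
        if poll_seconds:
            time.sleep(poll_seconds)
    return False
```

In a real app this would be called as wait_for_parallelism(lambda: sc.defaultParallelism, expected_cores, poll_seconds=5) before submitting work.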
Re: Best way to determine # of workers
There is a sc.sparkDefaultParallelism parameter that I use to dynamically
maintain elasticity in my application. Depending upon your scenario this
might be enough.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-determine-of-workers-tp26586p26594.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Best way to determine # of workers
Well that's unfortunate; it just means I have to scrape the webui for that
information. As to why: I have a cluster that is being increased in size to
accommodate the processing requirements of a large set of jobs. It's useful
to know when the new workers have joined the Spark cluster. In my specific
case, I may be growing the cluster size by a hundred nodes, and if I fail to
wait for that initialization to complete, the cluster will not have enough
memory to run my jobs.

Aaron

On Thu, Mar 24, 2016 at 3:07 AM, Takeshi Yamamuro <linguin@gmail.com> wrote:
> Hi,
>
> There is no way to get such information from your app.
> Why do you need that?
>
> thanks,
> maropu
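Scraping need not mean parsing HTML: the standalone master's webui also serves a JSON view (typically http://<master>:8080/json). The payload shape assumed below, a top-level "workers" list whose entries carry a "state" field, is an assumption about that endpoint; verify it against your master's actual output before relying on it.

```python
import json

# A real caller would fetch the page first, e.g. with urllib:
#   payload = urllib.request.urlopen("http://master:8080/json").read()
def count_alive_workers(payload):
    """Count workers reported as ALIVE in the master's JSON payload."""
    doc = json.loads(payload)
    return sum(1 for w in doc.get("workers", []) if w.get("state") == "ALIVE")
```

Polling this count until it reaches the expected node total would give the same "wait for initialization" gate without a listener.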
Re: Best way to determine # of workers
Hi,

There is no way to get such information from your app.
Why do you need that?

thanks,
maropu

On Thu, Mar 24, 2016 at 8:23 AM, Ajaxx <ajack...@pobox.com> wrote:
> I'm building some elasticity into my model and I'd like to know when my
> workers have come online. It appears at present that the API only supports
> getting information about applications. Is there a good way to determine
> how many workers are available?

--
---
Takeshi Yamamuro
Best way to determine # of workers
I'm building some elasticity into my model and I'd like to know when my
workers have come online. It appears at present that the API only supports
getting information about applications. Is there a good way to determine
how many workers are available?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-determine-of-workers-tp26586.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org