Re: Best way to determine # of workers

2016-03-25 Thread Aaron Jackson
I think the SparkListener is about as close as it gets.  That way I can
start up the instances (AWS, OpenStack, VMware, etc.) and simply wait until
the SparkListener indicates that the executors are online before starting.
Thanks for the advice.

Aaron



Re: Best way to determine # of workers

2016-03-25 Thread Jacek Laskowski
Hi,

You may want to use a SparkListener [1] (as the web UI does) and listen for
SparkListenerExecutorAdded and SparkListenerExecutorRemoved events.

[1] 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
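
A minimal sketch of such a listener (the listener class and event types match
the org.apache.spark.scheduler API; the counting logic and the polling loop in
the usage note are illustrative, not a definitive implementation):

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Tracks the number of live executors so the driver can wait for capacity
// before submitting work.
class ExecutorCountListener extends SparkListener {
  val executorCount = new AtomicInteger(0)

  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    executorCount.incrementAndGet()

  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    executorCount.decrementAndGet()
}
```

Usage, assuming `sc` is an existing SparkContext and `expectedExecutors` is the
target cluster size:

```scala
// val listener = new ExecutorCountListener
// sc.addSparkListener(listener)
// while (listener.executorCount.get() < expectedExecutors) Thread.sleep(1000)
```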



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Best way to determine # of workers

2016-03-25 Thread Ted Yu
Here is the doc for defaultParallelism (from SparkContext):

  /** Default level of parallelism to use when not given by user (e.g.
parallelize and makeRDD). */
  def defaultParallelism: Int = {
    assertNotStopped()
    taskScheduler.defaultParallelism
  }

What if the user changes the parallelism?

Cheers
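
That caveat matters for this use case: once the user pins the value via
configuration, defaultParallelism stops tracking the cluster. A hedged sketch
of the failure mode (the property name is the standard one; the value 400 is
arbitrary):

```scala
import org.apache.spark.SparkConf

// With spark.default.parallelism set explicitly, sc.defaultParallelism
// returns this fixed value regardless of how many executors are online,
// so it can no longer be used to detect cluster growth.
val conf = new SparkConf().set("spark.default.parallelism", "400")
```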



Re: Best way to determine # of workers

2016-03-25 Thread manasdebashiskar
There is an sc.defaultParallelism value that I use to dynamically
maintain elasticity in my application. Depending upon your scenario this
might be enough.
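
Assuming this refers to SparkContext.defaultParallelism (which, when
spark.default.parallelism is not set explicitly, reflects the total core count
of the executors currently registered), a blocking wait could be sketched as
follows; the poll interval and threshold are illustrative:

```scala
import org.apache.spark.SparkContext

// Polls sc.defaultParallelism until the cluster reports at least the
// expected number of cores. Only meaningful when spark.default.parallelism
// is NOT set explicitly, since an explicit setting fixes the value.
def waitForCores(sc: SparkContext, expectedCores: Int, pollMillis: Long = 1000): Unit = {
  while (sc.defaultParallelism < expectedCores) {
    Thread.sleep(pollMillis)
  }
}
```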



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-determine-of-workers-tp26586p26594.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Best way to determine # of workers

2016-03-24 Thread Aaron Jackson
Well, that's unfortunate; it just means I have to scrape the web UI for that
information.  As to why: I have a cluster that is being increased in size
to accommodate the processing requirements of a large set of jobs.  It's
useful to know when the new workers have joined the Spark cluster.  In my
specific case, I may be growing the cluster by a hundred nodes, and if
I fail to wait for that initialization to complete, the jobs will not have
enough memory to run.
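
Rather than scraping the HTML, the standalone Master's web UI also serves a
JSON view of cluster state (commonly at http://<master>:8080/json) that lists
the registered workers. A rough sketch, where the endpoint path and the
"state": "ALIVE" field are assumptions about that JSON payload, and the
string matching stands in for a real JSON parser such as the json4s library
that ships with Spark:

```scala
import scala.io.Source

// Fetches the standalone Master's JSON status page and crudely counts
// ALIVE workers by matching the assumed "state": "ALIVE" field.
def aliveWorkers(masterUiUrl: String): Int = {
  val json = Source.fromURL(s"$masterUiUrl/json").mkString
  "\"state\"\\s*:\\s*\"ALIVE\"".r.findAllIn(json).length
}

// aliveWorkers("http://spark-master:8080")
```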

Aaron



Re: Best way to determine # of workers

2016-03-24 Thread Takeshi Yamamuro
Hi,

There is no way to get such information from your app.
Why do you need that?

thanks,
maropu



-- 
---
Takeshi Yamamuro


Best way to determine # of workers

2016-03-23 Thread Ajaxx
I'm building some elasticity into my model and I'd like to know when my
workers have come online.  It appears at present that the API only supports
getting information about applications.  Is there a good way to determine
how many workers are available?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-determine-of-workers-tp26586.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
