I suspect you do not actually need to change the number of partitions
dynamically.

Do you just have groupings of data to process? Use an RDD of (K,V) pairs
and operations like groupByKey. If you really have only 1000 unique keys,
then yes, at most half of the 2000 workers would get data in a phase that
groups by segmentation key, I suppose. I'd then ask: a) do you really need
2000 workers, or can that be tuned down? b) can you not divide up the day
more than 1000 ways? c) does it matter so much in this phase, if you can
exploit 2000-way parallelism elsewhere in the pipeline?
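
To make the point about keys and partitions concrete, here is a rough sketch in plain Python (not Spark code) that imitates hash partitioning by key. The helper names (partition_for, partition_records, sliding_sums_per_key) and the sample data are made up for illustration, and the modulo-of-hash rule is only an assumption about how a hash partitioner typically behaves. It suggests that asking for 2000 partitions with 1000 keys leaves partitions empty rather than splitting a key, and that a window computed per group never crosses a key boundary.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    # Assumed rule: non-negative modulo of the key's hash.
    return hash(key) % num_partitions


def partition_records(records, num_partitions):
    """Assign (key, value) records to partitions by hashing the key only."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def sliding_sums_per_key(records, window):
    """Group values by key, then slide a window inside each group only."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)  # keeps the input order within each key
    return {
        key: [sum(values[i:i + window])
              for i in range(len(values) - window + 1)]
        for key, values in groups.items()
    }


# Hypothetical data: 1000 integer segmentation keys, 5 values per key.
records = [(key, t) for key in range(1000) for t in range(5)]

partitions = partition_records(records, 2000)
non_empty = len(partitions)

# Record which partitions each key ended up in.
key_to_partitions = defaultdict(set)
for partition_id, rows in partitions.items():
    for key, _ in rows:
        key_to_partitions[key].add(partition_id)
max_partitions_per_key = max(len(p) for p in key_to_partitions.values())

window_sums = sliding_sums_per_key(records, 3)

print(non_empty, "of 2000 partitions hold data")
print("most partitions any one key spans:", max_partitions_per_key)
print("window sums for key 0:", window_sums[0])
```

Since the partition depends only on the key, a larger partition count cannot cut through a key's records. Note that the sketch relies on list order within each group; Spark's groupByKey makes no ordering guarantee for the grouped values, so a real job would probably need to sort each group (for example by timestamp) before sliding a window over it.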

On Sat, Oct 11, 2014 at 3:42 AM, nitinkak001 <nitinkak...@gmail.com> wrote:

> Thanks @category_theory, the post was of great help!!
>
> I had to learn a few things before I could understand it completely.
> However, I am facing the issue of partitioning the data (using partitionBy)
> without providing a hardcoded value for the number of partitions. The
> partitions need to be driven by the data (the segmentation key I am using)
> in my case.
>
> So my question is this. Say:
>
> the number of partitions generated by my segmentation key = 1000
> the number given to the partitioner = 2000
>
> In this case, would 2000 partitions be created (which would break the
> partition boundary of the segmentation key)? If so, the sliding window would
> roll over multiple partitions and the computation would generate wrong results.
>
> Thanks again for the response!!
>
> On Tue, Sep 30, 2014 at 11:51 AM, category_theory [via Apache Spark User
> List] <[hidden email]> wrote:
>
>> Not sure if this is what you are after, but it's based on a moving average
>> within Spark. I was building an ARIMA model on top of Spark and this
>> helped me out a lot:
>>
>> http://stackoverflow.com/questions/23402303/apache-spark-moving-average
>>
>> *JIMMY MCERLAIN*
>>
>> On Tue, Sep 30, 2014 at 8:19 AM, nitinkak001 <[hidden email]> wrote:
>>
>>> Any ideas guys?
>>>
>>> Trying to find some information online. Not much luck so far.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Window-comparison-matching-using-the-sliding-window-functionality-feasibility-tp15352p15404.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>
>>
>
>
