Thanks Ryan for the correction. Posted to the wrong user list :(


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 5 May 2016 at 19:35, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:

> This is really outside the scope of Hive and would probably be better
> addressed by the Spark community. That said, the answer very much depends
> on your use case.
>
>
>
> Take a look at this discussion if you haven't already:
>
> https://groups.google.com/forum/embed/#!topic/spark-users/GQoxJHAAtX4
>
>
>
> Generally speaking, the larger the batch window, the better the overall
> performance, but the streaming data output will be updated less
> frequently. You will likely run into problems if you set the batch window
> below 0.5 sec, or if the batch window is shorter than the time it takes to
> run the task.
>
>
>
> Beyond that, the window length and sliding interval need to be multiples
> of the batch window, but will depend entirely on your reporting
> requirements.
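
The multiple-of-batch constraint described above can be checked before a job is started. What follows is only a minimal sketch in plain Scala, assuming all durations are given in whole seconds; `WindowSettings`, `isMultipleOf` and `isValid` are hypothetical names used for illustration and are not part of the Spark API.

```scala
// Hypothetical helper, not part of Spark: checks that the window length and
// the sliding interval are both positive multiples of the batch interval.
object WindowSettings {
  // True when valueSecs is a positive whole multiple of a positive baseSecs.
  def isMultipleOf(valueSecs: Long, baseSecs: Long): Boolean =
    baseSecs > 0 && valueSecs > 0 && valueSecs % baseSecs == 0

  // All three arguments are assumed to be durations in seconds.
  def isValid(batchSecs: Long, windowSecs: Long, slideSecs: Long): Boolean =
    isMultipleOf(windowSecs, batchSecs) && isMultipleOf(slideSecs, batchSecs)
}
```

For instance, `WindowSettings.isValid(30, 3600, 300)` holds, whereas a 45-second sliding interval on a 30-second batch would be rejected because 45 is not a multiple of 30.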
>
>
>
> It would be perfectly reasonable to have:
>
> batch window = 30 secs
> window length = 1 hour
> sliding interval = 5 mins
>
>
>
> In that case, you'd be creating an output every 5 mins, aggregating the
> data collected in 30-second batches over the previous 1 hour.
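
The arithmetic behind a 30-second batch window, a 1-hour window length and a 5-minute sliding interval can be spelled out. This is a rough sketch in plain Scala; `WindowMath` and its field names are illustrative assumptions, not Spark API.

```scala
// Illustrative only: plain arithmetic on the three settings, in seconds.
object WindowMath {
  val batchSecs: Long  = 30L        // batch window
  val windowSecs: Long = 60L * 60L  // window length: 1 hour
  val slideSecs: Long  = 5L * 60L   // sliding interval: 5 minutes

  // Number of batches aggregated in each windowed output.
  val batchesPerWindow: Long = windowSecs / batchSecs
  // Number of new batches that arrive between two consecutive outputs.
  val batchesPerSlide: Long = slideSecs / batchSecs
  // Number of consecutive outputs in which any single batch is included.
  val outputsPerBatch: Long = windowSecs / slideSecs
}
```

With these figures each output covers 120 batches, 10 of which are new since the previous output, and every batch contributes to 12 consecutive outputs before it drops out of the window.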
>
>
>
> Could you set the batch window to 5 mins? Possibly, depending on the data
> source, but perhaps you are already using that source on a more frequent
> basis elsewhere, or maybe you only have a 1 min buffer on the source data.
> There are lots of possibilities, which is why the settings are flexible and
> there is no hard and fast rule.
>
>
>
> If you were trying to create continuously streaming output as fast as
> possible, then you would almost always set your sliding interval = batch
> window and then make the batch window as short as possible.
>
>
>
> More documentation here:
>
>
> https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html
>
>
>
>
>
>
>
> *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> *Sent:* Thursday, May 05, 2016 4:26 AM
> *To:* user
> *Subject:* Re: Spark Streaming, Batch interval, Windows length and
> Sliding Interval settings
>
>
>
> Any ideas/experience on this?
>
>
> Dr Mich Talebzadeh
>
> On 4 May 2016 at 21:45, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
>
>
> Just wanted opinions on this.
>
>
>
> In Spark Streaming, the parameter n in
>
>
>
> val ssc = new StreamingContext(sparkConf, Seconds(n))
>
>
>
> defines the batch (or sample) interval for the incoming streams.
>
>
>
> In addition, there is the window length:
>
>
>
> // window length - the duration of the window, which must be a multiple
> // of the batch interval n in StreamingContext(sparkConf, Seconds(n))
>
>
>
> val windowLength = L
>
>
>
> And finally the sliding interval:
>
> // sliding interval - the interval at which the window operation is
> // performed
>
>
>
> val slidingInterval = I
>
>
>
> OK, so as given, the windowLength L must be a multiple of n, and the
> slidingInterval has to be consistent with both to ensure that we can capture
> the head and tail of the window.
>
>
>
> So as a heuristic approach, for a batch interval of say 10 seconds, I put
> the window length at 3 times that = 30 seconds and make the
> slidingInterval = batch interval = 10 seconds.
>
>
>
> Obviously these are subjective and depend on what is being measured.
> However, I believe having slidingInterval = batch interval makes sense?
>
>
>
> Appreciate any views on this.
>
>
>
> Thanks,
>
>
> Dr Mich Talebzadeh