Thanks, Ryan, for the correction. I posted this to the wrong user list :(
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 5 May 2016 at 19:35, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:

> This is really outside the scope of Hive and would probably be better
> addressed by the Spark community. However, I can say that this very much
> depends on your use case.
>
> Take a look at this discussion if you haven't already:
>
> https://groups.google.com/forum/embed/#!topic/spark-users/GQoxJHAAtX4
>
> Generally speaking, the larger the batch window, the better the overall
> performance, but the streaming output will be updated less frequently.
> You will likely run into problems if you set your batch window below
> 0.5 sec, and/or if the batch window is shorter than the time it takes to
> run the task for each batch.
>
> Beyond that, the window length and sliding interval need to be multiples
> of the batch window, but their values will depend entirely on your
> reporting requirements.
>
> It would be perfectly reasonable to have:
>
>   batch window     = 30 secs
>   window length    = 1 hour
>   sliding interval = 5 mins
>
> In that case, you'd be creating an output every 5 mins, aggregating data
> that you were collecting every 30 seconds over the previous 1-hour period.
>
> Could you set the batch window to 5 mins? Possibly, depending on the data
> source, but perhaps you are already using that source more frequently
> elsewhere, or maybe you only have a 1 min buffer on the source data.
> There are lots of possibilities, which is why there is flexibility and no
> hard-and-fast rule.
>
> If you were trying to create continuously streaming output as fast as
> possible, then you would probably (almost always) set your sliding
> interval = batch window and then shrink the batch window as short as
> possible.
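To make the constraints in Ryan's reply concrete, here is a minimal plain-Scala sketch with no Spark dependency that checks a proposed configuration. It assumes three rules taken from the thread: window length and sliding interval must be whole multiples of the batch interval, batch intervals under 0.5 s tend to cause trouble, and per-batch processing time should stay below the batch interval. `WindowConfig`, `problems` and `avgProcessingMs` are hypothetical names made up for illustration. In Spark itself these settings correspond to the `Seconds(n)` passed to `StreamingContext` and the two durations given to a windowed operation such as `DStream.window`.

```scala
// Hypothetical helper (not part of Spark) for sanity-checking window settings.
// All durations are in milliseconds; nothing here calls Spark itself.
final case class WindowConfig(batchMs: Long, windowMs: Long, slideMs: Long) {
  require(batchMs > 0 && windowMs > 0 && slideMs > 0, "durations must be positive")

  // Number of batches that fall inside one window.
  def batchesPerWindow: Long = windowMs / batchMs

  // Windowed results emitted per hour of wall-clock time (rounded down).
  def outputsPerHour: Long = 3600000L / slideMs

  // Human-readable problems with the settings; an empty list means they look sane.
  // avgProcessingMs is an assumed input: your own measurement of time per batch.
  def problems(avgProcessingMs: Long): List[String] = {
    val checks: List[(Boolean, String)] = List(
      // Spark Streaming requires both durations to be multiples of the batch interval.
      (windowMs % batchMs != 0) -> s"window length $windowMs ms is not a multiple of the batch interval $batchMs ms",
      (slideMs % batchMs != 0) -> s"sliding interval $slideMs ms is not a multiple of the batch interval $batchMs ms",
      // Rule of thumb from the thread: batch intervals under 0.5 s tend to cause trouble.
      (batchMs < 500) -> s"batch interval $batchMs ms is below 0.5 s",
      // If processing a batch takes as long as the interval itself, batches start to queue up.
      (avgProcessingMs >= batchMs) -> s"average processing time $avgProcessingMs ms is not below the batch interval $batchMs ms"
    )
    checks.collect { case (true, message) => message }
  }
}
```

With Ryan's example, `WindowConfig(batchMs = 30000, windowMs = 3600000, slideMs = 300000)` gives `batchesPerWindow == 120` and `outputsPerHour == 12`, and `problems(avgProcessingMs = 5000)` returns an empty list. Changing the slide to 45 seconds would be flagged, because 45 s is not a multiple of the 30 s batch.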
>
> More documentation here:
>
> https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html
>
> *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> *Sent:* Thursday, May 05, 2016 4:26 AM
> *To:* user
> *Subject:* Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings
>
> Any ideas/experience on this?
>
> Dr Mich Talebzadeh
>
> On 4 May 2016 at 21:45, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> Just wanted opinions on this.
>
> In Spark Streaming the parameter
>
>   val ssc = new StreamingContext(sparkConf, Seconds(n))
>
> defines the batch (or sample) interval for the incoming streams.
>
> In addition there is the window length:
>
>   // window length - the duration of the window; must be a multiple of the
>   // batch interval n in StreamingContext(sparkConf, Seconds(n))
>   val windowLength = L
>
> And finally the sliding interval:
>
>   // sliding interval - the interval at which the window operation is performed
>   val slidingInterval = I
>
> OK, so as given, the windowLength L must be a multiple of n, and the
> slidingInterval has to be consistent with n as well, so that the head and
> tail of the window line up with batch boundaries.
>
> So as a heuristic, for a batch interval of say 10 seconds, I set the
> window length to 3 times that (30 seconds) and make the sliding interval
> equal to the batch interval (10 seconds).
>
> Obviously these settings are subjective and depend on what is being
> measured. However, I believe having sliding interval = batch interval
> makes sense?
>
> Appreciate any views on this.
>
> Thanks,
>
> Dr Mich Talebzadeh
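Mich's heuristic (batch interval n = 10 s, window length L = 30 s, sliding interval I = 10 s) can be illustrated without Spark by simulating what a windowed sum would emit. This is a toy model, not Spark's implementation: `WindowSim`, `windowedSums` and `perBatchCounts` are hypothetical names, the input is one event count per completed batch, and the model emits partial windows during start-up. In Spark the corresponding call would be along the lines of `stream.window(Seconds(30), Seconds(10))` on a context created with `Seconds(10)`.

```scala
// Toy model of a sliding-window sum over micro-batches; no Spark involved.
object WindowSim {
  /** One result every `slideBatches` batches, each summing up to the last
    * `windowBatches` batches. Returns (batches seen so far, windowed sum). */
  def windowedSums(perBatchCounts: Vector[Long], windowBatches: Int, slideBatches: Int): Vector[(Int, Long)] = {
    require(windowBatches > 0 && slideBatches > 0, "window and slide must be positive")
    (1 to perBatchCounts.size)
      .filter(batchEnd => batchEnd % slideBatches == 0) // emit only on slide boundaries
      .map { batchEnd =>
        // Window covers batches [batchStart, batchEnd); it is partial during start-up.
        val batchStart = math.max(0, batchEnd - windowBatches)
        (batchEnd, perBatchCounts.slice(batchStart, batchEnd).sum)
      }
      .toVector
  }
}
```

With L/n = 3 and I/n = 1, `windowedSums(Vector(5L, 7L, 3L, 8L, 2L), windowBatches = 3, slideBatches = 1)` produces a result after every batch (sums 5, 12, 15, 18, 13), and each batch contributes to up to three consecutive results, so overlapping window outputs should not be added together. Setting `slideBatches = 3` instead gives non-overlapping 30-second windows with one result every 30 seconds.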