Mmmh, OK. Providing pseudo-code would require me to be a bit more awake: it's 2 AM in Belgium and I have to go to sleep, otherwise the pseudo-code would be more pseudo than code...
However, regarding your (LaTeX-style) formula: IMHO, in a streaming use case k will probably vary with the velocity of the stream, which is not the case in batch (because there you can prepare the data beforehand). So I guess you could get an approximate result by "simply" reducing over a window. Note that I'm assuming the x's are the data flowing in the stream (they are single values, not components of rows arriving atomically as instances of T in an RDD[T]).

Maybe you could have a look at reduceByWindow (first the scaladoc, then Google, because there are plenty of examples for both Spark and Storm).

HTH

Andy Petrella
Belgium (Liège)

On Wed, Nov 20, 2013 at 1:37 AM, Michael Kun Yang wrote:

> the use case is to fit a moving average model for stock prices with the form:
> x_n = \sum_{i = 1}^k \alpha_i * x_{n - i}
>
> Can you please provide me the pseudo-code?
>
> On Tue, Nov 19, 2013 at 4:30 PM, andy petrella wrote:
>
>> 1/ you mean like reshape in R?
>> 2/ Or you mean by windowing the stream on a period basis?
>>
>> 1/ if you have RDD[Seq[Any]], you can have an RDD[Seq[Seq[Any]]] using `transform` then `sliding(6,6)` on the passed Seq
>> 2/ > If the period is time, you may check the methods ending with `window` in DStream
>>    > otherwise... I don't see the use case, so I'd have some difficulties helping you ^ ^
>>
>> Also, if you're building (row, col) pairs along with each cell's data as the data comes in, the PairDStreamFunctions could help you if you put this pair as a key. But I'm just guessing...
>>
>> HTH (a bit :D)
>>
>> andy
>>
>> On Wed, Nov 20, 2013 at 1:01 AM, Michael Kun Yang wrote:
>>
>>> Hi spark-enthusiasts,
>>>
>>> I am new to spark streaming. I need to convert streaming data into a table.
>>>
>>> How to convert a data stream
>>> {x_1, x_2, x_3, ..., x_n, ...}
>>> into a table with the format:
>>> x_1, x_2, x_3, x_4, x_5, x_6
>>> x_2, x_3, x_4, x_5, x_6, x_7
>>> ...
>>> x_{n + 1}, x_{n + 2}, ..., x_{n + 6}
>>> ...
>>>
>>> Thank you very much!
>>> -Kun
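
For reference, here is a minimal sketch of the table asked for at the start of the thread, in plain Scala with no Spark dependency. It assumes the values have already been collected into a Seq (a batch stand-in for one window of the stream). `SlidingTable`, `laggedRows` and `predict` are hypothetical names made up for illustration, and the alpha weights are whatever the caller supplies. On a real DStream the windowing itself would come from methods such as `window` or `reduceByWindow`, which this sketch does not exercise.

```scala
// Plain-Scala sketch, not Spark Streaming code: all names here are illustrative.
object SlidingTable {

  // Build the lagged table: each row holds `width` consecutive values,
  // and each row is shifted by one position relative to the previous row.
  // Inputs shorter than `width` yield no rows (the partial window is dropped).
  def laggedRows(xs: Seq[Double], width: Int): Seq[Seq[Double]] =
    xs.sliding(width, 1).filter(_.size == width).toSeq

  // Evaluate x_n = sum_{i=1}^k alpha_i * x_{n-i} for one history.
  // `history` is in arrival order (oldest first); alphas(0) is alpha_1,
  // the weight applied to the most recent value x_{n-1}.
  // If `history` is longer than `alphas`, only the last k values are used.
  def predict(history: Seq[Double], alphas: Seq[Double]): Double =
    history.reverse.zip(alphas).map { case (x, a) => a * x }.sum
}
```

With eight input values and a width of 6, `laggedRows` yields three rows (x_1..x_6, x_2..x_7, x_3..x_8). Using a step of 1 is what makes the rows overlap as in the requested table; `sliding(6, 6)` would instead cut the input into non-overlapping blocks.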
