How about an existing solution, for example the MapReduce Online model?
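
To make the question in the quoted thread concrete, here is a minimal sketch of what windowing driven by the data timestamp could look like. It is plain Python, not the Spark Streaming API, and it assumes each record is a (timestamp, value) pair with the timestamp in seconds. The function name `window_by_event_time` and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def window_by_event_time(records, window_seconds):
    """Group (timestamp, value) records into tumbling windows.

    Each window is keyed by its start time, computed from the record's
    own timestamp rather than from the time the record arrived.
    """
    windows = defaultdict(list)
    for timestamp, value in records:
        # Round the data timestamp down to a multiple of the window size.
        start = (timestamp // window_seconds) * window_seconds
        windows[start].append(value)
    # Return windows ordered by start time.
    return dict(sorted(windows.items()))


# Hypothetical records, deliberately out of arrival order.
records = [(3, "a"), (12, "b"), (7, "c"), (25, "d"), (11, "e")]
result = window_by_event_time(records, 10)
print(result)
```

Records that arrive out of order still land in the window their timestamp belongs to. A real streaming system would also have to decide how long to wait for late records before closing a window, which this sketch ignores.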

On Wed, Feb 19, 2014 at 8:15 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> Also, can we batch by other criteria, such as number of files, as well as by time?
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
> On Wed, Feb 19, 2014 at 5:05 AM, dachuan <hdc1...@gmail.com> wrote:
>
>> I don't have a conclusive answer but I would like to discuss this.
>>
>> If one node's CPU is slower than another's, windowing by absolute time
>> won't cause any trouble because the data are well partitioned.
>>
>> On Feb 19, 2014 1:06 AM, "Aries Kong" <aries.ko...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> It seems that windowing in Spark Streaming is driven by absolute
>>> time rather than, as is conventional, by the timestamp of the data.
>>> Can anybody kindly explain why? What can I do if I need windowing
>>> driven by the data timestamp?
>>>
>>> Thanks!
>>>
>>> Aries.Kong
>>>
>>
>

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210