Agreed. This makes sense if the aggregation is on fields etc. Although Alec did not mention it in this post, based on his previous posts on the same topic I would assume he is trying to sort the events because he wants to "fill in" the missing events (smoothing the curve, so to speak) by looking at the events before and after each missing timestamp, and then do some stream processing on top of that (for example, alerting based on a sliding window). If that is the scenario, he would have to keep more metadata in the State so that he can fill in those events. The open question is when he would stop waiting for missing events, fill them in and move on, since they can arrive in different batches. He would also have to do any further stream processing (or storing the events in ES for later search, for example) in the State itself. This is where I think it gets tricky to do this in the partition aggregator.
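This sketch illustrates the "fill in" step described above. It works on one device's events once they are in order, and reconstructs each missing minute from the events on either side. It assumes a fixed one-minute spacing and uses linear interpolation, which is only one possible choice of smoothing. `GapFiller`, `Event` and `fillGaps` are hypothetical names, and nothing here is a Storm or Trident API. Nothing is filled after the last event, because there is no following event yet. That is exactly why the State has to hold events until something decides to move on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not a Storm/Trident API): fill gaps in one device's
// series by linear interpolation between the previous and next observed events.
// Assumes the input is already sorted by time and events are expected every
// `stepMinutes` minutes.
class GapFiller {

    // Hypothetical event: epoch-minute timestamp, one numeric reading, and a
    // flag marking whether the value was synthesized by fillGaps.
    record Event(long minute, double value, boolean interpolated) {}

    static List<Event> fillGaps(List<Event> sorted, long stepMinutes) {
        if (stepMinutes <= 0) {
            throw new IllegalArgumentException("stepMinutes must be positive");
        }
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Event cur = sorted.get(i);
            out.add(cur);
            if (i + 1 == sorted.size()) {
                break; // no "next" event yet, so nothing after the last one can be filled
            }
            Event next = sorted.get(i + 1);
            long span = next.minute() - cur.minute();
            // One synthetic event per missing step strictly between cur and next.
            for (long t = cur.minute() + stepMinutes; t < next.minute(); t += stepMinutes) {
                double fraction = (double) (t - cur.minute()) / span;
                double value = cur.value() + fraction * (next.value() - cur.value());
                out.add(new Event(t, value, true));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> observed = List.of(
                new Event(0, 10.0, false),
                new Event(1, 12.0, false),
                new Event(4, 18.0, false)); // minutes 2 and 3 are missing
        fillGaps(observed, 1).forEach(System.out::println);
    }
}
```

For the sample input, minutes 2 and 3 are added as interpolated events with values 14 and 16. A real topology would still have to decide how long to wait for the "next" event before giving up.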
So in our earlier posts we suggested that he do the appropriate partitioning in Kafka (so that events from a given device end up in the same partition) and the window-based sorting (by buffering a few events) in the stream processing.

Alec, please ignore the above if my assumption is not correct.

On Tue, Aug 25, 2015 at 6:19 PM, Andrew Xor wrote:
> This is not an issue, as that would probably be done through a partition
> aggregator after the groupBy.
>
> Kindly yours,
>
> Andrew Grammenos
>
> -- PGP PKey --
> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>
> On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji wrote:
>
>> Interesting. But wouldn't this be impacted by the Trident batch size?
>>
>> Assuming the batch boundary is like below, after bucketing you would
>> groupBy on the start time (but how would you sort it?) and, assuming it can
>> be sorted, we should be done with that batch. So if the batch boundary is
>> like below, you would end up with two different sets of sorts for events
>> which are supposed to be together (12:44, 12:45 & 12:46 below). If I
>> understand the original question, it is how to sort the full stream of
>> events irrespective of how they are processed in batches.
>>
>> 2013-03-22 12:43:00-07:00
>> 2013-03-22 12:44:00-07:00
>> 2013-03-22 12:45:00-07:00
>> 2013-03-22 12:49:00-07:00
>> 2013-03-22 12:47:00-07:00
>> --------------------------------------
>> 2013-03-22 12:48:00-07:00
>> 2013-03-22 12:46:00-07:00
>> 2013-03-22 12:51:00-07:00
>> 2013-03-22 12:50:00-07:00
>> 2013-03-22 12:52:00-07:00
>>
>> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor wrote:
>>
>>> Yes, unless I am missing something... try it, and if you have any more
>>> problems drop an email.
>>>
>>> Regards.
>>>
>>> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee wrote:
>>>
>>>> Wow, that code seems to be exactly what I want. I will read through it
>>>> and double check. I will still need a partition aggregator to actually
>>>> sort after bucketization, right?
>>>>
>>>> thanks
>>>>
>>>> On Aug 25, 2015, at 4:40 PM, Andrew Xor wrote:
>>>>
>>>> Sure, I found this code useful to start with; he does bucketization for
>>>> timed intervals in this gist
>>>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>>>
>>>> Hope this helps.
>>>>
>>>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee wrote:
>>>>
>>>>> All right, I will use Trident instead. Shameless of me to ask again,
>>>>> but is there any example code (particularly for sorting events by
>>>>> time) to study?
>>>>>
>>>>> thanks
>>>>>
>>>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor wrote:
>>>>>
>>>>> Well, if you just need to preserve the order of received (event)
>>>>> tuples, then why not use Trident instead? Trident ensures correct
>>>>> ordering (chronologically) as well as exactly-once processing without
>>>>> any gimmicks; sorting secondary to the event generation sounds like you
>>>>> will enter into quite a bit of hassle for no reason.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee wrote:
>>>>>
>>>>>> BTW, I am using spouts and bolts, currently not using Trident. Thanks
>>>>>>
>>>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor wrote:
>>>>>>
>>>>>> What do you mean by that? It's a bit vague, as timestamps can have
>>>>>> quite high resolution (for example minutes, seconds, msec), so you
>>>>>> will probably have to do a bit of bucketization before sorting them...
>>>>>> then by using a partition aggregator (in Trident at least) you can do
>>>>>> this very easily.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee wrote:
>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> Is there any sample code to sort the events by the timestamp field
>>>>>>> of a tuple?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> AL
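This is a minimal sketch of the window-based sorting suggested in the reply above: buffer a few events, then release them in timestamp order. It is plain Java with no Storm or Trident APIs. It makes two assumptions:

- All of a device's events reach a single buffer instance. Partitioning by device in Kafka, or a fields grouping on the device id in a bolt, is meant to guarantee this.
- Events are never more than a fixed `maxDelay` out of order.

`ReorderBuffer`, `add`, `flush` and `maxDelay` are hypothetical names. Kishore's ten timestamps are fed in their arrival order across both batches. A 3-minute delay releases all of them in order, because the worst disorder is 12:46 arriving after 12:49.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical per-device reorder buffer (plain Java, not a Storm API).
// An event is held until the newest timestamp seen is at least maxDelay ahead
// of it; it is then released in timestamp order. Events older than the last
// released one are counted as late and dropped.
class ReorderBuffer {
    private final Duration maxDelay;
    private final PriorityQueue<Instant> pending = new PriorityQueue<>();
    private Instant maxSeen = Instant.MIN;
    private Instant lastEmitted = Instant.MIN;
    private int lateCount = 0;

    ReorderBuffer(Duration maxDelay) {
        this.maxDelay = maxDelay;
    }

    // Adds one event and returns the events that can now be released, in order.
    List<Instant> add(Instant eventTime) {
        if (eventTime.isBefore(lastEmitted)) {
            lateCount++; // its slot in the output has already been passed
            return List.of();
        }
        pending.add(eventTime);
        if (eventTime.isAfter(maxSeen)) {
            maxSeen = eventTime;
        }
        Instant watermark = maxSeen.minus(maxDelay);
        List<Instant> ready = new ArrayList<>();
        while (!pending.isEmpty() && !pending.peek().isAfter(watermark)) {
            lastEmitted = pending.poll();
            ready.add(lastEmitted);
        }
        return ready;
    }

    // Releases everything still buffered (e.g. at end of stream or on a timeout).
    List<Instant> flush() {
        List<Instant> rest = new ArrayList<>();
        while (!pending.isEmpty()) {
            lastEmitted = pending.poll();
            rest.add(lastEmitted);
        }
        return rest;
    }

    int lateCount() {
        return lateCount;
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXX");
        // Kishore's example in arrival order, split across two batches.
        String[][] batches = {
            {"2013-03-22 12:43:00-07:00", "2013-03-22 12:44:00-07:00", "2013-03-22 12:45:00-07:00",
             "2013-03-22 12:49:00-07:00", "2013-03-22 12:47:00-07:00"},
            {"2013-03-22 12:48:00-07:00", "2013-03-22 12:46:00-07:00", "2013-03-22 12:51:00-07:00",
             "2013-03-22 12:50:00-07:00", "2013-03-22 12:52:00-07:00"}};
        ReorderBuffer buffer = new ReorderBuffer(Duration.ofMinutes(3));
        for (String[] batch : batches) {
            for (String ts : batch) {
                buffer.add(OffsetDateTime.parse(ts, fmt).toInstant()).forEach(System.out::println);
            }
        }
        buffer.flush().forEach(System.out::println);
    }
}
```

The delay is the knob for the "when do we stop waiting" question. With 2 minutes, 12:47 is released before 12:46 arrives, so 12:46 is counted as late and dropped. A larger delay tolerates more disorder, at the cost of latency and more buffered state. Because the buffer outlives a single batch, the batch boundary from Kishore's example does not split the sort.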
