For this type of problem, it seems Trident is better than spout+bolts, even though the latter is easier to understand and learn?
AL

> On Aug 25, 2015, at 9:31 PM, Kishore Senji <[email protected]> wrote:
>
> Agreed. This makes sense if the aggregation is on fields etc.
>
> Although Alec did not mention it in this post, based on his previous posts on
> the same topic, I would assume he is trying to sort the events because he
> wanted to "fill in" the missing events (smoothing the curve, so to speak) by
> looking at the previous and next events of the missed timestamp and then do
> some stream processing on top of it (for example, alerting based on a
> sliding window). Assuming that is the scenario, I guess he would then have to
> keep more metadata in the State so that he can fill in those events, but the
> question would be when he would stop looking for missing events, fill them
> and move on (as they can come in different batches), plus he would have to do
> some stream processing (or store them to ES for later search, for example) in
> the State itself if there is any such processing. This is where I think it
> gets tricky to do this in the partition aggregator.
>
> So in our earlier posts we suggested he can do the appropriate
> partitioning in Kafka (so that events from a given device end up in the same
> partition) and he could do the window-based sorting (by buffering a few
> events) in the Stream processing.
>
> Alec, please ignore the above if my assumption is not correct.
>
>
> On Tue, Aug 25, 2015 at 6:19 PM, Andrew Xor <[email protected]> wrote:
> This is not an issue, as that probably would be done through a partition
> aggregator after the groupBy.
>
> Kindly yours,
>
> Andrew Grammenos
>
> -- PGP PKey --
> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>
> On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <[email protected]> wrote:
> Interesting. But wouldn't this be impacted by the trident batch size?
>
> Assuming the batch boundary is like below, after bucketing you would groupBy
> on the start time (but how would you sort it?) and, assuming it can be
> sorted, we should be done with that batch. So if the batch boundary is like
> below, you would end up with two different sets of sorts for events which are
> supposed to be together (12:44, 12:45 & 12:46 below). If I understand the
> original question, it is how to sort the full stream of events irrespective
> of how they are processed in batches.
>
> 2013-03-22 12:43:00-07:00
> 2013-03-22 12:44:00-07:00
> 2013-03-22 12:45:00-07:00
> 2013-03-22 12:49:00-07:00
> 2013-03-22 12:47:00-07:00
> --------------------------------------
> 2013-03-22 12:48:00-07:00
> 2013-03-22 12:46:00-07:00
> 2013-03-22 12:51:00-07:00
> 2013-03-22 12:50:00-07:00
> 2013-03-22 12:52:00-07:00
>
>
> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <[email protected]> wrote:
> Yes, unless I am missing something... try it and if you have any more
> problems drop an email.
>
> Regards.
>
> Kindly yours,
>
> Andrew Grammenos
>
> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <[email protected]> wrote:
> WoW, that code seems to be exactly what I want, will read through, double
> check. I will still need a partition aggregator to actually sort after
> bucketization, right?
>
> thanks
>
>
>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <[email protected]> wrote:
>>
>> Sure, I found this code useful to start with; he does bucketization for
>> timed intervals in this gist
>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>
>> Hope this helps.
>>
>> Kindly yours,
>>
>> Andrew Grammenos
>>
>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <[email protected]> wrote:
>> All right, will do trident instead, shameless to ask again, any example code
>> (particularly for events time sorting) to study?
>>
>> thanks
>>
>>
>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <[email protected]> wrote:
>>>
>>> Well, if you need to just preserve the order of received (event) tuples
>>> then why not use trident instead? Trident ensures correct ordering
>>> (chronologically) as well as exactly once processing without any gimmicks;
>>> sorting it secondary to the event generation sounds like you will enter
>>> into quite a bit of hassle for no reason.
>>>
>>> Regards.
>>>
>>> Kindly yours,
>>>
>>> Andrew Grammenos
>>>
>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <[email protected]> wrote:
>>> BTW, I am using spout and bolts, currently not using trident. Thanks
>>>
>>>
>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor <[email protected]> wrote:
>>>>
>>>> What do you mean by that? It's a bit vague as timestamps can have quite
>>>> high resolution (for example minutes, seconds, msec) so you will
>>>> probably have to do a bit of bucketization before sorting them.... then by
>>>> using a partition aggregator (in Trident at least) you can do this very
>>>> easily.
>>>>
>>>> Hope this helps.
>>>>
>>>> Kindly yours,
>>>>
>>>> Andrew Grammenos
>>>>
>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <[email protected]> wrote:
>>>> Hi, all
>>>>
>>>> is there any sample code to sort the events in terms of the timestamp
>>>> field of a tuple?
>>>>
>>>> thanks
>>>>
>>>>
>>>> AL
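
For reference, the batch-boundary concern and the "buffer a few events" suggestion from the thread can be illustrated with a small sketch. This is plain Python, not Storm or Trident code; the `reorder` helper and the `window` parameter are hypothetical names made up for this illustration, and the approach only works under the assumption that no event arrives more than `window` positions late. The timestamps are the ten from the batch example in the thread.

```python
import heapq
from datetime import datetime


def reorder(timestamps, window):
    """Yield timestamps in sorted order using a bounded buffer.

    Hypothetical helper: holds at most `window` events in a min-heap and
    emits the oldest one each time the buffer overflows. The output is fully
    sorted only if no event is displaced by more than `window` positions.
    """
    heap = []
    for ts in timestamps:
        heapq.heappush(heap, ts)
        if len(heap) > window:
            yield heapq.heappop(heap)
    # End of stream: flush whatever is still buffered, oldest first.
    while heap:
        yield heapq.heappop(heap)


# Arrival order from the thread; the batch boundary falls after the fifth event.
MINUTES = [43, 44, 45, 49, 47, 48, 46, 51, 50, 52]
events = [datetime.fromisoformat(f"2013-03-22 12:{m}:00-07:00") for m in MINUTES]

# Sorting each batch on its own leaves 12:46 stranded after 12:49.
per_batch = sorted(events[:5]) + sorted(events[5:])

# A buffer that spans the batch boundary restores the full order.
buffered = list(reorder(events, window=3))

print(per_batch == sorted(events))  # False
print(buffered == sorted(events))   # True
```

In this data the 12:46 event arrives three positions late, so a window of 3 is the smallest that works; a window of 2 would still emit 12:47 before 12:46. Choosing the window is the same trade-off raised in the thread: a larger buffer tolerates later arrivals but delays every event by that much.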
