Interesting. But wouldn't this be affected by the Trident batch size? Suppose the batch boundary falls as shown below. After bucketing you would groupBy on the start time (but how would you sort it?), and, assuming it can be sorted, we would be done with that batch. So with the batch boundary below, you would end up with two separately sorted sets for events that are supposed to be together (12:44, 12:45 and 12:46 below). If I understand the original question, it is how to sort the full stream of events irrespective of how they are processed in batches.
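To make the concern concrete, here is a minimal sketch in plain Python (not Trident code) that sorts each batch on its own, using the timestamps listed below. The names `sort_per_batch` and `batch_size` are hypothetical, and the sketch assumes the batch boundary falls after the fifth event, where the dashed line is.

```python
from datetime import datetime

# Timestamps from this message, in arrival order (the -07:00 offset is dropped for brevity).
ARRIVALS = ["12:43", "12:44", "12:45", "12:49", "12:47",
            "12:48", "12:46", "12:51", "12:50", "12:52"]


def parse(hhmm):
    """Turn an HH:MM string into a datetime on 2013-03-22."""
    return datetime.strptime("2013-03-22 " + hhmm, "%Y-%m-%d %H:%M")


def sort_per_batch(events, batch_size):
    """Split events into fixed-size batches and sort each batch independently.

    This imitates a per-batch sort; it is an illustration, not the Trident API.
    """
    batches = [events[i:i + batch_size] for i in range(0, len(events), batch_size)]
    return [sorted(batch) for batch in batches]


events = [parse(t) for t in ARRIVALS]
per_batch = sort_per_batch(events, batch_size=5)

# What a downstream consumer would see: one sorted batch after the other.
concatenated = [e for batch in per_batch for e in batch]

for batch in per_batch:
    print([e.strftime("%H:%M") for e in batch])
print("globally sorted:", concatenated == sorted(events))
```

Each batch comes out sorted, but 12:46 lands in the second batch and is emitted after 12:47 and 12:49, so the concatenated output is not in timestamp order.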

2013-03-22 12:43:00-07:00
2013-03-22 12:44:00-07:00
2013-03-22 12:45:00-07:00
2013-03-22 12:49:00-07:00
2013-03-22 12:47:00-07:00
--------------------------------------
2013-03-22 12:48:00-07:00
2013-03-22 12:46:00-07:00
2013-03-22 12:51:00-07:00
2013-03-22 12:50:00-07:00
2013-03-22 12:52:00-07:00

On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <[email protected]> wrote:

> Yes, unless I am missing something... try it and if you have any more
> problems drop an email.
>
> Regards.
>
> Kindly yours,
>
> Andrew Grammenos
>
> -- PGP PKey --
> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>
> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <[email protected]> wrote:
>
>> WoW, that code seems to be exactly I want, will read through, double
>> check, I will still need a partition aggregator to actually sorting after
>> bucketization, right?
>>
>> thanks
>>
>>
>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <[email protected]>
>> wrote:
>>
>> Sure, I found this code useful to start with; he does bucketization for
>> timed intervals in this gist
>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>
>> Hope this helps.
>>
>> Kindly yours,
>>
>> Andrew Grammenos
>>
>> -- PGP PKey --
>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>>
>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <[email protected]> wrote:
>>
>>> All right, will do trident instead, shameless to ask again, any example
>>> code (particularly for events time sorting) to study?
>>>
>>> thanks
>>>
>>>
>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <[email protected]>
>>> wrote:
>>>
>>> Well, if you need to just preserve the order of received (event) tuples
>>> then why not use trident instead? Trident ensures correct ordering
>>> (chronologically) as well as exactly once processing without any gimmicks;
>>> sorting it secondary to the event generation sounds like you will enter
>>> into quite a bit of hassle for no reason.
>>>
>>> Regards.
>>>
>>> Kindly yours,
>>>
>>> Andrew Grammenos
>>>
>>> -- PGP PKey --
>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>>>
>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <[email protected]> wrote:
>>>
>>>> BTW, I am using spout and bolts, currently not using trident. Thanks
>>>>
>>>>
>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor <[email protected]>
>>>> wrote:
>>>>
>>>> What do you mean by that? It's a bit vague as timestamps can have quite
>>>> high resolution (like for example minutes, seconds, msec) so you will
>>>> probably have to do a bit of bucketization before sorting them.... then by
>>>> using a partition aggregator (in Trident at least) you can to this very
>>>> easily.
>>>>
>>>> Hope this helps.
>>>>
>>>> Kindly yours,
>>>>
>>>> Andrew Grammenos
>>>>
>>>> -- PGP PKey --
>>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
>>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
>>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
>>>>
>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <[email protected]> wrote:
>>>>
>>>>> Hi, all
>>>>>
>>>>> is there any sample codes to sort the events in terms of the
>>>>> timestamps field of a tuple?
>>>>>
>>>>> thanks
>>>>>
>>>>>
>>>>> AL
