This should not be an issue, as the sorting would probably be done by a partition aggregator after the groupBy.
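
To make the batch-boundary question from the quoted message concrete, here is a minimal sketch in plain Python (not Trident code). It assumes a 5-minute bucket width and reuses the two batches of timestamps from the quoted example. The helper names `bucket_start`, `sort_batch` and `merge_batches` are hypothetical, and the merge step is only one possible way to combine per-batch results, not something taken from the Trident API.

```python
from collections import defaultdict
from datetime import datetime

parse = datetime.fromisoformat


def bucket_start(ts, minutes=5):
    """Round a timestamp down to the start of its bucket (assumed 5 minutes wide)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def sort_batch(batch, minutes=5):
    """Group one batch by bucket start, then sort each group by timestamp."""
    groups = defaultdict(list)
    for ts in batch:
        groups[bucket_start(ts, minutes)].append(ts)
    return {start: sorted(events) for start, events in groups.items()}


def merge_batches(results):
    """Combine per-batch results so each bucket holds one sorted run."""
    merged = defaultdict(list)
    for result in results:
        for start, events in result.items():
            merged[start].extend(events)
    return {start: sorted(events) for start, events in sorted(merged.items())}


# The two batches from the quoted example, split at the dashed line.
batch_1 = [parse(f"2013-03-22 12:{m}:00-07:00") for m in (43, 44, 45, 49, 47)]
batch_2 = [parse(f"2013-03-22 12:{m}:00-07:00") for m in (48, 46, 51, 50, 52)]

first = sort_batch(batch_1)
second = sort_batch(batch_2)
merged = merge_batches([first, second])

for start, events in merged.items():
    print(start.strftime("%H:%M"), [e.strftime("%H:%M") for e in events])
```

In this sketch the 12:45 bucket is sorted separately in each batch (12:45, 12:47, 12:49 in the first and 12:46, 12:48 in the second). Only the merged result holds those five events in one ordered run, so sorting across batches needs some step that carries a bucket's contents from one batch to the next.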

Kindly yours,

Andrew Grammenos

-- PGP PKey --
<https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
<https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>

On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <[email protected]> wrote:

> Interesting. But wouldn't this be impacted by the Trident batch size?
>
> Assume the batch boundary is like below. After bucketing you would
> groupBy on the start time (but how would you sort it?) and, assuming it
> can be sorted, we would be done with that batch. So with the batch
> boundary below, you would end up with two different sets of sorts for
> events which are supposed to be together (12:44, 12:45 & 12:46 below).
> If I understand the original question, it is how to sort the full stream
> of events irrespective of how they are processed in batches.
>
> 2013-03-22 12:43:00-07:00
> 2013-03-22 12:44:00-07:00
> 2013-03-22 12:45:00-07:00
> 2013-03-22 12:49:00-07:00
> 2013-03-22 12:47:00-07:00
> --------------------------------------
> 2013-03-22 12:48:00-07:00
> 2013-03-22 12:46:00-07:00
> 2013-03-22 12:51:00-07:00
> 2013-03-22 12:50:00-07:00
> 2013-03-22 12:52:00-07:00
>
> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <[email protected]> wrote:
>
>> Yes, unless I am missing something... try it, and if you have any more
>> problems drop an email.
>>
>> Regards.
>>
>> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <[email protected]> wrote:
>>
>>> Wow, that code seems to be exactly what I want. I will read through it
>>> and double check. I will still need a partition aggregator to actually
>>> do the sorting after bucketization, right?
>>>
>>> thanks
>>>
>>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <[email protected]> wrote:
>>>
>>> Sure, I found this code useful to start with; he does bucketization for
>>> timed intervals in this gist
>>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>>
>>> Hope this helps.
>>>
>>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <[email protected]> wrote:
>>>
>>>> All right, will do Trident instead. Shameless to ask again, but is
>>>> there any example code (particularly for event time sorting) to study?
>>>>
>>>> thanks
>>>>
>>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <[email protected]> wrote:
>>>>
>>>> Well, if you just need to preserve the order of received (event) tuples
>>>> then why not use Trident instead? Trident ensures correct
>>>> (chronological) ordering as well as exactly-once processing without any
>>>> gimmicks; sorting the tuples as a separate step after event generation
>>>> sounds like quite a bit of hassle for no reason.
>>>>
>>>> Regards.
>>>>
>>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <[email protected]> wrote:
>>>>
>>>>> BTW, I am using spouts and bolts, currently not using Trident. Thanks
>>>>>
>>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor <[email protected]> wrote:
>>>>>
>>>>> What do you mean by that? It's a bit vague, as timestamps can have
>>>>> quite high resolution (for example minutes, seconds, msec), so you
>>>>> will probably have to do a bit of bucketization before sorting them.
>>>>> Then, by using a partition aggregator (in Trident at least), you can
>>>>> do this very easily.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <[email protected]> wrote:
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> Is there any sample code to sort events by the timestamp field of
>>>>>> a tuple?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> AL
