I don't think it's easier or harder to learn... but both have pros and cons; in your case the semantics that you are trying to apply in your particular scenario sound more like a use-case for a Trident based topology that's all.
Regards. Kindly yours, Andrew Grammenos -- PGP PKey -- <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> On Wed, Aug 26, 2015 at 8:35 PM, Alec Lee <[email protected]> wrote: > For dealing with such type of problem, seems trident is better than > spout+bolts even latter is easier to understand and learn? > > AL > > On Aug 25, 2015, at 9:31 PM, Kishore Senji <[email protected]> wrote: > > Agreed. This makes sense if the aggregation is on fields etc. > > Although Alec did not mention it in this post, based on his previous posts > on the same topic, I would assume he is trying to sort the events because > he wanted to "fill in" the missing events (smoothening the curve so to > speak) by looking at the previous and next events of the missed timestamp > and then do some stream processing on top of it (like for example alerting > based on sliding window). Assuming that is the scenario, I guess then he > would have to keep more metadata in the State so that he can fill in those > events but the question would be when would he stop looking for missing > events and fill them and move on (as they can come in different batches), > plus he would have to do some stream processing (or store them to ES for > later search for example) in the State itself if there is any such > processing. This is where I think it gets tricky to do this in the > partition aggregator. > > So in our earlier posts we suggested he can do the the appropriate > partitioning in Kafka (so that events from a given device ends up in the > same partition) and he could do the window based sorting (by buffering few > events) in the Stream processing. > > Alec, Please ignore the above if my assumption is not correct. > > > On Tue, Aug 25, 2015 at 6:19 PM, Andrew Xor <[email protected]> > wrote: > >> This is not an issue, as that probably would be done through a partition >> aggregator after the groupBy. >> >> Kindly yours, >> >> Andrew Grammenos >> >> -- PGP PKey -- >> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt >> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> >> >> On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <[email protected]> wrote: >> >>> Interesting. But wouldn't this be impacted by the trident batch size? >>> >>> Assuming the batch boundary is like below, after bucketing you would >>> groupBy on the start time (but how would you sort it?) and assumed it can >>> be sorted, we should be done with that batch. so if the batch boundary is >>> like below, you would end up with two different sets of sorts for events >>> which are supposed to be together (12:44, 12:45 & 12:46 below). If I >>> understand the original question, it is how to sort the full stream of >>> events irrespective of how they are processed in batches. >>> >>> 2013-03-22 12:43:00-07:00 >>> 2013-03-22 12:44:00-07:00 >>> 2013-03-22 12:45:00-07:00 >>> 2013-03-22 12:49:00-07:00 >>> 2013-03-22 12:47:00-07:00 >>> -------------------------------------- >>> 2013-03-22 12:48:00-07:00 >>> 2013-03-22 12:46:00-07:00 >>> 2013-03-22 12:51:00-07:00 >>> 2013-03-22 12:50:00-07:00 >>> 2013-03-22 12:52:00-07:00 >>> >>> >>> >>> >>> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <[email protected] >>> > wrote: >>> >>>> Yes, unless I am missing something... try it and if you have any more >>>> problems drop an email. >>>> >>>> Regards. >>>> >>>> Kindly yours, >>>> >>>> Andrew Grammenos >>>> >>>> -- PGP PKey -- >>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt >>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> >>>> >>>> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <[email protected]> wrote: >>>> >>>>> WoW, that code seems to be exactly I want, will read through, double >>>>> check, I will still need a partition aggregator to actually sorting after >>>>> bucketization, right? >>>>> >>>>> thanks >>>>> >>>>> >>>>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <[email protected]> >>>>> wrote: >>>>> >>>>> Sure, I found this code useful to start with; he does bucketization >>>>> for timed intervals in this gist >>>>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>. >>>>> >>>>> Hope this helps. >>>>> >>>>> Kindly yours, >>>>> >>>>> Andrew Grammenos >>>>> >>>>> -- PGP PKey -- >>>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >>>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt >>>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> >>>>> >>>>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <[email protected]> >>>>> wrote: >>>>> >>>>>> All right, will do trident instead, shameless to ask again, any >>>>>> example code (particularly for events time sorting) to study? >>>>>> >>>>>> thanks >>>>>> >>>>>> >>>>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <[email protected]> >>>>>> wrote: >>>>>> >>>>>> Well, if you need to just preserve the order of received (event) >>>>>> tuples then why not use trident instead? Trident ensures correct ordering >>>>>> (chronologically) as well as exactly once processing without any >>>>>> gimmicks; >>>>>> sorting it secondary to the event generation sounds like you will enter >>>>>> into quite a bit of hassle for no reason. >>>>>> >>>>>> Regards. >>>>>> >>>>>> Kindly yours, >>>>>> >>>>>> Andrew Grammenos >>>>>> >>>>>> -- PGP PKey -- >>>>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >>>>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt >>>>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> >>>>>> >>>>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> BTW, I am using spout and bolts, currently not using trident. Thanks >>>>>>> >>>>>>> >>>>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>> What do you mean by that? It's a bit vague as timestamps can have >>>>>>> quite high resolution (like for example minutes, seconds, msec) so you >>>>>>> will >>>>>>> probably have to do a bit of bucketization before sorting them.... then >>>>>>> by >>>>>>> using a partition aggregator (in Trident at least) you can to this very >>>>>>> easily. >>>>>>> >>>>>>> Hope this helps. >>>>>>> >>>>>>> Kindly yours, >>>>>>> >>>>>>> Andrew Grammenos >>>>>>> >>>>>>> -- PGP PKey -- >>>>>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >>>>>>> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt >>>>>>> <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt> >>>>>>> >>>>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, all >>>>>>>> >>>>>>>> is there any sample codes to sort the events in terms of the >>>>>>>> timestamps field of a tuple? >>>>>>>> >>>>>>>> thanks >>>>>>>> >>>>>>>> >>>>>>>> AL >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> > >
