This should not be an issue, as the sorting would probably be done by a
partition aggregator after the groupBy.
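
A minimal sketch of that idea, independent of the Storm API, using the
timestamps from the example quoted below: events are grouped by bucket start
time and each bucket is sorted, with bucket contents kept across batches so
that a batch boundary does not split a bucket. The class name BucketedSorter,
the 5-minute bucket width, and the in-memory map are assumptions made for
illustration; in Trident the per-bucket state would have to live in whatever
aggregator or state is used after the groupBy.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Storm or Trident.
class BucketedSorter {
    private final long bucketMinutes;
    // Bucket start (in epoch minutes) -> events seen so far.
    // Assumption: this map outlives a single batch.
    private final TreeMap<Long, List<OffsetDateTime>> buckets = new TreeMap<>();

    BucketedSorter(long bucketMinutes) {
        this.bucketMinutes = bucketMinutes;
    }

    // Start of the bucket containing ts, in minutes since the epoch.
    long bucketStart(OffsetDateTime ts) {
        long epochMinutes = Math.floorDiv(ts.toEpochSecond(), 60L);
        return epochMinutes - Math.floorMod(epochMinutes, bucketMinutes);
    }

    // Adds one batch; events are appended to their bucket, whatever batch they arrive in.
    void addBatch(List<OffsetDateTime> batch) {
        for (OffsetDateTime ts : batch) {
            buckets.computeIfAbsent(bucketStart(ts), k -> new ArrayList<>()).add(ts);
        }
    }

    // All events seen so far: buckets in start-time order, each bucket sorted.
    List<OffsetDateTime> sorted() {
        List<OffsetDateTime> out = new ArrayList<>();
        for (List<OffsetDateTime> bucket : buckets.values()) {
            List<OffsetDateTime> copy = new ArrayList<>(bucket);
            Collections.sort(copy);
            out.addAll(copy);
        }
        return out;
    }
}
```

Feeding the two batches from the quoted example through addBatch and then
calling sorted() yields 12:43 through 12:52 in order. The sketch does not
decide when a bucket is complete and safe to emit; that needs a bound on how
late an event can arrive.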

Kindly yours,

Andrew Grammenos

-- PGP PKey --
<https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
<https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>

On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <[email protected]> wrote:

> Interesting. But wouldn't this be impacted by the Trident batch size?
>
> After bucketing you would groupBy on the start time (but how would you
> sort it?) and, assuming it can be sorted, we would be done with that
> batch. So if the batch boundary is like below, you would end up with two
> different sets of sorts for events which are supposed to be together
> (12:44, 12:45 & 12:46 below). If I understand the original question, it is
> how to sort the full stream of events irrespective of how they are
> processed in batches.
>
> 2013-03-22 12:43:00-07:00
> 2013-03-22 12:44:00-07:00
> 2013-03-22 12:45:00-07:00
> 2013-03-22 12:49:00-07:00
> 2013-03-22 12:47:00-07:00
> --------------------------------------
> 2013-03-22 12:48:00-07:00
> 2013-03-22 12:46:00-07:00
> 2013-03-22 12:51:00-07:00
> 2013-03-22 12:50:00-07:00
> 2013-03-22 12:52:00-07:00
>
>
>
>
> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <[email protected]>
> wrote:
>
>> Yes, unless I am missing something... try it and if you have any more
>> problems drop an email.
>>
>> Regards.
>>
>> Kindly yours,
>>
>> Andrew Grammenos
>>
>> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <[email protected]> wrote:
>>
>>> Wow, that code seems to be exactly what I want. I will read through it
>>> and double check. I will still need a partition aggregator to actually
>>> do the sorting after bucketization, right?
>>>
>>> thanks
>>>
>>>
>>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <[email protected]>
>>> wrote:
>>>
>>> Sure, I found this code useful to start with; the author does
>>> bucketization over time intervals in this gist
>>> <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>>
>>> Hope this helps.
>>>
>>> Kindly yours,
>>>
>>> Andrew Grammenos
>>>
>>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <[email protected]> wrote:
>>>
>>>> All right, I will use Trident instead. Sorry to ask again, but is there
>>>> any example code (particularly for event-time sorting) to study?
>>>>
>>>> thanks
>>>>
>>>>
>>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <[email protected]>
>>>> wrote:
>>>>
>>>> Well, if you just need to preserve the order of received (event) tuples,
>>>> then why not use Trident instead? Trident ensures correct ordering
>>>> (chronologically) as well as exactly-once processing without any gimmicks;
>>>> sorting them after the events are generated sounds like it will cause you
>>>> quite a bit of hassle for no reason.
>>>>
>>>> Regards.
>>>>
>>>> Kindly yours,
>>>>
>>>> Andrew Grammenos
>>>>
>>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <[email protected]> wrote:
>>>>
>>>>> BTW, I am using spouts and bolts; I am currently not using Trident. Thanks
>>>>>
>>>>>
>>>>> On Aug 25, 2015, at 3:47 PM, Andrew Xor <[email protected]>
>>>>> wrote:
>>>>>
>>>>> What do you mean by that? It's a bit vague, as timestamps can have
>>>>> quite high resolution (for example minutes, seconds, or milliseconds),
>>>>> so you will probably have to do a bit of bucketization before sorting
>>>>> them. Then, by using a partition aggregator (in Trident at least), you
>>>>> can do this very easily.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Kindly yours,
>>>>>
>>>>> Andrew Grammenos
>>>>>
>>>>> On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> Is there any sample code to sort events by the timestamp field of a
>>>>>> tuple?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>
>>>>>> AL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
