Re: Distinct values with range

Andrey Kuprianov Sun, 14 Apr 2013 23:53:32 -0700

Btw, is the reduce function that you mentioned supposed to basically output
de-duplicated keys?
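
For anyone reading along, here is a rough sketch of why a grouped reduce ends up de-duplicating keys. It is plain JavaScript that only simulates the behaviour: with group=true, a reduce view returns one row per distinct key, so a built-in reduce such as _count collapses repeated [day, country] keys into a single row each. The sample rows and the groupedCount helper are made up for illustration and are not part of CouchDB.

```javascript
// Hypothetical rows, as a map function keyed on [day, country] might emit them.
const emitted = [
  { key: [20130414, "Australia"], value: 1 },
  { key: [20130414, "Australia"], value: 1 },
  { key: [20130414, "Japan"], value: 1 },
  { key: [20130415, "Australia"], value: 1 },
];

// Simulates a grouped _count reduce: one output row per distinct key.
// (Illustrative helper only; CouchDB does this internally.)
function groupedCount(rows) {
  const counts = new Map();
  for (const row of rows) {
    const k = JSON.stringify(row.key); // arrays compare by value once serialized
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return Array.from(counts, ([k, value]) => ({ key: JSON.parse(k), value }));
}

const reduced = groupedCount(emitted);
console.log(reduced);
```

Note that the keys are unique per [day, country] pair, not per country, so a country seen on several days still shows up several times and needs one more filtering pass.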



On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <
[email protected]> wrote:

> Thanks. I'll try the lists. Completely forgot about them, actually.
>
>
>
> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <[email protected]> wrote:
>
>> Not sure if it's ideal, but if you need dates in epoch millis, you could
>> round the timestamp to the floor of the current day (say midnight) in a map
>> function, use a built-in reduce... Then use a list function to filter
>> unique countries.
>>
>> If you don't need a real timestamp value, use an integer like YYYYMMDD
>> (e.g. 20130710 for 2013-Jul-10).
>>
>> Reduce = true will combine by day, making at most (196 countries x number
>> of days in range) rows to filter in the list function.
>>
>> - JK
>>
>>
>>
>> Sent from my iPad
>>
>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <
>> [email protected]> wrote:
>>
>> > Hi guys,
>> >
>> > Just for the sake of debate, here's the question. There are transactions.
>> > Among all other attributes, each has a timestamp (when the transaction
>> > was made, in seconds) and a country name (where the transaction was made
>> > from). So, for instance,
>> >
>> > {
>> >    . . . .
>> >    "timestamp": 1332806400,
>> >    "country_name": "Australia",
>> >    . . . .
>> > }
>> >
>> > Question is: how does one get unique / distinct country names in between
>> > dates? For example, give me all country names in between 10-Jul-2010 and
>> > 21-Jan-2013.
>> >
>> > My solution was to write a custom reduce function and set
>> > reduce_limit=false, so that I can enumerate all countries without hitting
>> > the overflow exception. It works great! However, such solutions are
>> > frowned upon by everyone around. Does anyone have a better idea on how to
>> > tackle this efficiently?
>> >
>> >    Andrey
>>
>
>
