Btw, is the reduce function that you mentioned supposed to basically output de-duplicated keys?
On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <[email protected]> wrote:

> Thanks. I'll try the lists. Completely forgot about them actually
>
> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <[email protected]> wrote:
>
>> Not sure if it's ideal, but if you need dates in epoch millis, you could
>> round the timestamp to the floor of the current day (say midnight) in a
>> map function, use a built-in reduce... Then use a list function to filter
>> unique countries.
>>
>> If you don't need a real timestamp value, use an integer like YYYYMMDD
>> (e.g. 20130710 for 2013-Jul-10).
>>
>> Reduce = true will combine by day, making at most (196 countries x number
>> of days in range) rows to filter in the list function.
>>
>> - JK
>>
>> Sent from my iPad
>>
>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <[email protected]> wrote:
>>
>> > Hi guys,
>> >
>> > Just for the sake of a debate. Here's the question. There are
>> > transactions. Among all other attributes there's a timestamp (when the
>> > transaction was made; in seconds) and a country name (from where the
>> > transaction was made). So, for instance,
>> >
>> > {
>> >     . . . .
>> >     "timestamp": 1332806400,
>> >     "country_name": "Australia",
>> >     . . . .
>> > }
>> >
>> > Question is: how does one get unique / distinct country names in between
>> > dates? For example, give me all country names in between 10-Jul-2010 and
>> > 21-Jan-2013.
>> >
>> > My solution was to write a custom reduce function and set
>> > reduce_limit=false, so that I can enumerate all countries without
>> > hitting the overflow exception. It works great! However, such solutions
>> > are frowned upon by everyone around. Does anyone have a better idea on
>> > how to tackle this efficiently?
>> >
>> > Andrey
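For reference, the map / built-in reduce / list approach from the quoted reply could be sketched roughly like this. It is only a sketch under some assumptions: the names dayKey, mapTransaction, simulateGroupedView and uniqueCountries are made up for illustration, the view key is assumed to be [day, country_name], and the reduce is assumed to be the built-in _count. Queried with group=true, such a view returns one row per distinct key, so the reduce step is what collapses duplicate keys within a day, and the list function only has to collapse the same country across days. The small simulation stands in for CouchDB so the logic can be run locally.

```javascript
// Sketch only: all function names here are hypothetical, not CouchDB API.
var SECONDS_PER_DAY = 86400;

// Floor an epoch timestamp (in seconds) to midnight UTC of the same day.
function dayKey(timestampSeconds) {
  return Math.floor(timestampSeconds / SECONDS_PER_DAY) * SECONDS_PER_DAY;
}

// Body of the map function. Inside a design document this would be
// function (doc) { ... } calling CouchDB's global emit(); emit is passed
// in as a parameter here so the sketch also runs outside CouchDB.
function mapTransaction(doc, emit) {
  if (typeof doc.timestamp === "number" && doc.country_name) {
    emit([dayKey(doc.timestamp), doc.country_name], 1);
  }
}

// Local stand-in for querying the view with a _count reduce and group=true:
// one row per distinct [day, country] key. Rows come back in insertion
// order here, whereas CouchDB would return them sorted by key.
function simulateGroupedView(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    mapTransaction(doc, function (key, value) {
      var id = JSON.stringify(key);
      counts[id] = (counts[id] || 0) + value;
    });
  });
  return Object.keys(counts).map(function (id) {
    return { key: JSON.parse(id), value: counts[id] };
  });
}

// What the list function would do with the grouped rows: keep each country
// once. In a real list function the rows would be read with getRow() and
// the result written with send().
function uniqueCountries(rows) {
  var seen = {};
  var result = [];
  rows.forEach(function (row) {
    var country = row.key[1];
    if (!Object.prototype.hasOwnProperty.call(seen, country)) {
      seen[country] = true;
      result.push(country);
    }
  });
  return result;
}

var sampleRows = simulateGroupedView([
  { timestamp: 1332806400, country_name: "Australia" },
  { timestamp: 1332810000, country_name: "Australia" },
  { timestamp: 1332892800, country_name: "Australia" },
  { timestamp: 1332892800, country_name: "Singapore" }
]);
console.log(uniqueCountries(sampleRows));
```

The date range would then be selected on the view query itself, with startkey=[dayKey(start)] and endkey=[dayKey(end), {}], so the list function never sees more than (countries x days in range) rows.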
