I believe you need to query with startkey and endkey as complex keys
(assuming YYYY-MM-DD):
startkey=[startyear,startmonth,startday]
endkey=[endyear,endmonth,endday,{}]
Then you can extract the countries from the key returned with each row (it
will be the last element in the array). You will also need to set the group
view parameter (group_level=4?) for distinct values.
With that, you should not need to write a custom reduce function.
The startkey and endkey must be proper JSON (and URL) encoded values.
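
For example, with a map that emits [year, month, day, country] keys (as in
Keith's example below) and a _count reduce, the query might look like this
(the database, design document and view names here are just placeholders,
and the keys are shown unencoded for readability):

  GET /db/_design/stats/_view/by_day_country
      ?startkey=[2010,7,10]&endkey=[2013,1,21,{}]&group_level=4

group_level=4 collapses the output to one row per distinct
[year, month, day, country] combination; the client then reduces those to
distinct countries, e.g.:

  // result is the parsed JSON response from the view query
  var seen = {};
  result.rows.forEach(function (row) {
    seen[row.key[3]] = true; // country is the last key element
  });
  var countries = Object.keys(seen);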
My understanding is that this is the correct approach.
Cheers!
On 16 April 2013 05:46, Andrey Kuprianov <[email protected]> wrote:
> Nope, I need distinct values over a period of time. Not per day.
>
>
> On Tue, Apr 16, 2013 at 11:30 AM, Keith Gable <[email protected]> wrote:
>
> > It gives you distinct countries per day. Is that not what you want? With
> > reduce, it should be really fast once the view is built.
> > On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <[email protected]> wrote:
> >
> > > @Keith, your method will not give me distinct countries, and even
> > > with reduce and after being fed to a list function it's still slow.
> > >
> > >
> > >
> > > > On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <[email protected]> wrote:
> > >
> > > > I agree with this approach. I do something similar using _sum:
> > > >
> > > > emit([doc.country_name, toDay(doc.timestamp)], 1);
> > > >
> > > > The toDay() method is basically a floor of the day value. Since I
> > > > don't store the timestamp in UTC (because of an idiotic error some
> > > > years back), I also apply a tz offset to correct the day value in my
> > > > toDay() method.
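> > > >
> > > > For illustration, a minimal version of such a helper might look like
> > > > this (just a sketch; it assumes a seconds-based timestamp and a fixed
> > > > offset, which is not exactly how my real toDay() is written):
> > > >
> > > > var TZ_OFFSET = -8 * 3600; // hypothetical local offset, in seconds
> > > > function toDay(ts) {
> > > >   // floor the offset-corrected timestamp to the start of its day
> > > >   return Math.floor((ts + TZ_OFFSET) / 86400) * 86400;
> > > > }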
> > > >
> > > > Using reduce is by far the fastest method for this. I don't see any
> > > > issue with getting this to scale.
> > > >
> > > > Overall, I rather prefer the method Keith shows, as it depends on
> > > > the values returned by the date object rather than other, possibly
> > > > inaccurate, means using math.
> > > >
> > > > Wendall
> > > >
> > > >
> > > > On 04/15/2013 07:18 AM, Keith Gable wrote:
> > > >
> > > >> Output keys like so:
> > > >>
> > > >> [2010, 7, 10, "Australia"]
> > > >>
> > > >> Reduce function would be _count.
> > > >>
> > > >> startkey=[year,month,day,null]
> > > >> endkey=[year,month,day,{}]
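> > > >>
> > > >> A sketch of a map function that produces such keys (assuming the
> > > >> timestamp is in seconds and interpreting it as UTC):
> > > >>
> > > >> function (doc) {
> > > >>   if (doc.timestamp && doc.country_name) {
> > > >>     var d = new Date(doc.timestamp * 1000); // seconds -> ms
> > > >>     emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
> > > >>           doc.country_name], null);
> > > >>   }
> > > >> }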
> > > >>
> > > >> ---
> > > >> Keith Gable
> > > >> A+, Network+, and Storage+ Certified Professional
> > > >> Apple Certified Technical Coordinator
> > > >> Mobile Application Developer / Web Developer
> > > >>
> > > >>
> > > >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <[email protected]> wrote:
> > > >>
> > > >>> Hi guys,
> > > >>>
> > > >>> Just for the sake of a debate, here's the question. There are
> > > >>> transactions. Among their other attributes, each has a timestamp
> > > >>> (when the transaction was made, in seconds) and a country name
> > > >>> (where the transaction was made from). So, for instance,
> > > >>>
> > > >>> {
> > > >>> . . . .
> > > >>> "timestamp": 1332806400
> > > >>> "country_name": "Australia",
> > > >>> . . . .
> > > >>> }
> > > >>>
> > > >>> The question is: how does one get unique / distinct country names
> > > >>> between two dates? For example, give me all country names between
> > > >>> 10-Jul-2010 and 21-Jan-2013.
> > > >>>
> > > >>> My solution was to write a custom reduce function and set
> > > >>> reduce_limit=false, so that I can enumerate all countries without
> > > >>> hitting the overflow exception. It works great!
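> > > >>>
> > > >>> Roughly, with the map emitting doc.country_name as the value, the
> > > >>> reduce looks like this (just a sketch; the real one has a bit more
> > > >>> to it):
> > > >>>
> > > >>> function (keys, values, rereduce) {
> > > >>>   var seen = {};
> > > >>>   values.forEach(function (v) {
> > > >>>     if (rereduce) {
> > > >>>       // v is an array of countries from a previous reduce pass
> > > >>>       v.forEach(function (c) { seen[c] = true; });
> > > >>>     } else {
> > > >>>       seen[v] = true; // v is a single country name from the map
> > > >>>     }
> > > >>>   });
> > > >>>   return Object.keys(seen);
> > > >>> }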
> > > >>> However, such solutions are frowned upon by everyone around. Does
> > > >>> anyone have a better idea on how to tackle this efficiently?
> > > >>>
> > > >>> Andrey
> > > >>>
> > > >>>
> > > >
> > >
> >
>
--
mischa (aka muji).