Re: Distinct values with range

Andrey Kuprianov Tue, 16 Apr 2013 02:51:10 -0700

Muji, what happens if you have several hundred transactions per day across
many countries over several years? Then your view processing is going to be
very slow. We are looking for a near real-time solution.



On Tue, Apr 16, 2013 at 5:42 PM, muji <[email protected]> wrote:

> I believe you need to query with startkey and endkey as complex keys
> (assuming YYYY-MM-DD):
>
> startkey=[startyear,startmonth,startday]
> endkey=[endyear,endmonth,endday,{}]
>
> Then you can extract the countries from the key returned with each row (it
> will be the last element in the array). You will also need to set the group
> view parameter (group_level=4?) for distinct values.
>
> Then you should not need to write a custom reduce function.
>
> The startkey and endkey must be proper JSON (and URL) encoded values.
>
> My understanding is that this is the correct approach.
>
> Cheers!
>
>
> > On 16 April 2013 05:46, Andrey Kuprianov <[email protected]> wrote:
>
> > Nope, I need distinct values over a period of time. Not per day.
> >
> >
> > > On Tue, Apr 16, 2013 at 11:30 AM, Keith Gable <[email protected]> wrote:
> >
> > > > It gives you distinct countries per day. Is that not what you want?
> > > > With reduce, it should be really fast once the view is built.
> > > > On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <[email protected]> wrote:
> > >
> > > > > @Keith, your method will not give me distinct countries, and even
> > > > > with reduce, and after being fed to a list function, it's still slow.
> > > >
> > > >
> > > >
> > > > > > On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <[email protected]> wrote:
> > > >
> > > > > I agree with this approach. I do something similar using _sum:
> > > > >
> > > > > emit([doc.country_name, toDay(doc.timestamp)], 1);
> > > > >
> > > > > > > The toDay() method is basically a floor of the day value. Since I
> > > > > > > don't store ts in UTC (because of an idiotic error some years
> > > > > > > back), I also do a tz offset to correct the day value in my
> > > > > > > toDay() method.
> > > > >
> > > > > > > Using reduce is by far the fastest method for this. I don't see
> > > > > > > any issue with getting this to scale.
> > > > >
> > > > > > > Overall, I think I rather prefer the method Keith shows, as it
> > > > > > > would depend on the values returned in the date object versus
> > > > > > > other possibly inaccurate means using math.
> > > > >
> > > > > Wendall
> > > > >
> > > > >
> > > > > On 04/15/2013 07:18 AM, Keith Gable wrote:
> > > > >
> > > > >> Output keys like so:
> > > > >>
> > > > >> [2010, 7, 10, "Australia"]
> > > > >>
> > > > >> Reduce function would be _count.
> > > > >>
> > > > >> startkey=[year,month,day,null]
> > > > >> endkey=[year,month,day,{}]
> > > > >>
> > > > >> ---
> > > > >> Keith Gable
> > > > >> A+, Network+, and Storage+ Certified Professional
> > > > >> Apple Certified Technical Coordinator
> > > > >> Mobile Application Developer / Web Developer
> > > > >>
> > > > >>
> > > > > > >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <[email protected]> wrote:
> > > > >>
> > > > > > >>> Hi guys,
> > > > > > >>>
> > > > > > >>> Just for the sake of debate, here's the question. There are
> > > > > > >>> transactions. Among all the other attributes there's a
> > > > > > >>> timestamp (when the transaction was made, in seconds) and a
> > > > > > >>> country name (where the transaction was made from). So, for
> > > > > > >>> instance,
> > > > >>>
> > > > >>> {
> > > > >>>      . . . .
> > > > >>>      "timestamp": 1332806400
> > > > >>>      "country_name": "Australia",
> > > > >>>      . . . .
> > > > >>> }
> > > > >>>
> > > > > > >>> Question is: how does one get unique / distinct country names
> > > > > > >>> between dates? For example, give me all country names between
> > > > > > >>> 10-Jul-2010 and 21-Jan-2013.
> > > > >>>
> > > > > > >>> My solution was to write a custom reduce function and set
> > > > > > >>> reduce_limit=false, so that I can enumerate all countries
> > > > > > >>> without hitting the overflow exception. It works great!
> > > > > > >>> However, such solutions are frowned upon by everyone around.
> > > > > > >>> Does anyone have a better idea of how to tackle this
> > > > > > >>> efficiently?
> > > > >>>
> > > > >>>      Andrey
> > > > >>>
> > > > >>>
> > > > >
> > > >
> > >
> >
>
>
>
> --
> mischa (aka muji).
>
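
For reference, the approach quoted above (Keith's [year, month, day, country]
keys, queried with muji's startkey/endkey and group_level=4) could be sketched
roughly as follows. This is only an illustration under some assumptions: the
function names mapTransaction, rangeQuery and distinctCountries are made up
for this example, the timestamp is assumed to be UTC seconds, and emit is
passed in as a parameter so the code runs outside CouchDB (inside a design
document, emit is a global and the map function takes only doc).

```javascript
// Map step: emit one [year, month, day, country] key per transaction.
// Assumption: doc.timestamp is in seconds and is interpreted as UTC.
function mapTransaction(doc, emit) {
  if (typeof doc.timestamp !== "number" || !doc.country_name) return;
  const d = new Date(doc.timestamp * 1000);
  emit(
    [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.country_name],
    1
  );
}

// Build the query string for a date range. `start` and `end` are
// [year, month, day] arrays. The trailing {} in endkey sorts after any
// string, so every country on the end day is included.
function rangeQuery(start, end) {
  const params = {
    startkey: JSON.stringify(start),
    endkey: JSON.stringify([...end, {}]),
    group_level: "4",
  };
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join("&");
}

// With group_level=4 the view returns one row per (day, country) pair, so
// the same country shows up once for every day it appears on. Collapse
// the last key element into a distinct, sorted list on the client.
function distinctCountries(rows) {
  return [...new Set(rows.map((row) => row.key[row.key.length - 1]))].sort();
}
```

Note that the number of rows returned grows with the number of days in the
range times the countries seen on each day, so the de-duplication still walks
every (day, country) pair in the range.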
