Btw, thank you for the clarification.

On Mon, Apr 15, 2013 at 7:01 PM, Andrey Kuprianov <[email protected]> wrote:

> Lists won't get faster once that view hits the 1M mark either; however,
> the reduce will not grow large, as the number of distinct countries is
> finite and relatively small.
>
> On Mon, Apr 15, 2013 at 5:10 PM, Robert Newson <[email protected]> wrote:
>
>> Bounded accumulation in reduce functions is often feasible. The reason
>> we discourage custom reduces is to avoid degenerate cases like "return
>> values", or a function that combines all the items in the values array
>> into a single object. The return values of those functions continue to
>> grow as the database grows. If your database stays small, then you
>> might well avoid the problem entirely. The reduce_limit feature is
>> designed to catch these mistakes early, before you have a
>> multi-million document database that fails.
>>
>> A list function will be slower than calling the view directly, as
>> every row passes through the view server (converting from native to
>> JSON, too). In your case, your view was already fast (at least, when
>> you only have 65k documents), so I'm not too surprised that the list
>> approach was slower. The question is whether that remains true at a
>> million documents.
>>
>> B.
>>
>> On 15 April 2013 09:29, Andrey Kuprianov <[email protected]> wrote:
>>
>> > I feel a little bit deceived here. I was led to believe that
>> > accumulation of data in reduces would drastically slow things down,
>> > but now I am having second thoughts.
>> >
>> > I've tried Jim's approach with lists and ran it against my old
>> > approach, where I was using reduce without the limit (over 65k
>> > documents were used in the test). The reduce seems to run 20 times
>> > faster! I feel like lists are actually slowing things down, not
>> > custom reduces.
>> >
>> > Can anyone give me a good explanation for this?
>> >
>> > Just FYI, I am using CouchDB 1.2.0.
>> >
>> > On Mon, Apr 15, 2013 at 2:52 PM, Andrey Kuprianov <[email protected]> wrote:
>> >
>> >> Btw, is the reduce function that you mentioned supposed to basically
>> >> output de-duplicated keys?
>> >>
>> >> On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <[email protected]> wrote:
>> >>
>> >>> Thanks, I'll try the lists. Completely forgot about them, actually.
>> >>>
>> >>> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <[email protected]> wrote:
>> >>>
>> >>>> Not sure if it's ideal, but if you need dates in epoch millis, you
>> >>>> could round the timestamp to the floor of the current day (say,
>> >>>> midnight) in a map function and use a built-in reduce... then use
>> >>>> a list function to filter unique countries.
>> >>>>
>> >>>> If you don't need a real timestamp value, use an integer like
>> >>>> YYYYMMDD (i.e. 20130710 for 2013-Jul-10).
>> >>>>
>> >>>> reduce=true will combine by day, making at most (196 countries x
>> >>>> number of days in range) rows to filter in the list function.
>> >>>>
>> >>>> - JK
>> >>>>
>> >>>> Sent from my iPad
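
For reference, here is a rough, untested sketch of the map + built-in
reduce + list combination Jim describes. The function names and the view
name (by_day_country, unique_countries) are placeholders of mine, not
anything from this thread:

    // map: round the epoch-seconds timestamp down to midnight UTC and
    // emit one row per (day, country) pair
    function (doc) {
      if (doc.timestamp && doc.country_name) {
        var day = doc.timestamp - (doc.timestamp % 86400);
        emit([day, doc.country_name], 1);
      }
    }

    // reduce: just the built-in "_count"; queried with group=true it
    // collapses the index to one row per (day, country) pair

    // list: stream the grouped rows and keep each country once
    function (head, req) {
      var row, seen = {}, countries = [];
      while ((row = getRow())) {
        var country = row.key[1];
        if (!seen[country]) {
          seen[country] = true;
          countries.push(country);
        }
      }
      send(JSON.stringify(countries));
    }

The dedup set stays tiny (at most ~200 entries), so the list function
itself is cheap; the cost Robert points out is shipping every grouped row
through the view server.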
>> >>>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <[email protected]> wrote:
>> >>>>
>> >>>> > Hi guys,
>> >>>> >
>> >>>> > Just for the sake of a debate, here's the question. There are
>> >>>> > transactions. Among all other attributes, there's a timestamp
>> >>>> > (when the transaction was made, in seconds) and a country name
>> >>>> > (from where the transaction was made). So, for instance:
>> >>>> >
>> >>>> > {
>> >>>> >     . . . .
>> >>>> >     "timestamp": 1332806400,
>> >>>> >     "country_name": "Australia",
>> >>>> >     . . . .
>> >>>> > }
>> >>>> >
>> >>>> > The question is: how does one get unique/distinct country names
>> >>>> > between dates? For example: give me all country names between
>> >>>> > 10-Jul-2010 and 21-Jan-2013.
>> >>>> >
>> >>>> > My solution was to write a custom reduce function and set
>> >>>> > reduce_limit = false, so that I can enumerate all countries
>> >>>> > without hitting the overflow exception. It works great! However,
>> >>>> > such solutions are frowned upon by everyone around. Does anyone
>> >>>> > have a better idea on how to tackle this efficiently?
>> >>>> >
>> >>>> > Andrey
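
For contrast, the custom-reduce approach from the original question can
look something like the following, with the map emitting country names
keyed by timestamp. This is my reconstruction, not Andrey's actual code,
but it shows the "bounded accumulation" Robert mentions: the output can
never outgrow the set of distinct country names, no matter how many
documents there are.

    // map: key by timestamp so a date range becomes startkey/endkey
    function (doc) {
      if (doc.timestamp && doc.country_name) {
        emit(doc.timestamp, doc.country_name);
      }
    }

    // reduce: accumulate the set of distinct country names
    function (keys, values, rereduce) {
      var seen = {}, out = [], i, j;
      for (i = 0; i < values.length; i++) {
        if (rereduce) {
          // values[i] is an array of names from an earlier reduce pass
          for (j = 0; j < values[i].length; j++) {
            seen[values[i][j]] = true;
          }
        } else {
          seen[values[i]] = true;
        }
      }
      for (var name in seen) {
        out.push(name);
      }
      return out;
    }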

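Either way, the date-range query itself would look roughly like this
(hypothetical database and design doc names; 1278720000 and 1358726400
are the UTC midnights for 10-Jul-2010 and 21-Jan-2013; -g stops curl
from treating the brackets as globs):

    # custom-reduce view, keyed by raw timestamp
    curl 'http://127.0.0.1:5984/txns/_design/stats/_view/countries?startkey=1278720000&endkey=1358726400'

    # map/list variant, keyed by [day, country]
    curl -g 'http://127.0.0.1:5984/txns/_design/stats/_list/unique_countries/by_day_country?group=true&startkey=[1278720000]&endkey=[1358726400,{}]'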