The list function will be a constant factor slower than the equivalent view call. Reading the entire view through a list function in order to perform the aggregation there would be a different mistake.
B.

On 15 April 2013 12:02, Andrey Kuprianov <[email protected]> wrote:

> Btw, thank you for the clarification.
>
> On Mon, Apr 15, 2013 at 7:01 PM, Andrey Kuprianov <[email protected]> wrote:
>
>> Lists won't get faster once that view hits the 1M mark either; however,
>> the reduce value will not grow large, as the number of distinct
>> countries is finite and relatively small.
>>
>> On Mon, Apr 15, 2013 at 5:10 PM, Robert Newson <[email protected]> wrote:
>>
>>> Bounded accumulation in reduce functions is often feasible. The reason
>>> we discourage custom reduces is to avoid degenerate cases like "return
>>> values", or a function that combines all the items in values into a
>>> single object. The return values of those functions continue to grow
>>> as the database grows. If your database stays small, then you might
>>> well avoid the problem entirely. The reduce_limit feature is designed
>>> to catch these mistakes early, before you have a multi-million
>>> document database that fails.
>>>
>>> A list function will be slower than calling the view directly, as
>>> every row passes through the view server (converting from native to
>>> JSON, too). In your case, your view was already fast (at least when
>>> you only had 65k documents), so I'm not too surprised that the list
>>> approach was slower. The question is whether that remains true at a
>>> million documents.
>>>
>>> B.
>>>
>>> On 15 April 2013 09:29, Andrey Kuprianov <[email protected]> wrote:
>>>
>>> > I feel a little bit deceived here. I was led to believe that
>>> > accumulation of data in reduces would drastically slow things down,
>>> > but now I am having second thoughts.
>>> >
>>> > I've tried Jim's approach with lists and ran it against my old
>>> > approach, where I was using reduce without the limit (over 65k
>>> > documents were used in the test). The reduce seems to run 20 times
>>> > faster! I feel like lists are actually slowing things down, not
>>> > custom reduces.
>>> >
>>> > Can anyone give me a good explanation regarding this?
>>> >
>>> > Just FYI, I am using CouchDB 1.2.0.
>>> >
>>> > On Mon, Apr 15, 2013 at 2:52 PM, Andrey Kuprianov <[email protected]> wrote:
>>> >
>>> >> Btw, is the reduce function that you mentioned supposed to basically
>>> >> output de-duplicated keys?
>>> >>
>>> >> On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <[email protected]> wrote:
>>> >>
>>> >>> Thanks. I'll try the lists. I completely forgot about them, actually.
>>> >>>
>>> >>> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <[email protected]> wrote:
>>> >>>
>>> >>>> Not sure if it's ideal, but if you need dates in epoch millis, you
>>> >>>> could round the timestamp to the floor of the current day (say,
>>> >>>> midnight) in a map function and use a built-in reduce, then use a
>>> >>>> list function to filter unique countries.
>>> >>>>
>>> >>>> If you don't need a real timestamp value, use an integer like
>>> >>>> YYYYMMDD (i.e. 20130710 for 2013-Jul-10).
>>> >>>>
>>> >>>> reduce=true will combine by day, making at most (196 countries x
>>> >>>> number of days in range) rows to filter in the list function.
>>> >>>>
>>> >>>> - JK
>>> >>>>
>>> >>>> Sent from my iPad
>>> >>>>
>>> >>>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <[email protected]> wrote:
>>> >>>>
>>> >>>> > Hi guys,
>>> >>>> >
>>> >>>> > Just for the sake of a debate, here's the question. There are
>>> >>>> > transactions. Among their other attributes there's a timestamp
>>> >>>> > (when the transaction was made; in seconds) and a country name
>>> >>>> > (from where the transaction was made). So, for instance:
>>> >>>> >
>>> >>>> > {
>>> >>>> >     . . . .
>>> >>>> >     "timestamp": 1332806400,
>>> >>>> >     "country_name": "Australia",
>>> >>>> >     . . . .
>>> >>>> > }
>>> >>>> >
>>> >>>> > The question is: how does one get unique / distinct country names
>>> >>>> > in between dates?
>>> >>>> > For example: give me all country names in between 10-Jul-2010
>>> >>>> > and 21-Jan-2013.
>>> >>>> >
>>> >>>> > My solution was to write a custom reduce function and set
>>> >>>> > reduce_limit = false, so that I could enumerate all countries
>>> >>>> > without hitting the overflow exception. It works great! However,
>>> >>>> > such solutions are frowned upon by everyone around. Has anyone a
>>> >>>> > better idea of how to tackle this efficiently?
>>> >>>> >
>>> >>>> > Andrey
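To make Jim's day-bucket suggestion concrete, here is a sketch of the map function he describes. The field names (`timestamp`, `country_name`) come from Andrey's example document; the `emit` stub and the sample documents are only here so the snippet runs standalone, since in CouchDB the map function would live in a design document where `emit` is provided:

```javascript
// Sketch of Jim's approach: bucket each transaction by UTC day as a
// YYYYMMDD integer, keyed together with the country so that a built-in
// reduce (_count) collapses duplicates within each day.
function mapByDay(doc) {
  if (doc.timestamp && doc.country_name) {
    var d = new Date(doc.timestamp * 1000); // timestamp is in seconds
    var yyyymmdd = d.getUTCFullYear() * 10000 +
                   (d.getUTCMonth() + 1) * 100 +
                   d.getUTCDate();
    emit([yyyymmdd, doc.country_name], 1);
  }
}

// Standalone harness: stub emit and run the map over sample docs.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

[
  { timestamp: 1332806400, country_name: "Australia" }, // 2012-03-27 UTC
  { timestamp: 1332806500, country_name: "Australia" },
  { timestamp: 1332892800, country_name: "Singapore" }  // 2012-03-28 UTC
].forEach(mapByDay);

// Deduplicate the country component of the keys, as a list function
// (or the client) would do over the grouped view rows.
var distinct = Object.keys(rows.reduce(function (acc, row) {
  acc[row.key[1]] = true; // row.key is [yyyymmdd, country_name]
  return acc;
}, {})).sort();
```

Paired with the built-in `_count` reduce and a grouped query between two day keys, the view returns one row per (day, country) pair, so the number of rows left to filter is bounded by countries times days in the range rather than by the number of transactions.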

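For comparison, the "bounded accumulation" custom reduce that Robert calls feasible can be sketched as below. This is an illustration, not the actual function from the thread; it assumes the map step emits the country name as the value, and its output stays small only because the set of countries is bounded:

```javascript
// Hypothetical custom reduce with bounded accumulation: merge country
// names into a distinct, sorted list. The output is capped by the
// number of countries in the world (~200), so it does not grow as the
// database grows; that property is what makes this pattern safe.
// Assumed map step: emit(doc.timestamp, doc.country_name);
function reduceDistinctCountries(keys, values, rereduce) {
  var seen = {};
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // On rereduce, each value is a list produced by an earlier pass.
      for (var j = 0; j < values[i].length; j++) {
        seen[values[i][j]] = true;
      }
    } else {
      // On the first pass, each value is a plain country name.
      seen[values[i]] = true;
    }
  }
  return Object.keys(seen).sort();
}
```

Queried with startkey/endkey covering the date range (and the reduce_limit check relaxed, as Andrey did), this returns a single row whose value is the distinct country list. The reduce_limit warning exists precisely because the same pattern with an unbounded accumulator, such as collecting document IDs, grows without limit.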