Btw, thank you for the clarification.

On Mon, Apr 15, 2013 at 7:01 PM, Andrey Kuprianov <[email protected]> wrote:

> Lists won't get faster once that view hits the 1M mark either; however,
> the reduce will not grow large, as the number of distinct countries is
> finite and relatively small.
>
> On Mon, Apr 15, 2013 at 5:10 PM, Robert Newson <[email protected]> wrote:
>
>> Bounded accumulation in reduce functions is often feasible. The reason
>> we discourage custom reduces is to avoid degenerate cases like "return
>> values", or a function that combines all the items in the values array
>> into a single object. The return values of those functions continue to
>> grow as the database grows. If your database stays small, then you
>> might well avoid the problem entirely. The reduce_limit feature is
>> designed to catch these mistakes early, before you have a
>> multi-million document database that fails.
>>
>> A list function will be slower than calling the view directly, as
>> every row passes through the view server (converting from native to
>> JSON, too). In your case, your view was already fast (at least, when
>> you only have 65k documents), so I'm not too surprised that the list
>> approach was slower. The question is whether that remains true at a
>> million documents.
>>
>> B.
>>
>> On 15 April 2013 09:29, Andrey Kuprianov <[email protected]> wrote:
>>
>> > I feel a little bit deceived here. I was led to believe that
>> > accumulation of data in reduces would drastically slow things down,
>> > but now I am having second thoughts.
>> >
>> > I've tried Jim's approach with lists and ran it against my old
>> > approach, where I was using reduce without the limit (over 65k
>> > documents were used in the test). The reduce seems to run 20 times
>> > faster! I feel like lists are actually slowing things down, not
>> > custom reduces.
>> >
>> > Can anyone give me a good explanation for this?
>> >
>> > Just FYI, I am using CouchDB 1.2.0.
>> >
>> > On Mon, Apr 15, 2013 at 2:52 PM, Andrey Kuprianov <[email protected]> wrote:
>> >
>> >> Btw, is the reduce function that you mentioned supposed to basically
>> >> output de-duplicated keys?
>> >>
>> >> On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <[email protected]> wrote:
>> >>
>> >>> Thanks, I'll try the lists. Completely forgot about them, actually.
>> >>>
>> >>> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <[email protected]> wrote:
>> >>>
>> >>>> Not sure if it's ideal, but if you need dates in epoch millis, you
>> >>>> could round the timestamp to the floor of the current day (say,
>> >>>> midnight) in a map function and use a built-in reduce... then use
>> >>>> a list function to filter unique countries.
>> >>>>
>> >>>> If you don't need a real timestamp value, use an integer like
>> >>>> YYYYMMDD (i.e. 20130710 for 2013-Jul-10).
>> >>>>
>> >>>> reduce=true will combine by day, making at most (196 countries x
>> >>>> number of days in range) rows to filter in the list function.
>> >>>>
>> >>>> - JK
>> >>>>
>> >>>> Sent from my iPad
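
For reference, here is a rough, untested sketch of the map + built-in
reduce + list combination Jim describes. The function names and the view
name (by_day_country, unique_countries) are placeholders of mine, not
anything from this thread:

    // map: round the epoch-seconds timestamp down to midnight UTC and
    // emit one row per (day, country) pair
    function (doc) {
      if (doc.timestamp && doc.country_name) {
        var day = doc.timestamp - (doc.timestamp % 86400);
        emit([day, doc.country_name], 1);
      }
    }

    // reduce: just the built-in "_count"; queried with group=true it
    // collapses the index to one row per (day, country) pair

    // list: stream the grouped rows and keep each country once
    function (head, req) {
      var row, seen = {}, countries = [];
      while ((row = getRow())) {
        var country = row.key[1];
        if (!seen[country]) {
          seen[country] = true;
          countries.push(country);
        }
      }
      send(JSON.stringify(countries));
    }

The dedup set stays tiny (at most ~200 entries), so the list function
itself is cheap; the cost Robert points out is shipping every grouped row
through the view server.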
>> >>>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <[email protected]> wrote:
>> >>>>
>> >>>> > Hi guys,
>> >>>> >
>> >>>> > Just for the sake of a debate, here's the question. There are
>> >>>> > transactions. Among all other attributes, there's a timestamp
>> >>>> > (when the transaction was made, in seconds) and a country name
>> >>>> > (from where the transaction was made). So, for instance:
>> >>>> >
>> >>>> > {
>> >>>> >     . . . .
>> >>>> >     "timestamp": 1332806400,
>> >>>> >     "country_name": "Australia",
>> >>>> >     . . . .
>> >>>> > }
>> >>>> >
>> >>>> > The question is: how does one get unique/distinct country names
>> >>>> > between dates? For example: give me all country names between
>> >>>> > 10-Jul-2010 and 21-Jan-2013.
>> >>>> >
>> >>>> > My solution was to write a custom reduce function and set
>> >>>> > reduce_limit = false, so that I can enumerate all countries
>> >>>> > without hitting the overflow exception. It works great! However,
>> >>>> > such solutions are frowned upon by everyone around. Does anyone
>> >>>> > have a better idea on how to tackle this efficiently?
>> >>>> >
>> >>>> > Andrey
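
For contrast, the custom-reduce approach from the original question can
look something like the following, with the map emitting country names
keyed by timestamp. This is my reconstruction, not Andrey's actual code,
but it shows the "bounded accumulation" Robert mentions: the output can
never outgrow the set of distinct country names, no matter how many
documents there are.

    // map: key by timestamp so a date range becomes startkey/endkey
    function (doc) {
      if (doc.timestamp && doc.country_name) {
        emit(doc.timestamp, doc.country_name);
      }
    }

    // reduce: accumulate the set of distinct country names
    function (keys, values, rereduce) {
      var seen = {}, out = [], i, j;
      for (i = 0; i < values.length; i++) {
        if (rereduce) {
          // values[i] is an array of names from an earlier reduce pass
          for (j = 0; j < values[i].length; j++) {
            seen[values[i][j]] = true;
          }
        } else {
          seen[values[i]] = true;
        }
      }
      for (var name in seen) {
        out.push(name);
      }
      return out;
    }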

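Either way, the date-range query itself would look roughly like this
(hypothetical database and design doc names; 1278720000 and 1358726400
are the UTC midnights for 10-Jul-2010 and 21-Jan-2013; -g stops curl
from treating the brackets as globs):

    # custom-reduce view, keyed by raw timestamp
    curl 'http://127.0.0.1:5984/txns/_design/stats/_view/countries?startkey=1278720000&endkey=1358726400'

    # map/list variant, keyed by [day, country]
    curl -g 'http://127.0.0.1:5984/txns/_design/stats/_list/unique_countries/by_day_country?group=true&startkey=[1278720000]&endkey=[1358726400,{}]'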