Rob
Since you've already got a view that'll give you Boris's 50k unique
songs, you could use a _list function to return the number of rows.
Something like this should do the trick:
function(head, req) {
  // count every row the view hands us and return only the total
  var count = 0;
  while (getRow()) count++;
  return JSON.stringify({count: count});
}
If you query this list function with the same view, key range, and
group level, it'll respond with a single bit of JSON, such as:
{"count":50612}
Is that more like what you're looking for?
You can read up more here:
http://guide.couchdb.org/draft/transforming.html
http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
--Zach
On Tue, Nov 15, 2011 at 11:39 AM, Rob Crowell <[email protected]> wrote:
> I don't think this works, unless I am misunderstanding.
>
> If our user "boris" listened to the same song 20 times, and only
> listened to that one song, the _count reduce would return 20, would
> it not? I would like the value 1 instead (only 1 distinct song
> listened to).
>
> --Rob
>
> On Tue, Nov 15, 2011 at 12:21 PM, Robert Newson <[email protected]> wrote:
>> then just emit(doc.user_id, null) and use _count?
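>>
>> i.e., a map like this (just sketching the suggestion), with the
>> built-in _count as the reduce:
>>
>> function(doc) {
>>   emit(doc.user_id, null);
>> }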
>>
>> B.
>>
>> On 15 November 2011 17:17, Rob Crowell <[email protected]> wrote:
>>> Hello everyone,
>>>
>>> I'm writing some log-parsing code that currently runs on a MySQL
>>> backend. We're running a huge number of aggregate queries over this
>>> data right now, but performance is suffering and I'm looking for
>>> alternatives. The idea of incremental map/reduce initially seemed
>>> like exactly the right thing, but I can't seem to express some of
>>> the most important queries we're currently running in our production
>>> system.
>>>
>>> We're running a lot of queries of the SELECT COUNT(DISTINCT song_id)
>>> WHERE user_id = "boris" AND created >= "2010-01-01" AND created <
>>> "2010-02-01" variety. Currently in MySQL-land we've got a cron job to
>>> pre-compute these aggregates (it checks modified timestamps and pulls
>>> in only new records) and write them to a summary table. I initially
>>> believed I could use CouchDB's incremental map/reduce to effortlessly
>>> build and update our "summary information" as it changes, but I'm
>>> stuck. I'm trying to relax, but I can't figure out exactly how :)
>>>
>>> In our example, our user "boris" listens to the same song many times
>>> each month, and we're interested in the number of distinct songs he's
>>> listened to during a specified time period (NOT the number of song
>>> plays, but the number of distinct songs played). In CouchDB it isn't
>>> much trouble to get all of the unique songs that he's listened to
>>> during a period. Here's our document:
>>>
>>> {
>>>   song_id: "happy birthday",
>>>   user_id: "boris",
>>>   date_played: [2011, 11, 14, 0, 12, 55],
>>>   _id: ...
>>> }
>>>
>>> To get the unique values, all we need to do is emit([doc.user_id,
>>> doc.date_played, doc.song_id], null), reduce with _count, and query
>>> with startkey=["boris", [2011, 1, 1]]&endkey=["boris", [2011, 2,
>>> 1]]&group_level=1. This query will yield results like:
>>>
>>> ["boris", "happy birthday"], 20
>>> ["boris", "yesterday"], 14
>>> ...
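>>>
>>> For concreteness, that map is nothing more than the emit above
>>> wrapped in a function:
>>>
>>> function(doc) {
>>>   emit([doc.user_id, doc.date_played, doc.song_id], null);
>>> }
>>>
>>> with the built-in _count as the reduce.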
>>>
>>> However, if our user has listened to 50,000 distinct songs during the
>>> date range, we'll get back 50,000 rows, which seems expensive. What I
>>> want is just the scalar 50,000. I've tried writing a reduce that returns
>>> the set of distinct song_ids for each user (turning the values list
>>> into a dictionary and back again), but CouchDB complains that I am not
>>> reducing my values fast enough :-/ I'm also not sure how to reduce
>>> this list to a scalar at the end without returning the whole thing to
>>> my client (which defeats the purpose of all this anyway).
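>>>
>>> For reference, the rejected reduce looked roughly like this (a
>>> sketch reconstructed from memory; its output is the set of distinct
>>> song_ids, so it grows with the input, which is exactly what CouchDB
>>> objects to):
>>>
>>> function(keys, values, rereduce) {
>>>   var seen = {};
>>>   if (rereduce) {
>>>     // values are the arrays of song_ids returned by earlier passes
>>>     values.forEach(function(songs) {
>>>       songs.forEach(function(s) { seen[s] = true; });
>>>     });
>>>   } else {
>>>     // keys is a list of [key, docid] pairs; song_id is the third
>>>     // element of our emitted key
>>>     keys.forEach(function(k) { seen[k[0][2]] = true; });
>>>   }
>>>   return Object.keys(seen);
>>> }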
>>>
>>> Is this possible to do in CouchDB today? If not, is it something that
>>> is on the roadmap, or does the internal structure of CouchDB's b-tree
>>> make this really hard to do? I could of course implement this myself
>>> (subscribe to the update notifications and update my counts in a
>>> custom script), but I wanted to move to CouchDB precisely so I
>>> wouldn't have to maintain that machinery by hand.
>>>
>>> Thanks for any advice!
>>>
>>> --Rob
>>>
>>
>