I would like to just add that CouchDB views represent a single-dimensional index (index in the same sense as in a relational database). A list of keys is like specifying sub-ordering: order by the first element, then the second, then the next, and so on.
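To illustrate that point, here is a minimal sketch of a map function emitting a composite key (the field names follow the example document later in the thread; the `emit()` stub exists only so the file runs outside CouchDB):

```javascript
// Stub of CouchDB's emit() so this sketch runs standalone.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The map function defines the index: the emitted key is what the view
// is ordered by, element by element -- first created_at, then a_name.
function map(doc) {
  emit([doc.created_at, doc.a_name], 1);
}

map({ created_at: "2011-05-26 23:22:11", a_name: "Lisa" });
map({ created_at: "2011-05-26 23:30:10", a_name: "John" });
```

The view's rows then sort by created_at first and by a_name only within identical timestamps, which is exactly the "order by the first, then the second" behaviour.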
It sounded like at some point you may have been a bit confused about what a map function defines. It defines the index's key and the value mapped to that key. This gives an otherwise completely unstructured assortment of data a view with consistent structure.

Anyway, back to what you are trying to accomplish. Honestly, it sounds like you are trying to do something too advanced for the built-in _count or _sum reduce functions. Have you tried writing a custom reduce function that does the grouping how you want, basically by name alone?

https://gist.github.com/1010318

I tried this out with 10 docs fitting your example structure, and with a plain query (no grouping, no filtering, reduce on) I get back:

{ John: 4, Jane: 6 }

Maybe this example can get you started. In the map function I defined above, I used those keys because it looks like you are interested in filtering on them. In the query I used for the example results, I actually didn't use the key at all (no filtering).

On Mon, Jun 6, 2011 at 8:12 AM, Torstein Krause Johansen <[email protected]> wrote:

> Hi Benjamin,
>
> and thanks for your comments.
>
> On 31/05/11 22:11, Benjamin Young wrote:
>
>> On 5/27/11 5:16 AM, Torstein Krause Johansen wrote:
>>
>>>> ?group=true&group_level=2&startkey=["2011-05-26"]&endkey=["2011-05-27", {}]
>>>>
>>>> results in:
>>>>
>>>> {
>>>>   "key": ["2011-05-26", "Lisa"],
>>>>   "value": 1
>>>> },
>>>> {
>>>>   "key": ["2011-05-26", "John"],
>>>>   "value": 2
>>>> },
>>>> {
>>>>   "key": ["2011-05-27", "John"],
>>>>   "value": 1
>>>> }
>>>>
>>>> You can of course emit not just days, but also weeks, months and
>>>> quarters, if that's what you always want. If it is arbitrary and you
>>>> need to aggregate the names afterwards from this smaller set, you
>>>> should do it in the client (whoever calls CouchDB to get this
>>>> information out).
>>>
>>> Mhmm, ok, thanks for explaining this.
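For reference, a custom reduce in that spirit might look roughly like this. This is a sketch, not the gist's actual code, and it assumes the map emits a_name as the row's value:

```javascript
// Custom reduce grouping by name alone.
// Assumed map: emit([doc.one_id, doc.another_id, doc.created_at], doc.a_name)
function reduce(keys, values, rereduce) {
  var counts = {};
  if (rereduce) {
    // values are partial { name: count } objects from earlier reductions; merge them
    values.forEach(function (partial) {
      for (var name in partial) {
        counts[name] = (counts[name] || 0) + partial[name];
      }
    });
  } else {
    // values are plain names emitted by the map; count each occurrence
    values.forEach(function (name) {
      counts[name] = (counts[name] || 0) + 1;
    });
  }
  return counts;
}
```

One caveat: CouchDB expects reduce output to shrink relative to its input, so an object keyed by name only stays safe while the number of distinct names is small.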
>>>
>>> It means, though, that for every unique timestamp that a_name has an
>>> entry for, there will be a corresponding count returned (like the
>>> three you listed above).
>>>
>>> Hence, if a_name has 1000 entries at slightly different times within
>>> the time range I'm searching for (my created_at includes seconds), I
>>> will get 1000 such entries back.
>>
>> It really just depends on what you want to count/reduce/etc. If you
>> only want a count of the names (and don't want additional
>> granularity, e.g. name+year counts), then just return the name as the
>> index. If you want the count of names by year/month/day, etc., then
>> return those *after* the name, so you can add specificity by
>> incrementing your group_level param.
>
> There's probably something I haven't understood here. If I add my
> search fields after a_name, then how can I limit my search with start
> and end keys when a_name cannot be included in them (since the name is
> what I want to count on)?
>
> Just to be sure, I want to re-state what I want: I have documents with
> the following fields:
>
> {
>   one_id : 1,
>   another_id : 22,
>   created_at : "2011-05-26 23:22:11",
>   a_name : "Lisa"
> }
>
> I want to be able to search all occurrences with a combination of the
> first three fields as query parameters, and then count the number of
> a_name occurrences within each of these search collections.
>
> There will be many entries like the one above (say 30,000), where the
> only difference is the created_at field.
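To illustrate the group_level point client-side: grouping collapses view rows on a key prefix of the given length. `groupByLevel` here is a hypothetical helper for demonstration, not a CouchDB API; the sample rows reuse the results quoted earlier in the thread:

```javascript
// Simulates what group_level=N does: group rows by the first N key elements.
function groupByLevel(rows, level) {
  var grouped = {};
  rows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    grouped[prefix] = (grouped[prefix] || 0) + row.value;
  });
  return grouped;
}

var sampleRows = [
  { key: ["2011-05-26", "Lisa"], value: 1 },
  { key: ["2011-05-26", "John"], value: 2 },
  { key: ["2011-05-27", "John"], value: 1 }
];

// level 1 collapses the names, counting per day only;
// level 2 keeps day + name distinct, as in the quoted results.
var perDay = groupByLevel(sampleRows, 1);
```

So with a_name last in the key, raising group_level is what adds the name back into the grouping.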
Searching for these variable parameters:
>
> one_id = 1,
> another_id = 22,
> created_at > "2011-05-26 23:30:00"
> created_at < "2011-05-27 01:00:00"
>
> I want to end up with a dictionary listing the names and their counts
> matching the search parameters:
>
> {
>   "Lisa" : 132,
>   "John" : 16
> }
>
> If I put [created_at, one_id, another_id, a_name] in the key, I can
> use the start and end keys:
>
> ?group=true&
> group_level=4&
> startkey=["2011-05-26 23:30:00",1,22]&
> endkey=["2011-05-27 01:00:00",2,23]
>
> I will get results like these:
>
> {
>   "key": ["2011-05-26 23:30:10", 1, 22, "Lisa"],
>   "value": 1
> },
> {
>   "key": ["2011-05-26 23:30:12", 1, 22, "Lisa"],
>   "value": 3
> },
> {
>   "key": ["2011-05-26 23:33:43", 1, 22, "Lisa"],
>   "value": 5
> },
> [..]
>
> giving me quite a big result set, since there are so many hits where
> the created_at is only slightly different.
>
>> Alternatively, if you want to count *just* the names and *just* the
>> dates, you'll need two indexes: one for names and one for dates, as
>> you can't "skip" the key groups (as your example tried to do with
>> [{},...]).
>>
>> Basically, you'll need an additional view/index for each key you're
>> wanting to count, plus whatever output you want to make the counting
>> more granular (in this case, date).
>
> Mhmm. So in this case, it means I need an index for each of one_id,
> another_id and a_name (three of them)? If yes, I'm puzzled as to how I
> can make use of these indexes with just one GET request?
>
> [..]
>
> Initially, I got something working for my use case using two indexes:
> one to get the a_name values based on the search parameters one_id,
> another_id & created_at, and a second index from which I got the
> number of occurrences of a_name within the hits returned by the first
> query.
>
> However, this didn't feel optimal (although I've read posts on the
> mailing list about people doing two batches of queries before), so I
> tried to go down a different road, as described above.
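For what it's worth, collapsing such a fine-grained result set by name in the client is only a few lines. `countNames` is a hypothetical helper (not part of any CouchDB client library), and the sample rows reuse the values quoted above:

```javascript
// Collapse rows keyed [created_at, one_id, another_id, a_name] into
// a { name: total } dictionary, summing the per-timestamp counts.
function countNames(rows) {
  var counts = {};
  rows.forEach(function (row) {
    var name = row.key[3]; // a_name is the 4th key element
    counts[name] = (counts[name] || 0) + row.value;
  });
  return counts;
}

var viewRows = [
  { key: ["2011-05-26 23:30:10", 1, 22, "Lisa"], value: 1 },
  { key: ["2011-05-26 23:30:12", 1, 22, "Lisa"], value: 3 },
  { key: ["2011-05-26 23:33:43", 1, 22, "Lisa"], value: 5 }
];
var totals = countNames(viewRows); // { Lisa: 9 }
```

That keeps the view queryable by start/end key on created_at while still producing the per-name dictionary you are after, at the cost of shipping the intermediate rows to the client.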
>
> Best regards,
>
> -Torstein

--
“The limits of language are the limits of one's world.” -Ludwig von Wittgenstein
