Hi Benjamin,

and thanks for your comments.

On 31/05/11 22:11, Benjamin Young wrote:
On 5/27/11 5:16 AM, Torstein Krause Johansen wrote:

?group=true&group_level=2&startkey=["2011-05-26"]&endkey=["2011-05-27",
{}]

results in:

{
"key": ["2011-05-26", "Lisa"],
"value": 1
},
{
"key": ["2011-05-26", "John"],
"value": 2
},
{
"key": ["2011-05-27", "John"],
"value": 1
}

You can of course emit not just days, but also weeks, months,
quarters if that's what you always want. If it arbitrary and you need
to aggregate the names afterwards from this smaller set, yo should do
it in the client (whoever calls CouchDB to get this information out).

Mhmm, ok, thanks for explaining this.

It means though, that for every unique time stamp that a_name has an
entry, there will be a corresponding count returned (like the three
you listed above).

Hence, if a_name has 1000 entries at slightly different times within
the time range I'm searching for (my created_at includes seconds), I
will get 1000 such entries back.

It really just depends on what you want to count/reduce/etc. If you only
want a count of the names (and don't want additional
granularity--name+year counts) then just return the name as the index.
If you want the count of names by year/month/day, etc, then return those
*after* the name, so you can add specificity by incrementing your
group_level param.

There's probably, something I haven't understood here. If I add my search fields after a_name, then how can I limit my search on start and endkey when a_name cannot be included in the start and end keys (since the name is what I want to count on)?

Just to be sure, I want to re-state what I want: I have documents with the following fields:

{
    one_id : 1,
    another_id : 22,
    created_at : "2011-05-26 23:22:11",
    a_name : "Lisa"
}

I want to be able to search all occurrences with a combination of the three first ones as query parameters and then count the number of a_name occurrences within each of these search collections.

There will be many entries like the one above (say 30.000), where the only difference is the created_at field. Searching for these variable parameters:

    one_id=1,
    another_id=22,
    created_at > "2011-05-26 23:30:00"
    created_at < "2011-05-27 01:00:00"

I want to end up with a dictionary listing the names and their count matching the search parameters:

{
   "Lisa" : 132
   "John" : 16
}

If I put [created_at, one_id, another_id, a_name] in the key, I can use the start and end keys :
?group=true&
group_level=4&
startkey=["2011-05-26 23:30:00",1,22]&
endkey=["2011-05-27 01:00:00",2,23]

I will get results like these:
{
  "key": ["2011-05-26 23:30:10", 1, 22, "Lisa"],
  "value": 1
},
{
  "key": ["2011-05-26 23:30:12", 1, 22, "Lisa"],
  "value": 3
},
{
  "key": ["2011-05-26 23:33:43", 1, 22, "Lisa"],
  "value": 5
},
[..]

Giving me a quite big result set, since there's so many hits where the created_at is slightly different.

Alternatively, if you want to count *just* the names and *just* the
dates, you'll need two indexes ones for names and one for dates as you
can't "skip" the key groups (as your example tried to do with [{},...].

Basically, you'll need an additional view/index for each key you're
wanting to count + whatever output you want to make the counting more
granular (in this case, date).

Mhmm. So in this case, it means I need an index for one_id, another_id and a_name (three ones)? If yes, I'm puzzled as to how I can make use of these indexes just with one GET request?

[..]

Initially, I got something working for my use case, using two indexes, one to get the a_name values based based on the search queries a_value, another_value & created_at. Querying the second index, I got the number of occurrences for a_name within the hits returned from the first query.

However, this didn't feel optimal (although I've read posts on the mailing list of people doing two batches of queries before), so I tried to go down a different road, as described above.

Best regards,

-Torstein

Reply via email to