Re: Complex queries & results

Torstein Krause Johansen Mon, 06 Jun 2011 05:13:18 -0700

Hi Benjamin,

and thanks for your comments.


On 31/05/11 22:11, Benjamin Young wrote:

On 5/27/11 5:16 AM, Torstein Krause Johansen wrote:

?group=true&group_level=2&startkey=["2011-05-26"]&endkey=["2011-05-27",{}]

results in:

{
"key": ["2011-05-26", "Lisa"],
"value": 1
},
{
"key": ["2011-05-26", "John"],
"value": 2
},
{
"key": ["2011-05-27", "John"],
"value": 1
}

You can of course emit not just days, but also weeks, months, or
quarters if that's what you always want. If it is arbitrary and you need
to aggregate the names afterwards from this smaller set, you should do
it in the client (whoever calls CouchDB to get this information out).
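
The client-side step suggested above could look roughly like the following Python sketch. It assumes the rows have already been fetched and parsed from the view response; the function name count_by_name is made up for illustration and is not part of any CouchDB API.

```python
from collections import defaultdict

def count_by_name(rows):
    """Sum the reduced values per name, ignoring the day part of each key."""
    totals = defaultdict(int)
    for row in rows:
        _day, name = row["key"]  # assumed key layout: [day, name]
        totals[name] += row["value"]
    return dict(totals)

# Rows copied from the group_level=2 result quoted above.
rows = [
    {"key": ["2011-05-26", "Lisa"], "value": 1},
    {"key": ["2011-05-26", "John"], "value": 2},
    {"key": ["2011-05-27", "John"], "value": 1},
]

name_counts = count_by_name(rows)
print(name_counts)
```

The work done in the client is proportional to the number of grouped rows returned, not to the number of documents.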


Mhmm, ok, thanks for explaining this.

It means though, that for every unique time stamp that a_name has an
entry, there will be a corresponding count returned (like the three
you listed above).

Hence, if a_name has 1000 entries at slightly different times within
the time range I'm searching for (my created_at includes seconds), I
will get 1000 such entries back.


It really just depends on what you want to count/reduce/etc. If you only
want a count of the names (and don't want additional
granularity--name+year counts) then just return the name as the index.
If you want the count of names by year/month/day, etc, then return those
*after* the name, so you can add specificity by incrementing your
group_level param.
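
To make the effect of group_level on a name-first key concrete, here is a small Python model of it. This is not CouchDB code: it assumes the reduce is a plain count and that the map emits keys of the form [a_name, year, month, day]; the sample keys and the helper group_count are invented for illustration.

```python
from collections import Counter

def group_count(keys, group_level):
    """Count emitted keys after truncating each to its first group_level elements."""
    return Counter(tuple(key[:group_level]) for key in keys)

# Hypothetical emitted keys: [a_name, year, month, day].
emitted = [
    ["John", 2011, 5, 26],
    ["John", 2011, 5, 26],
    ["John", 2011, 5, 27],
    ["Lisa", 2011, 5, 26],
]

per_name = group_count(emitted, 1)      # one row per name
per_name_day = group_count(emitted, 4)  # one row per name and day

print(per_name)
print(per_name_day)
```

With the name first, a lower group_level collapses the date parts and leaves one count per name; raising it adds the year, month and day back in.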

There's probably something I haven't understood here. If I add my search fields after a_name, then how can I limit my search on start and end key when a_name cannot be included in the start and end keys (since the name is what I want to count on)?

Just to be sure, I want to restate what I want: I have documents with the following fields:


{
    one_id : 1,
    another_id : 22,
    created_at : "2011-05-26 23:22:11",
    a_name : "Lisa"
}

I want to be able to search all occurrences with a combination of the first three as query parameters and then count the number of a_name occurrences within each of these search collections.

There will be many entries like the one above (say 30,000), where the only difference is the created_at field. Searching for these variable parameters:


    one_id=1,
    another_id=22,
    created_at > "2011-05-26 23:30:00"
    created_at < "2011-05-27 01:00:00"

I want to end up with a dictionary listing the names and their count matching the search parameters:


{
   "Lisa" : 132,
   "John" : 16
}

If I put [created_at, one_id, another_id, a_name] in the key, I can use the start and end keys:

?group=true&
group_level=4&
startkey=["2011-05-26 23:30:00",1,22]&
endkey=["2011-05-27 01:00:00",2,23]

I will get results like these:
{
  "key": ["2011-05-26 23:30:10", 1, 22, "Lisa"],
  "value": 1
},
{
  "key": ["2011-05-26 23:30:12", 1, 22, "Lisa"],
  "value": 3
},
{
  "key": ["2011-05-26 23:33:43", 1, 22, "Lisa"],
  "value": 5
},
[..]

This gives me quite a big result set, since there are so many hits where the created_at is slightly different.
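
As a sketch of how such a result set could be collapsed into the dictionary I'm after on the client side, the Python below filters the rows and sums the values per name. It assumes the key layout [created_at, one_id, another_id, a_name] from above; the function names_in_window and the fourth sample row are hypothetical. The explicit check on the two ids is there because, as far as I understand, array keys are compared element by element, so a key range that starts with created_at can also return rows with other ids.

```python
from collections import defaultdict

def names_in_window(rows, one_id, another_id, start, end):
    """Count names for one id pair with start < created_at < end."""
    totals = defaultdict(int)
    for row in rows:
        created_at, row_one_id, row_another_id, name = row["key"]
        if (row_one_id, row_another_id) != (one_id, another_id):
            continue  # a different id pair can still fall inside the key range
        if not (start < created_at < end):
            continue  # timestamps in this fixed format sort as plain strings
        totals[name] += row["value"]
    return dict(totals)

rows = [
    {"key": ["2011-05-26 23:30:10", 1, 22, "Lisa"], "value": 1},
    {"key": ["2011-05-26 23:30:12", 1, 22, "Lisa"], "value": 3},
    {"key": ["2011-05-26 23:33:43", 1, 22, "Lisa"], "value": 5},
    # Hypothetical row with other ids, added to show the filter at work.
    {"key": ["2011-05-26 23:40:00", 7, 99, "John"], "value": 4},
]

result = names_in_window(rows, 1, 22,
                         "2011-05-26 23:30:00", "2011-05-27 01:00:00")
print(result)
```

This still transfers every row in the time range to the client, so it only addresses the shape of the result, not its size.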

Alternatively, if you want to count *just* the names and *just* the
dates, you'll need two indexes, one for names and one for dates, as you
can't "skip" the key groups (as your example tried to do with [{},...]).

Basically, you'll need an additional view/index for each key you're
wanting to count + whatever output you want to make the counting more
granular (in this case, date).

Mhmm. So in this case, it means I need an index each for one_id, another_id and a_name (three of them)? If yes, I'm puzzled as to how I can make use of these indexes with just one GET request?


[..]

Initially, I got something working for my use case, using two indexes, one to get the a_name values based on the search queries a_value, another_value & created_at. Querying the second index, I got the number of occurrences for a_name within the hits returned from the first query.

However, this didn't feel optimal (although I've read posts on the mailing list of people doing two batches of queries before), so I tried to go down a different road, as described above.


Best regards,

-Torstein
