On 2011-01-24 13:49, Brendon McLean wrote:
Our documents really only contain two types of data:
Numeric attributes
Boolean attributes
The boolean attributes essentially mark a document as belonging to one or more
non-exclusive sets. The numeric attributes will always only need to be summed.
One way of structuring the document is like this:
{
"id": 3123123,
"attr": {"x": 2, "y": 4, "z": 6},
"sets": ["A", "B", "C"]
}
With this structure, it's easy to work out aggregate x, y, z values for the sets A,
B and C, but it gets more complicated when you want to see the aggregates for
intersections like A&C.
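To illustrate the difficulty, here is a minimal Python sketch (not CouchDB code) that mimics a view emitting one row per set and summing the attributes. The sample documents and the helper names `per_set_totals` and `intersection_totals` are made up for illustration. The per-set totals cannot be combined to recover the A&C totals; those need a separate pass over the documents.

```python
from collections import Counter

# Hypothetical sample documents in the structure shown above.
docs = [
    {"id": 1, "attr": {"x": 2, "y": 4, "z": 6}, "sets": ["A", "B", "C"]},
    {"id": 2, "attr": {"x": 1, "y": 1, "z": 1}, "sets": ["A"]},
    {"id": 3, "attr": {"x": 5, "y": 0, "z": 2}, "sets": ["C"]},
]

def per_set_totals(docs):
    """Mimic a map step that emits (set, attr) and a reduce step that sums."""
    totals = {}
    for doc in docs:
        for name in doc["sets"]:
            totals.setdefault(name, Counter()).update(doc["attr"])
    return totals

def intersection_totals(docs, wanted):
    """Sum attributes over documents that belong to every set in `wanted`."""
    total = Counter()
    for doc in docs:
        if set(wanted) <= set(doc["sets"]):
            total.update(doc["attr"])
    return total

totals = per_set_totals(docs)
print(totals["A"]["x"], totals["C"]["x"])          # per-set sums of x
print(intersection_totals(docs, ["A", "C"])["x"])  # only document 1 is in both
```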
In this small case I could emit keys for every non-empty combination of A, B and C
("A, B, C, AB, AC, BC, ABC"), but I'm worried about how this will scale. Our documents
could belong to any combination of 80 sets, and the data is fronted by a user interface
which can construct any conceivable combination of them.
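The scaling concern can be made concrete with a short Python sketch. The helper name `subset_keys` is hypothetical; it simply enumerates the keys that one document would have to emit. A document that belongs to n sets needs 2**n - 1 keys, so a document in all 80 sets would need about 1.2e24 of them.

```python
from itertools import combinations

def subset_keys(sets):
    """Return every non-empty combination of the given set names as a key."""
    names = sorted(sets)
    return ["".join(combo)
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]

print(subset_keys(["A", "B", "C"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']

# Number of keys a single document would emit if it belonged to n sets.
for n in (3, 10, 20, 80):
    print(n, 2**n - 1)
```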
I'm inclined to think that this isn't a job for CouchDB, and perhaps MongoDB
or something else would be better suited to this problem.
We have solved the problem for attributes by creating one index for each
attribute, running one query against each index, and then aggregating
the results afterwards.
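One possible reading of this approach, sketched in Python rather than against a real CouchDB server: keep one index per set (boolean attribute), query each index the user selected, and intersect the results on the client before summing. The in-memory helpers `build_indexes` and `aggregate_intersection` and the sample data are assumptions for illustration only; they stand in for per-attribute views and the queries against them.

```python
# Hypothetical sample documents; not taken from a real database.
sample_docs = [
    {"id": 1, "attr": {"x": 2, "y": 4, "z": 6}, "sets": ["A", "B", "C"]},
    {"id": 2, "attr": {"x": 1, "y": 1, "z": 1}, "sets": ["A", "C"]},
    {"id": 3, "attr": {"x": 5, "y": 0, "z": 2}, "sets": ["B"]},
]

def build_indexes(docs):
    """One index per set: set name -> {doc id: numeric attributes}."""
    indexes = {}
    for doc in docs:
        for name in doc["sets"]:
            indexes.setdefault(name, {})[doc["id"]] = doc["attr"]
    return indexes

def aggregate_intersection(indexes, wanted):
    """Query each wanted index, keep ids present in all of them, then sum."""
    results = [indexes.get(name, {}) for name in wanted]  # one query per index
    if not results:
        return {}
    common = set(results[0]).intersection(*results[1:])
    totals = {}
    for doc_id in common:
        for key, value in results[0][doc_id].items():
            totals[key] = totals.get(key, 0) + value
    return totals

indexes = build_indexes(sample_docs)
print(aggregate_intersection(indexes, ["A", "C"]))
```

The number of indexes grows linearly with the number of sets (80 here) instead of exponentially; the trade-off is one query per selected set plus client-side work proportional to the size of the results.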
Regards,
Michael.