I'm trying to understand how couchdb does incremental updates to its map/reduce views. By this, I understand that it only has to read those documents which have changed since the view was last updated.
I have an idea how it might work, so let me pose it as an example. Here's a database with 6 documents and a very simple summing map/reduce function. ------------------------------------------------------------- #!/bin/sh set -xe HOST1=http://localhost:5984 LOCAL1=maptest DB1="$HOST1/$LOCAL1" curl -X DELETE "$DB1"; echo curl -X PUT "$DB1"; echo curl -sX PUT -d '{"values":[1,2]}' "$DB1/doc1" curl -sX PUT -d '{"values":[]}' "$DB1/doc2" curl -sX PUT -d '{"values":[3,4,5]}' "$DB1/doc3" curl -sX PUT -d '{"values":[6,7]}' "$DB1/doc4" curl -sX PUT -d '{"values":[8]}' "$DB1/doc5" curl -sX PUT -d '{"values":[9]}' "$DB1/doc6" curl -sX PUT -T - "$DB1/_design/test" <<EOS { "views": { "sum": { "map": "function(doc) { if (doc.values) { for (var i in doc.values) { emit(null,doc.values[i]) } } }", "reduce": "function(key, values) { return sum(values) }" } } } EOS echo -e "\n\nReading view..." curl "$DB1/_view/test/sum" ------------------------------------------------------------- I am guessing there must be an upper bound to the number of keys and values passed at once to the reduce function, so let me assume for now that limit is 3. Then the process is: doc1 doc3 doc4 doc5 doc6 MAP / \ / | \ / \ | | 1 2 3 4 5 6 7 8 9 REDUCE \ | / \ | / \ | / 6 15 24 REDUCE `--------. | ,--------' 45 Now, suppose I delete doc5. In order to update this sum incrementally, I think I would have to: (1) Regenerate the map block(s) which included doc5 - so this would now be [7,9] instead of [7,8,9] (2) Reduce those block(s) - giving 16 instead of 24 (3) Down the reduction tree, re-reduce those blocks which depend on that input Is that an accurate description of the process? In order to do this, couchdb would have to remember all the intermediate reduce values, and also know that the intermediate value 24 was derived from doc4, doc5 and doc6. Furthermore, it would have to re-map doc4 and doc6 to generate the new value, even though they had not changed. Alternatively, it could keep all the map results materialised, each tagged with the the docid where it came from, so it could simply remove the 8 from the map when doc5 is deleted. Is that what it does? (If so, I think it could be useful to expose that in the view, so that a single view could be used as both map and map+reduce) Now, the other thing which I don't understand is group_level. I can understand that emit gives (key,value) pairs, so group=true gives me the intermediate reduction values for rows with identical keys. For example, if I change the above map function to "map": "function(doc) { if (doc.values) { for (var i in doc.values) { emit(doc._id,doc.values[i]) } } }", this gives the result I expect, and I imagine that the reduction process is working something like this: doc1 doc3 doc4 doc5 doc6 MAP / \ / | \ / \ | | 1 2 3 4 5 6 7 8 9 REDUCE \ / \ | / \ / | | 3 12 13 8 9 ==> group=true REDUCE `------. | ,-----' \ / 28 17 REDUCE `------. ,-----' 45 ==> group=false Is that right? But in that case, what does group_level=2 do? Is this something which comes into play if emitted keys are structured as arrays? Anyway, sorry for the long E-mail. I'm just trying to get all this clear in my head :-) Many thanks, Brian.
