The CouchDB wiki (http://wiki.apache.org/couchdb/FUQ) says:
=========================================
In a view, why should I not emit(key,doc) ?
The key point here is that by emitting ,doc you are duplicating the document
which is already present in the database (a .couch file), and including it in
the results of the view (a .view file, with similar structure). This is the
same as having a SQL Index that includes the original table, instead of using a
foreign key.
The same effect can be achieved by using emit(key,null) and ?include_docs=true
with the view request. This approach has the benefit of not duplicating the
document data in the view index, which reduces the disk space consumed by the
view. On the other hand, the file access pattern is slightly more expensive for
CouchDB. It is usually a premature optimization to include the document in the
view. As always, if you think you may need to emit the document it's always
best to test.
=========================================
In my own testing, that does not seem to be the case. I have a CouchDB 1.2
instance, and a view that uses emit(key, doc) on my 0.8 GB dataset produces a
1.4 GB .view file. When I create the exact same view, but with emit(key, null)
or emit(key, {"_id":doc._id}) instead, the resulting .view file is also 1.4 GB.
I should add that the indexing time to create these views is non-trivial on my
current machine, around 10-15 minutes. Examining the .view files with a text
editor seems to indicate that they all store the full documents in the view
index, which contradicts what the wiki says.
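To make the comparison concrete, here is a minimal sketch of the three map
function variants described above. The key field `doc.type`, the function
names, and the `runMap` / `emittedSize` helpers are assumptions for
illustration only: `runMap` is a tiny stand-in for CouchDB's view server
(which normally supplies emit()), and `emittedSize` only measures the JSON
text of the emitted rows, not the on-disk size of a .view file.

```javascript
// Variant 1: emit the whole document as the value.
// `doc.type` is a hypothetical key field, not taken from the original views.
function mapFullDoc(doc) { emit(doc.type, doc); }

// Variant 2: emit no value; fetch documents later with ?include_docs=true.
function mapNull(doc) { emit(doc.type, null); }

// Variant 3: emit only the document id as the value.
function mapIdOnly(doc) { emit(doc.type, { _id: doc._id }); }

// Stand-in for the view server: CouchDB exposes emit() as a global to map
// functions, so install a temporary global emit() that collects rows.
function runMap(mapFn, docs) {
  var rows = [];
  globalThis.emit = function (key, value) {
    rows.push({ key: key, value: value });
  };
  try {
    docs.forEach(function (doc) { mapFn(doc); });
  } finally {
    delete globalThis.emit;
  }
  return rows;
}

// Rough proxy for how much data a variant emits: length of the rows as JSON.
function emittedSize(rows) {
  return JSON.stringify(rows).length;
}

// Example: compare the three variants on a small in-memory sample.
var sampleDocs = [
  { _id: "a", type: "post", body: "some longer document body" },
  { _id: "b", type: "comment", body: "another document body" }
];
console.log("full doc:", emittedSize(runMap(mapFullDoc, sampleDocs)));
console.log("null:    ", emittedSize(runMap(mapNull, sampleDocs)));
console.log("id only: ", emittedSize(runMap(mapIdOnly, sampleDocs)));
```

In this sketch the emit(key, doc) variant emits more data than the other two,
which is what the wiki text leads one to expect for the index as well.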
Should the wiki be updated, or is this a bug? I see no difference in .view
file size between using emit(key, doc) and emit(key, null).
Thanks!
--
George W.