In the CouchDB wiki (http://wiki.apache.org/couchdb/FUQ), it says:

=========================================
In a view, why should I not emit(key,doc) ?

The key point here is that by emitting ,doc you are duplicating the document 
which is already present in the database (a .couch file), and including it in 
the results of the view (a .view file, with similar structure). This is the 
same as having a SQL Index that includes the original table, instead of using a 
foreign key.

The same effect can be achieved by using emit(key,null) and ?include_docs=true 
with the view request. This approach has the benefit of not duplicating the 
document data in the view index, which reduces the disk space consumed by the 
view. On the other hand, the file access pattern is slightly more expensive for 
CouchDB. It is usually a premature optimization to include the document in the 
view. As always, if you think you may need to emit the document it's always 
best to test.
=========================================

In my own testing, that does not seem to be the case. I have a CouchDB 1.2 
instance, and a view that uses emit(key, doc) on my 0.8 GB dataset produces a 
1.4 GB .view file. When I create the exact same view but with emit(key, null) 
or emit(key, {"_id":doc._id}) instead, the resulting .view file is also 
1.4 GB. I should add that the indexing time for these views is non-trivial on 
my current machine, around 10-15 minutes. Examining the .view files with a 
text editor seems to indicate that they all store the full documents in the 
view index, which contradicts what the wiki says.
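
For reference, the two kinds of map function being compared look roughly like 
the sketch below. The key (doc.type) and the sample documents are assumptions 
for illustration only, and runMap is a hypothetical stand-in for the CouchDB 
view server so that the snippet runs on its own. It only shows what each map 
function emits; it says nothing about how CouchDB lays out the .view file on 
disk.

```javascript
// CouchDB provides emit() as a global inside map functions.
// Here the hypothetical harness below assigns it before each call.
let emit;

// Variant 1: the whole document is emitted as the view value.
function mapWithDoc(doc) {
  emit(doc.type, doc);
}

// Variant 2: nothing is emitted as the value; the document would be
// fetched at query time with ?include_docs=true instead.
function mapWithNull(doc) {
  emit(doc.type, null);
}

// Hypothetical stand-in for the view server (not a CouchDB API):
// runs a map function over each doc and collects the emitted rows.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc);
  }
  return rows;
}

// Made-up sample documents with a large body field.
const docs = [
  { _id: "a", type: "post", body: "x".repeat(1000) },
  { _id: "b", type: "post", body: "y".repeat(1000) },
];

const withDoc = runMap(mapWithDoc, docs);
const withNull = runMap(mapWithNull, docs);

// Compare the size of the emitted rows when serialized as JSON.
console.log(JSON.stringify(withDoc).length, JSON.stringify(withNull).length);
```

Going by the wiki's description, I would expect the second variant to yield a 
noticeably smaller index, since the emitted rows no longer carry the document 
body.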

Should the wiki be updated, or is this a bug? I see no difference between 
using emit(key, doc) and emit(key, null).

Thanks!
--
George W.
