Hello Joan,

I’m getting this information from two places:
- Futon’s “Status” page
- CouchDB’s /_active_tasks payload

I know there are ~950,000 documents in the database. This number appears on the /_utils main page.

What I don’t understand is why the total number of documents differs when the active tasks are reported via the Status page or /_active_tasks. Each active task has a different total number of docs to be processed.

Yes, the Genesis case is the initial case, where CouchDB hasn’t had the opportunity to index anything yet.

Thanks,

— Tito

> On Oct 28, 2014, at 10:50 AM, Joan Touzet <[email protected]> wrote:
>
> Hi Tito,
>
> Can you explain where you're getting the "total count" from? Is this the
> total number of rows emitted by each view after all views have finished
> processing?
>
> What do you mean by "Genesis case" - do you mean building a view for the
> first time?
>
> Thanks,
> Joan
>
> ----- Original Message -----
> From: "Tito Ciuro" <[email protected]>
> To: [email protected]
> Sent: Tuesday, October 28, 2014 1:32:37 PM
> Subject: How does indexing really work?
>
> Hello,
>
> I’m a bit confused about how CouchDB really works. I just launched Futon and
> see that the indexer is busy working on a design document. I have almost a
> million documents.
>
> A few minutes later, I see three more tasks appearing, all belonging to
> different design documents. No problem, except that the total counts are all
> different:
>
> - design doc 1: ~950,000
> - design doc 2: ~450,000
> - design doc 3: ~313,000
> - design doc 4: ~85,000
>
> Why are the total counts different? My understanding is/was that a database
> holds N documents. Each indexing function is passed a document, and it then
> checks whether the document is the doc_type it expects:
>
> function(doc) {
>   if (doc.Type == "customer") {
>     emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
>   }
> }
>
> In the Genesis case, I was assuming that each view would have to go through
> each document across the database and index its own doc_type. Basically, one
> round for each design document for N total documents. For example, if the
> database contains 100,000 documents and two design documents, there would be
> two active tasks listed:
>
> - _design/customers => index 100,000 documents
> - _design/orders => index 100,000 documents
>
> Later on, the indexing would be partial and the delta (say 9,000 docs) would
> have to be reindexed by each view:
>
> - _design/customers => index 9,000 documents
> - _design/orders => index 9,000 documents
>
> This doesn’t seem to be the case. I’d love to know how indexing really works.
>
> Thanks!
>
> — Tito
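
P.S. To make the comparison concrete, here is a minimal sketch of how the per-task totals could be lined up side by side. It assumes the /_active_tasks payload has already been fetched and parsed into an array, and that indexer entries carry the type, design_document, changes_done and total_changes fields. The sampleTasks array, the database name "mydb" and the summarizeIndexers helper are made up for illustration; the numbers are not real output from my server.

```javascript
// Hypothetical sample shaped like a parsed /_active_tasks response.
// The values are illustrative only, not captured from a real server.
const sampleTasks = [
  { type: "indexer", database: "mydb", design_document: "_design/customers",
    changes_done: 95000, total_changes: 950000, progress: 10 },
  { type: "indexer", database: "mydb", design_document: "_design/orders",
    changes_done: 45000, total_changes: 450000, progress: 10 },
  // Non-indexer tasks (e.g. compaction) are ignored by the summary below.
  { type: "database_compaction", database: "mydb",
    changes_done: 10, total_changes: 100, progress: 10 },
];

// Hypothetical helper: keep only indexer tasks and report, per design
// document, how many changes are done, the total, and what remains.
function summarizeIndexers(tasks) {
  return tasks
    .filter((task) => task.type === "indexer")
    .map((task) => ({
      designDocument: task.design_document,
      done: task.changes_done,
      total: task.total_changes,
      remaining: task.total_changes - task.changes_done,
    }));
}

for (const row of summarizeIndexers(sampleTasks)) {
  console.log(
    `${row.designDocument}: ${row.done}/${row.total} (${row.remaining} remaining)`
  );
}
```

Running the same summary against the live payload and against the database's document count should show whether each task's total really differs from N, or whether I am misreading the Status page.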
