Re: How does indexing really work?

Tito Ciuro Tue, 28 Oct 2014 10:57:49 -0700

Hello Joan,

I’m getting this information in two places:


- Futon’s “Status” page
- CouchDB’s /_active_tasks payload

I know there are ~950,000 documents in the database; this number appears on 
the /_utils main page. What I don't understand is why the total number of 
documents differs from one active task to another, whether reported on the 
Status page or via /_active_tasks. Each active task has a different total 
number of docs to be processed.
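To compare the per-task totals side by side, the /_active_tasks payload can be reduced to one line per indexer task. This is a minimal sketch, assuming indexer tasks carry type, design_document, changes_done and total_changes fields as in CouchDB 1.x (check this against your server's actual response); the function name summarizeIndexerTasks and the sample payload are hypothetical.

```javascript
// Sketch: summarize indexer tasks from an /_active_tasks payload.
// Assumption: each indexer task object has `type`, `design_document`,
// `changes_done` and `total_changes` fields (CouchDB 1.x style).
function summarizeIndexerTasks(tasks) {
  return tasks
    .filter((t) => t.type === "indexer") // ignore replication, compaction, etc.
    .map((t) => ({
      ddoc: t.design_document,
      done: t.changes_done,
      total: t.total_changes,
      // Guard against division by zero for a task with nothing to process.
      percent: t.total_changes > 0
        ? Math.round((100 * t.changes_done) / t.total_changes)
        : 100,
    }));
}

// Hypothetical sample payload; only the field names are assumed to match CouchDB.
const sample = [
  { type: "indexer", design_document: "_design/one", changes_done: 475000, total_changes: 950000 },
  { type: "indexer", design_document: "_design/two", changes_done: 45000, total_changes: 450000 },
  { type: "replication", source: "a", target: "b" },
];

for (const s of summarizeIndexerTasks(sample)) {
  console.log(`${s.ddoc}: ${s.done}/${s.total} (${s.percent}%)`);
}
```

Against a live server, the same array would come from GET /_active_tasks, e.g. curl http://localhost:5984/_active_tasks (host and port assumed; the endpoint may require admin credentials).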

Yes, the Genesis case is the initial case, where CouchDB hasn't yet had the 
opportunity to index anything.

Thanks,

— Tito

> On Oct 28, 2014, at 10:50 AM, Joan Touzet <[email protected]> wrote:
> 
> Hi Tito,
> 
> Can you explain where you're getting the "total count" from? Is this the 
> total number of rows emitted by each view after all views have finished 
> processing?
> 
> What do you mean by "Genesis case" - do you mean building a view for the 
> first time?
> 
> Thanks,
> Joan
> 
> ----- Original Message -----
> From: "Tito Ciuro" <[email protected]>
> To: [email protected]
> Sent: Tuesday, October 28, 2014 1:32:37 PM
> Subject: How does indexing really work?
> 
> Hello,
> 
> I’m a bit confused about how CouchDB really works. I just launched Futon and 
> see that the indexer is busy working on a design document. I have almost a 
> million documents.
> 
> A few minutes later, I see three more tasks appear, each belonging to a 
> different design document. No problem, except that the total counts all 
> differ:
> 
> - design doc 1: ~950,000
> - design doc 2: ~450,000
> - design doc 3: ~313,000
> - design doc 4: ~85,000
> 
> Why are the total counts different? My understanding is/was that a database 
> holds N documents, and each indexing function is passed a document and then 
> checks whether it's the doc_type it expects:
> 
> function(doc) {
>   // Only customer documents produce rows in this view.
>   if (doc.Type == "customer") {
>     emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
>   }
> }
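The assumption behind the question (every document is handed to the map function, and only matching documents emit rows) can be illustrated with a toy model. This is a hypothetical simulation, not CouchDB's view server: buildView and the sample docs are made up, and emit is passed as a parameter here because in CouchDB it is a global.

```javascript
// Toy model (not CouchDB's implementation) of building one view:
// every document is passed to the map function, and only documents
// that call emit() produce rows.
function buildView(docs, mapFn) {
  const rows = [];
  let currentId = null;
  // emit() records a (key, value) row tagged with the current document id.
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc, emit);
  }
  return rows;
}

// The map function from the question, with emit taken as a parameter.
const customersMap = (doc, emit) => {
  if (doc.Type == "customer") {
    emit(doc._id, { LastName: doc.LastName, FirstName: doc.FirstName });
  }
};

const docs = [
  { _id: "c1", Type: "customer", LastName: "Doe", FirstName: "Jane" },
  { _id: "o1", Type: "order", total: 42 },
  { _id: "c2", Type: "customer", LastName: "Roe", FirstName: "Rick" },
];

const rows = buildView(docs, customersMap);
console.log(`mapped ${docs.length} docs, emitted ${rows.length} rows`);
```

Here three documents are mapped but only two rows are emitted, so "documents processed" and "rows in the view" are different quantities.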
> 
> In the Genesis case, I was assuming that each view would have to go through 
> every document in the database and index its own doc_type: basically, one 
> pass per design document over all N documents. For example, if the 
> database contains 100,000 documents and two design documents, there would be 
> two active tasks listed:
> 
> - _design/customers => index 100,000 documents
> - _design/orders => index 100,000 documents
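The arithmetic of that expected first build can be written out. This is a hypothetical sketch of the model described above, not of documented CouchDB behavior; expectedInitialTasks is an illustrative name.

```javascript
// Hypothetical model of the expectation above: on a first build,
// each design document gets its own pass over all N documents.
function expectedInitialTasks(designDocs, docCount) {
  return designDocs.map((ddoc) => ({ ddoc, total: docCount }));
}

const tasks = expectedInitialTasks(["_design/customers", "_design/orders"], 100000);
for (const t of tasks) console.log(`${t.ddoc} => index ${t.total} documents`);

// Total documents fed to indexers under this model: N per design document.
const totalWork = tasks.reduce((sum, t) => sum + t.total, 0);
console.log(`total documents processed: ${totalWork}`);
```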
> 
> Later on, indexing would be incremental, and only the delta (say, 9,000 docs) 
> would have to be reindexed by each view:
> 
> - _design/customers => index 9,000 documents
> - _design/orders => index 9,000 documents
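The incremental expectation can be modeled by having each index remember the last database update sequence it processed. This is a toy sketch under stated simplifications (integer sequences, one change per document, no deletions); pendingChanges is a hypothetical helper, not a CouchDB API.

```javascript
// Toy incremental model: each index records the last database update
// sequence it processed and, on its next run, handles only later changes.
// Simplification: sequences are integers and each change is a distinct doc.
function pendingChanges(dbUpdateSeq, indexSeqs) {
  const result = {};
  for (const [ddoc, seq] of Object.entries(indexSeqs)) {
    result[ddoc] = Math.max(0, dbUpdateSeq - seq);
  }
  return result;
}

// Both indexes were current at seq 100000; then 9,000 more changes arrived.
console.log(pendingChanges(109000, {
  "_design/customers": 100000,
  "_design/orders": 100000,
}));
```

Under this model, two indexes last updated at different sequences would report different pending counts, even against the same database.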
> 
> This doesn’t seem to be the case. I’d love to know how indexing really works.
> 
> Thanks!
> 
> — Tito
