Thanks for your answer,
> It seems that you're using a _list function to filter your view results,
> right?
> Be aware that even though you're not sending that data to the client, the
> database still has to iterate thru all the view rows and send them to the
> _list function, just to get filtered there. So the amount of time it takes to
> query your view/list will increase proportionally with the number rows
> returned from the view query.
Yes. This is indeed why I am sceptical about this way of selecting reduce values.
In our project, we try to move our open-source text analysis software from
PHP/PostgreSQL to CouchDB.
The current issue is about getting repeated phrases (sequences of 3 words) in
forums.
Each forum thread is stored as a CouchDB "document".
A view emits every sequence that matches the following constraints:
function(doc) {
  // Tokenize into alternating runs of letters/digits and of separators.
  const ALPHA = /[a-zàâçéêèëïîôöüùû0-9]+|[^a-zàâçéêèëïîôöüùû0-9]+/gi;
  doc.posts.forEach(function(p) {
    var words = p.text.match(ALPHA);
    if (!words) return;
    // Even indices are word tokens, odd indices are separators.
    for (var i = 0; i < words.length - 4; i += 2) {
      if (
        // at least one of the three words is longer than 3 characters
        (words[i].length > 3 || words[i+2].length > 3 || words[i+4].length > 3)
        // and both separators are single characters (e.g. a space)
        && words[i+1].length == 1
        && words[i+3].length == 1
      ) {
        emit([
          words[i].toLowerCase(),
          words[i+2].toLowerCase(),
          words[i+4].toLowerCase()
        ], null);
      }
    }
  });
}
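For instance (the sentence is made up), a post whose text is "le chat dort sur
le tapis" would produce the following rows:
emit(["le", "chat", "dort"], null)
emit(["chat", "dort", "sur"], null)
emit(["dort", "sur", "le"], null)
emit(["sur", "le", "tapis"], null)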
Then a reduce counts occurrences over the whole corpus:
function(keys, values, rereduce) {
  if (rereduce) {
    // combine partial counts coming from previous reduce calls
    return sum(values);
  } else {
    // one emitted row per occurrence, so the count is the number of values
    return values.length;
  }
}
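As a side note, if the CouchDB version allows it, I suppose this hand-written
reduce could be replaced by the built-in "_count" reduce, which should give the
same result. A sketch of the design document (all names are made up):
{
  "_id": "_design/phrases",
  "views": {
    "sequences": {
      "map": "function(doc) { ... }",
      "reduce": "_count"
    }
  }
}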
Then a list function filters out unrepeated phrases:
function(head, req) {
  var phrase, first = true;
  send('{"rows":[\n');
  while (phrase = getRow()) {
    if (phrase.value > 1) { // keep only repeated phrases
      // comma before every row except the first, to keep the JSON valid
      send((first ? '' : ',\n') + JSON.stringify(phrase));
      first = false;
    }
  }
  send('\n]}');
}
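For reference, I query the list roughly like this (design document, list and
view names are placeholders):
GET /mydb/_design/phrases/_list/repeated/sequences?group=true
group=true is what produces one reduce value per distinct phrase, so that
phrase.value is the count for that phrase.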
I know that the view could be written differently and probably more efficiently
with regular expressions, but my worry is not about the performance of the
initial view generation (that was what I meant by "cached"); it is about the
cost incurred every time I query the list.
Regards,
Aurélien