Re: couchdb-lucene indexing issues

Rory Franklin Mon, 05 Sep 2011 02:16:09 -0700

I've modified my original index in CouchDB to the following, but I'm still not having any joy getting values broken up into tokens:

{
  "_id": "_design/foo",
  "_rev": "19-da99913ce4cdd421903d0d48f9a40cc3",
  "fulltext": {
    "by_metadata": {
      "index": "function(doc) {
        var ret = new Document();
        if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
          for (var i in doc.metadata) {
            if (doc.metadata[i]['key'] == 'Title') {
              ret.add(doc.metadata[i]['value'].toLowerCase(),
                      {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
            }
            ret.add(doc.metadata[i]['value'],
                    {'field': doc.metadata[i]['key'].toLowerCase(), 'analyzer': 'simple'});
            ret.add(doc.metadata[i]['value'], {'analyzer': 'simple'});
          }
          for (var i in doc.partitions) {
            ret.add(doc.partitions[i].partition_id, {'field': 'partition'});
            ret.add(doc.partitions[i].partition_id);
          }
          ret.add(doc['created_at'],
                  {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
          return ret;
        } else {
          return null;
        }
      }"
    }
  }
}

I've opened the index in Luke, and going to the Documents tab and running Reconstruct & Edit on a particular document shows that the field values aren't being split into separate tokens.


--

Rory
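A possible explanation, stated as an assumption to verify against the couchdb-lucene README for the version in use: couchdb-lucene may choose the analyzer once per index, through an `analyzer` key next to `index` in the `fulltext` entry, rather than per `ret.add()` call. If so, the `'analyzer': 'simple'` options above have no effect and the default StandardAnalyzer still applies, which would match what Luke shows. This Node.js sketch builds the design document with the analyzer moved to the index level. It also avoids hand-joining the function onto one line: the index function is written as ordinary code, and `Function.prototype.toString()` plus `JSON.stringify` produce the escaped string the design document needs. The name `byMetadataIndex` is hypothetical, and `Document` only exists inside couchdb-lucene, which is harmless here because the function is serialised, never called.

```javascript
// Sketch: build _design/foo with the analyzer declared once per index.
// ASSUMPTION: couchdb-lucene reads "analyzer" from the fulltext index
// definition, so no per-call analyzer option is passed to ret.add().

// Hypothetical name; couchdb-lucene only ever sees this function's source.
// `Document` is provided by couchdb-lucene's JavaScript sandbox, not Node.
const byMetadataIndex = function (doc) {
  var ret = new Document();
  if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
    for (var i in doc.metadata) {
      var entry = doc.metadata[i];
      if (entry['key'] == 'Title') {
        // Whole lower-cased title kept as one token for sorting.
        ret.add(entry['value'].toLowerCase(),
                {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
      }
      // Analyzed with whatever analyzer the index definition declares.
      ret.add(entry['value'], {'field': entry['key'].toLowerCase()});
      ret.add(entry['value']); // also searchable via the default field
    }
    for (var j in doc.partitions) {
      ret.add(doc.partitions[j].partition_id, {'field': 'partition'});
      ret.add(doc.partitions[j].partition_id);
    }
    ret.add(doc['created_at'],
            {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
    return ret;
  }
  return null;
};

const designDoc = {
  _id: "_design/foo",
  // Add the document's current "_rev" here before saving it back to CouchDB.
  fulltext: {
    by_metadata: {
      analyzer: "simple", // ASSUMPTION: applies to every analyzed field
      index: byMetadataIndex.toString()
    }
  }
};

// Newlines in the function source become "\n" escapes, so this is valid JSON.
console.log(JSON.stringify(designDoc, null, 2));
```

One side effect to keep in mind: SimpleAnalyzer splits on every non-letter, so digits are discarded. If `partition_id` values are numeric, the analyzed `partition` field may stop matching them, and couchdb-lucene's `perfield` analyzer setting may be worth checking for that case.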

On Saturday, 3 September 2011 at 17:12, Robert Newson wrote:

> " For instance, searching for the term "wonderland" should return back
> a document where there is a field with the value
> "some_wonderland_example" but it doesn't."
> 
> It shouldn't and doesn't. :)
> 
> 'some_wonderland_example' is a single token when tokenized by the
> default StandardAnalyzer. If instead you specify "analyzer":"simple",
> you will find that it is three tokens (some, wonderland, example), and
> your search should work.
> 
> B.
> 
> On 3 September 2011 16:06, Rory Franklin <[email protected]> wrote:
> > I'm using couchdb-lucene to index a list of user-defined fields in a
> > document using the following design document:
> > 
> > {
> >   "_id": "_design/foo",
> >   "_rev": "16-dcd0d39369c35b3d74ceef13a388826f",
> >   "fulltext": {
> >     "by_metadata": {
> >       "index": "function(doc) {
> >         var ret = new Document();
> >         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
> >           for (var i in doc.metadata) {
> >             if (doc.metadata[i]['key'] == 'Title') {
> >               ret.add(doc.metadata[i]['value'].toLowerCase(),
> >                       {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
> >             }
> >             ret.add(doc.metadata[i]['value'], {'field': doc.metadata[i]['key'].toLowerCase()});
> >             ret.add(doc.metadata[i]['value']);
> >           }
> >           for (var i in doc.partitions) {
> >             ret.add(doc.partitions[i].partition_id, {'field': 'partition'});
> >             ret.add(doc.partitions[i].partition_id);
> >           }
> >           ret.add(doc['created_at'],
> >                   {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
> >           return ret;
> >         } else {
> >           return null;
> >         }
> >       }"
> >     }
> >   }
> > }
> > 
> > 
> > 
> > (I've formatted the definition so that it's not all on one line for 
> > readability here)
> > 
> > However, when using the by_metadata view it doesn't appear to be breaking 
> > the values up when there are underscores. For instance, searching for the 
> > term "wonderland" should return back a document where there is a field with 
> > the value "some_wonderland_example" but it doesn't. It returns the document 
> > if I search for the full term.
> > 
> > I'm just wondering whether I'm defining the index incorrectly. (Of course,
> > feel free to point out anything else I'm doing glaringly wrong too!)
> > 
> > 
> > 
> > Rory
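Robert's explanation can be illustrated with a rough JavaScript approximation (Node.js). These are not Lucene's analyzers: the hypothetical `simpleTokenize` mimics SimpleAnalyzer by splitting on any non-letter and lower-casing, while `roughStandardTokenize` is a deliberately crude stand-in that only reproduces the behaviour Robert describes for the default StandardAnalyzer, namely that underscores do not break a word.

```javascript
// Rough illustration only: NOT Lucene code, just approximations of the
// single tokenizer behaviour discussed in this thread.

// Approximates SimpleAnalyzer: any run of non-letters (underscores, digits,
// punctuation, whitespace) ends a token; tokens are lower-cased.
// \p{L} requires the "u" regex flag (ES2018+).
function simpleTokenize(text) {
  return text
    .split(/[^\p{L}]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

// Hypothetical, crude stand-in for StandardAnalyzer: letters, digits and
// underscores stay together, so an underscore-joined value is one token.
// The real analyzer follows Unicode word-break rules and removes stop
// words; none of that is modelled here.
function roughStandardTokenize(text) {
  return text
    .split(/[^\p{L}\p{N}_]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

const value = "some_wonderland_example";
console.log(roughStandardTokenize(value)); // [ 'some_wonderland_example' ]
console.log(simpleTokenize(value));        // [ 'some', 'wonderland', 'example' ]

// A plain term query has to match an indexed token exactly.
console.log(roughStandardTokenize(value).includes("wonderland")); // false
console.log(simpleTokenize(value).includes("wonderland"));        // true
```

The last two lines show why the search for `wonderland` fails: only the simple-style tokenization ever puts that term into the index.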
