I've modified my original index in CouchDB to the following, but I'm still not
having any joy getting the values broken up into tokens:
{
  "_id": "_design/foo",
  "_rev": "19-da99913ce4cdd421903d0d48f9a40cc3",
  "fulltext": {
    "by_metadata": {
      "index": "function(doc) {
        var ret = new Document();
        if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
          for (var i in doc.metadata) {
            if (doc.metadata[i]['key'] == 'Title') {
              ret.add(doc.metadata[i]['value'].toLowerCase(),
                      {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
            }
            ret.add(doc.metadata[i]['value'],
                    {'field': doc.metadata[i]['key'].toLowerCase(), 'analyzer': 'simple'});
            ret.add(doc.metadata[i]['value'], {'analyzer': 'simple'});
          }
          for (var i in doc.partitions) {
            ret.add(doc.partitions[i].partition_id, {'field': 'partition'});
            ret.add(doc.partitions[i].partition_id);
          }
          ret.add(doc['created_at'],
                  {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
          return ret;
        } else {
          return null;
        }
      }"
    }
  }
}
I've opened the index in Luke. On the Documents tab, running "Reconstruct &
Edit" on a particular document shows that the field values still aren't being
split into separate tokens.
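
For illustration, the difference between the two tokenizations described in the
reply quoted below can be sketched in plain JavaScript. This is not
couchdb-lucene or Lucene code: `simpleTokens` and `wordTokens` are hypothetical
helpers that only approximate, for ASCII input, an analyzer that splits on every
non-letter and one that keeps underscores inside a word.

```javascript
// Hypothetical approximation of a "simple" analyzer (assumption: ASCII only):
// lowercase the text, then split on anything that is not a letter.
function simpleTokens(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(function (t) {
    return t.length > 0;
  });
}

// Hypothetical approximation of the default behaviour for this case:
// underscores and digits stay inside a word, so only whitespace and
// other punctuation separate tokens.
function wordTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(function (t) {
    return t.length > 0;
  });
}

var value = 'some_wonderland_example';
console.log(simpleTokens(value)); // [ 'some', 'wonderland', 'example' ]
console.log(wordTokens(value));   // [ 'some_wonderland_example' ]
```

With the first tokenization a search for "wonderland" matches the value; with
the second it only matches the full term, which is what I'm still seeing.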
--
Rory
On Saturday, 3 September 2011 at 17:12, Robert Newson wrote:
> " For instance, searching for the term "wonderland" should return back
> a document where there is a field with the value
> "some_wonderland_example" but it doesn't."
>
> It shouldn't and doesn't. :)
>
> 'some_wonderland_example' is a single token when tokenized by the
> default StandardAnalyzer. If instead you specify "analyzer":"simple",
> you will find that it is 3 tokens, and your search should work.
>
> B.
>
> On 3 September 2011 16:06, Rory Franklin wrote:
> > I'm using couchdb-lucene to index a list of fields (user defined) in a
> > document using the following design document:
> >
> > {
> >   "_id": "_design/foo",
> >   "_rev": "16-dcd0d39369c35b3d74ceef13a388826f",
> >   "fulltext": {
> >     "by_metadata": {
> >       "index": "function(doc) {
> >         var ret = new Document();
> >         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
> >           for (var i in doc.metadata) {
> >             if (doc.metadata[i]['key'] == 'Title') {
> >               ret.add(doc.metadata[i]['value'].toLowerCase(),
> >                       {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
> >             }
> >             ret.add(doc.metadata[i]['value'],
> >                     {'field': doc.metadata[i]['key'].toLowerCase()});
> >             ret.add(doc.metadata[i]['value']);
> >           }
> >           for (var i in doc.partitions) {
> >             ret.add(doc.partitions[i].partition_id, {'field': 'partition'});
> >             ret.add(doc.partitions[i].partition_id);
> >           }
> >           ret.add(doc['created_at'],
> >                   {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
> >           return ret;
> >         } else {
> >           return null;
> >         }
> >       }"
> >     }
> >   }
> > }
> >
> >
> >
> > (I've formatted the definition so that it's not all on one line for
> > readability here)
> >
> > However, when using the by_metadata view it doesn't appear to be breaking
> > the values up when there are underscores. For instance, searching for the
> > term "wonderland" should return back a document where there is a field with
> > the value "some_wonderland_example" but it doesn't. It returns the document
> > if I search for the full term.
> >
> > I'm just wondering whether I'm defining the index incorrectly? (of course,
> > feel free to point out if I'm doing anything else glaringly obviously wrong
> > too!)
> >
> >
> >
> > Rory