Hi Chetan,

overall I prefer B, although it would not be a trivial effort and it looks like reinventing what ES (and Solr) already have in that area. That said, I think A and C would be very tricky to get right in OSGi. I dealt with this topic myself before 1.0 and there was no clean solution (see also [1]), partly because Lucene uses ServiceLoader for some of its configuration points [2][3].
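
To make the ServiceLoader point concrete, here is a minimal sketch in plain Java (not Lucene code). `AnalyzerProvider` is a made-up interface standing in for a Lucene extension point, and the class and method names are only for illustration. ServiceLoader finds implementations through META-INF/services entries visible to the class loader it is given. In OSGi every bundle has its own class loader, so a lookup through a loader that cannot see the providing bundle's entries simply finds nothing. The sketch only simulates that situation by having no provider file on the class path.

```java
import java.util.ServiceLoader;

// Hypothetical SPI, standing in for a Lucene extension point.
interface AnalyzerProvider {
    String name();
}

class ServiceLoaderVisibility {

    // Counts the providers ServiceLoader can discover through the given class loader.
    static int countProviders(ClassLoader loader) {
        int count = 0;
        for (AnalyzerProvider provider : ServiceLoader.load(AnalyzerProvider.class, loader)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // No META-INF/services/AnalyzerProvider entry is visible to this loader,
        // so the lookup yields no providers and raises no error.
        ClassLoader loader = ServiceLoaderVisibility.class.getClassLoader();
        System.out.println("providers visible: " + countProviders(loader));
    }
}
```

The lookup fails silently, which is part of what makes this hard to diagnose inside a container.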

Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/LUCENE-3167
[2] : https://issues.apache.org/jira/browse/SMX4-1637
[3] : http://www.eclipse.org/forums/index.php/t/474446/

2014-11-05 14:38 GMT+01:00 Chetan Mehrotra <[email protected]>:

> Sent wrong link for Elasticsearch doc. Refer to
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html
> for how to enable constructing an analyzer
> Chetan Mehrotra
>
>
> On Wed, Nov 5, 2014 at 6:34 PM, Chetan Mehrotra
> <[email protected]> wrote:
> > Hi Team,
> >
> > Currently Oak uses a hard coded analyzer. As part of OAK-2177 this
> > needs to be opened up for extension. Possible approaches
> >
> > A - Create analyzer using reflection
> > ------------------------------------------
> >
> > Jackrabbit [1] used to support changing the analyzer. We can do the
> > same by capturing the analyzer class name and instantiating it
> >
> > Pros : Simple to implement
> > Cons : Would not work well in OSGi
> >
> > B - Make it configurable via content
> > ----------------------------------------------
> >
> > Elasticsearch provides a content based DSL to create an analyzer [1]. We
> > can possibly implement something similar
> >
> > Pros : End user usability improves quite a bit. No need to code, just
> > configure!
> > Cons : Implementation complexity to support full configuration via
> > content
> >
> > C - Lookup Analyzer via OSGi
> > ----------------------------------------
> >
> > Make use of the OSGi Service Registry to look up the analyzer by name. User
> > provides the analyzer as part of config
> >
> > "title" : {
> >     "boost" : 1.5,
> >     "analyzer" : "AnalyzerA"
> > }
> >
> > Oak can ship some default analyzers (lowercase etc) and others can be
> > looked up from SR. For configuration we have two options
> >
> > 1. Provide an AnalyzerFactory - The factory can be provided with the Index
> > definition nodestate corresponding to the analyzer element. This can be
> > used to configure the analyzer, say by reading stop word data from
> > content
> >
> > 2. No default support - Analyzer providers are expected to register the
> > analyzer fully configured. They can probably utilize the repository
> > API to look up config
> >
> > Pros : Full extensibility. Makes use of OSGi
> > Cons :
> > 1. Need to export Lucene classes,
> > 2. Deal with OSGi dynamic nature etc (we can simply throw an exception if
> > the analyzer is not found)
> >
> > Chetan Mehrotra
> > [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration#Index_Analyzers
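
For reference, the name-based lookup described under option C could be sketched roughly as below. This is only an illustration under stated assumptions: `TextAnalyzer` is a hypothetical stand-in for Lucene's Analyzer, and `AnalyzerRegistry` is a hypothetical in-memory stand-in for the OSGi Service Registry, so no Oak, Lucene or OSGi API is used. It shows a default analyzer shipped out of the box, analyzers coming and going at runtime, and an exception when a configured name cannot be resolved.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Lucene Analyzer.
interface TextAnalyzer {
    String analyze(String text);
}

// Hypothetical stand-in for the OSGi Service Registry: analyzers are looked up by name.
class AnalyzerRegistry {
    private final Map<String, TextAnalyzer> analyzers = new ConcurrentHashMap<>();

    AnalyzerRegistry() {
        // A default analyzer that is always available.
        register("lowercase", text -> text.toLowerCase(Locale.ROOT));
    }

    // Analyzers can appear and disappear at runtime, as OSGi services do.
    void register(String name, TextAnalyzer analyzer) {
        analyzers.put(name, analyzer);
    }

    void unregister(String name) {
        analyzers.remove(name);
    }

    // Resolves the name used in the index definition; fails loudly if it is missing.
    TextAnalyzer lookup(String name) {
        TextAnalyzer analyzer = analyzers.get(name);
        if (analyzer == null) {
            throw new IllegalStateException("No analyzer registered with name: " + name);
        }
        return analyzer;
    }
}

class AnalyzerLookupDemo {
    public static void main(String[] args) {
        AnalyzerRegistry registry = new AnalyzerRegistry();
        System.out.println(registry.lookup("lowercase").analyze("Hello Oak"));
        try {
            registry.lookup("AnalyzerA"); // configured but not registered yet
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch leaves open what should happen to an index whose analyzer goes away after it was resolved.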
