I sent the wrong link for the Elasticsearch doc. Refer to
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html
for how Elasticsearch supports constructing an analyzer from configuration.
Chetan Mehrotra


On Wed, Nov 5, 2014 at 6:34 PM, Chetan Mehrotra
<[email protected]> wrote:
> Hi Team,
>
> Currently Oak uses a hard-coded analyzer. As part of OAK-2177 this
> needs to be opened up for extension. Possible approaches:
>
> A - Create analyzer using reflection
> ------------------------------------------
>
> Jackrabbit [1] used to support changing the analyzer. We can do the
> same by capturing the analyzer class name and instantiating it
>
> Pros : Simple to implement
> Cons : Would not work well in OSGi
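
To make approach A concrete, here is a minimal sketch of instantiating an analyzer from a configured class name. It is an illustration only, not the Jackrabbit or Oak code: the `Analyzer` interface is a hypothetical stand-in for Lucene's `Analyzer` so the example compiles on its own, and `ReflectiveAnalyzerLoader`, `LowerCaseAnalyzer` and `load` are names made up for this sketch. It also assumes the analyzer class has a no-arg constructor.

```java
class ReflectiveAnalyzerLoader {

    /** Hypothetical stand-in for Lucene's Analyzer, so the sketch is self-contained. */
    public interface Analyzer {
        String name();
    }

    /** Example analyzer with an implicit public no-arg constructor. */
    public static class LowerCaseAnalyzer implements Analyzer {
        @Override
        public String name() {
            return "lowercase";
        }
    }

    /**
     * Loads the class with the given name through the given class loader,
     * checks that it is an Analyzer and calls its no-arg constructor.
     */
    public static Analyzer load(String className, ClassLoader loader) {
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            return clazz.asSubclass(Analyzer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Covers: class not found, not an Analyzer, no usable constructor
            throw new IllegalArgumentException(
                    "Cannot instantiate analyzer " + className, e);
        }
    }
}
```

In an OSGi container each bundle has its own class loader, so the loader passed to `load` would have to be one that can actually see the analyzer class. That is the main reason this approach is awkward there.
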
>
> B - Make it configurable via content
> ----------------------------------------------
>
> Elasticsearch provides a content-based DSL to create analyzers [1]. We
> can possibly implement something similar
>
> Pros : End user usability improves quite a bit. No need to code, just 
> configure!
> Cons : Implementation complexity to support full configuration via content
>
> C - Lookup Analyzer via OSGi
> ----------------------------------------
>
> Make use of the OSGi Service Registry to look up the analyzer by name.
> The user provides the analyzer name as part of the config
>
> "title" : {
>         "boost" : 1.5,
>         "analyzer" : "AnalyzerA"
> }
>
> Oak can ship some default analyzers (lowercase etc.) and others can be
> looked up from the SR. For configuration we have two options
>
> 1. Provide an AnalyzerFactory - The factory can be provided with the
> index definition NodeState corresponding to the analyzer element. This
> can be used to configure the analyzer, say by reading stop word data
> from content
>
> 2. No default support - Analyzer providers are expected to register the
> analyzer fully configured. They can probably utilize the repository
> API to look up config
>
> Pros : Full extensibility. Makes use of OSGi
> Cons :
> 1. Need to export Lucene classes
> 2. Need to deal with OSGi dynamic nature etc (we can simply throw an
> exception if the analyzer is not found)
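
A rough sketch of how approach C with option 1 (the factory) could look. This does not use the real OSGi API: a plain map stands in for the Service Registry, `Analyzer` is again a hypothetical stand-in for Lucene's class, and a `Map<String, String>` stands in for the index definition NodeState of the analyzer element. `AnalyzerRegistry`, `AnalyzerFactory`, `register`, `unregister` and `lookup` are names invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AnalyzerRegistry {

    /** Hypothetical stand-in for Lucene's Analyzer. */
    public interface Analyzer {
        String name();
    }

    /** Option 1: the factory receives the config of the analyzer element. */
    public interface AnalyzerFactory {
        Analyzer create(Map<String, String> config);
    }

    // Stand-in for the OSGi Service Registry, keyed by analyzer name
    private final Map<String, AnalyzerFactory> factories = new ConcurrentHashMap<>();

    /** Called when a provider makes its factory available. */
    public void register(String name, AnalyzerFactory factory) {
        factories.put(name, factory);
    }

    /** Called when a provider goes away (services are dynamic). */
    public void unregister(String name) {
        factories.remove(name);
    }

    /**
     * Resolves the name used in the index config, e.g. "AnalyzerA".
     * Throws if no factory is currently registered under that name.
     */
    public Analyzer lookup(String name, Map<String, String> config) {
        AnalyzerFactory factory = factories.get(name);
        if (factory == null) {
            throw new IllegalStateException(
                    "No analyzer registered under name " + name);
        }
        return factory.create(config);
    }
}
```

With option 2 the registry would hold fully configured analyzers instead of factories and `lookup` would take only the name.
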
>
> Chetan Mehrotra
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration#Index_Analyzers
