The "problem", which is also UIMA's power, is that everyone has their own
type system.
To produce a Lucene document, one extracts information from some
features and applies the right analyzer. In my case I use maybe only 10%
of the annotations produced by the analysis pipeline to produce a
single Lucene document.
So we need a highly configurable component, able to map only
certain declared features, apply the right analyzer, and so on.
Many approaches are possible:
-completely programmatic: the indexer is abstract and has to be
extended to implement the right mapping for a specific type system
and pipeline
-configurable: mapping rules are defined in a descriptor file; the
JENA component followed this approach
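To make the first option concrete, here is a rough sketch of what a completely programmatic indexer could look like. It does not use the real UIMA or Lucene APIs: `Annotation`, `IndexDocument`, `AbstractIndexer`, `PersonIndexer` and the type name `example.Person` are all hypothetical stand-ins, chosen only to show that each type system needs its own subclass with the mapping hard coded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotation: type name plus feature values.
record Annotation(String type, Map<String, String> features) {}

// Hypothetical stand-in for a Lucene document: field name -> values.
class IndexDocument {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String field, String value) {
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }
}

// Programmatic approach: the indexer is abstract and each type system
// needs its own subclass that hard codes the mapping.
abstract class AbstractIndexer {
    IndexDocument index(List<Annotation> annotations) {
        IndexDocument doc = new IndexDocument();
        for (Annotation a : annotations) {
            map(a, doc);
        }
        return doc;
    }

    // Subclasses decide which annotations and features end up in which field.
    protected abstract void map(Annotation annotation, IndexDocument doc);
}

// Hypothetical subclass for a type system with an "example.Person" type.
class PersonIndexer extends AbstractIndexer {
    @Override
    protected void map(Annotation annotation, IndexDocument doc) {
        if (annotation.type().equals("example.Person")) {
            String name = annotation.features().get("name");
            if (name != null) {
                doc.add("person", name);
            }
        }
        // Every other annotation is ignored.
    }
}
```

Any change to the type system means editing and recompiling a subclass like `PersonIndexer`, which is the drawback of this approach.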

I prefer mapping rules in the descriptor. These rules have to be
adjusted by many users to make them compatible with
their type system. Hard coding the mapping rules makes
this task more difficult.
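A rough sketch of the descriptor-driven alternative is below. The rule format (one line per rule: type, feature, field, analyzer name) and the names `MappingRule`, `ConfigurableIndexer` and `example.Person` are assumptions for illustration only; they are not the JENA component's or the regex annotator's actual descriptor format. The point is that the indexer itself knows nothing about any type system, and users only edit the rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotation: type name plus feature values.
record Annotation(String type, Map<String, String> features) {}

// One mapping rule: copy feature `feature` of annotations of type `type`
// into index field `field`, to be processed with analyzer `analyzer`.
record MappingRule(String type, String feature, String field, String analyzer) {

    // Parses a hypothetical rule line such as "example.Person name person standard".
    static MappingRule parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 columns: " + line);
        }
        return new MappingRule(parts[0], parts[1], parts[2], parts[3]);
    }
}

// Generic indexer: all type-system knowledge comes from the rules.
class ConfigurableIndexer {
    private final List<MappingRule> rules = new ArrayList<>();

    ConfigurableIndexer(List<String> descriptorLines) {
        for (String line : descriptorLines) {
            // Skip blank lines and comment lines starting with '#'.
            if (!line.isBlank() && !line.startsWith("#")) {
                rules.add(MappingRule.parse(line));
            }
        }
    }

    // Returns field name -> values; annotations matching no rule are ignored.
    Map<String, List<String>> index(List<Annotation> annotations) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            for (MappingRule r : rules) {
                String value = a.features().get(r.feature());
                if (r.type().equals(a.type()) && value != null) {
                    doc.computeIfAbsent(r.field(), k -> new ArrayList<>()).add(value);
                }
            }
        }
        return doc;
    }

    // Returns field name -> analyzer name, so a real indexer could pick
    // the matching analyzer for each field.
    Map<String, String> analyzers() {
        Map<String, String> byField = new LinkedHashMap<>();
        for (MappingRule r : rules) {
            byField.put(r.field(), r.analyzer());
        }
        return byField;
    }
}
```

Adapting this sketch to another type system only means changing the rule lines, not the Java code.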

As far as I know, this approach was also chosen by the
regex annotator in the sandbox.

Jörn
