Hi Joern,

Jörn Kottmann wrote:
> Hello,
> 
> Often annotators are more flexible and reusable than is assumed in UIMA.
> Their configuration is static because it is set via the
> descriptor.
> 
> There are annotators which benefit from determining the types to use at
> runtime via a configuration parameter. This is already possible (e.g. the
> example regex annotator), but it is not possible to set the capabilities at
> runtime. To improve this, UIMA should ask the annotator for its
> capabilities. To make the configuration easier, a "Type" range type could
> be added for parameters.

It's one of the basic principles of UIMA that this
kind of information is declared statically in the
descriptor.  The point here is that we want the
system to be able to figure out things concerning
the analysis flow just by looking at the descriptors,
without having to instantiate the actual analysis
engine.
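
To illustrate why static declarations matter, here is a rough sketch in plain
Java (no UIMA classes; DeclaredCapabilities and FlowPlanner are hypothetical
names) of a framework ordering engines purely from their declared input and
output types. Nothing is instantiated. If capabilities were only available by
asking a live annotator, this planning step would have to create every engine
first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for what a primitive descriptor declares:
// a name plus the input and output types listed in its capabilities.
final class DeclaredCapabilities {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    DeclaredCapabilities(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

final class FlowPlanner {
    // Orders engines so that every declared input type is produced by an
    // earlier engine. Only the declared metadata is read; no annotator is
    // created. Fails if some engine's inputs can never be satisfied.
    static List<String> plan(List<DeclaredCapabilities> engines) {
        List<String> order = new ArrayList<>();
        Set<String> available = new HashSet<>();
        List<DeclaredCapabilities> pending = new ArrayList<>(engines);
        boolean progress = true;
        while (!pending.isEmpty() && progress) {
            progress = false;
            Iterator<DeclaredCapabilities> it = pending.iterator();
            while (it.hasNext()) {
                DeclaredCapabilities e = it.next();
                if (available.containsAll(e.inputs)) {
                    order.add(e.name);
                    available.addAll(e.outputs);
                    it.remove();
                    progress = true;
                }
            }
        }
        if (!pending.isEmpty()) {
            throw new IllegalStateException("unsatisfied inputs for " + pending.get(0).name);
        }
        return order;
    }
}
```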

> 
> This type system mapping makes annotators independent of one specific
> type system and allows their reuse with another type system. For example,
> it is currently hard to reuse a tokenizer from one group and combine it
> with a POS annotator from another group, since the token types would not
> match.

True, this is not easy.  However, you could create a
type system independent tokenizer if you really wanted
to.  You wouldn't be able to use the JCas, but you
could certainly configure an annotator to use a token
type that is given externally.  Still, you would need
to specify the token type in the descriptor.
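
A minimal sketch of that idea, assuming a configuration parameter named
TokenType (a hypothetical name). To keep it runnable without UIMA on the
classpath, SimpleAnnotation stands in for a CAS annotation. In a real
CAS-based annotator you would instead read the parameter with
UimaContext.getConfigParameterValue, resolve the name with TypeSystem.getType
in typeSystemInit, and create the tokens with CAS.createAnnotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a CAS annotation: a type name plus offsets.
final class SimpleAnnotation {
    final String type;
    final int begin;
    final int end;

    SimpleAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

final class ConfigurableTokenizer {
    // Hypothetical parameter name; its value is the fully qualified type name.
    static final String PARAM_TOKEN_TYPE = "TokenType";
    private final String tokenType;

    ConfigurableTokenizer(Map<String, String> config) {
        String t = config.get(PARAM_TOKEN_TYPE);
        if (t == null || t.isEmpty()) {
            throw new IllegalArgumentException("missing parameter " + PARAM_TOKEN_TYPE);
        }
        this.tokenType = t;
    }

    // Emits one annotation per maximal run of non-whitespace characters,
    // typed with whatever name was configured rather than a compiled-in type.
    List<SimpleAnnotation> process(String text) {
        List<SimpleAnnotation> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new SimpleAnnotation(tokenType, start, i));
        }
        return out;
    }
}
```

The type name still has to appear somewhere static (here the configuration
map, in UIMA the descriptor), which is exactly the point above.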

> 
> It is also not possible to set the language at runtime; e.g. an annotator
> could have a model/rule file which also specifies the language. The
> language setting in the descriptor is then redundant.

I'm not sure what you mean here.  You can certainly
set the language at runtime and change it for each
document.  We use many language dependent annotators.
There's even something called the CapabilityLanguageFlow
which lets you specify different flows depending on
the language of the document.

Of course your annotator needs to know how to handle
the different languages.  So it may have a different
dictionary for each language, which it then needs to
load on demand.
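
Roughly like this: a sketch in which the language of each document picks a
dictionary that is loaded the first time it is needed and cached afterwards.
The class name LanguageDictionaries and the loader function are hypothetical;
in UIMA the language would come from CAS.getDocumentLanguage().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class LanguageDictionaries {
    // Hypothetical loader, e.g. one that reads a per-language word list file.
    private final Function<String, Set<String>> loader;
    private final Map<String, Set<String>> cache = new HashMap<>();
    private int loads = 0;

    LanguageDictionaries(Function<String, Set<String>> loader) {
        this.loader = loader;
    }

    // Loads the dictionary for a language on first use, then reuses it.
    Set<String> forLanguage(String lang) {
        return cache.computeIfAbsent(lang, l -> {
            loads++;
            return loader.apply(l);
        });
    }

    // Per-document entry point: the document language selects the dictionary.
    List<String> findKnownWords(String text, String lang) {
        Set<String> dict = forLanguage(lang);
        List<String> hits = new ArrayList<>();
        for (String w : text.split("\\s+")) {
            if (dict.contains(w)) hits.add(w);
        }
        return hits;
    }

    int loadCount() {
        return loads;
    }
}
```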

> 
> Reusing annotators with different settings should also be considered,
> e.g. several name finder instances with different models and different
> output types. This is already possible too, but some information in the
> descriptor must be duplicated for every instance, e.g. version,
> implementing class, etc.

Yes, I kind of agree with you there.  We're not doing
a very good job of separating static information like
version number etc. from variable configuration info.
Similar discussions are happening on the OASIS TC, and
if I recall correctly, the idea there was that primitive
descriptors should be considered static and not change.
Config properties set in those descriptors would be
considered default values and would be overridden from
an aggregate.  (Somebody else on the OASIS TC please
correct me if this isn't right)
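
A sketch of that model as I understand it (PrimitiveDescriptor and
ParameterResolver are hypothetical, not a UIMA or OASIS API): the primitive's
values are fixed defaults, and each aggregate supplies only its overrides, so
two name finder instances can share one descriptor and differ only in model
and output type.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a static primitive descriptor: identity information
// plus parameter defaults, none of which change after creation.
final class PrimitiveDescriptor {
    final String implementationClass;
    final String version;
    final Map<String, Object> defaults;

    PrimitiveDescriptor(String implementationClass, String version, Map<String, Object> defaults) {
        this.implementationClass = implementationClass;
        this.version = version;
        this.defaults = Collections.unmodifiableMap(new HashMap<>(defaults));
    }
}

final class ParameterResolver {
    // Effective settings for one instance: defaults first, then the
    // aggregate's overrides. Overriding an undeclared parameter is rejected.
    static Map<String, Object> effective(PrimitiveDescriptor d, Map<String, Object> overrides) {
        for (String key : overrides.keySet()) {
            if (!d.defaults.containsKey(key)) {
                throw new IllegalArgumentException("undeclared parameter: " + key);
            }
        }
        Map<String, Object> result = new HashMap<>(d.defaults);
        result.putAll(overrides);
        return Collections.unmodifiableMap(result);
    }
}
```

Rejecting undeclared names keeps the primitive descriptor the single place
where parameters are declared; the aggregate can only change values.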

Does that make sense?

--Thilo