Hi Joern,

Jörn Kottmann wrote:
> Hello,
>
> often annotators are more flexible and reusable than its assumed in UIMA.
> The configuration is static to the annotator because it is set via the
> descriptor.
>
> There are annotators which benefit from determining the types to use at
> runtime via a configuration parameter. This is already possible (e.g the
> example regex annotator), but it is not possible to set the capabilities
> at runtime.
> To improve this UIMA should ask the annotator for the capabilities. To
> make the configuration easier it could be considered to add a "Type"
> range type for parameters.

It's one of the basic principles of UIMA that this kind of information is declared statically in the descriptor. The point is that we want the system to be able to work out things about the analysis flow just by looking at the descriptors, without having to instantiate the actual analysis engine.

> This type system mapping makes annotators independent of one specific
> type system and allows the reuse with another type system, e.g.
> currently its hard to reuse a tokenizer from one group and combine it
> with a pos annotator from another group since the token type would
> not match.

True, this is not easy. However, you could create a type system independent tokenizer if you really wanted to. You wouldn't be able to use the JCas, but you could certainly configure an annotator to use a token type that is given externally. Still, you would need to specify the token type in the descriptor.

> It is also not possible to set the language during runtime, e.g. an
> annotator could have a model/rule file which also specifies the language.
> The language setting is then redundant in the descriptor.

I'm not sure what you mean here. You can certainly set the language at runtime and change it for each document. We use many language dependent annotators. There's even something called the CapabilityLanguageFlow which lets you specify different flows depending on the language of the document. Of course, your annotator needs to know how to handle the different languages. So it may have a different dictionary for each language, which it then needs to load on demand.

> It should also be considered to reuse annotators with different settings
> e.g. a few name finder instances but with different models and different
> output types.
> This is too already possible but some information in the descriptor must
> be duplicated for every instance, e.g. version, implementing class, etc.

Yes, I kind of agree with you there.
We're not doing a very good job of separating static information like version number etc. from variable configuration info. Similar discussions are happening on the OASIS TC, and if I recall correctly, the idea there was that primitive descriptors should be considered static and not change. Config properties set in those descriptors would be considered default values and would be overridden from an aggregate. (Somebody else on the OASIS TC please correct me if this isn't right.)

Does that make sense?

--Thilo
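
To make the default/override idea above a bit more concrete, here is a minimal sketch in plain Java. It deliberately does not use the UIMA API, and it is not what the OASIS TC specified. The class name, the resolve method and the parameter names (ModelFile, OutputType) are all made up for illustration. It only shows one way a primitive's settings could act as defaults while each aggregate supplies the values that differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: this is NOT the UIMA API. All names here are hypothetical.
public class ConfigOverrideSketch {

    // Returns the effective settings: a copy of the primitive's defaults,
    // with any aggregate-level value replacing the default of the same name.
    // Neither input map is modified.
    public static Map<String, Object> resolve(Map<String, Object> primitiveDefaults,
                                              Map<String, Object> aggregateOverrides) {
        Map<String, Object> effective = new LinkedHashMap<>(primitiveDefaults);
        effective.putAll(aggregateOverrides);
        return effective;
    }

    public static void main(String[] args) {
        // Settings that would live in the (static) primitive descriptor.
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("ModelFile", "default.model");
        defaults.put("OutputType", "example.Name");

        // Two uses of the same name finder, each overriding only what differs.
        Map<String, Object> personOverrides = new LinkedHashMap<>();
        personOverrides.put("ModelFile", "person.model");
        personOverrides.put("OutputType", "example.Person");

        Map<String, Object> locationOverrides = new LinkedHashMap<>();
        locationOverrides.put("ModelFile", "location.model");

        System.out.println(resolve(defaults, personOverrides));
        System.out.println(resolve(defaults, locationOverrides));
    }
}
```

In this model the version, implementing class and so on would stay in the one primitive description, and only the per-instance differences would be written down again.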
