Hi everyone, I'm just getting started with UIMA and have poked through the docs and the sandbox, but still have some questions on best/recommended practices.
A simple example of my question is stop word processing of text. Processing is broken up into Tokenizer -> Stemmer -> StopWordAnnotator. The tokenizer and stemmer are straightforward: we can create our own or swap in modules such as the sandbox WhitespaceTokenizer or SnowballAnnotator (stemming).

My concern is that during initialize(...) of the StopWordAnnotator I load a resource file that contains the list of stop words. These stop words need to be tokenized and stemmed as well (probably in the same manner as the previous steps, but perhaps configurable) so that they match the stemmed document tokens.

What is the best practice for doing this? Specifying an aggregate analysis engine that runs over the stop word list within the initialize() method? That seems a bit strange (and might get quite complicated as later annotators have more complex processing), but I haven't yet seen examples of this type of complex, resource-based annotator.

Thanks for taking the time to read/help!
Dave
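
P.S. To make the question concrete, here is a rough plain-Java sketch of the behaviour I'm after. It is not UIMA code: StopWordSketch, tokenize, stem, normalize, loadStopWords and removeStopWords are all made-up names, and the "stemmer" is a toy stand-in rather than Snowball. The point is only that the stop word list goes through the same tokenize-and-stem step as the document text, once, at initialization time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from UIMA.
class StopWordSketch {

    // Stand-in for the tokenizer step: split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy stand-in for the stemmer step: lowercase and drop a trailing "s"
    // from words longer than three characters. This is NOT Snowball.
    static String stem(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        if (lower.length() > 3 && lower.endsWith("s")) {
            return lower.substring(0, lower.length() - 1);
        }
        return lower;
    }

    // Stand-in for the Tokenizer -> Stemmer part of the pipeline.
    static List<String> normalize(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            out.add(stem(token));
        }
        return out;
    }

    // Roughly what initialize() would do: push every line of the stop word
    // resource through the same (configurable) pipeline as the documents.
    static Set<String> loadStopWords(List<String> rawLines,
                                     Function<String, List<String>> pipeline) {
        Set<String> stopWords = new HashSet<>();
        for (String line : rawLines) {
            stopWords.addAll(pipeline.apply(line));
        }
        return stopWords;
    }

    // Roughly what process() would do: keep the normalized tokens that are
    // not in the normalized stop word set.
    static List<String> removeStopWords(String text, Set<String> stopWords,
                                        Function<String, List<String>> pipeline) {
        List<String> kept = new ArrayList<>();
        for (String token : pipeline.apply(text)) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = loadStopWords(
                Arrays.asList("the", "Is", "this"), StopWordSketch::normalize);
        // Prints [cat]
        System.out.println(
                removeStopWords("This is the Cats", stopWords, StopWordSketch::normalize));
    }
}
```

Passing the pipeline in as a Function is just my way of showing the "perhaps configurable" part. What I don't know is what the UIMA equivalent of that parameter should be.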
