Hello guys,

First of all, congratulations on the new release of the UIMA packages, and special thanks to the UIMA incubator team for their hard work in giving us a new release in such a short time.

Secondly, and as always, I am asking for help again. I have an external list of company names such as:

- IBM Corporation
- Coca Cola
- Microsoft Inc
- ... etc

Every entry in the list is a multi-token company name, and I want to extract all of these names from a collection of HTML documents.

I have built the Collection Reader to iterate over the collection, the CAS Initializer to strip the HTML tags, a Token annotator, ... etc, and I am now working on the analysis engine.

The problem is that the token annotator returns only single tokens from the source documents. For example, if source.html looks like this:

    article found in magazine, and Microsoft Inc and also IBM Corporation with Coca Cola

then the tokens are:

- article
- found
- ....
- Microsoft
- Inc
- ...
- IBM
- Corporation
- ...

So how can I annotate "Microsoft Inc" and "IBM Corporation" as single annotations?

Best regards,
-Yassine
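
For illustration only, the matching described above could be sketched in plain Java as a longest-match lookup over the token sequence. This is a rough sketch under stated assumptions, not UIMA code: the classes `CompanyNameMatcher`, `Token` and `Span` and the helper `whitespaceTokens` are hypothetical names invented for this example, and the whitespace tokenizer only stands in for the real Token annotator. The idea is to index the company names by their first token, then at each token position try the candidate names (longest first) and, on a match, record one span running from the first token's begin offset to the last token's end offset.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: finds multi-token dictionary names in a token sequence. */
public class CompanyNameMatcher {

    /** A token and its character offsets (stand-in for a token annotation). */
    public static final class Token {
        public final String text; public final int begin; public final int end;
        public Token(String text, int begin, int end) { this.text = text; this.begin = begin; this.end = end; }
    }

    /** A matched company name covering the offsets [begin, end). */
    public static final class Span {
        public final String name; public final int begin; public final int end;
        public Span(String name, int begin, int end) { this.name = name; this.begin = begin; this.end = end; }
    }

    // Dictionary entries split into tokens, indexed by their first token.
    private final Map<String, List<List<String>>> byFirstToken = new HashMap<>();

    public CompanyNameMatcher(Collection<String> names) {
        for (String name : names) {
            List<String> parts = Arrays.asList(name.trim().split("\\s+"));
            byFirstToken.computeIfAbsent(parts.get(0), k -> new ArrayList<>()).add(parts);
        }
        // Try longer names first so "IBM Corporation" wins over a shorter "IBM" entry.
        for (List<List<String>> candidates : byFirstToken.values()) {
            candidates.sort((a, b) -> b.size() - a.size());
        }
    }

    /** Scans the tokens left to right and returns non-overlapping matches. */
    public List<Span> find(List<Token> tokens) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            List<String> hit = longestAt(tokens, i);
            if (hit == null) { i++; continue; }
            Token first = tokens.get(i);
            Token last = tokens.get(i + hit.size() - 1);
            spans.add(new Span(String.join(" ", hit), first.begin, last.end));
            i += hit.size(); // continue after the matched name
        }
        return spans;
    }

    // Returns the longest dictionary entry that starts at position 'start', or null.
    private List<String> longestAt(List<Token> tokens, int start) {
        List<List<String>> candidates = byFirstToken.get(tokens.get(start).text);
        if (candidates == null) return null;
        for (List<String> parts : candidates) {
            if (start + parts.size() > tokens.size()) continue;
            boolean ok = true;
            for (int k = 0; k < parts.size() && ok; k++) {
                ok = tokens.get(start + k).text.equals(parts.get(k));
            }
            if (ok) return parts;
        }
        return null;
    }

    /** Assumption: a naive whitespace tokenizer standing in for the real Token annotator. */
    public static List<Token> whitespaceTokens(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) tokens.add(new Token(m.group(), m.start(), m.end()));
        return tokens;
    }

    public static void main(String[] args) {
        CompanyNameMatcher matcher = new CompanyNameMatcher(
                Arrays.asList("IBM Corporation", "Coca Cola", "Microsoft Inc"));
        String text = "Microsoft Inc and IBM Corporation with Coca Cola";
        for (Span s : matcher.find(whitespaceTokens(text))) {
            System.out.println(s.name + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

In an analysis engine, each returned span would presumably become one annotation with the same begin and end offsets, created by an annotator that runs after the Token annotator. Matching here is exact and case-sensitive; punctuation attached to a token (for example a trailing comma) would need to be handled by the tokenizer.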
