Help !!

LASRI YASSINE Thu, 15 Mar 2007 08:58:38 -0800

Hello guys,

First of all, congratulations on the new release of the UIMA packages, and
special thanks to the UIMA incubator team for working so hard to give us a new
release in such a short time.


Secondly, and as always, I have a new request for help:

I have an external list of company names, such as

IBM Corporation
Coca Cola
Microsoft Inc
... etc

Every entry in the list is a multi-token company name.


I want to extract all of these names from a collection of HTML documents.

I have built the Collection Reader to iterate over the collection, the CAS
Initializer to strip the HTML tags, a token annotator, etc.,

and I am now working on the analysis engine.

The problem is that the token annotator returns only single tokens from the
source documents. For example,

source.html looks like this:

article found in magazine, and Microsoft Inc and also IBM Corporation with
Coca Cola

The tokens are:
- article
- found
- ...
- Microsoft
- Inc
- ...
- IBM
- Corporation

...


So how can I annotate "Microsoft Inc" and "IBM Corporation" as single
multi-token annotations?
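
To make the goal concrete, here is a minimal sketch of the kind of lookup
involved, written in plain Java with no UIMA dependency. It assumes the tokens
are already available as a list of strings and that the company names can be
split on whitespace; the class CompanyNameMatcher and its match method are
hypothetical names invented for illustration, not part of UIMA. At each token
position it tries the longest dictionary entry first and records the token
span of every hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a UIMA class): finds multi-token dictionary
// entries inside a sequence of single tokens.
class CompanyNameMatcher {
    // Each dictionary entry is stored as its list of tokens.
    private final Set<List<String>> entries = new HashSet<>();
    // Length, in tokens, of the longest dictionary entry.
    private int maxLen = 0;

    CompanyNameMatcher(List<String> companyNames) {
        for (String name : companyNames) {
            // Assumption: whitespace splitting agrees with the real tokenizer.
            List<String> toks = Arrays.asList(name.trim().split("\\s+"));
            entries.add(toks);
            maxLen = Math.max(maxLen, toks.size());
        }
    }

    // Returns {startTokenIndex, endTokenIndexExclusive} for every match,
    // scanning left to right and preferring the longest match at each position.
    List<int[]> match(List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            int longest = Math.min(maxLen, tokens.size() - i);
            for (int len = longest; len >= 1; len--) {
                if (entries.contains(tokens.subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                spans.add(new int[] {i, i + matched});
                i += matched; // skip past the matched tokens
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

Inside an annotator, each returned span could presumably be turned into one
new annotation that begins at the begin offset of the first matched token and
ends at the end offset of the last one. The matching here is exact and
case-sensitive, which is an assumption of the sketch.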


Best regards
-Yassine
