Thank you for your response. My problem is this:
I have an external file that contains a list of person names, for example:

adam
smith
lary
page
... etc
and I need to extract all person names from other sources (text documents),
for example:
"Lary Page is the creator of google and Adam Smith is an economist"
The annotator should extract <Adam Smith> and <Lary Page> as person names. So
what can I do?

Bests
- Yassine



2007/2/28, Adam Lally <[EMAIL PROTECTED]>:

On 2/28/07, LASRI YASSINE <[EMAIL PROTECTED]> wrote:
> Hello,
>
>  I have created an annotator that extracts every string beginning with a
> capital letter (Accccc), and I want to use this annotator (in an aggregate)
> to extract all sentences containing two strings that both begin with a
> capital letter (Xaaaaa Ybbbbb).
>

Hi,

You will need to create a second annotator, which will take the
results of your first annotator and do further processing on them.
This approach is shown in the MeetingAnnotator example that is
exercise 4 of the tutorial (see the Annotator & Analysis Engine
Developer's Guide chapter in the documentation).

Say your first annotator outputs FeatureStructures of the type
CapitalizedWord.  Your second annotator would get an iterator over
CapitalizedWords, for example:

jcas.getJFSIndexRepository().getAnnotationIndex(CapitalizedWord.type).iterator()

Then you iterate over the CapitalizedWord annotations and, for each
pair of annotations, you can check whether they are adjacent in the
document by seeing if the document text between them is all whitespace.  If you
find an adjacent pair of CapitalizedWords you can then create a new
annotation of some other type that spans both CapitalizedWords.
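
The adjacency check described above can be sketched in plain Java without
any UIMA classes. This is only an illustration under stated assumptions:
Span is a hypothetical stand-in for an annotation's begin/end offsets, and
the class and method names (AdjacentCapitalizedWords, isAdjacent, findPairs)
are made up for this example, not part of the SDK. It also assumes the spans
arrive sorted by position, and it treats two words with no text at all
between them as not adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdjacentCapitalizedWords {

    /** Hypothetical stand-in for an annotation: a [begin, end) character span. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        public String coveredText(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** True if the text between the two spans is non-empty and all whitespace. */
    public static boolean isAdjacent(String doc, Span first, Span second) {
        if (second.begin <= first.end) {
            return false; // overlapping or touching spans: no gap to inspect
        }
        String gap = doc.substring(first.end, second.begin);
        return gap.trim().isEmpty();
    }

    /** For each adjacent pair of consecutive spans, emit one span covering both. */
    public static List<Span> findPairs(String doc, List<Span> words) {
        List<Span> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            Span a = words.get(i);
            Span b = words.get(i + 1);
            if (isAdjacent(doc, a, b)) {
                pairs.add(new Span(a.begin, b.end));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String doc = "Lary Page is the creator of google and Adam Smith is an economist";
        // Offsets of the capitalized words, as the first annotator might report them.
        List<Span> words = Arrays.asList(
                new Span(0, 4), new Span(5, 9), new Span(39, 43), new Span(44, 49));
        for (Span pair : findPairs(doc, words)) {
            System.out.println(pair.coveredText(doc));
        }
    }
}
```

In a real annotator the loop would run over the CapitalizedWord iterator and
create a new annotation for each pair instead of a Span. Note that three
capitalized words in a row would yield two overlapping pairs with this
sketch.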

You then create an Aggregate Analysis Engine that contains both of your
annotators.  The way to do this is shown in the tutorial as well.

It wasn't clear to me from your question whether you also need to
detect sentence boundaries in your document.  If so you can use the
example SimpleTokenAndSentenceAnnotator that comes with the SDK.

Hope that helps,

-Adam
