Why do you think you need a tokenizer?  The example that
Adam sent doesn't have a tokenizer in it at all.  It simply depends on
an Analysis Engine earlier in the pipeline that produces Person
annotations.

Perhaps, rather than going the tokenizer route, you should just
try some sort of regular expression matching against your list of
person names.  The version of UIMA from IBM on AlphaWorks includes an
example that matches building and room numbers, which might be helpful
for you to follow.  I haven't looked at the latest documentation to see
whether this example is still around in the Apache versions, and the
Apache website doesn't seem to be working for me at the moment.  The
example might be a little different now, since it references buildings
and rooms at IBM.
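
As a rough illustration of the regular expression idea, here is a
minimal sketch in plain Java.  It is not the AlphaWorks example and it
uses no UIMA classes; the class name PersonNameMatcher and its methods
are made up for this email.  It builds one pattern from a list of names
and returns the character offsets of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA: finds known names in plain text.
class PersonNameMatcher {
    private final Pattern pattern;

    PersonNameMatcher(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("need at least one name");
        }
        // Try longer names first so "Adam Lally" wins over "Adam".
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> b.length() - a.length());
        StringBuilder alternation = new StringBuilder();
        for (String name : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote each name so characters like '.' are matched literally.
            alternation.append(Pattern.quote(name));
        }
        // \b keeps "Adam" from matching inside "Adams".
        pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    /** Returns {begin, end} character offsets of every match in the text. */
    List<int[]> find(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }
}
```

Inside an annotator, each returned offset pair would presumably be
turned into an annotation over the document text (for instance the
Person type from Adam's example) instead of being collected in a list.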

UIMA is valuable in many different ways, such as enabling
distributed processing, but what you seem to be most interested in is
using it to connect various extraction processes (i.e. Analysis
Engines).  There isn't any fixed way to do this and there shouldn't be.
For example, if I write a Person Name Annotator Analysis Engine that
depends on tokens that know whether or not they are capitalized,
then somewhere in the UIMA pipeline these have to be provided.  Suppose
I want to use a tokenizer that doesn't do anything with capitalization.
In that case I have to write some code, be it in Java, C++, Python, or
Perl (in my case I've been using JRuby, and Groovy would work well
too).  This code would have to provide tokens with knowledge of
capitalization.  UIMA itself is agnostic about how to do this.  There
are so many possible variations on what could go into a pipeline
that it would be impossible to handle them all.  If UIMA were to provide
tokenization automatically, why shouldn't it also provide video scene
segmentation or phonetic syllable segmentation of audio, which UIMA also
enables?
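
To make the capitalization case concrete, here is a minimal sketch of
that kind of glue code in plain Java.  Everything in it is hypothetical
(the class CapitalizationBridge, the Token type, and the assumption
that the tokenizer hands back begin/end offsets); in a real pipeline
the flag would be a feature on a token annotation rather than a field
on a plain object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bridge step, not a UIMA API: adds a capitalization flag
// to token offsets produced by some other tokenizer.
class CapitalizationBridge {

    /** A token span plus the one feature the downstream annotator needs. */
    static final class Token {
        final int begin;
        final int end;
        final boolean capitalized;

        Token(int begin, int end, boolean capitalized) {
            this.begin = begin;
            this.end = end;
            this.capitalized = capitalized;
        }
    }

    /** tokenSpans holds {begin, end} offsets into text, one per token. */
    static List<Token> addCapitalization(String text, List<int[]> tokenSpans) {
        List<Token> tokens = new ArrayList<>();
        for (int[] span : tokenSpans) {
            int begin = span[0];
            int end = span[1];
            // An empty span has no first character, so it is never capitalized.
            boolean capitalized = begin < end
                    && Character.isUpperCase(text.codePointAt(begin));
            tokens.add(new Token(begin, end, capitalized));
        }
        return tokens;
    }
}
```

The tokenizer stays untouched; the bridge only adds the one piece of
information the Person Name Annotator depends on.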

That being said, there are some additional tools that might help with
putting together various pipelines.  For instance, I've been working on
(although not very hard) an Analysis Engine that would allow a
different Analysis Engine to work on only part of the output of another
Analysis Engine.  Suppose I had an Analysis Engine that detected the
different languages being used in a text.  Then suppose I had a Person
Annotation Extractor that only works on Japanese.  I might want to be
able to send the Japanese parts of my text to the Person Annotation
Extractor without writing any code.  I'm not at all sure what the best
way to go about this would be.  Such an Analysis Engine might be good to
include in the UIMA package, but it might not belong in the
specification.
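
Just to show the shape of the idea, here is a minimal sketch in plain
Java, written without any UIMA classes.  The names LanguageRouter,
Span and route are all made up, and this is only one possible way to
do it, not a description of how UIMA does or should handle it.  It
passes only the regions tagged with a given language to an extractor
and shifts the results back to offsets in the whole document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical routing step, not a UIMA API.
class LanguageRouter {

    /** A labelled character span; the label is a language code or a type name. */
    static final class Span {
        final int begin;
        final int end;
        final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /**
     * Runs extractor on each region whose label equals language.
     * The extractor sees only the region text and reports offsets
     * relative to it; the results are returned in document offsets.
     */
    static List<Span> route(String text, List<Span> languageRegions,
            String language, Function<String, List<Span>> extractor) {
        List<Span> results = new ArrayList<>();
        for (Span region : languageRegions) {
            if (!language.equals(region.label)) {
                continue; // other languages never reach the extractor
            }
            String part = text.substring(region.begin, region.end);
            for (Span found : extractor.apply(part)) {
                results.add(new Span(region.begin + found.begin,
                        region.begin + found.end, found.label));
            }
        }
        return results;
    }
}
```

The extractor never needs to know it is seeing only part of the
document, which is the property I would want from such an Analysis
Engine.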

BTW, is anybody using the Perlator or Pythonator SWIG stuff with UIMA
2.x?


-----Original Message-----
From: LASRI YASSINE [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 22, 2007 5:02 PM
To: [email protected]
Subject: Re: Help on UIMA Please !

Hi Adam,

Thanks for the example!  I started working with the UIMA API a month
ago, and I still can't understand what value UIMA adds.

For example: if I want to use an external resource and check whether an
entity in the external resource is matched in the given CAS document,
why should I write a tokenizer and other Java code to do so?
Why doesn't UIMA offer this possibility directly, without any other Java
code?

-Yassine

2007/3/22, Adam Lally <[EMAIL PROTECTED]>:
>
> On 3/15/07, LASRI YASSINE <[EMAIL PROTECTED]> wrote:
> > 2007/3/15, Michael Baessler <[EMAIL PROTECTED]>:
> > >
> > > LASRI YASSINE wrote:
> > > > Exactly what I need, but can a rule be either a regular
> > > > expression or an aggregation of primitive annotators?
> > > > Do you have any example?
> > > If I understand you correctly, you want to have a rule that says:
> > >
> > > rule1:  [person] /meets/ [person]
> > >
> > > where the rule consists of a person annotation followed by a
> > > regular expression "meets" followed by another person annotation.
> > > Is that what you mean by "either a regular expression or an
> > > aggregation of primitive annotators"?
> >
> > Yes, of course that's what I mean!
> >
> > > Sorry, I haven't got any example or implementation that does this
> > > kind of processing. Maybe some other users on the users list can
> > > help you here if they have some experience.
> >
> > If any user has an example, please send it to me.
> >
> >
>
> I don't have a ready-to-run example, but to get yourself started I
> would do something like this:
>
>    FSIndex personIndex = aJCas.getAnnotationIndex(Person.type);
>    //iterate over consecutive pairs of Person annotations
>    Iterator personIter = personIndex.iterator();
>    Person person1 = null;
>    while (personIter.hasNext()) {
>      Person person2 = (Person)personIter.next();
>      //skip the first annotation (no predecessor yet) and overlapping pairs
>      if (person1 != null && person1.getEnd() < person2.getBegin()) {
>        //check if the text between the annotations contains the word "meets"
>        //(this could easily be a regular expression match instead, of course)
>        String textBetween = aJCas.getDocumentText().substring(
>            person1.getEnd(), person2.getBegin());
>        if (textBetween.indexOf("meets") > -1) {
>          //create annotation
>          MeetsRelationAnnotation newAnnot = new MeetsRelationAnnotation(
>              aJCas, person1.getBegin(), person2.getEnd());
>          newAnnot.addToIndexes();
>        }
>      }
>      //the current annotation becomes the left side of the next pair
>      person1 = person2;
>    }
>
> Note I just typed that right into this email, so there might be syntax
> errors.  But it should give you the idea.
>
> Now if you want to turn this into a more general annotator that you
> can configure with arbitrary rules that tell it what to match, then
> that's a much more complex question.  What we can help you with here
> is how to use the UIMA APIs.
>
> Regards,
> -Adam
>
