UIMA Beginners Help?

Andrew Serff Mon, 02 Jul 2007 12:47:46 -0700

Hello. I'm very new to the whole world of data mining and have stumbled upon UIMA within the last week or so. I'm trying to go through all the documentation and just create a simple application, but I am hitting some roadblocks and was wondering where I can find some newbie help. I realize this is sort of long, so I appreciate any help anyone can give.

First I have a question: What is the difference between a CAS and a JCas, and why would I want to use one over the other? Is this determined by the AEs I'm using (i.e. if they are implemented by extending a JCas_*_impl) or is there some other reason? It seems the CAS is more developed and has things like CasPools, the ability to make CASes with multiple AEs, Consumers, etc. Should I just be using the CAS interface and forget about JCas?

My main issue right now is that I can't figure out how to set inputs for an AE. I can't find any examples of how to do it. See the description below of what I'm trying to do:

I'm trying to use some pre-bundled AEs to parse some text. I basically want to do Named Entity Extraction on text. So I wrote a simple application that first does Sentence Boundary detection and prints out the sentences that it finds. That was easy enough. So now I would like to take those sentences and feed them into the named entity AE. Both the Sentence Boundary AE and the NE AE I'm using are from the JULIE lab (http://www.julielab.de). The documentation for the NE AE says that it requires Sentences as input (the output of the Sentence Boundary AE). I cannot figure out how to set those inputs and am stuck at this point. Once I figure that out, I think I'll be getting NEs out of the CAS.

So now, all that being said, I'm also not sure I'm coding this process the way I'm supposed to. I eventually want to build all this into a distributed architecture with many threads running constantly, processing using a pool of extractors. I want to be able to submit documents to the named entity extractor, then persist the named entities in a database. I would like to have multiple entry points into the extractor, i.e. ad hoc (here is a doc, extract it now) or using a collection reader to pull multiple docs in at once and parse them all. Right now, my simple application has 2 CASes and 2 AnalysisEngines (one for Sentence Detection and one for NE Extraction). It seems like I would just want to make one AE that does the Sentence Detection and passes the result on to the NE extractor, but I don't get how you do this. Do I need to make a new AE and define these things in the XML that describes it? Or is this a CPE?

If anyone has a simple NE example application that could point me in the right direction, that would be great.

Thanks!
Andrew
