Hello, I am somewhat new to UIMA, so I apologize if I misunderstand some things. But this is a very interesting question for me.

I see Lucene as a very widely adopted but *Java-only framework* of tools for building and maintaining keyword *indexes* on many types of documents. Lucene also has great support for Hadoop and MapReduce-type scalability. Lucene is also designed to work with many front-end tools, such as the POI libraries, to extract text from Microsoft Word, Excel, PowerPoint, etc.

I see Apache UIMA as a general-purpose *analytic pipeline architecture* with the strength of a very advanced common in-memory processing model.

I think there is a huge win-win for both projects if we can make UIMA enrich text documents with entities before they are indexed by Lucene, and also make these tools much easier to install and work together. You should not have to be a Java developer just to install these tools and have them index and search our file systems.

I have spent many hours trying to get UIMA to work, without success. Perhaps it has to do with trying to get it to work on 64-bit Vista.... :-O

- Dan

On Thu, Dec 4, 2008 at 12:12 PM, Greg Holmberg <[EMAIL PROTECTED]> wrote:

> Roberto--
>
> It does seem like there should be a close relationship between the two
> groups.
>
> I don't know much about Lucene--can you educate me? For example, have you
> given any thought to what to do with UIMA annotations? From what little
> I've read about Lucene, they seem to have a thing called a document
> analyzer, but they don't mean the same thing we mean by analysis in the NLP
> community. They appear to mean something more like "tokenizer". So I
> haven't yet found a place to put UIMA annotations, say for example, named
> entities or parts of speech. I'm wondering if Lucene needs a major feature
> enhancement before it's truly useful with UIMA?
>
> What are your thoughts on how to integrate the two? What functionality is
> possible?
>
> Greg Holmberg
>
>
> -------------- Original message ----------------------
> From: "Roberto Franchini" <[EMAIL PROTECTED]>
> > Hi,
> > I'm going to write a Lucene CAS consumer.
> > The purpose is to create a Lucene document, or more than one, for each CAS.
> > Last year (2007) the Jena University lab (JULIE Lab? Is that right?)
> > delivered such a component, named LUCAS. Then it disappeared.
> > LUCAS seems to be a good piece of software.
> > The Technische Universität Darmstadt developed one too:
> > http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
> > them).
> >
> > Is anybody interested in sharing knowledge and/or code to build that
> > component?
> > I think that Lucene and UIMA can be very good friends :)
> >
> > Roberto
> >
> > PS: I apologize for my bad English.
> >
> > --
> > Roberto Franchini
> > http://www.celi.it
> > http://www.blogmeter.it
> > http://www.memesphere.it
> > Tel +39-011-6600814
> > jabber:[EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > skype:ro.franchini
>

--
Dan McCreary
Senior Enterprise Data Architecture and Strategy Consulting
(952) 931-9198
cell: (612) 986-1552
[EMAIL PROTECTED]
http://www.danmccreary.com
