Re: Lucene cas consumer

Dan McCreary Thu, 04 Dec 2008 11:33:22 -0800

Hello,

I am somewhat new to UIMA so I apologize if I misunderstand some things.
But this is a very interesting question for me.

I see Lucene as a very widely adopted but *Java-only framework* of tools for
building and maintaining keyword *indexes* on many types of documents.
Lucene also has great support for Hadoop and MapReduce-type scalability.
Lucene is also designed to work with many front-end tools, such as the POI
libraries, that extract text from Microsoft Word, Excel, PowerPoint, etc.

I see Apache UIMA as a general purpose *analytic pipeline architecture* with
the strengths of a very advanced common in-memory processing model.

I think there is a huge win-win for both projects if we can make UIMA enrich
text documents with entities before they are indexed by Lucene and also make
these tools much easier to install and work together.  You should not have
to be a Java developer just to install these tools and have them index and
search our file systems.

I have spent many hours trying to get UIMA to work without success.  Perhaps
it has to do with trying to get it to work on 64-bit Vista....  :-O

- Dan

On Thu, Dec 4, 2008 at 12:12 PM, Greg Holmberg <[EMAIL PROTECTED]> wrote:

> Roberto--
>
> It does seem like there should be a close relationship between the two
> groups.
>
> I don't know much about Lucene--can you educate me?  For example, have you
> given any thought to what to do with UIMA annotations?  From what little
> I've read about Lucene, they seem to have a thing called a document
> analyzer, but they don't mean the same thing we mean by analysis in the NLP
> community.  They appear to mean something more like "tokenizer".  So I
> haven't yet found a place to put UIMA annotations, say for example, named
> entities or parts of speech.  I'm wondering if Lucene needs a major feature
> enhancement before it's truly useful with UIMA?
>
> What are your thoughts on how to integrate the two?  What functionality is
> possible?
>
> Greg Holmberg
>
>
>  -------------- Original message ----------------------
> From: "Roberto Franchini" <[EMAIL PROTECTED]>
> > Hi,
> > I'm going to write a Lucene CAS consumer. The purpose is to create a
> > Lucene document, or more than one, for each CAS.
> > Last year (2007) the Jena University lab (JULIE Lab? is that right?)
> > delivered such a component, named LUCAS. Then it disappeared.
> > LUCAS seems a good piece of software.
> > The Technische Universität Darmstadt developed one too:
> > http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
> > them).
> >
> > Is anybody interested in sharing knowledge and/or code to build that
> > component?
> > I think that Lucene and UIMA can be very good friends :)
> >
> > Roberto
> >
> > PS: I apologize for my bad English.
> >
> > --
> > Roberto Franchini
> > http://www.celi.it
> > http://www.blogmeter.it
> > http://www.memesphere.it
> > Tel +39-011-6600814
> > jabber:[EMAIL PROTECTED]
> > skype:ro.franchini
>
>

-- 
Dan McCreary
Senior Enterprise Data Architecture and Strategy Consulting
(952) 931-9198
cell: (612) 986-1552
[EMAIL PROTECTED]
http://www.danmccreary.com
