Re: Lucene cas consumer

Marshall Schor Fri, 05 Dec 2008 10:32:36 -0800


Dan McCreary wrote:
> Hello,
>
> I am somewhat new to UIMA so I apologize if I misunderstand some things.
> But this is a very interesting question for me.
>
> I see Lucene as a very widely adopted but *Java-only framework* of tools for
> building and maintaining keyword *indexes* on many types of documents.
> Lucene also has great support for Hadoop and MapReduce-type scalability.  But
> Lucene is also designed to work with many front-end tools like POI libraries
> to extract text from Microsoft Word, Excel, PowerPoint etc.
>
> I see Apache UIMA as a general purpose *analytic pipeline architecture* with
> the strengths of a very advanced common in-memory processing model.
>
> I think there is a huge win-win for both projects if we can make UIMA enrich
> text documents with entities before they are indexed by Lucene and also make
> these tools much easier to install and work together.  You should not have
> to be a Java developer just to install these tools and have them index and
> search our file systems.
>
> I have spent many hours trying to get UIMA to work without success.  Perhaps
> it has to do with trying to get it to work on a 64 bit Vista....  :-O
>   
We have UIMA running on 64-bit Linux systems.  Please consider starting
another thread about issues around getting it working on 64-bit Vista -
that could be quite useful to the community.


-Marshall
> - Dan
>
>
> On Thu, Dec 4, 2008 at 12:12 PM, Greg Holmberg <[EMAIL PROTECTED]> wrote:
>
>   
>> Roberto--
>>
>> It does seem like there should be a close relationship between the two
>> groups.
>>
>> I don't know much about Lucene--can you educate me?  For example, have you
>> given any thought to what to do with UIMA annotations?  From what little
>> I've read about Lucene, they seem to have a thing called a document
>> analyzer, but they don't mean the same thing we mean by analysis in the NLP
>> community.  They appear to mean something more like "tokenizer".  So I
>> haven't yet found a place to put UIMA annotations, say for example, named
>> entities or parts of speech.  I'm wondering if Lucene needs a major feature
>> enhancement before it's truly useful with UIMA?
>>
>> What are your thoughts on how to integrate the two?  What functionality is
>> possible?
>>
>> Greg Holmberg
>>
>>
>>  -------------- Original message ----------------------
>> From: "Roberto Franchini" <[EMAIL PROTECTED]>
>>     
>>> Hi,
>>> I'm going to write a Lucene CAS consumer. The purpose is to create a
>>> Lucene document, or more than one, for each CAS.
>>> Last year (2007) the Jena University lab (JULIE lab? is it right?)
>>> delivered such a component, named LUCAS. Then it disappeared.
>>> LUCAS seems a good piece of software.
>>> The Technische Universität Darmstadt developed one too:
>>> http://www.ukp.tu-darmstadt.de/projects/dkpro/. (I will write to
>>> them).
>>>
>>> Is anybody interested in sharing knowledge and/or code to build that
>>> component?
>>> I think that Lucene and UIMA can be very good friends :)
>>>
>>> Roberto
>>>
>>> PS: I apologize for my bad English.
>>>
>>> --
>>> Roberto Franchini
>>> http://www.celi.it
>>> http://www.blogmeter.it
>>> http://www.memesphere.it
>>> Tel +39-011-6600814
>>> jabber:[EMAIL PROTECTED] <[EMAIL PROTECTED]>skype:ro.franchini
>>>       
>>     
>
>
>
