Re: AW: AW: Lucas

Erik Fäßler Tue, 26 Aug 2014 06:59:28 -0700

Hi all,

actually, I don't use LuCas anymore to write a Lucene index but rather to send 
the created documents to Solr or ElasticSearch. There are two reasons I 
continue to use LuCas: its field-merging capabilities and its term cover 
mechanics.
Regarding the field merging: I have a lot of machine learning components in my 
pipeline, and they do things I could not do within a Lucene analyzer. For 
instance, I recognize entities in the text with an ML component, and each 
entity has an ID. Please consider this example:

Barack Obama entered the White House.

Let's pretend we would require an ML system to recognize "White House" as THE 
one White House and let's say we gave it the ID "entity1".
My goal is to be able to search for the ID in the same way I would do using a 
synonym filter, thus finding a document by terms that originally were not 
included in this document's text, AND be able to correctly highlight the 
corresponding text snippet. So, when I search for "entity1" (e.g. because the 
user wants to see documents dealing with the White House), I want to find the 
above example document with the string "White House" highlighted.
LuCas can do this for me by aligning or merging the text TokenStream with the 
entity TokenStream, just as it is done within the CAS itself.
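
To make the merging idea concrete, here is a minimal Python sketch. It is not 
LuCas or Lucene code: the Token class and the tokenize, merge, positions and 
highlight functions are hypothetical names made up for illustration, and the 
sketch assumes every entity annotation begins at a token boundary. The entity 
token is stacked on the position of the first covered word (position 
increment 0, as a synonym filter would do) while keeping the offsets of the 
whole entity mention, so a hit on "entity1" highlights "White House".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int        # character offset where the covered text begins
    end: int          # character offset just past the covered text
    pos_inc: int = 1  # 0 stacks the token on the previous token's position


def tokenize(text):
    """Split on word characters and remember each word's offsets."""
    return [Token(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def merge(text_tokens, entity_tokens):
    """Interleave both streams by start offset; at equal offsets the text
    token comes first and the entity token is stacked on it (pos_inc == 0)."""
    return sorted(text_tokens + entity_tokens,
                  key=lambda t: (t.start, t.pos_inc == 0))


def positions(tokens):
    """Turn position increments into absolute token positions."""
    pos, out = -1, []
    for t in tokens:
        pos += t.pos_inc
        out.append((t.term, pos))
    return out


def highlight(text, tokens, query):
    """Return the text snippets covered by tokens matching the query term."""
    return [text[t.start:t.end] for t in tokens if t.term == query]


text = "Barack Obama entered the White House."
begin = text.index("White House")
# Hypothetical output of the ML component: one entity annotation with an ID.
entity = Token("entity1", begin, begin + len("White House"), pos_inc=0)

merged = merge(tokenize(text), [entity])
print(positions(merged))
print(highlight(text, merged, "entity1"))
```

In this sketch, "entity1" ends up at the same position as "White", which is 
what lets a phrase or ID query treat it like a synonym of the original text.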

If this functionality can be achieved without using LuCas, please tell me; I'd 
be happy to switch to up-to-date, maintained default components. So far, I am 
under the impression that no other component can do this.

The term cover mechanics allow me to easily distribute terms across document 
fields in a predefined, possibly overlapping, set division, the set cover. I 
use it to automatically deal with a lot of faceting fields. Here, I can model 
n:n mappings from CAS indexes to Lucene fields, e.g. mapping terms originating 
from one CAS index to 10 Lucene fields, or the other way round.
Again, if this is easily possible with another existing, maintained component, 
please point me to it.
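
As a rough illustration of such a cover, the following Python sketch 
distributes terms across fields. It does not reproduce the LuCas mapping file 
format or API: the COVER table, the field names and the distribute function 
are assumptions made up for this example. Each term is assigned to every 
field whose set contains it, so the sets may overlap.

```python
from collections import defaultdict

# Hypothetical cover definition: term -> set of target fields.
# A term may belong to several fields, so the sets can overlap.
COVER = {
    "entity1": {"facet_buildings", "facet_politics"},
    "entity2": {"facet_persons", "facet_politics"},
}


def distribute(terms, cover, default_field=None):
    """Assign each term to all fields named in the cover.

    Terms missing from the cover go to default_field, or are dropped
    when no default field is given.
    """
    fields = defaultdict(list)
    for term in terms:
        targets = cover.get(term)
        if not targets:
            if default_field is not None:
                fields[default_field].append(term)
            continue
        for field in sorted(targets):
            fields[field].append(term)
    return dict(fields)


# Terms as they might come out of one CAS index (made-up values).
document_fields = distribute(["entity1", "entity2", "entity3"], COVER)
print(document_fields)
```

Here "entity1" and "entity2" both land in facet_politics, while "entity3" is 
dropped because the cover does not mention it and no default field was given.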

In short: I, too, ultimately don't use Lucene but Solr/ES. However, LuCas has 
some (Lucene) document fine-tuning capabilities I need and work with.
This means: I don't necessarily need LuCas in a Lucene-updated version. I use 
it more as a fine-tuned TokenStream-smith. I might need it to be updated in 
the future if LuCas cannot express a specific feature of a newer 
Lucene version.

I hope this wall of text was understandable, thanks for reading through it ;-)

Best,

Erik

> On 26 Aug 2014, at 09:43, <[email protected]> wrote:
> 
> Hi Erik and Jörn,
> 
> I've used Solr in the meantime. It is so easy to quickly write a CAS consumer 
> that sends documents to a Solr web service. Writing to a Lucene index is 
> minimally more work. Could this be the reason why nobody cares about the 
> outdated version? Is there really a need for Lucas and Solrcas anymore? What 
> do you think? It would be nice to have some opinions on this. 
> 
> Of all people reading this list, who wants to have a Lucas or Solrcas for the 
> current version of Lucene?
> 
> Cheers,
> Armin
> 
> -----Original Message-----
> From: Erik Fäßler [mailto:[email protected]] 
> Sent: Friday, 22 August 2014 16:34
> To: [email protected]
> Subject: Re: AW: Lucas
> 
> I am using LuCas in production, in the last SNAPSHOT version that can be 
> found in the SVN but not in the Maven repository. I was also not aware that 
> a patch would be required to get it to work; I am using it in its current 
> SVN state, including the splitter filter.
> I would be willing to help with a migration and contribute to 
> discussions/plans. However, I won't have time to do it all on my own, 
> especially since I use it as a bridge to Solr/ElasticSearch that kind of 
> remedies the version difference. Thus I use it with newer Solr/ES versions 
> without problems so far.
> 
> I will be on vacation for two weeks; after that I'd be available for 
> contributions.
> 
> Best,
> 
> Erik
> 
>> On 22 Aug 2014, at 15:36, Jörn Kottmann <[email protected]> wrote:
>> 
>> It would probably be nice to migrate those to the current versions of 
>> Lucene/Solr.
>> 
>> Jörn
>> 
>>> On 08/13/2014 08:44 AM, [email protected] wrote:
>>> Hi Renaud,
>>> 
>>> that's nice, thank you. Are you using Lucene 4.x or an older version?
>>> 
>>> It's been a while since I asked that question, and I didn't get much 
>>> response. Is the project dead? Is it just too easy to code a simple 
>>> annotator for Lucene or Solr to justify the effort of maintaining Lucas 
>>> and Solrcas?
>>> 
>>> Cheers,
>>> Armin
>>> 
>>> 
>>> -----Original Message-----
>>> From: Renaud Richardet [mailto:[email protected]]
>>> Sent: Monday, 11 August 2014 23:12
>>> To: [email protected]
>>> Subject: Re: Lucas
>>> 
>>> Hi Armin,
>>> 
>>> I used it a while ago. I had to apply the following patch to make it work:
>>> https://gist.github.com/renaud/bc34a48ca22f787f6c11
>>> 
>>> HTH, Renaud
>>> 
>>> 
>>>> On Mon, Jul 28, 2014 at 2:55 PM, <[email protected]> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> Is someone using Lucas? It seems to be slightly outdated. It depends 
>>>> on Lucene 2.9.3. Lucene is at version 4.9.0 right now. Is there an 
>>>> alternative?
>>>> 
>>>> Regards,
>>>> Armin
>>> 
>>> --
>>> Renaud Richardet
>>> Blue Brain Project  PhD candidate
>>> EPFL  Station 15
>>> CH-1015 Lausanne
>>> phone: +41-78-675-9501
>>> http://people.epfl.ch/renaud.richardet
>>
