Hello Erik,

in Lucene 4.9 (maybe earlier), you can replace the Lucene analyzer with a UIMA pipeline. At least the docs say so. I don't know how good it is because I've never used it.
Cheers,
Armin

On 8/26/14, Erik Fäßler <[email protected]> wrote:
> Hi all,
>
> actually, I don't use LuCas anymore to write a Lucene index but rather to
> send the created documents to Solr or ElasticSearch. There are two reasons I
> continue to use LuCas: its field merging capabilities and the term cover
> mechanics.
> Regarding the field merging: I have a lot of machine learning components in
> my pipeline, nothing I could do within a Lucene analyzer. Suppose I
> recognize entities in the text with an ML component and each entity has an
> ID. Consider this example:
>
> Barack Obama entered the White House.
>
> Let's pretend we need an ML system to recognize "White House" as
> THE one White House, and let's say we gave it the ID "entity1".
> My goal is to be able to search for the ID in the same way I would using
> a synonym filter, thus finding a document by terms that originally were not
> included in this document's text, AND to be able to correctly highlight the
> corresponding text snippet. So, when I search for "entity1" (e.g. because
> the user wants to see documents dealing with the White House), I want to
> find the above example document with the string "White House" highlighted.
> LuCas can do this for me by aligning or merging the text TokenStream with
> the entity TokenStream, just as it is done within the CAS itself.
>
> If this functionality can be achieved without using LuCas, please tell me,
> I'd be happy to switch to up-to-date, maintained default components. Until
> now I have been under the impression that this cannot be done by another
> component.
>
> The term cover mechanics allow me to easily distribute terms across document
> fields in a predefined, possibly overlapping, set division, the set cover. I
> use it to automatically deal with a lot of faceting fields. Here, I can
> model n:n mappings from CAS indexes to Lucene fields, e.g. mapping terms
> originating from one CAS index to 10 Lucene fields, or the other way round.
> Again, if this is easily possible with another existing, maintained
> component, please point me to it.
>
> In short: I, too, ultimately don't use Lucene but Solr/ES. However, LuCas
> has some (Lucene) document fine-tuning capabilities I need/work
> with.
> This means: I don't necessarily need a version of LuCas updated to the
> current Lucene. I use it more as a fine-tuned TokenStream-smith. I might
> need it to be updated in the future if LuCas is not able to express a
> specific feature of a newer Lucene version.
>
> I hope this wall of text was understandable, thanks for reading through it
> ;-)
>
> Best,
>
> Erik
>
>
>> On 26 Aug 2014, at 09:43, <[email protected]> wrote:
>>
>> Hi Erik and Jörn,
>>
>> I've used Solr in the meantime. It is so easy to quickly write a CAS
>> consumer that sends documents to a Solr web service. Writing to a Lucene
>> index is minimally more work. Could this be the reason why nobody cares
>> about the outdated version? Is there really a need for Lucas and Solrcas
>> anymore? What do you think? It would be nice to have some opinions on
>> this.
>>
>> Of all the people reading this list, who wants to have a Lucas or Solrcas
>> for the current version of Lucene?
>>
>> Cheers,
>> Armin
>>
>> -----Original Message-----
>> From: Erik Fäßler [mailto:[email protected]]
>> Sent: Friday, 22 August 2014 16:34
>> To: [email protected]
>> Subject: Re: AW: Lucas
>>
>> I am using LuCas in production, in the latest SNAPSHOT version, which can
>> be found in the SVN but not in the Maven repository. I was also not aware
>> a patch would be required to get it to work; I am using it in its current
>> SVN state, including the splitter filter.
>> I would be willing to help with a migration and contribute to
>> discussions/plans. However, I won't have time to do it all on my own,
>> especially since I use it as a bridge to Solr/ElasticSearch, which kind of
>> remedies the version difference.
>> Thus I use it with newer Solr/ES versions
>> without problems so far.
>>
>> I will be on vacation for two weeks; after that I'd be available for
>> contributions.
>>
>> Best,
>>
>> Erik
>>
>>> On 22 Aug 2014, at 15:36, Jörn Kottmann <[email protected]> wrote:
>>>
>>> It would probably be nice to migrate those to the current versions of
>>> Lucene/Solr.
>>>
>>> Jörn
>>>
>>>> On 08/13/2014 08:44 AM, [email protected] wrote:
>>>> Hi Renaud,
>>>>
>>>> that's nice, thank you. Are you using Lucene 4.x or an older version?
>>>>
>>>> It's been a while since I asked that question, and I didn't get much
>>>> response. Is the project dead? Is it just too easy to code a simple
>>>> annotator for Lucene or Solr to justify the effort of maintaining Lucas
>>>> and Solrcas?
>>>>
>>>> Cheers,
>>>> Armin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Renaud Richardet [mailto:[email protected]]
>>>> Sent: Monday, 11 August 2014 23:12
>>>> To: [email protected]
>>>> Subject: Re: Lucas
>>>>
>>>> Hi Armin,
>>>>
>>>> I used it a while ago. I had to apply the following patch to make it
>>>> work:
>>>> https://gist.github.com/renaud/bc34a48ca22f787f6c11
>>>>
>>>> HTH, Renaud
>>>>
>>>>
>>>>> On Mon, Jul 28, 2014 at 2:55 PM, <[email protected]> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Is anyone using Lucas? It seems to be slightly outdated. It depends
>>>>> on Lucene 2.9.3. Lucene is at version 4.9.0 right now. Is there an
>>>>> alternative?
>>>>>
>>>>> Regards,
>>>>> Armin
>>>>
>>>> --
>>>> Renaud Richardet
>>>> Blue Brain Project PhD candidate
>>>> EPFL Station 15
>>>> CH-1015 Lausanne
>>>> phone: +41-78-675-9501
>>>> http://people.epfl.ch/renaud.richardet
>>>
>
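
The field merging and the term cover that Erik describes in this thread can be illustrated with a small, self-contained Python sketch. It is neither LuCas code nor the Lucene API. The Token class, the function names, the facet field names and the <em> highlight tags are all hypothetical and exist only for this illustration. The sketch assumes that an entity annotation starts at the same character offset as some text token. The entity ID is then stacked on that token with a position increment of 0, which is how a synonym filter places synonyms, while keeping the offsets of the full entity span so a highlighter can mark the original text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a token with offsets (not a Lucene class)."""
    term: str
    start: int        # character offset where the covered text begins
    end: int          # character offset just past the covered text
    pos_inc: int = 1  # 0 means "same position as the previous token"


def whitespace_tokens(text):
    """Split on whitespace and record the character offsets of each word."""
    tokens, cursor = [], 0
    for word in text.split():
        start = text.index(word, cursor)
        cursor = start + len(word)
        tokens.append(Token(word, start, cursor))
    return tokens


def merge_streams(text_tokens, entity_tokens):
    """Stack each entity token on the text token that starts at the same offset.

    Assumption: entities that do not start on a token boundary are dropped.
    """
    by_start = defaultdict(list)
    for ent in entity_tokens:
        by_start[ent.start].append(ent)
    merged = []
    for tok in text_tokens:
        merged.append(tok)
        for ent in by_start.get(tok.start, []):
            merged.append(Token(ent.term, ent.start, ent.end, pos_inc=0))
    return merged


def highlight(text, merged, query_term, open_tag="<em>", close_tag="</em>"):
    """Wrap the text span covered by the first token equal to query_term."""
    for tok in merged:
        if tok.term == query_term:
            return (text[:tok.start] + open_tag + text[tok.start:tok.end]
                    + close_tag + text[tok.end:])
    return text


def distribute_terms(terms, cover):
    """Term cover sketch: copy each term into every field the cover lists for it."""
    fields = defaultdict(list)
    for term in terms:
        for field in cover.get(term, ()):
            fields[field].append(term)
    return dict(fields)


text = "Barack Obama entered the White House."
entity_start = text.index("White House")
entities = [Token("entity1", entity_start, entity_start + len("White House"))]
merged = merge_streams(whitespace_tokens(text), entities)

print(highlight(text, merged, "entity1"))
# prints: Barack Obama entered the <em>White House</em>.

# Hypothetical cover: one term goes to two overlapping facet fields.
cover = {"entity1": ["facet_building", "facet_location"]}
print(distribute_terms(["entity1"], cover))
```

Searching the merged stream for "entity1" finds the document even though that string never occurs in the text, and the stored offsets still point at "White House". Whether a given Solr or ElasticSearch setup can reproduce this without LuCas is exactly the open question in the thread; the sketch only shows the intended behaviour.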
