subject:"Integrating NLP into Lucene Analysis Chain"

RE: Re: Integrating NLP into Lucene Analysis Chain

2022-11-22 Thread Lucas Kot-Zaniewski

Hi Benoit, Thanks for the reply and link! My application is English-focused, so I have the benefit of working with a language with little inflection. This, along with a few other reasons, pushed me towards an index-heavy approach which doesn't have the complexities involved with synonyms of different posit

RE: RE: Integrating NLP into Lucene Analysis Chain

2022-11-22 Thread Lucas Kot-Zaniewski

example, on the other hand, are slow. If you have to put NLP processing inside the analysis chain, you may have to give up certain NLP capacities... > > My 2cents, > > Guan > > -Original Message- > From: Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) > Sent: Saturday, November

RE: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Wang, Guan

: Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) Sent: Saturday, November 19, 2022 10:27 PM To: java-user@lucene.apache.org Subject: Integrating NLP into Lucene Analysis Chain External Email - Use Caution Greetings, I would greatly appreciate anyone sharing their experience doing NLP/lemmatizat

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Mikhail Khludnev

Hello, Benoit. I just came across https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilterFactory.html It sounds similar to what you are asking, but it watches TypeAttribute only. Also, spans are superseded by intervals https://lucene.apache

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Benoit Mercier

Hi Luke, Thank you for your work and information sharing. From my point of view lemmatization is just a use case of text token annotation. I have been working with Lucene since 2006 to index lexicographic and linguistic data and I always miss the fact that (1) token attributes are not search

Re: Integrating NLP into Lucene Analysis Chain

2022-11-19 Thread Robert Muir

https://github.com/apache/lucene/pull/11955 On Sat, Nov 19, 2022 at 10:43 PM Robert Muir wrote: > > Hi, > > Is this 'synchronized' really needed? > > 1. Lucene tokenstreams are only used by a single thread. If you index > with 10 threads, 10 tokenstreams are used. > 2. These OpenNLP Factories make a new *Op for each tokenstream that they create. So there's no thread hazard. > 3. If I remove 'synchronized' ke

Re: Integrating NLP into Lucene Analysis Chain

2022-11-19 Thread Robert Muir

Hi, Is this 'synchronized' really needed? 1. Lucene tokenstreams are only used by a single thread. If you index with 10 threads, 10 tokenstreams are used. 2. These OpenNLP Factories make a new *Op for each tokenstream that they create. So there's no thread hazard. 3. If I remove 'synchronized' ke
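
The reasoning in points 1 and 2 can be illustrated with a minimal sketch in plain Java. It does not use the Lucene or OpenNLP APIs: `LemmaOp`, `LemmaStream`, `newStream`, and `indexConcurrently` are hypothetical stand-ins invented for this illustration, and the "lemmatizer" is a toy that lowercases a token and strips a trailing "s". The only point it shows is that when a factory hands each stream its own stateful helper, and each stream is used by one thread, the helper needs no `synchronized`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerStreamStateSketch {

    // Hypothetical stand-in for a stateful NLP helper (like the "*Op" objects
    // mentioned above). It reuses a mutable buffer and is NOT thread-safe.
    static final class LemmaOp {
        private final StringBuilder scratch = new StringBuilder();

        // Toy "lemmatization": lowercase and drop one trailing 's'. No lock here.
        String lemmatize(String token) {
            scratch.setLength(0);
            scratch.append(token.toLowerCase(Locale.ROOT));
            int n = scratch.length();
            if (n > 3 && scratch.charAt(n - 1) == 's') {
                scratch.setLength(n - 1);
            }
            return scratch.toString();
        }
    }

    // Hypothetical stand-in for a token stream that owns exactly one LemmaOp.
    static final class LemmaStream {
        private final LemmaOp op;

        LemmaStream(LemmaOp op) {
            this.op = op;
        }

        List<String> analyze(String text) {
            List<String> out = new ArrayList<>();
            for (String token : text.split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(op.lemmatize(token));
                }
            }
            return out;
        }
    }

    // Factory: every stream gets a fresh LemmaOp, so no state is shared.
    static LemmaStream newStream() {
        return new LemmaStream(new LemmaOp());
    }

    // Each task builds and uses its own stream on a single worker thread.
    static List<List<String>> indexConcurrently(List<String> docs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> newStream().analyze(doc)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("Dogs chase cats", "Birds sing songs");
        System.out.println(indexConcurrently(docs, 2));
    }
}
```

If a single `LemmaOp` were instead shared across all streams, the shared buffer would become a data race and locking would be needed. Whether that applies to the real OpenNLP wrappers is exactly the question raised in the message above; this sketch does not settle it.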

Integrating NLP into Lucene Analysis Chain

2022-11-19 Thread Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)

Greetings, I would greatly appreciate anyone sharing their experience doing NLP/lemmatization and am also very curious to gauge the opinion of the Lucene community regarding OpenNLP. I know there are a few other libraries out there, some of which can’t be directly included in the Lucene project

8 matches
