Hi Benoit,
Thanks for the reply and the link! My application is English-focused, so I have
the benefit of working with a language with little inflection. This, along with a
few other reasons, pushed me towards an index-heavy approach which doesn't
have the complexities involved with synonyms of different
> To: java-user@lucene.apache.org
> Subject: Integrating NLP into Lucene Analysis Chain
>
> Greetings,
> I would greatly appreciate anyone sharing their experience doing
> NLP/lemmatization and am also very curious to gauge the opinion of the
> Lucene comm
-Zaniewski (BLOOMBERG/ 919 3RD A)
Sent: Saturday, November 19, 2022 10:27 PM
To: java-user@lucene.apache.org
Subject: Integrating NLP into Lucene Analysis Chain
Hello, Benoit.
I just came across
https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilterFactory.html
It sounds similar to what you are asking for, but it only watches TypeAttribute.
Also, spans have been superseded by intervals
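To make the "watches TypeAttribute only" point concrete, here is a toy, Lucene-free Java model of the behaviour the TypeAsSynonymFilter documentation describes: each token's type is added as another token at the same position (position increment 0), optionally with a prefix. The Tok class, the stackTypes method and the "_pos_" prefix are hypothetical names for illustration; this is a sketch of the idea, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene code) of stacking a token's type as a synonym.
// Tok and stackTypes are hypothetical names used only for illustration.
class TypeAsSynonymSketch {

    // Hypothetical token: surface text, a single type annotation, a position increment.
    static final class Tok {
        final String term;
        final String type;
        final int posInc;

        Tok(String term, String type, int posInc) {
            this.term = term;
            this.type = type;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Emits each input token, then its (prefixed) type as an extra token.
    static List<Tok> stackTypes(List<Tok> input, String prefix) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : input) {
            out.add(t);
            // posInc = 0 places the new token at the same position, i.e. as a synonym.
            out.add(new Tok(prefix + t.type, t.type, 0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = List.of(new Tok("running", "VBG", 1), new Tok("dogs", "NNS", 1));
        // prints [running(+1), _pos_VBG(+0), dogs(+1), _pos_NNS(+0)]
        System.out.println(stackTypes(in, "_pos_"));
    }
}
```

Because only `type` feeds the stacked token, an annotation kept anywhere else (a lemma in a separate attribute, for example) never reaches the index this way.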
Hi Luke,
Thank you for your work and for sharing this information. From my point of view,
lemmatization is just one use case of text token annotation. I have been
working with Lucene since 2006 to index lexicographic and linguistic
data, and I have always regretted that (1) token attributes are not
https://github.com/apache/lucene/pull/11955
On Sat, Nov 19, 2022 at 10:43 PM Robert Muir wrote:
Hi,
Is this 'synchronized' really needed?
1. Lucene tokenstreams are only used by a single thread. If you index
with 10 threads, 10 tokenstreams are used.
2. These OpenNLP Factories make a new *Op for each tokenstream that
they create, so there's no thread hazard.
3. If I remove 'synchronized'
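A minimal sketch of the confinement argument in points 1 and 2, in plain Java with no Lucene or OpenNLP dependency: a hypothetical TaggerOp holds mutable scratch state and is not thread-safe, but because a new instance is created for every stream and each stream is consumed by a single thread, no 'synchronized' is needed. TaggerOp, createForStream and the suffix-based "tagging" are toy stand-ins, not OpenNLP classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy illustration of per-stream instances (hypothetical names, not OpenNLP API).
class PerStreamOpSketch {

    // Stand-in for an "*Op": mutable scratch buffer, so NOT safe to share across threads.
    static final class TaggerOp {
        private final StringBuilder scratch = new StringBuilder();

        String tag(String token) {
            scratch.setLength(0);
            // Toy rule: tokens ending in "s" are tagged NNS, everything else NN.
            scratch.append(token).append('/').append(token.endsWith("s") ? "NNS" : "NN");
            return scratch.toString();
        }
    }

    // Like the factories described above: every new stream gets its own op.
    static TaggerOp createForStream() {
        return new TaggerOp();
    }

    // Each task models one stream: it creates its own op, which stays confined
    // to the thread running that task, so no locking is required.
    static List<List<String>> tagAll(List<List<String>> docs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> doc : docs) {
                futures.add(pool.submit(() -> {
                    TaggerOp op = createForStream();
                    List<String> tagged = new ArrayList<>();
                    for (String tok : doc) {
                        tagged.add(op.tag(tok));
                    }
                    return tagged;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> docs = List.of(List.of("dogs", "bark"), List.of("cat", "sleeps"));
        System.out.println(tagAll(docs, 10));
    }
}
```

If a single op were instead shared by all threads, its scratch buffer would be a data race and locking (or a ThreadLocal) would become necessary; one instance per stream avoids that at the cost of one extra object per stream.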
Greetings,
I would greatly appreciate anyone sharing their experience doing
NLP/lemmatization and am also very curious to gauge the opinion of the Lucene
community regarding OpenNLP. I know there are a few other libraries out there,
some of which can’t be directly included in the Lucene