Hello!
I tried to run indexing in multiple threads, with a FixedThreadPool of Callable
workers.
The main operation - parsing a single document and calling addDocument() on the
index - is done by a single worker.
After parsing a document, a lot (really a lot) of Strings appear, and at the
end of the
Hi,
Why are you doing this? Lucene's IndexWriter can handle addDocuments in
multiple threads. And, since Lucene 4, it will process them almost completely
in parallel!
If you do the addDocuments single-threaded, you are adding an additional
bottleneck to your application. If you are doing a
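
To make the suggestion concrete, here is a minimal sketch of the pattern, in which every worker both parses its document and adds it to one shared writer. It does not use Lucene itself: `DocSink` is a hypothetical stand-in for a thread-safe writer such as IndexWriter, the "parsing" step is a placeholder, and `ParallelIndexingSketch` and `indexAll` are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a thread-safe writer (assumption: it may be
// called from several threads at once, as IndexWriter is said to allow).
interface DocSink {
    void addDocument(String parsedDoc) throws Exception;
}

class ParallelIndexingSketch {
    // Each worker parses its own raw document AND adds it to the shared sink,
    // so no single thread serializes the addDocument calls.
    static int indexAll(List<String> rawDocs, final DocSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (final String raw : rawDocs) {
                results.add(pool.submit(new Callable<Integer>() {
                    public Integer call() throws Exception {
                        // Placeholder for the real (expensive) parsing step.
                        String parsed = raw.trim().toLowerCase();
                        sink.addDocument(parsed);
                        return 1;
                    }
                }));
            }
            int count = 0;
            for (Future<Integer> f : results) {
                count += f.get(); // waits for the worker and surfaces its failure
            }
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real index, the body of `call()` would build the document and hand it to the shared writer directly, instead of passing parsed results back to one thread that does all the adding.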
The word delimiter filter can be given a table that specifies the type of
each character:
http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
Hi,
I am using Lucene 3.6 and I am looking for a tokenizer that would remove
certain characters when they are present at the beginning or at the end of
a token.
I initially used the StandardAnalyzer and switched to the
WhitespaceAnalyzer because the former was too aggressive for my use case.
A few
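
To pin down the behaviour being asked for, here is a small plain-Java sketch: split on whitespace, then strip a configurable set of characters from the two ends of each token while leaving the same characters alone inside the token. It is not a Lucene tokenizer or filter; `EdgeTrimSketch`, `trimEdges` and `tokenize` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeTrimSketch {
    // Remove any character contained in `strip` from the start and the end
    // of the token; occurrences inside the token are kept.
    static String trimEdges(String token, String strip) {
        int start = 0;
        int end = token.length();
        while (start < end && strip.indexOf(token.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && strip.indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(start, end);
    }

    // Whitespace tokenization followed by edge trimming; tokens that become
    // empty after trimming are dropped.
    static List<String> tokenize(String text, String strip) {
        List<String> out = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            String token = trimEdges(raw, strip);
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }
}
```

For example, `EdgeTrimSketch.tokenize("(foo-bar), \"baz\" ...", "()\",.")` returns the two tokens `foo-bar` and `baz`: the hyphen inside the first token survives, and the token made only of dots is dropped.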