date:20131121

Lucene multithreaded indexing problems

2013-11-21 Thread Igor Shalyminov

Hello! I tried to run indexing in multiple threads, using a FixedThreadPool of Callable workers. The main operation - parsing a single document and calling addDocument() on the index - is done by a single worker. After parsing a document, a lot (really a lot) of Strings appear, and at the end of the

RE: Lucene multithreaded indexing problems

2013-11-21 Thread Uwe Schindler

Hi, why are you doing this? Lucene's IndexWriter can handle addDocuments calls from multiple threads. And, since Lucene 4, it will process them almost completely in parallel! If you do the addDocuments single-threaded, you are adding an additional bottleneck to your application. If you are doing a
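The suggestion above (let every worker call addDocument itself instead of funnelling documents through one thread) can be sketched roughly as follows. This is plain Java with no Lucene dependency: `DocumentSink` is a hypothetical stand-in for a thread-safe writer such as IndexWriter, and `parse` and `indexAll` are illustrative names, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a thread-safe writer such as Lucene's IndexWriter.
interface DocumentSink {
    void addDocument(String parsedDocument) throws Exception;
}

class ParallelIndexingSketch {

    // Hypothetical parsing step; real code would build a Lucene Document here.
    static String parse(String rawInput) {
        return rawInput.trim().toLowerCase(Locale.ROOT);
    }

    // Each worker parses one input and adds the result to the shared sink itself,
    // so no single thread becomes the bottleneck for addDocument calls.
    static int indexAll(List<String> rawInputs, DocumentSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String raw : rawInputs) {
                tasks.add(() -> {
                    sink.addDocument(parse(raw));
                    return Boolean.TRUE;
                });
            }
            int added = 0;
            // invokeAll blocks until every task has finished.
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                if (result.get()) {
                    added++;
                }
            }
            return added;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch only works if the sink is safe to call concurrently, which is the property the reply attributes to IndexWriter.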

Re: tokenizer to strip a set of characters

2013-11-21 Thread Jack Krupansky

The word delimiter filter can take a table which specifies the type for each character: http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html

tokenizer to strip a set of characters

2013-11-21 Thread Stephane Nicoll

Hi, I am using Lucene 3.6 and I am looking for a tokenizer that would remove certain characters when they are present at the beginning or at the end of a token. I initially used the StandardAnalyzer and switched to the WhitespaceAnalyzer because it was too aggressive for my use case. A few
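The behaviour asked for here (split on whitespace, then drop a set of characters only at the start or end of each token) can be illustrated with a small plain-Java sketch. It does not use the Lucene API; in Lucene the same logic would presumably live in a custom TokenFilter. `EdgeCharStripper`, `stripEdges` and `tokenize` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeCharStripper {

    // Removes leading and trailing characters that occur in `strip`;
    // the same characters inside the token are left untouched.
    static String stripEdges(String token, String strip) {
        int start = 0;
        int end = token.length();
        while (start < end && strip.indexOf(token.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && strip.indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(start, end);
    }

    // Splits on whitespace, strips the edges of each token,
    // and drops tokens that become empty.
    static List<String> tokenize(String text, String strip) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = stripEdges(raw, strip);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, `EdgeCharStripper.tokenize("(hello), \"wi-fi\" -- ok!", "()\",!-")` yields the tokens `hello`, `wi-fi` and `ok`: the hyphen inside `wi-fi` survives because only token edges are stripped.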

4 matches
