Lucene multithreaded indexing problems

2013-11-21 Thread Igor Shalyminov
Hello! I tried to run indexing in multiple threads, using a FixedThreadPool of Callable workers. The main operation, parsing a single document and calling addDocument() on the index, is done by a single worker. After parsing a document, a lot (really a lot) of Strings appear, and at the end of the

RE: Lucene multithreaded indexing problems

2013-11-21 Thread Uwe Schindler
Hi, why are you doing this? Lucene's IndexWriter can handle addDocument() calls from multiple threads. And since Lucene 4, it processes them almost completely in parallel! If you call addDocument() from a single thread, you are adding an additional bottleneck to your application. If you are doing a
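The pattern described above, where every worker both parses a document and adds it to one shared, thread-safe writer, can be sketched roughly as follows. This is a minimal sketch using only the JDK: DocumentSink is a hypothetical stand-in for a thread-safe writer such as IndexWriter, and parse() and indexAll() are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIndexingSketch {
    private ParallelIndexingSketch() {}

    /** Hypothetical stand-in for a thread-safe writer; not a Lucene type. */
    interface DocumentSink {
        void addDocument(List<String> fields) throws Exception;
    }

    /** Placeholder parser (assumption): splits the raw text on whitespace. */
    static List<String> parse(String raw) {
        return Arrays.asList(raw.trim().split("\\s+"));
    }

    /**
     * Each task parses one document and adds it to the shared sink itself,
     * so no single thread serializes the add step.
     * Returns the number of documents submitted.
     */
    static int indexAll(List<String> rawDocs, DocumentSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (String raw : rawDocs) {
                Callable<Void> task = () -> {
                    sink.addDocument(parse(raw)); // parse and add in the same worker
                    return null;
                };
                futures.add(pool.submit(task));
            }
            // Wait for every task so that worker failures are not silently lost.
            for (Future<Void> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a worker failed", e.getCause());
                }
            }
            return futures.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

The sink passed in must be safe to call from several threads at once; the sketch only shows where the add call sits relative to parsing.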

Re: tokenizer to strip a set of characters

2013-11-21 Thread Jack Krupansky
The word delimiter filter can be given a table that specifies the type for each character: http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html

tokenizer to strip a set of characters

2013-11-21 Thread Stephane Nicoll
Hi, I am using Lucene 3.6 and I am looking for a tokenizer that would remove certain characters when they are present at the beginning or at the end of a token. I initially used the StandardAnalyzer and switched to the WhitespaceAnalyzer because the former was too aggressive for my use case. A few
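The behavior asked for here (split on whitespace, then drop a chosen set of characters only at the start and end of each token) can be illustrated in plain Java. This is a sketch of the trimming logic only, not a Lucene Tokenizer or TokenFilter; the TokenTrimmer class, its method names, and the example character set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenTrimmer {
    private TokenTrimmer() {}

    /**
     * Removes any character contained in {@code strip} from both ends of
     * {@code token}. Characters in the interior of the token are kept.
     */
    static String trim(String token, String strip) {
        int start = 0;
        int end = token.length();
        while (start < end && strip.indexOf(token.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && strip.indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(start, end);
    }

    /**
     * Splits on whitespace, trims each token, and drops tokens that
     * become empty after trimming.
     */
    static List<String> tokenize(String text, String strip) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String trimmed = trim(raw, strip);
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }
}
```

For instance, with the strip set `"(),` the interior hyphen in `"foo-bar",` survives while the surrounding quotes and comma are removed. In a real analyzer chain the same loop would sit inside a token filter placed after a whitespace tokenizer.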