Chris Lamprecht wrote:
I've done exactly what you describe, using N threads where N is the
number of processors on the machine, plus one more thread that writes
to the file system index (since that is I/O-bound anyway). Since most
of the CPU time is tokenizing/stemming/etc, the method works well.
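
For anyone who wants to see the shape of that setup, here is a minimal sketch using only java.util.concurrent, not the code Chris actually ran. The names PipelineSketch, analyze and index are made up for illustration, and the `written` list stands in for the real file system index; the worker threads do the CPU-bound step and one writer thread does all the writing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelineSketch {
    // Sentinel (compared by identity) telling the writer that one worker has finished.
    private static final String DONE = new String("<done>");

    // Hypothetical stand-in for the CPU-bound analysis step (tokenizing, stemming, ...).
    static String analyze(String rawText) {
        return rawText.trim().toLowerCase();
    }

    // Runs `workers` analysis threads plus one writer thread; returns what the writer received.
    static List<String> index(List<String> docs, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>(); // touched only by the writer thread

        Thread writer = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < workers) {
                    String item = queue.take();
                    if (item == DONE) {
                        finished++;
                    } else {
                        written.add(item); // assumption: the real code would write to the index here
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        List<Thread> pool = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int offset = w;
            Thread t = new Thread(() -> {
                // Each worker handles every workers-th document, starting at its own offset.
                for (int i = offset; i < docs.size(); i += workers) {
                    queue.add(analyze(docs.get(i)));
                }
                queue.add(DONE);
            });
            pool.add(t);
            t.start();
        }
        for (Thread t : pool) {
            t.join();
        }
        writer.join(); // after this, `written` is safely visible to the caller
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = Runtime.getRuntime().availableProcessors();
        List<String> out = index(Arrays.asList(" Alpha ", "BETA", "Gamma"), n);
        System.out.println(out.size() + " documents written");
    }
}
```

The order in which documents reach the writer is not deterministic, which is usually fine for an index.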
Hi Sodel,
You could use a single queue, where one thread pulls things off the
queue and any number of threads put things on the queue. You can
index, say, 1000 documents each to RAMDirectories in multiple threads,
then enqueue the RAMDirectories. When the queue reaches a certain
size, the single t
Hi,
Calls to IndexWriter.addIndexes are synchronized. Your code
should not have to do anything more than call it.
I believe roughly this will be the scenario that you are looking for:
- while (there is more data)
    - spawn a thread to handle creating documents for this data
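
A rough sketch of that loop, with the Lucene parts replaced by stand-ins so it runs on its own. SharedWriter and addBatch are hypothetical names; the synchronized method plays the role the post attributes to IndexWriter.addIndexes. A fixed thread pool is used here instead of one raw thread per chunk, which is my own substitution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SpawnPerChunkSketch {
    // Hypothetical stand-in for the shared writer. Because addBatch is
    // synchronized, callers need no locking of their own.
    static class SharedWriter {
        private final List<String> merged = new ArrayList<>();

        synchronized void addBatch(List<String> batch) {
            merged.addAll(batch);
        }

        synchronized int size() {
            return merged.size();
        }
    }

    // Builds one batch per chunk in parallel and merges each batch into the shared writer.
    static int indexAll(List<List<String>> chunks, int threads) throws InterruptedException {
        SharedWriter writer = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> chunk : chunks) {        // while (there is more data)
            pool.execute(() -> {                   // hand this chunk to a worker
                List<String> batch = new ArrayList<>();
                for (String raw : chunk) {
                    batch.add(raw.toLowerCase());  // stand-in for creating a document
                }
                writer.addBatch(batch);            // just call it; the method synchronizes
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("indexing did not finish in time");
        }
        return writer.size();
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("A", "B"),
                Arrays.asList("C"));
        System.out.println(indexAll(chunks, 2) + " documents merged");
    }
}
```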
You should give only a single thread access to the IndexWriter. I have created
an index updater that stores all the delete and write requests, and once in a
while a thread (triggered by Quartz) processes the requests in a single batch.
Another way would be synchronizing the index updater and only
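
The batching idea can be sketched roughly as follows. This is not the poster's code: BatchUpdaterSketch, requestWrite, requestDelete and flush are hypothetical names, a HashMap stands in for the index, and the Quartz trigger is left out. Any scheduler could call flush() periodically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchUpdaterSketch {
    // One pending request: a write when text is non-null, a delete otherwise.
    static final class Request {
        final String id;
        final String text;

        Request(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Any thread may enqueue requests without blocking.
    private final ConcurrentLinkedQueue<Request> pending = new ConcurrentLinkedQueue<>();
    // Stand-in for the index; only touched inside synchronized methods.
    private final Map<String, String> index = new HashMap<>();

    void requestWrite(String id, String text) {
        pending.add(new Request(id, text));
    }

    void requestDelete(String id) {
        pending.add(new Request(id, null));
    }

    // Applies everything queued so far as one batch, in arrival order.
    // synchronized so that only one thread works on the index at a time.
    synchronized int flush() {
        int applied = 0;
        Request r;
        while ((r = pending.poll()) != null) {
            if (r.text == null) {
                index.remove(r.id);
            } else {
                index.put(r.id, r.text);
            }
            applied++;
        }
        return applied;
    }

    synchronized int size() {
        return index.size();
    }

    public static void main(String[] args) {
        BatchUpdaterSketch updater = new BatchUpdaterSketch();
        updater.requestWrite("1", "first document");
        updater.requestWrite("2", "second document");
        updater.requestDelete("1");
        System.out.println(updater.flush() + " requests applied, "
                + updater.size() + " documents in the index");
    }
}
```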