Right after I sent the email I went on and checked for uniqueness of documents...

In theory they were all supposed to be unique... But I've realized that the platform I'm using to reindex delays sending the requests. This, in combination with my reindexers reusing document fields (instead of creating new instances, to save on GC), led to the same document being sent many times with invalid data...

I am fairly sure now that this is the source of my problem... My reindexers originally used LuceneWriter directly, which blocks thread execution until the document is added to the index. The new framework I'm using uses messaging, which releases control back to the thread before the documents are actually sent to be indexed. My threads update the document fields in the meantime, so the data written to the index is mid-update and invalid...
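
A minimal sketch of that effect, not the actual reindexer code: `Doc` and `DeferredSender` are made-up stand-ins for a document and for a messaging layer that returns before delivery. Neither is a Solr or Lucene class. Delivery is simulated with a queue that is drained later, so the result is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class ReusedDocumentDemo {

    /** Hypothetical stand-in for a document: just a mutable field map. */
    static final class Doc {
        final Map<String, String> fields = new HashMap<>();
    }

    /** Hypothetical stand-in for the messaging layer: send() returns before delivery. */
    static final class DeferredSender {
        private final Queue<Doc> pending = new ArrayDeque<>();

        void send(Doc doc) {
            pending.add(doc); // only the reference is queued, not a copy
        }

        /** Delivers everything queued so far; returns the ids as the "index" sees them. */
        List<String> flush() {
            List<String> indexedIds = new ArrayList<>();
            Doc doc;
            while ((doc = pending.poll()) != null) {
                indexedIds.add(doc.fields.get("id"));
            }
            return indexedIds;
        }
    }

    /** Reuses one Doc for every add, as the reindexers did to save on GC. */
    static List<String> indexReusingOneInstance(int count) {
        DeferredSender sender = new DeferredSender();
        Doc shared = new Doc();
        for (int i = 0; i < count; i++) {
            shared.fields.put("id", "doc-" + i); // overwrites what earlier sends still point to
            sender.send(shared);
        }
        return sender.flush();
    }

    /** Creates a new Doc per add, so queued documents are never touched again. */
    static List<String> indexWithFreshInstances(int count) {
        DeferredSender sender = new DeferredSender();
        for (int i = 0; i < count; i++) {
            Doc fresh = new Doc();
            fresh.fields.put("id", "doc-" + i);
            sender.send(fresh);
        }
        return sender.flush();
    }

    public static void main(String[] args) {
        System.out.println(indexReusingOneInstance(3)); // [doc-2, doc-2, doc-2]
        System.out.println(indexWithFreshInstances(3)); // [doc-0, doc-1, doc-2]
    }
}
```

With a blocking writer the shared instance is harmless, because each add completes before the next field update. Once delivery is deferred, every queued request ends up carrying whatever the fields hold at send time.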

I've adjusted my reindexing threads to ensure new instances of everything are used... I will test it shortly...

But you pointed out exactly why I have fewer documents than 'add' requests...
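
As a rough illustration of why the counts diverge, the sketch below models an index as a map keyed by the uniqueKey value. This is a simplification for illustration only, not how Solr stores documents; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UniqueKeyOverwriteDemo {

    /**
     * Applies one "add" per key and returns how many documents remain.
     * An add whose key already exists replaces the earlier document.
     */
    static int documentsAfterAdds(List<String> uniqueKeys) {
        Map<String, String> index = new HashMap<>();
        int addRequests = 0;
        for (String key : uniqueKeys) {
            index.put(key, "payload-" + addRequests); // same key: overwrite, no error
            addRequests++;
        }
        return index.size();
    }

    public static void main(String[] args) {
        // Five add requests, but only three distinct keys.
        List<String> keys = Arrays.asList("a", "b", "a", "a", "c");
        System.out.println(keys.size() + " adds, "
                + documentsAfterAdds(keys) + " documents"); // 5 adds, 3 documents
    }
}
```

Every add succeeds, so nothing shows up as an error, yet the number of searchable documents equals the number of distinct keys rather than the number of requests.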

Thanks!

Shalin Shekhar Mangar wrote:
On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace <a...@rwmotloc.com> wrote:

Hi all!

I'm using Solr 1.3 and currently testing reindexing...

In my client app, I am sending 17494 requests to add documents, in 3
different scenarios:

a) not using threads
b) using 1 thread
c) using 2 threads

In scenario a), everything seems to work fine... In my client log, I see
17494 requests sent to Solr, in Solr's log, I see the same number of 'add'
requests received, and if I search the index, I can see the same number of
documents.

However, if I use 1 thread, I see the right number of requests in the logs, but
I only find 15k or so documents (this varies a bit every time I run this
scenario).

It gets way worse if I use 2 threads... I can see the right number of
requests in both logs, but I end up with ~600 docs in the index!

In all scenarios, I don't see any errors in the logs...

As you can imagine, I need to be able to use multiple threads to speed up
the process... It is also very concerning that I don't get any errors
anywhere...

Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a
tiny fraction of the actual documents can be found...

Any clues?


What is the uniqueKey in your schema.xml? Is it possible that those 17494
documents have a common uniqueKey and are therefore getting overwritten?
