Folks:
I have a corpus of approximately 6 million documents, each about 4 KB.
Currently, indexing works like this: I read documents from a database and
issue Solr POST requests in batches, sized so that each request stays under
Tomcat's maxPostSize, which is set to 2 MB. That means each batch writes
roughly 600 documents to Solr. What I am seeing is that I can push about
2500 docs per minute, or roughly 40 per second.
I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000 docs/sec
have been achieved. Needless to say, I am sure performance numbers vary
widely and depend on the domain, machine configuration, etc.
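For scale, here is a rough back-of-the-envelope sketch in Python of what these rates would mean for the whole corpus. It assumes each rate holds constant for the entire run, which is a simplification, and the function and constant names are just illustrative.

```python
# Rough estimate of total indexing time at a constant rate.
# Assumption: throughput stays flat for the whole run (it probably would not).

CORPUS_DOCS = 6_000_000          # approximate corpus size from this post
OBSERVED_DOCS_PER_MIN = 2500     # what I am seeing today


def hours_to_index(corpus_docs, docs_per_second):
    """Return the hours needed to index corpus_docs at a constant rate."""
    return corpus_docs / docs_per_second / 3600


if __name__ == "__main__":
    observed_rate = OBSERVED_DOCS_PER_MIN / 60  # docs per second
    # Compare my current rate with the two figures quoted from the talk.
    for label, rate in [
        ("observed", observed_rate),
        ("250 docs/sec", 250),
        ("25000 docs/sec", 25000),
    ]:
        hours = hours_to_index(CORPUS_DOCS, rate)
        print(f"{label}: {rate:.1f} docs/sec, about {hours:.2f} hours")
```

At the current rate this works out to roughly 40 hours for a full index, versus under 7 hours at 250 docs/sec.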
I am running on Windows Server 2003 with 4 GB RAM and a dual-core Xeon.
Any tips on what I can do to speed this up?
Thanks,
Bill