I think what makes the most sense is to limit the number of
connections to another host.  A host only has so many CPU resources,
and beyond a certain point throughput would start to suffer anyway
(and then only make the problem worse).  It also makes sense in that a
client could generate documents faster than we can index them (either
for a short period of time, or on average) and having flow control to
prevent unlimited buffering (which is essentially what this is) makes
sense.

Nick - when you switched to HttpSolrServer, things worked because this
added an explicit flow control mechanism.
A single request (i.e. an add with one or more documents) is fully
indexed to all endpoints before the response is returned.  Hence if
you have 10 indexing threads and are adding documents in batches of
100, there can be only 1000 documents buffered in the system at any
one time.

-Yonik
http://lucidimagination.com

Reply via email to