Your indexing client, if written in SolrJ, should use CloudSolrServer,
which is, in Matt's terms, "leader aware". It divides the
documents to be indexed into packets in which every doc
belongs on the same shard, and then sends each packet
to that shard's leader. This avoids a lot of re-routing and should
scale essentially linearly. You may have to add more clients,
though, depending on how hard the document generator is
working.
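
To make the "packet per shard" idea concrete, here is a rough sketch in
plain Java. It is not what CloudSolrServer actually runs: the real client
routes on hash ranges read from the cluster state, whereas the shardFor()
function below is a made-up stand-in (String.hashCode modulo the shard
count), and ShardGrouper is a hypothetical class name. It only shows the
grouping step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of SolrJ.
final class ShardGrouper {

    // ASSUMPTION: stand-in router. Solr's real routing uses hash ranges
    // from the cluster state, not hashCode() modulo the shard count.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Groups document ids into one packet per shard, so each packet
    // could be sent straight to that shard's leader.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> packets = new TreeMap<>();
        for (String id : docIds) {
            int shard = shardFor(id, numShards);
            packets.computeIfAbsent(shard, k -> new ArrayList<>()).add(id);
        }
        return packets;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        Map<Integer, List<String>> packets = groupByShard(ids, 2);
        for (Map.Entry<Integer, List<String>> e : packets.entrySet()) {
            // A leader-aware client would send this packet to the shard's leader.
            System.out.println("shard " + e.getKey() + " <- " + e.getValue());
        }
    }
}
```

Because every doc in a packet already belongs to the receiving shard, the
leader has nothing to forward to other shards, which is where the saved
re-routing comes from.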

Also, make sure that you send batches of documents, as Shawn
suggests. I use 1,000 as a starting point.
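
A minimal sketch of the batching, in plain Java so it runs without Solr
on the classpath. BatchSplitter and split() are hypothetical names, not
SolrJ API, and 1,000 is only the starting point mentioned above; tune it
for your document size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SolrJ: splits a list of documents
// into batches of at most batchSize elements.
final class BatchSplitter {

    static <T> List<List<T>> split(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i); // integers stand in for real documents
        }
        for (List<Integer> batch : split(docs, 1000)) {
            // In a real client, one update request per batch would go out here,
            // e.g. by passing a collection of SolrInputDocument to the server's add().
            System.out.println("would send a batch of " + batch.size() + " docs");
        }
    }
}
```

Each batch then costs one update request instead of one request per
document.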

Best,
Erick

On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 10/30/2014 2:56 PM, Ian Rose wrote:
>> I think this is true only for actual queries, right? I am not issuing
>> any queries, only writes (document inserts). In the case of writes,
>> increasing the number of shards should increase my throughput (in
>> ops/sec) more or less linearly, right?
>
> No, that won't affect indexing speed all that much.  The way to increase
> indexing speed is to increase the number of processes or threads that
> are indexing at the same time.  Instead of having one client sending
> update requests, try five of them.  Also, index many documents with each
> update request.  Sending one document at a time is very inefficient.
>
> You didn't say how you're doing commits, but those need to be as
> infrequent as you can manage.  Ideally, you would use autoCommit with
> openSearcher=false on an interval of about five minutes, and send an
> explicit commit (with the default openSearcher=true) after all the
> indexing is done.
>
> You may have requirements regarding document visibility that this won't
> satisfy, but try to avoid doing commits with openSearcher=true (soft
> commits qualify for this) extremely frequently, like once a second.
> Once a minute is much more realistic.  Opening a new searcher is an
> expensive operation, especially if you have cache warming configured.
>
> Thanks,
> Shawn
>
