First, you shouldn't be using HttpSolrClient; use CloudSolrServer
(CloudSolrClient in 5.x). That takes the ZK address and routes the
docs directly to the leader, reducing the network hops docs have to
go through. AFAIK, in cloud setups it is superior to HTTP in every way.
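As a rough sketch (assuming the SolrJ 5.x API; the ZK hosts and
collection name are placeholders you'd replace with your own):

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // Point at ZooKeeper, not at an individual Solr node; the client
        // reads cluster state from ZK and routes docs to the shard leader.
        try (CloudSolrClient client =
                 new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("mycollection"); // placeholder

            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                batch.add(doc);
            }
            client.add(batch); // one round trip per batch, sent to the leader
            // Rely on autoCommit/commitWithin in solrconfig.xml rather than
            // issuing an explicit commit per batch.
        }
    }
}
```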

I'm guessing your docs aren't huge. You haven't really told us what
"high indexing rates" and "high query rates" are in your environment,
so it's hard to say much. For comparison, I get 2-3K docs/sec on my
laptop (with no query load, though).

The most frequent cause of nodes going into recovery in this scenario
is the ZK timeout being exceeded, which is often triggered by
excessive GC pauses. Some more details would help here:
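For reference, the timeout in question is zkClientTimeout; in 5.x it's
set in solr.xml (or via the ZK_CLIENT_TIMEOUT variable in solr.in.sh).
A fragment of the stock solr.xml, assuming the default layout:

```xml
<solr>
  <solrcloud>
    <!-- If a node can't heartbeat to ZK within this window (e.g. during
         a long GC pause), ZK expires the session and the node goes into
         recovery when it reconnects. -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```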

How much memory are you allocating to Solr? Have you turned on GC
logging to see whether
you're getting "stop the world" GC pauses? What rates _are_ you seeing?
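If GC logging isn't on yet, flags along these lines will show the
pauses (assuming a Java 7/8 JVM; the heap size and log path are
placeholders):

```
-Xms4g -Xmx4g
-Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
```

Look for "Total time for which application threads were stopped"
entries approaching your zkClientTimeout.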

Personally, I'd concentrate on the nodes going into recovery before
anything else. Until that's fixed, nothing else you try will tell
you much.

BTW, FWIW I typically start with batch sizes of 1,000. Sometimes
that's too big, sometimes too small, but it seems pretty reasonable
most of the time.
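The batching itself is just partitioning the doc stream before each
add() call; a small sketch (the Batcher class and partition helper are
my own illustration, not a SolrJ API):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a list into consecutive batches of at most 'size' elements.
    static <T> List<List<T>> partition(List<T> docs, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            batches.add(new ArrayList<>(
                docs.subList(i, Math.min(i + size, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add(i);
        List<List<Integer>> batches = partition(docs, 1000);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 500
    }
}
```

Each resulting batch would then go to a single client.add(batch) call,
so you pay one round trip per 1,000 docs instead of per 100.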

Best,
Erick

On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> Hello,
>
> I have a usecase with the following characteristics:
>
>  - High index update rate (adds/updates)
>  - High query rate
>  - Low index size (~800MB for 2.4Million docs)
>  - The documents that are created at the high rate eventually "expire" and
> are deleted regularly at half hour intervals
>
> I currently have a solr cloud set up with 1 shard and 4 replicas.
>  * My index updates are sent to a VIP/loadbalancer (round robins to one of
> the 4 solr nodes)
>  * I am using http client to send the updates
>  * Using batch size of 100 and 8 to 10 threads sending the batch of updates
> to solr.
>
> When I try to run tests to scale out the indexing rate, I see the following:
>  * solr nodes go into recovery
>  * updates are taking really long to complete.
>
> As I understand, when a node receives an update:
>  * If it is the leader, it forwards the update to all the replicas and
> waits until it receives a reply from all of them before replying to
> the client that sent the request.
>  * If it is not the leader, it forwards the update to the leader, which
> THEN does the above steps mentioned.
>
> How do I go about scaling the index updates:
>  * As I add more replicas, will my updates get slower and slower?
>  * Is there a way I can configure the leader to wait for say N out of M
> replicas only?
>  * Should I be targeting the updates to only the leader?
>  * Any other approach I should be considering?
>
> Thanks
> Vinay
