The QTimes are from the updates.

We don't have the resources right now to switch to SolrJ, but I would assume sending updates only to the leaders would take some redirects out of the process; I can regularly query the collection status to know who's who.

I'm now more interested in the caches that are thrown away on softCommit, since we also see some performance issues on queries. Would these caches affect querying and faceting?

Thanks,
Rob



On 06/04/16 00:41, Erick Erickson wrote:
bq: Apart from the obvious delay, I'm also seeing QTimes of 1,000 to 5,000....

QTimes for what? The update? Queries? If for queries, autowarming may help,
especially as your soft commit is throwing away all the top-level caches
(i.e. the ones configured in solrconfig.xml) every minute. It shouldn't be
that bad on the lower-level Lucene caches though, at least the per-segment ones.
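For reference, the top-level caches mentioned here are the ones declared in solrconfig.xml. A sketch of what autowarming them might look like (sizes and autowarmCount values below are illustrative assumptions, not recommendations):

```xml
<!-- In solrconfig.xml: autowarmCount re-populates a cache from the old
     searcher's entries when a new searcher opens (e.g. on soft commit). -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<!-- The documentCache cannot be autowarmed: its entries are keyed by
     internal Lucene doc ids, which change when a new searcher opens. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

Note that higher autowarmCount values make new searchers slower to open, which matters with a one-minute soft commit interval.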

You'll get some improvement by using SolrJ (with CloudSolrClient) rather
than cURL: no matter which node you hit, about half your documents will
have to be forwarded to the other shard when using cURL, whereas
CloudSolrClient will route the docs to the correct leader right from the client.
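The leader-aware routing described here could be sketched roughly as below (a non-runnable sketch: it assumes the SolrJ library on the classpath and a live SolrCloud cluster; the ZooKeeper address and collection name are placeholders, and the exact constructor varies by SolrJ version):

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class LeaderAwareUpdate {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble address is a placeholder -- use your own.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr")) {
            client.setDefaultCollection("mycollection"); // placeholder name

            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                batch.add(doc);
            }
            // CloudSolrClient reads the cluster state from ZooKeeper, hashes
            // each doc id, and sends it straight to the correct shard leader,
            // avoiding the node-to-node forwarding a random cURL POST incurs.
            client.add(batch);
            // No explicit commit: rely on autoCommit/autoSoftCommit instead.
        }
    }
}
```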

Best,
Erick

On Tue, Apr 5, 2016 at 2:53 PM, John Bickerstaff
<j...@johnbickerstaff.com> wrote:
A few thoughts...

From a black-box testing perspective, you might try changing that
softCommit time frame to something longer and see if it makes a difference.

The size of your documents will make a difference too, so the comparison
to 300-500 on other cloud setups may or may not be comparing apples to
oranges...

Are the "new" documents actually new or are you overwriting existing solr
doc ID's?  If you are overwriting, you may want to optimize and see if that
helps.



On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown <r...@intelcompute.com> wrote:

Hi,

I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
files.

My setup consists of 2 shards, 1 replica each, 50m docs in total.

These updates are hitting a node at random, from a server across the
Internet.

Apart from the obvious delay, I'm also seeing QTimes of 1,000 to 5,000.

This strikes me as quite high, since I sometimes see times of around
300-500 on similar cloud setups.

The setup is running on VMs with rotary disks, and enough RAM to hold
roughly half the entire index in disk cache (I'm in the process of
upgrading this).

I hard commit every 10 minutes but don't open a new searcher, just to make
sure data is "safe".  I softCommit every 1 minute to make data available.
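That commit policy would correspond to something like the following in solrconfig.xml (a sketch of the settings as described above, not a tuning recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 10 minutes: flushes to disk so data is "safe",
       but no new searcher is opened. -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every minute: opens a new searcher so updates become
       visible, which also invalidates the top-level caches each time. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```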

Are there any obvious things I can do to improve my situation?

Thanks,
Rob
