Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Erick Erickson
You can mitigate the impact of throwing away caches on soft commits by
doing appropriate autowarming, via both the newSearcher listener and the
cache settings in solrconfig.xml.
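
Something along these lines works for the newSearcher listener; the queries
below are only placeholders (the field names are made up), so substitute
one or two queries that are representative of your real traffic:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- hypothetical warming queries; use fields/facets your app actually hits -->
      <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>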

Be aware that you don't want to go overboard here; I'd start with
autowarm counts of 20 or so for queryResultCache and filterCache.
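
As a rough sketch (the sizes here are illustrative, not recommendations),
that might look like:

  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="20"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="20"/>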

And if you ever get warnings about "too many on deck searchers", your
commit intervals are too short or your autowarming is taking too long. Do
not try to fix this error by bumping maxWarmingSearchers in
solrconfig.xml.
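
For reference, the cadence you described (hard commit every 10 minutes
without opening a searcher, soft commit every minute) corresponds to
something like this, with maxWarmingSearchers left at its default:

  <!-- inside <updateHandler> -->
  <autoCommit>
    <maxTime>600000</maxTime>          <!-- 10 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>           <!-- 1 minute -->
  </autoSoftCommit>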

Best,
Erick

On Wed, Apr 6, 2016 at 3:49 AM, Alessandro Benedetti
 wrote:
> On Wed, Apr 6, 2016 at 7:53 AM, Robert Brown  wrote:
>
>> The QTime's are from the updates.
>>
>> We don't have the resource right now to switch to SolrJ, but I would
>> assume only sending updates to the leaders would take some redirects out of
>> the process,
>
> How do you route your documents now?
> Aren't you using Solr routing?
>
>
>> I can regularly query for the collection status to know who's who.
>>
>> I'm now more interested in the caches that are thrown away on softCommit,
>> since we do see some performance issues on queries too. Would these caches
>> affect querying and faceting?
>>
>
> You should check your cache stats and performance.
> The filter cache can be heavily involved in querying and faceting.
> The query result cache, as the name says, would affect fetching of query
> results as well.
> The document cache will impact fetching what you display for the documents.
> Much more could be discussed about caching; a good start would be to verify
> how your caches are currently configured and how they are performing.
>
> Cheers
>
>
>>
>> Thanks,
>> Rob
>>
>>
>>
>>
>> On 06/04/16 00:41, Erick Erickson wrote:
>>
>>> bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to
>>> 5,000
>>>
>>> QTimes for what? The update? Queries? If for queries, autowarming may
>>> help,
>>> especially as your soft commit is throwing away all the top-level
>>> caches (i.e. the
>>> ones configured in solrconfig.xml) every minute. It shouldn't be that bad
>>> on the
>>> lower-level Lucene caches though, at least the per-segment ones.
>>>
>>> You'll get some improvement by using SolrJ (with CloudSolrClient)
>>> rather than cURL: no matter which node you hit, about half your documents
>>> will have to be forwarded to the other shard when using cURL, whereas
>>> SolrJ (with CloudSolrClient) will route the docs to the correct leader
>>> right from the client.
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Apr 5, 2016 at 2:53 PM, John Bickerstaff
>>>  wrote:
>>>
 A few thoughts...

  From a black-box testing perspective, you might try changing that
 softCommit time frame  to something longer and see if it makes a
 difference.

 The size of  your documents will make a difference too - so the
 comparison
 to 300 - 500 on other cloud setups may or may not be comparing apples to
 oranges...

 Are the "new" documents actually new or are you overwriting existing solr
 doc ID's?  If you are overwriting, you may want to optimize and see if
 that
 helps.



 On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown 
 wrote:

 Hi,
>
> I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
> files.
>
> My setup consists of 2 shards, 1 replica each, 50m docs in total.
>
> These updates are hitting a node at random, from a server across the
> Internet.
>
> Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.
>
> This strikes me as quite high since I also sometimes see times of around
> 300-500, on similar cloud setups.
>
> The setup is running on VMs with rotary disks, and enough RAM to hold
> roughly half the entire index in disk cache (I'm in the process of
> upgrading this).
>
> I hard commit every 10 minutes but don't open a new searcher, just to
> make
> sure data is "safe".  I softCommit every 1 minute to make data
> available.
>
> Are there any obvious things I can do to improve my situation?
>
> Thanks,
> Rob
>
>
>
>
>
>
>>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Alessandro Benedetti
On Wed, Apr 6, 2016 at 7:53 AM, Robert Brown  wrote:

> The QTime's are from the updates.
>
> We don't have the resource right now to switch to SolrJ, but I would
> assume only sending updates to the leaders would take some redirects out of
> the process,

How do you route your documents now?
Aren't you using Solr routing?


> I can regularly query for the collection status to know who's who.
>
> I'm now more interested in the caches that are thrown away on softCommit,
> since we do see some performance issues on queries too. Would these caches
> affect querying and faceting?
>

You should check your cache stats and performance.
The filter cache can be heavily involved in querying and faceting.
The query result cache, as the name says, would affect fetching of query
results as well.
The document cache will impact fetching what you display for the documents.
Much more could be discussed about caching; a good start would be to verify
how your caches are currently configured and how they are performing.
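
As an illustrative sketch only (the numbers are arbitrary), the document
cache sits alongside the others in solrconfig.xml but cannot be usefully
autowarmed, because it is keyed on internal Lucene doc ids that change from
searcher to searcher:

  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>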

Cheers


>
> Thanks,
> Rob
>
>
>
>
> On 06/04/16 00:41, Erick Erickson wrote:
>
>> bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to
>> 5,000
>>
>> QTimes for what? The update? Queries? If for queries, autowarming may
>> help,
>> especially as your soft commit is throwing away all the top-level
>> caches (i.e. the
>> ones configured in solrconfig.xml) every minute. It shouldn't be that bad
>> on the
>> lower-level Lucene caches though, at least the per-segment ones.
>>
>> You'll get some improvement by using SolrJ (with CloudSolrClient)
>> rather than cURL: no matter which node you hit, about half your documents
>> will have to be forwarded to the other shard when using cURL, whereas
>> SolrJ (with CloudSolrClient) will route the docs to the correct leader
>> right from the client.
>>
>> Best,
>> Erick
>>
>> On Tue, Apr 5, 2016 at 2:53 PM, John Bickerstaff
>>  wrote:
>>
>>> A few thoughts...
>>>
>>>  From a black-box testing perspective, you might try changing that
>>> softCommit time frame  to something longer and see if it makes a
>>> difference.
>>>
>>> The size of  your documents will make a difference too - so the
>>> comparison
>>> to 300 - 500 on other cloud setups may or may not be comparing apples to
>>> oranges...
>>>
>>> Are the "new" documents actually new or are you overwriting existing solr
>>> doc ID's?  If you are overwriting, you may want to optimize and see if
>>> that
>>> helps.
>>>
>>>
>>>
>>> On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown 
>>> wrote:
>>>
>>> Hi,

 I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
 files.

 My setup consists of 2 shards, 1 replica each, 50m docs in total.

 These updates are hitting a node at random, from a server across the
 Internet.

 Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.

 This strikes me as quite high since I also sometimes see times of around
 300-500, on similar cloud setups.

 The setup is running on VMs with rotary disks, and enough RAM to hold
 roughly half the entire index in disk cache (I'm in the process of
 upgrading this).

 I hard commit every 10 minutes but don't open a new searcher, just to
 make
 sure data is "safe".  I softCommit every 1 minute to make data
 available.

 Are there any obvious things I can do to improve my situation?

 Thanks,
 Rob






>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Update Speed: QTime 1,000 - 5,000

2016-04-06 Thread Robert Brown

The QTime's are from the updates.

We don't have the resources right now to switch to SolrJ, but I would
assume only sending updates to the leaders would take some redirects out
of the process; I can regularly query for the collection status to know
who's who.


I'm now more interested in the caches that are thrown away on 
softCommit, since we do see some performance issues on queries too. 
Would these caches affect querying and faceting?


Thanks,
Rob



On 06/04/16 00:41, Erick Erickson wrote:

bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000

QTimes for what? The update? Queries? If for queries, autowarming may help,
especially as your soft commit is throwing away all the top-level
caches (i.e. the
ones configured in solrconfig.xml) every minute. It shouldn't be that bad on the
lower-level Lucene caches though, at least the per-segment ones.

You'll get some improvement by using SolrJ (with CloudSolrClient)
rather than cURL: no matter which node you hit, about half your documents
will have to be forwarded to the other shard when using cURL, whereas
SolrJ (with CloudSolrClient) will route the docs to the correct leader
right from the client.

Best,
Erick

On Tue, Apr 5, 2016 at 2:53 PM, John Bickerstaff
 wrote:

A few thoughts...

 From a black-box testing perspective, you might try changing that
softCommit time frame  to something longer and see if it makes a difference.

The size of  your documents will make a difference too - so the comparison
to 300 - 500 on other cloud setups may or may not be comparing apples to
oranges...

Are the "new" documents actually new or are you overwriting existing solr
doc ID's?  If you are overwriting, you may want to optimize and see if that
helps.



On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown  wrote:


Hi,

I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
files.

My setup consists of 2 shards, 1 replica each, 50m docs in total.

These updates are hitting a node at random, from a server across the
Internet.

Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.

This strikes me as quite high since I also sometimes see times of around
300-500, on similar cloud setups.

The setup is running on VMs with rotary disks, and enough RAM to hold
roughly half the entire index in disk cache (I'm in the process of
upgrading this).

I hard commit every 10 minutes but don't open a new searcher, just to make
sure data is "safe".  I softCommit every 1 minute to make data available.

Are there any obvious things I can do to improve my situation?

Thanks,
Rob









Re: Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread Erick Erickson
bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000

QTimes for what? The update? Queries? If for queries, autowarming may help,
especially as your soft commit is throwing away all the top-level
caches (i.e. the
ones configured in solrconfig.xml) every minute. It shouldn't be that bad on the
lower-level Lucene caches though, at least the per-segment ones.

You'll get some improvement by using SolrJ (with CloudSolrClient)
rather than cURL: no matter which node you hit, about half your documents
will have to be forwarded to the other shard when using cURL, whereas
SolrJ (with CloudSolrClient) will route the docs to the correct leader
right from the client.

Best,
Erick

On Tue, Apr 5, 2016 at 2:53 PM, John Bickerstaff
 wrote:
> A few thoughts...
>
> From a black-box testing perspective, you might try changing that
> softCommit time frame  to something longer and see if it makes a difference.
>
> The size of  your documents will make a difference too - so the comparison
> to 300 - 500 on other cloud setups may or may not be comparing apples to
> oranges...
>
> Are the "new" documents actually new or are you overwriting existing solr
> doc ID's?  If you are overwriting, you may want to optimize and see if that
> helps.
>
>
>
> On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown  wrote:
>
>> Hi,
>>
>> I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
>> files.
>>
>> My setup consists of 2 shards, 1 replica each, 50m docs in total.
>>
>> These updates are hitting a node at random, from a server across the
>> Internet.
>>
>> Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.
>>
>> This strikes me as quite high since I also sometimes see times of around
>> 300-500, on similar cloud setups.
>>
>> The setup is running on VMs with rotary disks, and enough RAM to hold
>> roughly half the entire index in disk cache (I'm in the process of
>> upgrading this).
>>
>> I hard commit every 10 minutes but don't open a new searcher, just to make
>> sure data is "safe".  I softCommit every 1 minute to make data available.
>>
>> Are there any obvious things I can do to improve my situation?
>>
>> Thanks,
>> Rob
>>
>>
>>
>>
>>


Re: Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread John Bickerstaff
A few thoughts...

From a black-box testing perspective, you might try changing that
softCommit time frame to something longer and see if it makes a difference.

The size of your documents will make a difference too - so the comparison
to 300 - 500 on other cloud setups may or may not be comparing apples to
oranges...

Are the "new" documents actually new or are you overwriting existing solr
doc ID's?  If you are overwriting, you may want to optimize and see if that
helps.



On Tue, Apr 5, 2016 at 2:38 PM, Robert Brown  wrote:

> Hi,
>
> I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
> files.
>
> My setup consists of 2 shards, 1 replica each, 50m docs in total.
>
> These updates are hitting a node at random, from a server across the
> Internet.
>
> Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.
>
> This strikes me as quite high since I also sometimes see times of around
> 300-500, on similar cloud setups.
>
> The setup is running on VMs with rotary disks, and enough RAM to hold
> roughly half the entire index in disk cache (I'm in the process of
> upgrading this).
>
> I hard commit every 10 minutes but don't open a new searcher, just to make
> sure data is "safe".  I softCommit every 1 minute to make data available.
>
> Are there any obvious things I can do to improve my situation?
>
> Thanks,
> Rob
>
>
>
>
>


Update Speed: QTime 1,000 - 5,000

2016-04-05 Thread Robert Brown

Hi,

I'm currently posting updates via cURL, in batches of 1,000 docs in JSON 
files.


My setup consists of 2 shards, 1 replica each, 50m docs in total.

These updates are hitting a node at random, from a server across the 
Internet.


Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000.

This strikes me as quite high since I also sometimes see times of around 
300-500, on similar cloud setups.


The setup is running on VMs with rotary disks, and enough RAM to hold 
roughly half the entire index in disk cache (I'm in the process of 
upgrading this).


I hard commit every 10 minutes but don't open a new searcher, just to 
make sure data is "safe".  I softCommit every 1 minute to make data 
available.


Are there any obvious things I can do to improve my situation?

Thanks,
Rob