Re: Update Speed: QTime 1,000 - 5,000
You can mitigate the impact of throwing away caches on soft commits by doing appropriate autowarming, via both the newSearcher listener and the cache settings in solrconfig.xml. Be aware that you don't want to go overboard here; I'd start with 20 or so as the autowarm counts for queryResultCache and filterCache. And if you ever get warnings about "too many on deck searchers", your commit intervals are too short or your autowarming takes too long. Do not try to fix that error by bumping maxWarmingSearchers in solrconfig.xml.

Best,
Erick
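For reference, the autowarm counts and newSearcher listener Erick mentions live in solrconfig.xml. A minimal sketch under assumptions: the cache sizes and the warming query below (faceting on a `category` field) are illustrative placeholders, not taken from this thread; real warming queries should mirror your actual common queries.

```xml
<!-- Sketch only: autowarmCount="20" follows Erick's suggested starting
     point; sizes and the warming query are hypothetical examples. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="20"/>
<queryResultCache class="solr.LRUCache"
             size="512" initialSize="512" autowarmCount="20"/>

<!-- Fires a static warming query against each new searcher so the
     per-searcher caches are not stone cold after every soft commit. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```

If warming takes longer than the soft-commit interval, searchers pile up, which is exactly the "on deck searchers" warning above.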
Re: Update Speed: QTime 1,000 - 5,000
On Wed, Apr 6, 2016 at 7:53 AM, Robert Brown wrote:

> The QTime's are from the updates.
>
> We don't have the resource right now to switch to SolrJ, but I would
> assume only sending updates to the leaders would take some redirects
> out of the process.

How do you route your documents now? Aren't you using Solr routing?

> I can regularly query for the collection status to know who's who.
>
> I'm now more interested in the caches that are thrown away on softCommit,
> since we do see some performance issues on queries too. Would these
> caches affect querying and faceting?

You should check your caches' stats and performance. The filter cache can be heavily involved in querying and faceting. The query result cache, as the name says, would affect fetching of query results. The document cache will impact fetching what you display for the documents. Much more could be discussed about caching; a good start would be to verify how your caches are currently configured and how they are performing.

Cheers

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
Re: Update Speed: QTime 1,000 - 5,000
The QTime's are from the updates.

We don't have the resource right now to switch to SolrJ, but I would assume only sending updates to the leaders would take some redirects out of the process. I can regularly query for the collection status to know who's who.

I'm now more interested in the caches that are thrown away on softCommit, since we do see some performance issues on queries too. Would these caches affect querying and faceting?

Thanks,
Rob
Re: Update Speed: QTime 1,000 - 5,000
bq: Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000

QTimes for what? The update? Queries? If for queries, autowarming may help, especially as your soft commit is throwing away all the top-level caches (i.e. the ones configured in solrconfig.xml) every minute. It shouldn't be that bad on the lower-level Lucene caches though, at least the per-segment ones.

You'll get some improvement by using SolrJ (with CloudSolrClient) rather than cURL: no matter which node you hit, about half your documents will have to be forwarded to the other shard when using cURL, whereas SolrJ (with CloudSolrClient) will route the docs to the correct leader right from the client.

Best,
Erick
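Erick's SolrJ suggestion looks roughly like the sketch below. The ZooKeeper address, collection name, and document fields are hypothetical placeholders, and it assumes the CloudSolrClient API as it existed around Solr 5.x/6.x; it is an illustration of the idea, not a drop-in replacement for the poster's cURL script.

```java
// Sketch: leader-aware batch indexing with SolrJ's CloudSolrClient.
// "zk1:2181" and "mycollection" are assumed placeholder values.
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LeaderAwareIndexer {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient reads the cluster state from ZooKeeper, so it
        // always knows which node is the leader for each shard.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181")) {
            client.setDefaultCollection("mycollection");

            // Build a batch comparable to the 1,000-doc JSON files above.
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                batch.add(doc);
            }

            // The client hashes each doc's id and sends it directly to the
            // correct shard leader, avoiding the extra forwarding hop that
            // a cURL POST to a random node incurs.
            client.add(batch);
        }
    }
}
```

The win comes from skipping the server-side redirect for roughly half the documents in a two-shard collection; it does not change commit or cache behavior.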
Re: Update Speed: QTime 1,000 - 5,000
A few thoughts...

From a black-box testing perspective, you might try changing that softCommit time frame to something longer and see if it makes a difference.

The size of your documents will make a difference too, so the comparison to 300-500 on other cloud setups may or may not be comparing apples to oranges...

Are the "new" documents actually new, or are you overwriting existing Solr doc IDs? If you are overwriting, you may want to optimize and see if that helps.
Update Speed: QTime 1,000 - 5,000
Hi,

I'm currently posting updates via cURL, in batches of 1,000 docs in JSON files.

My setup consists of 2 shards, 1 replica each, 50m docs in total.

These updates are hitting a node at random, from a server across the Internet.

Apart from the obvious delay, I'm also seeing QTime's of 1,000 to 5,000. This strikes me as quite high, since I also sometimes see times of around 300-500 on similar cloud setups.

The setup is running on VMs with rotary disks, and enough RAM to hold roughly half the entire index in disk cache (I'm in the process of upgrading this).

I hard commit every 10 minutes but don't open a new searcher, just to make sure data is "safe". I softCommit every 1 minute to make data available.

Are there any obvious things I can do to improve my situation?

Thanks,
Rob
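The commit policy described above maps to something like the following in solrconfig.xml; this is a sketch reconstructed from the description (10-minute hard commit without opening a searcher, 1-minute soft commit), not the poster's actual config.

```xml
<!-- Sketch of the commit policy described above. Values assumed from
     the description in the email, converted to milliseconds. -->
<autoCommit>
  <maxTime>600000</maxTime>          <!-- hard commit every 10 minutes -->
  <openSearcher>false</openSearcher> <!-- durability only; no new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- soft commit every 1 minute -->
</autoSoftCommit>
```

Each soft commit opens a new searcher, which is what invalidates the top-level caches the replies discuss.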