Re: Slow highlighting on Solr 5.0.0
I've been looking into this again. The phrase highlighter is much slower than the default highlighter, so you might be able to add hl.usePhraseHighlighter=false to your query to make it faster. Note that the admin web interface will NOT help here, because that parameter is true by default and the checkbox is basically broken in that respect. The default highlighter doesn't seem to work in all the cases the phrase highlighter does, though.

Also, the current development branch of 5.x is much better than 5.1, but not as good as 4.10. This ticket seems to be hitting on some of the issues at hand:

https://issues.apache.org/jira/browse/SOLR-5855

I think this means they are getting there, but the performance is still much worse than 4.10, and it's not obvious why.

On 5/5/15, 2:06 AM, Ere Maijala ere.maij...@helsinki.fi wrote:

> I'm seeing the same with Solr 5.1.0 after upgrading from 4.10.2. Here are my timings:
>
>   4.10.2: process: 1432.0, highlight: 723.0
>   5.1.0:  process: 9570.0, highlight: 8790.0
>
> schema.xml and solrconfig.xml are available at
> https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf
> A couple of jstack outputs taken while the query was executing are available at
> http://pastebin.com/eJrEy2Wb
>
> Any suggestions would be appreciated. Or would it make sense to just file a JIRA issue?
>
> --Ere
>
> On 3.3.2015, 0.48, Matt Hilt wrote:
>
>> Short form: While testing Solr 5.0.0 within our staging environment, I noticed that highlight-enabled queries are much slower than they were with 4.10. Are there any obvious reasons why this might be the case? As far as I can tell, nothing has changed with the default highlight search component or its parameters.
>>
>> A little more detail: The bulk of the collection config set was stolen from the basic 4.x example config set. I changed my schema.xml and solrconfig.xml just enough to get 5.0 to create a new collection (removed non-trie fields, some other deprecated response handler definitions, etc.). I can provide my version of the solr.HighlightComponent config, but it is identical to the sample_techproducts_configs example in 5.0. Are there any other config files I could provide that might be useful?
>>
>> Numbers on "much slower": I indexed a very small subset of my data into the new collection and used the /select interface to do a simple debug query. Solr 4.10 gives the following pertinent info:
>>
>>   response: { numFound: 72628, ... },
>>   debug: {
>>     timing: {
>>       time: 95,
>>       process: {
>>         time: 94,
>>         query: { time: 6 },
>>         highlight: { time: 84 },
>>         debug: { time: 4 }
>>       }
>>     }
>>   }
>>
>> Whereas Solr 5.0 gives:
>>
>>   response: { numFound: 1093, ... },
>>   debug: {
>>     timing: {
>>       time: 6551,
>>       process: {
>>         time: 6549,
>>         query: { time: 0 },
>>         highlight: { time: 6524 },
>>         debug: { time: 25 }
>>       }
>>     }
>>   }
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
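For anyone following along, here is a minimal SolrJ sketch of the workaround above: passing hl.usePhraseHighlighter=false explicitly on the request rather than relying on the admin UI checkbox. This is only a sketch, assuming the 5.x-era SolrJ client (HttpSolrClient; it was HttpSolrServer in 4.x); the base URL, collection, query, and highlight field are placeholders, not anything from this thread.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class NoPhraseHighlight {
        public static void main(String[] args) throws Exception {
            // Placeholder base URL and collection name.
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/biblio");
            SolrQuery q = new SolrQuery("title:\"some phrase\"");
            q.setHighlight(true);
            q.addHighlightField("title");             // placeholder field
            q.set("hl.usePhraseHighlighter", false);  // the workaround discussed above
            QueryResponse rsp = client.query(q);
            System.out.println(rsp.getHighlighting());
            client.close();
        }
    }

Setting the parameter per-request like this also makes it easy to A/B the two highlighters against the same index when measuring.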
Slow highlighting on Solr 5.0.0
Short form: While testing Solr 5.0.0 within our staging environment, I noticed that highlight-enabled queries are much slower than they were with 4.10. Are there any obvious reasons why this might be the case? As far as I can tell, nothing has changed with the default highlight search component or its parameters.

A little more detail: The bulk of the collection config set was stolen from the basic 4.x example config set. I changed my schema.xml and solrconfig.xml just enough to get 5.0 to create a new collection (removed non-trie fields, some other deprecated response handler definitions, etc.). I can provide my version of the solr.HighlightComponent config, but it is identical to the sample_techproducts_configs example in 5.0. Are there any other config files I could provide that might be useful?

Numbers on "much slower": I indexed a very small subset of my data into the new collection and used the /select interface to do a simple debug query. Solr 4.10 gives the following pertinent info:

  response: { numFound: 72628, ... },
  debug: {
    timing: {
      time: 95,
      process: {
        time: 94,
        query: { time: 6 },
        highlight: { time: 84 },
        debug: { time: 4 }
      }
    }
  }

Whereas Solr 5.0 gives:

  response: { numFound: 1093, ... },
  debug: {
    timing: {
      time: 6551,
      process: {
        time: 6549,
        query: { time: 0 },
        highlight: { time: 6524 },
        debug: { time: 25 }
      }
    }
  }
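For reference, a hedged sketch of the kind of debug query that produces the timing sections above, via SolrJ (debug=timing is the standard parameter that yields the per-component timing block; the collection, query string, and client constructor are placeholder assumptions, not taken from this thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class HighlightTiming {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/test");
            SolrQuery q = new SolrQuery("text:example");
            q.setHighlight(true);
            q.set("debug", "timing");  // ask Solr to report per-component timings
            QueryResponse rsp = client.query(q);
            // The "timing" entry mirrors the debug/timing JSON shown above.
            System.out.println(rsp.getDebugMap().get("timing"));
            client.close();
        }
    }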
Re: Slow forwarding requests to collection leader
Thanks for the info, Daniel. I will go forth and make a better client.

On Oct 29, 2014, at 2:28 AM, Daniel Collins danwcoll...@gmail.com wrote:

> I kind of think this might be working as designed, but I'll be happy to be corrected by others :) We had a similar issue which we discovered by accident: we had 2 or 3 collections spread across some machines, and we accidentally sent an indexing request to a node in the cloud that didn't have a replica of collection1 (but did have other collections). We saw an instant jump in indexing latency to 5s, which, given that the previous latencies had been ~20ms, was rather obvious!
>
> Querying seems to be fine with this kind of forwarding approach, but indexing logically requires ZK information (to find the right shard for the destination collection and the leader of that shard). So I'm wondering if a node in the cloud that has a replica of collection1 has that information cached, whereas a node in the (same) cloud that only has a collection2 replica only has collection2 information cached and has to go to ZK for every forwarded request. I haven't checked the code recently, but that seems plausible to me. Would you really want all your collection2 nodes to be running ZK watches for all collection1 updates as well as their own collection2 watches? That would clog them up processing updates that, in all honesty, they shouldn't have to deal with. Every node in the cloud would have to have a watch on everything else, which, if you have a lot of independent collections, would be an unnecessary burden on each of them.
>
> If you use SolrJ as a client, it will route to a correct node in the cloud (which is what we ended up using, through JNI, which was interesting), but if you are using HTTP to index, that's something your application has to take care of.
>
> On 28 October 2014 19:29, Matt Hilt matt.h...@numerica.us wrote:
>
>> I have three equal machines, each running Solr Cloud (4.8). I have multiple collections that are replicated but not sharded. I also have document generation processes running on these nodes, which involves querying the collection ~5 times per document generated.
>>
>> Node 1 has a replica of collection A and is running document generation code that pushes to the HTTP /update/json handler. Node 2 is the leader of collection A. Node 3 does not have a replica of collection A, but is running document generation code for collection A.
>>
>> The issue I see is that node 1 can push documents into Solr 3-5 times faster than node 3 when they both talk to the Solr instance on their localhost. If either of them talks directly to the Solr instance on node 2, the performance is excellent (on par with node 1). To me it seems that the only difference between these cases is the query/put request forwarding. Does this involve some slow ZooKeeper communication that should be avoided? Any other insights?
>>
>> Thanks
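To make the "better client" concrete, here is a minimal sketch of ZooKeeper-aware indexing with SolrJ, which routes each document to the correct shard leader instead of relying on node-to-node forwarding. This assumes the 4.x SolrJ class CloudSolrServer (later renamed CloudSolrClient); the ZK ensemble address, collection name, and fields are placeholders.

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class LeaderAwareIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble address.
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("collectionA");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title_t", "hello");  // placeholder field
            server.add(doc);  // SolrJ consults cluster state and sends this to the right leader
            server.commit();
            server.shutdown();
        }
    }

Because the client watches cluster state itself, it keeps working when leaders move, which a hard-coded leader URL does not.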
Re: Ideas for debugging poor SolrCloud scalability
If you are issuing writes to shard non-leaders, then there is a large overhead for the eventual redirect to the leader. I saw a 3-5 times performance increase by making my write client leader-aware.

On Oct 30, 2014, at 2:56 PM, Ian Rose ianr...@fullstory.com wrote:

>> If you want to increase QPS, you should not be increasing numShards. You need to increase replicationFactor. When your numShards matches the number of servers, every single server will be doing part of the work for every query.
>
> I think this is true only for actual queries, right? I am not issuing any queries, only writes (document inserts). In the case of writes, increasing the number of shards should increase my throughput (in ops/sec) more or less linearly, right?
>
> On Thu, Oct 30, 2014 at 4:50 PM, Shawn Heisey apa...@elyograg.org wrote:
>
>> On 10/30/2014 2:23 PM, Ian Rose wrote:
>>
>>> My methodology is as follows.
>>> 1. Start up K Solr servers.
>>> 2. Remove all existing collections.
>>> 3. Create N collections, with numShards=K for each.
>>> 4. Start load testing. Every minute, print the number of successful updates and the number of failed updates.
>>> 5. Keep increasing the offered load (via simulated users) until the QPS flatlines.
>>
>> If you want to increase QPS, you should not be increasing numShards. You need to increase replicationFactor. When your numShards matches the number of servers, every single server will be doing part of the work for every query. If you increase replicationFactor instead, then each server can be handling a different query in parallel.
>>
>> Sharding the index is what you need to do when you need to scale the size of the index, so each server does not get overwhelmed by dealing with every document for every query. Getting a high QPS with a big index requires increasing both numShards *AND* replicationFactor.
>>
>> Thanks,
>> Shawn
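As a concrete illustration of Shawn's point, a hedged sketch that creates a QPS-oriented collection through the standard Collections API: one shard, so the whole index lives on each server, and one replica per server, so different servers can answer different queries in parallel. The host and collection name are placeholders, and depending on your setup you may also need to pass collection.configName.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class CreateQpsCollection {
        public static void main(String[] args) throws Exception {
            // K = 3 servers: numShards=1 keeps the index whole on each node,
            // replicationFactor=3 puts a full copy on every server.
            String url = "http://localhost:8983/solr/admin/collections"
                    + "?action=CREATE&name=qps_test&numShards=1&replicationFactor=3";
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            System.out.println("HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }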
Slow forwarding requests to collection leader
I have three equal machines, each running Solr Cloud (4.8). I have multiple collections that are replicated but not sharded. I also have document generation processes running on these nodes, which involves querying the collection ~5 times per document generated.

Node 1 has a replica of collection A and is running document generation code that pushes to the HTTP /update/json handler. Node 2 is the leader of collection A. Node 3 does not have a replica of collection A, but is running document generation code for collection A.

The issue I see is that node 1 can push documents into Solr 3-5 times faster than node 3 when they both talk to the Solr instance on their localhost. If either of them talks directly to the Solr instance on node 2, the performance is excellent (on par with node 1). To me it seems that the only difference between these cases is the query/put request forwarding. Does this involve some slow ZooKeeper communication that should be avoided? Any other insights?

Thanks
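For comparison with the forwarding case, here is a hedged sketch of what "talk directly to the Solr instance on node 2" looks like on the wire: POSTing a JSON document straight to the leader's /update/json handler. The host, collection, and field names are placeholders, not details from this thread.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class DirectLeaderUpdate {
        public static void main(String[] args) throws Exception {
            // node2 is assumed to be the current leader of collectionA.
            String url = "http://node2:8983/solr/collectionA/update/json?commit=true";
            byte[] body = "[{\"id\":\"doc-1\",\"title_t\":\"hello\"}]"
                    .getBytes(StandardCharsets.UTF_8);
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);  // JSON array of documents for /update/json
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }

Note the trade-off discussed above: hard-coding the leader's address avoids the forwarding hop but breaks if the leader moves, which is why a cluster-aware SolrJ client is the more robust fix.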