[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578181#comment-14578181 ] Sachin Goyal commented on SOLR-6832:
https://issues.apache.org/jira/browse/SOLR-7121 is a patch that addresses the other part of this problem. It helps nodes become aware of their own slowness and tell ZK that they should be moved out of the network for a while. Once their health has recovered, the nodes automatically ask ZK to join them back into the cluster. Together, these two patches have made our cluster stable, though we have yet to quantify by how much. (Quantification is not really a priority right now, because we would need to compare the cluster against an unpatched cluster and then put load on both to bring them down, etc.)

Queries be served locally rather than being forwarded to another replica
Key: SOLR-6832
URL: https://issues.apache.org/jira/browse/SOLR-6832
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Sachin Goyal
Assignee: Timothy Potter
Fix For: Trunk, 5.1
Attachments: SOLR-6832.patch, SOLR-6832.patch, SOLR-6832.patch, SOLR-6832.patch

Currently, I see that the code flow for a query in SolrCloud is as follows:
For a distributed query: SolrCore -> SearchHandler.handleRequestBody() -> HttpShardHandler.submit()
For a non-distributed query: SolrCore -> SearchHandler.handleRequestBody() -> QueryComponent.process()

For a distributed query, the request is always sent to all the shards, even if the originating SolrCore (the one handling the original distributed query) is itself a replica of one of those shards. If the originating SolrCore checked itself before sending HTTP requests to each shard, we could probably save some network hops and gain some performance.

We can change SearchHandler.handleRequestBody() or HttpShardHandler.submit() to fix this behavior (most likely the former, not the latter).
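The "check itself first" idea can be sketched in plain Java, without touching Solr's real classes. For each shard, any replica hosted on the node that received the request is moved to the front of the candidate list, so that shard can be answered without a network hop. The remote replicas stay behind it as fallbacks. This is a hypothetical illustration, not the committed patch. The method name `orderReplicas`, the `<baseUrl>/<coreName>` URL layout, and the sample hosts are all assumptions. The thread later settles on ZkController.getBaseUrl() as the source of the local base URL.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: put replicas hosted on the local node first so a shard
 * request can be served without a network hop. Not Solr's API.
 */
class PreferLocalSketch {

    /**
     * Returns a copy of replicaUrls with local replicas first. Relative order is
     * otherwise preserved. Assumes replica URLs look like "<baseUrl>/<coreName>".
     */
    static List<String> orderReplicas(List<String> replicaUrls, String localBaseUrl) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (String url : replicaUrls) {
            // Match on "<baseUrl>/" so that e.g. ".../solr2/..." is not mistaken for ".../solr".
            if (url.startsWith(localBaseUrl + "/")) {
                local.add(url);
            } else {
                remote.add(url);
            }
        }
        local.addAll(remote); // remote replicas remain available as fallbacks
        return local;
    }

    public static void main(String[] args) {
        String localBaseUrl = "http://node1:8983/solr";
        List<String> shard1 = List.of(
                "http://node2:8983/solr/collection1_shard1_replica2",
                "http://node1:8983/solr/collection1_shard1_replica1");
        System.out.println(orderReplicas(shard1, localBaseUrl));
    }
}
```

Reordering keeps every replica in the list rather than filtering down to the local one. The request can therefore still fail over to another node if the local core cannot serve it.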
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337776#comment-14337776 ] Otis Gospodnetic commented on SOLR-6832:
bq. The performance gain increases if coresPerMachine is 1 and a single JVM has cores from 'k' shards.
Have you ever managed to measure how much this feature helps in various scenarios?
bq. For a distributed query, the request is always sent to all the shards even if the originating SolrCore (handling the original distributed query) is a replica of one of the shards. If the original Solr-Core can check itself before sending http requests for any shard, we can probably save some network hopping and gain some performance.
This sounds like it saves only N local calls out of M, where M > N, N is the number of local replicas that could be queried locally, and M is the total number of primary shards in the cluster that are to be queried. Is this correct? So say there are 20 shards spread evenly over 20 nodes (i.e., 1 shard per node) and a query request comes in: the node that got the request will send 19 requests to the remaining 19 nodes and thus save just one network trip by querying a local shard? I must be missing something...
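The M-versus-N arithmetic in the question above can be made concrete with a small Java sketch. It assumes S shards spread evenly over K nodes, with one replica per shard and S divisible by K. Under that assumption, the node receiving a query can answer N = S/K shard requests itself and still sends M - N = S - S/K over the network. The class and method names are hypothetical. The 4-node case is an illustrative over-sharded layout, not a measurement from this thread.

```java
/** Hypothetical worked example of the M-vs-N arithmetic; assumes an even spread, one replica per shard. */
class LocalShardMath {

    /** N: shard requests the receiving node can answer itself (shards per node). */
    static int localRequests(int totalShards, int nodes) {
        if (nodes <= 0 || totalShards % nodes != 0) {
            throw new IllegalArgumentException("sketch assumes shards are spread evenly over nodes");
        }
        return totalShards / nodes;
    }

    /** M - N: shard requests that still go over the network. */
    static int remoteRequests(int totalShards, int nodes) {
        return totalShards - localRequests(totalShards, nodes);
    }

    public static void main(String[] args) {
        // The example from the question: 20 shards on 20 nodes.
        System.out.println(localRequests(20, 20) + " local, " + remoteRequests(20, 20) + " remote"); // 1 local, 19 remote
        // An over-sharded layout: the same 20 shards on 4 nodes.
        System.out.println(localRequests(20, 4) + " local, " + remoteRequests(20, 4) + " remote"); // 5 local, 15 remote
    }
}
```

With one shard per node, only one hop out of twenty is saved. The savings grow only as each node hosts more shards of the same collection, which is why the benefit depends on over-sharding.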
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337800#comment-14337800 ] Ayon Sinha commented on SOLR-6832: @Otis, you are correct. This helps only where there is over-sharding, as in our particular scenario, where we sharded to get better CPU core utilization and write speeds based on Tim's experiments with over-sharding. Since all queries were sent to other nodes, we were hit by distributed deadlocks more often when one or more nodes were slow or overloaded. So this patch is a slight optimization, and it reduces the likelihood of getting bogged down by other slow nodes when the node handling the parent query has the core locally.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337866#comment-14337866 ] Ayon Sinha commented on SOLR-6832: Actually, in our experience, the network has been the flakiest piece, so any network hop saved is a big deal. And again, you are right that the root cause (the first domino) of the distributed deadlock is yet to be identified. What we see is that when one machine in the cluster goes into a GC pause or gets a traffic spike, it quickly brings down all the other machines. The slow machine currently does not tell ZK that it is struggling, and hence all other nodes keep sending it queries. This is being addressed in another JIRA. This particular patch buys us some time.
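The missing piece described here is a slow node telling the cluster it is struggling, then asking back in once it recovers; SOLR-7121 is the JIRA tracking that work. The idea can be sketched as a self-monitor over a moving average of query latency. This is a hypothetical illustration, not the SOLR-7121 patch. The window size, the two thresholds, and the `Coordinator` interface are assumptions. `Coordinator` stands in for whatever actually updates ZK, and no real ZooKeeper API is used. Using separate leave and rejoin thresholds (hysteresis) is a design choice made here so that a node hovering near one limit does not flap in and out of the cluster.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a node that notices its own slowness and asks to be taken
 * out of rotation, then asks to rejoin once it recovers. Thresholds, window and the
 * Coordinator interface are illustrative assumptions, not SOLR-7121 or a ZK API.
 */
class SelfHealthMonitor {

    /** Stand-in for whatever tells the cluster (e.g. via ZK) to route around this node. */
    interface Coordinator {
        void requestRemoval();
        void requestRejoin();
    }

    private final Deque<Long> window = new ArrayDeque<>();
    private final int windowSize;
    private final long leaveAboveMillis;  // leave when the moving average rises above this
    private final long rejoinBelowMillis; // rejoin only when it falls below this lower value
    private final Coordinator coordinator;
    private long sum;
    private boolean removed;

    SelfHealthMonitor(int windowSize, long leaveAboveMillis, long rejoinBelowMillis, Coordinator coordinator) {
        if (windowSize <= 0 || rejoinBelowMillis >= leaveAboveMillis) {
            throw new IllegalArgumentException("need a positive window and rejoin threshold < leave threshold");
        }
        this.windowSize = windowSize;
        this.leaveAboveMillis = leaveAboveMillis;
        this.rejoinBelowMillis = rejoinBelowMillis;
        this.coordinator = coordinator;
    }

    /** Records one query latency and reacts if the moving average crosses a threshold. */
    void recordLatency(long millis) {
        window.addLast(millis);
        sum += millis;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
        if (window.size() < windowSize) {
            return; // not enough samples yet
        }
        long avg = sum / window.size();
        if (!removed && avg > leaveAboveMillis) {
            removed = true;
            coordinator.requestRemoval();
        } else if (removed && avg < rejoinBelowMillis) {
            removed = false;
            coordinator.requestRejoin();
        }
    }

    boolean isRemoved() {
        return removed;
    }

    public static void main(String[] args) {
        Coordinator printer = new Coordinator() {
            public void requestRemoval() { System.out.println("asking to be taken out of rotation"); }
            public void requestRejoin() { System.out.println("asking to rejoin"); }
        };
        SelfHealthMonitor monitor = new SelfHealthMonitor(3, 500, 100, printer);
        // Simulated latencies: a GC pause or traffic spike, followed by recovery.
        long[] latencies = {50, 60, 900, 1200, 1500, 300, 80, 40, 30};
        for (long l : latencies) {
            monitor.recordLatency(l);
        }
    }
}
```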
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321249#comment-14321249 ] Sachin Goyal commented on SOLR-6832: Thank you [~thelabdude]!
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320851#comment-14320851 ] Timothy Potter commented on SOLR-6832: Working on committing this now.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321192#comment-14321192 ] ASF subversion and git services commented on SOLR-6832: Commit 1659750 from [~thelabdude] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1659750 ] SOLR-6832: Queries be served locally rather than being forwarded to another replica
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321177#comment-14321177 ] ASF subversion and git services commented on SOLR-6832: Commit 1659748 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1659748 ] SOLR-6832: Queries be served locally rather than being forwarded to another replica
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316779#comment-14316779 ] Sachin Goyal commented on SOLR-6832: Thank you [~thelabdude]. Please let me know how we can get this committed to trunk; I can edit the Solr Reference Guide. I would also like to back-port this to the 5x branch.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316697#comment-14316697 ] Timothy Potter commented on SOLR-6832: Also, I don't think we need to include this parameter in all of the configs, as we're trying to get away from bloated configs. So I changed the patch to include it only in the sample techproducts configs. We'll also need to document this parameter in the Solr Reference Guide.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312895#comment-14312895 ] Sachin Goyal commented on SOLR-6832: Thanks [~thelabdude]. If you can point me to an existing test case that creates a multi-node cluster and runs updates/queries against it, I can help with creating the unit test.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307520#comment-14307520 ] Timothy Potter commented on SOLR-6832: Thanks for the updated patch. The only thing we need now is a good unit test. I can take a stab at that over the next few days.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304149#comment-14304149 ] Sachin Goyal commented on SOLR-6832: Oh great. Thanks for saving me a search for the ZkController :)
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303922#comment-14303922 ] Timothy Potter commented on SOLR-6832:
[~sachingoyal] Thanks for the patch! I'm working to get it to a committable state.

I don't think adding {{preferLocalShards}} as a collection-level setting (in SolrConfig) adds much value here. If an operator wants to enforce that query parameter for all requests, they can use the built-in support for defaults or invariants on the appropriate query request handler. For example, to make this the default on the /select handler, you could do:
{code}
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">true</bool>
    ...
  </lst>
</requestHandler>
{code}
Both approaches require some config changes in solrconfig.xml, but the latter (my suggestion) avoids adding new code / config settings. That said, please let me know if you think there's another reason to have this as an explicit setting in solrconfig.xml.

Also, all the code in {{findCurrentHostAddress}} can simply be replaced by {{ZkController.getBaseUrl()}} when needed.
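The difference between putting {{preferLocalShards}} in a handler's defaults and putting it in the handler's invariants comes down to precedence. A request parameter overrides a default, while an invariant overrides the request. The following plain-Java model illustrates this. It is simplified: it assumes single-valued parameters and ignores Solr's "appends" section. The class and method names are hypothetical and are not Solr's SolrParams API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of how a request handler's "defaults" and "invariants"
 * combine with per-request parameters (single-valued params only; "appends"
 * is ignored). Not Solr's actual SolrParams classes.
 */
class HandlerParamsSketch {

    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request,
                                               Map<String, String> invariants) {
        Map<String, String> merged = new HashMap<>(defaults); // lowest precedence
        merged.putAll(request);                               // request overrides defaults
        merged.putAll(invariants);                            // invariants override everything
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("rows", "10", "preferLocalShards", "true");
        // A client can still opt out per request when the setting is only a default...
        Map<String, String> request = Map.of("q", "*:*", "preferLocalShards", "false");
        System.out.println(effectiveParams(defaults, request, Map.of()).get("preferLocalShards")); // false
        // ...but not when the operator makes it an invariant.
        Map<String, String> invariants = Map.of("preferLocalShards", "true");
        System.out.println(effectiveParams(defaults, request, invariants).get("preferLocalShards")); // true
    }
}
```

This is why the defaults route fits an opt-in optimization. Operators can turn it on for a handler, and individual clients can still turn it off per request.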
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304033#comment-14304033 ] Timothy Potter commented on SOLR-6832:
Awesome - btw ... in case you haven't seen this before, it's a little cumbersome to get at the ZkController from the req object, something like:
{code}
req.getCore().getCoreDescriptor().getCoreContainer().getZkController().getBaseUrl()
{code}
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304014#comment-14304014 ] Sachin Goyal commented on SOLR-6832: [~thelabdude], I do not feel very strongly about the configuration option in solrconfig.xml. I kept it there only because specifying a global option looked simpler to use. I will try ZkController.getBaseUrl() and update the patch shortly with the above suggestions. Thank you for reviewing.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246130#comment-14246130 ] Shawn Heisey commented on SOLR-6832: A slightly better choice might be preferLocalReplicas ... but preferLocalShards is pretty good too.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245538#comment-14245538 ] Shawn Heisey commented on SOLR-6832: That sounds like a perfect use-case for this option. In your setup, you have an external load balancer and are not relying on SolrCloud itself or the zookeeper-aware Java client (CloudSolrServer) to do the load balancing for you. For an environment like that, letting SolrCloud forward the request adds a completely unnecessary network hop, along with new Java objects and subsequent garbage that must be collected. This is why I said I didn't want to derail the work. If you have a solution, we should try to get it to a state where it can be committed. It is very clear that it will be an immense help for many users. I just don't want it to become the default. Trying to come up with a useful and descriptive option name that's not horribly long ... that's a challenge. :) Something like handleRequestsLocally may be too generic, but it's a lot shorter than handleShardRequestsLocallyIfPossible!
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245587#comment-14245587 ] Ayon Sinha commented on SOLR-6832: Our clients actually do use CloudSolrServer (the load-balancing SolrJ client). Is there something we should be worrying about there? We are under the impression that the ZK-aware CloudSolrServer does round-robin load balancing when sending query requests. We only intend to use 'preferLocalShards' on the Solr node side. BTW, how is the name 'preferLocalShards'?
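For reference, round-robin selection simply hands out the live URLs in turn, wrapping around at the end of the list. The following is a minimal, hypothetical sketch of that idea only; it is not the code of CloudSolrServer or LBHttpSolrServer, which also handle failover and follow cluster state changes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified round-robin selector (illustration only).
class RoundRobin {
    private final List<String> urls;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<String> urls) {
        if (urls.isEmpty()) throw new IllegalArgumentException("no live URLs");
        this.urls = List.copyOf(urls); // immutable snapshot of the live URLs
    }

    // Each call returns the next URL in turn; floorMod keeps the index valid
    // even after the counter overflows to negative values.
    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), urls.size());
        return urls.get(i);
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(List.of("http://node1:8983/solr", "http://node2:8983/solr"));
        for (int i = 0; i < 3; i++) {
            System.out.println(rr.next());
        }
    }
}
```

The AtomicInteger lets many request threads share one selector without locking.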
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245665#comment-14245665 ] Shawn Heisey commented on SOLR-6832: CloudSolrServer does load balance, so you do not need an external load balancer. Internally, it uses changes in the zookeeper clusterstate to add and remove URLs on an instance of LBHttpSolrServer, which in turn uses HttpSolrServer for each of those URLs. https://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html The name preferLocalShards is perfect ... and I think a good case can be made for CloudSolrServer using this for queries (probably via a query URL parameter) by default.
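If the option is exposed as a per-request query parameter, as suggested above, a client could opt in query by query. A hypothetical sketch that builds such a request URL; the host, the collection name, and the assumption that the server reads a boolean preferLocalShards parameter on /select are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a /select URL carrying the proposed per-request flag.
class PreferLocalParam {

    static String selectUrl(String baseUrl, String collection, String q, boolean preferLocalShards) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("preferLocalShards", Boolean.toString(preferLocalShards));
        // URL-encode each key and value (":" becomes "%3A"; "*" is left as-is).
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "/" + collection + "/select?" + query;
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://node1:8983/solr", "collection1", "*:*", true));
    }
}
```

A per-request parameter lets the same cluster serve both kinds of clients: those behind an external load balancer can opt in, while others keep the default forwarding behavior.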
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245667#comment-14245667 ] Shawn Heisey commented on SOLR-6832: We might even be able to shorten the parameter name to preferLocal, but that will require some further thought. I'd hate to have the shorter version in use when another preferLocalXXX requirement comes up.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244844#comment-14244844 ] Ayon Sinha commented on SOLR-6832: Hi [~elyograg], I work with [~sachingoyal]. The background of this patch is that we have a cluster of 14 machines serving upwards of 5000 qps, and when one machine goes into a multi-second GC pause, it can easily bring down the entire cluster. I know this is not the sole cause of the distributed deadlock, and we have also fixed other things (GC pauses, thread counts, etc.) to reduce the likelihood of this problem. In the scenario that you mention, the load balancer outside SolrCloud is at fault, and when that is the case we'd like it to take down only one replica rather than propagate the problem to other replicas. So, to be clear, when this option is ON, the only thing you lose is the extra load balancing among the shard queries. And frankly, when I have all the shards on the same node, I prefer NOT to go over the network, as the network is among the most unreliable and heavily taxed resources in cloud environments. When we go over the network to another compute node, I have no idea what is carrying the request there or how that other node is doing overall. We will post our results on the benefit of having this option ON.
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241408#comment-14241408 ] Sachin Goyal commented on SOLR-6832: The performance gain increases if coresPerMachine is 1 and a single JVM has cores from 'k' shards.
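A back-of-the-envelope model of why the gain grows with k, under stated assumptions: the collection has n shards with rf replicas each, the coordinating node hosts one replica of k of those shards, and without the option each shard's replica is chosen uniformly at random. Without the option, a locally hosted shard still goes over HTTP with probability 1 - 1/rf; with it, only the n - k shards not hosted locally do. The class and method names are hypothetical.

```java
// Hypothetical model of shard sub-requests that need a network hop.
class LocalShardSavings {

    static void check(int n, int k, int rf) {
        if (n < 1 || k < 0 || k > n || rf < 1) {
            throw new IllegalArgumentException("require n >= 1, 0 <= k <= n, rf >= 1");
        }
    }

    // Expected HTTP sub-requests without the option: the n - k remote shards always,
    // plus each local shard whenever a non-local replica is picked (probability 1 - 1/rf).
    static double expectedRemoteWithout(int n, int k, int rf) {
        check(n, k, rf);
        return (n - k) + k * (1.0 - 1.0 / rf);
    }

    // HTTP sub-requests with the option: the k locally hosted shards are served in-process.
    static int remoteWith(int n, int k, int rf) {
        check(n, k, rf);
        return n - k;
    }

    public static void main(String[] args) {
        int n = 4, rf = 2;
        for (int k = 0; k <= n; k++) {
            System.out.printf("k=%d: without=%.1f, with=%d%n",
                k, expectedRemoteWithout(n, k, rf), remoteWith(n, k, rf));
        }
    }
}
```

The expected saving per query is k(1 - 1/rf) network hops, so it grows linearly with the number of shards the coordinating JVM hosts.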
[jira] [Commented] (SOLR-6832) Queries be served locally rather than being forwarded to another replica
[ https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242231#comment-14242231 ] Shawn Heisey commented on SOLR-6832: I have concerns, but I don't want to derail the work. There are use-cases for which this would be very useful, but many other use-cases where it would cause a single machine to crumble under the load while other machines in the cloud are nearly idle. Duplicating what I said on the dev@l.a.o thread: Consider a SolrCloud that is handling 5000 requests per second with a replicationFactor of 20 or 30. This could be one shard or multiple shards. Currently, those requests will be load balanced to the entire cluster. If this option is implemented, suddenly EVERY request will have at least one part handled locally ... and unless the index is very tiny or 99 percent of the queries hit a Solr cache, one index core simply won't be able to handle 5000 queries per second. Getting a single machine capable of handling that load MIGHT be possible, but it would likely be *VERY* expensive. This would be great as an *OPTION* that can be enabled when the index composition and query patterns dictate it will be beneficial ... but it definitely should not be default behavior.
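The load concern can be made concrete with a simple, hypothetical model of a single-shard collection: top-level queries arrive evenly at some number of entry nodes, each hosting one of the shard's rf replicas. Without the option, sub-requests spread over all rf replicas; with it, each entry node answers from its own replica, so the per-replica load depends on how many nodes receive top-level requests. The names and the even-arrival assumption are illustrative.

```java
// Hypothetical per-replica load model for a one-shard collection.
class PerReplicaLoad {

    // Without the option, shard sub-requests are spread evenly over all rf replicas.
    static double loadWithoutOption(double qps, int rf) {
        if (rf < 1) throw new IllegalArgumentException("rf must be >= 1");
        return qps / rf;
    }

    // With the option, each entry node answers its own requests locally,
    // so only the entry nodes' replicas receive load.
    static double loadWithOption(double qps, int entryNodes) {
        if (entryNodes < 1) throw new IllegalArgumentException("entryNodes must be >= 1");
        return qps / entryNodes;
    }

    public static void main(String[] args) {
        double qps = 5000;
        int rf = 20;
        System.out.printf("without option: %.1f qps per replica%n", loadWithoutOption(qps, rf));
        // 1 entry node: every request enters through one node (the worst case above).
        // rf entry nodes: requests are spread over all nodes, e.g. by an external load balancer.
        for (int entryNodes : new int[] {1, rf}) {
            System.out.printf("with option, %d entry node(s): %.1f qps per entry-node replica%n",
                entryNodes, loadWithOption(qps, entryNodes));
        }
    }
}
```

With 5000 qps and rf = 20, the spread load is 250 qps per replica. If every request enters through one node, that node's replica sees all 5000 qps, while spreading top-level requests over all 20 nodes brings it back to 250 qps, which is the external-load-balancer case discussed elsewhere in this thread.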