[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance
[ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102715#comment-14102715 ]

Jim Walker commented on SOLR-5986:
----------------------------------

Steve, your last post does not incorporate the patch from Anshum, does it?

> Don't allow runaway queries from harming Solr cluster health or search performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Assignee: Anshum Gupta
>            Priority: Critical
>             Fix For: 4.10
>
>         Attachments: SOLR-5986.patch
>
> The intent of this ticket is to stop distributed search requests from wasting CPU cycles on queries that have already timed out or are so complicated that they will never finish executing. We came across a case where a nasty wildcard query inside a proximity clause caused the cluster to enumerate terms for hours even though the query timeout was set to minutes. This caused a noticeable slowdown in the system, forcing us to restart the replicas that happened to service that one request. In the worst-case scenario, users with a relatively low ZooKeeper timeout value will have nodes start dropping from the cluster due to long GC pauses.
>
> [~amccurry] built a mechanism into Apache Blur to help with this issue in BLUR-142 (see the commit comment for code, though look at the latest code on trunk for newer bug fixes). Solr should be able to either prevent these problematic queries from running via some heuristic (possibly the estimated size of heap usage), or execute a thread interrupt on all query threads once the time threshold is met.
>
> This issue mirrors what others have discussed on the mailing list: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
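The "thread interrupt once the time threshold is met" idea in the ticket amounts to checking a deadline while enumerating terms, so a runaway wildcard expansion aborts promptly instead of running for hours. Below is a minimal, language-neutral sketch of that cooperative-timeout pattern in Python; the class and function names (`DeadlineCheckedEnumeration`, `slow_terms`) are hypothetical illustrations, not Solr or Lucene APIs.

```python
import time


class QueryTimeoutError(Exception):
    """Raised when a query runs past its allotted deadline."""


class DeadlineCheckedEnumeration:
    # Hypothetical sketch: wraps a term enumeration and checks a wall-clock
    # deadline on every step, so an expensive expansion is cut off near the
    # configured timeout instead of being noticed only after it finishes.
    def __init__(self, terms, timeout_s, check_every=1):
        self._terms = iter(terms)
        self._deadline = time.monotonic() + timeout_s
        self._check_every = check_every
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._count += 1
        if self._count % self._check_every == 0 and time.monotonic() > self._deadline:
            raise QueryTimeoutError("query exceeded its time allowance")
        return next(self._terms)


def slow_terms(n, delay_s):
    # Stand-in for an expensive wildcard term expansion.
    for i in range(n):
        time.sleep(delay_s)
        yield f"term{i}"


# A fast enumeration completes; a slow one is aborted at the deadline.
fast = list(DeadlineCheckedEnumeration(slow_terms(5, 0.0), timeout_s=1.0))
try:
    list(DeadlineCheckedEnumeration(slow_terms(1000, 0.01), timeout_s=0.05))
    timed_out = False
except QueryTimeoutError:
    timed_out = True
```

The key design point is that the check is cooperative: the enumeration itself polls the deadline, so no thread is forcibly killed and no index state is left inconsistent.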
[jira] [Issue Comment Deleted] (SOLR-6304) Add a way to flatten an input JSON to multiple docs
[ https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Walker updated SOLR-6304:
-----------------------------
    Comment: was deleted

(was: Noble, are you looking to rename the first id field to recipeId?)

> Add a way to flatten an input JSON to multiple docs
> ---------------------------------------------------
>
>                 Key: SOLR-6304
>                 URL: https://issues.apache.org/jira/browse/SOLR-6304
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> example
> {noformat}
> curl 'localhost:8983/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type' -d '
> {
>   id: 0001,
>   type: donut,
>   name: Cake,
>   ppu: 0.55,
>   batters: {
>     batter: [
>       { id: 1001, type: Regular },
>       { id: 1002, type: Chocolate },
>       { id: 1003, type: Blueberry },
>       { id: 1004, type: Devil's Food }
>     ]
>   }
> }'
> {noformat}
> should produce the following output docs
> {noformat}
> { recipeId: 0001, recipeType: donut, id: 1001, type: Regular }
> { recipeId: 0001, recipeType: donut, id: 1002, type: Chocolate }
> { recipeId: 0001, recipeType: donut, id: 1003, type: Blueberry }
> { recipeId: 0001, recipeType: donut, id: 1004, type: Devil's Food }
> {noformat}
> The split param is the element in the tree at which the input should be split into multiple docs. The 'f' params are field name mappings.
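The split/mapping semantics described in the ticket can be illustrated with a short standalone Python sketch. This is not Solr's implementation; `split_docs` and its argument names are hypothetical, and it assumes the simple case shown above: one array to explode, with `f` paths under the split point resolving per child element and all other paths resolving against the root document.

```python
import json


def split_docs(doc, split_path, mappings):
    """Explode the array at split_path into child docs; mappings is a list of
    (field_name, json_path) pairs. Paths under split_path resolve against each
    child element, all others against the root document."""
    def resolve(obj, path):
        for part in path.strip("/").split("/"):
            obj = obj[part]
        return obj

    out = []
    for child in resolve(doc, split_path):
        rec = {}
        for name, path in mappings:
            if path.startswith(split_path + "/"):
                # Path below the split point: resolve relative to the child.
                val = child
                for part in path[len(split_path):].strip("/").split("/"):
                    val = val[part]
            else:
                val = resolve(doc, path)
            rec[name] = val
        out.append(rec)
    return out


recipe = json.loads('''{
  "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55,
  "batters": {"batter": [
    {"id": "1001", "type": "Regular"},
    {"id": "1002", "type": "Chocolate"},
    {"id": "1003", "type": "Blueberry"},
    {"id": "1004", "type": "Devil's Food"}
  ]}
}''')

docs = split_docs(recipe, "/batters/batter", [
    ("recipeId", "/id"), ("recipeType", "/type"),
    ("id", "/batters/batter/id"), ("type", "/batters/batter/type"),
])
```

Running this yields four flattened docs, one per `batter` element, each carrying the parent recipe's id and type under the mapped names.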
[jira] [Commented] (SOLR-6304) Add a way to flatten an input JSON to multiple docs
[ https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087687#comment-14087687 ]

Jim Walker commented on SOLR-6304:
----------------------------------

Noble, are you looking to rename the first id field to recipeId?

> Add a way to flatten an input JSON to multiple docs
> ---------------------------------------------------
>
>                 Key: SOLR-6304
>                 URL: https://issues.apache.org/jira/browse/SOLR-6304
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Noble Paul
[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance
[ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069225#comment-14069225 ]

Jim Walker commented on SOLR-5986:
----------------------------------

Steve, good point regarding automation; that should come first. I just talked to Anshum, who has this covered; he will know best what needs to happen here. Cheers

> Don't allow runaway queries from harming Solr cluster health or search performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Priority: Critical
>             Fix For: 4.9
[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance
[ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062228#comment-14062228 ]

Jim Walker commented on SOLR-5986:
----------------------------------

Steve, I wonder why you would have to restart the replica? I presume that is your only recourse to stop a query that might take days to complete. If a query takes that long while ignoring a specified timeout, that seems like its own issue that needs resolution.

IMHO, the primary goal should be to make SolrCloud clusters more resilient to performance degradations caused by nasty queries like the one described above. The circuit-breaker approach in the linked ES tickets is clever, but it does not seem as generally applicable as the ability to view all running queries with an option to stop them. For example, the linked ES circuit breaker appears to trigger only for issues arising from loading too much field data, whereas the problem described above may result from that cause or any number of others. My preference would be a response mechanism that 1) applies broadly and 2) an ops engineer can execute in a UI like the Solr Admin console, or even via API.

> Don't allow runaway queries from harming Solr cluster health or search performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Priority: Critical
>             Fix For: 4.9
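The circuit-breaker heuristic debated in this thread (reject a request up front when its estimated heap cost would push the node over a limit) can be sketched in a few lines. This is a hypothetical illustration in Python, not the Elasticsearch or Solr implementation; `MemoryCircuitBreaker` and its byte estimates are invented for the example.

```python
import threading


class CircuitOpenError(Exception):
    """Raised when admitting a request would exceed the configured memory limit."""


class MemoryCircuitBreaker:
    # Hypothetical sketch of the heuristic approach discussed above: each request
    # reserves its estimated heap cost before doing any work; if total
    # reservations would exceed the limit, the breaker trips and the request is
    # rejected instead of being allowed to destabilize the node.
    def __init__(self, limit_bytes):
        self._limit = limit_bytes
        self._reserved = 0
        self._lock = threading.Lock()

    def reserve(self, estimated_bytes):
        with self._lock:
            if self._reserved + estimated_bytes > self._limit:
                raise CircuitOpenError(
                    f"estimated cost {estimated_bytes} would exceed limit {self._limit}")
            self._reserved += estimated_bytes

    def release(self, estimated_bytes):
        with self._lock:
            self._reserved = max(0, self._reserved - estimated_bytes)


breaker = MemoryCircuitBreaker(limit_bytes=100)
breaker.reserve(60)          # first query admitted (60 of 100 reserved)
try:
    breaker.reserve(50)      # would push total to 110 > 100: rejected up front
    rejected = False
except CircuitOpenError:
    rejected = True
breaker.release(60)          # first query finished; capacity freed
breaker.reserve(50)          # now admitted
```

As the comment above argues, such a breaker only catches problems whose cost can be estimated in advance; it complements, rather than replaces, a list-and-kill facility for queries that misbehave in ways the estimate cannot predict.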