[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-08-19 Thread Jim Walker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102715#comment-14102715
 ] 

Jim Walker commented on SOLR-5986:
--

Steve, your last post does not incorporate the patch from Anshum, does it?

 Don't allow runaway queries from harming Solr cluster health or search 
 performance
 --

 Key: SOLR-5986
 URL: https://issues.apache.org/jira/browse/SOLR-5986
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Steve Davids
Assignee: Anshum Gupta
Priority: Critical
 Fix For: 4.10

 Attachments: SOLR-5986.patch


 The intent of this ticket is to have all distributed search requests stop 
 wasting CPU cycles on requests that have already timed out or are so 
 complicated that they won't be able to execute. We came across a case where a 
 nasty wildcard query within a proximity clause caused the cluster to enumerate 
 terms for hours even though the query timeout was set to minutes. This caused 
 a noticeable slowdown in the system, which forced us to restart the replicas 
 that happened to service that one request; the worst-case scenario is that 
 users with a relatively low ZooKeeper timeout value will have nodes start 
 dropping from the cluster due to long GC pauses.
 [~amccurry] built a mechanism into Apache Blur to help with this issue in 
 BLUR-142 (see the commit comment for code, though look at the latest code on 
 trunk for newer bug fixes).
 Solr should either prevent these problematic queries from running via some 
 heuristic (possibly estimated heap usage) or be able to issue a thread 
 interrupt on all query threads once the time threshold is met. This issue 
 mirrors what others have discussed on the mailing list: 
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E
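A minimal sketch of the cooperative-timeout idea the ticket asks for (in Python rather than Solr's actual Java internals; `DeadlineChecker`, `enumerate_terms`, and all other names here are illustrative, not Solr APIs). The point is that hot loops such as wildcard term enumeration must periodically check a deadline themselves, since an expensive query otherwise keeps burning CPU long after its timeout:

```python
import time

class QueryTimeoutError(Exception):
    """Raised when a query exceeds its allotted wall-clock budget."""

class DeadlineChecker:
    """Cooperative timeout: hot loops call check() periodically and abort
    the request instead of running for hours past its timeout."""
    def __init__(self, timeout_seconds):
        self.deadline = time.monotonic() + timeout_seconds

    def check(self):
        if time.monotonic() > self.deadline:
            raise QueryTimeoutError("query exceeded its time budget")

def enumerate_terms(terms, checker, check_every=1000):
    """Simulate a term-enumeration loop that honors the deadline by
    checking it every `check_every` iterations."""
    matched = []
    for i, term in enumerate(terms):
        if i % check_every == 0:
            checker.check()
        matched.append(term)
    return matched
```

Lucene later grew built-in support along these lines (e.g. time-limited collection), but the sketch above only shows the general pattern, not that implementation.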



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-6304) Add a way to flatten an input JSON to multiple docs

2014-08-06 Thread Jim Walker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Walker updated SOLR-6304:
-

Comment: was deleted

(was: Noble, are you looking to rename the first id field to recipeId?)

 Add a way to flatten an input JSON to multiple docs
 ---

 Key: SOLR-6304
 URL: https://issues.apache.org/jira/browse/SOLR-6304
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul

 example
 {noformat}
 curl 'localhost:8983/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type' -d '
 {
   "id": "0001",
   "type": "donut",
   "name": "Cake",
   "ppu": 0.55,
   "batters": {
     "batter": [
       { "id": "1001", "type": "Regular" },
       { "id": "1002", "type": "Chocolate" },
       { "id": "1003", "type": "Blueberry" },
       { "id": "1004", "type": "Devil's Food" }
     ]
   }
 }'
 {noformat}
 should produce the following output docs
 {noformat}
 { recipeId:001, recipeType:donut, id:1001, type:Regular }
 { recipeId:001, recipeType:donut, id:1002, type:Chocolate }
 { recipeId:001, recipeType:donut, id:1003, type:Blueberry }
 { recipeId:001, recipeType:donut, id:1004, type:Devil's Food }
 {noformat}
 The split param names the element in the tree at which the input should be 
 split into multiple docs. The 'f' params are field name mappings.
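The split-and-map behavior described above can be sketched in a few lines of Python (this is an illustration of the idea, not Solr's implementation; `flatten` and its simplifying assumptions, such as single-level field paths, are mine):

```python
def flatten(doc, split_path, mappings):
    """Split a nested JSON doc at `split_path` and emit one flat doc per
    child element. `mappings` maps output field name -> JSON path; paths
    under the split path resolve against the child record, all others
    against the root document."""
    # Walk down to the list of child records named by the split path.
    children = doc
    for key in split_path.strip("/").split("/"):
        children = children[key]

    out = []
    for child in children:
        rec = {}
        for field, path in mappings.items():
            leaf = path.strip("/").split("/")[-1]
            if path.startswith(split_path + "/"):
                rec[field] = child[leaf]   # field belongs to the child
            else:
                rec[field] = doc[leaf]     # field belongs to the parent
        out.append(rec)
    return out
```

Running it on the example input with the mappings from the curl URL yields one flat doc per batter, each carrying the parent's id and type under the mapped names.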






[jira] [Commented] (SOLR-6304) Add a way to flatten an input JSON to multiple docs

2014-08-06 Thread Jim Walker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087687#comment-14087687
 ] 

Jim Walker commented on SOLR-6304:
--

Noble, are you looking to rename the first id field to recipeId?

 Add a way to flatten an input JSON to multiple docs
 ---

 Key: SOLR-6304
 URL: https://issues.apache.org/jira/browse/SOLR-6304
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul






[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-07-21 Thread Jim Walker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069225#comment-14069225
 ] 

Jim Walker commented on SOLR-5986:
--

Steve, good point regarding automation; that should come first.

I just talked to Anshum, who has this covered; he will know best what needs to 
happen here. Cheers

 Don't allow runaway queries from harming Solr cluster health or search 
 performance
 --

 Key: SOLR-5986
 URL: https://issues.apache.org/jira/browse/SOLR-5986
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Steve Davids
Priority: Critical
 Fix For: 4.9


 (Issue description quoted above.)






[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-07-15 Thread Jim Walker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062228#comment-14062228
 ] 

Jim Walker commented on SOLR-5986:
--

Steve, I wonder why you would have to restart the replica? I presume that is 
your only recourse to stop a query that might take days to complete?

If a query takes that long and ignores a specified timeout, that seems like 
its own issue needing resolution.

IMHO, the primary goal should be to make SolrCloud clusters more resilient to 
performance degradations caused by nasty queries like those described above.

The circuit-breaker approach in the linked ES tickets is clever, but it does 
not seem as generally applicable as the ability to view all running queries 
with an option to stop them. For example, the linked ES circuit breaker 
appears to trigger only for issues arising from loading too much field data; 
the problem described above may stem from that cause or any number of others.

My preference would be a response mechanism that 1) applies broadly and 2) an 
ops engineer can invoke from a UI like the Solr Admin console, or via an API.
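The "view all running queries with an option to stop them" idea above amounts to a registry that queries join on start and that an operator (or API) can inspect and cancel. A minimal sketch, with all names hypothetical and no relation to any actual Solr class; cancellation is cooperative, so query threads would still need to poll their flag:

```python
import threading
import time
import uuid

class QueryRegistry:
    """Track running queries so an operator can list and cancel them.
    Query threads poll is_cancelled() cooperatively and abort if set."""
    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # query_id -> {"q", "started", "cancel" Event}

    def register(self, q):
        """Called when a query starts; returns its handle."""
        qid = str(uuid.uuid4())
        with self._lock:
            self._running[qid] = {"q": q, "started": time.time(),
                                  "cancel": threading.Event()}
        return qid

    def list_running(self):
        """What an admin UI or API endpoint would render."""
        with self._lock:
            return {qid: info["q"] for qid, info in self._running.items()}

    def cancel(self, qid):
        """Operator action: flag the query for cancellation."""
        with self._lock:
            if qid in self._running:
                self._running[qid]["cancel"].set()

    def is_cancelled(self, qid):
        with self._lock:
            info = self._running.get(qid)
            return info is not None and info["cancel"].is_set()

    def finish(self, qid):
        """Called when the query completes or aborts."""
        with self._lock:
            self._running.pop(qid, None)
```

This broadly matches the request here: it applies to any slow query regardless of cause, and the list/cancel operations map naturally onto an admin UI or HTTP API.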

 Don't allow runaway queries from harming Solr cluster health or search 
 performance
 --

 Key: SOLR-5986
 URL: https://issues.apache.org/jira/browse/SOLR-5986
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Steve Davids
Priority: Critical
 Fix For: 4.9


 (Issue description quoted above.)


