[jira] [Commented] (SOLR-13270) SolrJ does not send "Expect: 100-continue" header
[ https://issues.apache.org/jira/browse/SOLR-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785093#comment-16785093 ] Jason Gerlowski commented on SOLR-13270: I'm going to assign this to myself and hope to get to it by or over this weekend. A few questions/notes: 1. If the issue is as simple as us overriding the RequestConfig in {{executeMethod}}, how does the POST manage to get the 100-continue header through? Is POST not following the same code path, or is it following the same code path and 100-continue is coming from somewhere else? Still need to trace this through... 2. Is pulling the RequestConfig from the HttpClient (if it exists) the right fix, or is RequestConfig an important-enough configuration object that it should be exposed on the HttpSolrClient.Builder in its own right? Is this an awful idea in light of us moving away from Apache HttpComponents with the in-development HTTP2 versions of these clients? 3. How should conflicts between a user-supplied RequestConfig and other HttpSolrClient settings be resolved? Should we overlay the provided RequestConfig settings on top of our defaults where possible? Which values should win when a user specifies a RequestConfig but also chooses conflicting {{SolrClientBuilder.withConnectionTimeout}}/{{SolrClientBuilder.withSocketTimeout}} values? (I don't think any of these are huge roadblocks, just leaving notes for myself on where to pick this up when I return in a few days. If anyone has any thoughts or insight though, please chime in.) > SolrJ does not send "Expect: 100-continue" header > - > > Key: SOLR-13270 > URL: https://issues.apache.org/jira/browse/SOLR-13270 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: SolrJ >Affects Versions: 7.7 >Reporter: Erlend Garåsen >Priority: Major > > SolrJ does not set the "Expect: 100-continue" header, even though it's > configured in HttpClient: > {code:java} > builder.setDefaultRequestConfig(RequestConfig.custom().setExpectContinueEnabled(true).build());{code} > An HttpClient developer has reviewed the code and says we're setting up > the client correctly, so we have reason to believe there is a bug in > SolrJ. It's actually a problem we are facing in ManifoldCF, explained in: > https://issues.apache.org/jira/browse/CONNECTORS-1564 > The problem can be reproduced by building and running the following small > Maven project: > [http://folk.uio.no/erlendfg/solr/missing-header.zip] > The application runs SolrJ code where the header does not show up and > HttpClient code where the header is present. > > {code:java} > HttpClientBuilder builder = HttpClients.custom(); > // This should add an Expect: 100-continue header: > builder.setDefaultRequestConfig(RequestConfig.custom().setExpectContinueEnabled(true).build()); > HttpClient httpClient = builder.build(); > // Start Solr and create a core named "test". 
> String baseUrl = "http://localhost:8983/solr/test"; > // Test using SolrJ — no expect 100 header > HttpSolrClient client = new HttpSolrClient.Builder() > .withHttpClient(httpClient) > .withBaseSolrUrl(baseUrl).build(); > SolrQuery query = new SolrQuery(); > query.setQuery("*:*"); > client.query(query); > // Test using HttpClient directly — expect 100 header shows up: > HttpPost httpPost = new HttpPost(baseUrl); > HttpEntity entity = new InputStreamEntity(new > ByteArrayInputStream("test".getBytes())); > httpPost.setEntity(entity); > httpClient.execute(httpPost); > {code} > When using the last HttpClient test, the expect 100 header appears in > missing-header.log: > {noformat} > http-outgoing-1 >> Expect: 100-continue{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
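Question 3 in the comment above asks which value should win when a user supplies a RequestConfig but also sets conflicting builder timeouts. A minimal sketch of one possible precedence rule, in plain Java with no SolrJ or HttpClient types — the method name and the -1 "not set" sentinel are hypothetical, purely for illustration:

```java
public class TimeoutPrecedence {
    // Hypothetical precedence rule: an explicit builder timeout wins over the
    // value from a user-supplied RequestConfig, which in turn wins over the
    // hard-coded default. -1 means "not set". Names are illustrative only;
    // this is not the SolrJ API.
    static int resolveTimeout(int builderValue, int requestConfigValue, int defaultValue) {
        if (builderValue >= 0) {
            return builderValue;
        }
        if (requestConfigValue >= 0) {
            return requestConfigValue;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Builder value set: it beats the RequestConfig value.
        System.out.println(resolveTimeout(5000, 10000, 600000));
        // Only the RequestConfig value set: it beats the default.
        System.out.println(resolveTimeout(-1, 10000, 600000));
        // Neither set: fall back to the default.
        System.out.println(resolveTimeout(-1, -1, 600000));
    }
}
```

The same overlay idea could equally be expressed with HttpClient's `RequestConfig.copy(userConfig)` builder, applying only the settings the SolrJ builder actually has opinions about.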
[jira] [Resolved] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7
[ https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13256. Resolution: Fixed Fix Version/s: 7.7 master (9.0) 8.0 > Ref Guide: Upgrade Notes for 7.7 > > > Key: SOLR-13256 > URL: https://issues.apache.org/jira/browse/SOLR-13256 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Fix For: 8.0, master (9.0), 7.7 > > Attachments: SOLR-13256.patch > > > With 7.7 released and out the door, we should get the ball moving on a 7.7 > ref-guide. One of the prerequisites for that process is putting together > some upgrade notes that can go in > {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to > 7.7. > I'm going to take a look at CHANGES and take a first pass at the "upgrading" > section for 7.7. If anyone has anything they know should be in the list, > please let me know and I'll try to include it.
[jira] [Commented] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7
[ https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783406#comment-16783406 ] Jason Gerlowski commented on SOLR-13256: A bugfix release (7.7.1) was sent out the door last week, so we no longer need to document the maxShardsPerNode and URP issues in our upgrade notes for 7.7. (Though we need to be extra sure to steer users away from 7.7.0.) So I'm going to commit the current patch as it is, minus those two bullet points.
[jira] [Commented] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
[ https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781929#comment-16781929 ] Jason Gerlowski commented on SOLR-13255: Hey [~ahubold], have you had a chance to confirm whether 7.7.1 has fixed this issue for you? I trust Noble's fix, but there was a report on the mailing list this morning about a similar ClassCastException on Solr 7.7.1 so I figured it was worth checking in to see if you'd tried out the fix yet or had a chance to do so in the near future... > LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin > -- > > Key: SOLR-13255 > URL: https://issues.apache.org/jira/browse/SOLR-13255 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.7 >Reporter: Andreas Hubold >Assignee: Noble Paul >Priority: Blocker > Fix For: 8.0, 7.7.1 > > Attachments: SOLR-13255.patch, SOLR-13255.patch, SOLR-13255.patch > > > 7.7 changed the object type of string field values that are passed to > UpdateRequestProcessor implementations from java.lang.String to > ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as cause. > The LangDetectLanguageIdentifierUpdateProcessor still expects String values, > does not work for CharSequences, and logs warnings instead. For example: > {noformat} > 2019-02-14 13:14:47.537 WARN (qtp802600647-19) [ x:studio] > o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized > not a String value, not including in detection > {noformat} > I'm not sure, but there could be further places where the changed type for > string values needs to be handled. 
(Our custom UpdateRequestProcessor are > broken as well since 7.7 and it would be great to have a proper upgrade note > as part of the release notes)
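The breakage described above is a type assumption: update processors that require exactly `java.lang.String` fail when javabin delivers a `ByteArrayUtf8CharSequence`. A defensive processor can accept any `CharSequence` instead. The sketch below is plain Java with no Solr types — `asString` and the field-value variable are hypothetical stand-ins for what a processor might pull out of a `SolrInputField`:

```java
public class CharSequenceTolerantValue {
    // Sketch: instead of requiring exactly java.lang.String, accept any
    // CharSequence and convert it. ByteArrayUtf8CharSequence (not used here,
    // to stay dependency-free) implements CharSequence, so values sent via
    // javabin would pass this check.
    static String asString(Object fieldValue) {
        if (fieldValue instanceof CharSequence) {
            return fieldValue.toString();
        }
        return null; // not text; a caller could skip it for language detection
    }

    public static void main(String[] args) {
        System.out.println(asString("plain string"));
        // StringBuilder stands in for any non-String CharSequence implementation.
        System.out.println(asString(new StringBuilder("builder-backed value")));
        System.out.println(asString(42));
    }
}
```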
[jira] [Commented] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7
[ https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771382#comment-16771382 ] Jason Gerlowski commented on SOLR-13256: bq. Maybe it makes sense to hold the 7.7 Ref Guide until we figure out what is going to happen with those issues re: 7.7.1? I think that makes sense. If there is going to be a 7.7.1 soon that we're going to be steering everyone towards anyway, there's no need to include this in the ref-guide. If no one volunteers to do a 7.7.1 release soon and people are going to be using 7.7.0, then we can cross that bridge when we come to it. (Thoughts below are only relevant if there is no 7.7.1 soon, and we need to cross the bridge of deciding whether to include Known Issues in our Upgrade Notes.) bq. to date, we haven't mentioned Known Issues in the Upgrade Notes ... [this is] actually really hard for Solr ... What's the criteria for being included here? What about all the prior releases? I'm not sure the slope is as slippery as it looks. Yes, there are 1500 unresolved Solr bugs, but only 8 specifically tagged as affecting 7.7, and only 2 of those are being talked about as serious enough to trigger a bugfix release. The number of "candidates-for-inclusion" drops to just a few pretty quickly. If that's not convincing and your question about having guidelines/criteria wasn't rhetorical, let me offer a strawman for discussion: "Known Issues should only be included in the Upgrade Notes if they are generating discussion about an immediate bugfix release at the time the ref-guide release is being worked on".
[jira] [Resolved] (SOLR-13241) Add "autoscaling" tool to the Windows script
[ https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13241. Resolution: Fixed > Add "autoscaling" tool to the Windows script > > > Key: SOLR-13241 > URL: https://issues.apache.org/jira/browse/SOLR-13241 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13241.patch > > > SOLR-13155 added a command-line tool for testing autoscaling configurations. > The tool can be accessed by Unix {{bin/solr}} script but it's not integrated > with the Windows {{bin\solr.cmd}} script.
[jira] [Commented] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
[ https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771196#comment-16771196 ] Jason Gerlowski commented on SOLR-13255: bq. it would be great to have a proper upgrade note as part of the release notes Hey [~ahubold], I'm working on "Upgrade Notes" for users for the next release of our ref-guide, and I wanted them to include this issue. I included a short paragraph over on SOLR-13256. Since you mentioned you were interested in seeing this get documented, I wanted to give you a heads up. Feel free to chime in over there about anything I got wrong or any suggestions you might have.
[jira] [Updated] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7
[ https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13256: --- Attachment: SOLR-13256.patch
[jira] [Comment Edited] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
[ https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771196#comment-16771196 ] Jason Gerlowski edited comment on SOLR-13255 at 2/18/19 4:24 PM: - bq. it would be great to have a proper upgrade note as part of the release notes Hey [~ahubold], I'm working on "Upgrade Notes" for the next release of our ref-guide, and I wanted them to include this issue. I included a short paragraph over on SOLR-13256. Since you mentioned you were interested in seeing this get documented, I wanted to give you a heads up. Feel free to chime in over there about anything I got wrong or any suggestions you might have. was (Author: gerlowskija): bq. it would be great to have a proper upgrade note as part of the release notes Hey [~ahubold], I'm working on "Upgrade Notes" for users for the next release of our ref-guide, and I wanted them to include this issue. I included a short paragraph over on SOLR-13256. Since you mentioned you were interested in seeing this get documented, I wanted to give you a heads up. Feel free to chime in over there about anything I got wrong or any suggestions you might have.
[jira] [Created] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7
Jason Gerlowski created SOLR-13256: -- Summary: Ref Guide: Upgrade Notes for 7.7 Key: SOLR-13256 URL: https://issues.apache.org/jira/browse/SOLR-13256 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: documentation Reporter: Jason Gerlowski Assignee: Jason Gerlowski With 7.7 released and out the door, we should get the ball moving on a 7.7 ref-guide. One of the prerequisites for that process is putting together some upgrade notes that can go in {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 7.7. I'm going to take a look at CHANGES and take a first pass at the "upgrading" section for 7.7. If anyone has anything they know should be in the list, please let me know and I'll try to include it.
[jira] [Commented] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster
[ https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767213#comment-16767213 ] Jason Gerlowski commented on SOLR-13155: Great, I'll remove it as a part of SOLR-13241. bq. I think the pattern for other CLI commands is that there is some (partial) validation of the arguments in the script and the remaining part is done in Java. In this case it's perfectly valid to call this tool without any arguments Yeah, it's a bit confusing with the two different tool-patterns we have right now. As I understand things, the difference is less about having a valid 0-arg usage and more about a decision that was made at some point to put as little new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with. e.g. the {{config}} tool has required arguments but does all arg parsing in Java. Windows scripting is impossible to maintain, and even if it were a more well-known language there's still the issue of duplicating logic that could just live in one place. So all the newer tools do arg-parsing in Java afaik. > CLI tool for testing autoscaling suggestions against a live cluster > --- > > Key: SOLR-13155 > URL: https://issues.apache.org/jira/browse/SOLR-13155 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.0, master (9.0) > > Attachments: SOLR-13155.patch, SOLR-13155.patch, SOLR-13155.patch > > > Solr already provides /autoscaling/diagnostics and /autoscaling/suggestions > endpoints. In some situations it would be very helpful to be able to run > "what if" scenarios using data about nodes and replicas taken from a > production cluster but with a different autoscaling policy than the one that > is deployed, without also worrying that the calculations would negatively > impact a production cluster's Overseer leader. > All necessary classes (including the Policy engine) are self-contained in the > SolrJ component, so it's just a matter of packaging and writing a CLI tool + > a wrapper script.
[jira] [Comment Edited] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster
[ https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767213#comment-16767213 ] Jason Gerlowski edited comment on SOLR-13155 at 2/13/19 1:56 PM: - Great, I'll remove it as a part of SOLR-13241. bq. I think the pattern for other CLI commands is that there is some (partial) validation of the arguments in the script and the remaining part is done in Java. In this case it's perfectly valid to call this tool without any arguments Yeah, it's a bit confusing with the two different tool-patterns we have right now. As I understand things the difference is less about having a valid 0-arg usage, and more around a decision that was made at some point to put as little new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with. e.g. the {{config}} tool has required arguments but does all arg parsing in Java. Windows-script is impossible to maintain. Even if it was a more well-known language there's still the issue of duplicating logic that could just live in one place. So all the newer tools do arg-parsing in Java afaik. was (Author: gerlowskija): Great, I'll remove it as a part of SOLR-13241. bq. I think the pattern for other CLI commands is that there is some (partial) validation of the arguments in the script and the remaining part is done in Java. In this case it's perfectly valid to call this tool without any arguments Yeah, it's a bit confusing with the two different tool-patterns we have right now. As I understand things the difference is less about having a valid 0-arg usage, and more around a decision that was made at some point to put as little new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with. e.g. the {{config}} tool has required arguments but does all arg parsing in Java.Windows-script is impossible to maintain. Even if it was a more well-known language there's still the issue of duplicating logic that could just live in one place. So all the newer tools do arg-parsing in Java afaik.
[jira] [Assigned] (SOLR-13241) Add "autoscaling" tool to the Windows script
[ https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13241: -- Assignee: Jason Gerlowski
[jira] [Updated] (SOLR-13241) Add "autoscaling" tool to the Windows script
[ https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13241: --- Attachment: SOLR-13241.patch
[jira] [Commented] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster
[ https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766434#comment-16766434 ] Jason Gerlowski commented on SOLR-13155: Hey [~ab], took a look at your latest patch this morning while preparing to write a Windows equivalent of the {{bin/solr}} bits you just added. One question: You add a {{print_usage}} section for the new autoscaling command...
{code}
+ elif [ "$CMD" == "autoscaling" ]; then
+   echo ""
+   echo "Usage: solr autoscaling [-z zkHost] [-a ] [-s] [-d] [-n] [-r]"
+   echo ""
+   echo "  Calculate autoscaling policy suggestions and diagnostic information, using either the deployed"
+   echo "  autoscaling configuration or the one supplied on the command line. This calculation takes place"
+   echo "  on the client-side without affecting the running cluster except for fetching the node and replica"
+   echo "  metrics from the cluster. For detailed usage instructions, do:"
+   echo ""
+   echo "    bin/solr autoscaling -help"
+   echo ""
{code}
But I can't figure out what command would actually trigger this help text. The "autoscaling" command defers parsing its args until Java-land, so any {{-h}}/{{--help}}/etc. argument will trigger the commons-cli generated help text instead:
{code}
➜ solr git:(master) ✗ bin/solr autoscaling -h
INFO  - 2019-02-12 15:33:01.434; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Failed to parse command-line arguments due to: Unrecognized option: -h
usage: org.apache.solr.util.SolrCLI
 -a,--config          Autoscaling config file, defaults to the one deployed in the cluster.
 -all                 Turn on all options to get all available information.
 -c,--clusterState    Show ClusterState (collections layout)
 -d,--diagnostics     Show calculated diagnostics
 -help                Print this message
 -n,--sortedNodes     Show sorted nodes with diagnostics
 -r,--redact          Redact node and collection names (original names will be consistently randomized)
 -s,--suggestions     Show calculated suggestions
 -stats               Show summarized collection & node statistics.
 -verbose             Generate verbose log messages
 -zkHost              Address of the Zookeeper ensemble; defaults to: localhost:9983
{code}
Am I missing some command that manages to trigger that help text, or is it dead code that we can remove or change? (I'm only asking so I know whether to include similar help text in the solr.cmd version. If the {{bin/solr}} help text is dead code, I'm happy to remove it for you when I commit the plumbing on the Windows side tomorrow.)
[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762311#comment-16762311 ] Jason Gerlowski commented on SOLR-13042: Thanks for the double-check Mikhail. I didn't want to squeeze this into branch_7_7 with the release going out the door. I guess the ref-guide is built separately, but it still seemed last minute. Anyways, I've committed this everywhere else I wanted to, so I'll mark this as closed. > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, 8.0 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Commented] (SOLR-12330) JSON Facet syntax errors are responded as runtime exceptions with 500 code
[ https://issues.apache.org/jira/browse/SOLR-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762313#comment-16762313 ] Jason Gerlowski commented on SOLR-12330: LGTM. I _think_ you should be able to drop the {{json-facet-api.adoc}} changes from the patch, as I have that information covered already in some recent tweaks I made to the JSON faceting docs over on SOLR-13042. But worth double checking me on that, as there might be a detail I missed. > JSON Facet syntax errors are responded as runtime exceptions with 500 code > -- > > Key: SOLR-12330 > URL: https://issues.apache.org/jira/browse/SOLR-12330 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Affects Versions: 7.3 >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: SOLR-12330-combined.patch, SOLR-12330.patch, > SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, > SOLR-12330.patch, SOLR-12330.patch > > > Just encounter such weird behaviour, will recheck and followup. > \{{"filter":["\{!v=$bogus}"]}} responds back with just NPE which makes > impossible to guess the reason. > -It might be even worse, since- \{{"filter":[\{"param":"bogus"}]}} seems > like just silently ignored. Turns out it's ok see SOLR-9682
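For reference, the problematic request from the issue description can be written out as a small standalone sketch. The facet name ("prices") and the surrounding structure are hypothetical; only the {!v=$bogus} filter comes from the report:

```java
public class BogusFilterRequest {
    public static void main(String[] args) {
        // A JSON facet body whose domain filter references an undefined
        // parameter ($bogus). Affected Solr versions answered this syntax
        // error with a bare NullPointerException and HTTP 500 instead of
        // a 400 "bad request" carrying a usable message.
        String body = String.join("\n",
            "{",
            "  \"query\": \"*:*\",",
            "  \"facet\": {",
            "    \"prices\": {",
            "      \"type\": \"query\",",
            "      \"q\": \"*:*\",",
            "      \"domain\": { \"filter\": [\"{!v=$bogus}\"] }",
            "    }",
            "  }",
            "}");
        System.out.println(body);
    }
}
```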
[jira] [Comment Edited] (SOLR-12330) JSON Facet syntax errors are responded as runtime exceptions with 500 code
[ https://issues.apache.org/jira/browse/SOLR-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762313#comment-16762313 ] Jason Gerlowski edited comment on SOLR-12330 at 2/7/19 2:37 AM: LGTM. I _think_ you should be able to drop the {{json-facet-api.adoc}} changes from the patch, as I have that information covered already in some recent tweaks I made to the JSON faceting docs over on SOLR-13042. But worth double checking me on that, as there might be a detail I missed. was (Author: gerlowskija): LGTM. I _think_ you should be able to drop the {{json-facet-api.adoc}} changes from the patch, as I have that information covered already in some recent tweaks I made the the JSON faceting docs over on SOLR-13042. But worth double checking me on that, as there might be a detail I missed. > JSON Facet syntax errors are responded as runtime exceptions with 500 code > -- > > Key: SOLR-12330 > URL: https://issues.apache.org/jira/browse/SOLR-12330 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Affects Versions: 7.3 >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: SOLR-12330-combined.patch, SOLR-12330.patch, > SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, > SOLR-12330.patch, SOLR-12330.patch > > > Just encounter such weird behaviour, will recheck and followup. > \{{"filter":["\{!v=$bogus}"]}} responds back with just NPE which makes > impossible to guess the reason. > -It might be even worse, since- \{{"filter":[\{"param":"bogus"}]}} seems > like just silently ignored. Turns out it's ok see SOLR-9682
[jira] [Resolved] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13042. Resolution: Done > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, 8.0 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Commented] (SOLR-13174) NPE in Json Facet API for Facet range
[ https://issues.apache.org/jira/browse/SOLR-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762257#comment-16762257 ] Jason Gerlowski commented on SOLR-13174: Ok, I'll leave it in Mikhail's hands over there and will close this out. Thanks for the heads up, and for putting the legwork in! > NPE in Json Facet API for Facet range > - > > Key: SOLR-13174 > URL: https://issues.apache.org/jira/browse/SOLR-13174 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Munendra S N >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13174.patch > > > There is mismatch in the error and status code between JSON facet's facet > range and Classical facet range. > When start or end or gap is not specified in the request, Classical faceting > returns Bad request where as JSON facet returns 500 without below trace > {code:java} > { > "trace": "java.lang.NullPointerException\n\tat > org.apache.solr.search.facet.FacetRangeProcessor.createRangeList(FacetRange.java:216)\n\tat > > org.apache.solr.search.facet.FacetRangeProcessor.getRangeCounts(FacetRange.java:206)\n\tat > > org.apache.solr.search.facet.FacetRangeProcessor.process(FacetRange.java:98)\n\tat > > org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:460)\n\tat > > org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:407)\n\tat > > org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)\n\tat > org.apache.solr.search.facet.FacetModule.process(FacetModule.java:154)\n\tat > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat > org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\n\tat > 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat > 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat > > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat > java.lang.Thread.run(Thread.java:748)\n", > "code": 500 > } > {code}
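One way to turn the NPE above into the 400 that classical faceting already returns is to validate the required range params up front, before any range computation happens. Below is a minimal JDK-only sketch of that idea; it is not Solr's actual code, and the helper and map layout are hypothetical (in Solr the check would throw a SolrException with ErrorCode.BAD_REQUEST):

```java
import java.util.HashMap;
import java.util.Map;

public class RangeFacetParamCheck {
    // Hypothetical helper: fail fast with a clear message when a required
    // range-facet parameter (start/end/gap) is absent, instead of letting
    // a NullPointerException escape later as an HTTP 500.
    static Object requireParam(Map<String, Object> facet, String name) {
        Object value = facet.get(name);
        if (value == null) {
            // In Solr this would be: new SolrException(ErrorCode.BAD_REQUEST, ...)
            throw new IllegalArgumentException(
                "Missing required parameter '" + name + "' for range facet");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> facet = new HashMap<>();
        facet.put("field", "price");
        facet.put("start", 0);
        // "end" and "gap" omitted, as in the request from the bug report
        try {
            for (String p : new String[] {"start", "end", "gap"}) {
                requireParam(facet, p);
            }
        } catch (IllegalArgumentException e) {
            System.out.println("400 Bad Request: " + e.getMessage());
        }
    }
}
```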
[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760886#comment-16760886 ] Jason Gerlowski commented on SOLR-13042: Going to merge this later today if no one has any feedback on the structure or wording. > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, 8.0 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Assigned] (SOLR-13174) NPE in Json Facet API for Facet range
[ https://issues.apache.org/jira/browse/SOLR-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13174: -- Assignee: Jason Gerlowski > NPE in Json Facet API for Facet range > - > > Key: SOLR-13174 > URL: https://issues.apache.org/jira/browse/SOLR-13174 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Munendra S N >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13174.patch > > > There is mismatch in the error and status code between JSON facet's facet > range and Classical facet range. > When start or end or gap is not specified in the request, Classical faceting > returns Bad request where as JSON facet returns 500 without below trace > {code:java} > { > "trace": "java.lang.NullPointerException\n\tat > org.apache.solr.search.facet.FacetRangeProcessor.createRangeList(FacetRange.java:216)\n\tat > > org.apache.solr.search.facet.FacetRangeProcessor.getRangeCounts(FacetRange.java:206)\n\tat > > org.apache.solr.search.facet.FacetRangeProcessor.process(FacetRange.java:98)\n\tat > > org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:460)\n\tat > > org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:407)\n\tat > > org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)\n\tat > org.apache.solr.search.facet.FacetModule.process(FacetModule.java:154)\n\tat > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat > org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat > > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat > > 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat > > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat > java.lang.Thread.run(Thread.java:748)\n", > "code": 500 > } > {code}
[jira] [Commented] (SOLR-9515) Update to Hadoop 3
[ https://issues.apache.org/jira/browse/SOLR-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756413#comment-16756413 ] Jason Gerlowski commented on SOLR-9515: --- Took a quick look. I see the biggest part of the patch (other than license changes) is the HttpServer2 class you added. But I couldn't trace out how HttpServer2 gets invoked. Nothing calls the Builder in that class, AFAICT. What am I missing? Other than that question, everything looks good so far to me at least. > Update to Hadoop 3 > -- > > Key: SOLR-9515 > URL: https://issues.apache.org/jira/browse/SOLR-9515 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Mark Miller >Assignee: Kevin Risden >Priority: Major > Fix For: 8.0, master (9.0) > > Attachments: SOLR-9515.patch, SOLR-9515.patch, SOLR-9515.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Hadoop 3 is not out yet, but I'd like to iron out the upgrade to be prepared. > I'll start up a dev branch.
[jira] [Resolved] (SOLR-13177) aboul SOLR-5480
[ https://issues.apache.org/jira/browse/SOLR-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13177. Resolution: Invalid Closing this ticket as "invalid". [~phoema], Solr's JIRA instance is for tracking bugs, not for use as a support portal or for asking questions about JIRAs that already exist. (There's nothing wrong with those questions, they just don't belong here. Try asking if anyone has any updates on SOLR-5480 itself. If no one answers, that likely means no one has any updates that aren't already on that issue.) > aboul SOLR-5480 > --- > > Key: SOLR-13177 > URL: https://issues.apache.org/jira/browse/SOLR-13177 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.6 >Reporter: phoema >Priority: Blocker > > I have the same problem as Issue SOLR-5480. When will this issue be solved? > https://issues.apache.org/jira/browse/SOLR-5480
[jira] [Commented] (SOLR-13162) Admin UI development-test cycle is slow
[ https://issues.apache.org/jira/browse/SOLR-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748882#comment-16748882 ] Jason Gerlowski commented on SOLR-13162: It depends what files you're editing, but I think there is an ant command for repackaging the admin-ui alone. You should be able to run {{ant dist}} from the {{solr/webapp}} dir. Could totally be misunderstanding what you're after here, or maybe {{ant dist}} is deficient in some way. Just wanted to mention it on the off chance that's what you're looking for. > Admin UI development-test cycle is slow > --- > > Key: SOLR-13162 > URL: https://issues.apache.org/jira/browse/SOLR-13162 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Jeremy Branham >Priority: Minor > > When developing the admin user interface, it takes a long time to rebuild the > server to do testing. > It would be nice to have a small test harness for the admin ui, so that 'ant > server' doesn't need to be executed before testing changes.
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748812#comment-16748812 ] Jason Gerlowski commented on SOLR-13116: I guess I'm fine with that. I'm not sure what information we'd add that wouldn't be a restatement of the instructions already on the login page. Probably worth double checking that this is given a good description in CHANGES.txt though, since it's such a visible change for anyone using auth. > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.0, 7.7 >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.0, 7.7 > > Attachments: SOLR-13116.patch, SOLR-13116.patch, eventual_auth.png, > improved_login_page.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743048#comment-16743048 ] Jason Gerlowski commented on SOLR-13116: Oh shoot, I missed your last comment, sorry. I don't remember if there was a refguide link or not. Maybe there was a link there but my browser had issues with it for some reason? I'll take a look again today with your latest patch and let you know. Hopefully we can get this cleared up. > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.0, 7.7 >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.0, 7.7 > > Attachments: SOLR-13116.patch, SOLR-13116.patch, eventual_auth.png, > improved_login_page.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740479#comment-16740479 ] Jason Gerlowski commented on SOLR-13116: Just got a chance to test your patch. Things look better (for Kerberos at least). I've attached a screenshot showing the result: !improved_login_page.png! > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.0, 7.7 >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.0, 7.7 > > Attachments: SOLR-13116.patch, eventual_auth.png, > improved_login_page.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Updated] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13116: --- Attachment: improved_login_page.png > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.0, 7.7 >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.0, 7.7 > > Attachments: SOLR-13116.patch, eventual_auth.png, > improved_login_page.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13042: --- Attachment: SOLR-13042.patch > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, 8.0 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737238#comment-16737238 ] Jason Gerlowski commented on SOLR-13116: Thanks for the pointers Kevin; will check them out. [~janhoy] I reproduced again this morning and saw the following error in my browser's web console. I'm not familiar enough with how the login page is implemented to tell if it's helpful. But hopefully you find it enlightening: {code} Error: wwwHeader is null @http://solr1:8983/solr/js/angular/controllers/login.js:31:11 invoke@http://solr1:8983/solr/libs/angular.js:4205:14 instantiate@http://solr1:8983/solr/libs/angular.js:4213:27 $ControllerProvider/this.$gethttp://solr1:8983/solr/libs/angular.js:8472:18 link@http://solr1:8983/solr/libs/angular-route.min.js:30:268 invokeLinkFn@http://solr1:8983/solr/libs/angular.js:8236:9 nodeLinkFn@http://solr1:8983/solr/libs/angular.js:7745:11 compositeLinkFn@http://solr1:8983/solr/libs/angular.js:7098:13 publicLinkFn@http://solr1:8983/solr/libs/angular.js:6977:30 boundTranscludeFn@http://solr1:8983/solr/libs/angular.js:7116:16 controllersBoundTransclude@http://solr1:8983/solr/libs/angular.js:7772:18 x@http://solr1:8983/solr/libs/angular-route.min.js:29:364 $broadcast@http://solr1:8983/solr/libs/angular.js:14725:15 m/<@http://solr1:8983/solr/libs/angular-route.min.js:34:426 processQueue@http://solr1:8983/solr/libs/angular.js:13193:27 scheduleProcessQueue/<@http://solr1:8983/solr/libs/angular.js:13209:27 $eval@http://solr1:8983/solr/libs/angular.js:14406:16 $digest@http://solr1:8983/solr/libs/angular.js:14222:15 $apply@http://solr1:8983/solr/libs/angular.js:14511:13 done@http://solr1:8983/solr/libs/angular.js:9669:36 completeRequest@http://solr1:8983/solr/libs/angular.js:9859:7 requestLoaded@http://solr1:8983/solr/libs/angular.js:9800:9 {code} There's nothing that appears relevant in {{solr.log}}. As for why your kinit command just hung, I've got a guess. 
Docker on Linux allows the host machine to reach docker containers by IP address. But docker on Mac [doesn't|https://docs.docker.com/docker-for-mac/networking/#per-container-ip-addressing-is-not-possible]. Since running {{kinit}} on the host machine (your MacBook) has it try to talk to the Kerberos KDC server by IP address, {{kinit}} just hangs because it can't route to the docker container hosting the KDC. That's my theory at least. If you give it a shot on a Linux box, I bet it'll work for you. Anyway, hopefully you can reproduce it on your own. But if you still can't reproduce, or want a double check that a fix works, happy to run the reproduction again. > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.0, 7.7 >Reporter: Jan Høydahl >Priority: Major > Attachments: eventual_auth.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
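The routing theory above is easy to sanity-check from the host with a plain TCP probe against the KDC's address before blaming {{kinit}} itself. The host and port below are assumptions (88 is the conventional Kerberos KDC port); on Docker for Mac a probe at a container IP would time out rather than connect:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class KdcReachability {
    public static void main(String[] args) throws Exception {
        // Hypothetical KDC address; pass the container's IP as the first
        // argument when testing the Docker-for-Mac routing theory.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = 88; // conventional Kerberos KDC port
        try (Socket s = new Socket()) {
            // A short timeout distinguishes "no route / silently dropped"
            // (times out, like the hanging kinit) from "refused" or "open".
            s.connect(new InetSocketAddress(host, port), 2000);
            System.out.println("reachable");
        } catch (Exception e) {
            System.out.println("unreachable: " + e.getClass().getSimpleName());
        }
    }
}
```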
[jira] [Comment Edited] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI
[ https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185 ] Jason Gerlowski edited comment on SOLR-12613 at 1/8/19 2:51 PM: Why not both? I think there's general consensus that we would love to improve the UI in larger ways, but any larger effort is bound to take longer to get going (particularly when few committers are familiar with the UI). If renaming this menu tab helps our users in the interim, and there's going to be at least one release before a broader effort might address this, I think people should feel welcome to take it on if they've got time. was (Author: gerlowskija): Why not both? I think there's general consensus that we would love to improve the UI in larger ways, but any larger effort is bound to take longer to get going (particularly when few committers are familiar with the UI). If renaming this menu tab helps our users in the interim, and there's going to be at least one release before a broader effort might address this, I think people should feel welcome to take it on. > Rename "Cloud" tab as "Cluster" in Admin UI > --- > > Key: SOLR-12613 > URL: https://issues.apache.org/jira/browse/SOLR-12613 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Jan Høydahl >Priority: Major > Labels: newdev > Fix For: 8.0 > > > Spinoff from SOLR-8207. When adding more cluster-wide functionality to the > Admin UI, it feels better to name the "Cloud" UI tab as "Cluster". > In addition to renaming the "Cloud" tab, we should also change the URL part > from {{~cloud}} to {{~cluster}}, update reference guide page names, > screenshots and references etc. > I propose this change is not introduced in 7.x due to the impact, so tagged > it as fix-version 8.0. 
[jira] [Commented] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI
[ https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185 ] Jason Gerlowski commented on SOLR-12613: Why not both? I think there's general consensus that we would love to improve the UI in larger ways, but any larger effort is bound to take longer to get going (particularly when few committers are familiar with the UI). If renaming this menu tab helps our users in the interim, and there's going to be at least one release before a broader effort might address this, I think people should feel welcome to take it on. > Rename "Cloud" tab as "Cluster" in Admin UI > --- > > Key: SOLR-12613 > URL: https://issues.apache.org/jira/browse/SOLR-12613 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Jan Høydahl >Priority: Major > Labels: newdev > Fix For: 8.0 > > > Spinoff from SOLR-8207. When adding more cluster-wide functionality to the > Admin UI, it feels better to name the "Cloud" UI tab as "Cluster". > In addition to renaming the "Cloud" tab, we should also change the URL part > from {{~cloud}} to {{~cluster}}, update reference guide page names, > screenshots and references etc. > I propose this change is not introduced in 7.x due to the impact, so tagged > it as fix-version 8.0.
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735903#comment-16735903 ] Jason Gerlowski commented on SOLR-13116: Ok, that makes sense. The page would be much more appropriate if only the bottom section appeared, as you indicate is "expected". I'll retry this afternoon when I get a few spare minutes and see if there are any particularly helpful errors in the browser console. I didn't see anything interesting in solr.log previously, fwiw. Yeah, improving the message for Kerberos to be close to what you suggested would be a big improvement IMO. I'd suggest a slight rewording...there are two main things that can go wrong with Kerberos in the browser, and it'd be helpful to mention both of them a bit more explicitly. I'd suggest something like: "Your browser did not provide the required information to authenticate using Kerberos. Please check that your computer has a valid ticket for communicating with Solr, and that your browser is properly configured to provide that ticket when required. For more information consult Solr's Kerberos documentation[link]. The response from the server was: <..>" > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (9.0), 7.7 >Reporter: Jan Høydahl >Priority: Major > Attachments: eventual_auth.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Comment Edited] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735833#comment-16735833 ] Jason Gerlowski edited comment on SOLR-13116 at 1/7/19 1:53 PM: Hey Jan, I just tested your login screen with Kerberos (this includes the changes you made an hour or so ago, to clarify). This is the behavior I'm seeing: 1. With a Kerberos ticket in my local ticket cache, I can get to the admin UI and perform operations without ever seeing a login screen. The admin UI is definitely usable. 2. If I destroy my Kerberos ticket or it expires, subsequent navigation or operations will produce a username/password login page. 3. If my machine acquires a valid ticket, I can then click on the 'Dashboard' menu item to get away from the login page and back to the dashboard. So in summary, the Admin UI is definitely usable when Kerberos auth is being used. But that said, the login/auth page still seems a little BasicAuth-specific, and inappropriate for other auth schemes. Some specific issues: 1. We probably shouldn't be displaying {{username}} and {{password}} dialog boxes unless we're sure the user is using an auth scheme where those values make sense (they don't in Kerberos, for example). 2. Some other terms on the page also seem a little too Basic Auth-specific to be useful for other auth schemes. "Login/Logout" might be examples of this - those terms are rarely used when discussing Kerberos authentication. Not entirely sure on this though. 3. It looks like when Kerberos is used, several templated values needed for the auth page are missing, causing UI errors. Not familiar with how the UI works, so I may be off on the cause here. I've attached a screenshot below of the UI errors for the auth page on {{master}} !eventual_auth.png! As for Kerberos/Solr testing, I recently came across a writeup/helper-repo that Ishan put together a year or two ago. If you've got docker installed, it makes setting up and testing Kerberos refreshingly straightforward. Give it a shot if you get a chance: https://github.com/chatman/solr-kerberos-docker > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (8.0), 7.7 >Reporter: Jan Høydahl >Priority: Major > Attachments: eventual_auth.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735833#comment-16735833 ] Jason Gerlowski commented on SOLR-13116: Hey Jan, I just tested your login screen with Kerberos (this includes the changes you made an hour or so ago, to clarify). This is the behavior I'm seeing: 1. With a Kerberos ticket in my local ticket cache, I can get to the admin UI and perform operations without ever seeing a login screen. The admin UI is definitely usable. 2. If I destroy my Kerberos ticket or it expires, subsequent navigation or operations will produce a username/password login page. 3. If my machine acquires a valid ticket, I can then click on the 'Dashboard' menu item to get away from the login page and back to the dashboard. So in summary, the Admin UI is definitely usable when Kerberos auth is being used. But that said, the login/auth page still seems a little BasicAuth-specific, and inappropriate for other auth schemes. Some specific issues: 1. We probably shouldn't be displaying {{username}} and {{password}} dialog boxes unless we're sure the user is using an auth scheme where those values make sense (they don't in Kerberos, for example). 2. Some other terms on the page also seem a little too Basic Auth-specific to be useful for other auth schemes. "Login/Logout" might be examples of this - those terms are rarely used when discussing Kerberos authentication. Not entirely sure on this though. 3. It looks like when Kerberos is used, several templated values needed for the auth page are missing, causing UI errors. Not familiar with how the UI works, so I may be off on the cause here. I've attached a screenshot below of the UI errors for the auth page on {{master}} !eventual_auth.png! As for Kerberos/Solr testing, I recently came across a writeup/helper-repo that Ishan put together a year or two ago. If you've got docker installed, it makes setting up and testing Kerberos refreshingly straightforward.
Give it a shot if you get a chance: https://github.com/chatman/solr-kerberos-docker > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (8.0), 7.7 >Reporter: Jan Høydahl >Priority: Major > Attachments: eventual_auth.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support.
[jira] [Updated] (SOLR-13116) Add Admin UI login support for Kerberos
[ https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13116: --- Attachment: eventual_auth.png > Add Admin UI login support for Kerberos > --- > > Key: SOLR-13116 > URL: https://issues.apache.org/jira/browse/SOLR-13116 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (8.0), 7.7 >Reporter: Jan Høydahl >Priority: Major > Attachments: eventual_auth.png > > > Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login > support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7896) Add a login page for Solr Administrative Interface
[ https://issues.apache.org/jira/browse/SOLR-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-7896: -- Attachment: eventual_auth.png > Add a login page for Solr Administrative Interface > -- > > Key: SOLR-7896 > URL: https://issues.apache.org/jira/browse/SOLR-7896 > Project: Solr > Issue Type: New Feature > Components: Admin UI, Authentication, security >Affects Versions: 5.2.1 >Reporter: Aaron Greenspan >Assignee: Jan Høydahl >Priority: Major > Labels: authentication, login, password > Fix For: master (8.0), 7.7 > > Attachments: SOLR-7896-bugfix-7jan.patch, > SOLR-7896-bugfix-7jan.patch, dispatchfilter-code.png, eventual_auth.png, > login-page.png, login-screen-2.png, logout.png, unknown_scheme.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Now that Solr supports Authentication plugins, the missing piece is to be > allowed access from Admin UI when authentication is enabled. For this we need > * Some plumbing in Admin UI that allows the UI to detect 401 responses and > redirect to login page > * Possibility to have multiple login pages depending on auth method and > redirect to the correct one > * [AngularJS HTTP > interceptors|https://docs.angularjs.org/api/ng/service/$http#interceptors] to > add correct HTTP headers on all requests when user is logged in > This issue should aim to implement some of the plumbing mentioned above, and > make it work with Basic Auth. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13045. Resolution: Fixed Fix Version/s: 7.6.1 7.7 master (8.0) > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Fix For: master (8.0), 7.7, 7.6.1 > > Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732443#comment-16732443 ] Jason Gerlowski commented on SOLR-13045: fucit.org reports zero failures in the past week, so I think we can call this done. I'm going to backport the fixes to branch_7_6 tonight, in case there's interest in a 7.6.1 at some point, and then I'll be closing this out. > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13090) Make maxBooleanClauses support system-property override
[ https://issues.apache.org/jira/browse/SOLR-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13090. Resolution: Fixed Fix Version/s: 7.7 master (8.0) > Make maxBooleanClauses support system-property override > --- > > Key: SOLR-13090 > URL: https://issues.apache.org/jira/browse/SOLR-13090 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (8.0), 7.7 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Fix For: master (8.0), 7.7 > > Attachments: SOLR-13090.patch > > > Currently, the {{maxBooleanClauses}} property is specified in most > solrconfig's as the hardcoded value "1024". It'd be nice if we changed our > shipped configs so that they instead specified it as > {{${solr.max.booleanClauses:1024} This would maintain the current OOTB behavior (maxBooleanClauses would still > default to 1024) while adding the ability to update maxBooleanClauses values > across the board much more easily. (I see users want to do this often when > they first run up against this limit.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6595) Improve error response in case distributed collection cmd fails
[ https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-6595: - Assignee: (was: Jason Gerlowski) > Improve error response in case distributed collection cmd fails > --- > > Key: SOLR-6595 > URL: https://issues.apache.org/jira/browse/SOLR-6595 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.10 > Environment: SolrCloud with Client SSL >Reporter: Sindre Fiskaa >Priority: Minor > Attachments: SOLR-6595.patch > > > Followed the description > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a > self signed key pair. Configured a few solr-nodes and used the collection api > to crate a new collection. -I get error message when specify the nodes with > the createNodeSet param. When I don't use createNodeSet param the collection > gets created without error on random nodes. Could this be a bug related to > the createNodeSet param?- *Update: It failed due to what turned out to be > invalid client certificate on the overseer, and returned the following > response:* > {code:xml} > > 0 name="QTime">185 > > org.apache.solr.client.solrj.SolrServerException:IOException occured > when talking to server at: https://vt-searchln04:443/solr > > > {code} > *Update: Three problems:* > # Status=0 when the cmd did not succeed (only ZK was updated, but cores not > created due to failing to connect to shard nodes to talk to core admin API). > # The error printed does not tell which action failed. Would be helpful to > either get the msg from the original exception or at least some message > saying "Failed to create core, see log on Overseer > # State of collection is not clean since it exists as far as ZK is concerned > but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. > Should Overseer detect error in distributed cmds and rollback changes already > made in ZK? 
[jira] [Assigned] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart
[ https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13038: -- Assignee: (was: Jason Gerlowski) I hope to revisit this soon, but don't have time to focus on it in the immediate future. So I'm removing myself as the assignee. I still think this is an important issue to fix though, as it's a continuing contributor to test flakiness, as well as production behavior. > Overseer actions fail with NoHttpResponseException following a node restart > --- > > Key: SOLR-13038 > URL: https://issues.apache.org/jira/browse/SOLR-13038 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Priority: Major > Attachments: SOLR-13038.patch > > > I noticed recently that a lot of overseer operations fail if they're executed > right after a restart of a Solr node. The failure returns a message like > "org.apache.solr.client.solrj.SolrServerException:IOException occured when > talking to server at: https://127.0.0.1:62253/solr;. The logs are a bit more > helpful: > {code} > org.apache.solr.client.solrj.SolrServerException: IOException occured when > talking to server at: https://127.0.0.1:62253/solr > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) > ~[java/:?] > at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) > ~[java/:?] > at > org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) > ~[java/:?] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_172] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > ~[metrics-core-3.2.6.jar:3.2.6] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > ~[java/:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_172] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] > Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to > respond > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) > ~[java/:?] 
> at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > ~[httpclient-4.5.6.jar:4.5.6] > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) > ~[httpclient-4.5.6.jar:4.5.6] >
[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails
[ https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732042#comment-16732042 ] Jason Gerlowski commented on SOLR-6595: --- I'm not going to have much time in the immediate future to finish this up, so I wanted to summarize the progress so far: - the latest patch sets the "status" property to 500 when the "failure" list is present and non-empty - because of this, SolrJ will now throw exceptions in failure cases where it previously allowed the request to fail silently. This causes some tests to fail that were passing (incorrectly) before. I investigated a few examples of this, and most were in test setup/cleanup when the expectations were a bit off. There weren't a ton of these failures though and they should be simpler to debug thanks to other recent test flakiness improvements. - I investigated making changes to SolrJ that would attach a NamedList to SolrExceptions thrown because of a 500, but didn't pursue that too far. It's probably a separate JIRA anyways. > Improve error response in case distributed collection cmd fails > --- > > Key: SOLR-6595 > URL: https://issues.apache.org/jira/browse/SOLR-6595 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.10 > Environment: SolrCloud with Client SSL >Reporter: Sindre Fiskaa >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-6595.patch > > > Followed the description > https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a > self signed key pair. Configured a few solr-nodes and used the collection api > to crate a new collection. -I get error message when specify the nodes with > the createNodeSet param. When I don't use createNodeSet param the collection > gets created without error on random nodes. 
Could this be a bug related to > the createNodeSet param?- *Update: It failed due to what turned out to be > invalid client certificate on the overseer, and returned the following > response:* > {code:xml} > > 0 name="QTime">185 > > org.apache.solr.client.solrj.SolrServerException:IOException occured > when talking to server at: https://vt-searchln04:443/solr > > > {code} > *Update: Three problems:* > # Status=0 when the cmd did not succeed (only ZK was updated, but cores not > created due to failing to connect to shard nodes to talk to core admin API). > # The error printed does not tell which action failed. Would be helpful to > either get the msg from the original exception or at least some message > saying "Failed to create core, see log on Overseer > # State of collection is not clean since it exists as far as ZK is concerned > but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. > Should Overseer detect error in distributed cmds and rollback changes already > made in ZK? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
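The status-code rule described in the comment above (set "status" to 500 whenever the "failure" list is present and non-empty, so SolrJ raises an exception instead of letting the request fail silently) can be sketched roughly as follows. This is an illustration, not the actual patch: the class and method names are made up, and a plain Map stands in for Solr's NamedList response.

```java
import java.util.List;
import java.util.Map;

public class FailureStatus {
    // Hypothetical helper mirroring the rule from the comment: a non-empty
    // "failure" list in the collection-API response means the distributed
    // command did not fully succeed, so report HTTP 500 rather than 0.
    static int statusFor(Map<String, ?> response) {
        Object failures = response.get("failure");
        if (failures instanceof List && !((List<?>) failures).isEmpty()) {
            return 500; // at least one per-node operation failed
        }
        return 0; // keep the old "success" status otherwise
    }

    public static void main(String[] args) {
        // prints 0: only successes reported
        System.out.println(statusFor(Map.of("success", List.of("core1 created"))));
        // prints 500: a failure entry is present
        System.out.println(statusFor(Map.of("failure", List.of("could not create core"))));
    }
}
```

With this rule in place, SolrJ's normal non-2xx handling turns previously silent partial failures into thrown exceptions, which is exactly why some tests with stale expectations started failing.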
[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726913#comment-16726913 ] Jason Gerlowski commented on SOLR-13037: fucit.org hasn't shown any {{branch_7x}} or {{master}} failures for this test since the fix went in last week. So I'm going to mark this as closed. (There are a few branch_7_6 failures, which makes sense since the fix hasn't gone to that branch. I'm happy to add the fix to that branch as well if anyone wants it, but my understanding is that we don't normally do this unless the fix is for a production bug. It might make it marginally easier for anyone cutting a theoretical 7.6.1 to get passing builds, which was apparently a serious problem with 7.6. So I've got mixed feelings, but will hold off for now.) > Harden TestSimGenericDistributedQueue. > -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Fix For: master (8.0), 7.7 > > Attachments: SOLR-13037.patch, repro-log.txt > >
[jira] [Resolved] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13037. Resolution: Fixed Fix Version/s: 7.7 master (8.0) > Harden TestSimGenericDistributedQueue. > -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Fix For: master (8.0), 7.7 > > Attachments: SOLR-13037.patch, repro-log.txt > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13090) Make maxBooleanClauses support system-property override
[ https://issues.apache.org/jira/browse/SOLR-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13090: -- Assignee: Jason Gerlowski > Make maxBooleanClauses support system-property override > --- > > Key: SOLR-13090 > URL: https://issues.apache.org/jira/browse/SOLR-13090 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (8.0), 7.7 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > > Currently, the {{maxBooleanClauses}} property is specified in most > solrconfig's as the hardcoded value "1024". It'd be nice if we changed our > shipped configs so that they instead specified it as > {{${solr.max.booleanClauses:1024} This would maintain the current OOTB behavior (maxBooleanClauses would still > default to 1024) while adding the ability to update maxBooleanClauses values > across the board much more easily. (I see users want to do this often when > they first run up against this limit.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-13090) Make maxBooleanClauses support system-property override
Jason Gerlowski created SOLR-13090: -- Summary: Make maxBooleanClauses support system-property override Key: SOLR-13090 URL: https://issues.apache.org/jira/browse/SOLR-13090 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (8.0), 7.7 Reporter: Jason Gerlowski Currently, the {{maxBooleanClauses}} property is specified in most solrconfig's as the hardcoded value "1024". It'd be nice if we changed our shipped configs so that they instead specified it as {{${solr.max.booleanClauses:1024}
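The {{${solr.max.booleanClauses:1024}}} syntax proposed above is Solr's standard "system property with default" placeholder form. As a rough illustration of how such a placeholder resolves (this sketch is not Solr's actual substitution code, and the class name is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyPlaceholder {
    // Matches ${name:default} and captures the property name and the default.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    // Resolve a config value: if it is a ${name:default} placeholder, look up
    // the system property and fall back to the default; otherwise return as-is.
    static String resolve(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        if (m.matches()) {
            return System.getProperty(m.group(1), m.group(2));
        }
        return value;
    }

    public static void main(String[] args) {
        // Without -Dsolr.max.booleanClauses, the default wins: prints 1024
        System.out.println(resolve("${solr.max.booleanClauses:1024}"));
        // With the property set, the override wins: prints 4096
        System.setProperty("solr.max.booleanClauses", "4096");
        System.out.println(resolve("${solr.max.booleanClauses:1024}"));
    }
}
```

This preserves the out-of-the-box behavior (1024 by default) while letting operators raise the limit everywhere with a single {{-Dsolr.max.booleanClauses=...}} flag instead of editing every solrconfig.xml.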
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725884#comment-16725884 ] Jason Gerlowski commented on SOLR-13045: One of the remaining failures in TestSimPolicyCloud occurs in {{testCreateCollectionAddShardUsingPolicy}} when the initial collection creation (and subsequent shard creation) seems to violate a policy which specifies that all replicas should be created on the same node. After looking closer, it looks like this comes down to a race condition of sorts between two threads attempting to set the autoscaling.json "ZK" node. Two different threads touch the autoscaling config node in this test: the OverseerTriggerThread tries to set the default nodeAdded trigger, and the test code tries to set a policy that the test relies on. These threads rely on optimistic concurrency versioning to ensure that updates don't clobber one another. But SimDistribStateManager has a bug which prevents this from working correctly all the time. The initial node version in the sim framework is -1, which is also the flag used to indicate "I don't care about concurrency, just overwrite the node". (For comparison, ZkDistribStateManager has node versions start at 0.) Depending on timing, this causes the default nodeAdded trigger to clobber the policy that our test relies on, causing it to fail. So one fix that'll make this test (and probably others in the sim framework) more reliable is to ensure that SimDistribStateManager's node-versioning lines up better with ZkDistribStateManager's. Or at least that it avoids this -1 edge case. I've been testing variations of a patch to accomplish this, and will upload my results shortly. > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level.
Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
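The -1 edge case described above can be illustrated with a minimal sketch. This is a hypothetical class, not the real SimDistribStateManager/ZkDistribStateManager API; it only assumes the convention the comment describes, where a guarded write supplies the version it last read and -1 disables the check entirely:

```java
// Hypothetical sketch of why starting node versions at -1 defeats optimistic
// concurrency: -1 doubles as the "overwrite unconditionally" sentinel, so a
// writer that read version -1 and asks for a guarded update silently gets an
// unguarded one, and concurrent writers clobber each other.
class VersionedNode {
    private int version;
    private String data;

    VersionedNode(int initialVersion) { this.version = initialVersion; }

    /** Guarded write: rejects (returns false) on a version mismatch. */
    synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // stale writer is rejected, as intended
        }
        data = newData;
        version++;
        return true;
    }

    synchronized String getData() { return data; }
}
```

With ZK-style versions starting at 0, the second of two racing writers is rejected and can retry; with sim-style versions starting at -1, both writes "succeed" and the first (the test's policy, in the failure above) is lost.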
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724116#comment-16724116 ] Jason Gerlowski commented on SOLR-13045: Checking back in a week later. The work above has cut down the failure rate from 5% to maybe 1-2%, but there are still issues with this test. Attaching a jenkins log containing a current failure from 2 days ago. (Don't want to lose the log when it cycles out of fucit). At first glance, the failure looks like it happens because a replica is created on the wrong node (contrary to a specified policy). Starting to look into things now. > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13045.patch, SOLR-13045.patch > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure.
[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13045: --- Attachment: jenkins.log.txt.gz > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure.
[jira] [Assigned] (SOLR-13078) Harden TestSimNodeAddedTrigger
[ https://issues.apache.org/jira/browse/SOLR-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13078: -- Assignee: Jason Gerlowski > Harden TestSimNodeAddedTrigger > -- > > Key: SOLR-13078 > URL: https://issues.apache.org/jira/browse/SOLR-13078 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > > Jenkins has been failing occasionally with issues in TestSimNodeAddedTrigger. > We should look into these and make it pass more reliably.
[jira] [Created] (SOLR-13078) Harden TestSimNodeAddedTrigger
Jason Gerlowski created SOLR-13078: -- Summary: Harden TestSimNodeAddedTrigger Key: SOLR-13078 URL: https://issues.apache.org/jira/browse/SOLR-13078 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Reporter: Jason Gerlowski Jenkins has been failing occasionally with issues in TestSimNodeAddedTrigger. We should look into these and make it pass more reliably.
[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13042: --- Attachment: SOLR-13042.patch > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch, SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Commented] (SOLR-13065) Harden TestSimExecuteActionPlan
[ https://issues.apache.org/jira/browse/SOLR-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719443#comment-16719443 ] Jason Gerlowski commented on SOLR-13065: When I disable SimClusterStateProvider's caching, the error disappears in a beast run of {{-Dbeast.iters=400 -Dtests.dupes=30 -Dtests.iters=20}}, which implies that the cluster state caching is the only issue, and we'll need to follow a similar fix to SOLR-13045. > Harden TestSimExecuteActionPlan > --- > > Key: SOLR-13065 > URL: https://issues.apache.org/jira/browse/SOLR-13065 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > > TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs. > Would like to look into improving it.
[jira] [Commented] (SOLR-13065) Harden TestSimExecuteActionPlan
[ https://issues.apache.org/jira/browse/SOLR-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719267#comment-16719267 ] Jason Gerlowski commented on SOLR-13065: At first glance, this looks like a similar problem to what I recently saw in SOLR-13045. The test fails in a {{waitForState}} block, but there's some indication that we're using an outdated (cached?) copy of the clusterstatus info. Here's a partial stack from a recent failure I got: {code} [beaster] 2> NOTE: reproduce with: ant test -Dtestcase=TestSimExecutePlanAction -Dtests.method=testIntegration -Dtests.seed=18902C9108C137F1 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=es-GT -Dtests.timezone=Asia/Rangoon -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [beaster] 2> 24745 INFO (simCloudManagerPool-112-thread-8) [] o.a.s.c.CloudTestUtils -- wrong number of active replicas in slice shard1, expected=1, found=2 [beaster] [12:26:46.105] FAILURE 2.13s | TestSimExecutePlanAction.testIntegration {seed=[18902C9108C137F1:7163CC06353074F9]} <<< [beaster]> Throwable #1: java.lang.AssertionError: Timed out waiting for replicas of collection to be 2 again [beaster]> Live Nodes: [127.0.0.1:10016_solr] [beaster]> Last available state: DocCollection(testIntegration//clusterstate.json/444)={ ... [beaster]> at __randomizedtesting.SeedInfo.seed([18902C9108C137F1:7163CC06353074F9]:0) [beaster]> at org.apache.solr.cloud.CloudTestUtils.waitForState(CloudTestUtils.java:70) [beaster]> at org.apache.solr.cloud.autoscaling.sim.TestSimExecutePlanAction.testIntegration(TestSimExecutePlanAction.java:200 ... [beaster]> Caused by: java.util.concurrent.TimeoutException: last ClusterState: znodeVersion: 445 {code} Note the different reported "last" clusterstate versions. We see that there's a clusterstate.json version 445, but the failing assertion only has 444. That's not to say definitively that version 445 would pass the assertion, but it's a place to start. 
> Harden TestSimExecuteActionPlan > --- > > Key: SOLR-13065 > URL: https://issues.apache.org/jira/browse/SOLR-13065 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > > TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs. > Would like to look into improving it.
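The stale-clusterstate symptom above (the assertion seeing clusterstate.json version 444 while 445 already exists) comes down to a wait loop evaluating a cached snapshot. A minimal sketch of the principle (hypothetical helper, not the real CloudTestUtils.waitForState signature): the loop must re-read live state on every iteration, or it can time out even though a passing state already exists:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical sketch: wait until the *live* state version reaches 'wanted'.
// A fresh read happens on every pass; testing a snapshot captured once before
// the loop would reproduce the 444-vs-445 failure described above.
class WaitForState {
    static boolean waitFor(IntSupplier liveVersion, int wanted, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (liveVersion.getAsInt() >= wanted) {  // fresh read each iteration
                return true;
            }
            Thread.sleep(10);
        }
        return false;
    }
}
```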
[jira] [Created] (SOLR-13065) Harden TestSimExecuteActionPlan
Jason Gerlowski created SOLR-13065: -- Summary: Harden TestSimExecuteActionPlan Key: SOLR-13065 URL: https://issues.apache.org/jira/browse/SOLR-13065 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (8.0) Reporter: Jason Gerlowski Assignee: Jason Gerlowski TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs. Would like to look into improving it.
[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719102#comment-16719102 ] Jason Gerlowski commented on SOLR-13037: I've attached a patch which takes approach #2 above. With it, I haven't seen any GDQ test failures, though I'll be more confident with more beasting. Will run some tests in the background the rest of today and then commit tonight if things still look good. > Harden TestSimGenericDistributedQueue. > -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13037.patch, repro-log.txt > >
[jira] [Updated] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13037: --- Attachment: SOLR-13037.patch > Harden TestSimGenericDistributedQueue. > -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13037.patch, repro-log.txt > >
[jira] [Comment Edited] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000 ] Jason Gerlowski edited comment on SOLR-13037 at 12/12/18 1:59 PM: -- To (hopefully) explain things a little more clearly, here's the race condition I think we're running into here. There's a few sections of {{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one in particular. Check out TestSimDistributedQueue lines 73-74: {code} (new QueueChangerThread(dq,1000)).start(); assertNotNull(dq.peek(15000)); {code} This test code has two threads of interest. The QueueChangerThread we see created here will sleep for one second, and then insert data into the queue. Meanwhile the main test thread will wait for some data to be inserted into the queue. Our queue-reading waits a pretty generous amount of time for things to enter the queue, so the insert should always finish in time and the read should always pick it up. Some more detail on what happens in each queue operation: first the queue-write (i.e. 
{{offer()}}): - [Acquire lock 'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461] - [Create queue entry node and attach it to parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324] - [Wake up any threads sleeping on the 'changed' Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593] - [Release lock 'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465] - [Set data for queue entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468] Now the queue-read. Queue-reading works off of a cache of "known queue entries" and most queue-reads are handled from there. But the test failure only occurs when we need to refresh this cache and read straight from ZK, so I'll skip the cache logic here. 
- [Acquire lock 'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186] - [loop until we're out of time to wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189] ** [look for an element and return if non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190] ** [sleep until we receive a wakeup from 'changed' Condition or we time out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194] - [Release lock 'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198] There's a problem with the queue-write code above. We wake up threads after creating the queue-entry, but before it's fully initialized with its data. This opens the door to readers seeing the data before it's fully ready and going back to sleep. The 'changed' signalling has already happened, so any readers that see the data too early will go back to sleep and not wake up again until timeout. There's a few ways we can fix this: - we could add a `changed.signalAll()` call at the end of {{offer()}}, to ensure that there's at least 1 wakeup after the data has been fully added. 
- we can alter the flow of SimDistribStateManager.createData so that the node is only attached to the tree after its data has been fully initialized - we could register a Watcher that triggers on "data-changed", similar to how we already trigger a watcher on "child-added" I think the second option is probably the "right" fix here, so I'll pursue that unless others have other opinions. was (Author: gerlowskija): To (hopefully) explain things a little more clearly, here's the race condition I think we're running into here. There's a few sections of {{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one in particular. Check out TestSimDistributedQueue lines 73-74: {code} (new QueueChangerThread(dq,1000)).start();
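To make the ordering concrete, here's a hedged sketch of the publish-then-signal discipline the fix options above aim for. SketchQueue is a hypothetical class, not the real GenericDistributedQueue; a String stands in for the node whose data was set too late in the real code. The point is that an entry becomes visible and 'changed' is signalled only after the entry is fully built, so a woken reader can never observe a half-initialized entry and then sleep through its only wakeup:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a safe offer()/peek() pair: fully initialize the
// entry *before* publishing it and signalling 'changed'.
class SketchQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final Deque<String> entries = new ArrayDeque<>();

    void offer(String data) {
        // In the real code this is: create the node AND set its data first,
        // then attach it to the tree and signal, all before any reader can see it.
        lock.lock();
        try {
            entries.addLast(data);   // publish the completed entry
            changed.signalAll();     // wake readers only after publishing
        } finally {
            lock.unlock();
        }
    }

    String peek(long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (entries.isEmpty() && nanos > 0) {
                nanos = changed.awaitNanos(nanos); // re-check after each wakeup
            }
            return entries.peekFirst(); // null only on a genuine timeout
        } finally {
            lock.unlock();
        }
    }
}
```

Note the reader re-checks the queue in a loop after each wakeup; combined with signal-after-publish, that rules out both spurious wakeups and the lost-wakeup scenario above.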
[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000 ] Jason Gerlowski commented on SOLR-13037: To (hopefully) explain things a little more clearly, here's the race condition I think we're running into here. There's a few sections of {{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one in particular. Check out TestSimDistributedQueue lines 73-74: {code} (new QueueChangerThread(dq,1000)).start(); assertNotNull(dq.peek(15000)); {code} This test code has two threads of interest. The QueueChangerThread we see created here will sleep for one second, and then insert data into the queue. Meanwhile the main test thread will wait for some data to be inserted into the queue. Our queue-reading waits a pretty generous amount of time for things to enter the queue, so the insert should always finish in time and the read should always pick it up. Some more detail on what happens in each queue operation: first the queue-write (i.e. 
{{offer()}}): - [Acquire lock 'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461] - [Create queue entry node and attach it to parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324] - [Wake up any threads sleeping on the 'changed' Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593] - [Release lock 'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465] - [Set data for queue entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468] Now the queue-read. Queue-reading works off of a cache of "known queue entries" and most queue-reads are handled from there. But the test failure only occurs when we need to refresh this cache and read straight from ZK, so I'll skip the cache logic here. 
- [Acquire lock 'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186] - [loop until we're out of time to wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189] ** [look for an element and return if non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190] ** [sleep until we receive a wakeup from 'changed' Condition or we time out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194] - [Release lock 'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198] There's a problem with the queue-write code above. We wake up threads after creating the queue-entry, but before it's fully initialized with its data. This opens the door to readers seeing the data before it's fully ready and going back to sleep. The 'changed' signalling has already happened, so any readers that see the data too early will go back to sleep and not wake up again until timeout. There's a few ways we can fix this: - we could add a `changed.signalAll()` call at the end of {{offer()}}, to ensure that there's at least 1 wakeup after the data has been fully added. - we can alter the flow of SimDistribStateManager.createData so that the node is only attached to the tree after its data has been fully initialized - we could register a Watcher that triggers on "data-changed", similar to how we already trigger a watcher on "child-added" > Harden TestSimGenericDistributedQueue. 
> -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Attachments: repro-log.txt > >
[jira] [Updated] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13037: --- Attachment: repro-log.txt > Harden TestSimGenericDistributedQueue. > -- > > Key: SOLR-13037 > URL: https://issues.apache.org/jira/browse/SOLR-13037 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Jason Gerlowski >Priority: Major > Attachments: repro-log.txt > >
[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718885#comment-16718885 ] Jason Gerlowski commented on SOLR-13037: I've attached a log file which shows the race condition that causes this to occur. Most of this logging is custom, but it should still be helpful for others trying to understand the problem.
[jira] [Assigned] (SOLR-13037) Harden TestSimGenericDistributedQueue.
[ https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13037: -- Assignee: Jason Gerlowski
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716891#comment-16716891 ] Jason Gerlowski commented on SOLR-13045: Committed this fix to master and branch_7x. In testing, it looked like it also cleared up issues in TestSimExtremeIndexing, so maybe we'll get a fix there for 'free'. I'll keep this open for the next week to check for failures, but I'll close it if things look good after that. > Harden TestSimPolicyCloud > - > > Key: SOLR-13045 > URL: https://issues.apache.org/jira/browse/SOLR-13045 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13045.patch, SOLR-13045.patch > > > Several tests in TestSimPolicyCloud, but especially > {{testCreateCollectionAddReplica}}, have some flaky behavior, even after > Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) > fixing this test failure.
[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13045: --- Attachment: SOLR-13045.patch
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715236#comment-16715236 ] Jason Gerlowski commented on SOLR-13045: I found another bug where SimCloudManager was setting the "nodeRole" property as a single-valued Set, instead of just giving it a String value. This causes things to blow up later when a cast to {{String}} fails. Attached patch includes a small fix for that.
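The nodeRole failure mode described above is easy to demonstrate in miniature. Everything here is illustrative, not SimCloudManager's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Storing a single-valued Set where later code expects a plain String
// makes the downstream cast throw ClassCastException; the fix is to
// store the String value directly.
class NodeRoleCast {
    static String readRole(Map<String, Object> props) {
        return (String) props.get("nodeRole"); // downstream cast to String
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();

        props.put("nodeRole", Set.of("overseer")); // buggy: single-valued Set
        try {
            readRole(props);
        } catch (ClassCastException expected) {
            // this is the blow-up seen later in the test
        }

        props.put("nodeRole", "overseer"); // fixed: plain String value
        System.out.println(readRole(props));
    }
}
```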
[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714973#comment-16714973 ] Jason Gerlowski commented on SOLR-13042: I was able to work on this some over the weekend and wanted to upload my progress for preliminary review. I've made most of the structural changes I mentioned above, including pulling the various types of "domain changes" out into their own ref-guide page. I'm still undecided whether "JSON Faceting" should become a sub-page of the "JSON Request API" or not. Much of my time was spent adding SolrJ snippets to the pages. I've finished this on the "Request API" page and on the "Query DSL" page. Still need to go through that effort on the "Facet API" page and the "Domain Changes" page (new to this patch). I also changed all of the examples in these pages over to using the "techproducts" exampledocs. This will make it easier for readers to try out the examples themselves. (The existing "books" corpus isn't hard to set up, but it's not as easy as {{bin/solr start -e techproducts}}.) I'd say this patch is 80% of what I wanted to change on these pages. Most of the work remaining is additional SolrJ examples and polish/wording tweaks. Would love some feedback if anyone has opinions or time to read through things. > Miscellaneous JSON Facet API docs improvements > -- > > Key: SOLR-13042 > URL: https://issues.apache.org/jira/browse/SOLR-13042 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Affects Versions: 7.5, master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-13042.patch > > > While working on SOLR-12965 I noticed a few minor issues with the JSON > faceting ref-guide pages. Nothing serious, just a few annoyances.
Tweaks > include: > * missing/insufficient description of some params for Heatmap facets > * Weird formatting on "Domain Filters" example > * missing "fields"/"fl" in the "Parameters Mapping" table > Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13042: --- Attachment: SOLR-13042.patch
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713056#comment-16713056 ] Jason Gerlowski commented on SOLR-13045: I've attached a proposed fix for this. With this, all tests in {{TestSimPolicyCloud}} looked good. Ran them ~5000 times. Gonna do some beast runs to trigger things that way, but otherwise things look good here.
[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13045: --- Attachment: SOLR-13045.patch
[jira] [Assigned] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13045: -- Assignee: Jason Gerlowski
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713023#comment-16713023 ] Jason Gerlowski commented on SOLR-13045: I believe I found the race condition causing these failures. It looks like an issue between the {{waitForState}} polling, which occurs in the main test thread, and the leader-election execution, which occurs in a {{Future}} submitted to {{SimCloudManager}}'s ExecutorService. The {{waitForState}} thread repeatedly asks for the cluster state, which looks a bit like this:
* [return cached value, if any. Otherwise continue|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2090]
* [Grab lock|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2093]
* [Clear cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2094]
* [Build Map to store in cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2126]
* [Set cache with Map|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2141]
* [Release lock|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2144]
The leader-election Future looks a bit like this:
* [Give a ReplicaInfo "leader=true"|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L756]
* [Clear cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L766]
Note that the leader-election Future does this without acquiring the lock. Now imagine the following interleaving of these two threads:
* [Thread-Test] Grab lock
* [Thread-Test] Clear cache
* [Thread-Test] Build Map to store in cache
* [Thread-LeaderElection] Give ReplicaInfo "leader=true"
* [Thread-LeaderElection] Clear cache
* [Thread-Test] Set cache with Map
At the end of this interleaving the cache has a value that's missing the latest "leader=true" changes, and nothing will ever clear it. So the {{waitForState}} polling will go on to fail. We should be able to fix this by having the leader-election code use the same Lock used elsewhere. I've actually got this change staged locally and am running tests on it currently. If all looks well I should have this uploaded soon. One thing I'll be curious to see is whether this affects any of the other TestSim* failures we've seen recently. If we're lucky we may get 2 (or more) birds with this one stone.
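The interleaving above can be reproduced in a self-contained sketch. The class and method names below are hypothetical stand-ins for SimClusterStateProvider's cache and the leader-election path, not the real code; the point is that any mutation that clears the cache must hold the same lock the reader holds while rebuilding it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class CachedClusterState {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Map<String, String> cache; // null = invalidated

    /** Read path: return cached value, else rebuild under the lock. */
    Map<String, String> read(Map<String, String> source) {
        Map<String, String> c = cache;
        if (c != null) return c;               // return cached value, if any
        lock.lock();                           // grab lock
        try {
            cache = null;                      // clear cache
            Map<String, String> built = new HashMap<>(source); // build Map
            cache = built;                     // set cache with Map
            return built;
        } finally {
            lock.unlock();                     // release lock
        }
    }

    // Buggy version: mutate and clear WITHOUT the lock. A concurrent
    // read() may publish a stale Map after this clear, and nothing ever
    // invalidates it again -- the lost-update described above.
    void electLeaderUnsafe(Map<String, String> source) {
        source.put("leader", "true");
        cache = null;
    }

    // Fixed version: take the same lock, so this clear cannot be
    // overwritten by an in-flight cache rebuild.
    void electLeaderSafe(Map<String, String> source) {
        lock.lock();
        try {
            source.put("leader", "true");
            cache = null;
        } finally {
            lock.unlock();
        }
    }
}
```

With {{electLeaderSafe}}, the "[Thread-LeaderElection] Clear cache" step can no longer slot in between the test thread's build and set steps.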
[jira] [Comment Edited] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712293#comment-16712293 ] Jason Gerlowski edited comment on SOLR-13045 at 12/7/18 3:21 AM: - Looking at {{testCreateCollectionAddReplica}} first. I'm still in the early stages of looking into this, but I think I see some things pointing to this being a sim-framework issue, as opposed to being a production problem. I'm not super familiar with the sim-framework though, so I'll try and give some detail here in case anyone with more context can correct me and save me from a potential red-herring. *TL;DR* I believe this to be a test-framework bug related to how the SimClusterStateProvider caches clusterstate values. The test starts by creating a collection using a specific policy. Maybe 1 time in 10 it'll fail in a {{CloudTestUtils.waitForState}} call. On these failures, this {{waitForState}} call fails because the collection (supposedly) doesn't have a leader: {code} last coll state: DocCollection(testCreateCollectionAddReplica//clusterstate.json/5)={ "replicationFactor":"1", "pullReplicas":"0", "router":{"name":"compositeId"}, "maxShardsPerNode":"1", "autoAddReplicas":"false", "nrtReplicas":"1", "tlogReplicas":"0", "autoCreated":"true", "policy":"c1", "shards":{"shard1":{ "replicas":{"core_node1":{ "core":"testCreateCollectionAddReplica_shard1_replica_n1", "SEARCHER.searcher.maxDoc":0, "SEARCHER.searcher.deletedDocs":0, "INDEX.sizeInBytes":10240, "node_name":"127.0.0.1:10068_solr", "state":"active", "type":"NRT", "INDEX.sizeInGB":9.5367431640625E-6, "SEARCHER.searcher.numDocs":0}}, "range":"8000-7fff", "state":"active"}}} {code} But other statements in the logs indicate that this collection *does* have a leader. 
We get this series of messages right as the test ends: {code} 14445 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.SolrTestCaseJ4 ###Ending testCreateCollectionAddReplica 14446 DEBUG (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider ** creating new collection states, currentVersion=6 14446 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider JEGERLOW: Saving clusterstate 14446 DEBUG (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider ** saved cluster state version 6 14446 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimSolrCloudTestCase ### CLUSTER STATE ### ## Live nodes: 2 ## Empty nodes: 1 ## Dead nodes: 0 ## Collections: ## * testCreateCollectionAddReplica ##shardsTotal 1 ##shardsState {active=1} ## shardsWithoutLeader 0 {code} One thing that stands out to me are the different clusterstate versions in play here. The log snippets above show information from {{/clusterstate.json/5}}, and {{/clusterstate.json/6}} respectively. I looked into {{SimClusterStateProvider}} and noticed that it caches the cluster state locally (see [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2086]) and warns readers that the cache must be explicitly cleared before new changes become visible. With this caching temporarily disabled the test failure disappeared. (Or at least, I couldn't trigger it in 2000 runs). I suspect that the test failure is caused by either (1) some codepath not properly clearing/resetting this clusterstate cache, or (2) a subtler synchronization bug in how this cache is locked down. 
[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud
[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712293#comment-16712293 ] Jason Gerlowski commented on SOLR-13045: Looking at {{testCreateCollectionAddReplica}} first. I'm still in the early stages of looking into this, but I think I see some things pointing to this being a sim-framework issue, as opposed to being a production problem. I'm not super familiar with the sim-framework though, so I'll try and give some detail here in case anyone with more context can correct me and save me from a potential red-herring. *TL;DR* I believe this to be a test-framework bug related to how the SimClusterStateProvider caches clusterstate values. The test starts by creating a collection using a specific policy. Maybe 1 time in 10 it'll fail in a {{CloudTestUtils.waitForState}} call. On these failures, this {{waitForState}} call fails because the collection (supposedly) doesn't have a leader: {code} last coll state: DocCollection(testCreateCollectionAddReplica//clusterstate.json/5)={ "replicationFactor":"1", "pullReplicas":"0", "router":{"name":"compositeId"}, "maxShardsPerNode":"1", "autoAddReplicas":"false", "nrtReplicas":"1", "tlogReplicas":"0", "autoCreated":"true", "policy":"c1", "shards":{"shard1":{ "replicas":{"core_node1":{ "core":"testCreateCollectionAddReplica_shard1_replica_n1", "SEARCHER.searcher.maxDoc":0, "SEARCHER.searcher.deletedDocs":0, "INDEX.sizeInBytes":10240, "node_name":"127.0.0.1:10068_solr", "state":"active", "type":"NRT", "INDEX.sizeInGB":9.5367431640625E-6, "SEARCHER.searcher.numDocs":0}}, "range":"8000-7fff", "state":"active"}}} {code} But other statements in the logs indicate that this collection *does* have a leader. 
We get this series of messages right as the test ends: {code} 14445 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.SolrTestCaseJ4 ###Ending testCreateCollectionAddReplica 14446 DEBUG (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider ** creating new collection states, currentVersion=6 14446 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider JEGERLOW: Saving clusterstate 14446 DEBUG (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimClusterStateProvider ** saved cluster state version 6 14446 INFO (TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F]) [] o.a.s.c.a.s.SimSolrCloudTestCase ### CLUSTER STATE ### ## Live nodes: 2 ## Empty nodes: 1 ## Dead nodes: 0 ## Collections: ## * testCreateCollectionAddReplica ##shardsTotal 1 ##shardsState {active=1} ## shardsWithoutLeader 0 {code} One thing that stands out to me are the different clusterstate versions in play here. The log snippets above show information from {{/clusterstate.json/5}} and {{/clusterstate.json/6}} respectively. I looked into {{SimClusterStateProvider}} and noticed that it caches the cluster state locally (see [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2086]) and warns readers that the cache must be explicitly cleared before new changes become visible. With this caching temporarily disabled the test failure disappeared. (Or at least, I couldn't trigger it in 2000 runs.) I suspect that the test failure is caused by either (1) some codepath not properly clearing/resetting this clusterstate cache, or (2) a subtler synchronization bug in how this cache is locked down.
[jira] [Created] (SOLR-13045) Harden TestSimPolicyCloud
Jason Gerlowski created SOLR-13045: -- Summary: Harden TestSimPolicyCloud Key: SOLR-13045 URL: https://issues.apache.org/jira/browse/SOLR-13045 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: master (8.0) Reporter: Jason Gerlowski Several tests in TestSimPolicyCloud, but especially {{testCreateCollectionAddReplica}}, have some flaky behavior, even after Mark's recent test-fix commit. This JIRA covers looking into and (hopefully) fixing this test failure.
[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710524#comment-16710524 ] Jason Gerlowski commented on SOLR-13042: Though I only had small changes in mind initially, as I looked at these pages I think they could use a bit of a larger overhaul. There are a good few things that could be improved. Some of these are more general:
* add quotes to JSON snippets so that examples can be pasted into other editors or shared in other contexts without causing syntax highlighters to flare up
* change comments in existing JSON snippets to "callouts", so that they aren't included when the snippets get copy/pasted
* add corresponding SolrJ snippets for facet/query examples where possible and not already present
Some are specific to individual pages and tend to be a bit more structural:
*Json-Request-API Page*
* remove the "Error Detection" section, or move it somewhere else where it fits better
* move the "Debugging" section out from under "Smart Merging of Multiple JSON Parameters", since it applies just as much to "Param Substitution" and "Passing Params"
* since we already have a descendant page for the querying syntax, would it make sense to move the JSON faceting page so it is also a descendant of "JSON Request API"?
*Json-Query-DSL Page*
* give it a little more explanation of why you would tag a query
*Json-Facet-API Page*
* remove the "Design Goals" section, as it doesn't seem appropriate for a ref guide
* reverse the metrics in the bucketing example to match the order they are introduced in
* get rid of the "Making a Faceting Request" section and update all examples to have the appropriate curl header
* move the "Noggit"/"JSON extensions" section over to the JSON top-level page, since it applies to querying as well as faceting
* move terms, heatmap, range, etc. under a "Types of Facets" top-level section
* there's probably enough uses and examples of changing facet domains for it to be its own page. Would that work?
* remove the "References" section that has the links to Yonik's personal site. I'm not sure where the ref-guide comes down on external links to blogs, etc. in general. I'm not against it. But here I dislike it because those links are already out of date in slight ways, and that will only get worse as JSON faceting develops further.
[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
[ https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13042: --- Description: While working on SOLR-12965 I noticed a few minor issues with the JSON faceting ref-guide pages. Nothing serious, just a few annoyances. Tweaks include: * missing/insufficient description of some params for Heatmap facets * Weird formatting on "Domain Filters" example * missing "fields"/"fl" in the "Parameters Mapping" table Figured I'd just create a JIRA and fix these before I forgot about them was: While working on SOLR-12965 I noticed a few minor issues with the JSON faceting ref-guide page. Nothing serious, just a few annoyances. Tweaks include: * missing/insufficient description of some params for Heatmap facets * Weird formatting on "Domain Filters" example * missing "fields"/"fl" in the "Parameters Mapping" table Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Created] (SOLR-13042) Miscellaneous JSON Facet API docs improvements
Jason Gerlowski created SOLR-13042: -- Summary: Miscellaneous JSON Facet API docs improvements Key: SOLR-13042 URL: https://issues.apache.org/jira/browse/SOLR-13042 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: documentation Affects Versions: 7.5, master (8.0) Reporter: Jason Gerlowski Assignee: Jason Gerlowski While working on SOLR-12965 I noticed a few minor issues with the JSON faceting ref-guide page. Nothing serious, just a few annoyances. Tweaks include: * missing/insufficient description of some params for Heatmap facets * Weird formatting on "Domain Filters" example * missing "fields"/"fl" in the "Parameters Mapping" table Figured I'd just create a JIRA and fix these before I forgot about them
[jira] [Commented] (SOLR-9492) Request status API returns a completed status even if the collection API call failed
[ https://issues.apache.org/jira/browse/SOLR-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708939#comment-16708939 ] Jason Gerlowski commented on SOLR-9492: --- I'd like to rectify this if it's still an issue, but haven't been able to reproduce the issue yet. I can get SPLITSHARD to fail in a few different ways, but none produce the "status=completed" that Shalin mentions in his example above. It's possible that Steve's fix on SOLR-5970 fixed this issue for us. Assuming the problem still exists and I'm just not creative enough to reproduce it successfully, I've got a pretty good guess where the problem lies. The overseer's {{OverseerCollectionMessageHandler}} has a {{processResponses}} method which is invoked several times to check for errors while executing subtasks within a SPLITSHARD request. The SPLITSHARD code tells {{processResponses}} to abort on error (by throwing a SolrException), but the logic that does this only checks for an "exception" field in the response namedList. This is sufficient for a lot of error cases, but Solr's APIs don't consistently return "exceptions" fields on all error cases. If one of these responses is returned, we'll log the error under the "failure" map, but never abort the splitshard request. > Request status API returns a completed status even if the collection API call > failed > > > Key: SOLR-9492 > URL: https://issues.apache.org/jira/browse/SOLR-9492 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: SolrCloud >Affects Versions: 5.5.2, 6.2 >Reporter: Shalin Shekhar Mangar >Priority: Major > Labels: difficulty-medium, impact-high > Fix For: 6.7, 7.0 > > > A failed split shard response is: > {code} > {success={127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=2}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:50948_hfnp%2Fbq={responseHeader={status=0,QTime=0}}},c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044 webapp=null > path=/admin/cores > params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044=/admin/cores=conf1=collection1_shard1_0_replica1=CREATE=collection1=shard1_0=javabin=2} > status=0 > QTime=2},c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004 webapp=null > path=/admin/cores > params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004=/admin/cores=conf1=collection1_shard1_1_replica1=CREATE=collection1=shard1_1=javabin=2} > status=0 > QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904 webapp=null > path=/admin/cores > params={nodeName=127.0.0.1:43245_hfnp%252Fbq=collection1_shard1_1_replica1=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904=/admin/cores=core_node6=PREPRECOVERY=true=active=true=javabin=2} > status=0 > 
QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003 webapp=null > path=/admin/cores > params={core=collection1=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003=/admin/cores=SPLIT=collection1_shard1_0_replica1=collection1_shard1_1_replica1=javabin=2} > status=0 > QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632 webapp=null > path=/admin/cores > params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632=/admin/cores=collection1_shard1_1_replica1=REQUESTAPPLYUPDATES=javabin=2} > status=0 > QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId: > c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900 webapp=null > path=/admin/cores > params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900=/admin/cores=conf1=collection1_shard1_0_replica0=CREATE=collection1=shard1_0=javabin=2} > status=0 >
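The gap described in the comment above, where the overseer only aborts when an "exception" field is present in a subtask response, can be sketched in plain Java. This is a hypothetical illustration: the class and method names below are invented, and the real OverseerCollectionMessageHandler.processResponses operates on Solr's NamedList responses rather than plain Maps.

```java
import java.util.Map;

// Hypothetical sketch of a stricter failure check than "exception-field only".
// Invented names; the real logic lives in OverseerCollectionMessageHandler and
// works with NamedList responses, not plain Maps.
public class ResponseErrorCheck {

    // Treat a response as failed if it has an explicit "exception" entry, a
    // non-empty "failure" map, or a nonzero responseHeader status. The last
    // two cases are the ones an exception-only check would miss.
    public static boolean isFailure(Map<String, Object> response) {
        if (response.get("exception") != null) {
            return true;
        }
        Object failure = response.get("failure");
        if (failure instanceof Map && !((Map<?, ?>) failure).isEmpty()) {
            return true;
        }
        Object header = response.get("responseHeader");
        if (header instanceof Map) {
            Object status = ((Map<?, ?>) header).get("status");
            if (status instanceof Number && ((Number) status).intValue() != 0) {
                return true;
            }
        }
        return false;
    }
}
```

Under a check like this, a SPLITSHARD subtask whose error surfaces only in the "failure" map would abort the request instead of being logged and ignored.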
[jira] [Resolved] (SOLR-13019) Fix typo in MailEntityProcessor.java
[ https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-13019. Resolution: Fixed Fix Version/s: master (8.0) > Fix typo in MailEntityProcessor.java > > > Key: SOLR-13019 > URL: https://issues.apache.org/jira/browse/SOLR-13019 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tommy Marshment-Howell >Assignee: Jason Gerlowski >Priority: Trivial > Fix For: master (8.0) > > > https://github.com/apache/lucene-solr/pull/509
[jira] [Commented] (SOLR-13019) Fix typo in MailEntityProcessor.java
[ https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708726#comment-16708726 ] Jason Gerlowski commented on SOLR-13019: Thanks for the patch, Tommy. Merged and closing. > Fix typo in MailEntityProcessor.java > > > Key: SOLR-13019 > URL: https://issues.apache.org/jira/browse/SOLR-13019 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tommy Marshment-Howell >Assignee: Jason Gerlowski >Priority: Trivial > Fix For: master (8.0) > > > https://github.com/apache/lucene-solr/pull/509
[jira] [Assigned] (SOLR-13019) Fix typo in MailEntityProcessor.java
[ https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13019: -- Assignee: Jason Gerlowski > Fix typo in MailEntityProcessor.java > > > Key: SOLR-13019 > URL: https://issues.apache.org/jira/browse/SOLR-13019 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tommy Marshment-Howell >Assignee: Jason Gerlowski >Priority: Trivial > > https://github.com/apache/lucene-solr/pull/509
[jira] [Comment Edited] (SOLR-13027) Harden LeaderTragicEventTest.
[ https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690 ] Jason Gerlowski edited comment on SOLR-13027 at 12/4/18 1:23 PM: - Looks like you added an empty/useless if-statement [here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310]. Assuming that was an accident? Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193] often/always ( ? ) fails due to a race condition where the overseer doesn't get rid of connections orphaned/closed by the Jetty restart. We ask the overseer to delete a collection for us and it fails because it tries to use these old connections. (You helped me out on this yesterday offline actually, though I don't think I mentioned this test by name at the time.). Anyway, this cleanup failure doesn't typically cause test failures due to a different bug altogether (SOLR-6595), but if you're beasting you might see the incomplete cleanup cause issues so I wanted to mention it. (See SOLR-13038 for more details if you're interested, or willing to chime in) was (Author: gerlowskija): Looks like you added an empty/useless if-statement [here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310]. Assuming that was an accident? Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193] often/always ( ? 
) fails due to a race condition where the overseer doesn't get rid of connections orphaned/closed by the Jetty restart. We ask the overseer to delete a collection for us and it fails because it tries to use these old connections. (You helped me out on this yesterday offline actually, though I don't think I mentioned this test by name at the time.). Anyway, this cleanup failure doesn't typically cause test failures due to a different bug altogether (SOLR-6595), but if you're beasting you might see the incomplete cleanup cause issues so I wanted to mention it. > Harden LeaderTragicEventTest. > - > > Key: SOLR-13027 > URL: https://issues.apache.org/jira/browse/SOLR-13027 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.
[ https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690 ] Jason Gerlowski commented on SOLR-13027: Looks like you added an empty/useless if-statement [here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310]. Assuming that was an accident? Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193] often/always(?) fails due to a race condition where the overseer doesn't get rid of connections orphaned/closed by the Jetty restart. We ask the overseer to delete a collection for us and it fails because it tries to use these old connections. (You helped me out on this yesterday offline actually, though I don't think I mentioned this test by name at the time.). Anyway, this cleanup failure doesn't typically cause test failures due to a different bug altogether (SOLR-6595), but if you're beasting you might see the incomplete cleanup cause issues so I wanted to mention it. > Harden LeaderTragicEventTest. > - > > Key: SOLR-13027 > URL: https://issues.apache.org/jira/browse/SOLR-13027 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13027) Harden LeaderTragicEventTest.
[ https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690 ] Jason Gerlowski edited comment on SOLR-13027 at 12/4/18 1:20 PM: - Looks like you added an empty/useless if-statement [here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310]. Assuming that was an accident? Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193] often/always ( ? ) fails due to a race condition where the overseer doesn't get rid of connections orphaned/closed by the Jetty restart. We ask the overseer to delete a collection for us and it fails because it tries to use these old connections. (You helped me out on this yesterday offline actually, though I don't think I mentioned this test by name at the time.). Anyway, this cleanup failure doesn't typically cause test failures due to a different bug altogether (SOLR-6595), but if you're beasting you might see the incomplete cleanup cause issues so I wanted to mention it. was (Author: gerlowskija): Looks like you added an empty/useless if-statement [here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310]. Assuming that was an accident? Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest [here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193] often/always(?) fails due to a race condition where the overseer doesn't get rid of connections orphaned/closed by the Jetty restart. 
We ask the overseer to delete a collection for us and it fails because it tries to use these old connections. (You helped me out on this yesterday offline actually, though I don't think I mentioned this test by name at the time.). Anyway, this cleanup failure doesn't typically cause test failures due to a different bug altogether (SOLR-6595), but if you're beasting you might see the incomplete cleanup cause issues so I wanted to mention it. > Harden LeaderTragicEventTest. > - > > Key: SOLR-13027 > URL: https://issues.apache.org/jira/browse/SOLR-13027 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12555) Replace try-fail-catch test patterns
[ https://issues.apache.org/jira/browse/SOLR-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708138#comment-16708138 ] Jason Gerlowski commented on SOLR-12555: Thanks for the review, Bar. I committed the resulting patch this past weekend. Will post here if I'm able to bite off a few more packages this week. > Replace try-fail-catch test patterns > > > Key: SOLR-12555 > URL: https://issues.apache.org/jira/browse/SOLR-12555 > Project: Solr > Issue Type: Test > Security Level: Public (Default Security Level. Issues are Public) > Components: Tests >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Trivial > Attachments: SOLR-12555-sorted-by-package.txt, SOLR-12555.patch, > SOLR-12555.patch, SOLR-12555.txt > > Time Spent: 4h 20m > Remaining Estimate: 0h > > I recently added some test code through SOLR-12427 which used the following > test anti-pattern: > {code} > try { > actionExpectedToThrowException(); > fail("I expected this to throw an exception, but it didn't"); > } catch (Exception e) { > assertOnThrownException(e); > } > {code} > Hoss (rightfully) objected that this should instead be written using the > formulation below, which is clearer and more concise. > {code} > SolrException e = expectThrows(() -> {...}); > {code} > We should remove many of these older formulations where it makes sense. Many > of them were written before {{expectThrows}} was introduced, and having the > old style assertions around makes it easier for them to continue creeping in.
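For readers unfamiliar with the idiom quoted above, a minimal stand-in for the expectThrows helper might look like the following. This is an illustrative sketch only: the real helper lives in Lucene's test framework (LuceneTestCase), and its exact signature and error messages differ.

```java
// Minimal, illustrative stand-in for LuceneTestCase.expectThrows; not the
// actual Lucene test-framework implementation.
public class ExpectThrows {

    @FunctionalInterface
    public interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Runs the action and returns the thrown exception if it matches the
    // expected type; fails the test otherwise, replacing the old
    // try/fail/catch boilerplate with a single call.
    public static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type, expected " + expected.getSimpleName(), t);
        }
        throw new AssertionError("Expected " + expected.getSimpleName() + " to be thrown, but nothing was thrown");
    }
}
```

Because the helper returns the caught exception, callers can assert on its message or type in one line, which is exactly what makes the expectThrows formulation more concise than the try-fail-catch pattern.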
[jira] [Commented] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart
[ https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708133#comment-16708133 ] Jason Gerlowski commented on SOLR-13038: I've attached a strawman patch that adds a very basic retry check into HttpShardHandler. Most of the patch is just plumbing to pass around a "retryable" boolean to where it can be added to {{ShardRequest}}. This plumbing is pretty rough - I wouldn't commit it without finding something a little more elegant - but it's sufficient for showing the change conceptually. Having seen a lot of discussion on prior JIRAs related to this issue, it seems like there's a lot of concern about retrying on this particular error case. To summarize, {{NoHttpResponseException}} is ambiguous - there's no way to tell whether the server received and processed your request or not. So a requirement is that we avoid retrying any non-idempotent requests. That was the main goal in choosing the approach I did for this strawman patch. Each caller of HttpShardHandler can choose whether they're OK with their request being retried, with the default being to not retry. Anyway, curious if people have any thoughts. Oh, one last thing. Also in this patch is an additional assertion to LeaderTragicEventTest that exhibits the problem. It passes with the rest of the patch, but will fail and show the problem when applied on its own. > Overseer actions fail with NoHttpResponseException following a node restart > --- > > Key: SOLR-13038 > URL: https://issues.apache.org/jira/browse/SOLR-13038 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13038.patch > > > I noticed recently that a lot of overseer operations fail if they're executed > right after a restart of a Solr node. 
The failure returns a message like > "org.apache.solr.client.solrj.SolrServerException:IOException occured when > talking to server at: https://127.0.0.1:62253/solr;. The logs are a bit more > helpful: > {code} > org.apache.solr.client.solrj.SolrServerException: IOException occured when > talking to server at: https://127.0.0.1:62253/solr > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) > ~[java/:?] > at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) > ~[java/:?] > at > org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) > ~[java/:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_172] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > ~[metrics-core-3.2.6.jar:3.2.6] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > ~[java/:?] 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_172] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] > Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to > respond > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > ~[httpcore-4.4.10.jar:4.4.10] > at
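The "retryable boolean" plumbing described in the comment above can be modeled in miniature. This is a hypothetical, dependency-free sketch: the Request and submit names are invented for illustration and do not match the actual ShardRequest/HttpShardHandler patch, and plain IOException stands in for NoHttpResponseException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical, simplified model of the opt-in retry described in the
// strawman patch; NOT the actual HttpShardHandler/ShardRequest code.
public class RetryableSubmit {

    // Callers opt in to retries; the safe default is no retry, because a
    // dropped connection leaves it ambiguous whether the server already
    // processed a possibly non-idempotent request.
    public static class Request {
        final boolean retryable;
        public Request(boolean retryable) { this.retryable = retryable; }
    }

    // Execute the call, retrying exactly once on a dropped connection
    // (modeled as IOException) when the request is marked retryable.
    public static <T> T submit(Request req, Callable<T> call) throws Exception {
        try {
            return call.call();
        } catch (IOException e) {
            if (!req.retryable) {
                throw e;
            }
            return call.call(); // single retry; the caller vouched for idempotency
        }
    }
}
```

The point of the flag is that an overseer restart-race failure only gets retried for requests the caller has explicitly declared safe to repeat, leaving everything else with today's fail-fast behavior.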
[jira] [Updated] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart
[ https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-13038: --- Attachment: SOLR-13038.patch > Overseer actions fail with NoHttpResponseException following a node restart > --- > > Key: SOLR-13038 > URL: https://issues.apache.org/jira/browse/SOLR-13038 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13038.patch > > > I noticed recently that a lot of overseer operations fail if they're executed > right after a restart of a Solr node. The failure returns a message like > "org.apache.solr.client.solrj.SolrServerException:IOException occured when > talking to server at: https://127.0.0.1:62253/solr;. The logs are a bit more > helpful: > {code} > org.apache.solr.client.solrj.SolrServerException: IOException occured when > talking to server at: https://127.0.0.1:62253/solr > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) > ~[java/:?] > at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) > ~[java/:?] > at > org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) > ~[java/:?] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_172] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > ~[metrics-core-3.2.6.jar:3.2.6] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > ~[java/:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_172] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] > Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to > respond > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) > ~[java/:?] 
> at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > ~[httpclient-4.5.6.jar:4.5.6] > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) > ~[java/:?] > ... 12 more > {code} > After a bit of debugging I was able to confirm the problem: when some > non-overseer node gets restarted,
[jira] [Commented] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart
[ https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707814#comment-16707814 ] Jason Gerlowski commented on SOLR-13038: You can reproduce this behavior pretty regularly with the JUnit test below that uses SolrCloudTestCase as its base:
{code}
@Test
public void testOtherReplicasAreNotActive() throws Exception {
  final String collection = "collection1";
  CollectionAdminRequest
      .createCollection(collection, "config", 1, 2)
      .process(cluster.getSolrClient());
  cluster.waitForActiveCollection(collection, 1, 2);

  Slice shard = getCollectionState(collection).getSlice("shard1");
  JettySolrRunner otherReplicaJetty = cluster.getReplicaJetty(getNonLeader(shard));
  otherReplicaJetty.stop();
  cluster.waitForJettyToStop(otherReplicaJetty);
  waitForState("Timeout waiting for replica get down", collection,
      (liveNodes, collectionState) -> getNonLeader(collectionState.getSlice("shard1")).getState() != Replica.State.ACTIVE);

  otherReplicaJetty.start();
  cluster.waitForNode(otherReplicaJetty, 30);
  waitForState("Timeout waiting for replica get up", collection,
      (liveNodes, collectionState) -> getNonLeader(collectionState.getSlice("shard1")).getState() == Replica.State.ACTIVE);

  CollectionAdminResponse response = CollectionAdminRequest.deleteCollection(collection).process(cluster.getSolrClient());
  assertNull("Expected collection-delete to fully succeed", response.getResponse().get("failure"));
}
{code}
> Overseer actions fail with NoHttpResponseException following a node restart > --- > > Key: SOLR-13038 > URL: https://issues.apache.org/jira/browse/SOLR-13038 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: master (8.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > > I noticed recently that a lot of overseer operations fail if they're executed > right after a restart of a Solr node. 
The failure returns a message like > "org.apache.solr.client.solrj.SolrServerException:IOException occured when > talking to server at: https://127.0.0.1:62253/solr;. The logs are a bit more > helpful: > {code} > org.apache.solr.client.solrj.SolrServerException: IOException occured when > talking to server at: https://127.0.0.1:62253/solr > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) > ~[java/:?] > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) > ~[java/:?] > at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) > ~[java/:?] > at > org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) > ~[java/:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_172] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > ~[metrics-core-3.2.6.jar:3.2.6] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > ~[java/:?] 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_172] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] > Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to > respond > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > ~[httpcore-4.4.10.jar:4.4.10] > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > ~[httpclient-4.5.6.jar:4.5.6] > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > ~[httpcore-4.4.10.jar:4.4.10] > at >
[jira] [Created] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart
Jason Gerlowski created SOLR-13038:
--------------------------------------

             Summary: Overseer actions fail with NoHttpResponseException following a node restart
                 Key: SOLR-13038
                 URL: https://issues.apache.org/jira/browse/SOLR-13038
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: master (8.0)
            Reporter: Jason Gerlowski
            Assignee: Jason Gerlowski

I noticed recently that a lot of overseer operations fail if they're executed right after a restart of a Solr node. The failure returns a message like "org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://127.0.0.1:62253/solr". The logs are a bit more helpful:

{code}
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:62253/solr
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) ~[java/:?]
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[java/:?]
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[java/:?]
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[java/:?]
        at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) ~[java/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_172]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) ~[metrics-core-3.2.6.jar:3.2.6]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[java/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to respond
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) ~[java/:?]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) ~[java/:?]
        ... 12 more
{code}

After a bit of debugging I was able to confirm the problem: when some non-overseer node gets restarted, the overseer never notices that its connections are invalid and will try to reuse them for subsequent requests that happen right after the restart. There are a few ways we might be able to tackle this:

* we could look at adding logic to {{SolrHttpRequestRetryHandler}} to retry when this happens. SHRRH already retries NoHttpResponseException generally, but has other logic which prevents any retries on collection/core-admin APIs. Maybe we could elaborate this a bit.
* we
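The retry idea in the first bullet can be modelled with a small, self-contained sketch. To be clear, this is not the real {{SolrHttpRequestRetryHandler}}: the class, the {{MAX_RETRIES}} value, and the {{isAdminRequest}}/{{retryAdminToo}} flags are invented here purely to illustrate the decision being discussed (retry transient "no response" failures, but optionally relax the no-retries-for-admin-APIs rule).

```java
import java.io.IOException;

// Hypothetical sketch, NOT the real SolrHttpRequestRetryHandler.
// It models the decision discussed above: retry when a stale pooled
// connection yields a "failed to respond" error, but (today) refuse
// retries for collection/core-admin API requests.
public class RetrySketch {

    /** Stand-in for org.apache.http.NoHttpResponseException. */
    static class NoHttpResponse extends IOException {
        NoHttpResponse(String msg) { super(msg); }
    }

    static final int MAX_RETRIES = 3; // illustrative limit, not Solr's

    /**
     * Decide whether a failed request should be retried.
     *
     * @param cause          the exception the request failed with
     * @param executionCount how many times the request has run so far
     * @param isAdminRequest true for collection/core-admin API calls
     * @param retryAdminToo  the proposed relaxation: also retry admin calls
     */
    static boolean shouldRetry(IOException cause, int executionCount,
                               boolean isAdminRequest, boolean retryAdminToo) {
        if (executionCount > MAX_RETRIES) return false;
        if (isAdminRequest && !retryAdminToo) return false; // current behaviour
        return cause instanceof NoHttpResponse;             // transient: retry
    }

    public static void main(String[] args) {
        IOException stale = new NoHttpResponse("127.0.0.1:62253 failed to respond");
        // Regular request on a stale connection: retried.
        System.out.println(shouldRetry(stale, 1, false, false)); // true
        // Admin request today: never retried, which is the reported problem.
        System.out.println(shouldRetry(stale, 1, true, false));  // false
        // The proposed change would retry it as well.
        System.out.println(shouldRetry(stale, 1, true, true));   // true
    }
}
```

Under this model, the bug report corresponds to the middle case: the exception is retryable in general, but the admin-API guard short-circuits the retry.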
[jira] [Resolved] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski resolved SOLR-6117.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 7.7
                   master (8.0)

Marking this as 'Fixed' for 8.0 and 7.7. To summarize/clarify, the fixes on {{master}} and {{branch_7x}} are a little different, based on the need to avoid potentially breaking changes on 7x. The 7x changes only go far enough to fix the bug where we return a "success" status even when the request fails. The master changes do this, as well as correcting a few inconsistencies between the different error cases.

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6, 7.5
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>             Fix For: master (8.0), 7.7
>
>         Attachments: SOLR-6117-master.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails
[ https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702292#comment-16702292 ]

Jason Gerlowski commented on SOLR-6595:
---------------------------------------

Thinking aloud here, and I guess also soliciting feedback. The current patch sets 500 as the value for the "status" property, as well as the HTTP status code on the response. The expectation in most other places seems to be that the "status" property matches the HTTP status code, so this seems like the technically correct thing to do from an API perspective.

There is a downside to this, though: SolrJ converts non-200 responses into exceptions. So while the failure information is still in the response, SolrJ users can't get at it. (This isn't strictly true...SolrJ tries its best to come up with a good exception message by looking for properties like "error" and "failure". But that's a pale substitute for giving users access to the response itself if they want it.)

It'd be cool if SolrJ users could access the original response in exceptional cases. Maybe we should attach the parsed NamedList to the RemoteSolrExceptions that get thrown by SolrJ. That seems like a separate JIRA, but I wanted to raise it here since it bears on these response changes indirectly.

> Improve error response in case distributed collection cmd fails
> ---------------------------------------------------------------
>
>                 Key: SOLR-6595
>                 URL: https://issues.apache.org/jira/browse/SOLR-6595
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10
>         Environment: SolrCloud with Client SSL
>            Reporter: Sindre Fiskaa
>            Assignee: Jason Gerlowski
>            Priority: Minor
>         Attachments: SOLR-6595.patch
>
> Followed the description https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a self-signed key pair. Configured a few solr-nodes and used the collection api to create a new collection. -I get error message when specify the nodes with the createNodeSet param. When I don't use createNodeSet param the collection gets created without error on random nodes. Could this be a bug related to the createNodeSet param?-
> *Update: It failed due to what turned out to be an invalid client certificate on the overseer, and returned the following response:*
> {code:xml}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">185</int>
>   </lst>
>   <str name="failure">org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str>
> </response>
> {code}
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but cores not created due to failing to connect to shard nodes to talk to core admin API).
> # The error printed does not tell which action failed. Would be helpful to either get the msg from the original exception or at least some message saying "Failed to create core, see log on Overseer"
> # State of collection is not clean, since it exists as far as ZK is concerned but cores were not created. Thus retrying the CREATECOLLECTION cmd would fail. Should Overseer detect errors in distributed cmds and roll back changes already made in ZK?
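The "attach the parsed response to the exception" idea floated in the comment above could look something like the following. This is only a sketch in plain Java: {{ResponseException}}, {{checkStatus}}, and the {{Map}} standing in for SolrJ's {{NamedList}} are invented names for illustration, not SolrJ API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: instead of discarding the parsed response body when
// a non-200 status comes back, attach it to the thrown exception so callers
// can still inspect "error"/"failure" details. Not SolrJ code; a plain Map
// stands in for SolrJ's NamedList.
public class ResponseException extends RuntimeException {
    private final int httpStatus;
    private final Map<String, Object> rawResponse;

    public ResponseException(int httpStatus, Map<String, Object> rawResponse) {
        super("Remote error, HTTP " + httpStatus + ": "
                + rawResponse.getOrDefault("failure", "(no detail)"));
        this.httpStatus = httpStatus;
        this.rawResponse = rawResponse;
    }

    public int getHttpStatus() { return httpStatus; }

    /** The full parsed body, preserved for the caller. */
    public Map<String, Object> getRawResponse() { return rawResponse; }

    /** Throw on a non-200 response, keeping the parsed body attached. */
    public static void checkStatus(int httpStatus, Map<String, Object> parsed) {
        if (httpStatus != 200) {
            throw new ResponseException(httpStatus, parsed);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("status", 500);
        parsed.put("failure",
                "IOException occured when talking to server at: https://vt-searchln04:443/solr");
        try {
            checkStatus(500, parsed);
        } catch (ResponseException e) {
            // The failure detail survives the exception instead of being
            // flattened into a best-effort message string.
            System.out.println(e.getRawResponse().get("failure"));
        }
    }
}
```

The design point is simply that the exception carries the structured body alongside the message, so client code can key off specific fields rather than parsing an exception string.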
[jira] [Commented] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701957#comment-16701957 ]

Jason Gerlowski commented on SOLR-6117:
---------------------------------------

Attached an updated patch that's intended for the master branch, and thus has liberty to do more to make the various responses from the /replication API more uniform. This version of the patch addresses all of the bullet points in my previous comment. Haven't run tests more generally yet, but I hope to commit to master in the next week or so.

One thing I forgot to clarify in my previous comment: both of these patches address _all_ subcommands in the /replication API (not just "fetchindex"). That was a point of discussion in the original effort on this JIRA, so just thought I'd clarify.

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6, 7.5
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-6117-master.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-6117:
----------------------------------
    Attachment: SOLR-6117-master.patch

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6, 7.5
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-6117-master.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-6117:
----------------------------------
    Affects Version/s: 7.5

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6, 7.5
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).
[jira] [Commented] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700791#comment-16700791 ]

Jason Gerlowski commented on SOLR-6117:
---------------------------------------

Most recent attached patch is a slight update of Shalin's. I'd hoped to add a lot more tests with this that trigger the various failure conditions, but it's hard to reproduce many of them via JUnit. I also looked at adding unit tests for ReplicationHandler directly, but it relies heavily on SolrCore, which is final, which makes mocking/stubbing difficult as well. If anyone sees a way to get more coverage on this without major surgery, I'd love to hear it.

The current patch makes sure that we never advertise a response as status=OK falsely, so it's just a bugfix and should be safe to include in branch_7x from a breaking-change perspective. There are a lot of other problems with the replication handler responses that would require breaking changes. Specifically:

* "status" is only present on some responses. Ideally it should be present on all /replication responses so that clients can rely on it being there.
* "status" is used inconsistently. Some uses give it an enum-like value that clients could key off of; others treat it like a "message" field and just give it arbitrary error messages.
* when errors occur, the "message" and "exception" fields are used inconsistently. Ideally, if an error occurs there would always be a message, and sometimes there would also be an exception.
* many of the error cases involving argument validation set the status field properly but return with the wrong HTTP status (200), i.e. they should throw a SolrException.

I plan on working some of these out soon in a larger commit that can be put on master.

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).
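The uniform response shape proposed in the bullet list above can be made concrete with a small sketch. This is not ReplicationHandler code; the method names and the enum-like OK/ERROR values are assumptions chosen purely for illustration of the contract (always a "status", always a "message" on errors, "exception" only when there is an underlying cause).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a consistent /replication response contract, as
// described above. Not actual ReplicationHandler code; a plain Map stands
// in for Solr's response structure.
public class ReplicationResponses {

    /** Every success response carries an enum-like "status". */
    static Map<String, Object> ok() {
        Map<String, Object> rsp = new LinkedHashMap<>();
        rsp.put("status", "OK");
        return rsp;
    }

    /** Every error response carries "status" and "message"; "exception" is optional. */
    static Map<String, Object> error(String message, Throwable cause) {
        Map<String, Object> rsp = new LinkedHashMap<>();
        rsp.put("status", "ERROR");   // enum-like value, never a free-form message
        rsp.put("message", message);  // always present on errors
        if (cause != null) {
            rsp.put("exception", cause.toString()); // only when there is a cause
        }
        return rsp;
    }

    public static void main(String[] args) {
        System.out.println(ok());
        System.out.println(error("Missing required parameter: name", null));
        System.out.println(error("fetchindex failed",
                new RuntimeException("connection refused")));
    }
}
```

With a contract like this, clients can branch on {{status}} alone, and the argument-validation cases in the last bullet would additionally map to a non-200 HTTP status rather than returning 200 with an error body.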
[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.
[ https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-6117:
----------------------------------
    Attachment: SOLR-6117.patch

> Replication command=fetchindex always return success.
> -----------------------------------------------------
>
>                 Key: SOLR-6117
>                 URL: https://issues.apache.org/jira/browse/SOLR-6117
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.6
>            Reporter: Raintung Li
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
> The replication API command=fetchindex does fetch the index, but when an error occurs it still gives a success response.
> The API should return the right status, especially when the WAIT parameter is true (synchronous).