[jira] [Commented] (SOLR-13270) SolrJ does not send "Expect: 100-continue" header

2019-03-05 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785093#comment-16785093
 ] 

Jason Gerlowski commented on SOLR-13270:


I'm going to assign this to myself and hope to get to it by or over this 
weekend.  A few questions/notes:

1. If the issue is as simple as us overriding the RequestConfig in 
{{executeMethod}}, how does the POST manage to get the 100-continue header 
through?  Is the POST not following the same code path, or is it following the 
same code path with 100-continue coming from somewhere else?  Still need to 
trace this through...
2. Is pulling the RequestConfig from the HttpClient (if one exists) the right 
fix (a rough sketch of that idea follows these notes), or is "RequestConfig" an 
important-enough configuration object that it should be exposed on the 
HttpSolrClient.Builder in its own right?  Is this an awful idea in light of us 
moving away from Apache HttpComponents in the in-development HTTP2 versions of 
these clients?
3. How should a user-provided RequestConfig interact with other HttpSolrClient 
settings when they conflict?  Should we overlay the provided RequestConfig 
settings on top of our defaults where possible?  Which values should win when a 
user specifies a RequestConfig but also chooses conflicting 
{{SolrClientBuilder.withConnectionTimeout}}/{{SolrClientBuilder.withSocketTimeout}} 
values?

(I don't think any of these are huge roadblocks; I'm just leaving notes for 
myself on where to pick this up when I return in a few days.  If anyone has any 
thoughts or insight though, please chime in.)
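
To make item 2 concrete, here's a rough sketch of the "pull the RequestConfig 
from the HttpClient" idea, written against the Apache HttpClient 4.x API.  The 
class, helper name, and parameter list are made up for illustration; treat it 
as a starting point for discussion rather than a tested fix:

{code:java}
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.Configurable;
import org.apache.http.client.methods.HttpRequestBase;

class RequestConfigSketch {
  // Hypothetical sketch: seed the per-request config from the HttpClient's
  // defaults (when the client exposes them) instead of from a fresh
  // RequestConfig.custom(), so caller-supplied flags such as
  // expectContinueEnabled survive SolrJ's per-request overrides.
  static void applyRequestConfig(HttpClient httpClient, HttpRequestBase method,
                                 boolean followRedirects, int soTimeout, int connTimeout) {
    RequestConfig.Builder config = (httpClient instanceof Configurable)
        ? RequestConfig.copy(((Configurable) httpClient).getConfig())
        : RequestConfig.custom();
    config.setRedirectsEnabled(followRedirects)  // SolrJ-managed values still win
          .setSocketTimeout(soTimeout)
          .setConnectTimeout(connTimeout);
    method.setConfig(config.build());
  }
}
{code}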

> SolrJ does not send "Expect: 100-continue" header
> -
>
> Key: SOLR-13270
> URL: https://issues.apache.org/jira/browse/SOLR-13270
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: SolrJ
>Affects Versions: 7.7
>Reporter: Erlend Garåsen
>Priority: Major
>
> SolrJ does not set the "Expect: 100-continue" header, even though it's 
> configured in HttpClient:
> {code:java}
> builder.setDefaultRequestConfig(RequestConfig.custom().setExpectContinueEnabled(true).build());{code}
> An HttpClient developer has reviewed the code and says we're setting up
>  the client correctly, so we have reason to believe there is a bug in
>  SolrJ. It's actually a problem we are facing in ManifoldCF, explained in:
>  https://issues.apache.org/jira/browse/CONNECTORS-1564
> The problem can be reproduced by building and running the following small 
> Maven project:
> [http://folk.uio.no/erlendfg/solr/missing-header.zip]
> The application runs SolrJ code where the header does not show up and 
> HttpClient code where the header is present.
>  
> {code:java}
> HttpClientBuilder builder = HttpClients.custom();
> // This should add an Expect: 100-continue header:
> builder.setDefaultRequestConfig(RequestConfig.custom().setExpectContinueEnabled(true).build());
> HttpClient httpClient = builder.build();
> // Start Solr and create a core named "test".
> String baseUrl = "http://localhost:8983/solr/test";
> // Test using SolrJ — no expect 100 header
> HttpSolrClient client = new HttpSolrClient.Builder()
>   .withHttpClient(httpClient)
>   .withBaseSolrUrl(baseUrl).build();
> SolrQuery query = new SolrQuery();
> query.setQuery("*:*");
> client.query(query);
> // Test using HttpClient directly — expect 100 header shows up:
> HttpPost httpPost = new HttpPost(baseUrl);
> HttpEntity entity = new InputStreamEntity(new 
> ByteArrayInputStream("test".getBytes()));
> httpPost.setEntity(entity);
> httpClient.execute(httpPost);
> {code}
> When using the last HttpClient test, the expect 100 header appears in 
> missing-header.log:
> {noformat}
> http-outgoing-1 >> Expect: 100-continue{noformat}
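
As an aside (an assumption on my part, not something from the report): one way 
to confirm which headers actually leave the client, without enabling wire 
logging, is to register a last-position request interceptor.  A sketch against 
the Apache HttpClient 4.x API:

{code:java}
import org.apache.http.HttpRequest;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;

public class ExpectHeaderCheck {
  public static void main(String[] args) {
    // Print the Expect header (if any) on every outgoing request.
    // Interceptors added "last" run after the standard protocol interceptors,
    // so an Expect: 100-continue added by the protocol layer should be visible.
    CloseableHttpClient httpClient = HttpClients.custom()
        .addInterceptorLast((HttpRequest request, HttpContext context) ->
            System.out.println("Expect header: " + request.getFirstHeader("Expect")))
        .build();
    // Pass this client to HttpSolrClient.Builder#withHttpClient as in the
    // reproduction above.
  }
}
{code}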






[jira] [Resolved] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7

2019-03-04 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13256.

   Resolution: Fixed
Fix Version/s: 7.7
   master (9.0)
   8.0

> Ref Guide: Upgrade Notes for 7.7
> 
>
> Key: SOLR-13256
> URL: https://issues.apache.org/jira/browse/SOLR-13256
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: 8.0, master (9.0), 7.7
>
> Attachments: SOLR-13256.patch
>
>
> With 7.7 released and out the door, we should get the ball moving on a 7.7 
> ref-guide.  One of the prerequisites for that process is putting together 
> some upgrade notes that can go in 
> {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 
> 7.7.
> I'm going to take a look at CHANGES and take a first pass at the "upgrading" 
> section for 7.7.  If anyone has anything they know should be in the list, 
> please let me know and I'll try to include it.






[jira] [Commented] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7

2019-03-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783406#comment-16783406
 ] 

Jason Gerlowski commented on SOLR-13256:


A bugfix release (7.7.1) was sent out the door last week, so we no longer need 
to document the maxShardsPerNode and URP issues in our upgrade notes for 7.7.  
(Though we need to be extra sure to steer users away from 7.7.0).  So I'm going 
to commit the current patch as it is, minus those two bullet points.

> Ref Guide: Upgrade Notes for 7.7
> 
>
> Key: SOLR-13256
> URL: https://issues.apache.org/jira/browse/SOLR-13256
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13256.patch
>
>
> With 7.7 released and out the door, we should get the ball moving on a 7.7 
> ref-guide.  One of the prerequisites for that process is putting together 
> some upgrade notes that can go in 
> {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 
> 7.7.
> I'm going to take a look at CHANGES and take a first pass at the "upgrading" 
> section for 7.7.  If anyone has anything they know should be in the list, 
> please let me know and I'll try to include it.






[jira] [Commented] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-03-01 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781929#comment-16781929
 ] 

Jason Gerlowski commented on SOLR-13255:


Hey [~ahubold], have you had a chance to confirm whether 7.7.1 has fixed this 
issue for you?  I trust Noble's fix, but there was a report on the mailing list 
this morning about a similar ClassCastException on Solr 7.7.1, so I figured it 
was worth checking in to see whether you'd tried out the fix yet or might have 
a chance to do so in the near future...

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: contrib - LangId
>Affects Versions: 7.7
>Reporter: Andreas Hubold
>Assignee: Noble Paul
>Priority: Blocker
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch, SOLR-13255.patch, SOLR-13255.patch
>
>
> 7.7 changed the object type of string field values that are passed to 
> UpdateRequestProcessor implementations from java.lang.String to 
> ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as cause.
> The LangDetectLanguageIdentifierUpdateProcessor still expects String values, 
> does not work for CharSequences, and logs warnings instead. For example:
> {noformat}
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> {noformat}
> I'm not sure, but there could be further places where the changed type for 
> string values needs to be handled. (Our custom UpdateRequestProcessors are 
> broken as well since 7.7, and it would be great to have a proper upgrade note 
> as part of the release notes.)
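
A minimal defensive pattern for custom UpdateRequestProcessors hit by this 
change might look like the sketch below.  The class and helper names are 
illustrative assumptions, not the committed fix; the idea is simply to accept 
any CharSequence (which covers both String and ByteArrayUtf8CharSequence) 
rather than testing for String directly:

{code:java}
import org.apache.solr.common.SolrInputDocument;

class CharSequenceSafeAccess {
  // Hypothetical helper: return the field value as a String whether the
  // javabin codec delivered a String or a ByteArrayUtf8CharSequence.
  static String fieldAsString(SolrInputDocument doc, String fieldName) {
    Object value = doc.getFieldValue(fieldName);
    return (value instanceof CharSequence) ? value.toString() : null;
  }
}
{code}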






[jira] [Commented] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7

2019-02-18 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771382#comment-16771382
 ] 

Jason Gerlowski commented on SOLR-13256:


bq. Maybe it makes sense to hold the 7.7 Ref Guide until we figure out what is 
going to happen with those issues re: 7.7.1?

I think that makes sense. If there is going to be a 7.7.1 soon that we're going 
to be steering everyone towards anyways, there's no need to include this in the 
ref-guide.  If no one volunteers to do a 7.7.1 release soon and people are 
going to be using 7.7.0, then we can cross that bridge when we come to it.



(Thoughts below are only relevant if there is no 7.7.1 soon, and we need to 
cross the bridge of deciding whether to include Known Issues in our Upgrade 
Notes)

bq.  to date, we haven't mentioned Known Issues in the Upgrade Notes ... [this 
is] actually really hard for Solr ... What's the criteria for being included 
here? What about all the prior releases?

I'm not sure the slope is as slippery as it looks.  Yes, there are 1500 
unresolved Solr bugs, but only 8 specifically tagged as affecting 7.7.  And 
only 2 of those are being talked about as serious enough to trigger a bugfix 
release.  The number of "candidates-for-inclusion" drops to just a few pretty 
quickly.

If that's not convincing and your question about having guidelines/criteria 
wasn't rhetorical, let me offer a strawman for discussion: "Known Issues should 
only be included in the Upgrade Notes if they are generating discussion about 
an immediate bugfix release at the time the ref-guide release is being worked 
on".

> Ref Guide: Upgrade Notes for 7.7
> 
>
> Key: SOLR-13256
> URL: https://issues.apache.org/jira/browse/SOLR-13256
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13256.patch
>
>
> With 7.7 released and out the door, we should get the ball moving on a 7.7 
> ref-guide.  One of the prerequisites for that process is putting together 
> some upgrade notes that can go in 
> {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 
> 7.7.
> I'm going to take a look at CHANGES and take a first pass at the "upgrading" 
> section for 7.7.  If anyone has anything they know should be in the list, 
> please let me know and I'll try to include it.






[jira] [Resolved] (SOLR-13241) Add "autoscaling" tool to the Windows script

2019-02-18 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13241.

Resolution: Fixed

> Add "autoscaling" tool to the Windows script
> 
>
> Key: SOLR-13241
> URL: https://issues.apache.org/jira/browse/SOLR-13241
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>Reporter: Andrzej Bialecki 
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13241.patch
>
>
> SOLR-13155 added a command-line tool for testing autoscaling configurations. 
> The tool can be accessed via the Unix {{bin/solr}} script but it's not integrated 
> with the Windows {{bin\solr.cmd}} script.






[jira] [Commented] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-02-18 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771196#comment-16771196
 ] 

Jason Gerlowski commented on SOLR-13255:


bq. it would be great to have a proper upgrade note as part of the release notes

Hey [~ahubold], I'm working on "Upgrade Notes" for users for the next release 
of our ref-guide, and I wanted them to include this issue.  I included a short 
paragraph over on SOLR-13256.  Since you mentioned you were interested in 
seeing this get documented, I wanted to give you a heads up.  Feel free to 
chime in over there about anything I got wrong or any suggestions you might 
have.

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: contrib - LangId
>Affects Versions: 7.7
>Reporter: Andreas Hubold
>Priority: Major
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch
>
>
> 7.7 changed the object type of string field values that are passed to 
> UpdateRequestProcessor implementations from java.lang.String to 
> ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as cause.
> The LangDetectLanguageIdentifierUpdateProcessor still expects String values, 
> does not work for CharSequences, and logs warnings instead. For example:
> {noformat}
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> {noformat}
> I'm not sure, but there could be further places where the changed type for 
> string values needs to be handled. (Our custom UpdateRequestProcessors are 
> broken as well since 7.7, and it would be great to have a proper upgrade note 
> as part of the release notes.)






[jira] [Updated] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7

2019-02-18 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13256:
---
Attachment: SOLR-13256.patch

> Ref Guide: Upgrade Notes for 7.7
> 
>
> Key: SOLR-13256
> URL: https://issues.apache.org/jira/browse/SOLR-13256
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13256.patch
>
>
> With 7.7 released and out the door, we should get the ball moving on a 7.7 
> ref-guide.  One of the prerequisites for that process is putting together 
> some upgrade notes that can go in 
> {{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 
> 7.7.
> I'm going to take a look at CHANGES and take a first pass at the "upgrading" 
> section for 7.7.  If anyone has anything they know should be in the list, 
> please let me know and I'll try to include it.






[jira] [Comment Edited] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-02-18 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771196#comment-16771196
 ] 

Jason Gerlowski edited comment on SOLR-13255 at 2/18/19 4:24 PM:
-

bq. it would be great to have a proper upgrade note as part of the release notes

Hey [~ahubold], I'm working on "Upgrade Notes" for the next release of our 
ref-guide, and I wanted them to include this issue.  I included a short 
paragraph over on SOLR-13256.  Since you mentioned you were interested in 
seeing this get documented, I wanted to give you a heads up.  Feel free to 
chime in over there about anything I got wrong or any suggestions you might 
have.


was (Author: gerlowskija):
bq. it would be great to have a proper upgrade note as part of the release notes

Hey [~ahubold], I'm working on "Upgrade Notes" for users for the next release 
of our ref-guide, and I wanted them to include this issue.  I included a short 
paragraph over on SOLR-13256.  Since you mentioned you were interested in 
seeing this get documented, I wanted to give you a heads up.  Feel free to 
chime in over there about anything I got wrong or any suggestions you might 
have.

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: contrib - LangId
>Affects Versions: 7.7
>Reporter: Andreas Hubold
>Priority: Major
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch
>
>
> 7.7 changed the object type of string field values that are passed to 
> UpdateRequestProcessor implementations from java.lang.String to 
> ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as cause.
> The LangDetectLanguageIdentifierUpdateProcessor still expects String values, 
> does not work for CharSequences, and logs warnings instead. For example:
> {noformat}
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> {noformat}
> I'm not sure, but there could be further places where the changed type for 
> string values needs to be handled. (Our custom UpdateRequestProcessors are 
> broken as well since 7.7, and it would be great to have a proper upgrade note 
> as part of the release notes.)






[jira] [Created] (SOLR-13256) Ref Guide: Upgrade Notes for 7.7

2019-02-15 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13256:
--

 Summary: Ref Guide: Upgrade Notes for 7.7
 Key: SOLR-13256
 URL: https://issues.apache.org/jira/browse/SOLR-13256
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: documentation
Reporter: Jason Gerlowski
Assignee: Jason Gerlowski


With 7.7 released and out the door, we should get the ball moving on a 7.7 
ref-guide.  One of the prerequisites for that process is putting together some 
upgrade notes that can go in 
{{solr/solr-ref-guide/src/solr-upgrade-notes.adoc}} for users upgrading to 7.7.

I'm going to take a look at CHANGES and take a first pass at the "upgrading" 
section for 7.7.  If anyone has anything they know should be in the list, 
please let me know and I'll try to include it.






[jira] [Commented] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster

2019-02-13 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767213#comment-16767213
 ] 

Jason Gerlowski commented on SOLR-13155:


Great, I'll remove it as a part of SOLR-13241.

bq. I think the pattern for other CLI commands is that there is some (partial) 
validation of the arguments in the script and the remaining part is done in 
Java. In this case it's perfectly valid to call this tool without any arguments

Yeah, it's a bit confusing with the two different tool-patterns we have right 
now.  As I understand things the difference is less about having a valid 0-arg 
usage, and more around a decision that was made at some point to put as little 
new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with.  e.g. 
the {{config}} tool has required arguments but does all arg parsing in Java.  
Windows-script is impossible to maintain.  Even if it was a more well-known 
language there's still the issue of duplicating logic that could just live in 
one place.  So all the newer tools do arg-parsing in Java afaik (a minimal 
sketch of that pattern follows).
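
For reference, the "parse everything in Java" pattern looks roughly like the 
following (a hypothetical tool and option set, not SolrCLI's actual options):

{code:java}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class ArgParseSketch {
  public static void main(String[] args) throws ParseException {
    // All parsing and validation lives in Java, so bin/solr and bin\solr.cmd
    // only need to forward the raw arguments.
    Options options = new Options();
    options.addOption("help", false, "Print this message");
    options.addOption(Option.builder("z").longOpt("zkHost").hasArg()
        .desc("Address of the Zookeeper ensemble").build());
    CommandLine cli = new DefaultParser().parse(options, args);
    if (cli.hasOption("help")) {
      new HelpFormatter().printHelp("mytool", options);
      return;
    }
    System.out.println("zkHost: " + cli.getOptionValue("zkHost", "localhost:9983"));
  }
}
{code}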

> CLI tool for testing autoscaling suggestions against a live cluster
> ---
>
> Key: SOLR-13155
> URL: https://issues.apache.org/jira/browse/SOLR-13155
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: AutoScaling
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.0, master (9.0)
>
> Attachments: SOLR-13155.patch, SOLR-13155.patch, SOLR-13155.patch
>
>
> Solr already provides /autoscaling/diagnostics and /autoscaling/suggestions 
> endpoints. In some situations it would be very helpful to be able to run 
> "what if" scenarios using data about nodes and replicas taken from a 
> production cluster but with a different autoscaling policy than the one that 
> is deployed, without also worrying that the calculations would negatively 
> impact a production cluster's Overseer leader.
> All necessary classes (including the Policy engine) are self-contained in the 
> SolrJ component, so it's just a matter of packaging and writing a CLI tool + 
> a wrapper script.






[jira] [Comment Edited] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster

2019-02-13 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767213#comment-16767213
 ] 

Jason Gerlowski edited comment on SOLR-13155 at 2/13/19 1:56 PM:
-

Great, I'll remove it as a part of SOLR-13241.

bq. I think the pattern for other CLI commands is that there is some (partial) 
validation of the arguments in the script and the remaining part is done in 
Java. In this case it's perfectly valid to call this tool without any arguments

Yeah, it's a bit confusing with the two different tool-patterns we have right 
now.  As I understand things the difference is less about having a valid 0-arg 
usage, and more around a decision that was made at some point to put as little 
new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with.  e.g. 
the {{config}} tool has required arguments but does all arg parsing in Java.  
Windows-script is impossible to maintain.  Even if it was a more well-known 
language there's still the issue of duplicating logic that could just live in 
one place.  So all the newer tools do arg-parsing in Java afaik.


was (Author: gerlowskija):
Great, I'll remove it as a part of SOLR-13241.

bq. I think the pattern for other CLI commands is that there is some (partial) 
validation of the arguments in the script and the remaining part is done in 
Java. In this case it's perfectly valid to call this tool without any arguments

Yeah, it's a bit confusing with the two different tool-patterns we have right 
now.  As I understand things the difference is less about having a valid 0-arg 
usage, and more around a decision that was made at some point to put as little 
new code in {{bin/solr}} and {{bin/solr.cmd}} as we can get away with.  e.g. 
the {{config}} tool has required arguments but does all arg parsing in 
Java.Windows-script is impossible to maintain.  Even if it was a more 
well-known language there's still the issue of duplicating logic that could 
just live in one place.  So all the newer tools do arg-parsing in Java afaik.

> CLI tool for testing autoscaling suggestions against a live cluster
> ---
>
> Key: SOLR-13155
> URL: https://issues.apache.org/jira/browse/SOLR-13155
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: AutoScaling
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.0, master (9.0)
>
> Attachments: SOLR-13155.patch, SOLR-13155.patch, SOLR-13155.patch
>
>
> Solr already provides /autoscaling/diagnostics and /autoscaling/suggestions 
> endpoints. In some situations it would be very helpful to be able to run 
> "what if" scenarios using data about nodes and replicas taken from a 
> production cluster but with a different autoscaling policy than the one that 
> is deployed, without also worrying that the calculations would negatively 
> impact a production cluster's Overseer leader.
> All necessary classes (including the Policy engine) are self-contained in the 
> SolrJ component, so it's just a matter of packaging and writing a CLI tool + 
> a wrapper script.






[jira] [Assigned] (SOLR-13241) Add "autoscaling" tool to the Windows script

2019-02-13 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13241:
--

Assignee: Jason Gerlowski

> Add "autoscaling" tool to the Windows script
> 
>
> Key: SOLR-13241
> URL: https://issues.apache.org/jira/browse/SOLR-13241
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>Reporter: Andrzej Bialecki 
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13241.patch
>
>
> SOLR-13155 added a command-line tool for testing autoscaling configurations. 
> The tool can be accessed via the Unix {{bin/solr}} script but it's not integrated 
> with the Windows {{bin\solr.cmd}} script.






[jira] [Updated] (SOLR-13241) Add "autoscaling" tool to the Windows script

2019-02-12 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13241:
---
Attachment: SOLR-13241.patch

> Add "autoscaling" tool to the Windows script
> 
>
> Key: SOLR-13241
> URL: https://issues.apache.org/jira/browse/SOLR-13241
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>Reporter: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-13241.patch
>
>
> SOLR-13155 added a command-line tool for testing autoscaling configurations. 
> The tool can be accessed via the Unix {{bin/solr}} script but it's not integrated 
> with the Windows {{bin\solr.cmd}} script.






[jira] [Commented] (SOLR-13155) CLI tool for testing autoscaling suggestions against a live cluster

2019-02-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766434#comment-16766434
 ] 

Jason Gerlowski commented on SOLR-13155:


Hey [~ab], took a look at your latest patch this morning while preparing to 
write a Windows equivalent of the {{bin/solr}} bits you just added.  One 
question:

You add a {{print_usage}} section for the new autoscaling command...
{code}
+  elif [ "$CMD" == "autoscaling" ]; then
+echo ""
+echo "Usage: solr autoscaling [-z zkHost] [-a ] 
[-s] [-d] [-n] [-r]"
+echo ""
+echo "  Calculate autoscaling policy suggestions and diagnostic 
information, using either the deployed"
+echo "  autoscaling configuration or the one supplied on the command line. 
This calculation takes place"
+echo "  on the client-side without affecting the running cluster except 
for fetching the node and replica"
+echo "  metrics from the cluster. For detailed usage instructions, do:"
+echo ""
+echo "bin/solr autoscaling -help"
+echo ""
{code}

But I can't figure out what command would actually trigger this help text.  The 
"autoscaling" command defers parsing its args until Java-land, so any 
{{-h}}/{{--help}}/etc. argument will trigger the commons-cli generated help 
text instead:

{code}
➜  solr git:(master) ✗ bin/solr autoscaling -h
INFO  - 2019-02-12 15:33:01.434; 
org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL 
Credential Provider chain: env;sysprop
Failed to parse command-line arguments due to: Unrecognized option: -h
usage: org.apache.solr.util.SolrCLI
 -a,--config   Autoscaling config file, defaults to the one
deployed in the cluster.
 -all   Turn on all options to get all available
information.
 -c,--clusterState  Show ClusterState (collections layout)
 -d,--diagnostics   Show calculated diagnostics
 -help  Print this message
 -n,--sortedNodes   Show sorted nodes with diagnostics
 -r,--redactRedact node and collection names (original names
will be consistently randomized)
 -s,--suggestions   Show calculated suggestions
 -stats Show summarized collection & node statistics.
 -verbose   Generate verbose log messages
 -zkHost  Address of the Zookeeper ensemble; defaults to:
localhost:9983
{code}

Am I missing some command that manages to trigger that help text, or is it 
dead-code that we can remove or change?  (I'm only asking so I know whether to 
include similar help text in the solr.cmd version.  If the {{bin/solr}} help 
text is dead code, I'm happy to remove it for you when I commit plumbing on the 
Windows side tomorrow.)


> CLI tool for testing autoscaling suggestions against a live cluster
> ---
>
> Key: SOLR-13155
> URL: https://issues.apache.org/jira/browse/SOLR-13155
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: AutoScaling
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.0, master (9.0)
>
> Attachments: SOLR-13155.patch, SOLR-13155.patch, SOLR-13155.patch
>
>
> Solr already provides /autoscaling/diagnostics and /autoscaling/suggestions 
> endpoints. In some situations it would be very helpful to be able to run 
> "what if" scenarios using data about nodes and replicas taken from a 
> production cluster but with a different autoscaling policy than the one that 
> is deployed, without also worrying that the calculations would negatively 
> impact a production cluster's Overseer leader.
> All necessary classes (including the Policy engine) are self-contained in the 
> SolrJ component, so it's just a matter of packaging and writing a CLI tool + 
> a wrapper script.






[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2019-02-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762311#comment-16762311
 ] 

Jason Gerlowski commented on SOLR-13042:


Thanks for the double-check Mikhail.

I didn't want to squeeze this into branch_7_7 with the release going out the 
door.  I guess the ref-guide is built separately, but it still seemed last 
minute.  Anyways, I've committed this everywhere else I wanted to, so I'll mark 
this as closed.

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Affects Versions: 7.5, 8.0
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them






[jira] [Commented] (SOLR-12330) JSON Facet syntax errors are responded as runtime exceptions with 500 code

2019-02-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762313#comment-16762313
 ] 

Jason Gerlowski commented on SOLR-12330:


LGTM.  I _think_ you should be able to drop the {{json-facet-api.adoc}} changes 
from the patch, as I have that information covered already in some recent 
tweaks I made to the JSON faceting docs over on SOLR-13042.  But worth double 
checking me on that, as there might be a detail I missed.

> JSON Facet syntax errors are responded as runtime exceptions with 500 code
> --
>
> Key: SOLR-12330
> URL: https://issues.apache.org/jira/browse/SOLR-12330
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Facet Module
>Affects Versions: 7.3
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-12330-combined.patch, SOLR-12330.patch, 
> SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, 
> SOLR-12330.patch, SOLR-12330.patch
>
>
> Just encountered some weird behaviour; will recheck and follow up. 
>  \{{"filter":["\{!v=$bogus}"]}} responds with just an NPE, which makes it 
> impossible to guess the reason.
>  -It might be even worse, since- \{{"filter":[\{"param":"bogus"}]}} seems 
> to be just silently ignored. Turns out it's OK, see SOLR-9682






[jira] [Comment Edited] (SOLR-12330) JSON Facet syntax errors are responded as runtime exceptions with 500 code

2019-02-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762313#comment-16762313
 ] 

Jason Gerlowski edited comment on SOLR-12330 at 2/7/19 2:37 AM:


LGTM.  I _think_ you should be able to drop the {{json-facet-api.adoc}} changes 
from the patch, as I have that information covered already in some recent 
tweaks I made to the JSON faceting docs over on SOLR-13042.  But worth double 
checking me on that, as there might be a detail I missed.


was (Author: gerlowskija):
LGTM.  I _think_ you should be able to drop the {{json-facet-api.adoc}} changes 
from the patch, as I have that information covered already in some recent 
tweaks I made the the JSON faceting docs over on SOLR-13042.  But worth double 
checking me on that, as there might be a detail I missed.

> JSON Facet syntax errors are responded as runtime exceptions with 500 code
> --
>
> Key: SOLR-12330
> URL: https://issues.apache.org/jira/browse/SOLR-12330
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Facet Module
>Affects Versions: 7.3
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-12330-combined.patch, SOLR-12330.patch, 
> SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, SOLR-12330.patch, 
> SOLR-12330.patch, SOLR-12330.patch
>
>
> Just encountered some weird behaviour; will recheck and follow up. 
>  \{{"filter":["\{!v=$bogus}"]}} responds with just an NPE, which makes it 
> impossible to guess the reason.
>  -It might be even worse, since- \{{"filter":[\{"param":"bogus"}]}} seems 
> to be just silently ignored. Turns out it's OK, see SOLR-9682






[jira] [Resolved] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2019-02-06 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13042.

Resolution: Done

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Affects Versions: 7.5, 8.0
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them






[jira] [Commented] (SOLR-13174) NPE in Json Facet API for Facet range

2019-02-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762257#comment-16762257
 ] 

Jason Gerlowski commented on SOLR-13174:


Ok, I'll leave it in Mikhail's hands over there and will close this out.  
Thanks for the heads up, and for putting the legwork in!

> NPE in Json Facet API for Facet range
> -
>
> Key: SOLR-13174
> URL: https://issues.apache.org/jira/browse/SOLR-13174
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Facet Module
>Reporter: Munendra S N
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13174.patch
>
>
> There is a mismatch in the error and status code between JSON facet's facet 
> range and classical facet range.
> When start, end, or gap is not specified in the request, classical faceting 
> returns a Bad Request whereas JSON faceting returns a 500 with the below trace:
> {code:java}
> {
> "trace": "java.lang.NullPointerException\n\tat 
> org.apache.solr.search.facet.FacetRangeProcessor.createRangeList(FacetRange.java:216)\n\tat
>  
> org.apache.solr.search.facet.FacetRangeProcessor.getRangeCounts(FacetRange.java:206)\n\tat
>  
> org.apache.solr.search.facet.FacetRangeProcessor.process(FacetRange.java:98)\n\tat
>  
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:460)\n\tat
>  
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:407)\n\tat
>  
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)\n\tat
>  org.apache.solr.search.facet.FacetModule.process(FacetModule.java:154)\n\tat 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat
>  org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n",
> "code": 500
> }
> {code}
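
For what it's worth, a SolrJ reproduction along the lines below (collection and 
field names are illustrative, not from the report) produces the 500/NPE above, 
where classical range faceting would return a 400:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RangeFacetNpeRepro {
  public static void main(String[] args) throws Exception {
    // A JSON range facet with no start/end/gap: classical range faceting
    // rejects this as a Bad Request, JSON faceting hits the NPE shown above.
    SolrQuery query = new SolrQuery("*:*");
    query.add("json.facet", "{prices:{type:range,field:price}}");
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/test").build()) {
      client.query(query);
    }
  }
}
{code}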




[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2019-02-05 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760886#comment-16760886
 ] 

Jason Gerlowski commented on SOLR-13042:


Going to merge this later today if no one has any feedback on the structure or 
wording.

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: documentation
>Affects Versions: 7.5, 8.0
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-13174) NPE in Json Facet API for Facet range

2019-02-05 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13174:
--

Assignee: Jason Gerlowski

> NPE in Json Facet API for Facet range
> -
>
> Key: SOLR-13174
> URL: https://issues.apache.org/jira/browse/SOLR-13174
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Facet Module
>Reporter: Munendra S N
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13174.patch
>
>
> There is a mismatch in the error and status code between JSON facet's facet 
> range and classical facet range.
> When start, end, or gap is not specified in the request, classical faceting 
> returns a Bad Request whereas JSON faceting returns a 500 with the below trace:
> {code:java}
> {
> "trace": "java.lang.NullPointerException\n\tat 
> org.apache.solr.search.facet.FacetRangeProcessor.createRangeList(FacetRange.java:216)\n\tat
>  
> org.apache.solr.search.facet.FacetRangeProcessor.getRangeCounts(FacetRange.java:206)\n\tat
>  
> org.apache.solr.search.facet.FacetRangeProcessor.process(FacetRange.java:98)\n\tat
>  
> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:460)\n\tat
>  
> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:407)\n\tat
>  
> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)\n\tat
>  org.apache.solr.search.facet.FacetModule.process(FacetModule.java:154)\n\tat 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat
>  org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n",
> "code": 500
> }
> {code}




[jira] [Commented] (SOLR-9515) Update to Hadoop 3

2019-01-30 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756413#comment-16756413
 ] 

Jason Gerlowski commented on SOLR-9515:
---

Took a quick look.  I see the biggest part of the patch (other than license 
changes) is the HttpServer2 class you added.  But I couldn't trace out how 
HttpServer2 gets invoked.  Nothing calls the Builder in that class, AFAICT.  
What am I missing?

Other than that question, everything looks good so far to me at least.

> Update to Hadoop 3
> --
>
> Key: SOLR-9515
> URL: https://issues.apache.org/jira/browse/SOLR-9515
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>Reporter: Mark Miller
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.0, master (9.0)
>
> Attachments: SOLR-9515.patch, SOLR-9515.patch, SOLR-9515.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Hadoop 3 is not out yet, but I'd like to iron out the upgrade to be prepared. 
> I'll start up a dev branch.






[jira] [Resolved] (SOLR-13177) aboul SOLR-5480

2019-01-29 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13177.

Resolution: Invalid

Closing this ticket as "invalid".  [~phoema], Solr's JIRA instance is for 
tracking bugs, not for use as a support portal or for asking questions about 
JIRAs that already exist.

(There's nothing wrong with those questions, they just don't belong here.  Try 
asking if anyone has any updates on SOLR-5480 itself.  If no one answers, that 
likely means no one has any updates that aren't already on that issue.)

> aboul SOLR-5480
> ---
>
> Key: SOLR-13177
> URL: https://issues.apache.org/jira/browse/SOLR-13177
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: SolrCloud
>Affects Versions: 7.6
>Reporter: phoema
>Priority: Blocker
>
> I have the same problem as Issue SOLR-5480. When will this issue be solved?
> https://issues.apache.org/jira/browse/SOLR-5480






[jira] [Commented] (SOLR-13162) Admin UI development-test cycle is slow

2019-01-22 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748882#comment-16748882
 ] 

Jason Gerlowski commented on SOLR-13162:


It depends on what files you're editing, but I think there is an ant command for 
repackaging the admin-ui alone.  You should be able to run {{ant dist}} from 
the {{solr/webapp}} dir.  I could totally be misunderstanding what you're after 
here, or maybe {{ant dist}} is deficient in some way.  Just wanted to mention 
it on the off chance that's what you're looking for.

> Admin UI development-test cycle is slow
> ---
>
> Key: SOLR-13162
> URL: https://issues.apache.org/jira/browse/SOLR-13162
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Admin UI
>Reporter: Jeremy Branham
>Priority: Minor
>
> When developing the admin user interface, it takes a long time to rebuild the 
> server to do testing.
> It would be nice to have a small test harness for the admin UI, so that 'ant 
> server' doesn't need to be executed before testing changes.






[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-22 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748812#comment-16748812
 ] 

Jason Gerlowski commented on SOLR-13116:


I guess I'm fine with that.  I'm not sure what information we'd add that 
wouldn't be a restatement of the instructions already on the login page.

Probably worth double checking that this is given a good description in 
CHANGES.txt though, since it's such a visible change for anyone using auth.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.0, 7.7
>
> Attachments: SOLR-13116.patch, SOLR-13116.patch, eventual_auth.png, 
> improved_login_page.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-15 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743048#comment-16743048
 ] 

Jason Gerlowski commented on SOLR-13116:


Oh shoot, I missed your last comment, sorry.  I don't remember if there was a 
refguide link or not.  Maybe there was a link there but my browser had issues 
with it for some reason?  I'll take a look again today with your latest patch 
and let you know.  Hopefully we can get this cleared up.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.0, 7.7
>
> Attachments: SOLR-13116.patch, SOLR-13116.patch, eventual_auth.png, 
> improved_login_page.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-11 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740479#comment-16740479
 ] 

Jason Gerlowski commented on SOLR-13116:


Just got a chance to test your patch.  Things look better (for Kerberos at 
least).  I've attached a screenshot showing the result:

 !improved_login_page.png! 

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.0, 7.7
>
> Attachments: SOLR-13116.patch, eventual_auth.png, 
> improved_login_page.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Updated] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-11 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13116:
---
Attachment: improved_login_page.png

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.0, 7.7
>
> Attachments: SOLR-13116.patch, eventual_auth.png, 
> improved_login_page.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2019-01-09 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13042:
---
Attachment: SOLR-13042.patch

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, 8.0
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch, SOLR-13042.patch, SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them






[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737238#comment-16737238
 ] 

Jason Gerlowski commented on SOLR-13116:


Thanks for the pointers Kevin; will check them out.

[~janhoy]  I reproduced again this morning and saw the following error in my 
browser's web console.  I'm not familiar enough with how the login page is 
implemented to tell if it's helpful.  But hopefully you find it enlightening:
{code}
Error: wwwHeader is null
@http://solr1:8983/solr/js/angular/controllers/login.js:31:11
invoke@http://solr1:8983/solr/libs/angular.js:4205:14
instantiate@http://solr1:8983/solr/libs/angular.js:4213:27
$ControllerProvider/this.$get</<@http://solr1:8983/solr/libs/angular.js:8472:18
link@http://solr1:8983/solr/libs/angular-route.min.js:30:268
invokeLinkFn@http://solr1:8983/solr/libs/angular.js:8236:9
nodeLinkFn@http://solr1:8983/solr/libs/angular.js:7745:11
compositeLinkFn@http://solr1:8983/solr/libs/angular.js:7098:13
publicLinkFn@http://solr1:8983/solr/libs/angular.js:6977:30
boundTranscludeFn@http://solr1:8983/solr/libs/angular.js:7116:16
controllersBoundTransclude@http://solr1:8983/solr/libs/angular.js:7772:18
x@http://solr1:8983/solr/libs/angular-route.min.js:29:364
$broadcast@http://solr1:8983/solr/libs/angular.js:14725:15
m/<@http://solr1:8983/solr/libs/angular-route.min.js:34:426
processQueue@http://solr1:8983/solr/libs/angular.js:13193:27
scheduleProcessQueue/<@http://solr1:8983/solr/libs/angular.js:13209:27
$eval@http://solr1:8983/solr/libs/angular.js:14406:16
$digest@http://solr1:8983/solr/libs/angular.js:14222:15
$apply@http://solr1:8983/solr/libs/angular.js:14511:13
done@http://solr1:8983/solr/libs/angular.js:9669:36
completeRequest@http://solr1:8983/solr/libs/angular.js:9859:7
requestLoaded@http://solr1:8983/solr/libs/angular.js:9800:9
 
{code}

There's nothing that appears relevant in {{solr.log}}.

As for why your kinit command just hung, I've got a guess.  Docker on Linux 
allows the host machine to reach docker containers by IP address, but docker 
on Mac 
[doesn't|https://docs.docker.com/docker-for-mac/networking/#per-container-ip-addressing-is-not-possible].
  Running {{kinit}} on the host machine (your MacBook) tries to talk to the 
Kerberos KDC by IP address, so {{kinit}} just hangs because it can't route to 
the docker container hosting the KDC.  That's my theory at least.  If you give 
it a shot on a Linux box, I bet it'll work for you.

Anyway, hopefully you can reproduce it on your own.  But if you still can't 
reproduce, or want a double check that a fix works, happy to run the 
reproduction again.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Comment Edited] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185
 ] 

Jason Gerlowski edited comment on SOLR-12613 at 1/8/19 2:51 PM:


Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on if they've got time.


was (Author: gerlowskija):
Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on.

> Rename "Cloud" tab as "Cluster" in Admin UI
> ---
>
> Key: SOLR-12613
> URL: https://issues.apache.org/jira/browse/SOLR-12613
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Fix For: 8.0
>
>
> Spinoff from SOLR-8207. When adding more cluster-wide functionality to the 
> Admin UI, it feels better to name the "Cloud" UI tab as "Cluster".
> In addition to renaming the "Cloud" tab, we should also change the URL part 
> from {{~cloud}} to {{~cluster}}, update reference guide page names, 
> screenshots, references, etc.
> I propose this change not be introduced in 7.x due to the impact, so I've 
> tagged it as fix-version 8.0.






[jira] [Commented] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185
 ] 

Jason Gerlowski commented on SOLR-12613:


Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on.

> Rename "Cloud" tab as "Cluster" in Admin UI
> ---
>
> Key: SOLR-12613
> URL: https://issues.apache.org/jira/browse/SOLR-12613
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Fix For: 8.0
>
>
> Spinoff from SOLR-8207. When adding more cluster-wide functionality to the 
> Admin UI, it feels better to name the "Cloud" UI tab as "Cluster".
> In addition to renaming the "Cloud" tab, we should also change the URL part 
> from {{~cloud}} to {{~cluster}}, update reference guide page names, 
> screenshots, references, etc.
> I propose this change not be introduced in 7.x due to the impact, so I've 
> tagged it as fix-version 8.0.






[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-07 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735903#comment-16735903
 ] 

Jason Gerlowski commented on SOLR-13116:


Ok, that makes sense.  The page would be much more appropriate if only the 
bottom section appeared, as you indicate is "expected".  I'll retry this 
afternoon when I get a few spare minutes and see if there are any particularly 
helpful errors in the browser console.  I didn't see anything interesting in 
solr.log previously, fwiw.

Yeah, improving the message for Kerberos to be close to what you suggested 
would be a big improvement IMO.  I'd suggest a slight rewording though: there 
are two main things that can go wrong with Kerberos in the browser, and it'd be 
helpful to mention both of them a bit more explicitly.  Something like:

"Your browser did not provide the required information to authenticate using 
Kerberos.  Please check that your computer has a valid ticket for communicating 
with Solr, and that your browser is properly configured to provide that ticket 
when required.  For more information consult Solr's Kerberos 
documentation[link].  The response from the server was: <..>"

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: master (9.0), 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Comment Edited] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-07 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735833#comment-16735833
 ] 

Jason Gerlowski edited comment on SOLR-13116 at 1/7/19 1:53 PM:


Hey Jan,  I just tested your login screen with Kerberos (this includes the 
changes you made an hour or so ago, to clarify).

This is the behavior I'm seeing:

1. With a Kerberos ticket in my local ticket cache, I can get to the admin UI 
and perform operations without ever seeing a login screen.  The admin UI is 
definitely usable.
2. If I destroy my Kerberos ticket or it expires, subsequent navigation or 
operations will produce a username/password login page.
3. If my machine acquires a valid ticket, I can then click on the 'Dashboard' 
menu item to get away from the login page and back to the dashboard.

So in summary, the Admin UI is definitely usable when Kerberos auth is being 
used.  But that said the login/auth page still seems a little 
BasicAuth-specific, and inappropriate for other auth schemes.  Some specific 
issues:

1. We probably shouldn't be displaying {{username}} and {{password}} dialog 
boxes unless we're sure the user is using an auth scheme where those values make 
sense (they don't in Kerberos, for example).
2. Some other terms on the page also seem a little too Basic Auth specific to 
be useful for other auth schemes.  "Login/Logout" might be examples of this - 
those terms are rarely used when discussing Kerberos authentication.  Not 
entirely sure on this though.
3. It looks like when Kerberos is used, several templated values needed for the 
auth page are missing, causing UI errors.  Not familiar with how the UI works, 
so I may be off on the cause here.  I've attached a screenshot below of the UI 
errors for the auth page on {{master}}
 !eventual_auth.png! 

As for Kerberos/Solr testing, I recently came across a writeup/helper-repo that 
Ishan put together a year or two ago.  If you've got docker installed, it makes 
setting up and testing Kerberos refreshingly straightforward.  Give it a shot 
if you get a chance: https://github.com/chatman/solr-kerberos-docker


was (Author: gerlowskija):
Hey Jan,  I just tested your login screen with Kerberos (this includes the 
changes you made an hour or so ago, to clarify)

This is the behavior I'm seeing:

1. With a Kerberos ticket in my local ticket cache, I can get to the admin UI 
and perform operations without ever seeing a login screen.  The admin UI is 
definitely usable.
2. If I destroy my Kerberos ticket or it expires, subsequent navigation or 
operations will produce a username/password login page.
3. If my machine acquires a valid ticket, I can then click on the 'Dashboard' 
menu item to get away from the login page and back to the dashboard.

So in summary, the Admin UI is definitely usable when Kerberos auth is being 
used.  But that said the login/auth page still seems a little 
BasicAuth-specific, and inappropriate for other auth schemes.  Some specific 
issues:

#. We probably shouldn't be displaying {{username}} and {{password}} dialog 
boxes unless we're sure the user is using a auth scheme where those values make 
sense (they don't in Kerberos, for example).
#. Some other terms on the page also seem a little too Basic Auth specific to 
be useful for other auth schemes.  "Login/Logout" might be examples of this - 
those terms are rarely used when discussing Kerberos authentication.  Not 
entirely sure on this though.
#. It looks like when Kerberos is used, several templated values needed for the 
auth page are missing, causing UI errors.  Not familiar with how the UI works, 
so I may be off on the cause here.  I've attached a screenshot below of the UI 
errors for the auth page on {{master}}
 !eventual_auth.png! 

As for Kerberos/Solr testing, I recently came across a writeup/helper-repo that 
Ishan put together a year or two ago.  If you've got docker installed, it makes 
setting up and testing Kerberos refreshingly straightforward.  Give it a shot 
if you get a chance: https://github.com/chatman/solr-kerberos-docker

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: master (8.0), 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.




[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-07 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735833#comment-16735833
 ] 

Jason Gerlowski commented on SOLR-13116:


Hey Jan,  I just tested your login screen with Kerberos (this includes the 
changes you made an hour or so ago, to clarify).

This is the behavior I'm seeing:

1. With a Kerberos ticket in my local ticket cache, I can get to the admin UI 
and perform operations without ever seeing a login screen.  The admin UI is 
definitely usable.
2. If I destroy my Kerberos ticket or it expires, subsequent navigation or 
operations will produce a username/password login page.
3. If my machine acquires a valid ticket, I can then click on the 'Dashboard' 
menu item to get away from the login page and back to the dashboard.

So in summary, the Admin UI is definitely usable when Kerberos auth is being 
used.  But that said the login/auth page still seems a little 
BasicAuth-specific, and inappropriate for other auth schemes.  Some specific 
issues:

1. We probably shouldn't be displaying {{username}} and {{password}} dialog 
boxes unless we're sure the user is using an auth scheme where those values make 
sense (they don't in Kerberos, for example).
2. Some other terms on the page also seem a little too Basic Auth specific to 
be useful for other auth schemes.  "Login/Logout" might be examples of this - 
those terms are rarely used when discussing Kerberos authentication.  Not 
entirely sure on this though.
3. It looks like when Kerberos is used, several templated values needed for the 
auth page are missing, causing UI errors.  Not familiar with how the UI works, 
so I may be off on the cause here.  I've attached a screenshot below of the UI 
errors for the auth page on {{master}}
 !eventual_auth.png! 

As for Kerberos/Solr testing, I recently came across a writeup/helper-repo that 
Ishan put together a year or two ago.  If you've got docker installed, it makes 
setting up and testing Kerberos refreshingly straightforward.  Give it a shot 
if you get a chance: https://github.com/chatman/solr-kerberos-docker

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: master (8.0), 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Updated] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-07 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13116:
---
Attachment: eventual_auth.png

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: master (8.0), 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.






[jira] [Updated] (SOLR-7896) Add a login page for Solr Administrative Interface

2019-01-07 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-7896:
--
Attachment: eventual_auth.png

> Add a login page for Solr Administrative Interface
> --
>
> Key: SOLR-7896
> URL: https://issues.apache.org/jira/browse/SOLR-7896
> Project: Solr
>  Issue Type: New Feature
>  Components: Admin UI, Authentication, security
>Affects Versions: 5.2.1
>Reporter: Aaron Greenspan
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: authentication, login, password
> Fix For: master (8.0), 7.7
>
> Attachments: SOLR-7896-bugfix-7jan.patch, 
> SOLR-7896-bugfix-7jan.patch, dispatchfilter-code.png, eventual_auth.png, 
> login-page.png, login-screen-2.png, logout.png, unknown_scheme.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Now that Solr supports Authentication plugins, the missing piece is to be 
> allowed access from Admin UI when authentication is enabled. For this we need
>  * Some plumbing in Admin UI that allows the UI to detect 401 responses and 
> redirect to login page
>  * Possibility to have multiple login pages depending on auth method and 
> redirect to the correct one
>  * [AngularJS HTTP 
> interceptors|https://docs.angularjs.org/api/ng/service/$http#interceptors] to 
> add correct HTTP headers on all requests when user is logged in
> This issue should aim to implement some of the plumbing mentioned above, and 
> make it work with Basic Auth.






[jira] [Resolved] (SOLR-13045) Harden TestSimPolicyCloud

2019-01-03 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13045.

   Resolution: Fixed
Fix Version/s: 7.6.1
   7.7
   master (8.0)

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: master (8.0), 7.7, 7.6.1
>
> Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.






[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2019-01-02 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732443#comment-16732443
 ] 

Jason Gerlowski commented on SOLR-13045:


fucit.org reports zero failures in the past week, so I think we can call this 
done.  I'm going to backport the fixes to branch_7_6 tonight, in case there's 
interest in a 7.6.1 at some point, and then I'll be closing this out.

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.






[jira] [Resolved] (SOLR-13090) Make maxBooleanClauses support system-property override

2019-01-02 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13090.

   Resolution: Fixed
Fix Version/s: 7.7
   master (8.0)

> Make maxBooleanClauses support system-property override
> ---
>
> Key: SOLR-13090
> URL: https://issues.apache.org/jira/browse/SOLR-13090
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (8.0), 7.7
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Fix For: master (8.0), 7.7
>
> Attachments: SOLR-13090.patch
>
>
> Currently, the {{maxBooleanClauses}} property is specified in most 
> solrconfigs as the hardcoded value "1024".  It'd be nice if we changed our 
> shipped configs so that they instead specified it as 
> {{${solr.max.booleanClauses:1024}}}.  This would maintain the current OOTB behavior (maxBooleanClauses would still 
> default to 1024) while adding the ability to update maxBooleanClauses values 
> across the board much more easily.  (I see users want to do this often when 
> they first run up against this limit.)






[jira] [Assigned] (SOLR-6595) Improve error response in case distributed collection cmd fails

2019-01-02 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-6595:
-

Assignee: (was: Jason Gerlowski)

> Improve error response in case distributed collection cmd fails
> ---
>
> Key: SOLR-6595
> URL: https://issues.apache.org/jira/browse/SOLR-6595
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.10
> Environment: SolrCloud with Client SSL
>Reporter: Sindre Fiskaa
>Priority: Minor
> Attachments: SOLR-6595.patch
>
>
> Followed the description 
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a 
> self-signed key pair. Configured a few solr-nodes and used the collection API 
> to create a new collection. -I get an error message when specifying the nodes 
> with the createNodeSet param. When I don't use the createNodeSet param the 
> collection gets created without error on random nodes. Could this be a bug 
> related to the createNodeSet param?- *Update: It failed due to what turned out 
> to be an invalid client certificate on the overseer, and returned the following 
> response:*
> {code:xml}
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
>   <lst name="failure">
> <str>org.apache.solr.client.solrj.SolrServerException:IOException occured 
> when talking to server at: https://vt-searchln04:443/solr</str>
>   </lst>
> </response>
> {code}
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but cores not 
> created due to failing to connect to shard nodes to talk to core admin API).
> # The error printed does not tell which action failed. Would be helpful to 
> either get the msg from the original exception or at least some message 
> saying "Failed to create core, see log on Overseer 
> # State of collection is not clean since it exists as far as ZK is concerned 
> but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. 
> Should Overseer detect error in distributed cmds and rollback changes already 
> made in ZK?






[jira] [Assigned] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart

2019-01-02 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13038:
--

Assignee: (was: Jason Gerlowski)

I hope to revisit this soon, but don't have time to focus on it in the 
immediate future.  So I'm removing myself as the assignee.

I still think this is an important issue to fix though, as it's a continuing 
contributor to test flakiness, and it affects production behavior as well.

> Overseer actions fail with NoHttpResponseException following a node restart
> ---
>
> Key: SOLR-13038
> URL: https://issues.apache.org/jira/browse/SOLR-13038
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13038.patch
>
>
> I noticed recently that a lot of overseer operations fail if they're executed 
> right after a restart of a Solr node.  The failure returns a message like 
> "org.apache.solr.client.solrj.SolrServerException:IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr;.  The logs are a bit more 
> helpful:
> {code}
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) 
> ~[java/:?]
> at 
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
>  ~[java/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_172]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>  ~[metrics-core-3.2.6.jar:3.2.6]
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>  ~[java/:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to 
> respond
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
>  ~[java/:?]
> at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> 

[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2019-01-02 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732042#comment-16732042
 ] 

Jason Gerlowski commented on SOLR-6595:
---

I'm not going to have much time in the immediate future to finish this up, so I 
wanted to summarize the progress so far:

- the latest patch sets the "status" property to 500 when the "failure" list is 
present and non-empty (see the sketch after this list)
- because of this, SolrJ will now throw exceptions in failure cases where it 
previously allowed the request to fail silently.  This causes some tests to 
fail that were passing (incorrectly) before.  I investigated a few examples of 
this, and most were in test setup/cleanup where the expectations were a bit off. 
 There weren't a ton of these failures though and they should be simpler to 
debug thanks to other recent test flakiness improvements.
- I investigated making changes to SolrJ that would attach a NamedList to 
SolrExceptions thrown because of a 500, but didn't pursue that too far.  It's 
probably a separate JIRA anyways. 
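
To make the status-500 point concrete, here's a minimal sketch of the idea 
(names and shapes are illustrative only, not the committed patch):

{code:java}
import org.apache.solr.common.util.NamedList;

public class FailureStatusSketch {
  // Sketch only: a present, non-empty "failure" list in the Collection API
  // response maps to HTTP 500, so SolrJ throws an exception instead of
  // letting the request fail silently with status=0.
  static int deriveStatus(NamedList<Object> response) {
    Object failure = response.get("failure");
    boolean failed = failure instanceof NamedList && ((NamedList<?>) failure).size() > 0;
    return failed ? 500 : 0;
  }
}
{code}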

> Improve error response in case distributed collection cmd fails
> ---
>
> Key: SOLR-6595
> URL: https://issues.apache.org/jira/browse/SOLR-6595
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.10
> Environment: SolrCloud with Client SSL
>Reporter: Sindre Fiskaa
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-6595.patch
>
>
> Followed the description 
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a 
> self-signed key pair. Configured a few solr-nodes and used the collection API 
> to create a new collection. -I get an error message when specifying the nodes 
> with the createNodeSet param. When I don't use the createNodeSet param the 
> collection gets created without error on random nodes. Could this be a bug 
> related to the createNodeSet param?- *Update: It failed due to what turned out 
> to be an invalid client certificate on the overseer, and returned the following 
> response:*
> {code:xml}
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
>   <lst name="failure">
> <str>org.apache.solr.client.solrj.SolrServerException:IOException occured 
> when talking to server at: https://vt-searchln04:443/solr</str>
>   </lst>
> </response>
> {code}
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but cores not 
> created due to failing to connect to shard nodes to talk to core admin API).
> # The error printed does not tell which action failed. Would be helpful to 
> either get the msg from the original exception or at least some message 
> saying "Failed to create core, see log on Overseer 
> # State of collection is not clean since it exists as far as ZK is concerned 
> but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. 
> Should Overseer detect error in distributed cmds and rollback changes already 
> made in ZK?






[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-21 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726913#comment-16726913
 ] 

Jason Gerlowski commented on SOLR-13037:


fucit.org hasn't shown any {{branch_7x}} or {{master}} failures for this test 
since the fix went in last week.  So I'm going to mark this as closed.

(There are a few branch_7_6 failures, which makes sense since the fix hasn't 
gone to that branch.  I'm happy to add the fix to that branch as well if anyone 
wants it, but my understanding is that we don't normally do this unless 
the fix is for a production bug.  It might make it marginally easier for anyone 
cutting a theoretical 7.6.1 to get passing builds, which was apparently a 
serious problem with 7.6.  So I've got mixed feelings, but will hold off for 
now.)

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: master (8.0), 7.7
>
> Attachments: SOLR-13037.patch, repro-log.txt
>
>







[jira] [Resolved] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-21 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13037.

   Resolution: Fixed
Fix Version/s: 7.7
   master (8.0)

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: master (8.0), 7.7
>
> Attachments: SOLR-13037.patch, repro-log.txt
>
>







[jira] [Assigned] (SOLR-13090) Make maxBooleanClauses support system-property override

2018-12-21 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13090:
--

Assignee: Jason Gerlowski

> Make maxBooleanClauses support system-property override
> ---
>
> Key: SOLR-13090
> URL: https://issues.apache.org/jira/browse/SOLR-13090
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (8.0), 7.7
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>
> Currently, the {{maxBooleanClauses}} property is specified in most 
> solrconfigs as the hardcoded value "1024".  It'd be nice if we changed our 
> shipped configs so that they instead specified it as 
> {{${solr.max.booleanClauses:1024}}}.  This would maintain the current OOTB behavior (maxBooleanClauses would still 
> default to 1024) while adding the ability to update maxBooleanClauses values 
> across the board much more easily.  (I see users want to do this often when 
> they first run up against this limit.)






[jira] [Created] (SOLR-13090) Make maxBooleanClauses support system-property override

2018-12-21 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13090:
--

 Summary: Make maxBooleanClauses support system-property override
 Key: SOLR-13090
 URL: https://issues.apache.org/jira/browse/SOLR-13090
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (8.0), 7.7
Reporter: Jason Gerlowski


Currently, the {{maxBooleanClauses}} property is specified in most solrconfigs 
as the hardcoded value "1024".  It'd be nice if we changed our shipped configs 
so that they instead specified it as {{${solr.max.booleanClauses:1024}}}.  This 
would maintain the current OOTB behavior (maxBooleanClauses would still default 
to 1024) while adding the ability to update maxBooleanClauses values across the 
board much more easily.  (I see users want to do this often when they first run 
up against this limit.)
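
For illustration, the {{${prop:default}}} substitution boils down to an 
ordinary system-property lookup with a fallback.  A rough Java equivalent 
(not Solr's actual substitution code) would be:

{code:java}
// Rough equivalent of ${solr.max.booleanClauses:1024}: use the system
// property when it's set, otherwise fall back to the default of 1024.
int maxBooleanClauses = Integer.parseInt(
    System.getProperty("solr.max.booleanClauses", "1024"));
{code}

Starting Solr with {{-Dsolr.max.booleanClauses=4096}} would then raise the 
limit everywhere without editing any solrconfig.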

[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-20 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725884#comment-16725884
 ] 

Jason Gerlowski commented on SOLR-13045:


One of the remaining failures in TestSimPolicyCloud occurs in 
{{testCreateCollectionAddShardUsingPolicy}} when the initial collection 
creation (and subsequent shard creation) seems to violate a policy which 
specifies that all replicas should be created on the same node.  After looking 
closer, it looks like this comes down to a race condition of sorts between two 
threads attempting to set the autoscaling.json ZK node.

Two different threads touch the autoscaling config node in this test: the 
OverseerTriggerThread tries to set the default nodeAdded trigger, and the test 
code tries to set a policy that the test relies on.  These threads rely on 
optimistic concurrency versioning to ensure that updates don't clobber one 
another.  But SimDistribStateManager has a bug which prevents this from working 
correctly all the time.  The initial node version in the sim framework is -1, 
which is also the flag used to indicate "I don't care about concurrency, just 
overwrite the node".  (For comparison, ZkDistribStateManager has node versions 
start at 0).  Depending on timing, this causes the default nodeAdded trigger to 
clobber the policy that our test relies on, causing it to fail.

So one fix that'll make this test (and probably others in the sim framework) 
more reliable is to ensure that SimDistribStateManager's node-versioning lines 
up better with ZkDistribStateManager's.  Or at least that it avoids this -1 
edge case.  I've been testing variations of a patch to accomplish this, and 
will upload my results shortly.
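
As a condensed illustration of the hazard (hypothetical names, not the real 
SimDistribStateManager API):

{code:java}
class SimNodeSketch {
  byte[] data;
  int version = -1;  // sim framework starts here; real ZK starts at 0

  synchronized void setData(byte[] newData, int expectedVersion) {
    // -1 doubles as the "don't check, just overwrite" wildcard.  A writer
    // that read the *initial* version (-1) therefore bypasses the version
    // check entirely and can silently clobber a concurrent update instead
    // of failing with a bad-version error.
    if (expectedVersion != -1 && expectedVersion != version) {
      throw new IllegalStateException("expected version " + expectedVersion
          + " but node is at " + version);
    }
    data = newData;
    version++;
  }
}
{code}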

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.






[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-18 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724116#comment-16724116
 ] 

Jason Gerlowski commented on SOLR-13045:


Checking back in a week later.  The work above has cut down the failure rate 
from 5% to maybe 1-2%, but there are still issues with this test.  Attaching a 
jenkins log containing a current failure from 2 days ago.  (Don't want to lose 
the log when it cycles out of fucit).

At first glance, the failure looks like it happens because a replica is created 
on the wrong node (contrary to a specified policy).  Starting to look into 
things now.

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.






[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-18 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13045:
---
Attachment: jenkins.log.txt.gz

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.






[jira] [Assigned] (SOLR-13078) Harden TestSimNodeAddedTrigger

2018-12-17 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13078:
--

Assignee: Jason Gerlowski

> Harden TestSimNodeAddedTrigger
> --
>
> Key: SOLR-13078
> URL: https://issues.apache.org/jira/browse/SOLR-13078
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> Jenkins has been failing occasionally with issues in TestSimNodeAddedTrigger. 
>  We should look into these and make it pass more reliably.






[jira] [Created] (SOLR-13078) Harden TestSimNodeAddedTrigger

2018-12-17 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13078:
--

 Summary: Harden TestSimNodeAddedTrigger
 Key: SOLR-13078
 URL: https://issues.apache.org/jira/browse/SOLR-13078
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jason Gerlowski


Jenkins has been failing occasionally with issues in TestSimNodeAddedTrigger.  
We should look into these and make it pass more reliably.






[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-12 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13042:
---
Attachment: SOLR-13042.patch

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch, SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them






[jira] [Commented] (SOLR-13065) Harden TestSimExecuteActionPlan

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719443#comment-16719443
 ] 

Jason Gerlowski commented on SOLR-13065:


When I disable SimClusterStateProvider's caching, the error disappears in a 
beast run of {{-Dbeast.iters=400 -Dtests.dupes=30 -Dtests.iters=20}}, which 
implies that the cluster state caching is the only issue, and we'll need a 
fix similar to the one in SOLR-13045.

> Harden TestSimExecuteActionPlan
> ---
>
> Key: SOLR-13065
> URL: https://issues.apache.org/jira/browse/SOLR-13065
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs.  
> Would like to look into improving it.






[jira] [Commented] (SOLR-13065) Harden TestSimExecuteActionPlan

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719267#comment-16719267
 ] 

Jason Gerlowski commented on SOLR-13065:


At first glance, this looks like a similar problem to what I recently saw in 
SOLR-13045.  The test fails in a {{waitForState}} block, but there's some 
indication that we're using an outdated (cached?) copy of the clusterstatus 
info.

Here's a partial stack from a recent failure I got:

{code}
  [beaster]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestSimExecutePlanAction -Dtests.method=testIntegration 
-Dtests.seed=18902C9108C137F1 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=es-GT -Dtests.timezone=Asia/Rangoon -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
  [beaster]   2> 24745 INFO  (simCloudManagerPool-112-thread-8) [] 
o.a.s.c.CloudTestUtils -- wrong number of active replicas in slice shard1, 
expected=1, found=2
  [beaster] [12:26:46.105] FAILURE 2.13s | 
TestSimExecutePlanAction.testIntegration 
{seed=[18902C9108C137F1:7163CC06353074F9]} <<< 
  [beaster]> Throwable #1: java.lang.AssertionError: Timed out waiting for 
replicas of collection to be 2 again
  [beaster]> Live Nodes: [127.0.0.1:10016_solr]
  [beaster]> Last available state: 
DocCollection(testIntegration//clusterstate.json/444)={
 ...
  [beaster]>  at 
__randomizedtesting.SeedInfo.seed([18902C9108C137F1:7163CC06353074F9]:0)
  [beaster]>  at 
org.apache.solr.cloud.CloudTestUtils.waitForState(CloudTestUtils.java:70)
  [beaster]>  at 
org.apache.solr.cloud.autoscaling.sim.TestSimExecutePlanAction.testIntegration(TestSimExecutePlanAction.java:200
...
  [beaster]> Caused by: java.util.concurrent.TimeoutException: last 
ClusterState: znodeVersion: 445
{code}

Note the different reported "last" clusterstate versions.  We see that there's 
a clusterstate.json version 445, but the failing assertion only saw version 
444.  That's not to say definitively that version 445 would pass the assertion, 
but it's a place to start. 
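
For reference, {{waitForState}} is essentially a polling loop like the 
following sketch (simplified, with assumed names; the real helper is 
{{CloudTestUtils.waitForState}}):

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class WaitForStateSketch {
  // Poll the latest observable cluster state until the predicate matches or
  // the timeout elapses.  If a stale/cached state keeps being returned, the
  // predicate never matches and we time out, exactly as in the failure above.
  static <T> void waitForState(Supplier<T> latestState, Predicate<T> predicate,
                               long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (predicate.test(latestState.get())) {
        return;
      }
      Thread.sleep(100);  // poll interval; arbitrary for the sketch
    }
    throw new TimeoutException("predicate never matched the observed cluster state");
  }
}
{code}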

> Harden TestSimExecuteActionPlan
> ---
>
> Key: SOLR-13065
> URL: https://issues.apache.org/jira/browse/SOLR-13065
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs.  
> Would like to look into improving it.






[jira] [Created] (SOLR-13065) Harden TestSimExecuteActionPlan

2018-12-12 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13065:
--

 Summary: Harden TestSimExecuteActionPlan
 Key: SOLR-13065
 URL: https://issues.apache.org/jira/browse/SOLR-13065
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (8.0)
Reporter: Jason Gerlowski
Assignee: Jason Gerlowski


TestSimExecuteActionPlan is a serial offender in our failed Jenkins jobs.  
Would like to look into improving it.






[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719102#comment-16719102
 ] 

Jason Gerlowski commented on SOLR-13037:


I've attached a patch which takes approach #2 above.  With it, I haven't seen 
any GDQ test failures, though I'll be more confident with more beasting.  Will 
run some tests in the background the rest of today and then commit tonight if 
things still look good. 

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13037.patch, repro-log.txt
>
>







[jira] [Updated] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13037:
---
Attachment: SOLR-13037.patch

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13037.patch, repro-log.txt
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000
 ] 

Jason Gerlowski edited comment on SOLR-13037 at 12/12/18 1:59 PM:
--

To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 assertNotNull(dq.peek(15000));
{code}

This test code has two threads of interest. The QueueChangerThread we see 
created here will sleep for one second, and then insert data into the queue. 
Meanwhile the main test thread will wait for some data to be inserted into the 
queue. Our queue-reading waits a pretty generous amount of time for things to 
enter the queue, so the insert should always finish in time and the read should 
always pick it up.

Here's some more detail on what happens in each queue operation. First, the 
queue-write (i.e. {{offer()}}):
 - [Acquire lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461]
 - [Create queue entry node and attach it to 
parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324]
 - [Wake up any threads sleeping on the 'changed' 
Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593]
 - [Release lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465]
 - [Set data for queue 
entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468]

Now the queue-read. Queue-reading works off of a cache of "known queue entries" 
and most queue-reads are handled from there. But the test failure only occurs 
when we need to refresh this cache and read straight from ZK, so I'll skip the 
cache logic here.
 - [Acquire lock 
'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186]
 - [loop until we're out of time to 
wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189]
 ** [look for an element and return if 
non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190]
 ** [sleep until we receive a wakeup  from 'changed' Condition or we time 
out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194]
 - [Release lock 
'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198]

There's a problem with the queue-write code above.  We wake up threads after 
creating the queue-entry, but before it's fully initialized with its data.  
This opens the door to readers checking the entry before its data is fully 
ready, finding nothing usable, and going back to sleep.  Since the 'changed' 
signalling has already happened, any readers that checked too early won't wake 
up again until timeout.

There are a few ways we can fix this:
- we could add a {{changed.signalAll()}} call at the end of {{offer()}}, to 
ensure that there's at least one wakeup after the data has been fully added
- we could alter the flow of {{SimDistribStateManager.createData}} so that the 
node is only attached to the tree after its data has been fully initialized
- we could register a Watcher that triggers on "data-changed", similar to how 
we already trigger a watcher on "child-added"
 

I think the second option is probably the "right" fix here, so I'll pursue that 
unless others have other opinions.
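To make the lost-wakeup pattern concrete, here's a self-contained sketch (a 
made-up class, not the actual SimDistribStateManager/GenericDistributedQueue 
code) where the writer signals the Condition after attaching the entry but 
before setting its data, exactly the ordering described above:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical one-slot "queue" reproducing the race: the signal fires
// after the node is attached but before its data is initialized.
public final class LostWakeupSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private volatile boolean nodeExists;  // node attached to the tree
  private volatile String data;         // null means "not fully initialized"

  public void offer(String value) {
    lock.lock();
    try {
      nodeExists = true;      // create entry node, attach it to parent
      changed.signalAll();    // wake readers -- too early!
    } finally {
      lock.unlock();
    }
    data = value;             // data is set AFTER the only signal (the bug)
    // Fix (a): re-acquire the lock and signalAll() again here.
    // Fix (b): set 'data' before attaching/signalling (option 2 above).
  }

  public String peek(long timeoutMs) throws InterruptedException {
    lock.lock();
    try {
      long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (nanos > 0) {
        if (nodeExists && data != null) {
          return data;        // entry fully ready
        }
        // A reader woken between "attach" and "set data" sees the node but
        // no data, sleeps again, and never gets another signal.
        nanos = changed.awaitNanos(nanos);
      }
      return null;            // times out, just like dq.peek(15000) does
    } finally {
      lock.unlock();
    }
  }
}
{code}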


was (Author: gerlowskija):
To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();

[jira] [Comment Edited] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000
 ] 

Jason Gerlowski edited comment on SOLR-13037 at 12/12/18 1:58 PM:
--

To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 assertNotNull(dq.peek(15000));
{code}

This test code has two threads of interest. The QueueChangerThread we see 
created here will sleep for one second, and then insert data into the queue. 
Meanwhile the main test thread will wait for some data to be inserted into the 
queue. Our queue-reading waits a pretty generous amount of time for things to 
enter the queue, so the insert should always finish in time and the read should 
always pick it up.

Here's some more detail on what happens in each queue operation. First, the 
queue-write (i.e. {{offer()}}):
 - [Acquire lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461]
 - [Create queue entry node and attach it to 
parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324]
 - [Wake up any threads sleeping on the 'changed' 
Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593]
 - [Release lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465]
 - [Set data for queue 
entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468]

Now the queue-read. Queue-reading works off of a cache of "known queue entries" 
and most queue-reads are handled from there. But the test failure only occurs 
when we need to refresh this cache and read straight from ZK, so I'll skip the 
cache logic here.
 - [Acquire lock 
'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186]
 - [loop until we're out of time to 
wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189]
 ** [look for an element and return if 
non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190]
 ** [sleep until we receive a wakeup  from 'changed' Condition or we time 
out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194]
 - [Release lock 
'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198]

There's a problem with the queue-write code above.  We wake up threads after 
creating the queue-entry, but before it's fully initialized with its data.  
This opens the door to readers checking the entry before its data is fully 
ready, finding nothing usable, and going back to sleep.  Since the 'changed' 
signalling has already happened, any readers that checked too early won't wake 
up again until timeout.

There are a few ways we can fix this:
- we could add a {{changed.signalAll()}} call at the end of {{offer()}}, to 
ensure that there's at least one wakeup after the data has been fully added
- we could alter the flow of {{SimDistribStateManager.createData}} so that the 
node is only attached to the tree after its data has been fully initialized
- we could register a Watcher that triggers on "data-changed", similar to how 
we already trigger a watcher on "child-added"
 
I think the second option is probably the "right" fix here, so I'll pursue that 
unless others have other opinions.


was (Author: gerlowskija):
To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 

[jira] [Comment Edited] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000
 ] 

Jason Gerlowski edited comment on SOLR-13037 at 12/12/18 1:57 PM:
--

To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 assertNotNull(dq.peek(15000));
{code}

This test code has two threads of interest. The QueueChangerThread we see 
created here will sleep for one second, and then insert data into the queue. 
Meanwhile the main test thread will wait for some data to be inserted into the 
queue. Our queue-reading waits a pretty generous amount of time for things to 
enter the queue, so the insert should always finish in time and the read should 
always pick it up.

Here's some more detail on what happens in each queue operation. First, the 
queue-write (i.e. {{offer()}}):
 - [Acquire lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461]
 - [Create queue entry node and attach it to 
parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324]
 - [Wake up any threads sleeping on the 'changed' 
Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593]
 - [Release lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465]
 - [Set data for queue 
entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468]

Now the queue-read. Queue-reading works off of a cache of "known queue entries" 
and most queue-reads are handled from there. But the test failure only occurs 
when we need to refresh this cache and read straight from ZK, so I'll skip the 
cache logic here.
 - [Acquire lock 
'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186]
 - [loop until we're out of time to 
wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189]
 ** [look for an element and return if 
non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190]
 ** [sleep until we receive a wakeup  from 'changed' Condition or we time 
out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194]
 - [Release lock 
'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198]

There's a problem with the queue-write code above.  We wake up threads after 
creating the queue-entry, but before it's fully initialized with its data.  
This opens the door to readers checking the entry before its data is fully 
ready, finding nothing usable, and going back to sleep.  Since the 'changed' 
signalling has already happened, any readers that checked too early will go 
back to sleep and not wake up again until timeout.

There are a few ways we can fix this:
- we could add a {{changed.signalAll()}} call at the end of {{offer()}}, to 
ensure that there's at least one wakeup after the data has been fully added
- we could alter the flow of {{SimDistribStateManager.createData}} so that the 
node is only attached to the tree after its data has been fully initialized
- we could register a Watcher that triggers on "data-changed", similar to how 
we already trigger a watcher on "child-added"
 


was (Author: gerlowskija):
To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 assertNotNull(dq.peek(15000));
{code}

This test code has two threads of interest. The QueueChangerThread we see 

[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719000#comment-16719000
 ] 

Jason Gerlowski commented on SOLR-13037:


To (hopefully) explain things a little more clearly, here's the race condition 
I think we're running into here.  There's a few sections of 
{{TestSimGenericDistributedQueue}} that seem to fail, but let's zoom in on one 
in particular.  Check out TestSimDistributedQueue lines 73-74:

{code}
 (new QueueChangerThread(dq,1000)).start();
 assertNotNull(dq.peek(15000));
{code}

This test code has two threads of interest. The QueueChangerThread we see 
created here will sleep for one second, and then insert data into the queue. 
Meanwhile the main test thread will wait for some data to be inserted into the 
queue. Our queue-reading waits a pretty generous amount of time for things to 
enter the queue, so the insert should always finish in time and the read should 
always pick it up.

Here's some more detail on what happens in each queue operation. First, the 
queue-write (i.e. {{offer()}}):
 - [Acquire lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L461]
 - [Create queue entry node and attach it to 
parent|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L324]
 - [Wake up any threads sleeping on the 'changed' 
Condition|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L593]
 - [Release lock 
'multilock'|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L465]
 - [Set data for queue 
entry|https://github.com/apache/lucene-solr/blob/18356de83738d64e619898016d873993ec474d17/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimDistribStateManager.java#L468]

Now the queue-read. Queue-reading works off of a cache of "known queue entries" 
and most queue-reads are handled from there. But the test failure only occurs 
when we need to refresh this cache and read straight from ZK, so I'll skip the 
cache logic here.
 - [Acquire lock 
'updateLock'|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L186]
 - [loop until we're out of time to 
wait:|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L189]
 ** [look for an element and return if 
non-null|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L190]
 ** [sleep until we receive a wakeup  from 'changed' Condition or we time 
out.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L194]
 - [Release lock 
'updateLock'.|https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/GenericDistributedQueue.java#L198]

There's a problem with the queue-write code above.  We wake up threads after 
creating the queue-entry, but before it's fully initialized with its data.  
This opens the door to readers checking the entry before its data is fully 
ready, finding nothing usable, and going back to sleep.  Since the 'changed' 
signalling has already happened, any readers that checked too early will go 
back to sleep and not wake up again until timeout.

There are a few ways we can fix this:
- we could add a {{changed.signalAll()}} call at the end of {{offer()}}, to 
ensure that there's at least one wakeup after the data has been fully added
- we could alter the flow of {{SimDistribStateManager.createData}} so that the 
node is only attached to the tree after its data has been fully initialized
- we could register a Watcher that triggers on "data-changed", similar to how 
we already trigger a watcher on "child-added"
 

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: repro-log.txt
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13037:
---
Attachment: repro-log.txt

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: repro-log.txt
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718885#comment-16718885
 ] 

Jason Gerlowski commented on SOLR-13037:


I've attached a log file which shows the race condition that causes this to 
occur.  Most of this logging is custom, but it should still be helpful for 
others trying to understand the problem.

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: repro-log.txt
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-13037) Harden TestSimGenericDistributedQueue.

2018-12-12 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13037:
--

Assignee: Jason Gerlowski

> Harden TestSimGenericDistributedQueue.
> --
>
> Key: SOLR-13037
> URL: https://issues.apache.org/jira/browse/SOLR-13037
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Jason Gerlowski
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-11 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716891#comment-16716891
 ] 

Jason Gerlowski commented on SOLR-13045:


Committed this fix to master and branch_7x.  In testing, it looked like it also 
cleared up issues in TestSimExtremeIndexing.  So maybe we'll get a fix there 
for 'free'.  I'll keep this open for the next week to check for failures, but 
I'll close it if things look good after that.

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-10 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13045:
---
Attachment: SOLR-13045.patch

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-10 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715236#comment-16715236
 ] 

Jason Gerlowski commented on SOLR-13045:


I found another bug where SimCloudManager was setting the "nodeRole" property 
as a single-valued Set, instead of just giving it a String value.  This causes 
things to blow up later when a cast to {{String}} fails.  The attached patch 
includes a small fix for that.
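For illustration, the failure mode is just this (a hypothetical snippet, not 
the actual SimCloudManager code):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of the nodeRole bug: a single-valued Set is stored
// where downstream code expects a plain String.
public class NodeRoleCastDemo {
  public static void main(String[] args) {
    Map<String, Object> nodeValues = new HashMap<>();
    nodeValues.put("nodeRole", Collections.singleton("overseer")); // the bug
    try {
      String role = (String) nodeValues.get("nodeRole"); // ClassCastException
      System.out.println("role=" + role);
    } catch (ClassCastException e) {
      System.out.println("blows up: " + e); // what the test tripped over
    }
    nodeValues.put("nodeRole", "overseer"); // the fix: store the String
    System.out.println("role=" + nodeValues.get("nodeRole"));
  }
}
{code}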

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-10 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714973#comment-16714973
 ] 

Jason Gerlowski commented on SOLR-13042:


I was able to work on this some over the weekend and wanted to upload my 
progress for preliminary review.  I've made most of the structural changes I 
mentioned above, including pulling the various types of "domain changes" out 
into their own ref-guide page.  I'm still undecided whether "JSON Faceting" 
should become a sub-page of the "JSON Request API" or not.

Much of my time was spent adding SolrJ snippets to the pages.  I've finished 
this on the "Request API" page, and on the "Query DSL" page.  Still need to go 
through that effort on the "Facet API" page and the "Domain Changes" page (new 
to this patch).  I also changed all of the examples in these pages over to 
using the "techproducts" exampledocs.  This will make it easier for readers to 
try out the examples themselves.  (The existing "books" corpus isn't hard to 
set up, but it's not as easy as {{bin/solr start -e techproducts}}.)

I'd say this patch is 80% of what I wanted to change on these pages.  Most of 
the work remaining is additional SolrJ examples and polish/wording tweaks.  
Would love some feedback if anyone has opinions or time to read through things.

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-10 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13042:
---
Attachment: SOLR-13042.patch

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-13042.patch
>
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-07 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713056#comment-16713056
 ] 

Jason Gerlowski commented on SOLR-13045:


I've attached a proposed fix for this.  With this, all tests in 
{{TestSimPolicyCloud}} looked good.  Ran them ~5000 times.  Gonna do some beast 
runs to trigger things that way, but otherwise things look good here.

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-07 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13045:
---
Attachment: SOLR-13045.patch

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13045.patch
>
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-07 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13045:
--

Assignee: Jason Gerlowski

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-07 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713023#comment-16713023
 ] 

Jason Gerlowski commented on SOLR-13045:


I believe I found the race condition causing these failures. It looks like a 
race between the {{waitForState}} polling, which occurs in the main test 
thread, and the leader-election execution, which occurs in a {{Future}} 
submitted to {{SimCloudManager}}'s ExecutorService.

The {{waitForState}} thread repeatedly asks for the cluster state, which looks 
a bit like this:
 * [return cached value, if any. Otherwise 
continue|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2090]
 * [Grab 
lock|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2093]
 * [Clear 
cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2094]
 * [Build Map to store in 
cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2126]
 * [Set cache with 
Map|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2141]
 * [Release 
lock|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2144]

The Leader Election Future looks a bit like this:
 * [Give a ReplicaInfo 
"leader=true"|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L756]
 * [Clear 
cache|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L766]

Note that the leader election Future does this without acquiring the lock. Now 
imagine the following interleaving of these two threads:
 * [Thread-Test] Grab lock
 * [Thread-Test] Clear cache
 * [Thread-Test] Build Map to store in cache
 * [Thread-LeaderElection] Give ReplicaInfo "leader=true"
 * [Thread-LeaderElection] Clear cache
 * [Thread-Test] Set cache with Map

At the end of this interleaving the cache has a value that's missing the latest 
"leader=true" changes, and nothing will ever clear it. So the {{waitForState}} 
polling will go on to fail.

We should be able to fix this by having the leader election code use the same 
Lock used elsewhere. I've actually got this change staged locally and am 
running tests on it currently. If all looks well I should have this uploaded 
soon. One thing I'll be curious to see is whether this affects any of the other 
TestSim* failures we've seen recently. If we're lucky we may get 2 (or more) 
birds with this one stone.
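As a rough sketch of that fix (simplified, hypothetical names; not the actual 
SimClusterStateProvider code): once the invalidation takes the same lock as 
the rebuild, the "clear" can no longer land between another thread building a 
stale Map and storing it in the cache.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical cached-state holder. Every clear and rebuild runs under one
// lock, so an invalidation cannot be silently overwritten by a concurrently
// built (and already stale) snapshot.
public final class CachedClusterStateSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private Map<String, Object> cached;   // guarded by 'lock'

  public Map<String, Object> get() {
    lock.lock();
    try {
      if (cached == null) {
        cached = buildFromSource();     // expensive rebuild
      }
      return cached;                    // stored under the same lock
    } finally {
      lock.unlock();
    }
  }

  // Called by the leader-election code after flipping "leader=true".
  // Taking the same lock is the fix: without it, this clear could run
  // after another thread built a stale Map but before it stored it.
  public void invalidate() {
    lock.lock();
    try {
      cached = null;
    } finally {
      lock.unlock();
    }
  }

  private Map<String, Object> buildFromSource() {
    return new HashMap<>();             // stand-in for reading replica state
  }
}
{code}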

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712293#comment-16712293
 ] 

Jason Gerlowski edited comment on SOLR-13045 at 12/7/18 3:21 AM:
-

Looking at {{testCreateCollectionAddReplica}} first.  I'm still in the early 
stages of looking into this, but I think I see some things pointing to this 
being a sim-framework issue, as opposed to being a production problem.  I'm not 
super familiar with the sim-framework though, so I'll try and give some detail 
here in case anyone with more context can correct me and save me from a 
potential red-herring.

*TL;DR* I believe this to be a test-framework bug related to how the 
SimClusterStateProvider caches clusterstate values.

The test starts by creating a collection using a specific policy.  Maybe 1 time 
in 10 it'll fail in a {{CloudTestUtils.waitForState}} call.  On these failures, 
this {{waitForState}} call fails because the collection (supposedly) doesn't 
have a leader:
{code}
 last coll state: 
DocCollection(testCreateCollectionAddReplica//clusterstate.json/5)={
  "replicationFactor":"1",
  "pullReplicas":"0",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"1",
  "autoAddReplicas":"false",
  "nrtReplicas":"1",
  "tlogReplicas":"0",
  "autoCreated":"true",
  "policy":"c1",
  "shards":{"shard1":{
  "replicas":{"core_node1":{
  "core":"testCreateCollectionAddReplica_shard1_replica_n1",
  "SEARCHER.searcher.maxDoc":0,
  "SEARCHER.searcher.deletedDocs":0,
  "INDEX.sizeInBytes":10240,
  "node_name":"127.0.0.1:10068_solr",
  "state":"active",
  "type":"NRT",
  "INDEX.sizeInGB":9.5367431640625E-6,
  "SEARCHER.searcher.numDocs":0}},
  "range":"8000-7fff",
  "state":"active"}}}
{code}

But other statements in the logs indicate that this collection *does* have a 
leader.  We get this series of messages right as the test ends:
{code}
14445 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.SolrTestCaseJ4 ###Ending testCreateCollectionAddReplica
14446 DEBUG 
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider ** creating new collection states, 
currentVersion=6
14446 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider JEGERLOW: Saving clusterstate
14446 DEBUG 
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider ** saved cluster state version 6
14446 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimSolrCloudTestCase ###
 CLUSTER STATE 
###
## Live nodes:  2
## Empty nodes: 1
## Dead nodes:  0
## Collections:
##  * testCreateCollectionAddReplica
##shardsTotal   1
##shardsState   {active=1}
##  shardsWithoutLeader 0
{code}

One thing that stands out to me is the different clusterstate versions in play 
here.  The log snippets above show information from {{/clusterstate.json/5}}, 
and {{/clusterstate.json/6}} respectively.

I looked into {{SimClusterStateProvider}} and noticed that it caches the 
cluster state locally (see 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2086])
 and warns readers that the cache must be explicitly cleared before new changes 
become visible.  With this caching temporarily disabled the test failure 
disappeared.  (Or at least, I couldn't trigger it in 2000 runs).  I suspect 
that the test failure is caused by either (1) some codepath not properly 
clearing/resetting this clusterstate cache, or (2) a subtler synchronization 
bug in how this cache is locked down.


was (Author: gerlowskija):
Looking at {{testCreateCollectionAddReplica}} first.  I'm still in the early 
stages of looking into this, but I think I see some things pointing to this 
being a sim-framework issue, as opposed to being a production problem.  I'm not 
super familiar with the sim-framework though, so I'll try and give some detail 
here in case anyone with more context can correct me and save me from a 
potential red-herring.

*TL;DR* I believe this to be a test-framework bug related to how the 
SimClusterStateProvider caches clusterstate values.

The test starts by creating a collection using a specific policy.  Maybe 1 time 
in 10 it'll fail in a {{CloudTestUtils.waitForState}} call.  On these failures, 
this {{waitForState}} call fails because the collection (supposedly) doesn't 
have a leader:
{code}
 last coll state: 

[jira] [Commented] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-06 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712293#comment-16712293
 ] 

Jason Gerlowski commented on SOLR-13045:


Looking at {{testCreateCollectionAddReplica}} first.  I'm still in the early 
stages of looking into this, but I think I see some things pointing to this 
being a sim-framework issue, as opposed to being a production problem.  I'm not 
super familiar with the sim-framework though, so I'll try and give some detail 
here in case anyone with more context can correct me and save me from a 
potential red-herring.

*TL;DR* I believe this to be a test-framework bug related to how the 
SimClusterStateProvider caches clusterstate values.

The test starts by creating a collection using a specific policy.  Maybe 1 time 
in 10 it'll fail in a {{CloudTestUtils.waitForState}} call.  On these failures, 
this {{waitForState}} call fails because the collection (supposedly) doesn't 
have a leader:
{code}
 last coll state: 
DocCollection(testCreateCollectionAddReplica//clusterstate.json/5)={
  "replicationFactor":"1",
  "pullReplicas":"0",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"1",
  "autoAddReplicas":"false",
  "nrtReplicas":"1",
  "tlogReplicas":"0",
  "autoCreated":"true",
  "policy":"c1",
  "shards":{"shard1":{
  "replicas":{"core_node1":{
  "core":"testCreateCollectionAddReplica_shard1_replica_n1",
  "SEARCHER.searcher.maxDoc":0,
  "SEARCHER.searcher.deletedDocs":0,
  "INDEX.sizeInBytes":10240,
  "node_name":"127.0.0.1:10068_solr",
  "state":"active",
  "type":"NRT",
  "INDEX.sizeInGB":9.5367431640625E-6,
  "SEARCHER.searcher.numDocs":0}},
  "range":"8000-7fff",
  "state":"active"}}}
{code}

But other statements in the logs indicate that this collection *does* have a 
leader.  We get this series of messages right as the test ends:
{code}
14445 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.SolrTestCaseJ4 ###Ending testCreateCollectionAddReplica
14446 DEBUG 
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider ** creating new collection states, 
currentVersion=6
14446 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider JEGERLOW: Saving clusterstate
14446 DEBUG 
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimClusterStateProvider ** saved cluster state version 6
14446 INFO  
(TEST-TestSimPolicyCloud.testCreateCollectionAddReplica-seed#[6FE5447E15D3DD6F])
 [] o.a.s.c.a.s.SimSolrCloudTestCase ###
 CLUSTER STATE 
###
## Live nodes:  2
## Empty nodes: 1
## Dead nodes:  0
## Collections:
##  * testCreateCollectionAddReplica
##shardsTotal   1
##shardsState   {active=1}
##  shardsWithoutLeader 0
{code}

One thing that stands out to me is the different clusterstate versions in play 
here.  The log snippets above show information from {{/clusterstate.json/5}}, 
and {{/clusterstate.json/6}} respectively.

I looked into {{SimClusterStateProvider}} and noticed that it caches the 
cluster state locally (see 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimClusterStateProvider.java#L2086])
 and warns readers that the cache must be explicitly cleared before new changes 
become visible.  With this caching temporarily disabled the test failure 
disappeared.  (Or at least, I couldn't trigger it in 2000 runs).  I suspect 
that the test failure is caused by either (1) some codepath not properly 
clearing/resetting this clusterstate cache, or (2) a subtler synchronization 
bug in how this cache is locked down.

> Harden TestSimPolicyCloud
> -
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Priority: Major
>
> Several tests in TestSimPolicyCloud, but especially 
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after 
> Mark's recent test-fix commit.  This JIRA covers looking into and (hopefully) 
> fixing this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-13045) Harden TestSimPolicyCloud

2018-12-06 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13045:
--

 Summary: Harden TestSimPolicyCloud
 Key: SOLR-13045
 URL: https://issues.apache.org/jira/browse/SOLR-13045
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Affects Versions: master (8.0)
Reporter: Jason Gerlowski


Several tests in TestSimPolicyCloud, but especially 
{{testCreateCollectionAddReplica}}, have some flaky behavior, even after Mark's 
recent test-fix commit.  This JIRA covers looking into and (hopefully) fixing 
this test failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-05 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710524#comment-16710524
 ] 

Jason Gerlowski commented on SOLR-13042:


Though I only had small changes in mind initially, as I looked at these pages I 
think they could use a bit of a larger overhaul.  I think there's a good few 
things that could be improved.

Some of these are more general:
* add quotes to JSON snippets so that examples can be pasted into other editors 
or shared in other contexts without causing syntax highlighters to flare up 
(see the before/after example at the end of this comment).
* Change comments in existing JSON snippets to "callouts", so that they aren't 
included when the snippets get copy/pasted.
* Add corresponding SolrJ snippets for facet/query examples where possible and 
not already present


Some are specific to individual pages and tend to be a bit more structural:

*Json-Request-API Page*
* remove the "Error Detection" section, or move it somewhere else where it fits 
better
* move "Debugging" section out from under "Smart Merging of Multiple JSON 
Parameters", since it applies just as much to "Param Substitution" and "Passing 
Params"
* since we already have a descendant page for the querying syntax, would it 
make sense to move the JSON faceting page so it is also a descendant of "Json 
Request API"? 

*Json-Query-DSL Page*
* give it a little more explanation on why you would tag a query

*Json-Facet-API Page*
* removed the "design goals "section, as the doesn't seem appropriate for a 
rest guide
* reverse the metrics example in the bucket in example to Match the order they 
are introduced in
* get rid of the "making a faceting request" section and update all examples to 
have the appropriate curl header.
* Move the "Noggit"/"Json extensions" section over to the JSON top-level page 
since it applies to querying as well as faceting
* move terms, heatmap, range, etc under a "Types of Facets" top level section
* there's probably enough uses and examples of changing facet domains for it to 
be its own page.  Would that work?
* Remove the "references" section that has the links to Yonik's personal site.  
I'm not sure where the ref-guide comes down on external links to blogs, etc. in 
general.  I'm not against it.  But here I dislike it because those links are 
already out of date in slight ways and that will only get worse as JSON 
faceting develops further.
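For the first bullet in the general list above, here's the sort of 
before/after I have in mind (a made-up terms-facet request, purely for 
illustration).  Solr happily parses the lenient form thanks to its Noggit 
extensions, but strict JSON tooling rejects it:

{code}
Lenient form (Solr-only):
  { query: "*:*", facet: { categories: { type: terms, field: cat } } }

Fully-quoted form (pastes cleanly into any JSON tool):
  { "query": "*:*", "facet": { "categories": { "type": "terms", "field": "cat" } } }
{code}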

> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-05 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13042:
---
Description: 
While working on SOLR-12965 I noticed a few minor issues with the JSON faceting 
ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks include:
* missing/insufficient description of some params for Heatmap facets
* Weird formatting on "Domain Filters" example
* missing "fields"/"fl" in the "Parameters Mapping" table

Figured I'd just create a JIRA and fix these before I forgot about them

  was:
While working on SOLR-12965 I noticed a few minor issues with the JSON faceting 
ref-guide page.  Nothing serious, just a few annoyances.  Tweaks include:
* missing/insufficient description of some params for Heatmap facets
* Weird formatting on "Domain Filters" example
* missing "fields"/"fl" in the "Parameters Mapping" table

Figured I'd just create a JIRA and fix these before I forgot about them


> Miscellaneous JSON Facet API docs improvements
> --
>
> Key: SOLR-13042
> URL: https://issues.apache.org/jira/browse/SOLR-13042
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 7.5, master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>
> While working on SOLR-12965 I noticed a few minor issues with the JSON 
> faceting ref-guide pages.  Nothing serious, just a few annoyances.  Tweaks 
> include:
> * missing/insufficient description of some params for Heatmap facets
> * Weird formatting on "Domain Filters" example
> * missing "fields"/"fl" in the "Parameters Mapping" table
> Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13042) Miscellaneous JSON Facet API docs improvements

2018-12-05 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13042:
--

 Summary: Miscellaneous JSON Facet API docs improvements
 Key: SOLR-13042
 URL: https://issues.apache.org/jira/browse/SOLR-13042
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: documentation
Affects Versions: 7.5, master (8.0)
Reporter: Jason Gerlowski
Assignee: Jason Gerlowski


While working on SOLR-12965 I noticed a few minor issues with the JSON faceting 
ref-guide page.  Nothing serious, just a few annoyances.  Tweaks include:
* missing/insufficient description of some params for Heatmap facets
* Weird formatting on "Domain Filters" example
* missing "fields"/"fl" in the "Parameters Mapping" table

Figured I'd just create a JIRA and fix these before I forgot about them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9492) Request status API returns a completed status even if the collection API call failed

2018-12-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708939#comment-16708939
 ] 

Jason Gerlowski commented on SOLR-9492:
---

I'd like to rectify this if it's still an issue, but haven't yet been able to 
reproduce it.  I can get SPLITSHARD to fail in a few different ways, but none 
produce the "status=completed" that Shalin mentions in his example above.  It's 
possible that Steve's fix on SOLR-5970 fixed this issue for us.

Assuming the problem still exists and I'm just not creative enough to reproduce 
it, I've got a pretty good guess where the problem lies.  The overseer's 
{{OverseerCollectionMessageHandler}} has a {{processResponses}} method which is 
invoked several times to check for errors while executing subtasks within a 
SPLITSHARD request.  The SPLITSHARD code tells {{processResponses}} to abort on 
error (by throwing a SolrException), but the logic that does this only checks 
for an "exception" field in the response NamedList.  This is sufficient for a 
lot of error cases, but Solr's APIs don't consistently return an "exception" 
field for all errors.  If one of these responses is returned, we'll log the 
error under the "failure" map but never abort the SPLITSHARD request.
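
To make that concrete, here's a rough sketch of the shape of the check as I 
understand it (names here are illustrative, not the exact overseer code):

{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrException.ErrorCode;
import org.apache.solr.common.util.NamedList;

class AbortOnErrorSketch {
  // Only the "exception" key is consulted, so an error reported under some
  // other key (e.g. only a "failure" entry) never triggers the abort.
  static void abortIfErred(NamedList<Object> shardResponse) {
    Object exception = shardResponse.get("exception");
    if (exception != null) {
      throw new SolrException(ErrorCode.SERVER_ERROR, exception.toString());
    }
    // Responses that signal errors differently fall through here and are
    // merely recorded under the "failure" map by the caller.
  }
}
{code}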

> Request status API returns a completed status even if the collection API call 
> failed
> 
>
> Key: SOLR-9492
> URL: https://issues.apache.org/jira/browse/SOLR-9492
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 5.5.2, 6.2
>Reporter: Shalin Shekhar Mangar
>Priority: Major
>  Labels: difficulty-medium, impact-high
> Fix For: 6.7, 7.0
>
>
> A failed split shard response is:
> {code}
> {success={127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=2}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:50948_hfnp%2Fbq={responseHeader={status=0,QTime=0}}},c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044=/admin/cores=conf1=collection1_shard1_0_replica1=CREATE=collection1=shard1_0=javabin=2}
>  status=0 
> QTime=2},c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004=/admin/cores=conf1=collection1_shard1_1_replica1=CREATE=collection1=shard1_1=javabin=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904 webapp=null 
> path=/admin/cores 
> params={nodeName=127.0.0.1:43245_hfnp%252Fbq=collection1_shard1_1_replica1=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904=/admin/cores=core_node6=PREPRECOVERY=true=active=true=javabin=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003 webapp=null 
> path=/admin/cores 
> params={core=collection1=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003=/admin/cores=SPLIT=collection1_shard1_0_replica1=collection1_shard1_1_replica1=javabin=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632=/admin/cores=collection1_shard1_1_replica1=REQUESTAPPLYUPDATES=javabin=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900=/admin/cores=conf1=collection1_shard1_0_replica0=CREATE=collection1=shard1_0=javabin=2}
>  status=0 
> 

[jira] [Resolved] (SOLR-13019) Fix typo in MailEntityProcessor.java

2018-12-04 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-13019.

   Resolution: Fixed
Fix Version/s: master (8.0)

> Fix typo in MailEntityProcessor.java
> 
>
> Key: SOLR-13019
> URL: https://issues.apache.org/jira/browse/SOLR-13019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Reporter: Tommy Marshment-Howell
>Assignee: Jason Gerlowski
>Priority: Trivial
> Fix For: master (8.0)
>
>
> https://github.com/apache/lucene-solr/pull/509






[jira] [Commented] (SOLR-13019) Fix typo in MailEntityProcessor.java

2018-12-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708726#comment-16708726
 ] 

Jason Gerlowski commented on SOLR-13019:


Thanks for the patch, Tommy.  Merged and closing.

> Fix typo in MailEntityProcessor.java
> 
>
> Key: SOLR-13019
> URL: https://issues.apache.org/jira/browse/SOLR-13019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Reporter: Tommy Marshment-Howell
>Assignee: Jason Gerlowski
>Priority: Trivial
> Fix For: master (8.0)
>
>
> https://github.com/apache/lucene-solr/pull/509






[jira] [Assigned] (SOLR-13019) Fix typo in MailEntityProcessor.java

2018-12-04 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-13019:
--

Assignee: Jason Gerlowski

> Fix typo in MailEntityProcessor.java
> 
>
> Key: SOLR-13019
> URL: https://issues.apache.org/jira/browse/SOLR-13019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Reporter: Tommy Marshment-Howell
>Assignee: Jason Gerlowski
>Priority: Trivial
>
> https://github.com/apache/lucene-solr/pull/509






[jira] [Comment Edited] (SOLR-13027) Harden LeaderTragicEventTest.

2018-12-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690
 ] 

Jason Gerlowski edited comment on SOLR-13027 at 12/4/18 1:23 PM:
-

Looks like you added an empty/useless if-statement 
[here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310].
  Assuming that was an accident?

Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193]
 often (always?) fails due to a race condition where the overseer doesn't get 
rid of connections orphaned/closed by the Jetty restart.  We ask the overseer 
to delete a collection for us and it fails because it tries to use these old 
connections.  (You actually helped me out on this offline yesterday, though I 
don't think I mentioned this test by name at the time.)  Anyway, this cleanup 
failure doesn't typically cause test failures due to a different bug altogether 
(SOLR-6595), but if you're beasting you might see the incomplete cleanup cause 
issues, so I wanted to mention it.  (See SOLR-13038 for more details if you're 
interested, or willing to chime in.)


was (Author: gerlowskija):
Looks like you added an empty/useless if-statement 
[here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310].
  Assuming that was an accident?

Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193]
 often (always?) fails due to a race condition where the overseer doesn't get 
rid of connections orphaned/closed by the Jetty restart.  We ask the overseer 
to delete a collection for us and it fails because it tries to use these old 
connections.  (You actually helped me out on this offline yesterday, though I 
don't think I mentioned this test by name at the time.)  Anyway, this cleanup 
failure doesn't typically cause test failures due to a different bug altogether 
(SOLR-6595), but if you're beasting you might see the incomplete cleanup cause 
issues, so I wanted to mention it.

> Harden LeaderTragicEventTest.
> -
>
> Key: SOLR-13027
> URL: https://issues.apache.org/jira/browse/SOLR-13027
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
>







[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.

2018-12-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690
 ] 

Jason Gerlowski commented on SOLR-13027:


Looks like you added an empty/useless if-statement 
[here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310].
  Assuming that was an accident?

Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193]
 often (always?) fails due to a race condition where the overseer doesn't get 
rid of connections orphaned/closed by the Jetty restart.  We ask the overseer 
to delete a collection for us and it fails because it tries to use these old 
connections.  (You actually helped me out on this offline yesterday, though I 
don't think I mentioned this test by name at the time.)  Anyway, this cleanup 
failure doesn't typically cause test failures due to a different bug altogether 
(SOLR-6595), but if you're beasting you might see the incomplete cleanup cause 
issues, so I wanted to mention it.

> Harden LeaderTragicEventTest.
> -
>
> Key: SOLR-13027
> URL: https://issues.apache.org/jira/browse/SOLR-13027
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
>







[jira] [Comment Edited] (SOLR-13027) Harden LeaderTragicEventTest.

2018-12-04 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708690#comment-16708690
 ] 

Jason Gerlowski edited comment on SOLR-13027 at 12/4/18 1:20 PM:
-

Looks like you added an empty/useless if-statement 
[here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310].
  Assuming that was an accident?

Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193]
 often (always?) fails due to a race condition where the overseer doesn't get 
rid of connections orphaned/closed by the Jetty restart.  We ask the overseer 
to delete a collection for us and it fails because it tries to use these old 
connections.  (You actually helped me out on this offline yesterday, though I 
don't think I mentioned this test by name at the time.)  Anyway, this cleanup 
failure doesn't typically cause test failures due to a different bug altogether 
(SOLR-6595), but if you're beasting you might see the incomplete cleanup cause 
issues, so I wanted to mention it.


was (Author: gerlowskija):
Looks like you added an empty/useless if-statement 
[here|https://github.com/apache/lucene-solr/blob/33c40a8da40677f43ea377ca0cb2a1def8649c52/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310].
  Assuming that was an accident?

Also, I noticed elsewhere that the cleanup command in LeaderTragicEventTest 
[here|https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcb117f309119053/solr/core/src/test/org/apache/solr/cloud/LeaderTragicEventTest.java#L193]
 often (always?) fails due to a race condition where the overseer doesn't get 
rid of connections orphaned/closed by the Jetty restart.  We ask the overseer 
to delete a collection for us and it fails because it tries to use these old 
connections.  (You actually helped me out on this offline yesterday, though I 
don't think I mentioned this test by name at the time.)  Anyway, this cleanup 
failure doesn't typically cause test failures due to a different bug altogether 
(SOLR-6595), but if you're beasting you might see the incomplete cleanup cause 
issues, so I wanted to mention it.

> Harden LeaderTragicEventTest.
> -
>
> Key: SOLR-13027
> URL: https://issues.apache.org/jira/browse/SOLR-13027
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
>







[jira] [Commented] (SOLR-12555) Replace try-fail-catch test patterns

2018-12-03 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708138#comment-16708138
 ] 

Jason Gerlowski commented on SOLR-12555:


Thanks for the review, Bar.  I committed the resulting patch this past weekend.  
Will post here if I'm able to bite off a few more packages this week.

> Replace try-fail-catch test patterns
> 
>
> Key: SOLR-12555
> URL: https://issues.apache.org/jira/browse/SOLR-12555
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Trivial
> Attachments: SOLR-12555-sorted-by-package.txt, SOLR-12555.patch, 
> SOLR-12555.patch, SOLR-12555.txt
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I recently added some test code through SOLR-12427 which used the following 
> test anti-pattern:
> {code}
> try {
>   actionExpectedToThrowException();
>   fail("I expected this to throw an exception, but it didn't");
> } catch (Exception e) {
>   assertOnThrownException(e);
> }
> {code}
> Hoss (rightfully) objected that this should instead be written using the 
> formulation below, which is clearer and more concise.
> {code}
> SolrException e = expectThrows(SolrException.class, () -> {...});
> {code}
> We should remove many of these older formulations where it makes sense.  Many 
> of them were written before {{expectThrows}} was introduced, and having the 
> old style assertions around makes it easier for them to continue creeping in.






[jira] [Commented] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart

2018-12-03 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708133#comment-16708133
 ] 

Jason Gerlowski commented on SOLR-13038:


I've attached a strawman patch that adds a very basic retry check into 
HttpShardHandler.  Most of the patch is just plumbing to pass around a 
"retryable" boolean to where it can be added to {{ShardRequest}}.  This 
plumbing is pretty rough - I wouldn't commit it without finding something a 
little more elegant - but it's sufficient for showing the change conceptually.

Having seen a lot of discussion on prior JIRAs related to this issue, it seems 
like there's a lot of concern about retrying on this particular error case.  To 
summarize, {{NoHttpResponseException}} is ambiguous - there's no way to tell 
whether the server received and processed your request or not.  So a 
requirement is that we avoid retrying any non-idempotent requests.  That was 
the main goal in choosing the approach I did for this strawman patch.  Each 
caller of HttpShardHandler can choose whether they're OK with their request 
being retried, with the default being to not retry.

Anyway, curious if people have any thoughts.

Oh, one last thing: this patch also includes an additional assertion in 
LeaderTragicEventTest that exhibits the problem.  It passes with the rest of 
the patch, but will fail and show the problem when applied on its own.
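
To make the opt-in idea concrete, here's a minimal sketch of the retry logic 
(names are illustrative, not the actual patch):

{code:java}
import org.apache.http.NoHttpResponseException;

public class RetryableRequestSketch {

  interface RequestExecutor {
    Object execute() throws Exception;
  }

  // Retries at most once, and only when the caller marked the request as
  // retryable (i.e. idempotent); the default everywhere is "false".
  static Object submit(RequestExecutor executor, boolean retryable) throws Exception {
    try {
      return executor.execute();
    } catch (Exception e) {
      // NoHttpResponseException is ambiguous: the server may or may not have
      // processed the request, so we only retry when the caller opted in.
      if (retryable && rootCause(e) instanceof NoHttpResponseException) {
        return executor.execute(); // single retry on a fresh connection
      }
      throw e;
    }
  }

  private static Throwable rootCause(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t;
  }
}
{code}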

> Overseer actions fail with NoHttpResponseException following a node restart
> ---
>
> Key: SOLR-13038
> URL: https://issues.apache.org/jira/browse/SOLR-13038
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13038.patch
>
>
> I noticed recently that a lot of overseer operations fail if they're executed 
> right after a restart of a Solr node.  The failure returns a message like 
> "org.apache.solr.client.solrj.SolrServerException:IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr".  The logs are a bit more 
> helpful:
> {code}
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) 
> ~[java/:?]
> at 
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
>  ~[java/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_172]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>  ~[metrics-core-3.2.6.jar:3.2.6]
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>  ~[java/:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to 
> respond
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 

[jira] [Updated] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart

2018-12-03 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13038:
---
Attachment: SOLR-13038.patch

> Overseer actions fail with NoHttpResponseException following a node restart
> ---
>
> Key: SOLR-13038
> URL: https://issues.apache.org/jira/browse/SOLR-13038
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13038.patch
>
>
> I noticed recently that a lot of overseer operations fail if they're executed 
> right after a restart of a Solr node.  The failure returns a message like 
> "org.apache.solr.client.solrj.SolrServerException:IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr".  The logs are a bit more 
> helpful:
> {code}
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) 
> ~[java/:?]
> at 
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
>  ~[java/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_172]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>  ~[metrics-core-3.2.6.jar:3.2.6]
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>  ~[java/:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to 
> respond
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
>  ~[java/:?]
> at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
> ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
>  ~[java/:?]
> ... 12 more
> {code}
> After a bit of debugging I was able to confirm the problem: when some 
> non-overseer node gets restarted, 

[jira] [Commented] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart

2018-12-03 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707814#comment-16707814
 ] 

Jason Gerlowski commented on SOLR-13038:


You can reproduce this behavior pretty regularly with the JUnit test below that 
uses SolrCloudTestCase as its base:

{code}
@Test
public void testOtherReplicasAreNotActive() throws Exception {
  final String collection = "collection1";
  CollectionAdminRequest
      .createCollection(collection, "config", 1, 2)
      .process(cluster.getSolrClient());
  cluster.waitForActiveCollection(collection, 1, 2);
  Slice shard = getCollectionState(collection).getSlice("shard1");
  JettySolrRunner otherReplicaJetty = cluster.getReplicaJetty(getNonLeader(shard));

  otherReplicaJetty.stop();
  cluster.waitForJettyToStop(otherReplicaJetty);
  waitForState("Timeout waiting for replica to go down", collection,
      (liveNodes, collectionState) ->
          getNonLeader(collectionState.getSlice("shard1")).getState() != Replica.State.ACTIVE);

  otherReplicaJetty.start();
  cluster.waitForNode(otherReplicaJetty, 30);
  waitForState("Timeout waiting for replica to come back up", collection,
      (liveNodes, collectionState) ->
          getNonLeader(collectionState.getSlice("shard1")).getState() == Replica.State.ACTIVE);

  CollectionAdminResponse response =
      CollectionAdminRequest.deleteCollection(collection).process(cluster.getSolrClient());
  assertNull("Expected collection-delete to fully succeed",
      response.getResponse().get("failure"));
}

// Small helper assumed by the test above (not shown in the original comment):
private Replica getNonLeader(Slice slice) {
  return slice.getReplicas().stream()
      .filter(r -> !r.getName().equals(slice.getLeader().getName()))
      .findAny().get();
}
{code}

> Overseer actions fail with NoHttpResponseException following a node restart
> ---
>
> Key: SOLR-13038
> URL: https://issues.apache.org/jira/browse/SOLR-13038
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: master (8.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
>
> I noticed recently that a lot of overseer operations fail if they're executed 
> right after a restart of a Solr node.  The failure returns a message like 
> "org.apache.solr.client.solrj.SolrServerException:IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr".  The logs are a bit more 
> helpful:
> {code}
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: https://127.0.0.1:62253/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) 
> ~[java/:?]
> at 
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
>  ~[java/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_172]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>  ~[metrics-core-3.2.6.jar:3.2.6]
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>  ~[java/:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to 
> respond
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>  ~[httpclient-4.5.6.jar:4.5.6]
> at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>  ~[httpcore-4.4.10.jar:4.4.10]
> at 
> 

[jira] [Created] (SOLR-13038) Overseer actions fail with NoHttpResponseException following a node restart

2018-12-03 Thread Jason Gerlowski (JIRA)
Jason Gerlowski created SOLR-13038:
--

 Summary: Overseer actions fail with NoHttpResponseException 
following a node restart
 Key: SOLR-13038
 URL: https://issues.apache.org/jira/browse/SOLR-13038
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: master (8.0)
Reporter: Jason Gerlowski
Assignee: Jason Gerlowski


I noticed recently that a lot of overseer operations fail if they're executed 
right after a restart of a Solr node.  The failure returns a message like 
"org.apache.solr.client.solrj.SolrServerException:IOException occured when 
talking to server at: https://127.0.0.1:62253/solr".  The logs are a bit more 
helpful:

{code}
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: https://127.0.0.1:62253/solr
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
 ~[java/:?]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 ~[java/:?]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 ~[java/:?]
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) 
~[java/:?]
at 
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
 ~[java/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_172]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
 ~[metrics-core-3.2.6.jar:3.2.6]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
 ~[java/:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_172]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_172]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to 
respond
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
 ~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
 ~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
 ~[httpcore-4.4.10.jar:4.4.10]
at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
 ~[httpcore-4.4.10.jar:4.4.10]
at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) 
~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
 ~[httpcore-4.4.10.jar:4.4.10]
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
 ~[httpcore-4.4.10.jar:4.4.10]
at 
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
 ~[java/:?]
at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) 
~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) 
~[httpclient-4.5.6.jar:4.5.6]
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) 
~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
 ~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
 ~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
 ~[httpclient-4.5.6.jar:4.5.6]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
 ~[java/:?]
... 12 more
{code}

After a bit of debugging I was able to confirm the problem: when some 
non-overseer node gets restarted, the overseer never notices that its 
connections are invalid and will try to reuse them for subsequent requests that 
happen right after the restart.
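
For reference, one generic mitigation at the plain-HttpClient level (just a 
sketch; this is not what Solr currently does, and whether it fits with Solr's 
own client setup is an open question) is to have the connection pool 
re-validate connections that have sat idle before reusing them:

{code:java}
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class StaleConnectionMitigationSketch {
  public static CloseableHttpClient build() {
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    // Re-check pooled connections that have been idle for more than 2s before
    // reuse, so sockets closed by a restarted node get discarded rather than
    // reused (and failing with NoHttpResponseException).
    cm.setValidateAfterInactivity(2000);
    return HttpClients.custom().setConnectionManager(cm).build();
  }
}
{code}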

There are a few ways we might be able to tackle this:
* we could look at adding logic to {{SolrHttpRequestRetryHandler}} to retry 
when this happens.  SHRRH already retries NoHttpResponseException generally, 
but has other logic which prevents any retries on collection/core-admin APIs.  
Maybe we could elaborate this a bit.
* we 

[jira] [Resolved] (SOLR-6117) Replication command=fetchindex always return success.

2018-12-03 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-6117.
---
   Resolution: Fixed
Fix Version/s: 7.7
   master (8.0)

Marking this as 'Fixed' for 8.0 and 7.7.  To summarize/clarify: the fixes on 
{{master}} and {{branch_7x}} are a little different, owing to the need to avoid 
potentially breaking changes on 7x.  The 7x changes only go far enough to fix 
the bug where we return a "success" status even when the request fails.  The 
master changes do this as well, and also correct a few inconsistencies between 
the different error cases.

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6, 7.5
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: master (8.0), 7.7
>
> Attachments: SOLR-6117-master.patch, SOLR-6117.patch, 
> SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).






[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2018-11-28 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702292#comment-16702292
 ] 

Jason Gerlowski commented on SOLR-6595:
---

Thinking aloud here, and I guess also soliciting feedback.

The current patch sets 500 as the value for the "status" property, as well as 
the HTTP status code on the response.  The expectation in most other places 
seems to be that the "status" property matches the HTTP status code, so this 
seems like the technically correct thing to do from an API perspective.

There is a downside to this though: SolrJ converts non-200 responses into 
exceptions.  So while the failure information is still in the response, SolrJ 
users can't get at it.  (This isn't strictly true...SolrJ tries its best to 
come up with a good exception message by looking for properties like "error" 
and "failure".  But that's a pale substitute for giving users access to the 
response itself if they want it.)

It'd be cool if SolrJ users could access the original response in exceptional 
cases.  Maybe we should attach the parsed NamedList to the RemoteSolrExceptions 
that SolrJ throws.  That seems like a separate JIRA, but I wanted to raise it 
here since it bears on these response changes indirectly.
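
For illustration, here's a minimal sketch (a hypothetical API, not SolrJ's 
actual one) of what attaching the parsed response to the exception could look 
like:

{code:java}
import org.apache.solr.common.util.NamedList;

// Hypothetical exception type for illustration; real SolrJ throws
// HttpSolrClient.RemoteSolrException, which doesn't currently expose this.
public class RemoteSolrExceptionSketch extends RuntimeException {
  private final NamedList<Object> parsedResponse;

  public RemoteSolrExceptionSketch(String message, NamedList<Object> parsedResponse) {
    super(message);
    this.parsedResponse = parsedResponse;
  }

  // Callers could inspect the "failure"/"error" details directly instead of
  // re-parsing them out of the exception message.
  public NamedList<Object> getParsedResponse() {
    return parsedResponse;
  }
}
{code}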

> Improve error response in case distributed collection cmd fails
> ---
>
> Key: SOLR-6595
> URL: https://issues.apache.org/jira/browse/SOLR-6595
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.10
> Environment: SolrCloud with Client SSL
>Reporter: Sindre Fiskaa
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-6595.patch
>
>
> Followed the description 
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a 
> self signed key pair. Configured a few solr-nodes and used the collection api 
> to crate a new collection. -I get error message when specify the nodes with 
> the createNodeSet param. When I don't use createNodeSet param the collection 
> gets created without error on random nodes. Could this be a bug related to 
> the createNodeSet param?- *Update: It failed due to what turned out to be 
> invalid client certificate on the overseer, and returned the following 
> response:*
> {code:xml}
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
>   <str name="failure">
>     org.apache.solr.client.solrj.SolrServerException:IOException occured 
> when talking to server at: https://vt-searchln04:443/solr
>   </str>
> </response>
> {code}
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but cores not 
> created due to failing to connect to shard nodes to talk to core admin API).
> # The error printed does not tell which action failed. It would be helpful to 
> either get the msg from the original exception or at least some message 
> saying "Failed to create core, see log on Overseer".
> # State of collection is not clean since it exists as far as ZK is concerned 
> but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. 
> Should Overseer detect error in distributed cmds and rollback changes already 
> made in ZK?






[jira] [Commented] (SOLR-6117) Replication command=fetchindex always return success.

2018-11-28 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701957#comment-16701957
 ] 

Jason Gerlowski commented on SOLR-6117:
---

Attached an updated patch that's intended for the master branch, and thus has 
the liberty to do more to make the various responses from the /replication API 
more uniform.  This version of the patch addresses all of the bullet points in 
my previous comment.  I haven't run tests more generally yet, but I hope to 
commit this to master in the next week or so.

One thing I forgot to clarify in my previous comment: both of these patches 
address _all_ subcommands in the /replication API (not just "fetchindex").  
That was a point of discussion in the original effort on this JIRA, so I just 
thought I'd clarify.

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6, 7.5
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-6117-master.patch, SOLR-6117.patch, 
> SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).






[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.

2018-11-28 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-6117:
--
Attachment: SOLR-6117-master.patch

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6, 7.5
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-6117-master.patch, SOLR-6117.patch, 
> SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).






[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.

2018-11-28 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-6117:
--
Affects Version/s: 7.5

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6, 7.5
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, 
> SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).






[jira] [Commented] (SOLR-6117) Replication command=fetchindex always return success.

2018-11-27 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700791#comment-16700791
 ] 

Jason Gerlowski commented on SOLR-6117:
---

The most recent attached patch is a slight update of Shalin's.  I'd hoped to 
add a lot more tests with this that trigger the various failure conditions, but 
it's hard to reproduce many of them via JUnit.  I also looked at adding unit 
tests for ReplicationHandler directly, but it relies heavily on SolrCore, which 
is final and thus hard to mock or stub.  If anyone sees a way to get more 
coverage on this without major surgery, I'd love to hear it.

The current patch makes sure that we never advertise a response as status=OK 
falsely, so it's just a bugfix and should be safe to include in branch_7x from 
a breaking-change perspective.  There are a lot of other problems with the 
replication handler responses that would require breaking changes to fix.  
Specifically:
* "status" is only present on some responses.  Ideally it should be present on 
all /replication responses so that clients can rely on it being there.
* "status" is used inconsistently.  Some uses give it an enum-like value that 
clients could key off of; others treat it like a "message" field and just give 
it arbitrary error messages.
* when errors occur, the "message" and "exception" fields are used 
inconsistently.  Ideally if an error occurs there would always be a message, 
and sometimes there would also be an exception.
* many of the error cases involving argument validation set the status field 
properly but return the wrong HTTP status (200), i.e. they should throw a 
SolrException (see the sketch below).

I plan on working some of these out soon in a larger commit that can be put on 
master.
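
A minimal sketch of that last bullet, assuming a hypothetical {{validateArgs}} 
helper inside the replication handler:

{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrException.ErrorCode;

class ReplicationArgValidationSketch {
  // Throwing a SolrException lets Solr set both the "status" field and the
  // HTTP status code (400 here) consistently, instead of recording the error
  // but still answering 200.
  static void validateArgs(String masterUrl) {
    if (masterUrl == null || masterUrl.isEmpty()) {
      throw new SolrException(ErrorCode.BAD_REQUEST,
          "Missing required parameter: masterUrl");
    }
  }
}
{code}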

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, 
> SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).






[jira] [Updated] (SOLR-6117) Replication command=fetchindex always return success.

2018-11-27 Thread Jason Gerlowski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-6117:
--
Attachment: SOLR-6117.patch

> Replication command=fetchindex always return success.
> -
>
> Key: SOLR-6117
> URL: https://issues.apache.org/jira/browse/SOLR-6117
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.6
>Reporter: Raintung Li
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-6117.patch, SOLR-6117.patch, SOLR-6117.patch, 
> SOLR-6117.txt
>
>
> Replication API command=fetchindex does fetch the index, but when an error 
> occurs it still returns a success response. 
> The API should return the right status, especially when the WAIT parameter is 
> true (synchronous).





