[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2019-01-02 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732042#comment-16732042
 ] 

Jason Gerlowski commented on SOLR-6595:
---

I'm not going to have much time in the immediate future to finish this up, so I 
wanted to summarize the progress so far:

- the latest patch sets the "status" property to 500 when the "failure" list is 
present and non-empty
- because of this, SolrJ will now throw exceptions in failure cases where it 
previously allowed the request to fail silently.  This causes some tests to 
fail that were passing (incorrectly) before.  I investigated a few examples of 
this, and most were in test setup/cleanup where the expectations were a bit 
off.  There weren't a ton of these failures though, and they should be simpler 
to debug thanks to other recent test flakiness improvements.
- I investigated making changes to SolrJ that would attach a NamedList to 
SolrExceptions thrown because of a 500, but didn't pursue that too far.  It's 
probably a separate JIRA anyways. 
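
A minimal sketch of the rule described in the first bullet, for illustration only. It is not the code from the attached patch: `FailureStatus` and `effectiveStatus` are hypothetical names, and plain `Map`s stand in for Solr's response structures.

```java
import java.util.Map;

/** Hypothetical helper, not part of Solr. Plain Maps stand in for the response. */
final class FailureStatus {
    private FailureStatus() {}

    /**
     * Returns 500 when the response has a non-empty "failure" map; otherwise
     * returns the integer "status" found in "responseHeader", or 0 if absent.
     */
    static int effectiveStatus(Map<String, Object> response) {
        Object failure = response.get("failure");
        if (failure instanceof Map && !((Map<?, ?>) failure).isEmpty()) {
            return 500;
        }
        Object header = response.get("responseHeader");
        if (header instanceof Map) {
            Object status = ((Map<?, ?>) header).get("status");
            if (status instanceof Integer) {
                return (Integer) status;
            }
        }
        return 0;
    }
}
```

With a rule like this, the response from the issue description (status 0 plus a populated "failure" entry) would be reported as 500 instead of 0.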

> Improve error response in case distributed collection cmd fails
> ---
>
> Key: SOLR-6595
> URL: https://issues.apache.org/jira/browse/SOLR-6595
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.10
> Environment: SolrCloud with Client SSL
>Reporter: Sindre Fiskaa
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-6595.patch
>
>
> Followed the description 
> https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a 
> self signed key pair. Configured a few solr-nodes and used the collection api 
> to create a new collection. -I get error message when specify the nodes with 
> the createNodeSet param. When I don't use createNodeSet param the collection 
> gets created without error on random nodes. Could this be a bug related to 
> the createNodeSet param?- *Update: It failed due to what turned out to be 
> invalid client certificate on the overseer, and returned the following 
> response:*
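> {code:xml}
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
>   <lst name="failure">
>     <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str>
>   </lst>
> </response>
> {code}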
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but cores not 
> created due to failing to connect to shard nodes to talk to core admin API).
> # The error printed does not tell which action failed. Would be helpful to 
> either get the msg from the original exception or at least some message 
> saying "Failed to create core, see log on Overseer <node.name>".
> # State of collection is not clean since it exists as far as ZK is concerned 
> but cores not created. Thus retrying the CREATECOLLECTION cmd would fail. 
> Should Overseer detect error in distributed cmds and rollback changes already 
> made in ZK?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2018-11-28 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702292#comment-16702292
 ] 

Jason Gerlowski commented on SOLR-6595:
---

Thinking aloud here, and I guess also soliciting feedback.

The current patch sets 500 as the value for the "status" property, as well as 
the HTTP status code on the response.  The expectation in most other places 
seems to be that the "status" property matches the HTTP status code.  So this 
seems like the technically correct thing to do from an API perspective.

There is a downside to this though: SolrJ converts non-200 responses into 
exceptions.  So while the failure information is still in the response, SolrJ 
users can't get at it.  (This isn't strictly true... SolrJ tries its best to 
come up with a good exception message by looking for properties like "error" 
and "failure".  But that's a pale substitute for giving users access to the 
response itself if they want it.)

It'd be cool if SolrJ users could access the original response in exceptional 
cases.  Maybe we should attach the parsed NamedList to RemoteSolrExceptions 
that get thrown by SolrJ.  That seems like a separate JIRA, but wanted to raise 
it here since it bears on these response changes indirectly.
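
To make the idea concrete, here is a rough sketch of an exception that carries the parsed response alongside the status code. This is an assumption about what such a change could look like, not SolrJ's actual API: `ResponseCarryingException` is a hypothetical class, and a `Map` stands in for the parsed NamedList.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical exception type; SolrJ does not define this class. */
class ResponseCarryingException extends RuntimeException {
    private final int code;
    private final Map<String, Object> response;

    ResponseCarryingException(int code, String message, Map<String, Object> response) {
        super(message);
        this.code = code;
        // Shallow, unmodifiable copy: adding or removing top-level keys on the
        // caller's map later does not change what the exception reports.
        this.response = Collections.unmodifiableMap(new LinkedHashMap<>(response));
    }

    /** Status code of the failed request, e.g. 500. */
    int code() {
        return code;
    }

    /** The parsed response body as received, including any "failure" entry. */
    Map<String, Object> response() {
        return response;
    }
}
```

A caller could then catch the exception and inspect `e.response().get("failure")` rather than relying on the exception message alone.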







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2018-11-26 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699566#comment-16699566
 ] 

Jason Gerlowski commented on SOLR-6595:
---

I've attached a patch here which ensures that any collection-api response with 
a non-empty "failure" property also has its status set to 500.  This has the 
advantage of covering things more generically and saves us from constantly 
finding new cases where the status property (and HTTP status code) is 
incorrect.  (There are a few different JIRAs open at the moment for similar 
issues with various collection APIs.)

Reviewers might notice that I change the status to 500 not by throwing a 
SolrException as is common, but by introducing a field in SolrQueryResponse as 
a "status-override".  I didn't like deviating from the normal way of doing 
things, and I don't love introducing yet-another way to set the API status, but 
I had trouble finding a good way to flatten the often-nested structure of the 
"failure" map into a message for a SolrException without losing tons of 
information that could help the user out.  If anyone sees a better way here, 
I'd love some review/feedback.
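
For reference, one possible way to flatten a nested "failure" structure is to emit one path-prefixed line per leaf value. This is only a sketch under assumptions: `FailureFlattener` is a hypothetical helper, nested `Map`s stand in for the nested NamedList, and whether the result would be acceptable as a SolrException message is exactly the open question raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Solr. */
final class FailureFlattener {
    private FailureFlattener() {}

    /** Returns one "path: value" line per leaf, in the map's iteration order. */
    static List<String> flatten(Map<String, ?> failure) {
        List<String> lines = new ArrayList<>();
        walk("", failure, lines);
        return lines;
    }

    private static void walk(String prefix, Map<?, ?> node, List<String> lines) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String path = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "/" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Descend into nested maps, extending the path.
                walk(path, (Map<?, ?>) value, lines);
            } else {
                lines.add(path + ": " + value);
            }
        }
    }
}
```

The leaf values survive, but the lines lose the original structure and types, which illustrates why a status override that leaves the response untouched keeps more information.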

This change triggers a few additional test failures- the API calls in these 
tests have apparently been failing for some time before this change but we 
never noticed since the response status obscured the problem.  So this patch 
includes fixes for a number of these tests.  I'm still building confidence that 
I've caught all of these cases, hoping to flush out more status-related test 
failures through the week.  If my runs stop finding issues by the end of the 
week, I'll be looking to commit.







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2018-11-19 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692329#comment-16692329
 ] 

Jason Gerlowski commented on SOLR-6595:
---

Wanted to check in on this and see which of the original concerns are still 
issues:

bq. Status=0 when the cmd did not succeed
Still a problem, though it will soon be fixed for CREATE, the reporter's 
original example here.

bq. The error printed does not tell which action failed
Still a problem, but a hard one: it's tough to guess which bits in the 
exception chain are the helpful bits.  The top and root of the chain are the 
most likely entries to be interesting, but not always.  Any truncation of the 
exception chain is going to reduce the chance we're conveying the important 
part.

bq. State of collection is not clean since it exists as far as ZK is concerned 
but cores not created
This _should_ have already been fixed in SOLR-8983.

So I'd argue that fixing the {{status}} property should be our main goal.  To 
that end, I've attached a patch fixing this problem for CREATE on SOLR-5970.  I 
don't like the narrowness of that fix though, so I will spend some time seeing 
if there's a way it can be generalized at a different level of our collection 
API processing.  Going to assign this to myself.







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2016-10-14 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575214#comment-15575214
 ] 

Jan Høydahl commented on SOLR-6595:
---

I wonder whether the error reporting has been solved in the meantime by the 
extensive refactoring of the overseer, async operations, etc.? Anyone?







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2015-12-02 Thread mugeesh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036518#comment-15036518
 ] 

mugeesh commented on SOLR-6595:
---

Nobody in the conversation above says clearly how to solve this.
I am getting the same error in solr-5.3.
Please provide the exact command for creating a collection/core.







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2014-10-24 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183481#comment-14183481
 ] 

Jan Høydahl commented on SOLR-6595:
---

Appreciate feedback and discussion on how to solve this...







[jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails

2014-10-11 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168385#comment-14168385
 ] 

Jan Høydahl commented on SOLR-6595:
---

Comments on the three problems listed in the updated problem description:
# What error code to return? Anything is better than 0. In this case it's a 
server configuration error, so 5xx? But where to modify the status code? 
Perhaps {{OverseerCollectionProcessor#processResponse()}}?
# How about printing the Exception-class names of all intermediate exceptions 
in the chain and then the message from the original one?
# Rollback of partially successful collection create would be interesting, but 
deserves its own JIRA perhaps :-)
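
The suggestion in point 2 could be sketched roughly as follows: list the class name of every exception in the chain, then append the message of the root cause. `ChainSummary`, `summarize`, and the depth cap are hypothetical and chosen for illustration; this is not code from Solr.

```java
/** Hypothetical helper, not part of Solr. */
final class ChainSummary {
    // Guard against pathological (cyclic or very deep) cause chains.
    private static final int MAX_DEPTH = 32;

    private ChainSummary() {}

    /**
     * Builds "Outer caused by Middle caused by Root: root message" from the
     * class names in the cause chain and the message of the last exception
     * visited.
     */
    static String summarize(Throwable top) {
        StringBuilder sb = new StringBuilder();
        Throwable root = top;
        int depth = 0;
        for (Throwable t = top; t != null && depth < MAX_DEPTH; t = t.getCause()) {
            if (depth > 0) {
                sb.append(" caused by ");
            }
            sb.append(t.getClass().getName());
            root = t;
            depth++;
        }
        return sb.append(": ").append(root.getMessage()).toString();
    }
}
```

This keeps the top and the root of the chain visible, at the cost of dropping the messages of the intermediate exceptions.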



