Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley merged PR #2935: URL: https://github.com/apache/solr/pull/2935 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on PR #2935: URL: https://github.com/apache/solr/pull/2935#issuecomment-2608306662 I'll merge this in about a day as-is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1925986935
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -216,38 +236,42 @@ private SimpleOrderedMap submitClusterStateRequest(
@Override
public Set getLiveNodes() {
-if (liveNodes == null) {
- throw new RuntimeException(
- "We don't know of any live_nodes to fetch the"
- + " latest live_nodes information from. "
- + "If you think your Solr cluster is up and is accessible,"
- + " you could try re-creating a new CloudSolrClient using
working"
- + " solrUrl(s) or zkHost(s).");
-}
if (TimeUnit.SECONDS.convert((System.nanoTime() - liveNodesTimestamp),
TimeUnit.NANOSECONDS)
> getCacheTimeout()) {
- for (String nodeName : liveNodes) {
-String baseUrl = Utils.getBaseUrlForNodeName(nodeName, urlScheme);
-try (SolrClient client = getSolrClient(baseUrl)) {
- Set liveNodes = fetchLiveNodes(client);
- this.liveNodes = (liveNodes);
- liveNodesTimestamp = System.nanoTime();
- return liveNodes;
-} catch (Exception e) {
- log.warn("Attempt to fetch cluster state from {} failed.", baseUrl,
e);
-}
- }
+
+ if (liveNodes.stream()
+ .anyMatch((node) ->
updateLiveNodes(URLUtil.getBaseUrlForNodeName(node, urlScheme
+return this.liveNodes;
+
+ log.warn(
+ "Attempt to fetch cluster state from all known live nodes {} failed.
Trying backup nodes",
+ liveNodes);
+
+ if (configuredNodes.stream().anyMatch((node) ->
updateLiveNodes(node.toString(
Review Comment:
We could but to me it makes sense to try live nodes first unless this class
was explicitly created as your JIRA proposal pointed, in that it only uses the
passed URLs and nothing else.
I think it should be used as backup and go for fetching from live nodes
first and assume by its name it is "live". If we do the opt in as a feature
flag for dynamic node discovery then that gives them control. I'm fine with
flipping it with configured first though if you feel strongly about it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1925844451
##
solr/CHANGES.txt:
##
@@ -176,6 +176,10 @@ Bug Fixes
* SOLR-17629: If SQLHandler failed to open the underlying stream (e.g. Solr
returns an error; could be user/syntax problem),
it needs to close the stream to cleanup resources but wasn't. (David Smiley)
+* SOLR-17519: SolrJ fix to CloudSolrClient and HTTP ClusterState where it can
fail to request cluster state
+ if its current live nodes list is stale. HTTP ClusterState now retains the
initial configured list of passed URLs as backup
Review Comment:
I don't think Solr users know about "HTTP ClusterState"; it's not likely in
their vocabulary. On the other hand, wording like "SolrJ CloudSolrClient
configured with Solr URLs" is likely to be understood.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -216,38 +236,42 @@ private SimpleOrderedMap submitClusterStateRequest(
@Override
public Set getLiveNodes() {
-if (liveNodes == null) {
- throw new RuntimeException(
- "We don't know of any live_nodes to fetch the"
- + " latest live_nodes information from. "
- + "If you think your Solr cluster is up and is accessible,"
- + " you could try re-creating a new CloudSolrClient using
working"
- + " solrUrl(s) or zkHost(s).");
-}
if (TimeUnit.SECONDS.convert((System.nanoTime() - liveNodesTimestamp),
TimeUnit.NANOSECONDS)
> getCacheTimeout()) {
- for (String nodeName : liveNodes) {
-String baseUrl = Utils.getBaseUrlForNodeName(nodeName, urlScheme);
-try (SolrClient client = getSolrClient(baseUrl)) {
- Set liveNodes = fetchLiveNodes(client);
- this.liveNodes = (liveNodes);
- liveNodesTimestamp = System.nanoTime();
- return liveNodes;
-} catch (Exception e) {
- log.warn("Attempt to fetch cluster state from {} failed.", baseUrl,
e);
-}
- }
+
+ if (liveNodes.stream()
+ .anyMatch((node) ->
updateLiveNodes(URLUtil.getBaseUrlForNodeName(node, urlScheme
+return this.liveNodes;
+
+ log.warn(
+ "Attempt to fetch cluster state from all known live nodes {} failed.
Trying backup nodes",
+ liveNodes);
+
+ if (configuredNodes.stream().anyMatch((node) ->
updateLiveNodes(node.toString(
Review Comment:
Shouldn't we check configuredNodes *first* so as to give the user the
possibility of some control, basically addressing the conversation points in
JIRA?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1925380385
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -387,6 +398,19 @@ public String getPolicyNameByCollection(String coll) {
+ "ZkClientClusterStateProvider can be used for this."); // TODO
}
+ private List setConfiguredNodes(List solrUrls) {
Review Comment:
You're right bad naming. It's only used in the `init` so Instead I just
removed the function as I don't think we need another utility method and it was
somewhat small enough inlined the whole thing into the stream where it was
called. Can move it into a private static utility if you prefer otherwise.
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -204,8 +204,8 @@ public void
testClusterStateProviderDownedInitialLiveNodes() throws Exception {
var jettyNode1 = cluster.getJettySolrRunner(0);
var jettyNode2 = cluster.getJettySolrRunner(1);
- String nodeName1 =
getNodeNameFromSolrUrl(jettyNode1.getBaseUrl().toString());
- String nodeName2 =
getNodeNameFromSolrUrl(jettyNode2.getBaseUrl().toString());
+ String nodeName1 =
getNodeNameForBaseUrl(jettyNode1.getBaseUrl().toString());
Review Comment:
Yeah it was deliberate. Was mostly to keep consistency between the names.
`getNodeNameForBaseUrl` and `getBaseUrlForNodeName` but can switch it back if
you prefer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924605334
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -204,8 +204,8 @@ public void
testClusterStateProviderDownedInitialLiveNodes() throws Exception {
var jettyNode1 = cluster.getJettySolrRunner(0);
var jettyNode2 = cluster.getJettySolrRunner(1);
- String nodeName1 =
getNodeNameFromSolrUrl(jettyNode1.getBaseUrl().toString());
- String nodeName2 =
getNodeNameFromSolrUrl(jettyNode2.getBaseUrl().toString());
+ String nodeName1 =
getNodeNameForBaseUrl(jettyNode1.getBaseUrl().toString());
Review Comment:
Was From -> For deliberate? IMO it was named fine.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -387,6 +398,19 @@ public String getPolicyNameByCollection(String coll) {
+ "ZkClientClusterStateProvider can be used for this."); // TODO
}
+ private List setConfiguredNodes(List solrUrls) {
Review Comment:
this doesn't actually "set" anything; it's a utility function mapping input
to output converting to a URL. It could even be static. If you declare a
static method that only converts a URL with the try-catch, it'd be simple for
the caller (init) to invoke it with the streams mapping in a one-liner.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924556359
##
solr/test-framework/src/java/org/apache/solr/cloud/MiniSolrCloudCluster.java:
##
@@ -530,14 +530,14 @@ public JettySolrRunner stopJettySolrRunner(int index)
throws Exception {
}
/**
- * Add a previously stopped node back to the cluster
+ * Add a previously stopped node back to the cluster and reuse its port
*
* @param jetty a {@link JettySolrRunner} previously returned by {@link
#stopJettySolrRunner(int)}
* @return the started node
* @throws Exception on error
*/
public JettySolrRunner startJettySolrRunner(JettySolrRunner jetty) throws
Exception {
-jetty.start(false);
+jetty.start();
Review Comment:
Thanks for find that. Thats fair. For now, I overloaded the function with a
`reusePort` to not break anything and maybe change it to just reuse port later.
Is it fine to do that for now, or should I just be calling jetty itself and
reuse the port in the tests?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924555335 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: Changed to `List` and changed `updateLiveNodes` accordingly. Lmk if how I changed it makes sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924552833
##
solr/solrj/src/java/org/apache/solr/common/util/URLUtil.java:
##
@@ -98,4 +102,59 @@ private static String removeTrailingSlashIfPresent(String
url) {
return url;
}
+
+ /**
+ * Construct a V1 base url for the Solr node, given its name (e.g.,
'app-node-1:8983_solr') and a
+ * URL scheme.
+ *
+ * @param nodeName name of the Solr node
+ * @param urlScheme scheme for the base url ('http' or 'https')
+ * @return url that looks like {@code https://app-node-1:8983/solr}
+ * @throws IllegalArgumentException if the provided node name is malformed
+ */
+ public static String getBaseUrlForNodeName(final String nodeName, final
String urlScheme) {
+return getBaseUrlForNodeName(nodeName, urlScheme, false);
+ }
+
+ /**
+ * Construct a V1 or a V2 base url for the Solr node, given its name (e.g.,
+ * 'app-node-1:8983_solr') and a URL scheme.
+ *
+ * @param nodeName name of the Solr node
+ * @param urlScheme scheme for the base url ('http' or 'https')
+ * @param isV2 whether a V2 url should be constructed
+ * @return url that looks like {@code https://app-node-1:8983/api} (V2) or
{@code
+ * https://app-node-1:8983/solr} (V1)
+ * @throws IllegalArgumentException if the provided node name is malformed
+ */
+ public static String getBaseUrlForNodeName(
+ final String nodeName, final String urlScheme, boolean isV2) {
+final int colonAt = nodeName.indexOf(':');
+if (colonAt == -1) {
+ throw new IllegalArgumentException(
+ "nodeName does not contain expected ':' separator: " + nodeName);
+}
+
+final int _offset = nodeName.indexOf('_', colonAt);
+if (_offset < 0) {
+ throw new IllegalArgumentException(
+ "nodeName does not contain expected '_' separator: " + nodeName);
+}
+final String hostAndPort = nodeName.substring(0, _offset);
+return urlScheme + "://" + hostAndPort + "/" + (isV2 ? "api" : "solr");
+ }
+
+ /**
+ * Construct base Solr URL to a Solr node name
+ *
+ * @param solrUrl Given a base Solr URL string (e.g.,
'https://app-node-1:8983/solr')
+ * @return Node name that looks like {@code app-node-1:8983_solr}
+ * @throws MalformedURLException if the provided URL string is malformed
+ * @throws URISyntaxException if the provided URL string could not be parsed
as a URI reference.
+ */
+ public static String getNodeNameForBaseUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
Review Comment:
After changing configured nodes to List, this function isn't used
anymore except for tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924427707
##
solr/test-framework/src/java/org/apache/solr/cloud/MiniSolrCloudCluster.java:
##
@@ -530,14 +530,14 @@ public JettySolrRunner stopJettySolrRunner(int index)
throws Exception {
}
/**
- * Add a previously stopped node back to the cluster
+ * Add a previously stopped node back to the cluster and reuse its port
*
* @param jetty a {@link JettySolrRunner} previously returned by {@link
#stopJettySolrRunner(int)}
* @return the started node
* @throws Exception on error
*/
public JettySolrRunner startJettySolrRunner(JettySolrRunner jetty) throws
Exception {
-jetty.start(false);
+jetty.start();
Review Comment:
When faced with curiosities like this, it's useful to do a bit of code/JIRA
archeology. `git blame` (via IntelliJ's margin, BTW) shows the current setting
(to pick a random port) was *deliberate* and had its own JIRA --
https://issues.apache.org/jira/browse/SOLR-9474 and I found a related one:
https://issues.apache.org/jira/browse/SOLR-9469 . I read the conversation
there. Changing it is worthy of its own JIRA & CHANGES.txt entry, following a
deeper attempt to understand the causes behind why it hasn't been reusing a
port thus far.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924412143 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: We don't need a `liveNodeURLs` field. We know how to convert on the fly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
HoustonPutman commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924410045 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: yeah, keeping `liveNodes` as string and `configuredNodes` as URLs, with the method handling each sounds good to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924301197 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: I basically tried to keep `configuredNodes` the same format as `liveNodes` as they are used the same with `updateLiveNodes`. I don't like the idea of another `liveNodeURLs` list attribute and think just changing liveNodes to be `List` would make sense but the caller now get a different list URLs and not nodeNames. I guess I'll just go the route of updateLiveNodes functions handling both `NodeName` strings and `Base URL` inputs unless you feel strongly about having an additional `liveNodeURLs` list -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924267502
##
solr/test-framework/src/java/org/apache/solr/cloud/MiniSolrCloudCluster.java:
##
@@ -530,14 +530,14 @@ public JettySolrRunner stopJettySolrRunner(int index)
throws Exception {
}
/**
- * Add a previously stopped node back to the cluster
+ * Add a previously stopped node back to the cluster and reuse its port
*
* @param jetty a {@link JettySolrRunner} previously returned by {@link
#stopJettySolrRunner(int)}
* @return the started node
* @throws Exception on error
*/
public JettySolrRunner startJettySolrRunner(JettySolrRunner jetty) throws
Exception {
-jetty.start(false);
+jetty.start();
Review Comment:
The tests I incorporated where I take down and bring back nodes, the URLs
changed because port number changed. So I looked at this method noticed this
was the cause. It didn't make sense to me change the port when you are passing
the same jetty back and tests didn't seem to be effected so I didn't think
there was an impact to change. I can move it out if you'd like, I just need to
then start jetty in the tests instead of from this function to keep the same
ports.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924189617
##
solr/test-framework/src/java/org/apache/solr/cloud/MiniSolrCloudCluster.java:
##
@@ -530,14 +530,14 @@ public JettySolrRunner stopJettySolrRunner(int index)
throws Exception {
}
/**
- * Add a previously stopped node back to the cluster
+ * Add a previously stopped node back to the cluster and reuse its port
*
* @param jetty a {@link JettySolrRunner} previously returned by {@link
#stopJettySolrRunner(int)}
* @return the started node
* @throws Exception on error
*/
public JettySolrRunner startJettySolrRunner(JettySolrRunner jetty) throws
Exception {
-jetty.start(false);
+jetty.start();
Review Comment:
I suspect the results of this could be subtle, and nobody would guess this
JIRA/PR-title would have anything to do with this. Thus should be merged
separately. What did you find that led you to incorporate this change into
this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
HoustonPutman commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924191070 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: Yeah, if the idea is to have 1 method that handles both, then I would instead have a `liveNodeURLs` list in addition to `liveNodes`, so that the `updateLiveNodes ` method takes in the urls. But the method can probably split to share logic, but handle both strings and urls as inputs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924183653 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: Perhaps configuredNodes should be a List? Why convert to a live node format? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1924183653 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,6 +56,7 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private List configuredNodes; Review Comment: Perhaps configuredNodes should be a `List`? Why convert to a live node format? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1924087626
##
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##
@@ -748,61 +745,6 @@ public static String applyUrlScheme(final String url,
final String urlScheme) {
return (at == -1) ? (urlScheme + "://" + url) : urlScheme +
url.substring(at);
}
- /**
- * Construct a V1 base url for the Solr node, given its name (e.g.,
'app-node-1:8983_solr') and a
- * URL scheme.
- *
- * @param nodeName name of the Solr node
- * @param urlScheme scheme for the base url ('http' or 'https')
- * @return url that looks like {@code https://app-node-1:8983/solr}
- * @throws IllegalArgumentException if the provided node name is malformed
- */
- public static String getBaseUrlForNodeName(final String nodeName, final
String urlScheme) {
Review Comment:
These methods are called in a bunch of places; we can't simply remove them
in 9x. They can stay and call the new location and be deprecated.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -248,15 +240,19 @@ > getCacheTimeout()) {
+ "succeeded in obtaining the cluster state from none of them."
+ "If you think your Solr cluster is up and is accessible,"
+ " you could try re-creating a new CloudSolrClient using
working"
- + " solrUrl(s) or zkHost(s).");
+ + " solrUrl(s).");
} else {
return this.liveNodes; // cached copy is fresh enough
}
}
+ private boolean updateLiveNodes(List liveNodes) {
+return updateLiveNodes(Set.copyOf(liveNodes));
+ }
+
private boolean updateLiveNodes(Set liveNodes) {
Review Comment:
why does updateLiveNodes take a Set as an argument; can't it simply be a
Collection? The Set is forcing you to do a conversion and that which changes
the order, and the order may be deliberate. It's not worth overloading the
method for either.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on PR #2935: URL: https://github.com/apache/solr/pull/2935#issuecomment-2605025917 > You know what... See URLUtil. Shouldn't _that_ be where these methods go? The existing method can be deprecated and call the one at the better location. Also has a test where your tests of this method can be added to. (big code base; so many util classes) > > This PR is pretty close to merging now. I ended up moving all those functions into `URLUril` which resulted in a much larger code change. Not sure if you meant for me to use the `Deprecated` annotation but instead just moved it and refactored the callers. Also didn't add in [SOLR-17607](https://issues.apache.org/jira/browse/SOLR-17607) because it broke a few more tests that needed more changes so just held the changes locally and will probably just make it in another PR and link it to that JIRA unless you prefer me to do it here in one go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1922783901
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -51,6 +54,15 @@ public static void setupCluster() throws Exception {
.resolve("streaming")
.resolve("conf"))
.configure();
+cluster.waitForAllNodes(30);
+System.setProperty("solr.solrj.cache.timeout.sec", "1");
Review Comment:
Christine, Solr's test infrastructure auto-restores all system properties to
as they were from test suite to test suite thanks to
SystemPropertiesRestoreRule. But not test-method to test-method, albeit that's
not pertinent for the line you commented (as it's in a BeforeClass).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
cpoerschke commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1922536226
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -51,6 +54,15 @@ public static void setupCluster() throws Exception {
.resolve("streaming")
.resolve("conf"))
.configure();
+cluster.waitForAllNodes(30);
+System.setProperty("solr.solrj.cache.timeout.sec", "1");
Review Comment:
(am random pull request browser/scanner only here) -- maybe have matching
`System.clearProperty` during test cleanup to avoid carry-ever effects between
tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1920955860
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +430,19 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ public Set getNodeNamesFromSolrUrls(List urls) {
+return urls.stream()
+.map(
+(url) -> {
+ try {
+return getNodeNameFromSolrUrl(url);
+ } catch (MalformedURLException | URISyntaxException e) {
+throw new RuntimeException("Failed to parse base Solr URL " +
url, e);
Review Comment:
IllegalArgumentException seems appropriate
##
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##
@@ -786,6 +789,20 @@ public static String getBaseUrlForNodeName(
return urlScheme + "://" + hostAndPort + "/" + (isV2 ? "api" : "solr");
}
+ /**
+ * Construct base Solr URL to a Solr node name
+ *
+ * @param solrUrl Given a base Solr URL string (e.g.,
'https://app-node-1:8983/solr')
+ * @return URL that looks like {@code app-node-1:8983_solr}
+ * @throws MalformedURLException if the provided URL string is malformed
+ * @throws URISyntaxException if the provided URL string could not be parsed
as a URI reference.
+ */
+ public static String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
Review Comment:
Perfect. (never mind my redundant code review comment)
##
solr/test-framework/src/java/org/apache/solr/cloud/MiniSolrCloudCluster.java:
##
@@ -530,14 +530,14 @@ public JettySolrRunner stopJettySolrRunner(int index)
throws Exception {
}
/**
- * Add a previously stopped node back to the cluster
+ * Add a previously stopped node back to the cluster and reuse its port
*
* @param jetty a {@link JettySolrRunner} previously returned by {@link
#stopJettySolrRunner(int)}
* @return the started node
* @throws Exception on error
*/
public JettySolrRunner startJettySolrRunner(JettySolrRunner jetty) throws
Exception {
-jetty.start(false);
+jetty.start();
Review Comment:
That makes sense... I'll call it out in the commit log. I wonder if there's
any down-side to this.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -51,6 +55,7 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private String urlScheme;
+ private Set backupNodes;
Review Comment:
Also maybe leave as a list as it was provided. Why not use them in the
order provided.
##
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##
@@ -786,6 +789,20 @@ public static String getBaseUrlForNodeName(
return urlScheme + "://" + hostAndPort + "/" + (isV2 ? "api" : "solr");
}
+ /**
+ * Construct base Solr URL to a Solr node name
+ *
+ * @param solrUrl Given a base Solr URL string (e.g.,
'https://app-node-1:8983/solr')
+ * @return URL that looks like {@code app-node-1:8983_solr}
+ * @throws MalformedURLException if the provided URL string is malformed
+ * @throws URISyntaxException if the provided URL string could not be parsed
as a URI reference.
+ */
Review Comment:
fantastic javadocs! I wish someone documented the existing reverse
direction method
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1920949678
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -226,17 +232,15 @@ public Set getLiveNodes() {
}
if (TimeUnit.SECONDS.convert((System.nanoTime() - liveNodesTimestamp),
TimeUnit.NANOSECONDS)
> getCacheTimeout()) {
- for (String nodeName : liveNodes) {
-String baseUrl = Utils.getBaseUrlForNodeName(nodeName, urlScheme);
-try (SolrClient client = getSolrClient(baseUrl)) {
- Set liveNodes = fetchLiveNodes(client);
- this.liveNodes = (liveNodes);
- liveNodesTimestamp = System.nanoTime();
- return liveNodes;
-} catch (Exception e) {
- log.warn("Attempt to fetch cluster state from {} failed.", baseUrl,
e);
-}
- }
+
+ if (updateLiveNodes(liveNodes)) return this.liveNodes;
+
+ log.warn(
+ "Attempt to fetch cluster state from all known live nodes {} failed.
Trying backup nodes",
+ liveNodes);
+
+ if (updateLiveNodes(backupNodes)) return this.liveNodes;
+
throw new RuntimeException(
"Tried fetching live_nodes using all the node names we knew of, i.e.
"
+ liveNodes
Review Comment:
and the configured nodes. Also don't recommend zkHosts :-)
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -226,17 +232,15 @@ public Set getLiveNodes() {
}
Review Comment:
Just above this, there's a liveNodes null check. That seems bogus now
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -51,6 +55,7 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private String urlScheme;
+ private Set backupNodes;
Review Comment:
Can we name this configuredNodes since that's what it is? Yes we use it as
a backup but we could adapt how we use them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1920793611
##
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##
@@ -786,6 +789,20 @@ public static String getBaseUrlForNodeName(
return urlScheme + "://" + hostAndPort + "/" + (isV2 ? "api" : "solr");
}
+ /**
+ * Construct base Solr URL to a Solr node name
+ *
+ * @param solrUrl Given a base Solr URL string (e.g.,
'https://app-node-1:8983/solr')
+ * @return URL that looks like {@code app-node-1:8983_solr}
+ * @throws MalformedURLException if the provided URL string is malformed
+ * @throws URISyntaxException if the provided URL string could not be parsed
as a URI reference.
+ */
+ public static String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
Review Comment:
I couldn't find anywhere where this is a function like this (Maybe you
someone knows?). Regardless, I think if it exists somewhere, it should sit here
because the `getBaseUrlForNodeName` function right above it does the conversion
`nodeName->Url` and putting `url->nodeName` right below it makes sense to me.
Added unit tests and just added unit tests for `getBaseUrlForNodeName` because
it didn't exist before.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -226,17 +232,15 @@ public Set getLiveNodes() {
}
if (TimeUnit.SECONDS.convert((System.nanoTime() - liveNodesTimestamp),
TimeUnit.NANOSECONDS)
> getCacheTimeout()) {
- for (String nodeName : liveNodes) {
-String baseUrl = Utils.getBaseUrlForNodeName(nodeName, urlScheme);
-try (SolrClient client = getSolrClient(baseUrl)) {
- Set liveNodes = fetchLiveNodes(client);
- this.liveNodes = (liveNodes);
- liveNodesTimestamp = System.nanoTime();
- return liveNodes;
-} catch (Exception e) {
- log.warn("Attempt to fetch cluster state from {} failed.", baseUrl,
e);
-}
- }
+
+ if (updateLiveNodes(liveNodes)) return this.liveNodes;
+
+ log.warn(
+ "Attempt to fetch cluster state from all known live nodes {} failed.
Trying backup nodes",
+ liveNodes);
+
+ if (updateLiveNodes(backupNodes)) return this.liveNodes;
Review Comment:
Backup is checked if liveNodes is exhausted. There are a few places that
liveNodes is used for fetching but backup isn't now. Was thinking of adding it,
but that requires a larger refactor from just this bug and writing additional
tests. Happy to do it though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1911140452
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
Review Comment:
My idea was to not do another loop through if liveNodes was exhausted.
`initalNodes` would always exist in liveNodes with this set method. Let me
refactor again from your suggestion comment
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
+ }
+
+ public Set getNodeNamesFromSolrUrls(List urls)
+ throws URISyntaxException, MalformedURLException {
+Set set = new HashSet<>();
+for (String url : urls) {
+ String nodeNameFromSolrUrl = getNodeNameFromSolrUrl(url);
+ set.add(nodeNameFromSolrUrl);
+}
+return Collections.unmodifiableSet(set);
+ }
+
+ /** URL to cluster state node name (http://127.0.0.1:12345/solr to
127.0.0.1:12345_solr) */
+ public String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
Review Comment:
Hmm maybe... I found `getBaseUrlForNodeName` but it's going the other way
nodeName->url. Let me look around more
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on PR #2935: URL: https://github.com/apache/solr/pull/2935#issuecomment-2583773292 I simplified characterization of how I think this should be done: * on initialization, copy the configured URLs to a field for safe keeping. Convert to URL if you like (validates if malformed) * on initialization, liveNodes shall be empty; we haven't fetched it yet. Ideally we don't even try to on initialization. * when getLiveNodes is called and it's not out of date, return it. If it's out of date, loop live nodes _and then the initial configured URLs_ for the first server that will respond to fetch the current live nodes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1911044868
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -229,10 +236,9 @@ > getCacheTimeout()) {
for (String nodeName : liveNodes) {
Review Comment:
If we exhaust liveNodes, shouldn't we then try the initial configured nodes,
and then only failing that, throw an exception?
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
+ }
+
+ public Set getNodeNamesFromSolrUrls(List urls)
+ throws URISyntaxException, MalformedURLException {
+Set set = new HashSet<>();
+for (String url : urls) {
+ String nodeNameFromSolrUrl = getNodeNameFromSolrUrl(url);
+ set.add(nodeNameFromSolrUrl);
+}
+return Collections.unmodifiableSet(set);
Review Comment:
Could easily be converted to a single expression with streams
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
+ }
+
+ public Set getNodeNamesFromSolrUrls(List urls)
+ throws URISyntaxException, MalformedURLException {
+Set set = new HashSet<>();
+for (String url : urls) {
+ String nodeNameFromSolrUrl = getNodeNameFromSolrUrl(url);
+ set.add(nodeNameFromSolrUrl);
+}
+return Collections.unmodifiableSet(set);
+ }
+
+ /** URL to cluster state node name (http://127.0.0.1:12345/solr to
127.0.0.1:12345_solr) */
+ public String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
Review Comment:
can you find other code in Solr doing this? Surely it's somewhere.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
+ }
+
+ public Set getNodeNamesFromSolrUrls(List urls)
+ throws URISyntaxException, MalformedURLException {
+Set set = new HashSet<>();
+for (String url : urls) {
+ String nodeNameFromSolrUrl = getNodeNameFromSolrUrl(url);
+ set.add(nodeNameFromSolrUrl);
+}
+return Collections.unmodifiableSet(set);
+ }
+
+ /** URL to cluster state node name (http://127.0.0.1:12345/solr to
127.0.0.1:12345_solr) */
+ public String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
Review Comment:
should be a static method that with a unit test
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
Review Comment:
Why is this 3 lines of extra copy-ing instead of nothing more than:
`this.liveNodes = Set.copyOf(nodes);` ? That is, why are we touching / using
initialNodes at all here?
Maybe an IllegalArgumentException if nodes.isEmpty.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: issues-unsubscr.
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1910989910
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -61,6 +67,7 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private int cacheTimeout =
EnvUtils.getPropertyAsInteger("solr.solrj.cache.timeout.sec", 5);
public void init(List solrUrls) throws Exception {
+this.initialNodes = getNodeNamesFromSolrUrls(solrUrls);
Review Comment:
great idea to do basic URL malformed checks like this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1910697779
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -61,6 +67,7 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private int cacheTimeout =
EnvUtils.getPropertyAsInteger("solr.solrj.cache.timeout.sec", 5);
public void init(List solrUrls) throws Exception {
+this.initialNodes = getNodeNamesFromSolrUrls(solrUrls);
Review Comment:
> I think the idea of this JIRA issue is that we'll take the Solr URLs as
configured and use this is the initial / backup liveNodes. I think this is a
very simple idea to to document/understand/implement.
That makes sense. This leaves me with a few questions then. Shouldn't this
take a list of `URL` or `URI` java object to verify actual non malformed URLs
instead of a list of strings? I created the functions below to convert these
Strings into cluster state nodeNames for `liveNodes`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1910698804
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -413,6 +418,30 @@ public String getQuorumHosts() {
return String.join(",", this.liveNodes);
}
+ /** Live nodes should always have the latest set of live nodes but never
remove initial set */
+ private void setLiveNodes(Set nodes) {
+Set liveNodes = new HashSet<>(nodes);
+liveNodes.addAll(this.initialNodes);
+this.liveNodes = Set.copyOf(liveNodes);
+ }
+
+ public Set getNodeNamesFromSolrUrls(List urls)
+ throws URISyntaxException, MalformedURLException {
+Set set = new HashSet<>();
+for (String url : urls) {
+ String nodeNameFromSolrUrl = getNodeNameFromSolrUrl(url);
+ set.add(nodeNameFromSolrUrl);
+}
+return Collections.unmodifiableSet(set);
+ }
+
+ /** URL to cluster state node name (http://127.0.0.1:12345/solr to
127.0.0.1:12345_solr) */
+ public String getNodeNameFromSolrUrl(String solrUrl)
+ throws MalformedURLException, URISyntaxException {
+URL url = new URI(solrUrl).toURL();
+return url.getAuthority() + url.getPath().replace('/', '_');
+ }
+
Review Comment:
Not sure if these methods belong here but it is required to convert the
initial set of String urls into cluster state node names
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -61,6 +67,7 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private int cacheTimeout =
EnvUtils.getPropertyAsInteger("solr.solrj.cache.timeout.sec", 5);
public void init(List solrUrls) throws Exception {
+this.initialNodes = getNodeNamesFromSolrUrls(solrUrls);
Review Comment:
> I think the idea of this JIRA issue is that we'll take the Solr URLs as
configured and use this is the initial / backup liveNodes. I think this is a
very simple idea to to document/understand/implement.
That makes sense. This leaves me with a few questions then. Shouldn't this
take a list of `URL` or `URI` java object to verify actual non malformed URLs
instead of a list of URLs? I created the functions below to convert these
Strings into cluster state nodeNames for `liveNodes`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1910689266
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -51,7 +52,9 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private String urlScheme;
+ private Set initialNodes;
volatile Set liveNodes;
+ volatile Set knownNodes;
Review Comment:
Thats fair. Removed known nodes and put everything into live nodes while
using the initial set inside all the time.
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -183,4 +204,99 @@ public void testClusterStateProvider() throws
SolrServerException, IOException {
clusterStateZk.getCollection("col2"),
equalTo(clusterStateHttp.getCollection("col2")));
}
}
+
+ @Test
+ public void testClusterStateProviderDownedLiveNodes() throws Exception {
+try (var cspZk = zkClientClusterStateProvider();
+var cspHttp = http2ClusterStateProvider()) {
+ Set expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ Set actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(2, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+
+ cluster.stopJettySolrRunner(jettyNode1);
+ waitForCSPCacheTimeout();
+
+ expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(1, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+
+ cluster.startJettySolrRunner(jettyNode1);
+ cluster.stopJettySolrRunner(jettyNode2);
+ waitForCSPCacheTimeout();
+
+ // Should still be reachable because known hosts doesn't remove initial
nodes
+ expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(1, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+}
+ }
+
+ @Test
+ public void testClusterStateProviderDownedKnownHosts() throws Exception {
+
+try (var cspHttp = http2ClusterStateProvider()) {
+
+ String jettyNode1Url =
normalizeJettyUrl(jettyNode1.getBaseUrl().toString());
+ String jettyNode2Url =
normalizeJettyUrl(jettyNode2.getBaseUrl().toString());
+ Set expectedKnownNodes = Set.of(jettyNode1Url, jettyNode2Url);
+ Set actualKnownNodes = cspHttp.getKnownNodes();
+
+ assertEquals(2, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+
+ cluster.stopJettySolrRunner(jettyNode1);
+ waitForCSPCacheTimeout();
+
+ // Known hosts should never remove the initial set of live nodes
+ actualKnownNodes = cspHttp.getKnownNodes();
+ assertEquals(2, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+}
+ }
+
+ @Test
+ public void testClusterStateProviderKnownHostsWithNewHost() throws Exception
{
+
+try (var cspHttp = http2ClusterStateProvider()) {
+
+ var jettyNode3 = cluster.startJettySolrRunner();
+ String jettyNode1Url =
normalizeJettyUrl(jettyNode1.getBaseUrl().toString());
+ String jettyNode2Url =
normalizeJettyUrl(jettyNode2.getBaseUrl().toString());
+ String jettyNode3Url =
normalizeJettyUrl(jettyNode3.getBaseUrl().toString());
+ Set expectedKnownNodes = Set.of(jettyNode1Url, jettyNode2Url,
jettyNode3Url);
+ waitForCSPCacheTimeout();
+
+ Set actualKnownNodes = cspHttp.getKnownNodes();
+ assertEquals(3, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+
+ cluster.stopJettySolrRunner(jettyNode1);
+ waitForCSPCacheTimeout();
+
+ actualKnownNodes = cspHttp.getKnownNodes();
+ assertEquals(3, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+
+ cluster.stopJettySolrRunner(jettyNode3);
+ expectedKnownNodes = Set.of(jettyNode1Url, jettyNode2Url);
+ waitForCSPCacheTimeout();
+
+ // New nodes are removable from known hosts
+ actualKnownNodes = cspHttp.getKnownNodes();
+ assertEquals(2, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+}
+ }
+
+ private void waitForCSPCacheTimeout() throws InterruptedException {
+Thread.sleep(6000);
Review Comment:
Cache timeout setting is in seconds and pulls an int. Can't make it 1ms.
This probably should have had better granularity instead of increments of
seconds.
I could change cacheTimeout to take milliseconds as the int instead but that
would change peoples system property not using the default. So if they set the
cache timeout to 5 seconds, it now turns into 5ms if they are not aware of this
change.
--
This is an automated messag
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1904581414 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,7 +52,9 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private Set initialNodes; volatile Set liveNodes; + volatile Set knownNodes; Review Comment: There's no guarantee that getLiveNodes is going to return a list of reachable nodes. A moment after returning it, a node could become unreachable. So it's "best effort" and the caller has to deal with failures by trying the other nodes in the list and/or getting a possibly refreshed list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
mlbiscoc commented on code in PR #2935: URL: https://github.com/apache/solr/pull/2935#discussion_r1904552713 ## solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java: ## @@ -51,7 +52,9 @@ public abstract class BaseHttpClusterStateProvider implements ClusterStateProvid private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private String urlScheme; + private Set initialNodes; volatile Set liveNodes; + volatile Set knownNodes; Review Comment: I originally did it with just `liveNodes` but liveNodes is returned to the client which from it's name I was thinking they would assume they are "live". If we just place all the nodes into liveNodes thats not necessarily true, right? If the current set hold all the initially passed nodes, it isn't guaranteed it's live which is why I switched to known nodes while live is what was actually fetched from ZooKeeper. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1903717289
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -34,15 +35,20 @@
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.util.NamedList;
+import org.apache.solr.embedded.JettySolrRunner;
import org.hamcrest.Matchers;
+import org.junit.After;
import org.junit.BeforeClass;
import org.junit.Test;
public class ClusterStateProviderTest extends SolrCloudTestCase {
+ private static JettySolrRunner jettyNode1;
+ private static JettySolrRunner jettyNode2;
Review Comment:
could add a convenience method if you like.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]
dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1903643954
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -65,6 +68,8 @@ public void init(List solrUrls) throws Exception {
urlScheme = solrUrl.startsWith("https") ? "https" : "http";
try (SolrClient initialClient = getSolrClient(solrUrl)) {
this.liveNodes = fetchLiveNodes(initialClient);
+this.initialNodes = Set.copyOf(liveNodes);
Review Comment:
I think the idea of this JIRA issue is that we'll take the Solr URLs as
configured and use *this* is the initial / backup liveNodes. I think this is a
very simple idea to to document/understand/implement.
##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##
@@ -51,7 +52,9 @@ public abstract class BaseHttpClusterStateProvider implements
ClusterStateProvid
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private String urlScheme;
+ private Set initialNodes;
volatile Set liveNodes;
+ volatile Set knownNodes;
Review Comment:
right away, I'm surprised. I can understand needing two sets, but 3? IMO
knownNodes doesn't need to exists; that's the same as liveNodes. I see below
you've changed many references from liveNodes to knownNodes but I recommend
against doing that because the exact wording of "liveNodes" is extremely
pervasive in SolrCloud (well known concept) so let's not have a vary it in just
this class or any class.
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -34,15 +35,20 @@
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.util.NamedList;
+import org.apache.solr.embedded.JettySolrRunner;
import org.hamcrest.Matchers;
+import org.junit.After;
import org.junit.BeforeClass;
import org.junit.Test;
public class ClusterStateProviderTest extends SolrCloudTestCase {
+ private static JettySolrRunner jettyNode1;
+ private static JettySolrRunner jettyNode2;
Review Comment:
static fields in tests that refer to Solr should be null'ed out, so there's
some burden to them. Here... I think you could avoid these altogether and
simply call `cluster.getJettySolrRunner(0)` which isn't bad!
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/ClusterStateProviderTest.java:
##
@@ -183,4 +204,99 @@ public void testClusterStateProvider() throws
SolrServerException, IOException {
clusterStateZk.getCollection("col2"),
equalTo(clusterStateHttp.getCollection("col2")));
}
}
+
+ @Test
+ public void testClusterStateProviderDownedLiveNodes() throws Exception {
+try (var cspZk = zkClientClusterStateProvider();
+var cspHttp = http2ClusterStateProvider()) {
+ Set expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ Set actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(2, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+
+ cluster.stopJettySolrRunner(jettyNode1);
+ waitForCSPCacheTimeout();
+
+ expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(1, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+
+ cluster.startJettySolrRunner(jettyNode1);
+ cluster.stopJettySolrRunner(jettyNode2);
+ waitForCSPCacheTimeout();
+
+ // Should still be reachable because known hosts doesn't remove initial
nodes
+ expectedLiveNodes = cspZk.getClusterState().getLiveNodes();
+ actualLiveNodes = cspHttp.getLiveNodes();
+ assertEquals(1, actualLiveNodes.size());
+ assertEquals(expectedLiveNodes, actualLiveNodes);
+}
+ }
+
+ @Test
+ public void testClusterStateProviderDownedKnownHosts() throws Exception {
+
+try (var cspHttp = http2ClusterStateProvider()) {
+
+ String jettyNode1Url =
normalizeJettyUrl(jettyNode1.getBaseUrl().toString());
+ String jettyNode2Url =
normalizeJettyUrl(jettyNode2.getBaseUrl().toString());
+ Set expectedKnownNodes = Set.of(jettyNode1Url, jettyNode2Url);
+ Set actualKnownNodes = cspHttp.getKnownNodes();
+
+ assertEquals(2, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+
+ cluster.stopJettySolrRunner(jettyNode1);
+ waitForCSPCacheTimeout();
+
+ // Known hosts should never remove the initial set of live nodes
+ actualKnownNodes = cspHttp.getKnownNodes();
+ assertEquals(2, actualKnownNodes.size());
+ assertEquals(expectedKnownNodes, actualKnownNodes);
+}
+ }
+
+ @Test
+ public void testClusterStateProviderKnownHostsWithNewHost() throws Exception
{
+
+try (var cspHttp = http2ClusterStateProvider())
