Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
janhoy merged PR #3937: URL: https://github.com/apache/solr/pull/3937 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
janhoy commented on code in PR #3937:
URL: https://github.com/apache/solr/pull/3937#discussion_r2617384766
##
solr/solrj/src/java/org/apache/solr/common/util/RetryUtil.java:
##
@@ -84,12 +144,32 @@ public static void retryUntil(
throw new SolrException(ErrorCode.SERVER_ERROR, errorMessage);
}
+ /**
+ * Retries a command until it returns {@code true} or the timeout is reached.
+ *
+ * The command is executed repeatedly until it returns {@code true} or
the timeout expires. If
+ * the command returns {@code false} and there is time remaining, the method
sleeps for {@code
+ * intervalms} milliseconds before retrying. If the timeout is reached
before the command
+ * succeeds, a {@link SolrException} is thrown.
+ *
+ * @param timeoutms maximum time to retry in milliseconds
+ * @param intervalms wait interval between retries in milliseconds
+ * @param cmd the command to execute
+ * @throws SolrException if the timeout is reached before the command
succeeds, or if the thread
+ * is interrupted while sleeping between retries
+ */
public static void retryOnBoolean(long timeoutms, long intervalms,
BooleanRetryCmd cmd) {
long timeout =
System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutms,
TimeUnit.MILLISECONDS);
while (true) {
boolean resp = cmd.execute();
if (!resp && System.nanoTime() < timeout) {
+try {
Review Comment:
Nice utitlity, can help us stabilize some more tests I imagine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
dsmiley commented on code in PR #3937:
URL: https://github.com/apache/solr/pull/3937#discussion_r2612591743
##
solr/solrj/src/java/org/apache/solr/common/util/RetryUtil.java:
##
@@ -84,12 +144,32 @@ public static void retryUntil(
throw new SolrException(ErrorCode.SERVER_ERROR, errorMessage);
}
+ /**
+ * Retries a command until it returns {@code true} or the timeout is reached.
+ *
+ * The command is executed repeatedly until it returns {@code true} or
the timeout expires. If
+ * the command returns {@code false} and there is time remaining, the method
sleeps for {@code
+ * intervalms} milliseconds before retrying. If the timeout is reached
before the command
+ * succeeds, a {@link SolrException} is thrown.
+ *
+ * @param timeoutms maximum time to retry in milliseconds
+ * @param intervalms wait interval between retries in milliseconds
+ * @param cmd the command to execute
+ * @throws SolrException if the timeout is reached before the command
succeeds, or if the thread
+ * is interrupted while sleeping between retries
+ */
public static void retryOnBoolean(long timeoutms, long intervalms,
BooleanRetryCmd cmd) {
long timeout =
System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutms,
TimeUnit.MILLISECONDS);
while (true) {
boolean resp = cmd.execute();
if (!resp && System.nanoTime() < timeout) {
+try {
Review Comment:
nice catch :-)
##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/LB2SolrClientTest.java:
##
@@ -196,9 +197,18 @@ public void testTwoServers() throws Exception {
solr[1].jetty = null;
startJettyAndWaitForAliveCheckQuery(solr[0]);
- resp = h.lbClient.query(solrQuery);
- name = resp.getResults().get(0).getFieldValue("name").toString();
- assertEquals("solr/collection10", name);
+ RetryUtil.retryOnBoolean(
+ 5000,
+ 500,
+ () -> {
+try {
+ QueryResponse resp1 = h.lbClient.query(solrQuery);
+ String name1 =
resp1.getResults().get(0).getFieldValue("name").toString();
+ return ("solr/collection10".equals(name1));
Review Comment:
instead of adding 1 to the end... really it's fine do do this whole thing as
one expression (inline each var).
##
solr/solrj/src/java/org/apache/solr/common/util/RetryUtil.java:
##
@@ -84,12 +144,32 @@ public static void retryUntil(
throw new SolrException(ErrorCode.SERVER_ERROR, errorMessage);
}
+ /**
+ * Retries a command until it returns {@code true} or the timeout is reached.
+ *
+ * The command is executed repeatedly until it returns {@code true} or
the timeout expires. If
+ * the command returns {@code false} and there is time remaining, the method
sleeps for {@code
+ * intervalms} milliseconds before retrying. If the timeout is reached
before the command
+ * succeeds, a {@link SolrException} is thrown.
+ *
+ * @param timeoutms maximum time to retry in milliseconds
+ * @param intervalms wait interval between retries in milliseconds
+ * @param cmd the command to execute
+ * @throws SolrException if the timeout is reached before the command
succeeds, or if the thread
Review Comment:
IMO AI adds too much documentation when it writes docs...
##
solr/solrj/src/java/org/apache/solr/common/util/RetryUtil.java:
##
@@ -25,23 +25,67 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
+/** Utility class for retrying operations with various strategies. */
Review Comment:
thanks for adding javadocs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
janhoy commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3644161670 Thanks for bringing attention to `RetryUtil`, haven't seen it before. I switched to that strategy and improved RetryUtil with docs at the same time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
dsmiley commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3643872452 A better compromise without more sleeps on the happy path would be to execute the test that follows this call with `org.apache.solr.common.util.RetryUtil`. That, I'd get behind. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
dsmiley commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3643862040 I looked at this for 15min just now. This may help a little but I'm not optimistic; obviously is just a bandaid and test anti-pattern -- what I tell new engineers *not* to do. I wish we had better means of flaky reproduction other than beasting. In particular, I wonder if there's a technique and/or tool that can simulate the JVM slowing down a lot without actually burdening one's machine. I suspect the average test on Crave takes longer, but it's massively parallelized to run a crazy number of tests in parallel. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
janhoy commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3636889767 And the failing precommit due to Antora / node is also quite annoying, feel I'm seeing it all the time? ``` * What went wrong: Execution failed for task ':solr:solr-ref-guide:buildLocalAntoraSite'. > Process 'command '/home/runner/work/solr/solr/solr/solr-ref-guide/.gradle/node/nodejs/node-v22.18.0-linux-x64/bin/npx'' finished with non-zero exit value 1 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
epugh commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3636639199 More of a general question... How can we make our test platform not require super "magic" changes like this on a per class basis and solve it more globally? Is there a way that this could have been solved in our core code? Is it possible that in real life someone would hit the same issue of spinning up a Jetty and then sending requests? Could we have a retry or a probe for liveness instead approach that is built in? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Flakiness in LB2SolrClientTest.testTwoServers [solr]
janhoy commented on PR #3937: URL: https://github.com/apache/solr/pull/3937#issuecomment-3636189828 Adding you @dsmiley as you touched SolrJ a lot lately, but I don't think any of the recent work causes these test fails. Due to recent renames, the develocity history for this test is split across two class names. Older history can be seen as `LBHttp2SolrClientIntegrationTest` here https://develocity.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=solr-root&search.timeZoneId=Europe%2FOslo&tests.container=org.apache.solr.client.solrj.impl.LBHttp2SolrClientIntegrationTest which confirms that this test has been flaky for a long time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
