> On Dec. 11, 2017, 6:57 p.m., Bill Farner wrote:
> > I should have looked more closely at the last patch, and when preparing my
> > patch before that. This test is bound to be flaky since there's nothing
> > ensuring a happens-before releationship between issuing the request and
> > checking the result.
> >
> > I can repro consistently with the patch below, slowing down the
> > `onThrowable` handler:
> > ```console
> > $ git diff
> > diff --git a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > index 95e3f52..d4519ee 100644
> > --- a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > +++ b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > @@ -102,6 +102,10 @@ public class Webhook extends AbstractIdleService
> > implements EventSubscriber {
> > createRequest(stateChange).execute(new
> > AsyncCompletionHandler<Integer>() {
> > @Override
> > public void onThrowable(Throwable t) {
> > + try {
> > + Thread.sleep(1000);
> > + } catch (InterruptedException e) {
> > + }
> > errorsCounter.incrementAndGet();
> > LOG.error("Error sending a Webhook event", t);
> > ```
> >
> > If i get some spare time, i may take a stab at a fix; but the
> > happens-before is what we'll want to seek.
>
> Jordan Ly wrote:
> Interesting -- I would imagine this client would force the future to
> complete.
>
> ```
> /**
> * Wrap the DefaultAsyncHttpClient for testing so we can complete
> futures synchronously before
> * validating assertions. Otherwise, we would have to call
> `Thread.sleep` in our tests after
> * each TaskStateChange.
> */
> static class WebhookAsyncHttpClientWrapper extends
> DefaultAsyncHttpClient {
>
> WebhookAsyncHttpClientWrapper(DefaultAsyncHttpClientConfig config) {
> super(config);
> }
>
> @Override
> public <T> ListenableFuture<T>
> executeRequest(org.asynchttpclient.Request request,
> AsyncHandler<T>
> handler) {
> ListenableFuture<T> future = super.executeRequest(request, handler);
> try {
> future.get();
> future.done();
> } catch (InterruptedException | ExecutionException e) {
> // Ignore, future should not fail to resolve.
> }
> return future;
> }
> }
> ```
>
> Must be something flawed in the implementation.
>
> Bill Farner wrote:
> I re-remembered this code after posting the above reply; and i
> agree...pretty surprising!
>
> Bill Farner wrote:
> This patch fixes things for me; causing the `Thread.sleep()` to no longer
> affect test results:
>
> ```console
> diff --git
> a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
> b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
> index 1b5d2d0..cdcf50a 100644
> --- a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
> +++ b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
> @@ -17,7 +17,6 @@ import java.io.IOException;
> import java.net.URI;
> import java.util.List;
> import java.util.Map;
> -import java.util.concurrent.ExecutionException;
> import java.util.stream.Collectors;
>
> import javax.servlet.ServletException;
> @@ -107,12 +106,7 @@ public class WebhookTest {
> public <T> ListenableFuture<T>
> executeRequest(org.asynchttpclient.Request request,
> AsyncHandler<T>
> handler) {
> ListenableFuture<T> future = super.executeRequest(request,
> handler);
> - try {
> - future.get();
> - future.done();
> - } catch (InterruptedException | ExecutionException e) {
> - // Ignore, future should not fail to resolve.
> - }
> + future.toCompletableFuture().join();
> return future;
> }
> }
> ```
>
> The `future.get()` alone does not work due to this code in
> `NettyResponseFuture`:
> ```java
> public final void abort(final Throwable t) {
>
> if (terminateAndExit())
> return;
>
> future.completeExceptionally(t);
>
> if (onThrowableCalledField.compareAndSet(this, 0, 1)) {
> try {
> asyncHandler.onThrowable(t);
> } catch (Throwable te) {
> LOGGER.debug("asyncHandler.onThrowable", te);
> }
> }
> }
> ```
>
> The `future.completeExceptionally(t)` call causes `future.get()` above to
> throw `ExecutionException`, and our main test thread continues. Netty's
> thread separately proceeds to call `asyncHandler.onThrowable(t)`, which races
> with the main thread.
>
> I have not looked into the internals of `CompletableFuture#join()` to say
> why it differs, but it does pass consistently for me.
>
> Stephan Erb wrote:
> Jordan, will you update your PR? Or do you plan to file one, Bill?
>
> Jordan Ly wrote:
> I can update my PR. However, I believe `CompletableFuture#join()` tests a
> different code path which is the surrounding `try-catch`, not `onThrowable`.
> You can observe this behavior by putting a `Thread.sleep(10000000)` in
> `onThrowable`. The test will complete within 1 second which is not expected.
> Additionally, the error log will be "Error making Webhook request" as opposed
> to "Error sending a Webhook event".
> a different code path
You are correct! The test is fixed, but different code is tested; not what
we're after.
- Bill
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193491
-----------------------------------------------------------
On Dec. 11, 2017, 6:29 p.m., Jordan Ly wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2017, 6:29 p.m.)
>
>
> Review request for Aurora and Stephan Erb.
>
>
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Attempt #2 to fix the flaky Webhook test.
>
> Along the same lines of Stephan's change
> (https://reviews.apache.org/r/64482/), but opting to never start the Jetty
> server and forcing a connect exception.
>
> Overall, this flakiness has seemed to increase in volume over the past month.
> I've been running into it with a noticable fraction of internal/test builds.
>
>
> Diffs
> -----
>
> src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
> 1b5d2d02535345edfe6cf04d18d00434393f800b
>
>
> Diff: https://reviews.apache.org/r/64523/diff/1/
>
>
> Testing
> -------
>
> This change seems pretty hard to test considering the differences in
> environment and the unknown cause of the actual errors. Maybe we run the
> reviewbot on this repo repeatedly? Obviously not the most scientifically
> sound solution.
>
>
> Thanks,
>
> Jordan Ly
>
>