[jira] [Commented] (SPARK-17564) Flaky RequestTimeoutIntegrationSuite, furtherRequestsDelay

2017-04-10 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963663#comment-15963663 ]

Apache Spark commented on SPARK-17564:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/17599

> Flaky RequestTimeoutIntegrationSuite, furtherRequestsDelay
> --
>
> Key: SPARK-17564
> URL: https://issues.apache.org/jira/browse/SPARK-17564
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Adam Roberts
>Priority: Minor
>
> Could be related to [SPARK-10680].
> This is the test; one fix would be to increase the timeouts from 1.2 seconds to 5 seconds:
> {code}
>   // The timeout is relative to the LAST request sent, which is kinda weird, but still.
>   // This test also makes sure the timeout works for Fetch requests as well as RPCs.
>   @Test
>   public void furtherRequestsDelay() throws Exception {
>     final byte[] response = new byte[16];
>     final StreamManager manager = new StreamManager() {
>       @Override
>       public ManagedBuffer getChunk(long streamId, int chunkIndex) {
>         Uninterruptibles.sleepUninterruptibly(FOREVER, TimeUnit.MILLISECONDS);
>         return new NioManagedBuffer(ByteBuffer.wrap(response));
>       }
>     };
>     RpcHandler handler = new RpcHandler() {
>       @Override
>       public void receive(
>           TransportClient client,
>           ByteBuffer message,
>           RpcResponseCallback callback) {
>         throw new UnsupportedOperationException();
>       }
>
>       @Override
>       public StreamManager getStreamManager() {
>         return manager;
>       }
>     };
>
>     TransportContext context = new TransportContext(conf, handler);
>     server = context.createServer();
>     clientFactory = context.createClientFactory();
>     TransportClient client =
>       clientFactory.createClient(TestUtils.getLocalHost(), server.getPort());
>
>     // Send one request, which will eventually fail.
>     TestCallback callback0 = new TestCallback();
>     client.fetchChunk(0, 0, callback0);
>     Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);
>
>     // Send a second request before the first has failed.
>     TestCallback callback1 = new TestCallback();
>     client.fetchChunk(0, 1, callback1);
>     Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);
>
>     // not complete yet, but should complete soon
>     assertEquals(-1, callback0.successLength);
>     assertNull(callback0.failure);
>     callback0.latch.await(60, TimeUnit.SECONDS);
>     assertTrue(callback0.failure instanceof IOException);
>
>     // failed at same time as previous
>     assertTrue(callback1.failure instanceof IOException); // This is where we fail because callback1.failure is null
>   }
> {code}
> If there are better suggestions for improving this test, let's take them on board; I think using 5 second timeout periods would be a place to start so folks don't need to needlessly triage this failure (a sketch of that change is included below). Will add a few prints and report back.
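
For concreteness, a minimal sketch of the timeout increase suggested above, assuming the suite's connection timeout (configured in the suite's setUp, not shown in this excerpt) stays shorter than the combined sleeps so both requests still time out before the assertions run; the constant name REQUEST_DELAY_MS is purely illustrative:

{code}
// Illustrative sketch only, not a committed fix: lift the hard-coded 1200 ms sleeps
// into a single 5 second constant so slow or heavily loaded machines still leave the
// connection timeout plenty of room to fire before the assertions run.
// REQUEST_DELAY_MS is a hypothetical name introduced for this sketch.
private static final long REQUEST_DELAY_MS = 5000;  // was 1200

// ... inside furtherRequestsDelay(), after the client is created:
TestCallback callback0 = new TestCallback();
client.fetchChunk(0, 0, callback0);
Uninterruptibles.sleepUninterruptibly(REQUEST_DELAY_MS, TimeUnit.MILLISECONDS);

TestCallback callback1 = new TestCallback();
client.fetchChunk(0, 1, callback1);
Uninterruptibles.sleepUninterruptibly(REQUEST_DELAY_MS, TimeUnit.MILLISECONDS);
{code}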





[jira] [Commented] (SPARK-17564) Flaky RequestTimeoutIntegrationSuite, furtherRequestsDelay

2016-09-16 Thread Adam Roberts (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496699#comment-15496699 ]

Adam Roberts commented on SPARK-17564:
--

callback1.failure is sometimes null, and because the failure is intermittent I'm sure it's related to a timing window.

We are supposed to get an IOException when the test passes:
{code}
callback1.failure: java.io.IOException: Connection from /*some ip*:35581 closed
callback1.failure.getClass: class java.io.IOException
{code}

but sometimes we get this, so the assertion fails (we should also improve the assertion message; a sketch follows below):
{code}
callback1.failure: null
{code}
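
On the message point, a small sketch assuming the suite uses JUnit 4's assertTrue(String, boolean) overload: include the observed value in the assertion so an intermittent failure reports what callback1.failure actually was rather than a bare AssertionError.

{code}
// Sketch: a more informative assertion message, so a failure shows the actual
// value of callback1.failure (for example "null") in the test report.
assertTrue(
    "callback1.failure was " + callback1.failure + " but an IOException was expected",
    callback1.failure instanceof IOException);
{code}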

[~zsxwing] your expertise is welcome here. There's also the CountDownLatch constructor argument to experiment with (the count could be increased from 1), as well as the 1.2 second timeouts; the goal is to improve this test's robustness. One possible hardening is sketched below.
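
A minimal sketch of one such hardening, assuming TestCallback's latch counts down when the callback completes (as the existing callback0.latch.await call implies): give callback1 its own bounded wait before asserting on its failure, instead of assuming both callbacks have already fired by the time callback0's latch is released.

{code}
// Sketch only: wait (bounded) for the second callback as well before asserting,
// rather than relying on both requests having failed by the time callback0's
// latch is released.
callback0.latch.await(60, TimeUnit.SECONDS);
assertTrue(callback0.failure instanceof IOException);

// New: bounded wait for callback1 before checking its failure type.
callback1.latch.await(60, TimeUnit.SECONDS);
assertTrue(callback1.failure instanceof IOException);
{code}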
