[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received

2020-02-14 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-586562858
 
 
   I fixed this issue in my PR #6202. However, now there's a new issue. 
   The GitHub CI output only shows this:
   
   `[ERROR] Run 4: PulsarStateTest.testSinkState:162 » PulsarAdmin org.apache.pulsar.shade.javax`
   
   I've attached a log with the test failure.
   
   [11_run integration tests.txt](https://github.com/apache/pulsar/files/4207875/11_run.integration.tests.txt)
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581674740
 
 
   The approach used in `SimpleProducerConsumerTest.testAsyncProducerAndAsyncAck(..)` looks like this:
   
   ```java
   // Excerpt from the test method; producer, consumer, log and
   // testMessageOrderAndDuplicates(..) are members of the test class.
   List<Future<MessageId>> futures = new ArrayList<>();

   // Asynchronously produce messages
   for (int i = 0; i < 10; i++) {
       final String message = "my-message-" + i;
       Future<MessageId> future = producer.sendAsync(message.getBytes());
       futures.add(future);
   }

   log.info("Waiting for async publish to complete");
   for (Future<MessageId> future : futures) {
       future.get();
   }

   Message<byte[]> msg = null;
   Set<String> messageSet = Sets.newHashSet();
   for (int i = 0; i < 10; i++) {
       // receive(timeout, unit) returns null if nothing arrives in time
       msg = consumer.receive(5, TimeUnit.SECONDS);
       String receivedMessage = new String(msg.getData());
       log.info("Received message: [{}]", receivedMessage);
       String expectedMessage = "my-message-" + i;
       testMessageOrderAndDuplicates(messageSet, receivedMessage, expectedMessage);
   }

   // Asynchronously acknowledge up to and including the last message
   Future<Void> ackFuture = consumer.acknowledgeCumulativeAsync(msg);
   log.info("Waiting for async ack to complete");
   ackFuture.get();
   consumer.close();
   ```
   
   If we wait for the futures to complete like that, does that really guarantee that the publish was **fully** completed? (That is, once `future.get()` stops blocking, is the message guaranteed to be receivable/consumable?) If so, this approach could be used instead.
   However, I am not yet convinced that this approach fully closes the loop.
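   For comparison, here is a minimal, self-contained sketch of the same "send everything asynchronously, then wait for all the send futures" pattern, but with one overall deadline (`CompletableFuture.allOf(..).orTimeout(..)`, Java 9+) instead of an unbounded `get()` per future. `simulatedSendAsync` is a hypothetical stand-in for `producer.sendAsync(..)`. As in the excerpt, a completed future only shows that the send itself finished; it does not by itself show that a consumer or sink has processed the message, which is the open question here.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.TimeUnit;

   final class AwaitAllSends {

       // Hypothetical stand-in for producer.sendAsync(..): completes on a
       // pool thread and yields a fake message id.
       static CompletableFuture<Long> simulatedSendAsync(String message, long id) {
           return CompletableFuture.supplyAsync(() -> id);
       }

       // Starts all sends without blocking, then waits for all of them with a
       // single deadline. join() throws an unchecked CompletionException if a
       // send fails or the deadline passes (wrapping a TimeoutException).
       static List<Long> sendAllAndAwait(int count, long timeoutMillis) {
           List<CompletableFuture<Long>> futures = new ArrayList<>();
           for (int i = 0; i < count; i++) {
               futures.add(simulatedSendAsync("my-message-" + i, i));
           }
           CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                   .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                   .join();
           List<Long> ids = new ArrayList<>();
           for (CompletableFuture<Long> f : futures) {
               ids.add(f.join()); // already complete, so this does not block
           }
           return ids;
       }

       public static void main(String[] args) {
           System.out.println(sendAllAndAwait(10, 5_000).size());
       }
   }
   ```

   The single deadline keeps a slow runner from hanging the test, but it still only bounds the publish side.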


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581671742
 
 
   The other issue with blocking on the receive operation is that we'd need receiving to be synchronous with sending, which would hurt Pulsar's performance, unless there's a way to do it that I'm not considering.


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581671178
 
 
   This issue no longer seems to occur when running the tests under stress after increasing the timeouts (e.g., doubling `retryCount` and `initSleepTimeInMillis` for `retryStrategically`). I pushed this change to my fork to see whether it also resolves the issue on GitHub CI.
   
   The problem with the current approach to these tests is that there is a race condition between checking the status through the Admin API (which executes a REST call) and the code that produces the messages.
   `producer.send` blocks on the send operation, but it doesn't block on the receive operation.
   Ideally, we'd have a way to block (at least for a period of time) until all the messages are received, instead of needing to poll the status.
   However, such a change might not fix the test, because we'd still depend on execution completing within a period of time. (We may not have a choice, since without a timeout the test could run indefinitely.)
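   The "block until all messages are received, but only for a bounded time" idea can be sketched with a `CountDownLatch`: the receiving side counts down once per message, and the test thread awaits with a timeout so it cannot hang indefinitely. The background thread that counts down is a hypothetical stand-in for a real consumer or sink callback; this is not how `PulsarStateTest` is currently written.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;

   final class AwaitReceived {

       // Returns true if `expected` messages were "delivered" before the timeout.
       static boolean produceAndAwait(int produced, int expected, long timeoutMillis) {
           CountDownLatch received = new CountDownLatch(expected);
           ExecutorService delivery = Executors.newSingleThreadExecutor();
           try {
               // Hypothetical stand-in for the consumer/sink callback: each
               // delivered message counts the latch down once.
               delivery.submit(() -> {
                   for (int i = 0; i < produced; i++) {
                       received.countDown();
                   }
               });
               // Only the test thread blocks, and only up to the deadline.
               return received.await(timeoutMillis, TimeUnit.MILLISECONDS);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return false;
           } finally {
               delivery.shutdownNow();
           }
       }

       public static void main(String[] args) {
           System.out.println(produceAndAwait(10, 10, 5_000));
       }
   }
   ```

   `await` returns `false` instead of hanging when too few messages arrive, so the test can fail with a clear message; it still depends on the deadline being long enough for a slow runner.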
   So, that brings us back to the idea of increasing the timeouts when polling the status, so that we receive all the messages even on a slow test runner.
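   To see why doubling both settings adds so much headroom, consider a hypothetical polling helper whose sleep grows linearly (after the k-th failed check it sleeps k times the initial sleep). It only borrows the parameter names of `retryStrategically`; whether that helper backs off exactly this way is an assumption. Under this model the worst-case wait for n retries is initSleep * (n - 1) * n / 2, so going from an illustrative 10 retries at 100 ms to 20 retries at 200 ms raises it from 4.5 s to 38 s.

   ```java
   import java.util.function.BooleanSupplier;

   final class PollWithBackoff {

       // Hypothetical helper, not Pulsar's retryStrategically: checks the
       // condition up to retryCount times, sleeping initSleepMillis * k after
       // the k-th failed check (no sleep after the last check).
       static boolean pollUntil(BooleanSupplier condition, int retryCount, long initSleepMillis) {
           for (int attempt = 1; attempt <= retryCount; attempt++) {
               if (condition.getAsBoolean()) {
                   return true;
               }
               if (attempt < retryCount) {
                   try {
                       Thread.sleep(initSleepMillis * attempt);
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();
                       return false;
                   }
               }
           }
           return false;
       }

       // Total time pollUntil can spend sleeping before it gives up.
       static long worstCaseWaitMillis(int retryCount, long initSleepMillis) {
           long n = retryCount;
           return initSleepMillis * (n - 1) * n / 2;
       }

       public static void main(String[] args) {
           System.out.println(worstCaseWaitMillis(10, 100));
           System.out.println(worstCaseWaitMillis(20, 200));
       }
   }
   ```

   Because the worst-case wait grows roughly with the square of the retry count, raising `retryCount` is the bigger lever, while a fast runner pays nothing extra since the loop returns as soon as the condition holds.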
   
   @sijie @jiazhai @yjshen Thoughts? 


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581644681
 
 
   The issue is intermittent locally even when running stress-ng.


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581638173
 
 
   @jiazhai @yjshen Using stress-ng might be helpful for you guys if you're 
investigating the flaky tests. 


[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState

2020-02-03 Thread GitBox
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581637709
 
 
   (I was able to install stress-ng on my Mac via brew.)

