[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-586562858

I fixed this issue in my PR #6202. However, there is now a new issue. The GitHub CI output just looks like this:

`1211[ERROR] Run 4: PulsarStateTest.testSinkState:162 » PulsarAdmin org.apache.pulsar.shade.javax`

I've attached a log with the test failure.
[11_run integration tests.txt](https://github.com/apache/pulsar/files/4207875/11_run.integration.tests.txt)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581674740

The approach used in SimpleProducerConsumerTest.testAsyncProducerAndAsyncAck(..) looks like this:

```
// Asynchronously produce messages
for (int i = 0; i < 10; i++) {
    final String message = "my-message-" + i;
    Future future = producer.sendAsync(message.getBytes());
    futures.add(future);
}

log.info("Waiting for async publish to complete");
for (Future future : futures) {
    future.get();
}

Message msg = null;
Set messageSet = Sets.newHashSet();
for (int i = 0; i < 10; i++) {
    msg = consumer.receive(5, TimeUnit.SECONDS);
    String receivedMessage = new String(msg.getData());
    log.info("Received message: [{}]", receivedMessage);
    String expectedMessage = "my-message-" + i;
    testMessageOrderAndDuplicates(messageSet, receivedMessage, expectedMessage);
}

// Asynchronously acknowledge up to and including the last message
Future ackFuture = consumer.acknowledgeCumulativeAsync(msg);
log.info("Waiting for async ack to complete");
ackFuture.get();
consumer.close();
```

If we wait for the futures to complete like that, does that really guarantee that the publish was **fully** completed? (i.e. when `future.get();` is done blocking, does that guarantee that the message can be received/consumed?)
If so, then this approach could be used instead. However, I am not yet convinced that the loop is fully closed by this approach.
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581671742

The other issue with blocking on the receive operation is that we'd need receiving to be synchronous with sending, which would impact Pulsar's performance unless there's a way to do so that I'm not considering.
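For illustration, one way a test could block until all messages have arrived, without making the producer wait on the consumer, is a latch that the receiving side counts down, bounded by a timeout. The following is a minimal sketch in plain Java under stated assumptions: an in-memory queue stands in for the topic, and `LatchWaitSketch` and `sendAndAwait` are hypothetical names. It does not use the Pulsar client API and is not the fix applied in the PR.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {

    /**
     * Sends `sent` messages to an in-memory queue (a stand-in for a topic)
     * and waits until `expected` messages were consumed or the timeout elapses.
     * Returns true only if all expected messages arrived in time.
     */
    public static boolean sendAndAwait(int sent, int expected, long timeoutMillis)
            throws InterruptedException {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        CountDownLatch allReceived = new CountDownLatch(expected);

        // Hypothetical consumer: counts the latch down once per received message.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    topic.take();
                    allReceived.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        // The sending loop never waits for the consumer.
        for (int i = 0; i < sent; i++) {
            topic.add("my-message-" + i);
        }

        try {
            // Only the test blocks here, and only up to the timeout.
            return allReceived.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            consumer.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all received: " + sendAndAwait(10, 10, 2000));
        System.out.println("all received: " + sendAndAwait(5, 10, 200));
    }
}
```

The timeout keeps the test from hanging when a message is lost, so the result still depends on the runner finishing within a time limit.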
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581671178

This issue no longer seems to occur when running the tests under stress after increasing the timeouts (e.g. doubling the retryCount and initSleepTimeInMillis for retryStrategically).
I pushed this change to my fork to see if it resolves the issue when running from GitHub CI.

The problem with the current approach to these tests is that there is a race condition between checking the status from the Admin API (which executes a REST call) and the method responsible for producing the messages.
`producer.send` blocks on the send operation, but it doesn't block on the receive operation.
Ideally, we'd have a way to block (at least for a period of time) until the messages are all received instead of needing to poll on the status.
However, such a change may not necessarily fix the test because we'd still be depending on execution to complete successfully within a period of time. (We may not have a choice because the lack of a timeout could cause the test to run indefinitely.)
So, that brings us back to the idea of increasing the timeouts when polling the status to ensure we receive all the messages when using a slow test runner.

@sijie @jiazhai @yjshen Thoughts?
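For illustration, the polling pattern discussed above can be sketched as a bounded retry loop. This is a minimal sketch under stated assumptions, not the project's `retryStrategically` helper: `RetryPollingSketch` and `pollUntil` are hypothetical names, the linearly growing sleep is an assumed schedule, and an `AtomicInteger` stands in for the status that the test would read through the Admin API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RetryPollingSketch {

    /**
     * Checks `condition` up to `retryCount` times, sleeping between attempts.
     * The sleep grows linearly (initSleepMillis, 2x, 3x, ...), which is an
     * assumed schedule. Returns true as soon as the condition holds, or false
     * once the attempts are exhausted.
     */
    public static boolean pollUntil(BooleanSupplier condition, int retryCount, long initSleepMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < retryCount; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // No point sleeping after the final attempt.
            if (attempt < retryCount - 1) {
                Thread.sleep(initSleepMillis * (attempt + 1));
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the number of messages the sink reports as received.
        AtomicInteger received = new AtomicInteger();

        // Simulated slow sink: one message every 20 ms.
        Thread sink = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    Thread.sleep(20);
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        boolean ok = pollUntil(() -> received.get() == 10, 5, 50);
        System.out.println("condition met: " + ok + ", received: " + received.get());
        sink.join();
    }
}
```

Doubling both `retryCount` and `initSleepMillis` in a loop like this increases the total wait budget several times over, which is why a slow runner is less likely to exhaust it, at the cost of a longer worst-case test time.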
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581644681

The issue is intermittent locally even when running stress-ng.
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581638173

@jiazhai @yjshen Using stress-ng might be helpful for you guys if you're investigating the flaky tests.
[GitHub] [pulsar] devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581637709

(I was able to install stress-ng on my mac via brew.)