becketqin commented on pull request #12255:
URL: https://github.com/apache/flink/pull/12255#issuecomment-631167059
@aljoscha Thanks for the review. I went through all the reported failures in
the Jira ticket. It looks people are reporting two different issues.
1. The "producer already closed" error caused by the race condition in
`StreamOperatorWrapper` that we are trying to fix in FLINK-16383.
2. The "does not host this topic-partition" error we are trying to fix here.
All the failures caused by issue 2 are either from 0.10 or 0.11 Kafka
connectors, while the first issues are reported on all the versions. I think
people are just adding comments to the same ticket once they see this test
failed, regardless of the failure cause.
That being said, theoretically speaking, using `KafkaAdminClient` to create
topic does not 100% guarantee that a producer will not see the "does not host
this topic-partition" error. This is because when the `AdminClient` can only
guarantee the topic metadata information has existed in the broker to which it
sent the `CreateTopicRequest`. When a producer comes at a later point, it might
send `TopicMetdataRequest` to a different broker and that broker may have not
received the updated topic metadata yet. But this is much unlikely to happen
given the broker usually receives the metadata update at the same time. Having
retries configured on the producer side should be sufficient to handle such
cases. We can also do that for 0.10 and 0.11 producers. But given that we have
the producer properties scattered over the places (which is something we
probably should avoid to begin with), it would be simpler to just make sure the
topic has been created successfully before we start the tests.
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org