[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519919#comment-17519919 ]

Gabor Somogyi commented on FLINK-27137:
---------------------------------------

Yep, the Kafka community thinks that all offset management MUST be done w/ AdminClient. The majority of the Kafka community agrees that this is a strategic direction. Consumers can be used for offset fetching, but if any issue arises they suggest switching to AdminClient and are not willing to fix the consumer. We've spent several months on SPARK-32032 to discover all the issues and possibilities...

> Remove usage of AdminClient from KafkaSource logic
> --------------------------------------------------
>
>                 Key: FLINK-27137
>                 URL: https://issues.apache.org/jira/browse/FLINK-27137
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>    Affects Versions: 1.15.0, 1.14.4
>            Reporter: Gyula Fora
>            Priority: Major
>
> Parts of the KafkaSource logic (specifically the KafkaSourceEnumerator) use
> the Kafka AdminClient instead of the KafkaConsumer.
> It seems that the KafkaConsumer already provides all the necessary
> information that the enumerator needs, so there is no reason for introducing
> the AdminClient.
> In some environments using the AdminClient can be problematic even if we are
> not using certain features.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519912#comment-17519912 ]

Gyula Fora commented on FLINK-27137:
------------------------------------

Thanks [~dengziming], I can see that AdminClient is more targeted at offset management (even though the consumer seems to cover all our current use cases). I talked to [~gaborgsomogyi] and he pointed out some practical limitations of fetching the offsets through the consumer. That sounds like a more convincing argument.

We will try to deal with our current issues on our end. Thanks all!
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519852#comment-17519852 ]

dengziming commented on FLINK-27137:
------------------------------------

[~arvid] In fact, Spark also uses AdminClient to get offsets, see [https://github.com/apache/spark/pull/29729].

[~gyfora]

> Since Flink will absolutely need the KafkaConsumer API

Even though we have to use KafkaConsumer anyway, the KafkaConsumer is created in the TaskManager whereas we seem to get offsets in the JobManager, so maybe we would not be creating a duplicated component here. KafkaConsumer also contains extra components, for example `Fetcher` and `ConsumerCoordinator`, which may send extra RPC requests.

> regular kafka applications to increase compatibility

AdminClient is also supported by the Kafka community and is thus as compatible and robust as KafkaConsumer; it's designed just like the Flink public API.

> In some environments using the AdminClient can be problematic

If AdminClient has bugs, we should report them to the Kafka community to help improve it.

> Looking at the OffsetRetriever there are 4 requirements here

In fact there may be new types of OffsetSpec, for example max-timestamp in KIP-734. Even though this is a minor point, it indicates that AdminClient is the better choice for offset management.
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519679#comment-17519679 ]

Gyula Fora commented on FLINK-27137:
------------------------------------

Even KafkaConsumer 2.0.0 supports all the APIs that I mentioned, so I don't think there should be a compatibility issue on that side.

Could you please give one example of a feature that is not supported by the KafkaConsumer directly and for which we require the AdminClient? That would probably help me understand better why we would like to keep it.
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519674#comment-17519674 ]

Arvid Heise commented on FLINK-27137:
-------------------------------------

Hi [~gyfora],

sorry if you perceived my message as unfriendly, that was not my intention. :( I just expect a ticket that basically reverts a ticket that went through due process to be more descriptive, so that everyone can understand the issue and discuss it. The information in the ticket description is still lacking in that regard.

You are also completely dismissing the valid use case of users who do not depend on Java 8, simply bump the Kafka dependency to 3, and use the new functionality. So please go for an approach that makes both possible.

I'd also raise the possibility that you simply implement a limited {{AdminClient}} on your end as well, or else you may run into issues with other Java-based frameworks, such as Spark, that may or may not also use the {{AdminClient}}. Expecting certain frameworks to use one part of the public API of a client jar and not another is a non-obvious requirement. For context, you can't run Pulsar without its {{AdminClient}}, so don't read too much into the name. Lastly, even Redpanda provides that API in order to be a full drop-in replacement.

In general, I'd deem your approach to security non-standard, so please make sure not to disable the standard cases by simply reverting.
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519591#comment-17519591 ]

Gyula Fora commented on FLINK-27137:
------------------------------------

[~arvid] Regarding "so I'd be inclined to close this ticket with "Won't Do"": let's try to keep a friendly attitude. I never suggested that you would need to do anything, I simply want to discuss this :)

As for the context of this ticket: some managed Kafka environments (including ours) use custom Kafka client implementations. This might restrict the use of certain APIs / functions such as the AdminClient.

From an architectural standpoint we should generally try to limit which external APIs we use, to increase compatibility between services. Since Flink will absolutely need the KafkaConsumer API for the main purpose of consuming / producing data, there is clearly no way around that, but limiting the usage of additional client interfaces such as AdminClient can be beneficial. Regular Kafka consumer and producer applications do not usually use the AdminClient (as the name also suggests), and one could argue that Flink should try to behave as much as possible like regular Kafka applications to increase compatibility.

Looking at the OffsetRetriever there are 4 requirements here: getting committed group offsets, getting beginning and end offsets, and getting offsets for a timestamp. This could all be implemented using the KafkaConsumer (at least looking at the newer API, [https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html]).

So if we find that everything we are trying to achieve can be done with the APIs that we already use (KafkaConsumer), then it would be best to avoid the AdminClient, which at that point is basically just an extra API dependency without practical gain. If we see that there are some future features that cannot be covered by the KafkaConsumer, we should at least make this logic pluggable / optional.
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519534#comment-17519534 ]

dengziming commented on FLINK-27137:
------------------------------------

In fact, AdminClient is better when it comes to getting offsets: for example, AdminClient supports max-timestamp whereas KafkaConsumer doesn't, and KafkaConsumer contains extra components, for example `Fetcher` and `ConsumerCoordinator`.
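To illustrate why a max-timestamp lookup is a different question from an end-offset lookup, here is a toy model in plain Java. It assumes producer-supplied timestamps, so the record with the largest timestamp need not be the last record in the log. The `Rec` type and both helper methods are hypothetical and are not Kafka classes; in the real client the lookup is requested through `OffsetSpec.maxTimestamp()` (KIP-734), and the tie-breaking shown here (first record wins) is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a partition log; all names are illustrative, not Kafka classes. */
final class MaxTimestampDemo {
    private MaxTimestampDemo() {}

    /** One record in the toy log: its offset and its (producer-supplied) timestamp. */
    static final class Rec {
        final long offset;
        final long timestamp;

        Rec(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    /** "Latest" lookup: the offset the next appended record would receive. */
    static long latestOffset(List<Rec> log) {
        return log.isEmpty() ? 0L : log.get(log.size() - 1).offset + 1;
    }

    /** "Max-timestamp" lookup: offset of the record with the largest timestamp, or -1 if empty. */
    static long maxTimestampOffset(List<Rec> log) {
        long bestOffset = -1L;
        long bestTimestamp = Long.MIN_VALUE;
        for (Rec rec : log) {
            // Strict comparison keeps the first record on ties (an assumption of this sketch).
            if (rec.timestamp > bestTimestamp) {
                bestTimestamp = rec.timestamp;
                bestOffset = rec.offset;
            }
        }
        return bestOffset;
    }

    /** Example log in which the middle record carries the largest timestamp. */
    static List<Rec> sampleLog() {
        return Arrays.asList(new Rec(0L, 100L), new Rec(1L, 300L), new Rec(2L, 200L));
    }
}
```

With `sampleLog()`, the two lookups point at different places in the log, which is the case the end-offset API alone cannot answer.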
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519514#comment-17519514 ]

Arvid Heise commented on FLINK-27137:
-------------------------------------

Can you please elaborate on {{In some environments using the AdminClient can be problematic even if we are not using certain features.}}

There are some pros to using the AdminClient and the drawbacks have not been clearly stated, so I'd be inclined to close this ticket with "Won't Do".

After you have stated your concerns clearly, I'd probably go for a hybrid approach, where a newly introduced {{OffsetRetriever}} abstracts from the client used, to serve both use cases.

> Remove usage of AdminClient from KafkaSource logic
> --------------------------------------------------
>
>                 Key: FLINK-27137
>                 URL: https://issues.apache.org/jira/browse/FLINK-27137
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>    Affects Versions: 1.15.0, 1.14.4
>            Reporter: Gyula Fora
>            Priority: Critical
>
> Parts of the KafkaSource logic (specifically the KafkaSourceEnumerator) use
> the Kafka AdminClient instead of the KafkaConsumer.
> It seems that the KafkaConsumer already provides all the necessary
> information that the enumerator needs, so there is no reason for introducing
> the AdminClient.
> In some environments using the AdminClient can be problematic even if we are
> not using certain features.
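The hybrid approach suggested in the comment above could look roughly like the following sketch. It assumes a hypothetical `OffsetRetriever` interface with a factory that selects the backend from a configuration value. The interface shape, the `StubOffsetRetriever` class, the `OffsetRetrievers.create` factory and the `"admin"` / `"consumer"` keys are all illustrative assumptions and not Flink's actual API. The stub returns canned in-memory data where a real backend would delegate to `AdminClient` or `KafkaConsumer`, and the offsets-for-timestamp lookup is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical abstraction: the enumerator asks for offsets without knowing the client. */
interface OffsetRetriever {
    Map<String, Long> committedOffsets(Set<String> partitions);

    Map<String, Long> beginningOffsets(Set<String> partitions);

    Map<String, Long> endOffsets(Set<String> partitions);

    String backendName();
}

/** In-memory stand-in. A real backend would wrap KafkaConsumer or AdminClient instead. */
final class StubOffsetRetriever implements OffsetRetriever {
    private final String name;
    // partition name -> {beginning, committed, end}; canned data for illustration only
    private final Map<String, long[]> data;

    StubOffsetRetriever(String name, Map<String, long[]> data) {
        this.name = name;
        this.data = data;
    }

    /** Collects one of the three canned values for every known partition. */
    private Map<String, Long> pick(Set<String> partitions, int index) {
        Map<String, Long> result = new HashMap<>();
        for (String partition : partitions) {
            long[] offsets = data.get(partition);
            if (offsets != null) {
                result.put(partition, offsets[index]);
            }
        }
        return result;
    }

    @Override
    public Map<String, Long> beginningOffsets(Set<String> partitions) {
        return pick(partitions, 0);
    }

    @Override
    public Map<String, Long> committedOffsets(Set<String> partitions) {
        return pick(partitions, 1);
    }

    @Override
    public Map<String, Long> endOffsets(Set<String> partitions) {
        return pick(partitions, 2);
    }

    @Override
    public String backendName() {
        return name;
    }
}

final class OffsetRetrievers {
    private OffsetRetrievers() {}

    /** Picks a backend from a hypothetical configuration value. */
    static OffsetRetriever create(String backend, Map<String, long[]> data) {
        switch (backend) {
            case "admin":
                return new StubOffsetRetriever("AdminClient", data);
            case "consumer":
                return new StubOffsetRetriever("KafkaConsumer", data);
            default:
                throw new IllegalArgumentException("Unknown offset backend: " + backend);
        }
    }
}
```

The point of the design is that the enumerator depends only on the interface, so environments that cannot use the AdminClient could select the consumer-backed implementation while the default stays unchanged.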
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519480#comment-17519480 ]

Martijn Visser commented on FLINK-27137:
----------------------------------------

I have no strong opinion on this topic at all :) It would be good to get some input from [~dengziming] and [~lindong], since they've worked on this, to hopefully reach a consensus on this usage.
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519476#comment-17519476 ]

Gyula Fora commented on FLINK-27137:
------------------------------------

I was not aware of that [~martijnvisser], but I think it still might make sense to remove the AdminClient. Basically the logic for using it seems to be that we might use some features in the future that the consumer might not have at that time :)
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519473#comment-17519473 ]

Martijn Visser commented on FLINK-27137:
----------------------------------------

This was done deliberately via FLINK-25368
[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic
[ https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519472#comment-17519472 ]

Gyula Fora commented on FLINK-27137:
------------------------------------

cc [~matyas] [~thw]