[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-09 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519919#comment-17519919
 ] 

Gabor Somogyi commented on FLINK-27137:
---

Yep, the Kafka community thinks that all offset management MUST be done with 
AdminClient. The majority of the Kafka community agrees that this is a 
strategic direction.
Consumers can be used for offset fetching, but if any issue arises they 
suggest switching to AdminClient and are not willing to fix the Consumer. We 
spent several months on SPARK-32032 discovering all the issues and 
possibilities...

> Remove usage of AdminClient from KafkaSource logic
> --
>
> Key: FLINK-27137
> URL: https://issues.apache.org/jira/browse/FLINK-27137
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kafka
> Affects Versions: 1.15.0, 1.14.4
> Reporter: Gyula Fora
> Priority: Major
>
> Parts of the KafkaSource logic (specifically the KafkaSourceEnumerator) use 
> the Kafka AdminClient instead of the KafkaConsumer.
> It seems that the KafkaConsumer already provides all the necessary 
> information that the enumerator needs, so there is no reason for introducing 
> the AdminClient.
> In some environments using the AdminClient can be problematic even if we are 
> not using certain features.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-09 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519912#comment-17519912
 ] 

Gyula Fora commented on FLINK-27137:


Thanks [~dengziming], I can see that AdminClient is more targeted at offset 
management (even though the consumer seems to cover all our current use cases).

I talked to [~gaborgsomogyi] and he pointed out some practical limitations of 
fetching the offsets through the consumer. That sounds like a more convincing 
argument. We will try to deal with our current issues on our end.

Thanks all!



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519852#comment-17519852
 ] 

dengziming commented on FLINK-27137:


[~arvid] In fact, Spark also uses AdminClient to get offsets, see 
[https://github.com/apache/spark/pull/29729].

[~gyfora]

> Since Flink will absolutely need the KafkaConsumer API

Even though we have to use KafkaConsumer anyway, the KafkaConsumer is created in 
the TaskManager, whereas we seem to get offsets in the JobManager, so maybe we 
are not creating a duplicate component here. KafkaConsumer also contains extra 
components, for example `Fetcher` and `ConsumerCoordinator`, which may send 
extra RPC requests.

> regular kafka applications to increase compatibility

AdminClient is also supported by the Kafka community and is thus as compatible 
and robust as KafkaConsumer; it is designed just like the Flink public API.

> In some environments using the AdminClient can be problematic 

If AdminClient has bugs, we should report them to the Kafka community to help 
improve it.

> Looking at the OffsetRetriever there are 4 requirements here

In fact there may be new types of OffsetSpec, for example max-timestamp in 
KIP-734. Even though it is a small addition, it indicates that AdminClient is 
the better fit for offset management.
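
For illustration only, here is a sketch of what a max-timestamp lookup means compared with the usual latest offset: when record timestamps are not monotonic, the record with the highest timestamp need not be the last one in the partition. The `Rec` type and both helper names are hypothetical and no Kafka classes are used; this models the idea, not the AdminClient implementation.

```java
import java.util.List;

final class MaxTimestampSketch {
    // Minimal record model: offset plus timestamp. Not a Kafka class.
    static final class Rec {
        final long offset;
        final long timestamp;

        Rec(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // Offset of the record with the largest timestamp, or -1 for an empty partition.
    // On ties this sketch keeps the first such record; that choice is an assumption.
    static long offsetOfMaxTimestamp(List<Rec> partition) {
        long bestOffset = -1L;
        long bestTimestamp = Long.MIN_VALUE;
        for (Rec r : partition) {
            if (bestOffset < 0 || r.timestamp > bestTimestamp) {
                bestOffset = r.offset;
                bestTimestamp = r.timestamp;
            }
        }
        return bestOffset;
    }

    // "Latest" in the usual sense: the offset the next appended record would get.
    static long endOffset(List<Rec> partition) {
        return partition.isEmpty() ? 0L : partition.get(partition.size() - 1).offset + 1;
    }
}
```

With records at offsets 0, 1, 2 carrying timestamps 100, 300, 200, `offsetOfMaxTimestamp` yields 1 while `endOffset` yields 3, so the two lookups answer different questions.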



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519679#comment-17519679
 ] 

Gyula Fora commented on FLINK-27137:


Even KafkaConsumer 2.0.0 supports all the APIs that I mentioned, so I don't 
think there should be a compatibility issue on that side.

Could you please give one example of a feature that is not supported by the 
KafkaConsumer directly and for which we require the AdminClient? That would 
probably help me understand better why we would like to keep it.



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519674#comment-17519674
 ] 

Arvid Heise commented on FLINK-27137:
-

Hi [~gyfora], sorry if you perceived my message as unfriendly; that was not my 
intention. :(

I just expect a ticket that basically reverts a ticket that went through due 
process to be more descriptive, so that everyone can understand the issue and 
discuss it. The information in the ticket description is still lacking in that 
regard. 

You are also completely dismissing the valid use case of users who do not depend 
on Java 8, simply bump the Kafka dependency to 3 and use the new functionality. 
So please go for an approach that makes both possible.

I'd also raise the possibility that you simply implement a limited 
{{AdminClient}} on your end as well, or else you may run into issues with other 
Java-based frameworks, such as Spark, that may or may not also use the 
{{AdminClient}}. Expecting certain frameworks to use one part of the public API 
of a client jar and not another is a non-obvious requirement. For context, you 
can't run Pulsar without its {{AdminClient}}, so don't read too much into the 
name. Lastly, even Redpanda provides that API in order to be a full drop-in 
replacement.

In general, I'd deem your approach to security as non-standard, so please make 
sure to not disable the standard cases by simply reverting. 



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519591#comment-17519591
 ] 

Gyula Fora commented on FLINK-27137:


[~arvid] "so I'd be inclined to close this ticket with 'Won't Do'": let's try 
to keep a friendly attitude. I never suggested that you would need to do 
anything; I simply want to discuss this :) 

As for the context of this ticket: some managed Kafka environments (including 
ours) use custom Kafka client implementations. This might restrict the use of 
certain APIs / functions such as the AdminClient. From an architectural 
standpoint we should generally try to limit which external APIs we use, to 
increase compatibility between services. Since Flink will absolutely need the 
KafkaConsumer API for the main purpose of consuming / producing data, there is 
clearly no way around that, but limiting the usage of additional client 
interfaces such as AdminClient can be beneficial.

Regular Kafka consumer and producer applications do not usually use the 
AdminClient (as the name also suggests), and one could argue that Flink should 
try to behave as much like regular Kafka applications as possible to increase 
compatibility.

Looking at the OffsetRetriever there are 4 requirements here: getting committed 
group offsets, getting begin/end offsets and getting offsets for a timestamp. 
This could all be implemented using the KafkaConsumer (at least looking at the 
newer API: 
[https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html]).

So if we find that everything we are trying to achieve can be done with the 
APIs that we already use (KafkaConsumer), then it would be best to avoid using 
the AdminClient, which is basically just an extra API dependency at that point 
without practical gain.

If we see that there are some future features that cannot be covered by the 
KafkaConsumer we should at least make this logic pluggable / optional.
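
A minimal sketch of what a pluggable offset lookup could look like, covering the four lookups listed above. Everything in it is hypothetical: the shape of this `OffsetRetriever` interface, the `offset.retriever` config key and the in-memory backend are for illustration only and are not the Flink interface. A real backend would delegate to a KafkaConsumer or to an AdminClient instead of the in-memory maps used here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction over the four lookups. Partitions are plain strings
// such as "t-0" so the sketch needs no Kafka classes.
interface OffsetRetriever {
    Map<String, Long> committedOffsets(Set<String> partitions);
    Map<String, Long> beginningOffsets(Set<String> partitions);
    Map<String, Long> endOffsets(Set<String> partitions);
    // Earliest offset whose timestamp is >= the requested timestamp.
    Map<String, Long> offsetsForTimes(Map<String, Long> timestampByPartition);
}

// In-memory stand-in for a real backend (consumer-based or admin-based).
final class InMemoryOffsetRetriever implements OffsetRetriever {
    final String backend;                      // label only: "consumer" or "admin"
    private final Map<String, long[]> logs;    // record timestamps, index == offset
    private final Map<String, Long> committed; // committed group offsets

    InMemoryOffsetRetriever(String backend, Map<String, long[]> logs, Map<String, Long> committed) {
        this.backend = backend;
        this.logs = logs;
        this.committed = committed;
    }

    @Override
    public Map<String, Long> committedOffsets(Set<String> partitions) {
        Map<String, Long> out = new HashMap<>();
        for (String p : partitions) {
            Long offset = committed.get(p);
            if (offset != null) out.put(p, offset); // partitions without a commit are omitted
        }
        return out;
    }

    @Override
    public Map<String, Long> beginningOffsets(Set<String> partitions) {
        Map<String, Long> out = new HashMap<>();
        for (String p : partitions) out.put(p, 0L); // the sketch never truncates logs
        return out;
    }

    @Override
    public Map<String, Long> endOffsets(Set<String> partitions) {
        Map<String, Long> out = new HashMap<>();
        for (String p : partitions) out.put(p, (long) logs.getOrDefault(p, new long[0]).length);
        return out;
    }

    @Override
    public Map<String, Long> offsetsForTimes(Map<String, Long> timestampByPartition) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> e : timestampByPartition.entrySet()) {
            long[] log = logs.getOrDefault(e.getKey(), new long[0]);
            for (int offset = 0; offset < log.length; offset++) {
                if (log[offset] >= e.getValue()) {
                    out.put(e.getKey(), (long) offset);
                    break; // partitions with no matching record are omitted
                }
            }
        }
        return out;
    }
}

final class OffsetRetrievers {
    // "offset.retriever" is a made-up key, not an existing Flink or Kafka option.
    static OffsetRetriever fromConfig(Map<String, String> config,
                                      Map<String, long[]> logs,
                                      Map<String, Long> committed) {
        String backend = config.getOrDefault("offset.retriever", "admin");
        if (!backend.equals("admin") && !backend.equals("consumer")) {
            throw new IllegalArgumentException("Unknown offset retriever: " + backend);
        }
        return new InMemoryOffsetRetriever(backend, logs, committed);
    }
}
```

The point of the sketch is only that callers depend on the interface, so the choice between a consumer-based and an admin-based backend becomes a configuration decision rather than a code change.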



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519534#comment-17519534
 ] 

dengziming commented on FLINK-27137:


In fact, AdminClient is the better choice for getting offsets. For example, 
AdminClient supports max-timestamp whereas KafkaConsumer doesn't, and 
KafkaConsumer contains extra components, for example `Fetcher` and 
`ConsumerCoordinator`.



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519514#comment-17519514
 ] 

Arvid Heise commented on FLINK-27137:
-

Can you please elaborate on {{In some environments using the AdminClient can be 
problematic even if we are not using certain features.}}?

There are some pros to using the AdminClient and the drawbacks have not been 
clearly stated, so I'd be inclined to close this ticket with "Won't Do".

After you have stated your concerns clearly, I'd probably go for a hybrid 
approach, where a newly introduced {{OffsetRetriever}} abstracts away the 
client being used, to serve both use cases.

> Remove usage of AdminClient from KafkaSource logic
> --
>
> Key: FLINK-27137
> URL: https://issues.apache.org/jira/browse/FLINK-27137
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kafka
> Affects Versions: 1.15.0, 1.14.4
> Reporter: Gyula Fora
> Priority: Critical
>
> Parts of the KafkaSource logic (specifically the KafkaSourceEnumerator) use 
> the Kafka AdminClient instead of the KafkaConsumer.
> It seems that the KafkaConsumer already provides all the necessary 
> information that the enumerator needs, so there is no reason for introducing 
> the AdminClient.
> In some environments using the AdminClient can be problematic even if we are 
> not using certain features.





[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519480#comment-17519480
 ] 

Martijn Visser commented on FLINK-27137:


I have no strong opinion on this topic at all :) Would be good to get some 
input from [~dengziming] and [~lindong] since they've worked on this to 
hopefully get to a consensus on this usage. 



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519476#comment-17519476
 ] 

Gyula Fora commented on FLINK-27137:


I was not aware of that [~martijnvisser], but I think it still might make sense 
to remove the AdminClient. Basically the logic for using it seems to be that we 
might use some features in the future that the consumer might not have at that 
time :) 



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519473#comment-17519473
 ] 

Martijn Visser commented on FLINK-27137:


This was done deliberately via FLINK-25368



[jira] [Commented] (FLINK-27137) Remove usage of AdminClient from KafkaSource logic

2022-04-08 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519472#comment-17519472
 ] 

Gyula Fora commented on FLINK-27137:


cc [~matyas] [~thw] 
