Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-18 Thread Jungtaek Lim
No further comments so far, so I'm going to submit a PR. Thanks again for
the feedback!


Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-16 Thread Jungtaek Lim
Thanks Gabor and Dongjoon for supporting this!

Bumping this to reach more eyes. If there are no further comments in a
couple of days, I'll treat it as lazy consensus and submit a PR for this.


Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-14 Thread Dongjoon Hyun
+1

I agree with Jungtaek and Gabor about switching the default value of the
configuration, together with an entry in the migration guide.

Dongjoon


Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-13 Thread Gabor Somogyi
Hi Jungtaek,

Good to hear that the new approach is working fine. +1 from my side.

BR,
G


[DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-12 Thread Jungtaek Lim
Hi all,

I would like to propose flipping the default value of the Kafka offset
fetching config. The context is as follows:

Before Spark 3.1, there was only one approach to fetching offsets: using
consumer.poll(0). This has been pointed out as a root cause of hangs, since
there is no timeout for the metadata fetch.

In Spark 3.1, we addressed this by introducing a new approach to fetching
offsets, via SPARK-32032.
Since the new approach leverages AdminClient and a consumer group is no
longer needed for fetching offsets, the required security ACLs are loosened.

Reference:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#offset-fetching

There was some concern about the behavioral change in the security model,
hence we couldn't make the new approach the default.

Since then, we have observed various Kafka connector related issues which
came from the old offset fetching (e.g. hangs, issues with rebalancing of
the consumer group, etc.), and we fixed many of these issues by simply
flipping the config.
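
For reference, a minimal sketch of what "flipping the config" could look
like in a PySpark application. The config key is the one from the subject
line; the helper names (with_offset_fetching, build_session) and the app
name are made up for illustration and are not part of Spark.

```python
# Config key from the subject of this thread. "true" selects the old
# consumer.poll(0)-based fetching, "false" the AdminClient-based one.
OFFSET_FETCHING_KEY = "spark.sql.streaming.kafka.useDeprecatedOffsetFetching"


def with_offset_fetching(builder, use_deprecated: bool):
    """Set the offset fetching flag explicitly on a session builder.

    `builder` is anything exposing `.config(key, value)` that returns the
    builder, such as `SparkSession.builder` in PySpark.
    """
    return builder.config(OFFSET_FETCHING_KEY, str(use_deprecated).lower())


def build_session(app_name: str = "kafka-offset-fetching-demo"):
    # Hypothetical helper: needs PySpark, imported lazily so that this
    # module can be loaded without it.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    # Opt in to the AdminClient-based approach regardless of the default.
    return with_offset_fetching(builder, use_deprecated=False).getOrCreate()
```

Setting the value explicitly, rather than relying on the default, keeps
the behavior of a job the same across a release that changes the default.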

Based on this, I would consider the current default value "incorrect". The
security-related behavioral change would inevitably be introduced (users can
set topic-based ACL rules), but most people will benefit. IMHO this is
something we can deal with via a release/migration note.

I would like to hear your thoughts on this.

Thanks,
Jungtaek Lim (HeartSaVioR)