[VOTE] Release Spark 3.3.1 (RC4)

2022-10-16 Thread Yuming Wang
Please vote on releasing the following candidate as Apache Spark version 3.3.1.

The vote is open until 11:59pm Pacific time October 21st and passes if
a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see https://spark.apache.org

The tag to be voted on is v3.3.1-rc4 (commit
fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
https://github.com/apache/spark/tree/v3.3.1-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1430

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs

The list of bug fixes going into 3.3.1 can be found at the following URL:
https://s.apache.org/ttgz6

This release is using the release script of the tag v3.3.1-rc4.


FAQ

==
What happened to v3.3.1-rc3?
==
A performance regression (SPARK-40703) was found after tagging
v3.3.1-rc3, which the Iceberg community hopes Spark 3.3.1 will fix,
so we skipped the vote on v3.3.1-rc3.

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate,
then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before and after so
you don't end up building with an out-of-date RC going forward).
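
For Java/Scala projects built with sbt, a minimal sketch of pointing a build
at the staging repository might look like the following. The resolver name
and dependency choice are illustrative assumptions, not part of the release
instructions; remove the resolver and clear the local ivy/maven caches once
the vote closes.

// build.sbt -- hypothetical sketch for testing against the 3.3.1 RC4 staging repo.
// The resolver name "apache-spark-staging" is arbitrary; drop this entry and
// clean the ~/.ivy2 / ~/.m2 caches after testing so later builds don't keep
// resolving against the RC.
resolvers += "apache-spark-staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1430/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.1"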

===
What should happen to JIRA tickets still targeting 3.3.1?
===
The current list of open tickets targeted at 3.3.1 can be found by
going to https://issues.apache.org/jira/projects/SPARK and searching for
"Target Version/s" = 3.3.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.


Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-16 Thread Jungtaek Lim
Thanks Gabor and Dongjoon for supporting this!

Bumping to reach more eyes. If there are no further voices on this in a couple
of days, I'll treat it as lazy consensus and submit a PR for this.

On Sat, Oct 15, 2022 at 3:32 AM Dongjoon Hyun 
wrote:

> +1
>
> I agree with Jungtaek and Gabor about switching the default value of
> configurations with the migration guide.
>
> Dongjoon
>
> On Thu, Oct 13, 2022 at 12:46 AM Gabor Somogyi 
> wrote:
>
>> Hi Jungtaek,
>>
>> Good to hear that the new approach is working fine. +1 from my side.
>>
>> BR,
>> G
>>
>>
>> On Thu, Oct 13, 2022 at 4:12 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I would like to propose flipping the default value of the Kafka offset
>>> fetching config. The context is as follows:
>>>
>>> Before Spark 3.1, there was only one approach to fetching offsets: using
>>> consumer.poll(0). This has been pointed out as a root cause of hangs, since
>>> there is no timeout for the metadata fetch.
>>>
>>> In Spark 3.1, we addressed this by introducing a new approach to fetching
>>> offsets via SPARK-32032. Since the new approach leverages AdminClient and a
>>> consumer group is no longer needed for fetching offsets, the required
>>> security ACLs are loosened.
>>>
>>> Reference:
>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#offset-fetching
>>>
>>> There was some concern about the behavioral change to the security model,
>>> hence we couldn't make the new approach the default.
>>>
>>> Since then, we have observed various Kafka connector related issues that
>>> came from the old offset fetching (e.g. hangs, issues with consumer group
>>> rebalances, etc.), and we fixed many of these issues by simply flipping
>>> the config.
>>>
>>> Based on this, I would consider the current default value "incorrect". The
>>> security-related behavioral change would inevitably be introduced (users can
>>> set topic-based ACL rules), but most people will benefit. IMHO this is
>>> something we can address in the release/migration notes.
>>>
>>> Would like to hear the voices on this.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
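
For context on the config discussed above, a minimal sketch of opting into the
newer AdminClient-based offset fetching, on releases where
spark.sql.streaming.kafka.useDeprecatedOffsetFetching still defaults to true,
might look like the following. The topic name and bootstrap servers are
placeholders, and the job additionally needs the spark-sql-kafka-0-10 artifact
on its classpath; this is an illustration, not part of the proposal itself.

import org.apache.spark.sql.SparkSession

object KafkaOffsetFetchingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-offset-fetching-sketch")
      // Opt into the AdminClient-based offset fetching (the behavior the
      // thread proposes making the default); this avoids the consumer.poll(0)
      // path that can hang because the metadata fetch has no timeout.
      .config("spark.sql.streaming.kafka.useDeprecatedOffsetFetching", "false")
      .getOrCreate()

    // Placeholder Kafka source; topic and bootstrap servers are illustrative only.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "example-topic")
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}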