HeartSaVioR commented on pull request #34089: URL: https://github.com/apache/spark/pull/34089#issuecomment-1035821675
Thanks for the detailed inputs, @ijuma .

> If the expectation for Spark users is that a minor release is a drop-in replacement and no action is expected from users, then I agree that the above poses a problem.

Seems like it is not a drop-in replacement, which remains my concern about the upgrade.

@dongjoon-hyun We are releasing a new minor version, NOT a new major version, which end users reasonably expect to be a "drop-in replacement" upgrade. From what I understand, the Spark community works very hard NOT to introduce behavioral changes in minor versions, in order to respect semantic versioning. That is why, whenever we do introduce such a change, we add a safeguard config whose default value keeps the existing behavior, unless we are releasing a major version or the change fixes a correctness issue.

When we evaluate bumping a dependency to a new major version, we must be fully aware of the breaking changes the bump will bring. The breaking changes on configs are one of the things we totally missed; we didn't know about them until @ijuma explained them to us. This is a huge hole in the process. We are saying we can fix it now, but what if I hadn't been here and we had just let it go?

After this PR we force our users onto Kafka client 3.1 (end users wouldn't notice the change until they look into their new dependency tree), without knowing the details of what benefits they will gain and what risks they will also take on. This is a non-trivial problem.

> For example, Kafka dropped support for Scala 2.11 and added support for ZK 3.5.7 in Apache Kafka 2.5. We did a [KIP for the former](https://cwiki.apache.org/confluence/display/KAFKA/KIP-531%3A+Drop+support+for+Scala+2.11+in+Kafka+2.5) (Scala 2.11 upgrade), but did not do a KIP for the latter (ZK 3.5.x upgrade) as our analysis indicated that it was a compatible upgrade given how Kafka uses ZK. Even though it was a .7, this ZK version did have a bunch of critical bugs in it still, but they would probably not have been found and fixed if we had delayed adoption.

This is a good example: the ZK 3.5.x upgrade didn't go through a KIP (in other words, a dependency upgrade normally does go through a KIP) because the analysis indicated it was a compatible upgrade given how Kafka uses ZK. Should we do such an analysis and share the result before moving on, instead of simply checking against existing tests and saying "OK, we are good to go"?

The testing in staging/production should have been done before merging, because post-review is NOT a "comfort" action, and the PR author is also NOT comfortable receiving post-review comments. "Let us do this first and leave the rest to post-reviewing, because we can always revert it" works technically, but not everyone is comfortable with that.
