Github user tdas commented on the issue:
https://github.com/apache/spark/pull/11863
1. I didn't quite get what you meant by "But your description of what the
code is currently doing is not accurate, and your recommendation does not
meet the use cases." I just collapsed the three cases into two: when the
user has NO PREFERENCES (the system SHOULD figure out how to schedule
partitions on the same executors consistently), and SOME PREFERENCES
(because of co-located brokers, or skew, or whatever). Why doesn't this
recommendation meet the criteria?
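To make the two cases concrete, here is a hypothetical sketch (the names `LocationPreference`, `NoPreference`, `SomePreferences`, and `assign` are illustrative, not Spark's actual API) of how the no-preference case can still yield a consistent partition-to-executor mapping, while the some-preference case honors user-supplied hosts:

```scala
// Hypothetical sketch of the two scheduling cases from point 1.
sealed trait LocationPreference

// NO PREFERENCES: the system should map each partition to the same
// executor consistently across batches.
case object NoPreference extends LocationPreference

// SOME PREFERENCES: the user supplies a host per (topic, partition),
// e.g. for co-located brokers or to handle skew.
final case class SomePreferences(hostFor: Map[(String, Int), String])
    extends LocationPreference

// For NoPreference, hash the topic-partition onto the executor list so
// the mapping is stable as long as the executor list is stable.
def assign(pref: LocationPreference,
           topicPartition: (String, Int),
           executors: IndexedSeq[String]): Option[String] = pref match {
  case NoPreference if executors.nonEmpty =>
    val n = executors.size
    val idx = ((topicPartition.hashCode % n) + n) % n // non-negative index
    Some(executors(idx))
  case NoPreference => None
  case SomePreferences(hostFor) => hostFor.get(topicPartition)
}
```

The point of the collapse is that both cases fit one interface: "no preferences" is just a deterministic default rather than a third special case.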
2. I agree with the argument that there is a whole lot of stuff you cannot
do without exposing a () => Consumer function. But that's where the question
of API stability comes in. At this late stage of the 2.0 release, I would
much rather provide a simpler API for simpler use cases that we know will
not break, than an API that supports everything but is more prone to
breaking if Kafka breaks its API. We can always start simple and then add
more advanced interfaces in the future.
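The trade-off in point 2 can be sketched as follows. This is an illustration, not Spark's actual API; `Consumer` here is a stand-in trait for a third-party client class, and the two `createStream*` functions are hypothetical:

```scala
// Stand-in for a third-party client class (e.g. a Kafka consumer).
trait Consumer { def poll(): Seq[String] }

// Option A: maximally flexible, but the signature leaks the third-party
// Consumer type. Any incompatible change to that type breaks this API.
def createStreamFlexible(consumerFactory: () => Consumer): Seq[String] =
  consumerFactory().poll()

// Option B: a simpler, stable surface. The caller passes plain
// parameters; the third-party type never appears in the signature, so
// it can evolve without breaking callers. (The body here is a dummy
// that just echoes the sorted keys, to keep the sketch self-contained.)
def createStreamSimple(params: Map[String, String]): Seq[String] = {
  val consumer: Consumer = new Consumer {
    def poll(): Seq[String] = params.keys.toSeq.sorted
  }
  consumer.poll()
}
```

Option A supports everything the underlying client supports; Option B supports only what the parameters express, but its signature cannot be broken by upstream changes.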
3. Wrapping things up with extra Spark classes and interfaces is a cost we
have to pay in order to prevent API breakage in the future. It is an
investment we are making in every part of Spark - SparkSession (using a
builder pattern instead of exposing a constructor), SQL data sources (never
exposing any 3rd-party classes), etc. It's a hard-learnt lesson.
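The SparkSession-style builder mentioned in point 3 can be sketched like this (a generic illustration with a hypothetical `Session` class, not Spark's actual code): the constructor stays private, so the project can change construction details later without breaking callers.

```scala
// Builder pattern sketch: only the builder can construct a Session,
// so the constructor's shape is free to change in future versions.
final class Session private (val settings: Map[String, String])

object Session {
  final class Builder private[Session] () {
    private var settings = Map.empty[String, String]

    // Each setter returns the builder for chaining.
    def config(key: String, value: String): Builder = {
      settings += (key -> value)
      this
    }

    def getOrCreate(): Session = new Session(settings)
  }

  def builder(): Builder = new Builder
}

// Usage: Session.builder().config("master", "local").getOrCreate()
```

Because user code only ever touches `builder()`, `config`, and `getOrCreate()`, new construction options can be added without a single breaking change.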