Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/11863
1. When the user has no preferences, the system already does figure out
preferred locations, and not in a random way as you claimed.
2. So let's talk concretely, not hypothetically. If we publish an API
where the constructor takes
() => Connector,
and we provide two simple ways for users to get an instance of that type,
e.g.
constructorFactory(listOfTopics)
and
constructorFactory(fromOffsets)
What is actually going to break when the Kafka project adds a new
subscribeAccordingToThePhaseOfTheMoon(moons) method to Consumer? The
people using our simple factories go on about their business. The people
who are creating a consumer themselves can use the phase of the moon if
they want to, with a pretty minimal amount of change.
Non-hypothetically, the new Consumer already has a method for dynamic topic
subscription, which addresses some long-standing issues with the way the
0.8 consumer works. Cutting people off from this because you're afraid of
something breaking makes no sense. If people want to use something they
know is stable, with exactly the same features as the 0.8 connector,
they can still use the 0.8 connector with 0.10 brokers.
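To make the shape of that API concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in: `Consumer`, `SimpleConsumer`, `DirectStream`, `fromTopics` and `fromOffsets` are illustrative names, not the real Kafka or Spark classes, and the real consumer does far more than report its subscriptions.

```scala
// Hypothetical stand-in for a Kafka consumer; not the real Kafka class.
trait Consumer {
  def subscriptions: Set[String]
}

final case class SimpleConsumer(subscriptions: Set[String]) extends Consumer

// Hypothetical stream constructor: all it needs is a way to build a consumer.
final class DirectStream(makeConsumer: () => Consumer) {
  def topics: Set[String] = makeConsumer().subscriptions
}

object ConsumerFactories {
  // Simple factory: subscribe to a fixed list of topics.
  def fromTopics(topics: Seq[String]): () => Consumer =
    () => SimpleConsumer(topics.toSet)

  // Simple factory: start from explicit offsets keyed by (topic, partition).
  def fromOffsets(offsets: Map[(String, Int), Long]): () => Consumer =
    () => SimpleConsumer(offsets.keySet.map(_._1))
}
```

Under these assumptions, someone using `ConsumerFactories.fromTopics(Seq("a", "b"))` never touches the consumer type directly, while someone who wants a new subscription method writes their own `() => Consumer` function and passes it to the same constructor.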
3. Again, concretely not hypothetically.
You're saying if only we had e.g. introduced a SparkWrappedMessage, and
made the 0.8 consumer messageHandler be
SparkWrappedMessage => R
instead of using the kafka class
MessageAndMetadata => R
all of this API change wouldn't have been necessary.
This is demonstrably false. It would not have prevented API change. The
behavior of the underlying consumer _changed_. It changed in such a way
that we no longer have individual access to a message as it's being
deserialized, because the consumer pre-fetches messages in blocks every
time it finishes a poll. No amount of wrapping and hiding changes that.
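A rough sketch of why the wrapper doesn't help, under loudly stated assumptions: `RawMessage`, `DecodedRecord`, `perMessage`, `poll` and `wrapped` are hypothetical names made up for illustration, and the real consumers are considerably more involved. Only `SparkWrappedMessage` comes from the example above, and it is itself hypothetical.

```scala
// Hypothetical types for illustration only; none are real Kafka or Spark classes.
final case class RawMessage(value: Array[Byte], offset: Long)
final case class DecodedRecord(value: String, offset: Long)

// 0.8-style shape: the caller's handler runs once per raw message,
// so it decides how each message is decoded.
def perMessage[R](raw: Seq[RawMessage])(handler: RawMessage => R): Seq[R] =
  raw.map(handler)

// New-consumer-style shape: poll hands back a block that is already decoded.
def poll(raw: Seq[RawMessage]): Seq[DecodedRecord] =
  raw.map(m => DecodedRecord(new String(m.value, "UTF-8"), m.offset))

// A Spark-owned wrapper changes the type the handler sees, not the timing:
// by the time the handler runs, decoding has already happened inside poll.
final case class SparkWrappedMessage(record: DecodedRecord)

def wrapped[R](raw: Seq[RawMessage])(handler: SparkWrappedMessage => R): Seq[R] =
  poll(raw).map(r => handler(SparkWrappedMessage(r)))
```

In this sketch the handler passed to `wrapped` only ever sees decoded records, so renaming the handler's argument type would not have preserved the per-message hook.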
I understand you've been burned on e.g. leaking classes from a myriad of
3rd party dependencies in core Spark. But the very purpose of this
standalone jar is to connect to Kafka... the behavior allowed by the Kafka
classes isn't incidental leakage, it's the whole point.
From my point of view, your stated goal is to minimize change.
My stated goal is to make sure people can use Kafka and Spark to get their
jobs done.
I'm demonstrably willing to do the maintenance work to make this happen,
even if things unavoidably change. So are the other people who have worked
on this ticket since December of last year.
On Thu, Jun 23, 2016 at 12:39 PM, Tathagata Das <[email protected]>
wrote:
>
> 1.
>
> I didn't quite get what you meant by "But your description of what the
> code is currently doing is not accurate, and your recommendation does
> not meet the use cases."
> I just collapsed the three cases into two - when the user has NO
> PREFERENCES (the system SHOULD figure out how to schedule partitions on
> the same executors consistently), and SOME PREFERENCES (because of
> co-located brokers, or skew, or whatever). Why doesn't this
> recommendation meet the criteria?
> 2.
>
> I agree with the argument that there is a whole lot of stuff you cannot
> do without exposing a () => Consumer function. But that's where the
> question of API stability comes in. At this late stage of the 2.0
> release, I would much rather provide a simpler API for simpler use cases
> that we know will not break, rather than an API that supports everything
> but is more prone to breaking if Kafka breaks its API. We can always
> start simple and then add more advanced interfaces in the future.
> 3.
>
> Wrapping things up with extra Spark classes and interfaces is a cost
> we have to pay in order to prevent API breaking in the future. It is an
> investment we are undertaking in every part of Spark - SparkSession
> (using a builder pattern, instead of exposing the constructor), SQL Data
> sources (never expose any 3rd party classes), etc. It's a hard-learnt
> lesson.