Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863

1. When the user has no preferences, the system already figures out preferred locations, and not in a random way as you claimed.

2. So let's talk concretely, not hypothetically. If we publish an API where the constructor takes () => Consumer, and we provide two simple ways for users to get an instance of that type, e.g. constructorFactory(listOfTopics) and constructorFactory(fromOffsets), what actually breaks when the Kafka project adds a new subscribeAccordingToThePhaseOfTheMoon(moons) method to Consumer? The people using our simple factories go on about their business. The people creating a consumer themselves can use the phase of the moon if they want to, with a pretty minimal amount of change.

Non-hypothetically, the new Consumer already has a method for dynamic topic subscription, which addresses some long-standing issues with the way the 0.8 consumer works. Cutting people off from this because you're afraid of something breaking makes no sense. If people want to use something they know is stable, with exactly the same features as the 0.8 connector... they can still use the 0.8 connector with 0.10 brokers.

3. Again, concretely, not hypothetically. You're saying that if only we had, e.g., introduced a SparkWrappedMessage, and made the 0.8 consumer messageHandler be SparkWrappedMessage => R instead of using the Kafka class MessageAndMetadata => R, all of this API change wouldn't have been necessary. This is demonstrably false. It would not have prevented API change. The behavior of the underlying consumer _changed_. It changed in such a way that we no longer have individual access to a message as it's being deserialized, because the consumer pre-fetches messages in blocks every time it finishes a poll. No amount of wrapping and hiding changes that.

I understand you've been burned by, e.g., leaking classes from a myriad of 3rd party dependencies in core Spark.
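For concreteness, the factory idea in point 2 might be sketched roughly as below. This is a hypothetical illustration, not the actual PR API: the `Consumer` trait here is a self-contained stand-in for Kafka's `org.apache.kafka.clients.consumer.Consumer`, and the names `ConsumerFactories`, `fromTopics`, and `fromOffsets` are invented for the sketch.

```scala
// Hypothetical stand-in for Kafka's Consumer so the sketch is self-contained.
trait Consumer {
  def subscribe(topics: Seq[String]): Unit
  def seek(offsets: Map[(String, Int), Long]): Unit
}

class StubConsumer extends Consumer {
  var subscribed: Seq[String] = Nil
  var sought: Map[(String, Int), Long] = Map.empty
  def subscribe(topics: Seq[String]): Unit = { subscribed = topics }
  def seek(offsets: Map[(String, Int), Long]): Unit = { sought = offsets }
}

// The stream constructor takes an opaque () => Consumer. Simple helper
// factories cover the common cases (subscribe to topics, start from offsets)...
object ConsumerFactories {
  def fromTopics(newConsumer: () => Consumer, topics: Seq[String]): () => Consumer =
    () => { val c = newConsumer(); c.subscribe(topics); c }

  def fromOffsets(newConsumer: () => Consumer,
                  offsets: Map[(String, Int), Long]): () => Consumer =
    () => { val c = newConsumer(); c.seek(offsets); c }
}

// ...while a user who wants a Consumer method the helpers don't cover
// (dynamic subscription, phase of the moon, whatever Kafka adds next)
// just writes their own closure; the stream constructor is unchanged.
val custom: () => Consumer = () => {
  val c = new StubConsumer
  c.subscribe(Seq("topicA"))
  c
}
```

The point of the shape is that new Consumer capabilities flow through the user-supplied closure without any change to the stream API itself.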
But the very purpose of this standalone jar is to connect to Kafka... the behavior allowed by the Kafka classes isn't incidental leakage, it's the whole point.

From my point of view, your stated goal is to minimize change. My stated goal is to make sure people can use Kafka and Spark to get their jobs done. I'm demonstrably willing to do the maintenance work to make this happen, even if things unavoidably change. So are the other people who have worked on this ticket since December of last year.

On Thu, Jun 23, 2016 at 12:39 PM, Tathagata Das <notificati...@github.com> wrote:

> 1. I didn't quite get what you meant by "But your description of what the
> code is currently doing is not accurate, and your recommendation does not
> meet the use cases." I just collapsed the three cases into two - when the
> user has NO PREFERENCES (the system SHOULD figure out how to schedule
> partitions on the same executors consistently), and SOME PREFERENCES
> (because of co-located brokers, or skew, or whatever). Why doesn't this
> recommendation meet the criteria?
>
> 2. I agree with the argument that there is a whole lot of stuff you cannot
> do without exposing a () => Consumer function. But that's where the
> question of API stability comes in. At this late stage of the 2.0 release,
> I would much rather provide a simpler API for simpler use cases that we
> know will not break, rather than an API that supports everything but is
> more prone to breaking if Kafka breaks its API. We can always start simple
> and then add more advanced interfaces in the future.
>
> 3. Wrapping things up with extra Spark classes and interfaces is a cost we
> have to pay in order to prevent API breakage in the future. It is an
> investment we are undertaking in every part of Spark - SparkSession (using
> a builder pattern, instead of exposing the constructor), SQL data sources
> (never expose any 3rd party classes), etc. It's a hard-learnt lesson.