gaborgsomogyi commented on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2.
URL: https://github.com/apache/spark/pull/24738#issuecomment-503029450

@rdblue @HeartSaVioR Thanks for the helpful comments! I've just had a look, and the suggested approach looks good.

@HeartSaVioR thanks for raising those important concerns about adding new required columns to Kafka. Regarding the two bullet points mentioned: the `dropDuplicates` issue shouldn't be a problem here because this is only the batch source, but this one still stands: `"select *" returns different schemas and results.`

The topic column also seemed odd to me at first glance, but from a usage perspective it makes sense and is useful. Lately I've seen a couple of use cases where an error topic is used as a sink when processing was not successful.

I'm basically fine with exposing metadata after discussing questions like the ones @HeartSaVioR brought up. I presume this can be done in a separate PR.
