HeartSaVioR commented on issue #23747: [SPARK-26848][SQL][SS] Introduce new option to Kafka source: offset by timestamp (starting/ending)
URL: https://github.com/apache/spark/pull/23747#issuecomment-531933055

> I don't feel that qualified to review this, but see others have generally approved.

I see. No problem, and thanks for reviewing even though the patch is unfamiliar to you. I can wait for other reviewers who can decide whether to merge.

> Is there any impact to users who do not specify these new properties? does it overlap with or duplicate any existing "offset" functionality? Those would be my key review questions.

No. It provides another way to set the "offset": by timestamp. Currently, end users have to set either an exact offset number or latest/earliest. When they want to run a query starting from a specific point in time, they need to know the exact offset of the record that was inserted at that time. End users may be able to retrieve it with a CLI tool (not 100% sure, but given that Kafka exposes an API for it...), but it's not convenient to look up the offset for that point in time in Kafka and then set it as a Spark option.

There's another benefit to this change. Once users put an offset into the Spark option, the number alone does not show that they intended to start from a specific point in time, unless they also leave a comment describing where the offset came from. After this patch, that intention is expressed clearly.

> Regarding Kafka 0.10 support, yes I think it could be reasonable to drop support for < 1.0. ... Would there be any significant upside for Spark, like simplifying code or assumptions, making it easier to support, taking advantage of newer features?

Maybe we don't need to document version requirements, either for this feature (>= 0.10) or for Kafka header support (>= 0.11). We already use a fairly recent version of the Kafka client, so dropping support for old versions wouldn't bring significant benefits on the code side.
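For illustration only: the timestamp-based option discussed in this PR takes a JSON map from topic to partition to timestamp, in place of a raw offset number. The sketch below assumes the format documented for Spark 3.0, where the option shipped as `startingOffsetsByTimestamp` (with a matching `endingOffsetsByTimestamp`) and values are epoch milliseconds; the helper name `timestamp_offsets_json` is hypothetical. It shows how the "start from this point in time" intention can be written down directly instead of looking up an offset first.

```python
import json
from datetime import datetime, timedelta, timezone

# Reference point for converting datetimes to epoch milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_offsets_json(topic_partitions, start_time):
    """Hypothetical helper: build a per-partition timestamp map as JSON.

    topic_partitions: dict mapping topic name -> iterable of partition ids.
    start_time: timezone-aware datetime marking the desired starting point.
    Returns a JSON string such as {"topicA": {"0": <millis>, "1": <millis>}}.
    """
    if start_time.tzinfo is None:
        raise ValueError("start_time must be timezone-aware")
    # Integer division of timedeltas yields exact whole milliseconds (no float rounding).
    millis = (start_time - EPOCH) // timedelta(milliseconds=1)
    # Partition ids become JSON object keys, so they are written as strings.
    spec = {
        topic: {str(p): millis for p in sorted(partitions)}
        for topic, partitions in topic_partitions.items()
    }
    return json.dumps(spec, sort_keys=True)
```

As a usage sketch (assuming PySpark 3.0+ with the Kafka connector on the classpath), the resulting string would be passed as `spark.readStream.format("kafka").option("startingOffsetsByTimestamp", timestamp_offsets_json({"topicA": [0, 1]}, datetime(2019, 9, 16, tzinfo=timezone.utc)))`. Without such an option, the offsets would have to be resolved manually, for example via the Kafka consumer's `offsetsForTimes` API, and then pasted into `startingOffsets`, which is the inconvenience the comment describes.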
