Github user sirishaSindri commented on a diff in the pull request:
https://github.com/apache/spark/pull/20836#discussion_r177276642
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -279,9 +279,8 @@ private[kafka010] case class InternalKafkaConsumer(
       if (record.offset > offset) {
         // This may happen when some records aged out but their offsets already got verified
         if (failOnDataLoss) {
-          reportDataLoss(true, s"Cannot fetch records in [$offset, ${record.offset})")
--- End diff ---
@gaborgsomogyi Thank you for looking at it. For batch queries, the query will always fail if it cannot read any data from the provided offsets because of lost data (see
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html).
With this change, it won't fail; instead it will return all the messages that are still available within the requested offset range.
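
To illustrate the behavior from the user side, here is a minimal sketch of a batch read over an explicit offset range. The bootstrap servers, topic name, and offsets are hypothetical, and it assumes `failOnDataLoss=false` is honored for batch queries as described above:

```scala
// Sketch only: requires the spark-sql-kafka-0-10 package on the classpath.
// Topic name, servers, and offsets below are hypothetical.
import org.apache.spark.sql.SparkSession

object KafkaBatchReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-batch-read-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("subscribe", "events")
      // Explicit per-partition offset range for the batch query.
      .option("startingOffsets", """{"events":{"0":100}}""")
      .option("endingOffsets", """{"events":{"0":500}}""")
      // With the change discussed above, records that aged out of this range
      // are skipped rather than causing the whole batch query to fail.
      .option("failOnDataLoss", "false")
      .load()

    // Only the messages still available within [100, 500) are returned.
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "offset").show()

    spark.stop()
  }
}
```

With `failOnDataLoss` left at its default of `true`, the same query would still fail as soon as any record in the requested range had aged out.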
---