Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170095169
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala ---
@@ -415,12 +418,14 @@ class MicroBatchExecution(
case v1: SerializedOffset => reader.deserializeOffset(v1.json)
case v2: OffsetV2 => v2
}
- reader.setOffsetRange(
- toJava(current),
- Optional.of(availableV2))
+ reader.setOffsetRange(toJava(current), Optional.of(availableV2))
           logDebug(s"Retrieving data from $reader: $current -> $availableV2")
    -          Some(reader ->
    -            new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
+ Some(reader -> StreamingDataSourceV2Relation(
--- End diff ---
It's an artifact of the current implementation of streaming progress
reporting, which assumes, at a deep and hard-to-untangle level, that new
data is represented as a map of source -> logical plan.
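
For illustration only, a minimal sketch of the shape being described. These
types are hypothetical stand-ins, not Spark's actual `Source` or
`LogicalPlan` classes:

```scala
// Hypothetical stand-ins for Spark's streaming source and Catalyst plan types.
trait Source
trait LogicalPlan

// Progress reporting, per the comment above, assumes each batch's new data
// arrives as a map from each source to the logical plan carrying its data.
case class NewBatchData(newData: Map[Source, LogicalPlan]) {
  // Per-source progress would be derived by walking this map.
  def sourcesWithData: Seq[Source] = newData.keys.toSeq
}
```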
---