Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170091954
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala ---
@@ -415,12 +418,14 @@ class MicroBatchExecution(
case v1: SerializedOffset => reader.deserializeOffset(v1.json)
case v2: OffsetV2 => v2
}
- reader.setOffsetRange(
- toJava(current),
- Optional.of(availableV2))
+ reader.setOffsetRange(toJava(current), Optional.of(availableV2))
logDebug(s"Retrieving data from $reader: $current ->
$availableV2")
- Some(reader ->
- new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
+ Some(reader -> StreamingDataSourceV2Relation(
--- End diff ---
I realize that this is a pre-existing problem, but why is it necessary to
create a relation from a reader here? Adding `FakeDataSourceV2` and the
`readerToDataSourceMap` isn't unreasonable, since the relation should hold a
reference to its `DataSourceV2` instance, but I doubt that the relation should
be created here.
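
For what it's worth, a minimal sketch of the alternative I have in mind:
build the relation once per reader where the plan is constructed and reuse it
across micro-batches, instead of reconstructing it in this method. The names
`cachedRelations` and `relationFor` are illustrative, and the two-argument
constructor follows the removed line above, not this PR's new `apply`
signature:

```scala
import scala.collection.mutable

// Illustrative only: cache one relation per reader so the logical plan node
// is created once, when the plan is built, rather than on every micro-batch.
private val cachedRelations =
  mutable.Map.empty[MicroBatchReader, StreamingDataSourceV2Relation]

private def relationFor(reader: MicroBatchReader): StreamingDataSourceV2Relation =
  cachedRelations.getOrElseUpdate(
    reader,
    new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
```

That would also make the reader-to-relation mapping explicit in one place
instead of being implied by construction order here.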
---