Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r170091954
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala ---
    @@ -415,12 +418,14 @@ class MicroBatchExecution(
                 case v1: SerializedOffset => reader.deserializeOffset(v1.json)
                 case v2: OffsetV2 => v2
               }
    -          reader.setOffsetRange(
    -            toJava(current),
    -            Optional.of(availableV2))
    +          reader.setOffsetRange(toJava(current), Optional.of(availableV2))
               logDebug(s"Retrieving data from $reader: $current -> $availableV2")
    -          Some(reader ->
    -            new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
    +          Some(reader -> StreamingDataSourceV2Relation(
    --- End diff --
    
    I realize that this is a pre-existing problem, but why is it necessary to create a relation from a reader here? The addition of `FakeDataSourceV2` and the `readerToDataSourceMap` isn't unreasonable, because the relation should hold a reference to the `DataSourceV2` instance, but I doubt that the relation should be created here.
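    To illustrate the shape of the suggestion, here is a minimal sketch using hypothetical, simplified stand-in types (`Source`, `Reader`, `Relation`, and `relationFor` are illustrative names, not the actual Spark classes): the relation carries a reference to its source, so it is most naturally constructed at the point where that source instance is already in scope, rather than being re-derived from the reader.

    ```scala
    // Hypothetical simplified types, not the real Spark API.
    trait Source
    trait Reader { def readSchema(): String }

    // The relation keeps a reference to the source it came from.
    final case class Relation(source: Source, reader: Reader)

    object Sketch {
      // Build the relation where both the source and its reader are
      // known, instead of recovering the source from the reader via a
      // side map (the readerToDataSourceMap workaround in the diff).
      def relationFor(source: Source, reader: Reader): Relation =
        Relation(source, reader)
    }
    ```

    This keeps the source-to-relation wiring in one place and avoids the extra lookup structure.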


---
