GitHub user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r170106104
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala ---
    @@ -415,12 +418,14 @@ class MicroBatchExecution(
                 case v1: SerializedOffset => reader.deserializeOffset(v1.json)
                 case v2: OffsetV2 => v2
               }
    -          reader.setOffsetRange(
    -            toJava(current),
    -            Optional.of(availableV2))
    +          reader.setOffsetRange(toJava(current), Optional.of(availableV2))
               logDebug(s"Retrieving data from $reader: $current -> $availableV2")
    -          Some(reader ->
    -            new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
    +          Some(reader -> StreamingDataSourceV2Relation(
    --- End diff ---
    
    When it comes to the streaming execution code, the basic problem is that it was more evolved than designed. For example, there's no particular reason to use a logical plan here; the map is only ever used to construct another map of source -> physical plan stats. Untangling StreamExecution is definitely something we need to do, but that's going to be annoying, and I think it's sufficiently orthogonal to the V2 migration to put off.
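
    To make the "only used for stats" point concrete, here's a minimal sketch of that pattern with hypothetical stand-in types (Source, PlanStub, and SourceStats are invented for illustration; the real lookup goes through the executed physical plan's metrics, not a map like this):

        object StatsSketch {
          trait Source                                 // stand-in for a streaming source
          case class PlanStub(name: String)            // stand-in for a logical plan
          case class SourceStats(numInputRows: Long)   // stand-in for per-source metrics

          // The source -> logical-plan map is consumed solely as a set of join
          // keys back to per-plan metrics, yielding the source -> stats map.
          def sourceStats(
              newData: Map[Source, PlanStub],
              executedRows: Map[PlanStub, Long]): Map[Source, SourceStats] =
            newData.map { case (source, plan) =>
              source -> SourceStats(executedRows.getOrElse(plan, 0L))
            }
        }

    Nothing in the sketch needs the plan to be a full logical plan; any stable key would do, which is the sense in which the current structure is more than the usage requires.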
    
    There's currently no design doc for the streaming aspects of DataSourceV2. We kinda rushed an experimental version out the door because it was coupled with the experimental ContinuousExecution streaming mode. I'm working on going back and cleaning things up; I'll send docs to the dev list and make sure to @ you on the changes.

