Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r169719547
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
    @@ -107,17 +106,24 @@ case class DataSourceV2Relation(
     }
     
     /**
    - * A specialization of DataSourceV2Relation with the streaming bit set to true. Otherwise identical
    - * to the non-streaming relation.
    + * A specialization of [[DataSourceV2Relation]] with the streaming bit set to true.
    + *
    + * Note that, this plan has a mutable reader, so Spark won't apply operator push-down for this plan,
    + * to avoid making the plan mutable. We should consolidate this plan and [[DataSourceV2Relation]]
    + * after we figure out how to apply operator push-down for streaming data sources.
    --- End diff --
    
    I guess I had assumed that data source v2 guaranteed it would call all the supported stateful methods during planning, and that the most recent call won.
    
    I don't think Proposal 2 will work, since the planning for each batch is required at a fairly deep level and adaptive execution is disabled for streaming. Proposal 1 sounds fine to me, although I'd like to note that this seems like it's working around an issue with having stateful pushdown methods in the first place. From an abstract perspective, they're really just action-at-a-distance parameters for createReadTasks().
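    
    To make "action at a distance" concrete, here is a minimal, self-contained Scala sketch of the stateful pushdown pattern being discussed. It is not the actual DataSourceV2 API; the names StatefulReader, ReadTask, and Demo are illustrative stand-ins, and plain strings stand in for filter expressions.
    
        // Illustrative stand-ins only, not the real Spark DataSourceV2 interfaces.
        final case class ReadTask(description: String)
        
        class StatefulReader {
          // Mutable state written by the pushdown call during planning.
          private var pushedFilters: Seq[String] = Seq.empty
        
          // The call itself mutates the reader rather than returning a new, immutable one.
          def pushFilters(filters: Seq[String]): Unit = {
            pushedFilters = filters
          }
        
          // The pushed filters only take effect here, when read tasks are created,
          // i.e. they behave like hidden parameters of createReadTasks().
          def createReadTasks(): Seq[ReadTask] =
            Seq(ReadTask(s"scan applying filters: ${pushedFilters.mkString(", ")}"))
        }
        
        object Demo extends App {
          val reader = new StatefulReader
          reader.pushFilters(Seq("a > 1", "b IS NOT NULL"))
          reader.createReadTasks().foreach(println)  // the filters influence the scan only at this point
        }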


---
