Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r169719547
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -107,17 +106,24 @@ case class DataSourceV2Relation(
}
/**
- * A specialization of DataSourceV2Relation with the streaming bit set to true. Otherwise identical
- * to the non-streaming relation.
+ * A specialization of [[DataSourceV2Relation]] with the streaming bit set to true.
+ *
+ * Note that, this plan has a mutable reader, so Spark won't apply operator push-down for this plan,
+ * to avoid making the plan mutable. We should consolidate this plan and [[DataSourceV2Relation]]
+ * after we figure out how to apply operator push-down for streaming data sources.
--- End diff ---
I guess I had assumed that data source v2 guaranteed it would call all the
supported stateful methods during planning, and that the most recent call won.
I don't think Proposal 2 will work, since per-batch planning is needed at a
fairly deep level and adaptive execution is disabled for streaming. Proposal 1
sounds fine to me, although I'd note that it kinda seems like we're working
around an issue with having stateful pushdown methods in the first place. From
an abstract perspective, they're really just action-at-a-distance parameters
for createReadTasks().
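
To make the "action-at-a-distance" point concrete, here's a minimal,
self-contained sketch (not the real DataSourceV2 interfaces; `StatefulReader`,
`Filter`, and `GreaterThan` are made-up stand-ins): the pushdown call's only
visible effect is mutating the reader, and a later createReadTasks() call
silently depends on that hidden state.

```scala
// Hypothetical stand-ins, not Spark's actual Filter classes.
trait Filter
case class GreaterThan(attr: String, value: Int) extends Filter

class StatefulReader {
  // State written by the pushdown call, read later at planning time.
  private var pushedFilters: Seq[Filter] = Seq.empty

  // "Stateful pushdown method": its only effect is mutating the reader.
  def pushFilters(filters: Seq[Filter]): Seq[Filter] = {
    pushedFilters = filters
    Seq.empty // pretend everything was pushed down
  }

  // The pushed filters behave like extra parameters passed at a distance.
  def createReadTasks(): Seq[String] =
    pushedFilters.map(f => s"read task with filter $f")
}

object Example extends App {
  val reader = new StatefulReader
  reader.pushFilters(Seq(GreaterThan("ts", 100)))
  println(reader.createReadTasks())
}
```

If pushdown were instead expressed as plain arguments to createReadTasks(),
re-planning each streaming batch wouldn't have to worry about mutating the
relation at all.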
---