cloud-fan commented on a change in pull request #25569: [SPARK-28863][SQL]
Introduce AlreadyPlanned, a node that speeds-up planning
URL: https://github.com/apache/spark/pull/25569#discussion_r317448797
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala
##########
@@ -111,11 +107,10 @@ sealed trait V1FallbackWriters extends SupportsV1Write {
* A trait that allows Tables that use V1 Writer interfaces to append data.
*/
trait SupportsV1Write extends SparkPlan {
- // TODO: We should be able to work on SparkPlans at this point.
- def plan: LogicalPlan
+ def query: SparkPlan
protected def writeWithV1(relation: InsertableRelation): RDD[InternalRow] = {
- relation.insert(Dataset.ofRows(sqlContext.sparkSession, plan), overwrite = false)
+ relation.insert(AlreadyPlanned.dataFrame(sqlContext.sparkSession, query), overwrite = false)
Review comment:
I think `AlreadyPlanned` works well here as a top-level node. However,
there can be problems if we make the framework too general and support using
`AlreadyPlanned` as a non-top-level node.
A physical plan carries no stats, so `AlreadyPlanned` has no stats either.
If the original logical plan had stats, this can change the planning result,
e.g. a broadcast join becomes a sort merge join.
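To make the stats concern concrete, here is a minimal toy sketch in plain Scala. It is not Spark code: `Side`, `chooseJoin`, and `StatsSketch` are hypothetical names invented for illustration. The two constants are assumed to mirror the defaults of `spark.sql.autoBroadcastJoinThreshold` (10 MB) and `spark.sql.defaultSizeInBytes` (`Long.MaxValue`), and the real planner's join selection considers more than size alone.

```scala
// Toy model only: none of these types are Spark classes.
object StatsSketch {
  // Assumed to mirror spark.sql.autoBroadcastJoinThreshold (default 10 MB).
  val broadcastThreshold: Long = 10L * 1024 * 1024
  // Assumed to mirror spark.sql.defaultSizeInBytes, the fallback when no stats exist.
  val defaultSizeInBytes: Long = Long.MaxValue

  sealed trait JoinStrategy
  case object BroadcastHashJoin extends JoinStrategy
  case object SortMergeJoin extends JoinStrategy

  // One side of a join, with optional size statistics (hypothetical type).
  final case class Side(name: String, sizeInBytes: Option[Long])

  // Broadcast only if the smaller side is known to fit under the threshold.
  def chooseJoin(left: Side, right: Side): JoinStrategy = {
    val smaller = math.min(
      left.sizeInBytes.getOrElse(defaultSizeInBytes),
      right.sizeInBytes.getOrElse(defaultSizeInBytes))
    if (smaller <= broadcastThreshold) BroadcastHashJoin else SortMergeJoin
  }

  def main(args: Array[String]): Unit = {
    val big = Side("big", Some(50L * 1024 * 1024 * 1024))
    val smallWithStats = Side("small", Some(1024L * 1024))
    // Same data, but wrapped in a node that reports no stats.
    val smallWrapped = smallWithStats.copy(sizeInBytes = None)

    println(chooseJoin(big, smallWithStats))
    println(chooseJoin(big, smallWrapped))
  }
}
```

In this model the first call picks the broadcast strategy and the second falls back to sort merge, even though the underlying data is the same. That is the kind of planning change that could appear if a stats-less node were used below the top level.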