richardc-db commented on code in PR #50849:
URL: https://github.com/apache/spark/pull/50849#discussion_r2098816354
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##########
@@ -319,3 +319,46 @@ case class RDDScanExec(
override def getStream: Option[SparkDataStream] = stream
}
+
+/**
+ * A physical plan node for `OneRowRelation` for scans with no 'FROM' clause.
+ *
+ * We do not extend `RDDScanExec` in order to avoid complexity due to `TreeNode.makeCopy` and
+ * `TreeNode`'s general use of reflection.
+ */
+case class OneRowRelationExec() extends LeafExecNode
+ with InputRDDCodegen {
+
+ override val nodeName: String = s"Scan OneRowRelation"
+
+ override val output: Seq[Attribute] = Nil
+
+ private val emptyRow: InternalRow = InternalRow.empty
+
+ private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)
Review Comment:
Hmm, sure, I implemented this, but because we still want to increment the
number of output rows at the time the row is actually processed, I did not end
up simply returning `rdd` from `doExecute()`. Let me know if you think there's
a better way.
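To make this concrete, the shape I ended up with is roughly the following (a
sketch of the idea, not necessarily the exact code in the PR):

```scala
// Inside OneRowRelationExec (sketch): bump the metric when the row is actually
// consumed, rather than returning `rdd` directly from doExecute().
protected override def doExecute(): RDD[InternalRow] = {
  val numOutputRows = longMetric("numOutputRows")
  rdd.map { row =>
    numOutputRows += 1
    row
  }
}
```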
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##########
@@ -319,3 +319,46 @@ case class RDDScanExec(
override def getStream: Option[SparkDataStream] = stream
}
+
+/**
+ * A physical plan node for `OneRowRelation` for scans with no 'FROM' clause.
+ *
+ * We do not extend `RDDScanExec` in order to avoid complexity due to `TreeNode.makeCopy` and
+ * `TreeNode`'s general use of reflection.
+ */
+case class OneRowRelationExec() extends LeafExecNode
+ with InputRDDCodegen {
+
+ override val nodeName: String = s"Scan OneRowRelation"
+
+ override val output: Seq[Attribute] = Nil
+
+ private val emptyRow: InternalRow = InternalRow.empty
+
+ private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)
+
+ override lazy val metrics = Map(
+ "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output
rows"))
Review Comment:
Hmm, I guess we might not _need_ it, but again I would prefer to keep it as is
to avoid breaking anyone who might rely on it. I also think it might be
relevant for the Spark UI?
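For context on the UI angle: `numOutputRows` is the kind of metric that shows
up for the node in the SQL tab. A quick sanity check would look roughly like
this (sketch; assumes AQE does not wrap the plan):

```scala
// Run a FROM-less query and inspect the scan leaf's metric after execution.
val df = spark.sql("SELECT 1")
df.collect()

val scan = df.queryExecution.executedPlan.collectLeaves().head
println(scan.metrics.get("numOutputRows").map(_.value))  // expected: Some(1)
```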
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]