Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23127#discussion_r236391673

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -406,14 +415,62 @@ trait BlockingOperatorWithCodegen extends CodegenSupport {
       override def limitNotReachedChecks: Seq[String] = Nil
     }

    +/**
    + * Leaf codegen node reading from a single RDD.
    + */
    +trait InputRDDCodegen extends CodegenSupport {
    +
    +  def inputRDD: RDD[InternalRow]
    +
    +  // If the input is an RDD of InternalRow which are potentially not UnsafeRow,
    +  // and there is no parent to consume it, it needs an UnsafeProjection.
    +  protected val createUnsafeProjection: Boolean = (parent == null)
    +
    +  override def inputRDDs(): Seq[RDD[InternalRow]] = {
    +    inputRDD :: Nil
    +  }
    +
    +  override def doProduce(ctx: CodegenContext): String = {
    --- End diff --

    The new one should behave the same as the previous `RowDataSourceScanExec.doProduce` and `RDDScanExec.doProduce` when `createUnsafeProjection == true`, and the same as the previous `InputAdapter.doProduce` and `LocalTableScanExec.doProduce` when `createUnsafeProjection == false`.

    From the fact that `InputAdapter` did not do an explicit unsafe projection, even though its input could be InternalRows that are not UnsafeRows, I derived the assumption that it is safe to skip the projection as long as there is a parent operator. This assumes that the parent operator will always cause an UnsafeProjection to be added eventually, and hence the output of the WholeStageCodegen will be UnsafeRows.
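To illustrate the branching being discussed, here is a minimal plain-Scala sketch (hypothetical names, no Spark dependencies) of how a unified `doProduce` could emit different generated-code bodies depending on `createUnsafeProjection`: with the flag set, each row is run through a projection before being appended; without it, rows are forwarded as-is, relying on a parent operator to project them eventually. This is only an assumed shape of the generated loop, not the actual Spark implementation.

```scala
// Hypothetical sketch of the code-generation branching described in the
// review comment. doProduceSketch returns the Java source fragment that a
// unified InputRDDCodegen.doProduce might emit.
object InputRDDCodegenSketch {
  def doProduceSketch(createUnsafeProjection: Boolean): String = {
    val input = "inputIterator" // hypothetical variable name in generated code
    if (createUnsafeProjection) {
      // Leaf node with no parent: input rows may not be UnsafeRows, so the
      // generated loop projects each row before appending it to the output.
      s"""while ($input.hasNext()) {
         |  InternalRow row = (InternalRow) $input.next();
         |  append(unsafeProjection.apply(row));
         |}""".stripMargin
    } else {
      // A parent operator exists and is assumed to eventually add an
      // UnsafeProjection, so rows are consumed without projecting here.
      s"""while ($input.hasNext()) {
         |  InternalRow row = (InternalRow) $input.next();
         |  append(row);
         |}""".stripMargin
    }
  }

  def main(args: Array[String]): Unit = {
    println(doProduceSketch(createUnsafeProjection = true))
    println(doProduceSketch(createUnsafeProjection = false))
  }
}
```

The point of the sketch is that the two branches mirror the previous `RDDScanExec`-style produce (projecting) and the previous `InputAdapter`-style produce (pass-through), so folding both into one trait preserves the earlier behavior in each case.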