Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23127#discussion_r236395764

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -406,14 +415,62 @@ trait BlockingOperatorWithCodegen extends CodegenSupport {
       override def limitNotReachedChecks: Seq[String] = Nil
     }
    +/**
    + * Leaf codegen node reading from a single RDD.
    + */
    +trait InputRDDCodegen extends CodegenSupport {
    +
    +  def inputRDD: RDD[InternalRow]
    +
    +  // If the input is an RDD of InternalRow which are potentially not UnsafeRow,
    +  // and there is no parent to consume it, it needs an UnsafeProjection.
    +  protected val createUnsafeProjection: Boolean = (parent == null)
    +
    +  override def inputRDDs(): Seq[RDD[InternalRow]] = {
    +    inputRDD :: Nil
    +  }
    +
    +  override def doProduce(ctx: CodegenContext): String = {
    --- End diff --

    > This assumes that the parent operator would always result in some UnsafeProjection being eventually added, and hence the output of the WholeStageCodegen unit will be UnsafeRows.

    I think it's quite a hack in my patch, and that there should be some nicer interface to tell the codegened operators whether they're dealing with UnsafeRows input, or InternalRows that may not be UnsafeRows...
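To make the concern concrete, here is a minimal, self-contained sketch of what a more explicit interface might look like. It does not use Spark: `RowFormat`, `ToyOperator`, `ToyParent`, `ToyLeaf`, `emitsUnsafeRows`, and both `projectBy...` methods are hypothetical names invented for illustration, not part of `CodegenSupport` or of this patch. It only contrasts the `parent == null` check from the diff with a decision based on a declared row format.

```scala
object RowFormatSketch {
  // Toy stand-ins for "rows are UnsafeRow" vs "InternalRow that may not be UnsafeRow".
  sealed trait RowFormat
  case object UnsafeRows extends RowFormat
  case object PossiblyNonUnsafeRows extends RowFormat

  // Hypothetical operator interface (an assumption, not Spark's CodegenSupport).
  trait ToyOperator {
    def parent: Option[ToyOperator]
    // True if this operator's generated code always emits UnsafeRows.
    def emitsUnsafeRows: Boolean
  }

  // All operators above `op`, nearest first.
  def ancestors(op: ToyOperator): List[ToyOperator] = op.parent match {
    case Some(p) => p :: ancestors(p)
    case None    => Nil
  }

  final case class ToyParent(parent: Option[ToyOperator], emitsUnsafeRows: Boolean)
    extends ToyOperator

  final case class ToyLeaf(parent: Option[ToyOperator], inputFormat: RowFormat)
    extends ToyOperator {

    // Mirrors the check in the diff: project only when there is no parent.
    def projectByParentCheck: Boolean = parent.isEmpty

    // Explicit alternative: project only when the input may not be unsafe
    // and no ancestor declares that it converts to UnsafeRows.
    def projectByDeclaredFormat: Boolean =
      inputFormat == PossiblyNonUnsafeRows && !ancestors(this).exists(_.emitsUnsafeRows)

    def emitsUnsafeRows: Boolean =
      inputFormat == UnsafeRows || projectByDeclaredFormat
  }
}
```

In this toy model the two checks disagree in exactly the cases the comment worries about: a leaf under a parent that never converts its rows skips the projection under the `parent`-based check, and a parentless leaf whose input is already unsafe gets a redundant one.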