Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21619#discussion_r200225290
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
@@ -190,26 +194,29 @@ object Block {
     while (strings.hasNext) {
       val input = inputs.next
       input match {
-        case _: ExprValue | _: Block =>
+        case _: ExprValue | _: CodeBlock =>
           codeParts += buf.toString
           buf.clear
           blockInputs += input.asInstanceOf[JavaCode]
+        case EmptyBlock =>
         case _ =>
           buf.append(input)
       }
       buf.append(strings.next)
     }
-    if (buf.nonEmpty) {
-      codeParts += buf.toString
-    }
+    codeParts += buf.toString
     (codeParts.toSeq, blockInputs.toSeq)
   }
 }
 /**
  * A block of java code. Including a sequence of code parts and some inputs to this block.
- * The actual java code is generated by embedding the inputs into the code parts.
+ * The actual java code is generated by embedding the inputs into the code parts. Here we keep
+ * inputs of `JavaCode` instead of simply folding them as a string of code, because we need to
+ * track expressions (`ExprValue`) in this code block. We need to be able to manipulate the
+ * expressions later without changing the behavior of this code block in some applications, e.g.,
+ * method splitting.
  */
 case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends Block {
   override lazy val exprValues: Set[ExprValue] = {
--- End diff ---
Currently it is a `lazy` one, so we only build it when we use it. It
was originally introduced for manipulating expressions in a code block.
But I realized that we may not actually need `exprValues` if we treat
a `CodeBlock` as a tree. In PR #21405, the manipulating API doesn't use
`exprValues` when transforming a `CodeBlock`.
Thus I agree with you that we can get rid of `exprValues` in this PR.
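To illustrate the idea: a minimal, self-contained sketch (simplified stand-ins, not the actual Spark classes) of why a stored `exprValues` set becomes unnecessary once a `CodeBlock` is treated as a tree whose children are its `JavaCode` inputs. The expressions it contains can then be collected on demand by traversal:

```scala
// Simplified stand-ins for the Spark types (hypothetical, for illustration only).
sealed trait JavaCode
case class ExprValue(name: String) extends JavaCode
case object EmptyBlock extends JavaCode
case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends JavaCode

object CodeBlock {
  // Collect every ExprValue reachable from a code block by walking the tree,
  // instead of caching them in a `lazy val exprValues` on each block.
  def collectExprValues(code: JavaCode): Set[ExprValue] = code match {
    case e: ExprValue      => Set(e)
    case EmptyBlock        => Set.empty
    case CodeBlock(_, ins) => ins.flatMap(collectExprValues).toSet
  }
}

// Usage: nested blocks; traversal finds expressions at any depth.
val inner = CodeBlock(Seq("int x = ", ";"), Seq(ExprValue("value")))
val outer = CodeBlock(Seq("", ""), Seq(inner, ExprValue("isNull")))
val found = CodeBlock.collectExprValues(outer)
```

A transformation API can likewise rewrite the tree recursively, which is why the stored set is not needed when transforming.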
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]