GitHub user rednaxelafx commented on the issue:
https://github.com/apache/spark/pull/20419
@kiszk SGTM and LGTM. Let's ship it!
One more question on the side: with `forceComment = true`, are we fully sure
that it won't affect the equality of `CodeAndComment`?
The whole point of this PR is to have a way to embed the ID into the
non-executable part (i.e. a comment) of the generated code so that it won't
affect codegen cache hits, right?
Could you please post an example of either the generated code or the
metrics, e.g. a `SortMergeJoin` on two identical `spark.range(3)`s, and confirm
that even when the two `range()`s are codegen'd into stages with different
`codegenStageId`s, with `spark.sql.codegen.useIdInClassName` turned off,
the two stages still hit the codegen cache? Basically, this is to verify that
the example I gave in
https://github.com/apache/spark/pull/20224#issuecomment-357091842 still hits
the codegen cache when `spark.sql.codegen.useIdInClassName=false`.
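To make the cache-hit question concrete, here is a minimal Scala model, not Spark's actual classes: `CodeAndCommentModel`, `CompileCacheModel` and `stage` are hypothetical names. It assumes that equality and hashing look only at the executable body, and that the placeholder left in the body is identical for both stages. Those two assumptions are exactly what the requested `SortMergeJoin` check would confirm against the real codegen cache.

```scala
import scala.collection.mutable

// Hypothetical stand-in for CodeAndComment: the generated source body plus a
// side map of comments keyed by placeholder. Equality and hashing look only at
// `body`, so whatever is stored in `comment` cannot cause a cache miss.
final class CodeAndCommentModel(val body: String, val comment: Map[String, String]) {
  override def equals(that: Any): Boolean = that match {
    case other: CodeAndCommentModel => other.body == body
    case _                          => false
  }
  override def hashCode(): Int = body.hashCode
}

// Hypothetical compile cache that counts how often "compilation" really runs.
object CompileCacheModel {
  private val cache = mutable.HashMap.empty[CodeAndCommentModel, String]
  var compilations = 0

  def compile(code: CodeAndCommentModel): String =
    cache.getOrElseUpdate(code, { compilations += 1; s"class#$compilations" })
}

// Two stages whose bodies are identical: the stage ID lives only in the
// comment map, behind a placeholder that is assumed to be the same string.
def stage(id: Int): CodeAndCommentModel =
  new CodeAndCommentModel(
    body = "/* codegenStageId */\nfinal class GeneratedIterator { /* ... */ }",
    comment = Map("codegenStageId" -> s"codegenStageId=$id"))

val first  = CompileCacheModel.compile(stage(1))
val second = CompileCacheModel.compile(stage(2))
```

Pasted into a Scala REPL, both calls return the same cached entry and only one compilation is counted. If the placeholder text differed per stage, or the ID leaked into `body`, the second call would miss the cache.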