uros-db commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3430238692
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala:
##########
@@ -58,6 +83,122 @@ object OptimizeCsvJsonExprs extends Rule[LogicalPlan] {
}
}
+ /**
+ * Share simple top-level GetJsonObject paths without changing the Hive-compatible semantics of
+ * nested paths, wildcards, or array subscripts. [[MultiGetJsonObject]] preserves the first
+ * non-null duplicate-key match used by GetJsonObject, unlike JsonTuple.
+ */
+ private def shareGetJsonObjects(project: Project): Project = {
Review Comment:
Minor performance note: `shareGetJsonObjects` runs its full
collect/group/dedup pass on every Project in every FixedPoint iteration,
including the inner Project it just created. The result is correct and
idempotent, but the optimizer does redundant work and some wasted
allocation. An early
`project.projectList.exists(_.containsPattern(GET_JSON_OBJECT))` guard would
avoid it.
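
To make the suggestion concrete, here is a minimal, self-contained sketch of
the guard pattern. It deliberately does not use Spark's classes: `Expr`,
`Attr`, `Alias`, `GetJsonObject`, `Project`, `containsGetJsonObject`, and
`expensivePasses` are hypothetical stand-ins introduced only for illustration,
and the actual rewrite is elided. In the PR itself the check would be the
`containsPattern(GET_JSON_OBJECT)` call quoted above, which, as far as I
understand, reads cached pattern bits rather than walking the tree the way
this toy model does.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait Expr {
  def children: Seq[Expr]

  // Models the intent of containsPattern(GET_JSON_OBJECT):
  // true if this node or any descendant is a GetJsonObject.
  def containsGetJsonObject: Boolean = this match {
    case _: GetJsonObject => true
    case other => other.children.exists(_.containsGetJsonObject)
  }
}

final case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

final case class GetJsonObject(json: Expr, path: String) extends Expr {
  def children: Seq[Expr] = Seq(json)
}

final case class Alias(child: Expr, name: String) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

final case class Project(projectList: Seq[Expr])

object GuardSketch {
  // Counts how often the expensive collect/group/dedup path is entered.
  var expensivePasses: Int = 0

  def shareGetJsonObjects(project: Project): Project = {
    if (!project.projectList.exists(_.containsGetJsonObject)) {
      // Early exit: nothing to share, so return the same instance untouched.
      project
    } else {
      expensivePasses += 1
      // The real collect/group/dedup rewrite would go here (elided in this sketch).
      project
    }
  }

  def main(args: Array[String]): Unit = {
    val withJson = Project(Seq(Alias(GetJsonObject(Attr("payload"), "$.a"), "a")))
    val plain = Project(Seq(Attr("id")))

    shareGetJsonObjects(withJson) // takes the expensive path
    shareGetJsonObjects(plain)    // skipped by the guard
    println(s"expensive passes: $expensivePasses")
  }
}
```

With a guard like this, a Project that no longer contains a GetJsonObject
(for example, the rewritten inner Project) would be returned as-is on later
iterations instead of going through the collect/group/dedup pass again.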
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]