sunchao commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3437515468
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala:
##########
@@ -58,6 +83,122 @@ object OptimizeCsvJsonExprs extends Rule[LogicalPlan] {
}
}
+ /**
+ * Share simple top-level GetJsonObject paths without changing the Hive-compatible semantics of
+ * nested paths, wildcards, or array subscripts. [[MultiGetJsonObject]] preserves the first
+ * non-null duplicate-key match used by GetJsonObject, unlike JsonTuple.
+ */
+ private def shareGetJsonObjects(project: Project): Project = {
Review Comment:
Thanks, addressed in 8b54e69. I used an exact GetJsonObject instance check
rather than containsPattern(GET_JSON_OBJECT), because MultiGetJsonObject also
advertises that pattern. A pattern-based guard would still match the
synthesized inner project. The new guard avoids rerunning the
collect/group/dedup work unless the project contains an actual GetJsonObject.
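
To illustrate why the two guards differ, here is a minimal self-contained sketch. The classes are toy stand-ins rather than the real Catalyst types, the string-based pattern tags and the names `patternGuard` and `instanceGuard` are hypothetical, and the only behavior assumed from the PR is that both expressions advertise the same pattern.

```scala
object GuardSketch {
  // Toy stand-ins for Catalyst expressions; these are NOT the real Spark classes.
  sealed trait Expr {
    def children: Seq[Expr]
    // Pattern tags this node itself advertises (hypothetical, string-based).
    protected def nodePatterns: Set[String]
    // Tags for the whole subtree, similar in spirit to a tree-pattern bit set.
    lazy val patterns: Set[String] = nodePatterns ++ children.flatMap(_.patterns)
    def containsPattern(p: String): Boolean = patterns.contains(p)
    // True if this node or any descendant satisfies f.
    def exists(f: Expr => Boolean): Boolean = f(this) || children.exists(_.exists(f))
  }

  val Tag = "GET_JSON_OBJECT"

  case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Seq.empty
    protected val nodePatterns: Set[String] = Set.empty
  }

  case class GetJsonObject(json: Expr, path: String) extends Expr {
    val children: Seq[Expr] = Seq(json)
    protected val nodePatterns: Set[String] = Set(Tag)
  }

  // Assumption taken from the comment above: the shared expression
  // advertises the same pattern as GetJsonObject.
  case class MultiGetJsonObject(json: Expr, paths: Seq[String]) extends Expr {
    val children: Seq[Expr] = Seq(json)
    protected val nodePatterns: Set[String] = Set(Tag)
  }

  // Pattern-based guard: also fires on an already rewritten project.
  def patternGuard(exprs: Seq[Expr]): Boolean =
    exprs.exists(e => e.containsPattern(Tag))

  // Instance-based guard: fires only if an actual GetJsonObject remains.
  def instanceGuard(exprs: Seq[Expr]): Boolean =
    exprs.exists(e => e.exists(n => n.isInstanceOf[GetJsonObject]))

  def main(args: Array[String]): Unit = {
    val original = Seq[Expr](
      GetJsonObject(Attr("js"), "$.a"),
      GetJsonObject(Attr("js"), "$.b"))
    val rewritten = Seq[Expr](MultiGetJsonObject(Attr("js"), Seq("a", "b")))

    assert(patternGuard(original) && instanceGuard(original))
    assert(patternGuard(rewritten))   // would trigger the rewrite again
    assert(!instanceGuard(rewritten)) // skips the synthesized project
  }
}
```

In this sketch the pattern guard cannot tell the rewritten project from the original, while the instance check returns false once only `MultiGetJsonObject` remains.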