Re: [PR] [SPARK-47670][SQL] Share repeated top-level JSON path parsing [spark]

via GitHub Thu, 18 Jun 2026 09:47:02 -0700


sunchao commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3437515468



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala:
##########
@@ -58,6 +83,122 @@ object OptimizeCsvJsonExprs extends Rule[LogicalPlan] {
       }
   }
 
+  /**
+   * Share simple top-level GetJsonObject paths without changing the Hive-compatible semantics of
+   * nested paths, wildcards, or array subscripts. [[MultiGetJsonObject]] preserves the first
+   * non-null duplicate-key match used by GetJsonObject, unlike JsonTuple.
+   */
+  private def shareGetJsonObjects(project: Project): Project = {

Review Comment:
   Thanks, addressed in 8b54e69. I used an exact GetJsonObject instance check 
rather than containsPattern(GET_JSON_OBJECT), because MultiGetJsonObject also 
advertises that pattern. A pattern-based guard would still match the 
synthesized inner project. The new guard avoids rerunning the 
collect/group/dedup work unless the project contains an actual GetJsonObject.
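
To illustrate the difference between the two guards, here is a minimal, self-contained sketch. It does not use Spark's classes: `Expr`, `Pattern`, `patternGuard`, `instanceGuard`, and the toy `GetJsonObject` / `MultiGetJsonObject` case classes are hypothetical stand-ins that only mimic the idea of a node advertising a tree pattern. It assumes, as described above, that both node types advertise the same pattern tag.

```scala
// Toy model, NOT Spark's Catalyst classes: each node advertises a set of
// pattern tags, loosely mirroring the containsPattern idea on tree nodes.
object GuardSketch {
  sealed trait Pattern
  case object GET_JSON_OBJECT extends Pattern

  sealed trait Expr {
    def children: Seq[Expr]
    def ownPatterns: Set[Pattern] = Set.empty

    // Pattern check: true if this node or any descendant advertises the tag.
    def containsPattern(p: Pattern): Boolean =
      ownPatterns.contains(p) || children.exists(_.containsPattern(p))

    // Structural check: true if this node or any descendant satisfies f.
    def exists(f: Expr => Boolean): Boolean =
      f(this) || children.exists(_.exists(f))
  }

  case class Column(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }

  case class GetJsonObject(json: Expr, path: String) extends Expr {
    def children: Seq[Expr] = Seq(json)
    override def ownPatterns: Set[Pattern] = Set(GET_JSON_OBJECT)
  }

  // Assumption for this sketch: the shared node advertises the same tag.
  case class MultiGetJsonObject(json: Expr, paths: Seq[String]) extends Expr {
    def children: Seq[Expr] = Seq(json)
    override def ownPatterns: Set[Pattern] = Set(GET_JSON_OBJECT)
  }

  // Pattern-based guard: cannot tell the two node types apart.
  def patternGuard(projectList: Seq[Expr]): Boolean =
    projectList.exists(_.containsPattern(GET_JSON_OBJECT))

  // Instance-based guard: fires only when a real GetJsonObject is present.
  def instanceGuard(projectList: Seq[Expr]): Boolean =
    projectList.exists(_.exists(_.isInstanceOf[GetJsonObject]))

  // Project list before the rewrite: two lookups on the same JSON column.
  val original: Seq[Expr] =
    Seq(GetJsonObject(Column("j"), "$.a"), GetJsonObject(Column("j"), "$.b"))

  // Project list of the synthesized inner project after the rewrite.
  val synthesized: Seq[Expr] =
    Seq(MultiGetJsonObject(Column("j"), Seq("a", "b")))
}
```

In this toy model both guards accept `original`, but only the pattern-based guard still accepts `synthesized`, so only the instance-based guard lets the rule skip the already rewritten project.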



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

