Re: [PR] [SPARK-47670][SQL] Share repeated top-level JSON path parsing [spark]

via GitHub Wed, 17 Jun 2026 10:32:48 -0700


uros-db commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3430238692



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala:
##########
@@ -58,6 +83,122 @@ object OptimizeCsvJsonExprs extends Rule[LogicalPlan] {
       }
   }
 
+  /**
+   * Share simple top-level GetJsonObject paths without changing the Hive-compatible semantics of
+   * nested paths, wildcards, or array subscripts. [[MultiGetJsonObject]] preserves the first
+   * non-null duplicate-key match used by GetJsonObject, unlike JsonTuple.
+   */
+  private def shareGetJsonObjects(project: Project): Project = {

Review Comment:
   Minor performance note: `shareGetJsonObjects` runs its full
collect/group/dedup pass on every `Project` in every FixedPoint iteration,
including the inner `Project` it just created. The result is correct and
idempotent, but the repeated passes allocate needlessly. An early
`project.projectList.exists(_.containsPattern(GET_JSON_OBJECT))` guard would
avoid that work.
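
   To illustrate the shape of such a guard, here is a minimal, self-contained
sketch that uses toy types rather than Catalyst. Everything in it is a
hypothetical stand-in and not Spark's API: `Expr`, `Col`, `GetJson`, `Alias`,
and this `Project` only imitate plan nodes, `containsGetJson` plays the role of
`containsPattern(GET_JSON_OBJECT)`, and `expensiveRuns` / `lastGroupCount` exist
only to make the skipped work observable.

```scala
// Toy model only: these types imitate, but are not, Spark's Catalyst classes.
sealed trait Expr {
  def children: Seq[Expr]
  // Stand-in for a pattern check: true if this subtree holds a GetJson node.
  def containsGetJson: Boolean = children.exists(_.containsGetJson)
}
final case class Col(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class GetJson(input: Expr, path: String) extends Expr {
  val children: Seq[Expr] = Seq(input)
  override def containsGetJson: Boolean = true
}
final case class Alias(child: Expr, name: String) extends Expr {
  val children: Seq[Expr] = Seq(child)
}
final case class Project(projectList: Seq[Expr])

object ShareSketch {
  // Instrumentation for the sketch: how often the expensive path ran,
  // and how many distinct JSON inputs the last run grouped.
  var expensiveRuns: Int = 0
  var lastGroupCount: Int = 0

  // Placeholder for the "collect" step: gather every GetJson in a subtree.
  private def collectGetJson(e: Expr): Seq[GetJson] = e match {
    case g: GetJson => g +: collectGetJson(g.input)
    case other => other.children.flatMap(collectGetJson)
  }

  def shareGetJsonObjects(project: Project): Project = {
    if (!project.projectList.exists(_.containsGetJson)) {
      // Early guard: nothing to share, so return the same instance untouched.
      project
    } else {
      expensiveRuns += 1
      // Placeholder for the "group" step: bucket paths by their JSON input.
      val grouped = project.projectList.flatMap(collectGetJson).groupBy(_.input)
      lastGroupCount = grouped.size
      // A real rule would rewrite the shared groups here; the toy leaves the plan as is.
      project
    }
  }
}
```

   In this sketch a `Project` with no `GetJson` in its list is returned as the
same instance and never reaches the collect/group step, which is the saving the
guard is meant to provide.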



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


