cloud-fan commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3431571039


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -136,6 +141,82 @@ case class GetJsonObject(json: Expression, path: 
Expression)
     copy(json = newLeft, path = newRight)
 }
 
+object GetJsonObject {
+  private[sql] def simpleTopLevelField(path: UTF8String): Option[String] = {
+    try {
+      Option(path).flatMap(value => 
JsonPathParser.parse(value.toString)).collect {
+        case List(PathInstruction.Key, PathInstruction.Named(fieldName)) => 
fieldName
+      }
+    } catch {
+      // Numeric subscripts are parsed as Long and can overflow before the 
parser returns None.
+      case _: NumberFormatException => None
+    }
+  }
+}
+
+/**
+ * Extracts multiple simple top-level fields from a JSON string in one parse. 
This is an internal
+ * expression used to share sibling [[GetJsonObject]] expressions; unsupported 
JSON paths remain
+ * as independent GetJsonObject expressions.
+ */
+case class MultiGetJsonObject(

Review Comment:
   `MultiGetJsonObject` only wraps `MultiGetJsonObjectEvaluator` with a 
hand-written `eval`/`doGenCode`. Since the evaluator already holds all the 
logic, the framework can generate that for you via the 
`Invoke(Literal.create(evaluator, ObjectType(...)), "evaluate", structType, 
Seq(json), Seq(json.dataType))` idiom that `StructsToJson`/`SchemaOfJson` use 
in this same file.
   
   Best of all would be to make `MultiGetJsonObject` a `RuntimeReplaceable` 
with that `Invoke` as its `replacement`: you'd drop the hand-written 
`doGenCode` **and** keep the readable `multi_get_json_object` node in the plan. 
That's historically blocked for an optimizer-inserted node (it's created after 
`ReplaceExpressions`, so a `RuntimeReplaceable` would never be replaced), but 
[SPARK-57512](https://issues.apache.org/jira/browse/SPARK-57512) / #56575 lifts 
that by letting an evaluable `RuntimeReplaceable` survive the optimizer and be 
materialized just before codegen — so this becomes possible if you coordinate 
with that change. If you'd rather not depend on it, emitting the `Invoke` 
directly and dropping the class works today (the only cost is `EXPLAIN` showing 
`invoke(...)`).
   
   Behavior is preserved either way: `Invoke`'s default `propagateNull` matches 
the current `nullIntolerant = true`, the nested `Project` still survives 
`CollapseProject` (an `Invoke` isn't "cheap" and the alias is referenced more 
than once), and the rule stays idempotent.
   
   Minor and independent: the inner `transformExpressionsWithPruning` predicate 
at L72 adds `GET_JSON_OBJECT`, but the `jsonOptimization` partial function it 
drives has no `GetJsonObject`/`MultiGetJsonObject` case — only the outer 
`transformWithPruning` at L60 needs the pattern, so it can be dropped from the 
inner one. Non-blocking.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to