cloud-fan commented on code in PR #56547:
URL: https://github.com/apache/spark/pull/56547#discussion_r3431571039
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -136,6 +141,82 @@ case class GetJsonObject(json: Expression, path:
Expression)
copy(json = newLeft, path = newRight)
}
+object GetJsonObject {
+ private[sql] def simpleTopLevelField(path: UTF8String): Option[String] = {
+ try {
+ Option(path).flatMap(value =>
JsonPathParser.parse(value.toString)).collect {
+ case List(PathInstruction.Key, PathInstruction.Named(fieldName)) =>
fieldName
+ }
+ } catch {
+ // Numeric subscripts are parsed as Long and can overflow before the
parser returns None.
+ case _: NumberFormatException => None
+ }
+ }
+}
+
+/**
+ * Extracts multiple simple top-level fields from a JSON string in one parse.
This is an internal
+ * expression used to share sibling [[GetJsonObject]] expressions; unsupported
JSON paths remain
+ * as independent GetJsonObject expressions.
+ */
+case class MultiGetJsonObject(
Review Comment:
`MultiGetJsonObject` only wraps `MultiGetJsonObjectEvaluator` with a
hand-written `eval`/`doGenCode`. Since the evaluator already holds all the
logic, the framework can generate that for you via the
`Invoke(Literal.create(evaluator, ObjectType(...)), "evaluate", structType,
Seq(json), Seq(json.dataType))` idiom that `StructsToJson`/`SchemaOfJson` use
in this same file.
Best of all would be to make `MultiGetJsonObject` a `RuntimeReplaceable`
with that `Invoke` as its `replacement`: you'd drop the hand-written
`doGenCode` **and** keep the readable `multi_get_json_object` node in the plan.
That's historically blocked for an optimizer-inserted node (it's created after
`ReplaceExpressions`, so a `RuntimeReplaceable` would never be replaced), but
[SPARK-57512](https://issues.apache.org/jira/browse/SPARK-57512) / #56575 lifts
that by letting an evaluable `RuntimeReplaceable` survive the optimizer and be
materialized just before codegen — so this becomes possible if you coordinate
with that change. If you'd rather not depend on it, emitting the `Invoke`
directly and dropping the class works today (the only cost is `EXPLAIN` showing
`invoke(...)`).
Behavior is preserved either way: `Invoke`'s default `propagateNull` matches
the current `nullIntolerant = true`, the nested `Project` still survives
`CollapseProject` (an `Invoke` isn't "cheap" and the alias is referenced more
than once), and the rule stays idempotent.
Minor and independent: the inner `transformExpressionsWithPruning` predicate
at L72 adds `GET_JSON_OBJECT`, but the `jsonOptimization` partial function it
drives has no `GetJsonObject`/`MultiGetJsonObject` case — only the outer
`transformWithPruning` at L60 needs the pattern, so it can be dropped from the
inner one. Non-blocking.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]