tedyu commented on pull request #30984:
URL: https://github.com/apache/spark/pull/30984#issuecomment-757984613
Here is some background on how I arrived at the current approach.
The canonical JSON expression is something like `phone->code` or `phone->>code`,
where `phone` is the json(b) column and `code` is the field.
However, Spark rejects this expression with:
```
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: unresolvedalias(lambdafunction(code, lambda 'phone, false), None)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAlias.toAttribute(unresolved.scala:463)
  at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:63)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:63)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$inputSet$1(QueryPlan.scala:57)
```
I haven't spent much time investigating how the native jsonb expression could
be supported directly in Spark, because `->` is already parsed as the lambda
notation (note `lambdafunction` in the trace above), and that notation is fundamental.
bq. using strings all around like a > 1 instead of GreaterThan(a, 1)
In general, I agree that strong typing is better than string matching.
However, please note that what this PR tries to handle is not the `>` part
but the `a` part.
bq. the string representation for nested columns is pretty standard.
As I mentioned above, the JSON path expression is also quite standard. I tend to
think that `get_json_object(phone, '$.code')` should be treated similarly to
`phone->code` (given the restriction imposed by the lambda notation).
Along this line of thinking, using string matching to push down a JSON
path expression is tantamount to pushing down a nested column.
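
To make the string-matching idea concrete, here is a minimal sketch (not the code in this PR) of how an attribute string such as `get_json_object(phone, '$.code')` could be recognized and rewritten into a PostgreSQL-style operator chain. The function name `to_postgres_json_expr`, the regular expression, and the choice of `->>` for the final step are all assumptions made for illustration; only simple dotted field paths are handled.

```python
import re
from typing import Optional

# Hypothetical pattern: get_json_object(<column>, '$.<field>[.<field>...]').
# Array indexes, wildcards, and quoted field names are deliberately not handled.
_JSON_PATH_CALL = re.compile(
    r"""^\s*get_json_object\(\s*
        (?P<column>[A-Za-z_][A-Za-z0-9_]*)\s*,\s*
        '\$(?P<path>(?:\.[A-Za-z_][A-Za-z0-9_]*)+)'\s*
        \)\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def to_postgres_json_expr(attribute: str) -> Optional[str]:
    """Rewrite a get_json_object(...) attribute string as a PostgreSQL JSON
    operator chain, or return None when the string does not match."""
    match = _JSON_PATH_CALL.match(attribute)
    if match is None:
        return None
    fields = match.group("path").lstrip(".").split(".")
    # Intermediate steps use -> (result stays JSON); the last step uses ->>
    # (result is text), on the assumption that this best mirrors the string
    # returned by get_json_object.
    *parents, leaf = fields
    expr = match.group("column")
    for name in parents:
        expr += f"->'{name}'"
    return f"{expr}->>'{leaf}'"


print(to_postgres_json_expr("get_json_object(phone, '$.code')"))
print(to_postgres_json_expr("get_json_object(phone, '$.area.code')"))
print(to_postgres_json_expr("a"))
```

A plain column name such as `a` falls through with `None`, so a caller could leave ordinary and nested-column attributes on their existing path and only rewrite the JSON case.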
@cloud-fan
bq. probably need to merge the V1 Filter and V2 Expression to support cases
like this
It would help if you could outline, in more detailed steps, how the merge
would help express JSON path expressions.
I have some cycles that I can contribute to building this.