tedyu commented on pull request #30984:
URL: https://github.com/apache/spark/pull/30984#issuecomment-757984613


   Here is some background on how I arrived at the current approach.
   
   The canonical JSON expression is something like `phone->code` or 
`phone->>code`, where `phone` is the json(b) column and `code` is the field.
   However, Spark rejects this expression with:
   ```
   org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
toAttribute on unresolved object, tree: unresolvedalias(lambdafunction(code, 
lambda 'phone, false), None)
       at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAlias.toAttribute(unresolved.scala:463)
       at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:63)
       at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
       at scala.collection.immutable.List.foreach(List.scala:392)
       at scala.collection.TraversableLike.map(TraversableLike.scala:238)
       at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
       at scala.collection.immutable.List.map(List.scala:298)
       at 
org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:63)
       at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$inputSet$1(QueryPlan.scala:57)
   ```
   I haven't spent much time investigating how the native jsonb expression 
could be supported directly in Spark, because `->` is parsed as a lambda (see 
the `lambdafunction` node in the trace above) and the lambda is a fundamental 
notation.
   
   bq. using strings all around like a > 1 instead of GreaterThan(a, 1)
   
   In general, I agree that strong typing is better than string matching.
   However, please note that what this PR tries to handle is not the `>` part 
but the `a` part.
   
   bq. the string representation for nested columns is pretty standard.
   
   As I mentioned above, the JSON path expression is quite standard. I tend to 
think that `get_json_object(phone, '$.code')` should be treated the same way as 
`phone->code` (which cannot be written directly because of the lambda 
restriction).
   
   Along this line of thinking, using string matching to push down a JSON 
path expression is tantamount to pushing down a nested column.
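
To make the string-matching idea concrete, here is a minimal sketch of how a data source could recognize `get_json_object(phone, '$.code')` in a pushed-down attribute name and rewrite it for PostgreSQL. It is illustrative only and is not the code in this PR: the function name `to_postgres_json_ref`, the regular expression, the restriction to a single top-level field, and the choice of the `->>` operator with a quoted key are all assumptions.

```python
import re

# Hypothetical pattern: matches get_json_object(<column>, '$.<field>')
# where <field> is a single top-level key (no nested paths or array indexes).
_JSON_PATH = re.compile(
    r"^get_json_object\(\s*([A-Za-z_]\w*)\s*,\s*'\$\.([A-Za-z_]\w*)'\s*\)$"
)


def to_postgres_json_ref(attribute: str) -> str:
    """Rewrite a get_json_object(...) reference as a PostgreSQL ->> expression.

    Attribute names that do not match the pattern are returned unchanged,
    so plain column names and unsupported paths pass through untouched.
    """
    match = _JSON_PATH.match(attribute)
    if match is None:
        return attribute
    column, field = match.groups()
    return f"{column}->>'{field}'"


if __name__ == "__main__":
    print(to_postgres_json_ref("get_json_object(phone, '$.code')"))
    print(to_postgres_json_ref("phone"))
```

The point of the sketch is that only the attribute part of a filter is matched as a string; the comparison operator would still come from the typed filter.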
   
   @cloud-fan 
   bq. probably need to merge the V1 Filter and V2 Expression to support cases 
like this
   
   If you could outline in more detail how the merge would help express JSON 
path expressions, that would be nice.
   I have some cycles that I can contribute to building this.




