MaxGekk commented on issue #27759: [SPARK-31008][SQL] Support json_array_length function
URL: https://github.com/apache/spark/pull/27759#issuecomment-593511926
 
 
   > At the moment we have to parse all ...
   
   You can avoid deep parsing by specifying `StringType` as the element type. For example:
   ```scala
    scala> import org.apache.spark.sql.types._
    import org.apache.spark.sql.types._

    scala> val df = Seq("""[{"a":1}, {"a": 2}]""").toDF("json")
   df: org.apache.spark.sql.DataFrame = [json: string]
   
   scala> df.select(size(from_json($"json", ArrayType(StringType)))).show
   +---------------------+
   |size(from_json(json))|
   +---------------------+
   |                    2|
   +---------------------+
   ``` 
    This does effectively the same as your expression. It may be less optimal because `from_json()` materializes the array, but that is a separate question of how to optimize the combination of `size` + `from_json` over an array of strings. I would add an optimization rule rather than extending the public API.
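As an aside on why deep parsing is avoidable here: a dedicated `json_array_length` only needs to track bracket/brace nesting depth to count top-level elements. A minimal sketch in plain Scala (hypothetical illustration only, not Spark's implementation and not what this PR proposes):

```scala
object JsonArrayLen {
  // Count top-level elements of a JSON array without building a parse tree.
  // Returns None if the input is not a balanced JSON array.
  def jsonArrayLength(s: String): Option[Int] = {
    val t = s.trim
    if (t.length < 2 || t.head != '[' || t.last != ']') return None
    var depth = 0        // current bracket/brace nesting depth
    var inString = false // inside a JSON string literal
    var escaped = false  // previous char was a backslash inside a string
    var count = 0        // commas seen at depth 1 (element separators)
    var sawElement = false
    for (c <- t) {
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' =>
          inString = true; sawElement = true
        case '[' | '{' =>
          if (depth > 0) sawElement = true // nested container is an element
          depth += 1
        case ']' | '}' =>
          depth -= 1
        case ',' if depth == 1 =>
          count += 1
        case other if !other.isWhitespace && depth >= 1 =>
          sawElement = true // scalar content: digits, true/false/null, ':'
        case _ => ()
      }
    }
    if (depth != 0) None
    else Some(if (sawElement) count + 1 else 0)
  }
}
```

On the example above, `jsonArrayLength("""[{"a":1}, {"a": 2}]""")` returns `Some(2)` without materializing the nested objects.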

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

