I would extract the element I want to sort by, then combine it with the old
struct into a new struct whose first element is that sort key.
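
A minimal sketch of that approach, assuming Spark 2.4 or later (for the SQL higher-order function `transform`) and a local SparkSession. The field names `sort_key` and `item` are made up for illustration, the data is a shortened copy of the sample quoted below, and `id1` is picked as the sort field only as an example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for the sketch; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("array-sort-by-field").getOrCreate()
import spark.implicits._

// Shortened copy of the sample data (three toppings only).
val jsonData =
  """{"topping":[{"id":"5001","id1":"5001","type":"None"},{"id":"5002","id1":"5008","type":"Glazed"},{"id":"5005","id1":"5007","type":"Sugar"}]}"""
val json_df = spark.read.json(Seq(jsonData).toDS)

// 1. Wrap each struct as (sort_key, item) so the sort field comes first
//    (sort_key and item are hypothetical names).
// 2. array_sort compares structs field by field, so sort_key decides the order.
// 3. Unwrap to get the original structs back.
val sorted_df = json_df.select(
  expr("""
    transform(
      array_sort(transform(topping, t -> named_struct('sort_key', t.id1, 'item', t))),
      x -> x.item
    )
  """).as("sort_col"))

sorted_df.show(false)
```

Because `id1` is a string here, the order is lexicographic; cast it inside the inner `transform` if a numeric order is needed. On Spark 3.0 and later, `array_sort` in SQL should also accept a comparator lambda as a second argument, which would avoid the wrapping step.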
________________________________
From: neeraj bhadani <bhadani.neeraj...@gmail.com>
Sent: May 19, 2020 19:09
To: user <user@spark.apache.org>
Subject: array_sort function behaviour

Hi All,
   I need to sort an array<struct> based on a particular element of the
struct. I am trying to use the "array_sort" function and can see that by
default it sorts the array based on the first numerical element. Is
this the expected behaviour? Please find below the sample code and output.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// SAMPLE CODE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
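// Needed outside spark-shell / notebooks, where these are usually pre-imported.
import org.apache.spark.sql.functions.array_sort
import spark.implicits._

val jsonData = """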
{
"topping":
[
{ "id": "5001", "id1": "5001", "type": "None" },
{ "id": "5002", "id1": "5008", "type": "Glazed" },
{ "id": "5005", "id1": "5007", "type": "Sugar" },
{ "id": "5007", "id1": "5002", "type": "Powdered Sugar" },
{ "id": "5006", "id1": "5005", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "id1": "5004", "type": "Chocolate" },
{ "id": "5004", "id1": "5003", "type": "Maple" }
]
}
"""
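// Parse the JSON string into a DataFrame with an array<struct> column "topping".
val json_df = spark.read.json(Seq(jsonData).toDS)
// Sort the array of structs using the default ordering.
val sort_df = json_df.select(array_sort($"topping").as("sort_col"))
// display is a notebook helper (e.g. Databricks); sort_df.show(false) works elsewhere.
display(sort_df)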
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[Screenshot 2020-05-19 12.06.30.png]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the above output is sorted based on the "id" element, which is
the first numerical element in the struct.

Is there any way to specify the element based on which sorting can be done?

Regards,
Neeraj
