[GitHub] [spark] WangGuangxin opened a new pull request #31967: [SPARK-34819][SQL] MapType supports orderable semantics

GitBox Thu, 25 Mar 2021 22:40:56 -0700


WangGuangxin opened a new pull request #31967:
URL: https://github.com/apache/spark/pull/31967



   ### What changes were proposed in this pull request?
   Currently, MapType does not support orderable semantics, although Hive and
   Presto do. This makes it hard to migrate from Hive to Spark SQL when users
   group by or order by map-typed columns in their SQL.
   
   
   ### Why are the changes needed?
   Generally, we compare two maps using the following steps:
   1. If the sizes of the two maps are not equal, compare them by size.
   2. Otherwise, sort the entries of each map by key, then compare the entries
      pairwise, first by key and then by value.
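
The two steps above can be sketched in a few lines of Python. This is only an illustration of the comparison rule as described here, not the code in this PR: `compare_maps` and `_cmp` are hypothetical names, keys and values are assumed to be mutually comparable, and null handling is ignored.

```python
def _cmp(a, b):
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


def compare_maps(left, right):
    """Compare two dicts following the two steps described above."""
    # Step 1: maps of different sizes are ordered by size alone.
    if len(left) != len(right):
        return _cmp(len(left), len(right))

    # Step 2: sort each map's entries by key, then walk them pairwise,
    # comparing the keys first and the values second.
    left_entries = sorted(left.items(), key=lambda kv: kv[0])
    right_entries = sorted(right.items(), key=lambda kv: kv[0])
    for (lk, lv), (rk, rv) in zip(left_entries, right_entries):
        key_order = _cmp(lk, rk)
        if key_order != 0:
            return key_order
        value_order = _cmp(lv, rv)
        if value_order != 0:
            return value_order
    return 0
```

Under this rule a smaller map always sorts before a larger one, regardless of contents, and insertion order never affects the result.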
   
   We have to handle this specially in grouping/join/window because Spark SQL
   turns grouping/join/window partition keys into binary `UnsafeRow`s and
   compares the binary data directly instead of using MapType's ordering. In
   these cases, we have to insert a `SortMapKey` expression to sort the map
   entries by key. This is very similar to `NormalizeFloatingNumbers`.
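
A toy Python sketch of why the normalization is needed when keys are compared as raw bytes. The `to_bytes` encoding below is a made-up stand-in and does not reflect the actual `UnsafeRow` layout; `sort_map_key` is a hypothetical helper that mimics only the idea of sorting entries by key before serialization.

```python
import struct


def to_bytes(entries):
    """Toy binary encoding: pack (int key, int value) pairs in the given order."""
    return b"".join(struct.pack("<ii", k, v) for k, v in entries)


def sort_map_key(entries):
    """Normalize a map's entries by sorting them by key."""
    return sorted(entries, key=lambda kv: kv[0])


# Two logically equal maps whose entries are stored in different orders.
a = [(1, 10), (2, 20)]
b = [(2, 20), (1, 10)]

# Compared as raw bytes, they look different.
raw_equal = to_bytes(a) == to_bytes(b)

# After sorting entries by key, the byte representations match.
normalized_equal = to_bytes(sort_map_key(a)) == to_bytes(sort_map_key(b))
```

Without the normalization step, a byte-wise comparison would treat `a` and `b` as different keys even though they hold the same entries.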
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added more unit tests.

