Re: Distinct on Map data type -- SPARK-19893

2018-01-16 Thread Tejas Patil
There is a JIRA for making Map types orderable: https://issues.apache.org/jira/browse/SPARK-18134. Given that this is a non-trivial change, it will take time.

Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread ckhari4u
Wan, Thanks a lot! I see the issue now. Do we have any JIRAs open for the future work to be done on this?

Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread Wenchen Fan
A very simple example:

    sql("select create_map(1, 'a', 2, 'b')")
      .union(sql("select create_map(2, 'b', 1, 'a')"))
      .distinct

By definition a map should not care about the order of its entries, so the above query should return one record. However, it returned two records before SPARK-19893.
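A self-contained version of this reproduction, as a minimal sketch (assuming a local SparkSession; in spark-shell the `sql` function is available directly):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-distinct-repro")
      .getOrCreate()

    // Two rows whose maps hold the same entries in different insertion order.
    val df = spark.sql("select create_map(1, 'a', 2, 'b') as m")
      .union(spark.sql("select create_map(2, 'b', 1, 'a') as m"))

    // Map equality should ignore entry order, so this should yield one row.
    // Before SPARK-19893 it yielded two; with the fix applied, Spark instead
    // rejects the set operation at analysis time.
    df.distinct().show()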

Re: Distinct on Map data type -- SPARK-19893

2018-01-12 Thread HariKrishnan CK
Hi Wan, could you please be more specific about the scenarios where it will give wrong results? I checked the distinct and intersect operators in many use cases I have and could not figure out a failure scenario giving wrong results. Thanks

Re: Distinct on Map data type -- SPARK-19893

2018-01-12 Thread Wenchen Fan
Actually Spark 2.1.0 doesn't work for your case; it may give you a wrong result. We are still working on adding this feature, but before that, we should fail earlier instead of returning a wrong result.
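One way to keep deduplicating in the meantime (a sketch, not from the thread, assuming the Spark 2.x Scala API): canonicalize each map into a key-sorted array of entries, which has well-defined equality and is accepted by set operations:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val df = spark.sql("select create_map(1, 'a', 2, 'b') as m")
      .union(spark.sql("select create_map(2, 'b', 1, 'a') as m"))

    // Sorting entries by key removes the order-dependence that makes raw
    // map columns unsafe to compare, so distinct behaves as expected.
    val sortedEntries = udf((m: Map[Int, String]) => m.toSeq.sortBy(_._1))

    df.select(sortedEntries(col("m")).as("entries"))
      .distinct()
      .show()  // one row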

Distinct on Map data type -- SPARK-19893

2018-01-12 Thread ckhari4u
I see SPARK-19893 is backported to Spark 2.1 and 2.0.1 as well. I do not see a clear justification for why SPARK-19893 is important and needed. I have a sample table which works fine with an earlier build of Spark 2.1.0. Now that the latest build has the backport of SPARK-19893, it's failing.