Re: Need to order iterator values in spark dataframe

2020-04-01 Thread Ranjan, Abhinav
Enrico, the solution below works, but there is a small glitch. It works fine in spark-shell but fails for skewed keys when run with spark-submit. Looking at the execution plan, the partitioning value is the same for both repartition and groupByKey and is driven by the value

Re: Need to order iterator values in spark dataframe

2020-03-26 Thread Zahid Rahman
I believe I logged an issue first and I should get a response first. I was ignored. Regards Did you know there are 8 million people in kashmir locked up in their homes by the Hindutwa (Indians) for 8 months. Now the whole planet is locked up in their homes. You didn't take notice of them either.

Re: Need to order iterator values in spark dataframe

2020-03-26 Thread Enrico Minack
Abhinav, you can repartition by your key, then sortWithinPartitions, and then groupByKey. Since the data are already hash-partitioned by key, Spark should not shuffle the data again and hence should not change the sort order within each partition:

ds.repartition($"key").sortWithinPartitions($"code").groupBy($"key")

Enrico
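The suggestion above can be sketched as a small, self-contained Scala program. This is only an illustration under stated assumptions, not the code either poster ran: the `Rec` case class, the `OrderedGroups` object, the `codesPerKey` helper and the local master are hypothetical names chosen for the example, and sorting by both `key` and `code` is a small variation on the one-liner above. Spark does not document that a group's iterator keeps the earlier per-partition order, and the follow-up in this thread reports failures with skewed keys, so inspect the physical plan with `explain()` on your own data before relying on it.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical row type mirroring the columns in the original question.
case class Rec(key: Int, code: String, code_value: Int)

object OrderedGroups {
  // Repartition by key, sort inside each partition, then walk each group.
  // Assumption: the later grouping step does not reorder rows within a key.
  def codesPerKey(spark: SparkSession, ds: Dataset[Rec]): Dataset[(Int, String)] = {
    import spark.implicits._
    ds.repartition($"key")                      // hash-partition rows by key
      .sortWithinPartitions($"key", $"code")    // order rows inside each partition
      .groupByKey(_.key)                        // typed grouping on the same key
      .mapGroups { (key, rows) =>
        // rows is an Iterator[Rec]; consume it once, in the order delivered
        (key, rows.map(_.code).mkString(","))
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")                       // assumption: local test run
      .appName("ordered-groups")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows in the shape of the question's table.
    val ds = Seq(Rec(1, "c2", 12), Rec(1, "c1", 11), Rec(1, "c3", 5)).toDS()

    val result = codesPerKey(spark, ds)
    result.explain()                            // check whether an extra Exchange or Sort appears
    result.show(false)
    spark.stop()
  }
}
```

If the plan shows an additional shuffle, or the order turns out to be unstable for large groups, a more conservative alternative is to sort each group's rows inside `mapGroups`, at the cost of holding one group in memory at a time.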

Need to order iterator values in spark dataframe

2020-03-26 Thread Ranjan, Abhinav
Hi, I have a dataframe which has data like:

key | code | code_value
1   | c1   | 11
1   | c2   | 12
1   | c2   | 9
1   | c3