Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21291#discussion_r188878956
--- Diff: python/pyspark/sql/tests.py ---
@@ -5239,8 +5239,8 @@ def test_complex_groupby(self):
expected2 = df.groupby().agg(sum(df.v))
# groupby one column and one sql expression
- result3 = df.groupby(df.id, df.v % 2).agg(sum_udf(df.v))
- expected3 = df.groupby(df.id, df.v % 2).agg(sum(df.v))
+ result3 = df.groupby(df.id, df.v % 2).agg(sum_udf(df.v)).orderBy(df.id, df.v % 2)
--- End diff --
They are already ordered by `df.id`, but not by `df.v % 2`, so the two rows with `id` 1 come back in a different order. This is part of the data:
```
Expected:
   id  (v % 2)  sum(v)
0   0      0.0   120.0
1   0      1.0   125.0
2   1      1.0   125.0
3   1      0.0   130.0
4   2      0.0   130.0
5   2      1.0   135.0
```
```
Result:
   id  (v % 2)  sum(v)
0   0      0.0   120.0
1   0      1.0   125.0
2   1      0.0   130.0
3   1      1.0   125.0
4   2      0.0   130.0
5   2      1.0   135.0
```
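A minimal plain-Python sketch of the same effect, assuming lists of `(id, v % 2, sum(v))` tuples as stand-ins for the Spark/pandas frames (no Spark involved; the names `by_id` and `by_all_keys` are hypothetical). Python's `sorted` is stable, so rows that tie on `id` keep their input order here. Spark's `orderBy` gives no such guarantee for ties. Either way, sorting by `id` alone cannot make the two outputs agree, while sorting by every grouping key does.

```python
# Partial rows copied from the Expected and Result output above:
# each tuple is (id, v % 2, sum(v)).
expected = [
    (0, 0.0, 120.0),
    (0, 1.0, 125.0),
    (1, 1.0, 125.0),
    (1, 0.0, 130.0),
    (2, 0.0, 130.0),
    (2, 1.0, 135.0),
]
result = [
    (0, 0.0, 120.0),
    (0, 1.0, 125.0),
    (1, 0.0, 130.0),
    (1, 1.0, 125.0),
    (2, 0.0, 130.0),
    (2, 1.0, 135.0),
]


def by_id(row):
    # Hypothetical sort key using only the first grouping column.
    return row[0]


def by_all_keys(row):
    # Hypothetical sort key using every grouping column: id, then v % 2.
    return (row[0], row[1])


# Both outputs are already in id order, so sorting by id alone changes nothing...
assert sorted(expected, key=by_id) == expected
assert sorted(result, key=by_id) == result
# ...and they still disagree, because the rows sharing id 1 are swapped.
assert sorted(expected, key=by_id) != sorted(result, key=by_id)
# Sorting by all grouping keys yields a total order, so the outputs now match.
assert sorted(expected, key=by_all_keys) == sorted(result, key=by_all_keys)
```

This mirrors the change in the diff, which orders by both `df.id` and `df.v % 2` rather than by `df.id` alone.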