Hi All,
I have data in a DataFrame, emp_df, as shown below:
EmpId Sal DeptNo
001 100 10
002 120 20
003 130 10
004 140 20
005 150 10
I sort it with
val ordrd_emp_df = emp_df.orderBy($"DeptNo", $"Sal".desc)
which gives the result below:
DeptNo Sal EmpId
10 150 005
10 130 003
10 100 001
20 140 004
20 120 002
Now I want to pick the highest-paid EmpId of each DeptNo, so I applied the
first aggregate function as below:
ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")
Expected output is:
DeptNo TopSal
10 005
20 004
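To make the expected result concrete, here is a minimal sketch of the pick I am after, written against plain Scala collections rather than Spark. The Emp case class, the TopSalSketch object and the topPaidByDept helper are hypothetical names used only for illustration; they are not part of my actual job.

```scala
// Hypothetical record type mirroring the three columns of emp_df.
case class Emp(empId: String, sal: Int, deptNo: Int)

object TopSalSketch {
  // The same five rows as in the table above.
  val emps: Seq[Emp] = Seq(
    Emp("001", 100, 10),
    Emp("002", 120, 20),
    Emp("003", 130, 10),
    Emp("004", 140, 20),
    Emp("005", 150, 10)
  )

  // For each DeptNo, keep the EmpId of the row with the largest Sal.
  // maxBy compares the salaries directly, so the result does not
  // depend on the order in which the rows arrive.
  def topPaidByDept(rows: Seq[Emp]): Map[Int, String] =
    rows.groupBy(_.deptNo).map { case (dept, members) =>
      dept -> members.maxBy(_.sal).empId
    }

  def main(args: Array[String]): Unit =
    topPaidByDept(emps).toSeq.sortBy(_._1).foreach { case (dept, empId) =>
      println(s"$dept $empId")
    }
}
```

This is only meant to pin down the semantics (one row per DeptNo, chosen by maximum Sal); it says nothing about how Spark evaluates first after a groupBy.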
But my output varies from run to run, for example:
First iteration:
DeptNo TopSal
10 003
20 004
Second iteration:
DeptNo TopSal
10 005
20 004
Third iteration:
DeptNo TopSal
10 003
20 002
I am not sure why the output varies on each run, as there is no change in
the code or in the values in the DataFrame.
Please let me know if you have any inputs on this.
Regards,
Satish Chandra J