Hi Hemant,
My DataFrame "ordrd_emp_df" already holds the data in sorted order, since I
applied orderBy in the first step.
I also tried calling "orderBy" immediately before "groupBy", but I still get
different results in each iteration.
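
A note on a possible cause and workaround: groupBy triggers a shuffle, and as
far as I understand first() simply returns whichever row of the group it sees
first after that shuffle, so an earlier orderBy does not appear to be
guaranteed to survive. One approach I could try is ranking the rows inside
each department with a window function and keeping only rank 1. The sketch
below is only an illustration under assumptions: it uses SparkSession (Spark
2.x or later; on 1.x a HiveContext would be needed for window functions), and
the object name TopPaidPerDept, the helper topPaidPerDept and the temporary
column "rn" are made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Hypothetical helper object; the names are illustrative, not from the thread.
object TopPaidPerDept {

  // Returns one row per DeptNo holding the EmpId with the highest Sal.
  def topPaidPerDept(empDf: DataFrame): DataFrame = {
    // Order rows inside each department by salary, highest first.
    // If two employees can share the top salary, add a tie-breaker
    // (for example col("EmpId")) to orderBy to keep the result stable.
    val bySalDesc = Window.partitionBy(col("DeptNo")).orderBy(col("Sal").desc)

    empDf
      .withColumn("rn", row_number().over(bySalDesc)) // 1 = highest paid in the department
      .where(col("rn") === 1)
      .select(col("DeptNo"), col("EmpId").as("TopSal"))
      .orderBy(col("DeptNo"))
  }

  def main(args: Array[String]): Unit = {
    // Local session only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TopPaidPerDept")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the original mail.
    val empDf = Seq(
      ("001", 100, 10),
      ("002", 120, 20),
      ("003", 130, 10),
      ("004", 140, 20),
      ("005", 150, 10)
    ).toDF("EmpId", "Sal", "DeptNo")

    // Expected rows: (10, 005) and (20, 004)
    topPaidPerDept(empDf).show()

    spark.stop()
  }
}
```

Because the ordering is part of the window definition itself, the result
should not depend on how the rows happen to arrive after the shuffle.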

Regards,
Satish Chandra


On Wed, Feb 3, 2016 at 4:28 PM, Hemant Bhanawat <hemant9...@gmail.com>
wrote:

> Missing order by?
>
> Hemant Bhanawat
> SnappyData (http://snappydata.io/)
>
>
> On Wed, Feb 3, 2016 at 3:45 PM, satish chandra j
> <jsatishchan...@gmail.com> wrote:
>
>> Hi All,
>> I have data in an emp_df (DataFrame) as shown below:
>>
>> EmpId   Sal   DeptNo
>> 001       100   10
>> 002       120   20
>> 003       130   10
>> 004       140   20
>> 005       150   10
>>
>> ordrd_emp_df = emp_df.orderBy($"DeptNo",$"Sal".desc), which gives the
>> following:
>>
>> DeptNo  Sal   EmpId
>> 10         150   005
>> 10         130   003
>> 10         100   001
>> 20         140   004
>> 20         120   002
>>
>> Now I want to pick the highest-paid EmpId of each DeptNo, hence I applied
>> the agg "first" method as below:
>>
>>
>> ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")
>>
>> Expected output:
>>
>> DeptNo  TopSal
>> 10      005
>> 20      004
>>
>> But my output varies for each iteration, such as:
>>
>> First iteration:
>>
>> Dept  TopSal
>> 10    003
>> 20    004
>>
>> Second iteration:
>>
>> Dept  TopSal
>> 10    005
>> 20    004
>>
>> Third iteration:
>>
>> Dept  TopSal
>> 10    003
>> 20    002
>>
>> I am not sure why the output varies on each iteration, as there is no
>> change in the code or in the values in the DataFrame.
>>
>> Please let me know if you have any inputs on this.
>>
>> Regards,
>> Satish Chandra J
>>
>
>
