Re: dataframe.groupby.agg vs sql("select from groupby")

2016-03-10 Thread Reynold Xin
They should be identical. Can you paste the detailed explain output?
On Thursday, March 10, 2016, FangFang Chen wrote:
> hi,
> Based on my testing, the memory cost is very different for
> 1. sql("select * from ...").groupby.agg
> 2. sql("select ... From ... Groupby

dataframe.groupby.agg vs sql("select from groupby")

2016-03-10 Thread FangFang Chen
Hi,
Based on my testing, the memory cost is very different for:
1. sql("select * from ...").groupby.agg
2. sql("select ... From ... Groupby ...")
For a table partition larger than 500 GB, #2 runs fine, while #1 fails with an out-of-memory error. I am using the same Spark configuration for both. Could somebody
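
For anyone comparing the two forms, the detailed explain output requested in the reply can be produced roughly as sketched below. This is only an illustration under stated assumptions: the table `t`, its columns `key` and `value`, and the local master are hypothetical stand-ins for the real table, and the `SparkSession` entry point assumes Spark 2.0 or later (on 1.x, `sqlContext.sql` and `registerTempTable` play the same roles).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object ComparePlans {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use the cluster master.
    val spark = SparkSession.builder()
      .appName("compare-plans")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical tiny table standing in for the large partitioned table.
    Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
      .createOrReplaceTempView("t")

    // Form 1: select everything in SQL, then aggregate with the DataFrame API.
    val viaDataFrame = spark.sql("SELECT * FROM t").groupBy("key").agg(sum("value"))

    // Form 2: express the aggregation entirely in SQL.
    val viaSql = spark.sql("SELECT key, sum(value) FROM t GROUP BY key")

    // explain(true) prints the parsed, analyzed, optimized and physical plans.
    viaDataFrame.explain(true)
    viaSql.explain(true)

    spark.stop()
  }
}
```

If the two physical plans printed this way differ (for example in which columns are scanned), that difference is the natural place to start looking for the memory gap.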