RE: Partitioning to speed up processing?

2016-03-10 Thread Gerhard Fiedler
Grouping is applied in the aggregation.

From: holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] On Behalf Of Holden Karau
Sent: Thu, Mar 10, 2016 13:56
To: Gerhard Fiedler
Cc: user@spark.apache.org
Subject: Re: Partitioning to speed up processing?

Are they entire data set aggregates or is
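Because grouping is applied, the interesting layout is one partitioned on the grouping key. Below is a minimal, stdlib-only Python sketch of that idea. It is a toy model, not Spark code. It assumes a `SUM(value) GROUP BY key`, models each partition as a Python list, and uses hypothetical names (`hash_partition`, `round_robin_partition`, `grouped_sum`) chosen for illustration. When rows are hash-partitioned by the key, every key lives in exactly one partition, so each partition's local result is already final. With an arbitrary layout, the same key appears in several partitions, and the partials must be combined across partitions. In a cluster, that combining step is the shuffle.

```python
from collections import defaultdict

# Toy rows: (group_key, value). Hypothetical data for illustration only.
ROWS = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("c", 6)]


def hash_partition(rows, num_partitions):
    """Assign each row to a partition by hashing its grouping key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def round_robin_partition(rows, num_partitions):
    """Spread rows without regard to the key (like an arbitrary input split)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def local_sums(partition):
    """Per-partition SUM(value) GROUP BY key."""
    sums = defaultdict(int)
    for key, value in partition:
        sums[key] += value
    return dict(sums)


def grouped_sum(parts):
    """Aggregate each partition locally, then combine the partials.

    Returns (result, keys_span_partitions). If no key spans partitions,
    the combine is a plain union of disjoint results (no cross-partition
    work needed); otherwise partials for the same key must be merged.
    """
    partials = [local_sums(p) for p in parts]
    seen = set()
    keys_span_partitions = False
    for partial in partials:
        if not seen.isdisjoint(partial):
            keys_span_partitions = True
        seen.update(partial)
    result = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            result[key] += subtotal
    return dict(result), keys_span_partitions


by_key, by_key_spans = grouped_sum(hash_partition(ROWS, 2))
arbitrary, arbitrary_spans = grouped_sum(round_robin_partition(ROWS, 2))
print(by_key, by_key_spans)
print(arbitrary, arbitrary_spans)
```

Both layouts yield the same sums. Only the key-partitioned one avoids keys that span partitions, which is why partitioning on the grouping columns is the candidate worth testing.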

Re: Partitioning to speed up processing?

2016-03-10 Thread Holden Karau
Are they entire data set aggregates or is there some grouping applied?

On Thursday, March 10, 2016, Gerhard Fiedler wrote:
> I have a number of queries that result in a sequence Filter > Project > Aggregate. I wonder whether partitioning the input table makes sense.
>
> Does Aggregate bene
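The distinction in this question matters because an aggregate over the entire data set has no key to partition by. Here is a minimal Python sketch. It assumes SUM, COUNT and AVG as the aggregates and uses hypothetical helper names. It is an illustrative model, not Spark's implementation. Each partition produces a partial state, and a single final merge combines them, so the result does not depend on how the rows were split.

```python
import random


def partial_aggregate(partition):
    """Per-partition partial state for SUM and COUNT (no grouping)."""
    return (sum(partition), len(partition))


def merge(partials):
    """Combine partial states into the final whole-data-set SUM, COUNT, AVG."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count, (total / count if count else None)


def split(values, num_partitions, shuffle_seed=None):
    """Split values into partitions, optionally shuffling first (arbitrary layout)."""
    values = list(values)
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(values)
    return [values[i::num_partitions] for i in range(num_partitions)]


values = list(range(1, 11))
three_parts = merge([partial_aggregate(p) for p in split(values, 3)])
four_shuffled = merge([partial_aggregate(p) for p in split(values, 4, shuffle_seed=42)])
print(three_parts)    # (55, 10, 5.5)
print(four_shuffled)  # (55, 10, 5.5)
```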

Partitioning to speed up processing?

2016-03-10 Thread Gerhard Fiedler
I have a number of queries that result in a sequence Filter > Project > Aggregate. I wonder whether partitioning the input table makes sense.

Does Aggregate benefit from a partitioned input? If so, what partitions would be most useful (related to the aggregations)? Do Filter and Project preserv
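On whether Filter and Project preserve partitioning, one simple model is that both operators work row by row inside each partition, so no row ever changes partition. This model is assumed here, not taken from Spark's source. The Python sketch below uses toy data, hypothetical column names (`region`, `amount`, `note`) and hypothetical helpers. It checks that a layout hash-partitioned on `region` survives a filter followed by a projection that keeps `region`. If the projection dropped the key column, a later aggregate would have nothing to exploit.

```python
# Toy rows; column names are hypothetical.
ROWS = [
    {"region": "north", "amount": 5, "note": "x"},
    {"region": "south", "amount": 20, "note": "y"},
    {"region": "east", "amount": 15, "note": "z"},
    {"region": "north", "amount": 30, "note": "w"},
    {"region": "south", "amount": 8, "note": "v"},
]
NUM_PARTITIONS = 3


def partition_of(value, num_partitions=NUM_PARTITIONS):
    """Hash-partition assignment for a key value."""
    return hash(value) % num_partitions


def hash_partition(rows, key):
    """Place each row in the partition its key hashes to."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[partition_of(row[key])].append(row)
    return parts


def filter_rows(parts, predicate):
    """Filter: keep matching rows, each within the partition it came from."""
    return [[row for row in part if predicate(row)] for part in parts]


def project(parts, columns):
    """Project: keep only the listed columns, row by row, partition by partition."""
    return [[{c: row[c] for c in columns} for row in part] for part in parts]


def is_partitioned_by(parts, key):
    """True if every row sits in the partition its key hashes to."""
    return all(
        partition_of(row[key]) == index
        for index, part in enumerate(parts)
        for row in part
    )


parts = hash_partition(ROWS, "region")
result = project(filter_rows(parts, lambda r: r["amount"] > 10), ["region", "amount"])
print(is_partitioned_by(result, "region"))  # True
```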