Re: [PERFORM] Group by more efficient than distinct?

Luke Lonergan Sun, 20 Apr 2008 22:36:42 -0700

Hi Francisco,

Generally, PG sorting is much slower than hash aggregation for performing
the distinct operation.  There may be small sizes where this isn¹t true, but
for large amounts of data (in-memory or not), hash agg (used most often, but
not always by GROUP BY) is faster.


We¹ve implemented a special optimization to PG sorting that does the
distinct processing within the sort, instead of afterward, but it¹s limited
to some small-ish number (10,000) of distinct values due to it¹s use of a
memory and processing intensive heap.

So, you¹re better off using GROUP BY and making sure that the planner is
using hash agg to do the work.

- Luke 


On 4/17/08 8:46 PM, "Francisco Reyes" <[EMAIL PROTECTED]> wrote:

> I am trying to get a distinct set of rows from 2 tables.
> After looking at someone else's query I noticed they were doing a group by
> to obtain the unique list.
> 
> After comparing on multiple machines with several tables, it seems using
> group by to obtain a distinct list is substantially faster than using
> select distinct.
> 
> Is there any dissadvantage of using "group by" to obtain a unique list?
> 
> On a small dataset the difference was about 20% percent.
> 
> Group by
>  HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
> time=76.641..85.167 rows=2890 loops=1)
> 
> Distinct
>  Unique  (cost=1088.23..1174.53 rows=1151 width=8) (actual
> time=90.516..140.123 rows=2890 loops=1)
> 
> Although I don't have the numbers here with me, a simmilar result was
> obtaining against a query that would return 100,000 rows. 20% and more
> speed differnce between "group by" over "select distinct".
> 
> --
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: [PERFORM] Group by more efficient than distinct?

Reply via email to