Hi Francisco, Generally, PG sorting is much slower than hash aggregation for performing the distinct operation. There may be small sizes where this isn¹t true, but for large amounts of data (in-memory or not), hash agg (used most often, but not always by GROUP BY) is faster.
We¹ve implemented a special optimization to PG sorting that does the distinct processing within the sort, instead of afterward, but it¹s limited to some small-ish number (10,000) of distinct values due to it¹s use of a memory and processing intensive heap. So, you¹re better off using GROUP BY and making sure that the planner is using hash agg to do the work. - Luke On 4/17/08 8:46 PM, "Francisco Reyes" <[EMAIL PROTECTED]> wrote: > I am trying to get a distinct set of rows from 2 tables. > After looking at someone else's query I noticed they were doing a group by > to obtain the unique list. > > After comparing on multiple machines with several tables, it seems using > group by to obtain a distinct list is substantially faster than using > select distinct. > > Is there any dissadvantage of using "group by" to obtain a unique list? > > On a small dataset the difference was about 20% percent. > > Group by > HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual > time=76.641..85.167 rows=2890 loops=1) > > Distinct > Unique (cost=1088.23..1174.53 rows=1151 width=8) (actual > time=90.516..140.123 rows=2890 loops=1) > > Although I don't have the numbers here with me, a simmilar result was > obtaining against a query that would return 100,000 rows. 20% and more > speed differnce between "group by" over "select distinct". > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >