Re: [PERFORM] Group by more efficient than distinct?

PFC Fri, 18 Apr 2008 03:31:11 -0700

On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <[EMAIL PROTECTED]> wrote:

"Francisco Reyes" <[EMAIL PROTECTED]> writes:

Is there any disadvantage of using "group by" to obtain a unique list?

On a small dataset the difference was about 20%.

Group by
HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual time=76.641..85.167 rows=2890 loops=1)


        Basically:

        - If you process up to some percentage of your RAM worth of data, hashing is going to be a lot faster
        - If the size of the hash grows larger than your RAM, hashing will fail miserably and sorting will be much faster, since PG's disk sort is really good
        - GROUP BY knows this and acts accordingly
        - DISTINCT doesn't know this, it only knows sorting, so it sorts
        - If you need DISTINCT x ORDER BY x, sorting may be faster too (depending on the % of distinct rows)
        - If you need DISTINCT ON, well, you're stuck with the Sort
        - So, for the time being, you can replace DISTINCT with GROUP BY...
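
        To make the rewrite concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; it checks that the two query forms return the same rows and says nothing about which plan PostgreSQL would pick. The table "visits" and its data are hypothetical, not taken from the original poster's schema.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (customer_id INTEGER)")
conn.executemany(
    "INSERT INTO visits (customer_id) VALUES (?)",
    [(3,), (1,), (3,), (2,), (1,), (3,)],
)

# Original form: DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer_id FROM visits"
).fetchall()

# Rewritten form: GROUP BY on the same column list.
group_by_rows = conn.execute(
    "SELECT customer_id FROM visits GROUP BY customer_id"
).fetchall()

# Without ORDER BY neither form guarantees an order, so compare as sets.
same_rows = set(distinct_rows) == set(group_by_rows)

# If sorted output is needed, ask for it explicitly.
ordered_rows = conn.execute(
    "SELECT customer_id FROM visits GROUP BY customer_id ORDER BY customer_id"
).fetchall()
```

        On PostgreSQL itself, running EXPLAIN ANALYZE on both forms against your own table is the way to see whether you get a HashAggregate or a Sort, and whether the rewrite actually pays off for your data size.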

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
