Hi All,

I am wondering about the relative performance of "insert into table1 select 
distinct a,b from ..." and "insert into table1 select a,b from ... group by 
a,b" when querying tables of different sizes (10K and 100K rows, then 
millions, tens of millions, and hundreds of millions of rows).

The DISTINCT form tends to be planned as a sort followed by a unique step, 
while the GROUP BY form tends to be planned as a hash aggregate. Any 
opinions on which is better?
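
To make the comparison concrete, here is a sketch of the two forms I am looking at. The name src is only a placeholder for the queried table, and the plan shapes noted in the comments are what I tend to see, not something I can promise for every version or data set.

```sql
-- Hypothetical names: table1 is the target, src stands in for the queried table.
-- EXPLAIN prints the chosen plan without actually running the insert.

-- Form 1: DISTINCT (tends to show a sort followed by a unique step)
EXPLAIN
INSERT INTO table1
SELECT DISTINCT a, b FROM src;

-- Form 2: GROUP BY (tends to show a hash aggregate)
EXPLAIN
INSERT INTO table1
SELECT a, b FROM src GROUP BY a, b;
```

Both forms should produce the same set of (a, b) pairs, so the only difference I care about is the plan and its cost at each table size.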

I can also change the schema to a certain extent. Would it be worthwhile to 
put indexes on the queried tables (or refactor them) in the hope that the 
DISTINCT query does an index scan instead of a sort? Would the query planner 
take advantage of that?
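
As a sketch of the schema change I have in mind, assuming a placeholder source table src and a made-up index name, it would be something like the following. Whether the planner actually picks an index scan over a sort here is exactly what I am unsure about.

```sql
-- Hypothetical index on the placeholder table src, covering both selected columns.
CREATE INDEX src_a_b_idx ON src (a, b);

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE src;

-- Check whether the plan now reads the index in (a, b) order instead of sorting.
EXPLAIN
SELECT DISTINCT a, b FROM src;
```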

Thanks,

Shawn

