subject:"Re\: \[PERFORM\] Group by more efficient than distinct\?"

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling

On Mon, 21 Apr 2008, Mark Mielke wrote: This surprises me - hash values are lossy, so it must still need to confirm against the real list of values, which at a minimum should require references to the rows to check against? Is PostgreSQL doing something beyond my imagination? :-) Not too

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Mark Mielke

Matthew Wakeling wrote: On Mon, 21 Apr 2008, Mark Mielke wrote: This surprises me - hash values are lossy, so it must still need to confirm against the real list of values, which at a minimum should require references to the rows to check against? Is PostgreSQL doing something beyond my

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling

On Tue, 22 Apr 2008, Mark Mielke wrote: The poster I responded to said that the memory required for a hash join was relative to the number of distinct values, not the number of rows. They gave an example of millions of rows, but only a few distinct values. Above, you agree with me that it it

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Mark Mielke

Matthew Wakeling wrote: On Tue, 22 Apr 2008, Mark Mielke wrote: The poster I responded to said that the memory required for a hash join was relative to the number of distinct values, not the number of rows. They gave an example of millions of rows, but only a few distinct values. Above, you

Re: [PERFORM] Group by more efficient than distinct?

2008-04-21 Thread PFC

On Sun, 20 Apr 2008 17:15:36 +0200, Francisco Reyes [EMAIL PROTECTED] wrote: PFC writes: - If you process up to some percentage of your RAM worth of data, hashing is going to be a lot faster Thanks for the excellent breakdown and explanation. I will try and get sizes of the tables in

Re: [PERFORM] Group by more efficient than distinct?

2008-04-21 Thread Mark Mielke

PFC wrote: Actually, the memory used by the hash depends on the number of distinct values, not the number of rows which are processed... Consider : SELECT a GROUP BY a SELECT a,count(*) GROUP BY a In both cases the hash only holds discinct values. So if you have 1 million rows to

Re: [PERFORM] Group by more efficient than distinct?

2008-04-21 Thread Mark Mielke

Mark Mielke wrote: PFC wrote: Actually, the memory used by the hash depends on the number of distinct values, not the number of rows which are processed... Consider : SELECT a GROUP BY a SELECT a,count(*) GROUP BY a In both cases the hash only holds discinct values. So if you

Re: [PERFORM] Group by more efficient than distinct?

2008-04-20 Thread Francisco Reyes

Gregory Stark writes: HashAggregate needs to store more values in memory at the same time so it's not a good plan if you have a lot of distinct values. So far the resulting number of rows where in the thousands and the source data were in there hundreds of thousands and the group by was

Re: [PERFORM] Group by more efficient than distinct?

2008-04-20 Thread Francisco Reyes

PFC writes: - If you process up to some percentage of your RAM worth of data, hashing is going to be a lot faster Thanks for the excellent breakdown and explanation. I will try and get sizes of the tables in question and how much memory the machines have. - If you need DISTINCT

Re: [PERFORM] Group by more efficient than distinct?

2008-04-20 Thread Luke Lonergan

Hi Francisco, Generally, PG sorting is much slower than hash aggregation for performing the distinct operation. There may be small sizes where this isn¹t true, but for large amounts of data (in-memory or not), hash agg (used most often, but not always by GROUP BY) is faster. We¹ve implemented a

Re: [PERFORM] Group by more efficient than distinct?

2008-04-18 Thread Thomas Pundt

On Freitag, 18. April 2008, Francisco Reyes wrote: | I am trying to get a distinct set of rows from 2 tables. | After looking at someone else's query I noticed they were doing a group by | to obtain the unique list. | | After comparing on multiple machines with several tables, it seems using |

Re: [PERFORM] Group by more efficient than distinct?

2008-04-18 Thread Gregory Stark

Francisco Reyes [EMAIL PROTECTED] writes: Is there any dissadvantage of using group by to obtain a unique list? On a small dataset the difference was about 20% percent. Group by HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual time=76.641..85.167 rows=2890 loops=1)

Re: [PERFORM] Group by more efficient than distinct?

2008-04-18 Thread PFC

On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark [EMAIL PROTECTED] wrote: Francisco Reyes [EMAIL PROTECTED] writes: Is there any dissadvantage of using group by to obtain a unique list? On a small dataset the difference was about 20% percent. Group by HashAggregate (cost=369.61..381.12

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

Re: [PERFORM] Group by more efficient than distinct?

13 matches

Site Navigation

Mail list logo

Footer information