On Mon, 21 Apr 2008, Mark Mielke wrote:
This surprises me - hash values are lossy, so it must still confirm against
the real list of values, which at a minimum should require references to the
rows to check against?
Is PostgreSQL doing something beyond my imagination? :-)
Not too
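(Mark's premise is sound: a hash code is a lossy 32-bit summary, and a bucket
number keeps even fewer bits, so the executor stores the real key values in
the hash table and re-compares them on every probe. A sketch in SQL, assuming
PostgreSQL's SQL-callable internal hash function hashtext():)

    -- Two distinct strings, two 32-bit hash codes; with, say, 1024 buckets
    -- only the low 10 bits pick the bucket, so unrelated keys can share one
    -- and the stored values must be compared to rule out false positives.
    SELECT hashtext('hello') AS h1,
           hashtext('world') AS h2,
           hashtext('hello') & 1023 AS bucket1,
           hashtext('world') & 1023 AS bucket2;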
On Tue, 22 Apr 2008, Mark Mielke wrote:
The poster I responded to said that the memory required for a hash join was
relative to the number of distinct values, not the number of rows. They gave
an example of millions of rows, but only a few distinct values. Above, you
agree with me that it
PFC wrote:
Actually, the memory used by the hash depends on the number of
distinct values, not the number of rows which are processed...
Consider:
SELECT a GROUP BY a
SELECT a, count(*) GROUP BY a
In both cases the hash only holds distinct values. So if you have
1 million rows to
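(A minimal runnable sketch of PFC's point, using an assumed placeholder table
t built with generate_series: a million rows but only three distinct values
in column a, so a HashAggregate's hash table needs about three entries rather
than a million.)

    -- Hypothetical test table: 1,000,000 rows, 3 distinct values in a.
    CREATE TABLE t AS
        SELECT i, (i % 3) AS a
        FROM generate_series(1, 1000000) AS s(i);
    ANALYZE t;

    -- Both forms can be planned as a HashAggregate whose hash table holds
    -- one entry per distinct value of a, not one per input row.
    EXPLAIN SELECT a FROM t GROUP BY a;
    EXPLAIN SELECT a, count(*) FROM t GROUP BY a;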
Gregory Stark writes:
HashAggregate needs to store more values in memory at the same time, so it's
not a good plan if you have a lot of distinct values.
So far the resulting numbers of rows were in the thousands and the source
data were in the hundreds of thousands, and the group by was
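(Gregory's caveat can be probed directly; a sketch, reusing the placeholder
table t above. The planner weighs the estimated number of groups against
work_mem, so grouping a high-cardinality column under a small work_mem is
likely to get a sort-based plan instead of a HashAggregate:)

    SET work_mem = '64kB';  -- deliberately tiny, to discourage hashing
    EXPLAIN SELECT a, count(*) FROM t GROUP BY a;  -- 3 groups: hash stays cheap
    EXPLAIN SELECT i, count(*) FROM t GROUP BY i;  -- ~1M groups: likely a sort
    RESET work_mem;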
PFC writes:
- If you process up to some percentage of your RAM worth of data, hashing
is going to be a lot faster
Thanks for the excellent breakdown and explanation. I will try and get sizes
of the tables in question and how much memory the machines have.
- If you need DISTINCT
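(A quick way to collect those numbers from within PostgreSQL; a sketch using
the built-in admin functions, with t again standing in for the real tables:)

    -- On-disk size of a table:
    SELECT pg_size_pretty(pg_relation_size('t'));
    -- Memory budget each sort or hash operation is allowed on this server:
    SHOW work_mem;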
Hi Francisco,
Generally, PG sorting is much slower than hash aggregation for performing
the distinct operation. There may be small sizes where this isn't true, but
for large amounts of data (in-memory or not), hash agg (used most often, but
not always, by GROUP BY) is faster.
We've implemented a
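(The gap is visible with plain EXPLAIN ANALYZE; a sketch against the
placeholder table t. On PostgreSQL releases of this era, SELECT DISTINCT is
always implemented as Sort + Unique, while GROUP BY may use a HashAggregate:)

    EXPLAIN ANALYZE SELECT DISTINCT a FROM t;   -- expect Unique over a Sort
    EXPLAIN ANALYZE SELECT a FROM t GROUP BY a; -- expect HashAggregate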
On Friday, 18 April 2008, Francisco Reyes wrote:
| I am trying to get a distinct set of rows from 2 tables.
| After looking at someone else's query I noticed they were doing a group by
| to obtain the unique list.
|
| After comparing on multiple machines with several tables, it seems using
|
Francisco Reyes [EMAIL PROTECTED] writes:
Is there any disadvantage of using group by to obtain a unique list?
On a small dataset the difference was about 20 percent.
Group by:
HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual time=76.641..85.167 rows=2890 loops=1)