Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-05-11 Thread Robert Haas
On Wed, Apr 13, 2011 at 1:22 PM, Scott Carey sc...@richrelevance.com wrote:
 A pathological skew case (all relations with the same key), should be
 _cheaper_ to probe.   There should be only _one_ entry in the hash (for
 the one key), and that entry will be a list of all relations matching the
 key.  Therefore, hash probes will either instantly fail to match on an
 empty bucket, fail to match the one key with one compare, or match the one
 key and join on the matching list.

 In particular for anti-join, high skew should be the best case scenario.

I think this argument may hold some water for an anti-join, and maybe
for a semi-join, but it sure doesn't seem right for any kind of join
that has to iterate over all matches (rather than just the first one);
that is, inner, left, right, or full.
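
A rough sketch of the difference in Python (purely illustrative; this is
not how PostgreSQL implements it): probing a single fully-skewed bucket,
an anti-join can stop at the first key compare, while an inner join has
to walk every matching entry to emit its output rows.

# Illustrative probe costs against one bucket where every inner tuple
# shares the same key (worst-case skew).  Not PostgreSQL code.

def probe_anti(bucket, key):
    """Anti-join probe: stop at the first key match."""
    compares = 0
    for entry_key, _payload in bucket:
        compares += 1
        if entry_key == key:
            return False, compares    # match found -> outer row rejected
    return True, compares             # no match -> outer row emitted

def probe_inner(bucket, key):
    """Inner-join probe: must visit every entry to emit all matches."""
    compares = 0
    matches = []
    for entry_key, payload in bucket:
        compares += 1
        if entry_key == key:
            matches.append(payload)
    return matches, compares

bucket = [(42, i) for i in range(10_000)]   # 10,000 inner tuples, one key
print(probe_anti(bucket, 42)[1])            # 1 compare
print(probe_inner(bucket, 42)[1])           # 10,000 compares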

 A hash structure that allows multiple entries per key is inappropriate for
 skewed data, because it is not O(n).  One that has one entry per key
 remains O(n) for all skew.  Furthermore, the hash buckets and # of entries
 is proportional to n_distinct in this case, and smaller and more cache and
 memory friendly to probe.

I don't think this argument is right.  The hash table is sized for a
load factor significantly less than one, so if there are multiple
entries in a bucket, it is fairly likely that they are all for the
same key.  Granted, we have to double-check the keys to figure that
out; but I believe that the data structure you are proposing would
require similar comparisons.  The only difference is that they'd be
required when building the hash table, rather than when probing it.
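
A sketch of that trade-off (illustrative Python with simple chained
buckets, not the PostgreSQL implementation): with one entry per key,
every duplicate-key insert must still compare against the entry already
in the bucket, so the key comparisons shift to build time.

# Comparisons needed to build a plain chained table vs. a table with
# one entry per key holding a list of tuples.  Illustrative only.

def build_chained(tuples, nbuckets):
    buckets = [[] for _ in range(nbuckets)]
    compares = 0                    # appending never compares keys
    for key, payload in tuples:
        buckets[hash(key) % nbuckets].append((key, payload))
    return buckets, compares

def build_per_key(tuples, nbuckets):
    buckets = [dict() for _ in range(nbuckets)]
    compares = 0
    for key, payload in tuples:
        slot = buckets[hash(key) % nbuckets]
        for existing_key in slot:   # find this key's entry, if any
            compares += 1
            if existing_key == key:
                break
        slot.setdefault(key, []).append(payload)
    return buckets, compares

inner = [(42, i) for i in range(10_000)]    # total skew: one key
print(build_chained(inner, 1024)[1])        # 0 compares at build time
print(build_per_key(inner, 1024)[1])        # 9,999 compares at build time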

 You can put either relation on the outside with an anti-join, but would
 need a different algorithm and cost estimator if done the other way
 around.  Construct a hash on the join key that keeps a list of relations
 per key; iterate over the other relation, removing the key and
 corresponding list from the hash when there is a match; when complete, the
 remaining items in the hash are the result of the join (also already
 grouped by the key).  It could be terminated early if all entries are
 removed.
 This would be useful if the hash was small, the other side of the hash too
 large to fit in memory, and alternative was a massive sort on the other
 relation.

This would be a nice extension of commit
f4e4b3274317d9ce30de7e7e5b04dece7c4e1791.
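
A sketch of the proposed reversed anti-join (illustrative Python; the
names are made up and this is not the committed code):

# Hash the side whose rows we want to keep, with a list of tuples per
# key; probe with the other side and delete on match.  Whatever
# survives is the anti-join result, already grouped by key.

def reversed_anti_join(keep_side, probe_side):
    groups = {}
    for key, payload in keep_side:       # build: list of tuples per key
        groups.setdefault(key, []).append(payload)
    for key in probe_side:               # probe: delete on match
        groups.pop(key, None)
        if not groups:                   # early termination
            break
    return groups                        # key -> surviving tuples

keep  = [(1, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
probe = [2, 5, 2, 7]
print(reversed_anti_join(keep, probe))   # {1: ['a', 'b'], 3: ['d']}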

 Does the hash cost estimator bias towards smaller hashes due to hash probe
 cost increasing with hash size due to processor caching effects?  It's not
 quite O(n) due to caching effects.

I don't think we account for that (and I'm not convinced we need to).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Scott Carey
New email-client nightmares!  Fixed below.  I think.
-

Sorry for resurrecting this thread, but this has been in my outbox for
months and I think it is important:

On Oct 27, 2010, at 12:56 PM, Tom Lane wrote:


 Scott Carey writes:
 Why does hashjoin behave poorly when the inner relation is not
 uniformly distributed and the outer is?

 Because a poorly distributed inner relation leads to long hash chains.
 In the very worst case, all the keys are on the same hash chain and it
 degenerates to a nested-loop join.  (There is an assumption in the
 costing code that the longer hash chains also tend to get searched more
 often, which maybe doesn't apply if the outer rel is flat, but it's not
 obvious that it's safe to not assume that.)


I disagree.  Either
1:  The estimator is wrong
or
2:  The hash data structure is flawed.

A pathological skew case (all relations with the same key), should be
_cheaper_ to probe.   There should be only _one_ entry in the hash (for
the one key), and that entry will be a list of all relations matching the
key.  Therefore, hash probes will either instantly fail to match on an
empty bucket, fail to match the one key with one compare, or match the one
key and join on the matching list.

In particular for anti-join, high skew should be the best case scenario.

A hash structure that allows multiple entries per key is inappropriate for
skewed data, because it is not O(n).  One that has one entry per key
remains O(n) for all skew.  Furthermore, the hash buckets and # of entries
is proportional to n_distinct in this case, and smaller and more cache and
memory friendly to probe.

 Not really.  It's still searching a long hash chain; maybe it will find
 an exact match early in the chain, or maybe not.  It's certainly not
 *better* than antijoin with a well-distributed inner rel.

There shouldn't be long hash chains.  A good hash function + proper bucket
count + one entry per key = no long chains.


 Although the
 point is moot, anyway, since if it's an antijoin there is only one
 candidate for which rel to put on the outside.

You can put either relation on the outside with an anti-join, but would
need a different algorithm and cost estimator if done the other way
around.  Construct a hash on the join key that keeps a list of relations
per key; iterate over the other relation, removing the key and
corresponding list from the hash when there is a match; when complete, the
remaining items in the hash are the result of the join (also already
grouped by the key).  It could be terminated early if all entries are
removed.  
This would be useful if the hash was small, the other side of the hash too
large to fit in memory, and alternative was a massive sort on the other
relation.

Does the hash cost estimator bias towards smaller hashes due to hash probe
cost increasing with hash size due to processor caching effects?  It's not
quite O(n) due to caching effects.


 regards, tom lane




Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 On Oct 27, 2010, at 12:56 PM, Tom Lane wrote:
 Because a poorly distributed inner relation leads to long hash chains.
 In the very worst case, all the keys are on the same hash chain and it
 degenerates to a nested-loop join.

 A pathological skew case (all relations with the same key), should be
 _cheaper_ to probe.

I think you're missing the point, which is that all the hash work is
just pure overhead in such a case (and it is most definitely not
zero-cost overhead).  You might as well just do a nestloop join.
Hashing is only beneficial to the extent that it allows a smaller subset
of the inner relation to be compared to each outer-relation tuple.
So I think biasing against skew-distributed inner relations is entirely
appropriate.
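
A back-of-the-envelope illustration of that point, with made-up row and
bucket counts (Python):

# Hashing only helps to the extent each probe is compared against a
# small subset of the inner relation.  Illustrative numbers only.

N_OUTER, N_INNER, N_BUCKETS = 1_000_000, 10_000, 1_024

print(f"nested loop:       {N_OUTER * N_INNER:>16,} compares")

# Uniform inner: each probe walks a chain of about N_INNER/N_BUCKETS.
print(f"hash, uniform:     {N_OUTER * N_INNER // N_BUCKETS:>16,} compares")

# Total skew: every inner tuple is on one chain.  If a fraction f of
# outer probes carries the skewed key (the costing code assumes long
# chains tend to get searched more often), those probes each walk the
# whole chain; the rest hit mostly empty buckets.
for f in (1 / N_BUCKETS, 0.1, 1.0):
    print(f"hash, skew, f={f:.4f}: {f * N_OUTER * N_INNER:>12,.0f} compares")

At f = 1.0 the hash join performs exactly the nested loop's compare
count, and the hashing work has bought nothing.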

regards, tom lane



Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Scott Carey


On 4/13/11 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Scott Carey sc...@richrelevance.com writes:
 On Oct 27, 2010, at 12:56 PM, Tom Lane wrote:
 Because a poorly distributed inner relation leads to long hash chains.
 In the very worst case, all the keys are on the same hash chain and it
 degenerates to a nested-loop join.

 A pathological skew case (all relations with the same key), should be
 _cheaper_ to probe.

I think you're missing the point, which is that all the hash work is
just pure overhead in such a case (and it is most definitely not
zero-cost overhead).  You might as well just do a nestloop join.
Hashing is only beneficial to the extent that it allows a smaller subset
of the inner relation to be compared to each outer-relation tuple.
So I think biasing against skew-distributed inner relations is entirely
appropriate.

No, it is not pure overhead, and a nested loop join is far slower.  The only
way it is the same is if there is only _one_ hash bucket!  And that would be
a bug...
In the pathological skew case:

Example:  1,000,000 outer relations.  10,000 inner relations, all with one
key.

Nested loops join:
10 billion compares.

Hashjoin with small inner relation hashed with poor hash data structure:
1. 10,000 hash functions to build the hash (10,000 'puts').
2. 1,000,000 hash functions to probe (1,000,000 'gets').
3. Only keys that fall in the same bucket trigger a compare.  Assume 100
hash buckets (any less is a bug, IMO) and skew such that the bucket is 10x
more likely than average to be hit. 100,000 hit the bucket.  Those that
match are just like nested loops -- this results in 1 billion compares.
All other probes hit an empty bucket and terminate without a compare.
Total: 1.01 million hash functions and bucket seeks, about 1% of which are
hash 'puts', + 1 billion compares.


Hashjoin with 'one entry per key; entry value is list of matching
relations' data structure:
1. 10,000 hash functions to build the hash (10,000 'puts').
2. 1,000,000 hash functions to probe (1,000,000 'gets').
3. Only keys that fall in the same bucket trigger a compare.  Assume 100
hash buckets and enough skew so that the bucket is 10x as likely to be
hit. 100,000 hit bucket.  Those that match only do a compare against one
key -- this results in 100,000 compares.
Total: 1.01 million hash functions and bucket seeks, about 1% of which are
slightly more expensive hash 'puts', + 0.1 million compares.



10 billion compares is much more expensive than either hash scenario.  If
a hash function is 5x the cost of a compare, and a hash 'put' is 2x a 'get',
then the costs are about:

10 billion,
1.006 billion,
~6 million

The cost of the actual join output is significant (pairing relations and
creating new relations for output) but close to constant in all three.
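
Restating that arithmetic as a quick script (same assumed cost ratios;
illustrative only):

# The three scenario costs above, in units of one compare.  Assumes a
# hash function costs 5x a compare and a 'put' costs 2x a 'get'.

OUTER, INNER, HIT = 1_000_000, 10_000, 100_000  # probes, puts, bucket hits
HASH_FN, PUT = 5, 2                             # relative costs

nested_loop = OUTER * INNER                     # 10 billion compares

hash_ops = HASH_FN * (PUT * INNER + OUTER)      # all puts and gets

poor_structure = hash_ops + HIT * INNER         # each hit walks all 10,000
per_key        = hash_ops + HIT * 1             # each hit compares one key

print(f"nested loop:  {nested_loop:>14,}")      # 10,000,000,000
print(f"poor hash:    {poor_structure:>14,}")   # ~1.005 billion
print(f"per-key hash: {per_key:>14,}")          # ~5.2 million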



In both the 'hash the big one' and 'hash the small one' case you have to
calculate the hash and seek into the hash table the same number of times.
10,000 hash calculations and 'puts' + 1,000,000 hash calculations and
'gets', versus 1,000,000 hash 'puts' and 10,000 'gets'.
But in one case that table is smaller and more cache efficient, making
those gets and puts cheaper.

Which relation is inner versus outer changes the number of buckets, which
can alter the number of expected compares -- but that can be controlled for
benefit, since the ratio of keys to buckets can be tuned.  If you choose the
smaller relation, you can afford to overcompensate with more buckets,
resulting in more probes on empty buckets and thus fewer compares.

Additionally, a hash structure that only has one entry per key can greatly
reduce the number of compares and make hashjoin immune to skew from the
cost perspective.  It also makes it so that choosing the smaller relation
over the big one to hash is always better provided the number of buckets
is chosen well.



regards, tom lane




Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-27 Thread Scott Carey

On Oct 26, 2010, at 8:48 PM, Tom Lane wrote:

 Robert Haas robertmh...@gmail.com writes:
 I'm also a bit suspicious of the fact that the hash condition has a
 cast to text on both sides, which implies, to me anyway, that the
 underlying data types are not text.  That might mean that the query
 planner doesn't have very good statistics, which might mean that the
 join selectivity estimates are wackadoo, which can apparently cause
 this problem:
 
 Um ... you're guilty of the same thing as the OP, ie not showing how
 you got this example.  But I'm guessing that it was something like
 
 create table little as select * from generate_series(1,10) a;
 create table big as select * from generate_series(1,100000) a;
 ... wait for auto-analyze of big ...
 explain select * from little, big where little.a = big.a;
 
 Here, big is large enough to prod autovacuum into analyzing it,
 whereas little isn't.  So when the planner runs, it sees
 
 (1) big is known to have 100000 rows, and big.a is known unique;
 (2) little is estimated to have many fewer rows, but nothing is
known about the distribution of little.a.
 
 In this situation, it's going to prefer to hash big, because hash join
 behaves pretty nicely when the inner rel is uniformly distributed and
 the outer not, but not nicely at all when it's the other way round.
 It'll change its mind as soon as you analyze little, but it doesn't
 like taking a chance on an unknown distribution.  See cost_hashjoin
 and particularly estimate_hash_bucketsize.

The type of hash table will make a difference in how it behaves with skew.  
Open addressing versus linking, etc. 

 
 I'm not convinced this explains Scott's results though --- the numbers
 he's showing don't seem to add up even if you assume a pretty bad
 distribution for the smaller rel.

Answering both messages partially:

The join is on varchar columns.  So no, the underlying types are not text;
they are cast to ::text because the join is between two slightly different
varchar declarations.

The small relation in this case is unique on the key.  But I think postgres
doesn't know that, because the unique index is on (id, name) and the query
filters for id = 12 in the example, leaving name unique.  But postgres does
not identify this as a unique condition for the key.  However, there are
only about 150 distinct values of id, so even if 'name' collides sometimes
across ids, there can be no more than 150 values that map to one key.

I gave a partial plan; the parent joins are sometimes anti-joins.  The general
query form is two statements from the temp table:
An update to a main table where the unique index keys match the temp (inner 
join)
An insert into the main table where the unique index keys do not exist (NOT 
EXISTS query, results in anti-join).  The large relation is the one being 
inserted/updated into the main table.  
Both have the same subplan that takes all the time in the middle -- a hash that 
hashes the large table and probes from the small side.  I am still completely 
confused on how the hashjoin is calculating costs.  In one of my examples it 
seems to be way off.  


Why does hashjoin behave poorly when the inner relation is not uniformly 
distributed and the outer is?  In particular for anti-join this should be the 
best case scenario.

My assumption is this:  Imagine the worst possible skew on the small table.  
Every row has the same key, and there are 20,000 entries in one list under 
one key.  The large table is in the outer relation, and probes for every 
element and has uniform distribution, 20 million -- one thousand rows for each 
of 20,000 keys.  
Inner Join: 
  The values that match the key join against the list.  There is no faster way 
to join once the list is identified.  It is essentially nested loops per 
matching key.  1000 matches each output 20,000 rows.  If the hashjoin was 
reversed, then the inner relation would be the large table and the outer 
relation would be the small table.  There would be 20,000 matches that each 
output 1000 rows.
Semi-Join:
  The same thing happens, but additionally rows that match nothing are emitted. 
 If the relation that is always kept is the large one, it makes more sense to 
hash the small.
Anti-Join:
  Two relations; I'll call one "select", the other "not exists".  The
"select" relation is the one we are keeping, but only if its key does not
match any key in the "not exists" relation.
Case 1:  The large uniform table is "select" and the small one "not exists"
  It is always optimal to have the small one as the inner hashed relation,
no matter what the skew.  If the inner relation is the "not exists" table,
only the key needs to be kept in the inner relation hash; the values or
number of matches do not matter, so skew is irrelevant and actually makes
it faster than an even distribution of keys.
Case 2:  The small relation is the "select"
  Using the large "not exists" relation as the inner relation works well,
as only the existence of a key needs to be kept.  Hashing the small
"select" relation 
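
A sketch of Case 1 (illustrative Python): when the "not exists" side is
hashed, only its distinct keys need to be stored, so even extreme skew
on that side collapses to a tiny hash and cannot hurt probe cost.

# Anti-join hashing the "not exists" side: only distinct keys matter.

def anti_join(select_side, not_exists_keys):
    blocked = set(not_exists_keys)      # skew collapses to few keys
    return [row for row in select_side if row[0] not in blocked]

select_side = [(k, 'payload') for k in range(5)]
not_exists  = [3] * 20_000 + [4] * 20_000   # extreme skew, 2 distinct keys

result = anti_join(select_side, not_exists)
print(len(result), sorted(r[0] for r in result))   # 3 [0, 1, 2]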

Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-27 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 Why does hashjoin behave poorly when the inner relation is not
 uniformly distributed and the outer is?

Because a poorly distributed inner relation leads to long hash chains.
In the very worst case, all the keys are on the same hash chain and it
degenerates to a nested-loop join.  (There is an assumption in the
costing code that the longer hash chains also tend to get searched more
often, which maybe doesn't apply if the outer rel is flat, but it's not
obvious that it's safe to not assume that.)

 In particular for anti-join this should be the best case scenario.

Not really.  It's still searching a long hash chain; maybe it will find
an exact match early in the chain, or maybe not.  It's certainly not
*better* than antijoin with a well-distributed inner rel.  Although the
point is moot, anyway, since if it's an antijoin there is only one
candidate for which rel to put on the outside.

regards, tom lane



Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-26 Thread Robert Haas
On Mon, Oct 18, 2010 at 9:40 PM, Scott Carey sc...@richrelevance.com wrote:
 8.4.5

 I consistently see HashJoin plans that hash the large table, and scan the 
 small table.  This is especially puzzling in some cases where I have 30M rows 
 in the big table and ~ 100 in the small... shouldn't it hash the small table 
 and scan the big one?

 Here is one case I saw just recently

                Hash Cond: ((a.e_id)::text = (ta.name)::text)
                ->  Index Scan using c_a_s_e_id on a  (cost=0.00..8.21 rows=14 width=27)
                      Index Cond: (id = 12)
                ->  Hash  (cost=89126.79..89126.79 rows=4825695 width=74)
                      ->  Seq Scan on p_a_1287446030 tmp  (cost=0.00..89126.79 rows=4825695 width=74)
                            Filter: (id = 12)

Can we have the complete EXPLAIN output here, please?  And the query?
For example, this would be perfectly sensible if the previous line
started with "Hash Semi Join" or "Hash Anti Join".

rhaas=# explain select * from little where exists (select * from big
where big.a = little.a);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Semi Join  (cost=3084.00..3478.30 rows=10 width=4)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

I'm also a bit suspicious of the fact that the hash condition has a
cast to text on both sides, which implies, to me anyway, that the
underlying data types are not text.  That might mean that the query
planner doesn't have very good statistics, which might mean that the
join selectivity estimates are wackadoo, which can apparently cause
this problem:

rhaas=# explain select * from little, big where little.a = big.a;
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=3084.00..3577.00 rows=2400 width=8)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..34.00 rows=2400 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

rhaas=# analyze;
ANALYZE
rhaas=# explain select * from little, big where little.a = big.a;
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=1.23..1819.32 rows=10 width=8)
   Hash Cond: (big.a = little.a)
   ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
         ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
(5 rows)

This doesn't appear to make a lot of sense, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm also a bit suspicious of the fact that the hash condition has a
 cast to text on both sides, which implies, to me anyway, that the
 underlying data types are not text.  That might mean that the query
 planner doesn't have very good statistics, which might mean that the
 join selectivity estimates are wackadoo, which can apparently cause
 this problem:

Um ... you're guilty of the same thing as the OP, ie not showing how
you got this example.  But I'm guessing that it was something like

create table little as select * from generate_series(1,10) a;
create table big as select * from generate_series(1,100000) a;
... wait for auto-analyze of big ...
explain select * from little, big where little.a = big.a;

Here, big is large enough to prod autovacuum into analyzing it,
whereas little isn't.  So when the planner runs, it sees

(1) big is known to have 100000 rows, and big.a is known unique;
(2) little is estimated to have many fewer rows, but nothing is
known about the distribution of little.a.

In this situation, it's going to prefer to hash big, because hash join
behaves pretty nicely when the inner rel is uniformly distributed and
the outer not, but not nicely at all when it's the other way round.
It'll change its mind as soon as you analyze little, but it doesn't
like taking a chance on an unknown distribution.  See cost_hashjoin
and particularly estimate_hash_bucketsize.
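
The flavor of that logic, as a loose sketch (illustrative Python, not
the actual C in estimate_hash_bucketsize or cost_hashjoin): the
expected bucket fraction is inflated by how much the most common
value's frequency exceeds the average, which is what penalizes a
skewed inner relation.

def est_bucket_fraction(ndistinct, nbuckets, mcv_freq):
    avg_freq = 1.0 / ndistinct
    frac = 1.0 / min(ndistinct, nbuckets)   # average bucket, no skew
    if mcv_freq > avg_freq:                 # bias toward long chains
        frac *= mcv_freq / avg_freq
    return min(frac, 1.0)

def probe_compare_cost(outer_rows, inner_rows, frac):
    # stand-in for the per-probe compare charge: half a bucket on average
    return outer_rows * inner_rows * frac * 0.5

# Uniform inner: 10,000 distinct keys, MCV no more common than average.
print(probe_compare_cost(1_000_000, 10_000,
                         est_bucket_fraction(10_000, 1024, 1e-4)))
# ~4.9 million

# Skewed inner: one value holds 50% of the rows -> nested-loop territory.
print(probe_compare_cost(1_000_000, 10_000,
                         est_bucket_fraction(10_000, 1024, 0.5)))
# ~5 billion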

I'm not convinced this explains Scott's results though --- the numbers
he's showing don't seem to add up even if you assume a pretty bad
distribution for the smaller rel.

regards, tom lane



Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-19 Thread Scott Carey

On Oct 18, 2010, at 8:43 PM, Tom Lane wrote:

 Scott Carey sc...@richrelevance.com writes:
 I consistently see HashJoin plans that hash the large table, and scan
 the small table.
 
 Could we see a self-contained test case?  And what cost parameters are
 you using, especially work_mem?

I'll see if I can make a test case.  

Tough to do since I catch these on a production machine when a query is taking 
a long time, and some of the tables are transient.
work_mem is 800MB; I tried 128K and it didn't matter.  It will switch to merge 
join at some point, but not to a smaller hash, and the merge join is definitely 
slower.

 
 This is especially puzzling in some cases where I have 30M rows in the big 
 table and ~ 100 in the small... shouldn't it hash the small table and scan 
 the big one?
 
 Well, size of the table isn't the only factor; in particular, a highly
 nonuniform distribution of the key value will inflate the cost estimate
 for using a table on the inner size of the hash.  But the example you
 show here seems a bit extreme.
 

In the case today, the index scan returns unique values, and the large table 
has only a little skew on the join key.

Another case I ran into a few weeks ago is odd. 
(8.4.3 this time)

rr=# explain INSERT INTO pav2 (p_id, a_id, values, last_updated, s_id) SELECT 
av.p_id, av.a_id, av.values, av.last_updated, a.s_id 
FROM pav av, attr a where av.a_id = a.id;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=2946093.92..631471410.73 rows=1342587125 width=69)
   Hash Cond: (a.id = av.a_id)
   ->  Seq Scan on attr a  (cost=0.00..275.21 rows=20241 width=8)
   ->  Hash  (cost=1200493.44..1200493.44 rows=70707864 width=65)
         ->  Seq Scan on pav av  (cost=0.00..1200493.44 rows=70707864 width=65)

If the cost to hash is 1200493, and it needs to probe the hash 20241 times, why 
would the total cost be 631471410?  The cost to probe can't be that big! A cost 
of 500 to probe and join?  
Why favor hashing the large table and probing with the small values rather than 
the other way around?

In this case, I turned enable_mergejoin off in a test because it was deciding 
to sort the 70M rows instead of hash 20k rows and scan 70M, and then got this 
'backwards' hash.  The merge join is much slower, but the cost estimate is much 
less and no combination of cost parameters will make it switch (both estimates 
are affected up and down similarly by the cost parameters).

Both tables analyzed, etc.  One of them is a bulk operation staging table with 
no indexes (the big one), but it is analyzed.  The (av.p_id, av.a_id) pair is 
unique in it. a.id is unique (primary key). The above thinks it is going to 
match 20 times on average (but it actually matches only 1 -- PK join).  av.a_id 
is somewhat skewed, but that is irrelevant if it always matches one.  Even if 
it did match 20 on average, is it worse to probe a hash table 70M times and 
retrieve 20 matches each time than probe 20k times and retrieve ~66,000 matches 
each time?  It's the same number of hash function calls and comparisons, but a 
different memory profile.
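
For reference, the rough footprint of the two choices, using the widths
from the plan above (illustrative arithmetic, per-tuple overhead
ignored): the probe and put counts match either way, but only one of
the two hash tables fits in cache.

# Rough hash-table sizes for the two candidate inner relations.

big_rows, big_width     = 70_707_864, 65    # pav: rows=70707864 width=65
small_rows, small_width = 20_241, 8         # attr: rows=20241 width=8

print(f"hash big:   {big_rows * big_width / 2**30:6.2f} GiB")     # ~4.28 GiB
print(f"hash small: {small_rows * small_width / 2**10:6.1f} KiB") # ~158.1 KiB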




   regards, tom lane




Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-19 Thread Mladen Gogala

Scott Carey wrote:


If the cost to hash is 1200493, and it needs to probe the hash 20241 times, why would the total cost be 631471410?  The cost to probe can't be that big! A cost of 500 to probe and join?  
Why favor hashing the large table and probing with the small values rather than the other way around?
  



May I ask a stupid question: how is the query cost calculated? What are 
the units? I/O requests? CPU cycles? Monopoly money?



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-19 Thread Kevin Grittner
Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
 how is the query cost calculated? What are 
 the units? I/O requests? CPU cycles? Monopoly money?
 
http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
 
-Kevin



[PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-18 Thread Scott Carey
8.4.5

I consistently see HashJoin plans that hash the large table, and scan the small 
table.  This is especially puzzling in some cases where I have 30M rows in the 
big table and ~ 100 in the small... shouldn't it hash the small table and scan 
the big one?

Here is one case I saw just recently

   Hash Cond: ((a.e_id)::text = (ta.name)::text)
   ->  Index Scan using c_a_s_e_id on a  (cost=0.00..8.21 rows=14 width=27)
         Index Cond: (id = 12)
   ->  Hash  (cost=89126.79..89126.79 rows=4825695 width=74)
         ->  Seq Scan on p_a_1287446030 tmp  (cost=0.00..89126.79 rows=4825695 width=74)
               Filter: (id = 12)

Does this ever make sense?  Isn't it always better to hash the smaller side of 
the join, or at least predominantly so?  Maybe if you want the order of 
elements returned from the join to coincide with the order of the outer part 
of the join for a join higher up the plan tree.  In this specific case, I want 
the order to be based on the larger table for the join higher up (not shown) in 
the plan so that its index scan is in the order that tmp already is.

Certainly, for very small hash tables (< 1000 entries) the cache effects 
strongly favor small tables -- the lookup should be very cheap.  Building a 
very large hash is not cheap, and wastes lots of memory.  I suppose at very 
large sizes something else might come into play that favors hashing the bigger 
table, but I can't think of what that would be for the general case.

Any ideas?  I've seen this with dozens of queries, some simple, some with 5 or 
6 tables and joins.  I even tried making work_mem very small in a 30M row to 
500 row join, and it STILL hashed the big table.  At first I thought that I was 
reading the plan wrong, but google suggests it's doing what it looks like it's 
doing.  Perhaps this is a bug?


Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-18 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 I consistently see HashJoin plans that hash the large table, and scan
 the small table.

Could we see a self-contained test case?  And what cost parameters are
you using, especially work_mem?

 This is especially puzzling in some cases where I have 30M rows in the big 
 table and ~ 100 in the small... shouldn't it hash the small table and scan 
 the big one?

Well, size of the table isn't the only factor; in particular, a highly
nonuniform distribution of the key value will inflate the cost estimate
for using a table on the inner size of the hash.  But the example you
show here seems a bit extreme.

regards, tom lane
