Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Jie Li
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I had an epiphany about this topic, or actually two of them.
 
  1. Whether or not you think there's a significant performance reason
  to support hash right joins, there's a functionality reason.  The
  infrastructure for right join could just as easily do full joins.
  And AFAICS, a hash full join would only require one hashable join
  clause --- the other FULL JOIN ON conditions could be anything at all.
  This is unlike the situation for merge join, where all the JOIN ON
  conditions have to be mergeable or it doesn't work right.  So we could
  greatly reduce the scope of the dreaded "FULL JOIN is only supported
  with merge-joinable join conditions" error.  (Well, okay, it's not
  *that* dreaded, but people complain about it occasionally.)

 Yeah, that would be neat.  It might be a lot faster in some cases, too.


Yeah, PostgreSQL should have this great feature.

Actually, Oracle 10g already has the right hash join:
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

And Oracle 11g has the full hash join:
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

I haven't checked whether other DBMSs have this feature.
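
Here is a minimal sketch of that mechanism (in Python rather than
PostgreSQL's C, with hypothetical names): only the equality clause on the
hash key must be hashable; the residual ON conditions can be an arbitrary
predicate, and a per-tuple "matched" flag in the hash table is all it takes
to extend a left hash join into a right or full one.

from collections import defaultdict

def hash_full_join(outer_rows, inner_rows, outer_key, inner_key,
                   extra_cond=lambda o, i: True):
    """Full join on one hashable equality clause; extra_cond stands in
    for the arbitrary residual ON conditions."""
    # Build phase: hash the inner side. Each entry carries a matched
    # flag -- the only extra state a right/full hash join needs.
    buckets = defaultdict(list)
    for i in inner_rows:
        buckets[inner_key(i)].append([i, False])

    # Probe phase: behaves exactly like a hash left join.
    for o in outer_rows:
        found = False
        for entry in buckets.get(outer_key(o), ()):
            if extra_cond(o, entry[0]):   # residual qual can be anything
                entry[1] = True           # this inner tuple has matched
                found = True
                yield (o, entry[0])
        if not found:
            yield (o, None)               # unmatched outer tuple, null-extended

    # Finish phase: emit inner tuples that never matched, null-extended
    # on the outer side. Skipping this pass gives a left join; emitting
    # only the matches plus this pass gives a right join.
    for entries in buckets.values():
        for row, matched in entries:
            if not matched:
                yield (None, row)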

Thanks,
Li Jie


Re: [HACKERS] small table left outer join big table

2010-12-29 Thread Jie Li
On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
  Robert Haas robertmh...@gmail.com writes:
   On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
   It's not a bug, that's the way it currently works. We don't need a test
   case for that.
 
   Oh, you're right.  I missed the fact that it's a left join.
 
  The only thing that struck me as curious about it was that the OP didn't
  get a nestloop-with-inner-indexscan plan.  That would be explainable if
  there was no index on the large table's id column ... but columns
  named like that usually have indexes.
 
  I can't get all *that* excited about complicating hash joins as
  proposed.  The query is still fundamentally going to be slow because
  you won't get out of having to seqscan the large table.  The only way
  to make it really fast is to not read all of the large table, and
  nestloop-with-inner-indexscan is the only plan type with a hope of
  doing that.

 Seq scanning the big table isn't bad... we've gone to a lot of trouble
 to make it easy to do this, especially with many users.

 Maintaining many large indexes is definitely bad, all that random I/O is
 going to suck badly.

 Seems like an interesting and relatively simple optimisation to me. Not
 sure if this is a feature request, or a proposal to write the optimisation.
 I hope it's the latter.


Thanks for your comments.  Yeah, I'm excited to write code for PostgreSQL,
but I'm new here and not yet familiar with the codebase or the patch
submission process. I will try to learn in the near future. So for the
moment it is a feature request, and I'm looking forward to any
pgsql-hackers working on this.
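
To make Tom's nestloop-with-inner-indexscan point concrete, a toy Python
sketch (hypothetical names, with a dict standing in for a pre-existing
index on big_table.id): execution does one index probe per small-table row,
so it never has to read the whole big table.

def nestloop_indexscan_left_join(small, big_index, key=lambda r: r[0]):
    # big_index stands in for an index that already exists on big_table.id,
    # so running the query never scans the big table itself.
    for s in small:
        matches = big_index.get(key(s))   # one index probe per outer row
        if matches:
            for b in matches:
                yield (s, b)
        else:
            yield (s, None)               # left join keeps unmatched outer rows

# The index is maintained ahead of time (at INSERT), not built per query:
big_index = {}
for row in [(1, 'x'), (2, 'y'), (2, 'z')]:
    big_index.setdefault(row[0], []).append(row)

print(list(nestloop_indexscan_left_join([(2, 'a'), (9, 'b')], big_index)))
# [((2, 'a'), (2, 'y')), ((2, 'a'), (2, 'z')), ((9, 'b'), None)]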

Thanks,
Li Jie


[HACKERS] small table left outer join big table

2010-12-28 Thread Jie Li
Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using (id);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
   Hash Cond: (small_table.id = big_table.id)
   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
   ->  Hash  (cost=59142.00..59142.00 rows=410 width=8)
         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=410 width=8)
(5 rows)

Here is my puzzle: why not choose the small table to build the hash table?
That would avoid multiple batches and thus save significant I/O cost,
wouldn't it?

We can perform this query in two phases:
1) an inner join, using the small table to build the hash table;
2) a check of whether each tuple in the hash table was matched before,
which can be done with an extra flag bit.

The only compromise is the output order, due to the two separate phases.
I'm not sure whether the SQL standard requires it.
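
A small Python sketch of those two phases (hypothetical names; toy rows as
tuples keyed on the first column), including the flag bit proposed in 2):

from collections import defaultdict

def left_join_build_on_outer(small, big, key=lambda r: r[0]):
    # Phase 1: inner join, hashing the *outer* (small) side, so the hash
    # table fits in memory and no batching is needed.
    table = defaultdict(list)
    for s in small:
        table[key(s)].append([s, False])
    for b in big:
        for entry in table.get(key(b), ()):
            entry[1] = True               # this small-table tuple has a match
            yield (entry[0], b)
    # Phase 2: null-extend the small-table tuples that never matched.
    # These come out last, which is the output-order compromise above.
    for entries in table.values():
        for s, matched in entries:
            if not matched:
                yield (s, None)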

Thanks,
Li Jie


Re: [HACKERS] Why is sorting on two columns so much slower than sorting on one column?

2010-12-26 Thread Jie Li
On Thu, Dec 23, 2010 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Kenneth Marshall k...@rice.edu writes:
  On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
  But in the last query that sorts on id, since the query selects all the
  columns for output, the actual sorted size is the same, and the only
  difference is the comparison cost. The query sorting on two columns needs
  to do twice the comparison work. Am I right?

  I think you are right. Sorry for the confusion.

 I doubt the cost of comparing two integers is the issue here; rather
 it's more likely one of how many merge passes were needed.  You could
 find out instead of just speculating by turning on trace_sort and
 comparing the log outputs.

regards, tom lane


I followed your advice and set trace_sort; here are the results of the two
queries. The log outputs look similar, and they seem to use the same number
of tapes, but the times are quite different. Could you have a look?

postgres=# explain analyze select * from big_wf order by id;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual time=28102.985..39351.309 rows=410 loops=1)
   Sort Key: id
   Sort Method:  external merge  Disk: 72048kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual time=9.190..7262.789 rows=410 loops=1)
 Total runtime: 42953.855 ms
(5 rows)
LOG:  begin tuple sort: nkeys = 1, workMem = 20480, randomAccess = f
LOG:  switching to external sort with 74 tapes: CPU 0.29s/0.28u sec elapsed 0.71 sec
LOG:  finished writing run 1 to tape 0: CPU 0.68s/2.12u sec elapsed 3.41 sec
LOG:  finished writing run 2 to tape 1: CPU 1.22s/4.24u sec elapsed 6.53 sec
LOG:  finished writing run 3 to tape 2: CPU 1.67s/6.38u sec elapsed 9.67 sec
LOG:  finished writing run 4 to tape 3: CPU 2.22s/8.61u sec elapsed 12.93 sec
LOG:  finished writing run 5 to tape 4: CPU 2.72s/10.80u sec elapsed 16.09 sec
LOG:  finished writing run 6 to tape 5: CPU 3.23s/12.86u sec elapsed 19.12 sec
LOG:  finished writing run 7 to tape 6: CPU 3.81s/15.02u sec elapsed 23.84 sec
LOG:  finished writing run 8 to tape 7: CPU 4.36s/17.13u sec elapsed 27.05 sec
LOG:  performsort starting: CPU 4.38s/17.20u sec elapsed 27.15 sec
LOG:  finished writing run 9 to tape 8: CPU 4.39s/17.88u sec elapsed 27.97 sec
LOG:  finished writing final run 10 to tape 9: CPU 4.39s/17.88u sec elapsed 27.97 sec
LOG:  performsort done (except 10-way final merge): CPU 4.39s/17.98u sec elapsed 28.10 sec
LOG:  external sort ended, 9006 disk blocks used: CPU 8.01s/27.02u sec elapsed 42.92 sec


postgres=# explain analyze select * from big_wf order by age,id;
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual time=43709.851..57048.645 rows=410 loops=1)
   Sort Key: age, id
   Sort Method:  external merge  Disk: 72048kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual time=10.090..7075.208 rows=410 loops=1)
 Total runtime: 60721.824 ms

LOG:  begin tuple sort: nkeys = 2, workMem = 20480, randomAccess = f
LOG:  switching to external sort with 74 tapes: CPU 0.28s/0.30u sec elapsed 0.67 sec
LOG:  finished writing run 1 to tape 0: CPU 0.71s/3.63u sec elapsed 4.90 sec
LOG:  finished writing run 2 to tape 1: CPU 1.30s/7.53u sec elapsed 10.30 sec
LOG:  finished writing run 3 to tape 2: CPU 1.88s/11.36u sec elapsed 15.39 sec
LOG:  finished writing run 4 to tape 3: CPU 2.43s/15.20u sec elapsed 20.51 sec

[HACKERS] Why is sorting on two columns so much slower than sorting on one column?

2010-12-23 Thread Jie Li
Hi,

Here is the test table,

postgres=# \d big_wf
    Table "public.big_wf"
 Column |  Type   | Modifiers
--------+---------+-----------
 age    | integer |
 id     | integer |

postgres=# \dt+ big_wf
                    List of relations
 Schema |  Name  | Type  |  Owner   |  Size  | Description
--------+--------+-------+----------+--------+-------------
 public | big_wf | table | workshop | 142 MB |


The first query sorting on one column:
postgres=# explain analyze select * from big_wf order by age;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual time=11228.155..16427.149 rows=410 loops=1)
   Sort Key: age
   Sort Method:  external sort  Disk: 72112kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual time=6.196..4797.620 rows=410 loops=1)
 Total runtime: 19530.452 ms
(5 rows)

The second query sorting on two columns:
postgres=# explain analyze select * from big_wf order by age,id;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual time=37544.779..48206.702 rows=410 loops=1)
   Sort Key: age, id
   Sort Method:  external merge  Disk: 72048kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual time=6.796..5518.663 rows=410 loops=1)
 Total runtime: 51258.000 ms
(5 rows)

The version is 9.0.1 and work_mem is 20MB. One special thing is that the
first column (age) has the same value in every tuple, so the second
column (id) is always needed for comparison.  While the first sort takes
only about 6 seconds, the second one takes over 30 seconds.  Is this slower
than expected? Is there any possible optimization?
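
One rough way to see the premise in isolation, outside PostgreSQL, is a toy
Python comparison: sort the same rows on one key and then on two keys, with
the first key constant so every two-key comparison has to fall through to
the second column. This only illustrates the comparison-cost hypothesis;
the real difference may also involve how the external sort proceeds, as Tom
suggests in his reply.

import random, time

# Constant first column (like age above), random second column (like id);
# 2 million toy rows, not the actual table.
rows = [(1, random.randrange(10_000_000)) for _ in range(2_000_000)]

t0 = time.perf_counter()
sorted(rows, key=lambda r: r[0])           # one sort key, all values equal
t1 = time.perf_counter()
sorted(rows, key=lambda r: (r[0], r[1]))   # two keys: the tie on r[0]
t2 = time.perf_counter()                   # forces comparing r[1] every time

print(f"one key:  {t1 - t0:.2f}s")
print(f"two keys: {t2 - t1:.2f}s")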

Thanks,
Li Jie


[HACKERS] Why is percent_rank so much slower than rank?

2010-12-09 Thread Jie Li
Hi all,

I'm new to window functions. Recently I ran some simple queries and was
surprised to find that percent_rank is so much slower than rank; could
anybody tell me why?

The table schema:
test=# \d inventory1
         Table "public.inventory1"
        Column        |  Type   | Modifiers
----------------------+---------+-----------
 inv_date_sk          | integer | not null
 inv_item_sk          | integer | not null
 inv_warehouse_sk     | integer | not null
 inv_quantity_on_hand | integer |

test=# \dt+ inventory1
                     List of relations
 Schema |    Name    | Type  |  Owner   |  Size   | Description
--------+------------+-------+----------+---------+-------------
 public | inventory1 | table | workshop | 8880 kB |

The rank query result:
test=# explain analyze select inv_date_sk,inv_item_sk, rank() over (partition by inv_date_sk order by inv_item_sk) from inventory1;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19563.99..23343.99 rows=189000 width=8) (actual time=631.947..1361.158 rows=189000 loops=1)
   ->  Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual time=631.924..771.990 rows=189000 loops=1)
         Sort Key: inv_date_sk, inv_item_sk
         Sort Method:  quicksort  Memory: 12218kB
         ->  Seq Scan on inventory1  (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.055..198.948 rows=189000 loops=1)
 Total runtime: 1500.193 ms
(6 rows)

The percent_rank result:
test=# explain analyze select inv_date_sk,inv_item_sk, percent_rank() over (partition by inv_date_sk order by inv_item_sk) from inventory1;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19563.99..23343.99 rows=189000 width=8) (actual time=766.432..32924.804 rows=189000 loops=1)
   ->  Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual time=756.320..905.407 rows=189000 loops=1)
         Sort Key: inv_date_sk, inv_item_sk
         Sort Method:  quicksort  Memory: 12218kB
         ->  Seq Scan on inventory1  (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.102..224.607 rows=189000 loops=1)
 Total runtime: 33152.188 ms
(6 rows)

One special thing is that all the values of the partition key (inv_date_sk)
are the same, that is, there is only one window partition. I find that
percent_rank needs to buffer all the tuples to get the total number of rows.
But why is it so expensive?

I'm using 8.4.4, and I've only increased work_mem to 100MB, leaving the
other parameters untouched.
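
For what the definition itself forces, a small Python sketch (hypothetical
names): rank() depends only on the current peer group, but percent_rank()
is (rank - 1) / (partition row count - 1), so the partition's total row
count must be known, and the whole partition has to be buffered before the
first result row can go out.

def rank_and_percent_rank(partition, order_key):
    rows = sorted(partition, key=order_key)      # one window partition
    n = len(rows)                                # known only after buffering
    out, rank = [], 0
    for i, row in enumerate(rows):
        if i == 0 or order_key(row) != order_key(rows[i - 1]):
            rank = i + 1                         # peer rows share a rank
        pct = (rank - 1) / (n - 1) if n > 1 else 0.0
        out.append((row, rank, pct))
    return out

# Example: three rows, one partition, ordered by the second column.
print(rank_and_percent_rank([(1, 10), (1, 10), (1, 20)],
                            order_key=lambda r: r[1]))
# [((1, 10), 1, 0.0), ((1, 10), 1, 0.0), ((1, 20), 3, 1.0)]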

Thanks,
Li Jie