subject:"\[PERFORM\] slow joining very large table to smaller ones"

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-18 Thread Dawid Kuroczko

On 7/15/05, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 14, 2005 at 16:29:58 -0600,
>   Dan Harris <[EMAIL PROTECTED]> wrote:
> >
> > Ok, I tried this one.  My ssh keeps getting cut off by a router
> > somewhere between me and the server due to inactivity timeouts, so
> > all I know is that both the select and explain analyze are taking
> > over an hour to run.  Here's the explain select for that one, since
> > that's the best I can get.
> 
> Are you using NAT at home? That's probably where the issue is. If you
> have control of that box you can probably increase the timeout to a
> couple of hours.

Some versions of ssh have such a configuration option (in .ssh/config):

Host *
ServerAliveInterval 600

...it means that ssh will send a "ping" packet to a sshd every 10 minutes
of inactivity.  This way NAT will see activity and won't kill the session.
I'm using OpenSSH_4.1p1 for this...

Oh, and it doesn't have anything to do with TCP keep alive, which is
rather for finding dead connections than keeping connections alive. ;)

   Regards,
   Dawid

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread John A Meinel

Dan Harris wrote:
>
> On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:
>
>>
>> My biggest question is why the planner things the Nested Loop would be
>> so expensive.
>> Have you tuned any of the parameters? It seems like something is  out of
>> whack. (cpu_tuple_cost, random_page_cost, etc...)
>>
>
> here's some of my postgresql.conf.  Feel free to blast me if I did
> something idiotic here.
>
> shared_buffers = 5
> effective_cache_size = 1348000
> random_page_cost = 3
> work_mem = 512000

Unless you are the only person connecting to this database, your
work_mem is very high. And if you haven't modified maintenance_work_mem
it is probably very low. work_mem might be causing postgres to think it
can fit all of a merge into ram, making it faster, I can't say for sure.

> max_fsm_pages = 8

This seems high, but it depends how many updates/deletes you get
in-between vacuums. It may not be too excessive. VACUUM [FULL] VERBOSE
replies with how many free pages are left, if you didn't use that
already for tuning. Though it should be tuned based on a steady state
situation. Not a one time test.

> log_min_duration_statement = 6
> fsync = true ( not sure if I'm daring enough to run without this )
> wal_buffers = 1000
> checkpoint_segments = 64
> checkpoint_timeout = 3000
>

These seem fine to me.

Can you include the output of EXPLAIN SELECT both with and without SET
join_collapselimit? Since your tables have grown, I can't compare the
estimated number of rows, and costs very well.

EXPLAIN without ANALYZE is fine, since I am wondering what the planner
is thinking things cost.

John
=:->

>
> # FOR PG_AUTOVACUUM --#
> stats_command_string = true
> stats_row_level = true
>

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread PFC




Ok, I tried this one.  My ssh keeps getting cut off by a router
somewhere between me and the server due to inactivity timeouts, so
all I know is that both the select and explain analyze are taking
over an hour to run.  Here's the explain select for that one, since
that's the best I can get.


one word : screen !
one of the most useful little command line utilities...

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread Bruno Wolff III

On Thu, Jul 14, 2005 at 16:29:58 -0600,
  Dan Harris <[EMAIL PROTECTED]> wrote:
> 
> Ok, I tried this one.  My ssh keeps getting cut off by a router  
> somewhere between me and the server due to inactivity timeouts, so  
> all I know is that both the select and explain analyze are taking  
> over an hour to run.  Here's the explain select for that one, since  
> that's the best I can get.

Are you using NAT at home? That's probably where the issue is. If you
have control of that box you can probably increase the timeout to a
couple of hours.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread Dan Harris



On Jul 15, 2005, at 9:09 AM, Dan Harris wrote:



On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:



My biggest question is why the planner things the Nested Loop  
would be

so expensive.
Have you tuned any of the parameters? It seems like something is  
out of

whack. (cpu_tuple_cost, random_page_cost, etc...)




here's some of my postgresql.conf.  Feel free to blast me if I did  
something idiotic here.


shared_buffers = 5
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 8
log_min_duration_statement = 6
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000


# FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true



Sorry, I forgot to re-post my hardware specs.

HP DL585
4 x 2.2 GHz Opteron
12GB RAM
SmartArray RAID controller, 1GB hardware cache, 4x73GB 10k SCSI in  
RAID 0+1

ext2 filesystem

Also, there are 30 databases on the machine, 27 of them are identical  
schemas. 
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread Dan Harris



On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:


My biggest question is why the planner things the Nested Loop would be
so expensive.
Have you tuned any of the parameters? It seems like something is  
out of

whack. (cpu_tuple_cost, random_page_cost, etc...)



here's some of my postgresql.conf.  Feel free to blast me if I did  
something idiotic here.


shared_buffers = 5
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 8
log_min_duration_statement = 6
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000


# FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread John A Meinel

Dan Harris wrote:
>
> On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
>
>>
>>
>> Is the distribution of your rows uneven? Meaning do you have more rows
>> with a later id than an earlier one?
>>
>
> There are definitely some id's that will have many times more than  the
> others.  If I group and count them, the top 10 are fairly  dominant in
> the table.

That usually skews the estimates. Since the estimate is more of an
average (unless the statistics are higher).

>
>>>
>>
>> Hmm.. How to do it permanantly? Well you could always issue "set
>> join_collapse set 1; select * from "
>> But obviously that isn't what you prefer. :)
>>
>> I think there are things you can do to make merge join more expensive
>> than a nested loop, but I'm not sure what they are.
>
>
> Maybe someone else has some ideas to encourage this behavior for  future
> work?  Setting it on a per-connection basis is doable, but  would add
> some burden to us in code.

My biggest question is why the planner things the Nested Loop would be
so expensive.
Have you tuned any of the parameters? It seems like something is out of
whack. (cpu_tuple_cost, random_page_cost, etc...)

>
>>
>> What I really don't understand is that the estimates dropped as well.
>> The actual number of estimate rows drops to 3k instead of > 1M.
>> The real question is why does the planner think it will be so  expensive?
>>
>>
>>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>>> and k_r.id=94;
>>> count
>>> ---
>>>373
>>>
>>>
>>
>> Well, this says that they are indeed much more selective.
>> Each one has > 1k rows, but together you end up with only 400.
>>
>
> Is this a bad thing?  Is this not "selective enough" to make it much
> faster?

Yes, being more selective is what makes it faster. But the planner
doesn't seem to notice it properly.

>
> Overall, I'm much happier now after seeing the new plan come about,  if
> I can find a way to make that join_collapse behavior permanent, I  can
> certainly live with these numbers.
>

I'm sure there are pieces to tune, but I've reached my limits of
parameters to tweak :)

> Thanks again for your continued efforts.
>
> -Dan
>

John
=:->


signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Dan Harris



On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:



Is the distribution of your rows uneven? Meaning do you have more rows
with a later id than an earlier one?



There are definitely some id's that will have many times more than  
the others.  If I group and count them, the top 10 are fairly  
dominant in the table.




Hmm.. How to do it permanantly? Well you could always issue "set
join_collapse set 1; select * from "
But obviously that isn't what you prefer. :)

I think there are things you can do to make merge join more expensive
than a nested loop, but I'm not sure what they are.


Maybe someone else has some ideas to encourage this behavior for  
future work?  Setting it on a per-connection basis is doable, but  
would add some burden to us in code.




What I really don't understand is that the estimates dropped as well.
The actual number of estimate rows drops to 3k instead of > 1M.
The real question is why does the planner think it will be so  
expensive?




select count(*) from k_b join k_r using (incidentid) where k_b.id=107
and k_r.id=94;
count
---
   373




Well, this says that they are indeed much more selective.
Each one has > 1k rows, but together you end up with only 400.



Is this a bad thing?  Is this not "selective enough" to make it much  
faster?


Overall, I'm much happier now after seeing the new plan come about,  
if I can find a way to make that join_collapse behavior permanent, I  
can certainly live with these numbers.


Thanks again for your continued efforts.

-Dan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Dan Harris



On Jul 14, 2005, at 5:12 PM, John A Meinel wrote:


Dan Harris wrote:




Well, postgres is estimating around 500 rows each, is that way  
off?  Try

just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;



Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly,  
and we

don't need to do anything else.



Ok, sorry I missed these the first time through:

explain analyze select incidentid from k_b where id = 107;
   QUERY PLAN
 

Index Scan using k_b_idx on k_b  (cost=0.00..1926.03 rows=675  
width=14) (actual time=0.042..298.394 rows=2493 loops=1)

   Index Cond: (id = 107)
Total runtime: 299.103 ms

select count(*) from k_b;
count

698350

( sorry! I think I said this one only had tens of thousands in it )


explain analyze select incidentid from k_r where id =  
94;   QUERY PLAN
 
-
Index Scan using k_r_idx on k_r  (cost=0.00..2137.61 rows=757  
width=14) (actual time=0.092..212.187 rows=10893 loops=1)

   Index Cond: (id = 94)
Total runtime: 216.498 ms
(3 rows)


select count(*) from k_r;
count

671670


That one is quite a bit slower, yet it's the same table structure and  
same index as k_b, also it has fewer records.


I did run VACUUM ANALYZE immediately before running these queries.   
It seems a lot better with the join_collapse set.




\
Well, the planner is powerful enough to flatten nested selects. To  
make

it less "intelligent" you can do:
SET join_collapse_limit 1;
or
SET join_collapse_limit 0;
Which should tell postgres to not try and get tricky with your query.
Again, *usually* the planner knows better than you do. So again  
just do

it to see what you get.



Ok, when join_collapse_limit = 1 I get this now:

explain analyze select recordtext from eventactivity join ( select  
incidentid from k_r join k_b using (incidentid) where k_r.id = 94 and  
k_b.id = 107 ) a  using (incidentid );
   
QUERY PLAN
 
---
Nested Loop  (cost=0.00..156509.08 rows=2948 width=35) (actual  
time=1.555..340.625 rows=24825 loops=1)
   ->  Nested Loop  (cost=0.00..5361.89 rows=6 width=28) (actual  
time=1.234..142.078 rows=366 loops=1)
 ->  Index Scan using k_b_idx on k_b  (cost=0.00..1943.09  
rows=681 width=14) (actual time=0.423..56.974 rows=2521 loops=1)

   Index Cond: (id = 107)
 ->  Index Scan using k_r_idx on k_r  (cost=0.00..5.01  
rows=1 width=14) (actual time=0.031..0.031 rows=0 loops=2521)
   Index Cond: ((k_r.id = 94) AND  
((k_r.incidentid)::text = ("outer".incidentid)::text))
   ->  Index Scan using eventactivity1 on eventactivity   
(cost=0.00..25079.55 rows=8932 width=49) (actual time=0.107..0.481  
rows=68 loops=366)
 Index Cond: ((eventactivity.incidentid)::text =  
("outer".incidentid)::text)

Total runtime: 347.975 ms

MUCH better!  Maybe you can help me understand what I did and if I  
need to make something permanent to get this behavior from now on?








If you have analyzed recently can you do:
SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';

It is a cheaper form than "SELECT count(*) FROM eventactivity" to  
get an

approximate estimate of the number of rows. But if it isn't too
expensive, please also give the value from SELECT count(*) FROM
eventactivity.

Again, that helps us know if your tables are up-to-date.



Sure:

select relname, reltuples from pg_class where relname='eventactivity';
relname|  reltuples
---+-
eventactivity | 3.16882e+07

select count(*) from eventactivity;
  count
--
31871142













I don't know how selective your keys are, but one of these queries
should probably structure it better for the planner. It depends  
a  lot on

how selective your query is.




eventactivity currently has around 36 million rows in it. There   
should
only be maybe 200-300 incidentids at most that will be matched   
with the
combination of k_b and k_r.  That's why I was thinking I  could  
somehow

get a list of just the incidentids that matched the id  = 94 and id =
107 in k_b and k_r first. Then, I would only need to  grab a few  
hundred

out of 36 million rows from eventactivity.




Well, you can also try:
SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id=?? AND k_r.id=??
;

That will tell you how many rows they have in common.


selec

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread John A Meinel

Tom Lane wrote:
> John A Meinel <[EMAIL PROTECTED]> writes:
>
>>What I don't understand is that the planner is actually estimating that
>>joining against the new table is going to *increase* the number of
>>returned rows.
>
>
> It evidently thinks that incidentid in the k_r table is pretty
> nonunique.  We really need to look at the statistics data to
> see what's going on.
>
>   regards, tom lane
>

Okay, sure. What about doing this, then:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
  JOIN (SELECT DISTINCT incidentid FROM k_r JOIN k_b USING (incidentid)
 WHERE k_r.id = ?? AND k_b.id = ??)
 USING (incidentid)
;

Since I assume that eventactivity is the only table with "recordtext",
and that you don't get any columns from k_r and k_b, meaning it would be
pointless to get duplicate incidentids.

I may be misunderstanding what the query is trying to do, but depending
on what is in k_r and k_b, is it possible to use a UNIQUE INDEX rather
than just an index on incidentid?

There is also the possibility of
EXPLAIN ANALYZE
SELECT recordtext FROM eventactivtity
  JOIN (SELECT incidentid FROM k_r WHERE k_r.id = ??
 UNION SELECT incidentid FROM k_b WHERE k_b.id = ??)
 USING (incidentid)
;

But both of these would mean that you don't actually want columns from
k_r or k_b, just a unique list of incident ids.

But first, I agree, we should make sure the pg_stats values are reasonable.

John
=:->

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Tom Lane

John A Meinel <[EMAIL PROTECTED]> writes:
> What I don't understand is that the planner is actually estimating that
> joining against the new table is going to *increase* the number of
> returned rows.

It evidently thinks that incidentid in the k_r table is pretty
nonunique.  We really need to look at the statistics data to
see what's going on.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread John A Meinel

Dan Harris wrote:
>
> On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:
>
>>
>>
>> You might try giving it a little bit more freedom with:
>>
>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity, k_r, k_b
>>  WHERE eventactivity.incidentid = k_r.incidentid
>>AND eventactivity.incidentid = k_b.incidentid
>>AND k_r.id = 94
>>AND k_b.id = 107
>> -- AND k_r.incidentid = k_b.incidentid
>> ;
>>
>> I'm pretty sure that would give identical results, just let the  planner
>> have a little bit more freedom about how it does it.
>> Also the last line is commented out, because I think it is redundant.
>>
>
> Ok, I tried this one.  My ssh keeps getting cut off by a router
> somewhere between me and the server due to inactivity timeouts, so  all
> I know is that both the select and explain analyze are taking  over an
> hour to run.  Here's the explain select for that one, since  that's the
> best I can get.
>
> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and  eventactivity.incidentid
> = k_b.incidentid and k_r.id = 94 and k_b.id  = 107;
>   QUERY PLAN
> 
> --
> Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>  Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>  ->  Index Scan using eventactivity1 on eventactivity
> (cost=0.00..4186753.16 rows=46029271 width=49)
>  ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>Sort Key: (k_b.incidentid)::text
>->  Index Scan using k_b_idx on k_b   (cost=0.00..4668.31
> rows=1821 width=68)
>  Index Cond: (id = 107)
>->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>  Sort Key: (k_r.incidentid)::text
>  ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52
> rows=1879 width=68)
>Index Cond: (id = 94)
> (13 rows)
>

If anything, the estimations have gotten worse. As now it thinks there
will be 1800 rows returned each, whereas you were thinking it would be
more around 100.

Since you didn't say, you did VACUUM ANALYZE recently, right?

>

...

>>
>> You can also try disabling merge joins, and see how that changes  things.
>>
>
> Are there any negative sideaffects of doing this?

If the planner is estimating things correctly, you want to give it the
most flexibility of plans to pick from, because sometimes a merge join
is faster (postgres doesn't pick things because it wants to go slower).
The only reason for the disable flags is that sometimes the planner
doesn't estimate correctly. Usually disabling a method is not the final
solution, but a way to try out different methods, and see what happens
to the results.

Using: SET enable_mergejoin TO off;
You can disable it just for the current session (not for the entire
database). Which is the recommended way if you have a query that
postgres is messing up on. (Usually it is correct elsewhere).

>>
>> Well, postgres is estimating around 500 rows each, is that way off?  Try
>> just doing:
>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly, and we
don't need to do anything else.

>>
>> And see if postgres estimates the number of rows properly.
>>
>> I assume you have recently VACUUM ANALYZEd, which means you might need
>> to update the statistics target (ALTER TABLE k_b ALTER COLUMN
>> incidientid SET STATISTICS 100) default is IIRC 10, ranges from  1-1000,
>> higher is more accurate, but makes ANALYZE slower.
>>

...

>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity
>>  JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
>> WHERE k_r.id = 94 AND k_b.id = 107)
>> USING (incidentid);
>>
>
> This one looks like the same plan as the others:
>
> explain select recordtext from eventactivity join ( select incidentid
> from k_r join k_b using (incidentid) where k_r.id = 94 and k_b.id =  107
> ) a  using (incidentid );

Well, the planner is powerful enough to flatten nested selects. To make
it less "intelligent" you can do:
SET join_collapse_limit 1;
or
SET join_collapse_limit 0;
Which should tell postgres to not try and get tricky with your query.
Again, *usually* the planner knows better than you do. So again just do
it to see what you get.

The problem is that if you are only using EXPLAIN SELECT, you will
probably get something which *looks* worse. Because if it looked better,
the planner would have used it. That is why you really need the EXPLAIN
ANALYZE, so that you can see where the planner is incorrect

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Tom Lane

Dan Harris <[EMAIL PROTECTED]> writes:
> Here's the explain select for that one, since  
> that's the best I can get.

> explain select recordtext from eventactivity,k_r,k_b where  
> eventactivity.incidentid = k_r.incidentid and  
> eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id  
> = 107;
>QUERY PLAN
>  
> --
> Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
> Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
> ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>   ->  Index Scan using eventactivity1 on eventactivity   
> (cost=0.00..4186753.16 rows=46029271 width=49)
>   ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
> Sort Key: (k_b.incidentid)::text
> ->  Index Scan using k_b_idx on k_b   
> (cost=0.00..4668.31 rows=1821 width=68)
>   Index Cond: (id = 107)
> ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>   Sort Key: (k_r.incidentid)::text
>   ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52  
> rows=1879 width=68)
> Index Cond: (id = 94)
> (13 rows)

There's something awfully fishy here.  The 8.0 planner is definitely
capable of figuring out that it ought to join the smaller relations
first.  As an example, using 8.0.3+ (CVS branch tip) I did

regression=# create table eventactivity(incidentid varchar, recordtext text);
CREATE TABLE
regression=# create table k_r(incidentid varchar);
CREATE TABLE
regression=# create table k_b(incidentid varchar);
CREATE TABLE
regression=# explain select recordtext from eventactivity inner join
(select incidentid from k_r) a using (incidentid)
inner join (select incidentid from k_b) b using (incidentid);

(Being too impatient to actually fill the eventactivity table with 36M
rows of data, I just did some debugger magic to make the planner think
that that was the table size...)  The default plan looks like

 Merge Join  (cost=16137814.70..36563453.23 rows=136170 width=32)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column3?")
   ->  Merge Join  (cost=170.85..290.48 rows=7565 width=64)
 Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
 ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
   Sort Key: (k_r.incidentid)::text
   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
 ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
   Sort Key: (k_b.incidentid)::text
   ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)
   ->  Sort  (cost=16137643.84..16227643.84 rows=3600 width=64)
 Sort Key: (eventactivity.incidentid)::text
 ->  Seq Scan on eventactivity  (cost=0.00..108.00 rows=3600 
width=64)

and if I "set enable_mergejoin TO 0;" I get

 Hash Join  (cost=612.54..83761451.54 rows=136170 width=32)
   Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
   ->  Seq Scan on eventactivity  (cost=0.00..108.00 rows=3600 width=64)
   ->  Hash  (cost=504.62..504.62 rows=7565 width=64)
 ->  Hash Join  (cost=25.38..504.62 rows=7565 width=64)
   Hash Cond: (("outer".incidentid)::text = 
("inner".incidentid)::text)
   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
   ->  Hash  (cost=22.30..22.30 rows=1230 width=32)
 ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)

which is the plan I would judge Most Likely To Succeed based on what we
know about Dan's problem.  (The fact that the planner is estimating it
as twice as expensive as the mergejoin comes from the fact that with no
statistics about the join keys, the planner deliberately estimates hash
join as expensive, because it can be pretty awful in the presence of
many equal keys.)

So the planner is certainly capable of finding the desired plan, even
without any tweaking of the query text.  This means that what we have
is mainly a statistical problem.  Have you ANALYZEd these tables
recently?  If so, may we see the pg_stats rows for incidentid in all
three tables?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread John A Meinel

Dan Harris wrote:
>
> On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:

...
Did you try doing this to see how good the planners selectivity
estimates are?

>> Well, postgres is estimating around 500 rows each, is that way off?  Try
>> just doing:
>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

These should be fast queries.

John
=:->


signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Michael Stone


On Thu, Jul 14, 2005 at 04:29:58PM -0600, Dan Harris wrote:
Ok, I tried this one.  My ssh keeps getting cut off by a router  
somewhere between me and the server due to inactivity timeouts, so  
all I know is that both the select and explain analyze are taking  
over an hour to run.


Try running the query as a script with nohup & redirect the output to a
file.

Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Dan Harris



On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:



You might try giving it a little bit more freedom with:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity, k_r, k_b
 WHERE eventactivity.incidentid = k_r.incidentid
   AND eventactivity.incidentid = k_b.incidentid
   AND k_r.id = 94
   AND k_b.id = 107
-- AND k_r.incidentid = k_b.incidentid
;

I'm pretty sure that would give identical results, just let the  
planner

have a little bit more freedom about how it does it.
Also the last line is commented out, because I think it is redundant.



Ok, I tried this one.  My ssh keeps getting cut off by a router  
somewhere between me and the server due to inactivity timeouts, so  
all I know is that both the select and explain analyze are taking  
over an hour to run.  Here's the explain select for that one, since  
that's the best I can get.


explain select recordtext from eventactivity,k_r,k_b where  
eventactivity.incidentid = k_r.incidentid and  
eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id  
= 107;

  QUERY PLAN
 
--

Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
 Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
 ->  Index Scan using eventactivity1 on eventactivity   
(cost=0.00..4186753.16 rows=46029271 width=49)

 ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
   Sort Key: (k_b.incidentid)::text
   ->  Index Scan using k_b_idx on k_b   
(cost=0.00..4668.31 rows=1821 width=68)

 Index Cond: (id = 107)
   ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
 Sort Key: (k_r.incidentid)::text
 ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52  
rows=1879 width=68)

   Index Cond: (id = 94)
(13 rows)




You might also try:
EXPLAIN ANALYZE
SELECT recordtext
  FROM eventactivity JOIN k_r USING (incidentid)
  JOIN k_b USING (incidentid)
 WHERE k_r.id = 94
   AND k_b.id = 107
;



Similar results here.  The query is taking at least an hour to finish.

explain select recordtext from eventactivity join k_r using  
( incidentid ) join k_b using (incidentid ) where k_r.id = 94 and  
k_b.id = 107;


 QUERY PLAN
 
--

Merge Join  (cost=9542.77..4672831.12 rows=148391132 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=4726.61..4542825.87 rows=15930238 width=117)
 Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
 ->  Index Scan using eventactivity1 on eventactivity   
(cost=0.00..4184145.43 rows=46000104 width=49)

 ->  Sort  (cost=4726.61..4731.13 rows=1806 width=68)
   Sort Key: (k_b.incidentid)::text
   ->  Index Scan using k_b_idx on k_b   
(cost=0.00..4628.92 rows=1806 width=68)

 Index Cond: (id = 107)
   ->  Sort  (cost=4816.16..4820.82 rows=1863 width=68)
 Sort Key: (k_r.incidentid)::text
 ->  Index Scan using k_r_idx on k_r  (cost=0.00..4714.97  
rows=1863 width=68)

   Index Cond: (id = 94)
(13 rows)




Also, if possible give us the EXPLAIN ANALYZE so that we know if the
planner is making accurate estimates. (You might send an EXPLAIN while
waiting for the EXPLAIN ANALYZE to finish)

You can also try disabling merge joins, and see how that changes  
things.




Are there any negative sideaffects of doing this?








Well, postgres is estimating around 500 rows each, is that way off?  
Try

just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

And see if postgres estimates the number of rows properly.

I assume you have recently VACUUM ANALYZEd, which means you might need
to update the statistics target (ALTER TABLE k_b ALTER COLUMN
incidientid SET STATISTICS 100) default is IIRC 10, ranges from  
1-1000,

higher is more accurate, but makes ANALYZE slower.




Right now, it looks like pg is trying to sort the entire   
eventactivity
table for the merge join which is taking several  minutes to do.   
Can I
rephrase this so that it does the searching  through k_r and k_b  
FIRST
and then go into eventactivity using the  index on incidentid?  It  
seems

like that shouldn't be too hard to  make fast but my SQL query skills
are only average.



To me, it looks like it is doing an index scan (on k_b.id) through k_b
first, sorting the results by incidentid, then merge joining that with
eventactivity.

I'm guessing you actually want it to merge k_b and k_r to get extra
selec

Re: [PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread John A Meinel

Dan Harris wrote:
> I'm trying to improve the speed of this query:
>
> explain select recordtext from eventactivity inner join ( select
> incidentid from k_r where id = 94 ) a using ( incidentid ) inner join  (
> select incidentid from k_b where id = 107 ) b using ( incidentid );

You might try giving it a little bit more freedom with:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity, k_r, k_b
 WHERE eventactivity.incidentid = k_r.incidentid
   AND eventactivity.incidentid = k_b.incidentid
   AND k_r.id = 94
   AND k_b.id = 107
-- AND k_r.incidentid = k_b.incidentid
;

I'm pretty sure that would give identical results, just let the planner
have a little bit more freedom about how it does it.
Also the last line is commented out, because I think it is redundant.

You might also try:
EXPLAIN ANALYZE
SELECT recordtext
  FROM eventactivity JOIN k_r USING (incidentid)
  JOIN k_b USING (incidentid)
 WHERE k_r.id = 94
   AND k_b.id = 107
;

Also, if possible give us the EXPLAIN ANALYZE so that we know if the
planner is making accurate estimates. (You might send an EXPLAIN while
waiting for the EXPLAIN ANALYZE to finish)

You can also try disabling merge joins, and see how that changes things.

>   QUERY PLAN
> 
> --
> Merge Join  (cost=2747.29..4249364.96 rows=11968693 width=35)
>Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>->  Merge Join  (cost=1349.56..4230052.73 rows=4413563 width=117)
>  Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>  ->  Index Scan using eventactivity1 on eventactivity
> (cost=0.00..4051200.28 rows=44519781 width=49)
>  ->  Sort  (cost=1349.56..1350.85 rows=517 width=68)
>Sort Key: (k_b.incidentid)::text
>->  Index Scan using k_b_idx on k_b   (cost=0.00..1326.26
> rows=517 width=68)
>  Index Cond: (id = 107)
>->  Sort  (cost=1397.73..1399.09 rows=542 width=68)
>  Sort Key: (k_r.incidentid)::text
>  ->  Index Scan using k_r_idx on k_r  (cost=0.00..1373.12
> rows=542 width=68)
>Index Cond: (id = 94)
> (13 rows)
>
>
> There are many millions of rows in eventactivity.  There are a few
> ten-thousand rows in k_r and k_b.  There is an index on 'incidentid'  in
> all three tables.  There should only be less than 100 rows matched  in
> k_r and k_b total.  That part on its own is very very fast.  But,  it
> should have those 100 or so incidentids extracted in under a  second and
> then go into eventactivity AFTER doing that.  At least,  that's my
> intention to make this fast.

Well, postgres is estimating around 500 rows each, is that way off? Try
just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

And see if postgres estimates the number of rows properly.

I assume you have recently VACUUM ANALYZEd, which means you might need
to update the statistics target (ALTER TABLE k_b ALTER COLUMN
incidientid SET STATISTICS 100) default is IIRC 10, ranges from 1-1000,
higher is more accurate, but makes ANALYZE slower.

>
> Right now, it looks like pg is trying to sort the entire  eventactivity
> table for the merge join which is taking several  minutes to do.  Can I
> rephrase this so that it does the searching  through k_r and k_b FIRST
> and then go into eventactivity using the  index on incidentid?  It seems
> like that shouldn't be too hard to  make fast but my SQL query skills
> are only average.

To me, it looks like it is doing an index scan (on k_b.id) through k_b
first, sorting the results by incidentid, then merge joining that with
eventactivity.

I'm guessing you actually want it to merge k_b and k_r to get extra
selectivity before joining against eventactivity.
I think my alternate forms would let postgres realize this. But if not,
you could try:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
 JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
WHERE k_r.id = 94 AND k_b.id = 107)
USING (incidentid);

I don't know how selective your keys are, but one of these queries
should probably structure it better for the planner. It depends a lot on
how selective your query is.
If you have 100M rows, the above query looks like it expects k_r to
restrict it to 44M rows, and k_r + k_b down to 11M rows, which really
should be a seq scan (> 10% of the rows = seq scan). But if you are
saying the selectivity is mis-estimated it could be different.

John
=:->
>
> Thanks
> -Dan
>
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
>
>   http://www.postgresql.org/docs/faq
>

signature.asc
Description: OpenPGP digital signature

[PERFORM] slow joining very large table to smaller ones

2005-07-14 Thread Dan Harris


I'm trying to improve the speed of this query:

explain select recordtext from eventactivity inner join ( select  
incidentid from k_r where id = 94 ) a using ( incidentid ) inner join  
( select incidentid from k_b where id = 107 ) b using ( incidentid );

  QUERY PLAN
 
--

Merge Join  (cost=2747.29..4249364.96 rows=11968693 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=1349.56..4230052.73 rows=4413563 width=117)
 Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
 ->  Index Scan using eventactivity1 on eventactivity   
(cost=0.00..4051200.28 rows=44519781 width=49)

 ->  Sort  (cost=1349.56..1350.85 rows=517 width=68)
   Sort Key: (k_b.incidentid)::text
   ->  Index Scan using k_b_idx on k_b   
(cost=0.00..1326.26 rows=517 width=68)

 Index Cond: (id = 107)
   ->  Sort  (cost=1397.73..1399.09 rows=542 width=68)
 Sort Key: (k_r.incidentid)::text
 ->  Index Scan using k_r_idx on k_r  (cost=0.00..1373.12  
rows=542 width=68)

   Index Cond: (id = 94)
(13 rows)


There are many millions of rows in eventactivity.  There are a few  
ten-thousand rows in k_r and k_b.  There is an index on 'incidentid'  
in all three tables.  There should only be less than 100 rows matched  
in k_r and k_b total.  That part on its own is very very fast.  But,  
it should have those 100 or so incidentids extracted in under a  
second and then go into eventactivity AFTER doing that.  At least,  
that's my intention to make this fast.


Right now, it looks like pg is trying to sort the entire  
eventactivity table for the merge join which is taking several  
minutes to do.  Can I rephrase this so that it does the searching  
through k_r and k_b FIRST and then go into eventactivity using the  
index on incidentid?  It seems like that shouldn't be too hard to  
make fast but my SQL query skills are only average.


Thanks
-Dan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

Re: [PERFORM] slow joining very large table to smaller ones

[PERFORM] slow joining very large table to smaller ones

18 matches

Site Navigation

Mail list logo

Footer information