scott.marlowe wrote:

On Fri, 20 Feb 2004, Sean Shanny wrote:



max_connections = 100

# - Memory -

shared_buffers = 16000 # min 16, at least max_connections*2, 8KB each
sort_mem = 256000 # min 64, size in KB



You might wanna drop sort_mem somewhat and just set it during your imports to something big like 512000 or larger. That way with 100 users during the day you won't have to worry about swap storms, and when you run your updates, you get all that sort_mem.




Actual row count in the temp table:

select count(*) from referral_temp ;
 502347

Actual row count in d_referral table:

select count(*) from d_referral ;
 27908024


Note: that an analyze had not been performed on the referral_temp table prior to the explain analyze run.


explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5

Nested Loop Left Join (cost=0.00..3046.00 rows=1001 width=68) (actual time=136.513..6440616.541 rows=502347 loops=1)
-> Seq Scan on referral_temp t2 (cost=0.00..20.00 rows=1000 width=64) (actual time=21.730..10552.421 rows=502347 loops=1)
-> Index Scan using d_referral_referral_md5_key on d_referral t1 (cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1 loops=502347)
Index Cond: ("outer".md5 = t1.referral_md5)



Thanks.


--sean
Total runtime: 6441969.698 ms
(5 rows)


Here is an explain analyze after the analyze was done. Unfortunately I think a lot of the data was still in cache when I did this again :-(


explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;

Nested Loop Left Join (cost=0.00..1468759.69 rows=480082 width=149) (actual time=69.576..3226854.850 rows=502347 loops=1)
-> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081 width=145) (actual time=11.206..4003.521 rows=502347 loops=1)
-> Index Scan using d_referral_referral_md5_key on d_referral t1 (cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1 loops=502347)
Index Cond: ("outer".md5 = t1.referral_md5)
Total runtime: 3227830.752 ms



Hmmm. It looks like postgresql is still picking a nested loop when it should be sorting something faster. Try doing a "set enable_nestloop = off" and see what you get.


New results with the above changes: (Rather a huge improvement!!!) Thanks Scott. I will next attempt to make the cpu_* changes to see if it the picks the correct plan.

explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1669281.60..3204008.48 rows=480082 width=149) (actual time=157221.125..-412311.378 rows=502347 loops=1)
Hash Cond: ("outer".md5 = "inner".referral_md5)
-> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081 width=145) (actual time=11.537..1852.336 rows=502347 loops=1)
-> Hash (cost=1356358.48..1356358.48 rows=30344048 width=40) (actual time=157187.530..157187.530 rows=0 loops=1)
-> Seq Scan on d_referral t1 (cost=0.00..1356358.48 rows=30344048 width=40) (actual time=14.134..115048.285 rows=27908024 loops=1)
Total runtime: 212595.909 ms
(6 rows)


Time: 213094.984 ms
tripmaster=# explain analyze SELECT t1.id, t2.md5, t2.url from url_temp t2 LEFT OUTER JOIN d_url t1 ON t2.md5 = t1.url_md5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=2023843.40..3157938.15 rows=1379872 width=191) (actual time=178150.113..867074.579 rows=1172920 loops=1)
Hash Cond: ("outer".md5 = "inner".url_md5)
-> Seq Scan on url_temp t2 (cost=0.00..50461.72 rows=1379872 width=187) (actual time=6.597..6692.324 rows=1172920 loops=1)
-> Hash (cost=1734904.72..1734904.72 rows=28018272 width=40) (actual time=178124.568..178124.568 rows=0 loops=1)
-> Seq Scan on d_url t1 (cost=0.00..1734904.72 rows=28018272 width=40) (actual time=16.912..2639059.078 rows=23239137 loops=1)
Total runtime: 242846.965 ms
(6 rows)


Time: 243190.900 ms

If that makes it faster, you may want to adjust the costs of the cpu_* stuff higher to see if that can force it to do the right thing.

Looking at the amount of time taken by the nested loop, it looks like the problem to me.

And why are you doing a left join of ONE row from one table against the whole temp table? Do you really need to do that? since there's only one row in the source table, and I'd guess is only matches one or a few rows from the temp table, this means you're gonna have that one row and a bunch of null filled rows to go with it.





---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

Reply via email to