Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

Igor Neyman Mon, 03 Nov 2014 11:02:33 -0800


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, November 03, 2014 11:34 AM
To: [email protected]
Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = 
(SELECT ...)


Why does the UPDATE SET = FROM choose a more poorly performing plan than the 
UPDATE SET = (SELECT ...)?  It seems to me that it is the same join.
 
I'm using 9.3.5.  

CREATE TABLE orig
(
        key1    VARCHAR(11) PRIMARY KEY,
        time1   TIME
);

INSERT INTO orig (key1, time1)
SELECT 
        a::TEXT,
        (((random()*100)::INT % 24)::TEXT || ':' ||
        ((random()*100)::INT % 60)::TEXT)::TIME FROM 
generate_series(80000000000, 80002000000) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);
 
INSERT INTO second (key1) 
        SELECT (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT
        FROM generate_series(1,400000);

EXPLAIN ANALYZE
UPDATE second SET time1 = orig.time1
FROM orig
WHERE second.key1 = orig.key1;

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Update on second  (cost=69461.02..106082.02 rows=400000 width=32) (actual 
time=16033.023..16033.023 rows=0 loops=1)
   ->  Hash Join  (cost=69461.02..106082.02 rows=400000 width=32) (actual 
time=7698.445..12992.039 rows=400000 loops=1)
         Hash Cond: ((second.key1)::text = (orig.key1)::text)
         ->  Seq Scan on second  (cost=0.00..12627.00 rows=400000 width=18) 
(actual time=49.820..791.397 rows=400000 loops=1)
         ->  Hash  (cost=31765.01..31765.01 rows=2000001 width=26) (actual 
time=7648.540..7648.540 rows=2000001 loops=1)
               Buckets: 4096  Batches: 128  Memory Usage: 717kB
               ->  Seq Scan on orig  (cost=0.00..31765.01 rows=2000001 
width=26) (actual time=0.014..3655.844 rows=2000001 loops=1)  Total runtime: 
16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;
        
EXPLAIN ANALYZE
UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
                        WHERE orig.key1 = second.key1 LIMIT 1);


                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual 
time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 
rows=1 loops=1)
           ->  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) 
(actual time=2.606..2.606 rows=1 loops=1)
                 ->  Seq Scan on second second_1  (cost=0.00..19074.59 
rows=1279959 width=12) (actual time=2.487..2.487 rows=1 loops=1)
                 ->  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 
width=20) (actual time=0.098..0.098 rows=1 loops=1)
                       Index Cond: ((key1)::text = (second_1.key1)::text)
   ->  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual 
time=6.420..817.739 rows=400000 loops=1)  Total runtime: 4642.561 ms
(9 rows)


These 2 queries are not the same.

The first query updates rows in the "second" table with the orig.time1 values 
based on key1 column match.
The second query finds first possible match (based on key1 column) and assigns 
orig.time1 value from the matched row to every record in "second" table.

Regards,
Igor Neyman



-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

Reply via email to