On Thu, Nov 06, 2014 at 02:55:20PM +0000, Shaun Thomas wrote:
>
> These updates aren't equivalent. It's very important you know this, because
> you're also inflating your table with a lot of extra updated rows.
>
> Take the first UPDATE:
>
> > UPDATE second SET time1 = orig.time1
> > FROM orig
> > WHERE second.key1 = orig.key1;
>
> If you wrote this as a SELECT, it would look like this:
>
> SELECT second.time1, orig.time1
> FROM second
> JOIN ORIG ON (second.key1 = orig.key1)
>
> Since second is a many to one subset of orig, you now have several
> simultaneous updates. Your second UPDATE:
>
> > UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
> > WHERE orig.key1 = second.key1 LIMIT 1);
>
> Is equivalent to this SELECT:
>
> SELECT second.time1,
> (SELECT orig.time1 FROM orig,second
> WHERE orig.key1 = second.key1 LIMIT 1)
> FROM second;
>
> Meaning you'd only get as many updates as there are rows in second. The
> difference is your LIMIT 1. However, since you're not using an ORDER BY
> clause, the actual value you get for time1 will be indeterminate. Something
> like this would remove the row inflation and fix the random time1 behavior,
> but I'm not sure it was your intent:
>
> UPDATE second
> SET time1 = orig.time1
> FROM (SELECT DISTINCT ON (key1) key1, time1
> FROM orig
> ORDER BY key1, time1 DESC) sub
> WHERE second.key1 = sub.key1;
I see now that I made more than one mistake.
1) I forgot to INCLUDE INDEXES when creating second. I would have
seen dup keys when filling it.
CREATE TABLE second (LIKE orig INCLUDING INDEXES);
2) I should have used something like this to fill second:
INSERT INTO second (key1)
SELECT key1 FROM orig
ORDER BY random()
LIMIT 400000;
3) I then incorrectly remembered the query I had written at work. It
should have been:
EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);
Once the second table is filled with unique keys, then both UPDATES
should have produced the same results, but the UPDATE FROM is faster
than the UPDATE = SELECT, which is documented.
My original intent was to find out what the performance differences
between the two are.
Thanks for pointing these things out!
PJ
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general