Jeff, I think adding the new table is the best way to handle this issue.



From:   Jeff Janes <jeff.ja...@gmail.com>
To:     acanada <acan...@cnio.es>, 
Cc:     postgres performance list <pgsql-performance@postgresql.org>
Date:   03/18/2014 02:05 AM
Subject:        Re: [PERFORM] Adding new field to big table
Sent by:        pgsql-performance-ow...@postgresql.org



On Monday, March 17, 2014, acanada <acan...@cnio.es> wrote:
Hello,

Jeff and Jeffrey, thank you for your tips.
This is the explain output for the query:
x=> explain update document as d set "svmConfidence" = st.svmconfidence
    from svm_confidence_id_tmp as st where st.id = d.id;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Update on document d  (cost=4204242.82..61669685.86 rows=124515592 width=284)
   ->  Hash Join  (cost=4204242.82..61669685.86 rows=124515592 width=284)
         Hash Cond: (d.id = st.id)
         ->  Seq Scan on document d  (cost=0.00..8579122.97 rows=203066697 width=270)
         ->  Hash  (cost=1918213.92..1918213.92 rows=124515592 width=18)
               ->  Seq Scan on svm_confidence_id_tmp st  (cost=0.00..1918213.92 rows=124515592 width=18)
(6 rows)

It's not using the index; most of the rows are being updated.
I'm trying the CTAS solution.
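
For anyone following the thread, a minimal sketch of what the CTAS route could look like; the document_new name, the left join, and the swap step are illustrative, not from the original message:

-- Build a fresh copy of the table with the new column already filled,
-- instead of rewriting 124M rows in place with UPDATE.
create table document_new as
select d.id,
       -- ...remaining document columns here...
       st.svmconfidence as "svmConfidence"
from document as d
left join svm_confidence_id_tmp as st on st.id = d.id;
-- then rebuild indexes/constraints on document_new and rename it
-- over the old table, ideally in one transaction.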

Once this hash join spills to disk, the performance is going to get very 
bad.  The problem is that the outer table is going to get split into 
batches and written to disk.  If this were just a select, that would not 
be a problem: when it reads each batch back in, that is all it needs to 
do, as the temp file contains all the necessary info.  But with an 
update, for each batch that it reads back in and matches to the inner 
side, it then needs to go back to the physical table to do the update, 
using the ctid saved in the batches to find the table tuple.  So in 
effect this adds a nested loop from the hashed copy of the table to the 
real copy, and the locality of reference between those is poor when there 
are many batches.  I'm pretty sure that the extra cost of doing this 
look-up is not taken into account by the planner.  But if it chooses a 
different plan than a hash join, that other plan might also have the same 
problem.
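
To see whether the hash actually spills, you can run the join as a plain 
select under EXPLAIN (ANALYZE, BUFFERS); the Hash node reports a line 
like "Buckets: ... Batches: ... Memory Usage: ...", and more than one 
batch means it spilled to disk.  A sketch, reusing the table names from 
the plan above:

x=> explain (analyze, buffers)
    select count(*)
    from document as d
    join svm_confidence_id_tmp as st on st.id = d.id;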

Some things for you to consider, other than CTAS (a combined sketch of 
points 1-3 follows after the list):

1) Are you analyzing your temporary table before you do the update?  That 
might switch it to a different plan.

2) Make work_mem as large as you can stand, just for the one session that 
runs the update, to try to avoid spilling to disk.

3) If you set enable_hashjoin off temporarily in this session, what plan 
do you get?

0) Why are you creating the temporary table?  You must have some process 
that comes up with the value for the new column to put in the temporary 
table; why not just stick it directly into the original table?
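
Putting points 1-3 together as one session, just to be concrete (the 
work_mem value is only a placeholder to adjust for your hardware):

-- 1) fresh stats so the planner knows the temp table's size
analyze svm_confidence_id_tmp;

-- 2) per-session work_mem, to try to keep the hash from spilling
set work_mem = '1GB';

-- 3) check what plan you get with hash joins disabled
set enable_hashjoin = off;
explain update document as d set "svmConfidence" = st.svmconfidence
    from svm_confidence_id_tmp as st where st.id = d.id;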

Some things for the PostgreSQL hackers to consider:

1) When the hash spills to disk, it seems to write to disk the entire row 
that is going to be updated (except for the one column which is going to 
be overwritten), plus that tuple's ctid.  It doesn't seem like this is 
necessary; it should only need to write the ctid and the join key (and 
perhaps any quals?).  Since it has to visit the old row anyway to set its 
cmax, it can pull out the rest of the data to make the new tuple while it 
is there.  If it wrote a lot less data to the temp tables it could make a 
lot fewer batches for the same work_mem, and here the cost is directly 
proportional to the number of batches.  (Also, for the table not being 
updated, it writes the ctid to temp space when there seems to be no use 
for it.)

2) Should the planner account for the scattered reads needed to join back 
to the original table on ctid for the update, from whatever materialized 
version of the table is created?  Of course all other plans would also 
need to be costed the same way.  I think most of them would have the same 
problem as the hash join.  The one type I can think of that wouldn't is a 
merge join in which there is a strong correlation between the merge key 
and the ctid order of the table to be updated.

3) It seems like the truly efficient way to run such an update on a very 
large data set would be to join the two tables (hash or merge), then sort 
the result on the ctid of the updated table, then do a "merge join" 
between that sorted result and the physical table.  I don't think such a 
method is currently known to the planner, is it?  How hard would it be to 
add?

The example I have been using is:


alter table pgbench_accounts add filler2 text;
create table foo as select aid, md5(aid::text) from pgbench_accounts;
analyze;
explain (verbose) update pgbench_accounts set filler2 = md5
    from foo where pgbench_accounts.aid = foo.aid;
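
For anyone wanting to reproduce this: the pgbench tables come from 
pgbench's initialization step, and the hash only batches once its build 
side outgrows work_mem, so use a large scale factor or a small work_mem 
(both values below are just illustrative):

-- shell: pgbench -i -s 100 testdb   (pgbench_accounts gets 10 million rows)
set work_mem = '1MB';   -- small enough that the hash spills into batches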

Cheers,

Jeff

