Thanks.
I have fixed the problem, and the query now runs in 4 minutes for 1.5 million
records in TABLE1 and 0.5 million records in TABLE2.
The solution was to create a function that fetches all event_ids from TABLE2,
then loops through each event_id and updates TABLE1 one id at a time. This
removes the side effects of the IN predicate. I also
1. set FILLFACTOR = 70
2. dropped the trigger and indexes before running the function, and created
an index on event_id
3. re-created the trigger and indexes after running the function.
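For the archives, the approach sketches roughly as follows. The table and
column names (TABLE1, TABLE2, column1, event_id) come from this thread; the
function and index names are made up, and the real trigger/index definitions
would of course have to match the actual schema:

```sql
-- Sketch only: loop over event_ids instead of using IN().
CREATE OR REPLACE FUNCTION update_table1() RETURNS void AS $$
DECLARE
    eid TABLE2.event_id%TYPE;
BEGIN
    FOR eid IN SELECT event_id FROM TABLE2 LOOP
        UPDATE TABLE1 SET column1 = true WHERE event_id = eid;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Surrounding maintenance steps (names hypothetical):
-- 1. ALTER TABLE TABLE1 SET (fillfactor = 70);
-- 2. drop the trigger and existing indexes, then:
--    CREATE INDEX table1_event_id_idx ON TABLE1 (event_id);
-- 3. SELECT update_table1();
--    and finally re-create the trigger and the original indexes.
```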
Regards,
Haiming
From: BladeOfLight16 [mailto:[email protected]]
Sent: Monday, 15 July 2013 5:26 AM
To: [email protected]; Haiming Zhang
Subject: Re: [GENERAL] Update big table
I don't believe you can use JOIN explicitly in this situation because it's an
UPDATE, but I believe you can accomplish the same effect with FROM and WHERE.
UPDATE table1 SET column1 = TRUE
FROM table2
WHERE table1.event_id = table2.event_id;
I would make sure there's an index on table2.event_id if it's not the PK. If
it's part of a composite key, either make sure that table2.event_id is the
first column in that index (PostgreSQL will use an index like that; right?) or
create a separate index on it.
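Something along these lines, that is (index names are made up, and "other_col"
stands in for whatever the other key column actually is):

```sql
-- Plain index on event_id, if it is not already the PK:
CREATE INDEX table2_event_id_idx ON table2 (event_id);

-- Or, for a composite key, keep event_id as the leading column:
CREATE INDEX table2_event_other_idx ON table2 (event_id, other_col);
```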
(Sorry. Forgot to hit Reply All.)
On Sun, Jul 14, 2013 at 7:03 AM, Vincenzo Romano
<[email protected]<mailto:[email protected]>> wrote:
2013/7/14 Haiming Zhang
<[email protected]<mailto:[email protected]>>:
> Hi All,
>
>
>
> I am using postgres 9.1, I have a question about updating big table. Here is
> the basic information of this table.
>
> 1. This table has one primary key and other 11 columns.
>
> 2. It also has a BEFORE UPDATE trigger that updates
> another table first.
>
> 3. The table has millions of records now.
> 4. I am not allowed to delete records from this table
> during the UPDATE.
>
> The problem is that an "UPDATE" query takes a long time to execute. E.g. when
> I run a query like "update TABLE set column1 = true where EVENT_ID in (select
> EVENT_ID from TABLE2);", it took hours to update the whole table. In order to
> optimize the update speed, I tried the following strategies:
>
> 1. created indexes on the primary key, on column1, and on
> the combination of the primary key and column1.
>
> 2. altered FILLFACTOR to 70, vacuumed all, and then reindexed
>
> 3. dropped the trigger before the update
>
> Then I used "EXPLAIN" to inspect the query plan; none of the above strategies
> improved the UPDATE speed dramatically.
>
> Please comment on my three strategies (e.g., did I index too many columns in
> 1?) and please advise me on how to improve the update speed. Any advice is
> welcome; I appreciate all your help.
>
> Thanks,
>
> Regards,
>
> Haiming
A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.
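For the record, an IN() like the one above can usually also be rewritten with
EXISTS, which the planner sometimes handles better than IN. A sketch, using
the same table and column names as in the question:

```sql
UPDATE TABLE1 SET column1 = true
WHERE EXISTS (SELECT 1 FROM TABLE2
              WHERE TABLE2.event_id = TABLE1.event_id);
```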
--
Sent via pgsql-general mailing list
([email protected]<mailto:[email protected]>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general