Hello Merlin,
thanks for the feedback. I forwarded this to my developer; it's an
interesting approach.


-- 

Best regards

Florian Schröck

On 02/19/2013 09:04 PM, Merlin Moncure wrote:
> On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck <fschro...@aycan.de> wrote:
>> Hello Kevin,
>> not updating every row which doesn't need the update solved the problem!
>> Your query took only 1 minute. :)
>>
>> Thank you so much for the fast response, have a great weekend!
>>
>> PS: When you switch to "TEXT" on the explain URL you can see the final
>> runtime which was 66 minutes with the original statement.
>>
>> Best regards,
>> Florian
>>
>> On 02/15/2013 03:59 PM, Kevin Grittner wrote:
>>> Florian Schröck <fschro...@aycan.de> wrote:
>>>
>>>> UPDATE BackupFiles
>>>>    SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>>        cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>>        bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>>    WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>>>
>>>> Explain analyze: http://explain.depesz.com/s/8y5
>>>> The statement takes 60-90 minutes.
>>> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
>>> seconds.
>>>
>>>> I tried to optimize the settings but until now without success.
>>>>
>>>> Can we optimize this update statement somehow? Do you have any
>>>> other ideas?
>>> Are there any rows which would already have the values that you are
>>> setting?  If so, it would be faster to skip those by using this
>>> query:
>>>
>>> UPDATE BackupFiles
>>>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>       cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>       bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>   WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
>>>     AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>>>          OR cStatus <> 'NEW'::StatusT
>>>          OR bOnSetBlue IS DISTINCT FROM false
>>>          OR bOnSetYellow IS DISTINCT FROM false
>>>          OR nLastBackupTS <> '0001-01-01 00:00:00');
>>>
>>> Another way to accomplish this is with the
>>> suppress_redundant_updates_trigger() trigger function:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
> If the number of rows you actually update is not very large relative
> to the size of the table, just for fun, try this:
>
>
> CREATE OR REPLACE FUNCTION BackupFilesCandidateReset(BackupFiles)
> RETURNS BOOL AS
> $$
>   SELECT ($1).cStatus IN ('NEW', 'WRITING', 'ONTAPE')
>   AND
>     (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
>      ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
>     IS DISTINCT FROM /* simple != will suffice if values are never null */
>     (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
> $$ LANGUAGE SQL IMMUTABLE;
>
> CREATE INDEX ON BackupFiles (BackupFilesCandidateReset(BackupFiles))
>   WHERE BackupFilesCandidateReset(BackupFiles);
>
>
> SELECT * FROM BackupFiles WHERE BackupFilesCandidateReset(BackupFiles);
> UPDATE BackupFiles SET ... WHERE BackupFilesCandidateReset(BackupFiles);
> etc.
>
> The idea here is to maintain a partial boolean index representing the
> candidate records to update. Plus it's nifty. This is the basic mechanism
> that can be used as the foundation for very fast push/pull queues.
>
> merlin
>
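For readers following the archive, the suppress_redundant_updates_trigger() function Kevin links above is attached per table. A minimal sketch, assuming the BackupFiles table from the thread (the trigger name itself is hypothetical):

```sql
-- Skip UPDATEs that would leave the row unchanged. The "z" prefix makes
-- this trigger fire last, since BEFORE triggers run in alphabetical
-- order by name.
CREATE TRIGGER z_suppress_redundant
BEFORE UPDATE ON BackupFiles
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```

With this in place, the original unfiltered UPDATE still scans every matching row, but rows already in the target state are not rewritten.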

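Merlin's inline comment ("simple != will suffice if values are never null") is worth spelling out. A quick illustration, not from the thread, of why the row-wise IS DISTINCT FROM is the null-safe choice:

```sql
-- <> yields NULL (not true) when either side is NULL, so a WHERE clause
-- silently drops such rows; IS DISTINCT FROM treats NULL as a real value.
SELECT NULL <> 0;                          -- NULL
SELECT (1, NULL) IS DISTINCT FROM (1, 0); -- true
```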