[PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread jay
I know the problem: there are about 35 million rows, which take about
12 GB of disk space, and checkpoint_segments is set to 64. But the
update runs in one transaction, which quickly fills up the checkpoint
segments and forces frequent checkpoints; checkpoints cost a lot of
resources, so the update becomes slower and slower, and the bgwriter
won't write the pages because they are not committed yet.
Creating a new table may be a quick solution, but it's not appropriate
in some cases.
If we could commit every 1000 rows per round, it might resolve the
problem.
But PG doesn't support transactions within functions yet?
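(A client-driven sketch of that idea, where every batch is its own
transaction; the "id" key column and the batch size are illustrative
assumptions:)

BEGIN;
UPDATE msg_table SET type = 0 WHERE id >= 0 AND id < 1000;
COMMIT;
BEGIN;
UPDATE msg_table SET type = 0 WHERE id >= 1000 AND id < 2000;
COMMIT;
-- ... and so on over the whole key range, driven by a loop in the client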

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Heikki Linnakangas
Sent: June 25, 2008 18:11
To: jay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql update op is very very slow

jay wrote:
> I've a table with about 34601755 rows. When I execute 'update msg_table
> set type=0;' it is very, very slow; it costs several hours and still
> does not complete.
>
> Why is postgresql so slow? Is it the PG MVCC problem?

Possibly. Because of MVCC, a full-table update will actually create a 
new version of each row.
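You can see the effect directly; a sketch, assuming the table above
(pg_total_relation_size() and pg_size_pretty() are available in 8.1 and
later):

SELECT pg_size_pretty(pg_total_relation_size('msg_table'));
UPDATE msg_table SET type = 0;
SELECT pg_size_pretty(pg_total_relation_size('msg_table'));
-- expect roughly twice the size afterwards: the old and the new row
-- versions both stay on disk until VACUUM reclaims the old ones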

I presume that's a one-off query, or a seldom-run batch operation, and 
not something your application needs to do often. In that case, you 
could drop all indexes, and recreate them after the update, which should 
help a lot:

BEGIN;
DROP INDEX <index name>, <index name 2>, ...; -- for each index
UPDATE msg_table SET type = 0;
CREATE INDEX ... -- Recreate indexes
COMMIT;

Or even better, instead of using UPDATE, do a SELECT INTO a new table, 
drop the old one, and rename the new one in its place. That has the 
advantage that the new table doesn't contain the old row versions, so you
don't need to vacuum right away to reclaim the space.
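A sketch of that approach, using CREATE TABLE ... AS (equivalent to
SELECT INTO); the column list here is an illustrative assumption and
must match your real schema:

BEGIN;
CREATE TABLE msg_table_new AS
  SELECT id, sender, body, 0 AS type FROM msg_table;
DROP TABLE msg_table;
ALTER TABLE msg_table_new RENAME TO msg_table;
-- recreate indexes, constraints, and grants here: CREATE TABLE ... AS
-- copies none of them
COMMIT;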

Actually, there's an even more clever trick to do roughly the same thing:

ALTER TABLE msg_table ALTER COLUMN type TYPE int4 USING 0;

(assuming type is int4, replace with the actual data type if necessary)

This will rewrite the table, similar to a DROP + CREATE, and rebuild all 
indexes. But all in one command.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com



[PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Pavan Deolasee
2008/6/26 jay [EMAIL PROTECTED]:

> If we could commit every 1000 rows per round, it might resolve the
> problem.
> But PG doesn't support transactions within functions yet?


Yeah, transaction control is not supported inside functions. There are
some hacks using dblink to do transactions inside functions. You may
want to check that out.
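A minimal sketch of that hack (the connection string and the "id" key
column are assumptions; each dblink_exec() runs on a separate
connection, so its UPDATE commits independently of the calling
transaction):

CREATE OR REPLACE FUNCTION batch_update() RETURNS void AS $$
DECLARE
    lo bigint := 0;
    hi bigint;
BEGIN
    SELECT max(id) INTO hi FROM msg_table;
    PERFORM dblink_connect('batch', 'dbname=mydb');
    WHILE lo <= hi LOOP
        PERFORM dblink_exec('batch',
            'UPDATE msg_table SET type = 0 WHERE id >= ' || lo ||
            ' AND id < ' || (lo + 1000));
        lo := lo + 1000;
    END LOOP;
    PERFORM dblink_disconnect('batch');
END;
$$ LANGUAGE plpgsql;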

I had suggested another hack in the past for very simplistic updates,
for when you are sure that the tuple length does not change between
updates and you are ready to handle a half-updated table if there is a
crash or failure in between. Maybe for your case, where you are
updating a single column of the entire table and setting it to some
default value for all the rows, it may work fine. But please be aware
of the data consistency issues before you try it. And it must be a
once-in-a-lifetime kind of hack.

http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Tom Lane
jay [EMAIL PROTECTED] writes:
> I know the problem: there are about 35 million rows, which take about
> 12 GB of disk space, and checkpoint_segments is set to 64. But the
> update runs in one transaction, which quickly fills up the checkpoint
> segments and forces frequent checkpoints; checkpoints cost a lot of
> resources, so the update becomes slower and slower, and the bgwriter
> won't write the pages because they are not committed yet.
> Creating a new table may be a quick solution, but it's not appropriate
> in some cases.
> If we could commit every 1000 rows per round, it might resolve the
> problem.

No, that's utterly unrelated.  Transaction boundaries have nothing to do
with checkpoints.

regards, tom lane



[PERFORM] Re: [PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Scott Marlowe
2008/6/26 Pavan Deolasee [EMAIL PROTECTED]:
> 2008/6/26 jay [EMAIL PROTECTED]:
>> If we could commit every 1000 rows per round, it might resolve the
>> problem.
>> But PG doesn't support transactions within functions yet?
>
> Yeah, transaction control is not supported inside functions. There are
> some hacks using dblink to do transactions inside functions. You may
> want to check that out.

That's if you need autonomous transactions. For most people, savepoints
and exception catching seem to be an acceptable form of transaction
control.
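For example, a sketch (the "id" key column is an assumption):

BEGIN;
UPDATE msg_table SET type = 0 WHERE id < 1000;
SAVEPOINT first_batch;
UPDATE msg_table SET type = 0 WHERE id >= 1000;  -- suppose this fails
ROLLBACK TO SAVEPOINT first_batch;  -- the first batch's work survives
COMMIT;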

> I had suggested another hack in the past for very simplistic updates,
> for when you are sure that the tuple length does not change between
> updates and you are ready to handle a half-updated table if there is a
> crash or failure in between. Maybe for your case, where you are
> updating a single column of the entire table and setting it to some
> default value for all the rows, it may work fine. But please be aware
> of the data consistency issues before you try it. And it must be a
> once-in-a-lifetime kind of hack.
>
> http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

In a way, that's what pg_bulkload does.



[PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Scott Marlowe
2008/6/26 Tom Lane [EMAIL PROTECTED]:
> jay [EMAIL PROTECTED] writes:
>> I know the problem: there are about 35 million rows, which take about
>> 12 GB of disk space, and checkpoint_segments is set to 64. But the
>> update runs in one transaction, which quickly fills up the checkpoint
>> segments and forces frequent checkpoints; checkpoints cost a lot of
>> resources, so the update becomes slower and slower, and the bgwriter
>> won't write the pages because they are not committed yet.
>> Creating a new table may be a quick solution, but it's not appropriate
>> in some cases.
>> If we could commit every 1000 rows per round, it might resolve the
>> problem.
>
> No, that's utterly unrelated.  Transaction boundaries have nothing to do
> with checkpoints.

True. But if you update 1000 rows at a time and vacuum between batches,
you can keep the bloat to something reasonable.
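Something like this, repeated until the UPDATE touches no rows (a
sketch; it assumes an "id" key and that type <> 0 identifies the rows
still to do):

UPDATE msg_table SET type = 0
  WHERE id IN (SELECT id FROM msg_table WHERE type <> 0 LIMIT 1000);
VACUUM msg_table;
-- vacuuming between batches lets the dead row versions be reused, so
-- the table stays close to its original size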

On another note, I haven't seen anyone suggest adding the appropriate
where clause to keep from updating rows that already match. That's
cheap compared to updating the whole table, even if a large chunk of
the rows aren't a match, i.e.:

... set col = 0 where col <> 0;

That should be the first thing you reach for in this situation, if it can help.
