http://borg.postgresql.org/docs/8.0/interactive/storage-page-layout.html


If you vacuum as part of the transaction it's a more efficient use of
resources, because you already have most of what you need right there
(i.e. odds are you're on the same page as the old tuple). In cases like
that it very likely makes sense to take a small hit in transaction time
up-front, instead of a larger hit doing a vacuum down the road.

One pro is that you're going to make a disk write anyway because the page is modified, so why not vacuum that page while it's there? If the machine is CPU-bound you lose; if it's I/O-bound you save some I/O. But the cost of index updates has to be taken into account...


        It prompted a few questions:

Note: temp contains 128k (131072) values generated from a sequence.
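(Something equivalent could be built, for example, with generate_series instead of the actual sequence used here:)

create table temp (id integer);
-- 131072 consecutive values, standing in for the sequence-generated data
insert into temp select generate_series(1, 131072);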

create table test (id serial primary key, a integer, z integer, e integer, r integer, t integer, y integer ) without oids;
insert into test (id,a,z,e,r,t,y) select id,0,0,0,0,0,0 from temp;
INSERT 0 131072



explain analyze update test set y=1;
Seq Scan on test (cost=0.00..2226.84 rows=126284 width=30) (actual time=0.047..617.553 rows=131072 loops=1)
Total runtime: 4432.509 ms


explain analyze update test set y=1;
Seq Scan on test (cost=0.00..4453.68 rows=252568 width=30) (actual time=52.198..611.594 rows=131072 loops=1)
Total runtime: 5739.064 ms


explain analyze update test set y=1;
Seq Scan on test (cost=0.00..6680.52 rows=378852 width=30) (actual time=127.301..848.762 rows=131072 loops=1)
Total runtime: 6548.206 ms


It gets slower as more and more dead tuples accumulate... normal, since this is a seq scan. Note the row estimates getting bigger along with the table size...
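(As an aside, this bloat can be watched directly in pg_class, which is also where the planner picks up those row estimates:)

-- relpages keeps growing as dead row versions pile up; reltuples is the
-- figure the planner scales into the row estimate above
select relpages, reltuples from pg_class where relname = 'test';

Re-running that query after the vacuum full below should show relpages dropping back near its original value, whereas a plain vacuum only marks the dead space reusable.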


vacuum full test;
explain analyze update test set y=1;
Seq Scan on test (cost=0.00..2274.72 rows=131072 width=30) (actual time=0.019..779.864 rows=131072 loops=1)
Total runtime: 5600.311 ms


vacuum full test;
explain analyze update test set y=1;
Seq Scan on test (cost=0.00..2274.72 rows=131072 width=30) (actual time=0.039..1021.847 rows=131072 loops=1)
Total runtime: 5126.590 ms


-> Seems vacuum full does its job....

vacuum test;
explain analyze update test set y=1;
Seq Scan on test (cost=0.00..3894.08 rows=196608 width=30) (actual time=36.491..860.135 rows=131072 loops=1)
Total runtime: 7293.698 ms


vacuum test;
explain analyze update test set y=1;
Seq Scan on test (cost=0.00..3894.08 rows=196608 width=30) (actual time=0.044..657.125 rows=131072 loops=1)
Total runtime: 5934.141 ms


vacuum analyze test;
explain analyze update test set y=1;
Seq Scan on test (cost=0.00..3894.08 rows=196608 width=30) (actual time=0.018..871.132 rows=131072 loops=1)
Total runtime: 5548.053 ms


-> Here, with a plain vacuum the update is about as slow as with vacuum full (which is normal since the whole table is updated); however, the row estimate is still off even after ANALYZE.


Let's create a few indices:

vacuum full test;
create index testa on test(a);
create index testz on test(z);
create index teste on test(e);
create index testr on test(r);
create index testt on test(t);
-- we don't create an index on y


vacuum full test;
explain analyze update test set a=id;
Seq Scan on test (cost=0.00..2274.72 rows=131072 width=30) (actual time=0.044..846.102 rows=131072 loops=1)
Total runtime: 14998.307 ms


We see that the time spent updating the indexes has made this query a lot slower. This is normal, but:

vacuum full test;
explain analyze update test set a=id;
Seq Scan on test (cost=0.00..2274.72 rows=131072 width=30) (actual time=0.045..1387.626 rows=131072 loops=1)
Total runtime: 17644.368 ms


Now, we updated ALL rows but didn't actually change a single value. However, it took about the same time as the first one. I guess the updates all really took place, even if all they did was copy the rows with new transaction IDs.
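(A way to convince yourself the rows really were rewritten is to look at the xmin and ctid system columns before and after the no-op update; xmin is the ID of the transaction that wrote each row version and ctid its physical position, and both change even though the data is identical:)

-- xmin and ctid change for every row after the update, even though
-- no user-visible value was modified
select xmin, ctid, a from test limit 5;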
Now, let's update a column which is not indexed:


vacuum full test;
explain analyze update test set y=id;
Seq Scan on test (cost=0.00..2274.72 rows=131072 width=30) (actual time=0.046..964.590 rows=131072 loops=1)
Total runtime: 15628.143 ms


Takes about the same time: the indexes still have to be updated to reference the new row versions, after all.

So, here is something annoying with the current approach: updating rows in a table bloats ALL indices, not just those whose indexed values were actually updated. So if you have a table with many indexed fields and you often update some obscure timestamp field, all the indices will bloat. VACUUM will of course correct this, but it has extra work to do.
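(To put numbers on that bloat, the index sizes can also be read out of pg_class; after the updates on y every one of the testX indices grows, even though none of the indexed columns changed:)

-- relpages for the table and each index; compare before and after the
-- update on the non-indexed column y
select relname, relpages from pg_class
where relname in ('test', 'testa', 'testz', 'teste', 'testr', 'testt');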

        I don't have suggestions, just questions:

Is there a way that updating the indices can be avoided if the indexed values do not change?
Would it depend on whether an updated tuple can be stored on the same page it was on before (along with the old version)?
If the answer is yes:
- would saving the cost of updating the indexes pay off against vacuuming the page on the fly to try to squeeze the new tuple version in?
- would it be interesting to specify, for each table, a target % of free space ('air holes') in pages for vacuum to try to achieve, so that updated row versions can be inserted on the same page they were on before, saving index updates?


        Regards...