Hi

Running PostgreSQL 9.4, I am running into the following issue.
On a huge table, I have to remove the content (set to '') of a column that 
accounts for 99% of the weight of the table. Since production cannot be 
stopped, I am not able to do a VACUUM FULL nor any direct rewrite of the 
table, so I tried the following trick to drop the content while reducing the 
table weight. (I re-indented it for this email)

ppr=500
for i in `seq 0 60` ; do 
        psql $MYDB -c "UPDATE $TABLE 
                SET content = ''::bytea 
                WHERE ctid = ANY(ARRAY(
                        SELECT ('(' || p.i || ',' || s.i || ')')::tid
                        FROM generate_series($(($i * $ppr)), $((($i + 1) * $ppr))) AS p(i),
                             generate_series(0, 2048) AS s(i)
                ));"
        psql $MYDB -c "VACUUM $TABLE;"
done

This reduced my table from 1200MB down to 240MB (29000 pages), while I expected 
the final result to be even lower. I made a copy of the table, and it needed 
only 30MB (3800 pages), so there was still something wrong.
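(For anyone wanting to reproduce the measurements: something along these lines gives the on-disk size and page count; "mytable" is a placeholder for the real table name.)

```sql
-- On-disk size of the heap (excludes indexes and TOAST)
SELECT pg_size_pretty(pg_relation_size('mytable'));

-- Page count as recorded by the last VACUUM/ANALYZE
SELECT relpages FROM pg_class WHERE relname = 'mytable';
```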
I ran a simple (and slow) query to get the number of rows per page on both 
tables. The new one had slightly more rows per page, but this was in no way 
enough to justify such a gap. Then I saw that after page 2080 the table had, 
for the following 27000 pages, between 0 and 5 rows per page. So vacuum could 
not truncate the pages because there were gaps.
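(The rows-per-page query was along these lines; this is a sketch with "mytable" as a placeholder, and it forces a full sequential scan.)

```sql
-- Slow: reads the whole table. Fine at 1GB, not at 100GB.
-- The tid-to-point cast is a trick to split ctid into (page, tuple).
SELECT (ctid::text::point)[0]::bigint AS page, count(*) AS rows
FROM mytable
GROUP BY 1
ORDER BY 1;
```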
I figured that, after a vacuum, updated versions of the rows at the end of the 
table should be written back at the beginning of the free space, and thus after 
an update of the rows of the last 27000 pages the table could be reduced to the 
optimal size.
But when I ran the following, I was very surprised (NB: I know this query does 
a full scan, I won't do it on a 100GB table, I promise):

VACUUM VERBOSE $TABLE;

SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows
UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
=> 29 rows updated
SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows again!

So instead of filling the huge gaps (there is for instance an 80-page gap, from 
page 2082 to page 2160), Pg put the data back at the end of the table.


I'm quite puzzled by this situation. I believed PostgreSQL would use the free 
space to put the new data, so my update loop would give me a clean, «packed» 
table.

What behaviour did I miss here? How can I get PostgreSQL to use that free 
space without falling back to a VACUUM FULL? (And without using tools like 
pg_repack if possible, since it does not seem necessary in this situation.)


Thanks

 Pierre
