> -----Original Message-----
> From: Vik Fearing [mailto:vik.fear...@2ndquadrant.com]
> Sent: Friday, March 02, 2018 20:56
> To: l...@laurent-hasson.com; pgsql-performance@lists.postgresql.org
> Cc: Stephen Frost <sfr...@snowman.net>
> Subject: Re: Updating large tables without dead tuples
> 
> On 02/24/2018 12:27 AM, l...@laurent-hasson.com wrote:
> > Hello
> >
> >
> >
> > I work with a large and wide table (about 300 million rows, about 50
> > columns), and from time to time, we get business requirements to make
> > some modifications. But sometimes, it's just some plain mistake. This
> > has happened to us a few weeks ago where someone made a mistake and we
> > had to update a single column of a large and wide table. Literally,
> > the source data screwed up a zip code and we had to patch on our end.
> >
> >
> >
> > Anyways. Query ran was:
> >
> >     update T set source_id = substr(sourceId, 2, 10);
> >
> > Took about 10h and created 100's of millions of dead tuples, causing
> > another couple of hours of vacuum.
> >
> >
> >
> > This was done during a maintenance window, and that table is read-only
> > except when we ETL data to it on a weekly basis, and so I was just
> > wondering why I should pay the "bloat" penalty for this type of
> > transaction. Is there a trick that could be use here?
> Yes, there is a trick I like to use here, as long as you don't mind locking 
> the
> table (even against reads).
> 
> I'll assume T.source_id is of type text.  If it's not, use whatever the 
> actual type
> is.
> 
> ALTER TABLE T
>     ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);
> 
> I copied what you had verbatim, I earnestly hope you don't have two columns
> source_id and sourceId in your table.
> 
> This will rewrite the entire table just the same as a VACUUM FULL after your
> UPDATE would.
> 
> Don't forget to VACUUM ANALYZE this table after the operation.  Even though
> there will be no dead rows, you still need to VACUUM it to generate the
> visibility map and you need to ANALYZE it for statistics on your "new" column.
> 
> Foreign keys remain intact with this solution and you don't have double wal
> logging like for an UPDATE.
> --
> Vik Fearing                                          +33 6 46 75 15 36
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

[Laurent Hasson] 
Yes, sorry... only a single column source_id. I understand your idea... Is that 
because a TEXT field (vs a varchar) would be considered TOAST and be treated 
differently?

Thanks,
Laurent.

Reply via email to