Re: [GENERAL] Run statements before pg_dump in same transaction?
François Beausoleil asked:

> To that end, we'd like to anonymize the data before it leaves the database
> server.
>
> One solution we thought of would be to run statements prior to pg_dump, but
> within the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com',
>     password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
> ...
> Is there a ready-made solution for this?

No - at least not one that scrubs the data *before* it comes out in a dump. Some other ideas:

* Periodically do a full dump to another database under your control, sanitize the data there, and make all dev dumps come from *that* database. The process roughly becomes:

  * pg_dump herokudb | psql dev_temp
  * (sanitize dev_temp)
  * drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
  * Devs can pg_dump devdb at will

That still moves your sensitive data to another server, though, even if only temporarily.

Another approach is to use the -T flag of pg_dump to exclude certain tables. Make modified copies of them on the server, then rename them after the dump (or simply put them in a new schema):

* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev' || ...etc.
* pg_dump -d herokudb -T public.users > devs_use_this.pg

Then devs could do:

  SET search_path = dev, public;

Or you could simply move the sanitized table back:

  ALTER TABLE dev.users SET SCHEMA public;

If you are going to rename, it may be simpler to not make an identical copy of the affected tables (i.e. with indexes) but just a data-only copy:

  create table dev.users as select * from public.users;
  pg_dump herokudb --schema-only | psql devdb
  pg_dump herokudb --data-only -T public.users | psql devdb
  psql devdb -c 'insert into public.users select * from dev.users'

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
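The dev-schema approach above can be sketched end to end. This is only a sketch: "herokudb" and "devdb" are the example database names from the thread, and the users columns (id, email, password_hash) are assumed; adjust both to your actual schema.

```shell
# 1. Build a sanitized, data-only copy of the sensitive table
#    in a separate schema, on the production server.
psql herokudb <<'SQL'
CREATE SCHEMA IF NOT EXISTS dev;
DROP TABLE IF EXISTS dev.users;
CREATE TABLE dev.users AS SELECT * FROM public.users;
UPDATE dev.users
   SET email = 'dev+' || id || '@example.com',
       password_hash = 'hash-of-the-word-password';  -- assumed columns
SQL

# 2. Dump everything except the real users table.
pg_dump -d herokudb -T public.users > devs_use_this.pg

# 3. Restore, then put the sanitized copy where the app expects it.
psql devdb -f devs_use_this.pg
psql devdb -c 'ALTER TABLE dev.users SET SCHEMA public;'
```

Note that -T public.users excludes only the real table; dev.users is not excluded, so the sanitized copy travels inside the dump, and the final ALTER works because no public.users exists in the restored database.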
Re: [GENERAL] Run statements before pg_dump in same transaction?
On 03/23/2017 12:06 PM, François Beausoleil wrote:

> Hi all!
>
> For development purposes, we dump the production database to local. It's
> fine because the DB is small enough. The company's growing and we want to
> reduce risks. To that end, we'd like to anonymize the data before it
> leaves the database server.
>
> One solution we thought of would be to run statements prior to pg_dump,
> but within the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com',
>     password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
> -- pg_dump must run with the *same* connection, obviously
> -- if not already done by pg_dump
> ROLLBACK;
>
> Is there a ready-made solution for this? Our DB is hosted on Heroku, and
> we don't have 100% flexibility in how we dump.

The only thing I could find is:

https://devcenter.heroku.com/articles/heroku-postgres-backups
"Direct database-to-database copies"

So copy your production server to a second server on Heroku, anonymize the data on the second server and then dump that data.

> I searched for "postgresql anonymize data dump before download"[1] and
> variations, but I didn't see anything highly relevant.
>
> Thanks!
> François
>
> PS: Cross-posted to http://dba.stackexchange.com/q/168023/3935
>
> [1]: https://duckduckgo.com/?q=postgresql+anonymize+data+dump+before+download

--
Adrian Klaver
adrian.kla...@aklaver.com
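The database-to-database copy plus scrub could be sketched roughly as below. This is only a sketch under assumptions: the app name, attachment names, and column list are all made up, and the exact `heroku pg:copy` / `pg:backups` invocations should be checked against Heroku's current documentation.

```shell
# Copy production into a scratch database (names are hypothetical)
heroku pg:copy DATABASE_URL SCRATCH_DATABASE_URL --app myapp

# Anonymize on the scratch database, never on production
heroku pg:psql SCRATCH_DATABASE --app myapp <<'SQL'
UPDATE users
   SET email = 'dev+' || id || '@example.com',
       password_hash = 'hash-of-the-word-password';
SQL

# Only the scrubbed copy ever gets captured and downloaded
heroku pg:backups:capture SCRATCH_DATABASE --app myapp
heroku pg:backups:download --app myapp
```

As noted, the sensitive data still transits to the second server before scrubbing, but it never reaches developer machines.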
Re: [GENERAL] Run statements before pg_dump in same transaction?
> On 23 Mar 2017, at 20:23, John R Pierce wrote:
>
> On 3/23/2017 12:06 PM, François Beausoleil wrote:
>> For development purposes, we dump the production database to local. It's
>> fine because the DB is small enough. The company's growing and we want to
>> reduce risks. To that end, we'd like to anonymize the data before it
>> leaves the database server.
>
> how about piping the /output/ of pg_dump to GPG or a similar file/stream
> encryption module ?

John, anonymising data is not about encrypting, but about making data anonymous. This is usually done to create data sets that are useful for development or statistical analysis, whilst preventing people from obtaining sensitive information from the data set. For example, companies want to know how many Facebook users are interested in buying drones, but those companies should not learn who exactly those people are.

ISTR that there are some tools for this purpose, but the details escape me.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: [GENERAL] Run statements before pg_dump in same transaction?
On 3/23/2017 12:06 PM, François Beausoleil wrote:

> For development purposes, we dump the production database to local. It's
> fine because the DB is small enough. The company's growing and we want to
> reduce risks. To that end, we'd like to anonymize the data before it
> leaves the database server.

how about piping the /output/ of pg_dump to GPG or a similar file/stream encryption module ?

--
john r pierce, recycling bits in santa cruz
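Such a pipeline looks roughly like this (a sketch only; "mydb", "devdb", and the recipient address are placeholders). Note that this encrypts rather than anonymizes: anyone holding the private key still sees the real data.

```shell
# Encrypt the dump in transit and at rest; only key holders can read it.
pg_dump mydb | gzip | gpg --encrypt --recipient dev@example.com > dump.sql.gz.gpg

# A developer with the matching private key restores it later:
gpg --decrypt dump.sql.gz.gpg | gunzip | psql devdb
```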
Re: [GENERAL] Run statements before pg_dump in same transaction?
On 3/23/2017 12:06 PM, François Beausoleil wrote:

> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com',
>     password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
> -- pg_dump must run with the *same* connection, obviously
> -- if not already done by pg_dump
> ROLLBACK;

there is no way to get pg_dump to use the same connection as your script. It's a standalone program, and will open its own connection, and hence run its own transactions.

--
john r pierce, recycling bits in santa cruz
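For completeness: pg_dump (9.3 and later) does have a --snapshot option that lets it reuse a snapshot exported by another session with pg_export_snapshot(). That still does not help here, because exported snapshots only share visibility of *committed* data; the other session's uncommitted UPDATEs remain invisible to pg_dump. A sketch of the mechanism, with a placeholder snapshot id:

```shell
# Session 1 (psql): export a snapshot inside an open transaction
#   BEGIN ISOLATION LEVEL REPEATABLE READ;
#   SELECT pg_export_snapshot();   -- returns an id such as '00000003-0000001B-1'
#   UPDATE users SET email = ...;  -- pg_dump will NOT see this: it is uncommitted

# Session 2: dump using that snapshot id (placeholder value shown)
pg_dump --snapshot='00000003-0000001B-1' mydb > dump.sql
```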
[GENERAL] Run statements before pg_dump in same transaction?
Hi all!

For development purposes, we dump the production database to local. It's fine because the DB is small enough. The company's growing and we want to reduce risks. To that end, we'd like to anonymize the data before it leaves the database server.

One solution we thought of would be to run statements prior to pg_dump, but within the same transaction, something like this:

  BEGIN;
  UPDATE users SET email = 'dev+' || id || '@example.com',
      password_hash = '/* hash of "password" */', ...;
  -- launch pg_dump as usual, ensuring a ROLLBACK at the end
  -- pg_dump must run with the *same* connection, obviously
  -- if not already done by pg_dump
  ROLLBACK;

Is there a ready-made solution for this? Our DB is hosted on Heroku, and we don't have 100% flexibility in how we dump.

I searched for "postgresql anonymize data dump before download"[1] and variations, but I didn't see anything highly relevant.

Thanks!
François

PS: Cross-posted to http://dba.stackexchange.com/q/168023/3935

[1]: https://duckduckgo.com/?q=postgresql+anonymize+data+dump+before+download