Re: [GENERAL] Run statements before pg_dump in same transaction?

2017-03-24 Thread Greg Sabino Mullane



François Beausoleil asked:
> To that end, we'd like to anonymize the data before it leaves the database 
> server.
>
> One solution we thought of would be to run statements prior to pg_dump, but 
> within 
> the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* 
> hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
...
> Is there a ready-made solution for this?

No - at least not one that scrubs the data *before* it comes out of the 
database. Some other ideas:

* Periodically do a full dump to another database under your control, sanitize 
the 
data, and make all dev dumps come from *that* database. Process roughly becomes:

* pg_dump herokudb | psql dev_temp
* (run the sanitizing statements against dev_temp)
* drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
* Devs can pg_dump devdb at will

That still moves your sensitive data to another server though, even temporarily.
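
That rotation can be sketched as a small shell function. A sketch only: 
herokudb, devdb, and dev_temp follow the steps above, and sanitize.sql is a 
hypothetical file holding UPDATE statements like the one in the original 
question.

```shell
#!/bin/sh
# Refresh devdb from production, scrubbing sensitive data along the way.
# herokudb/devdb are assumed to be reachable connection names and
# sanitize.sql to hold the anonymizing UPDATEs -- both are illustrative.
refresh_devdb() {
    createdb dev_temp
    pg_dump herokudb | psql -q dev_temp               # full copy of production
    psql dev_temp -f sanitize.sql                     # anonymize in place
    psql -c 'DROP DATABASE IF EXISTS dev_old'
    psql -c 'ALTER DATABASE devdb RENAME TO dev_old'  # keep one old copy
    psql -c 'ALTER DATABASE dev_temp RENAME TO devdb'
}
```

Run it from cron, and devs only ever touch devdb.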
Another approach is to use the -T flag of pg_dump to exclude certain tables. 
Make modified copies of them on the server, then rename them after the dump 
(or simply put them in a new namespace):

* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev+' || ...etc. 
* pg_dump -d  -T public.users > devs_use_this.pg
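
In concrete terms those steps might look like this. A sketch: the database 
name and the single sanitizing UPDATE are placeholders for your real schema; 
LIKE ... INCLUDING ALL gives an identical table, indexes and all.

```shell
#!/bin/sh
# Build a sanitized copy of users in a "dev" schema, then dump everything
# except the real table.  DB and the UPDATE column list are illustrative.
DB=mydb   # hypothetical database name

dump_without_real_users() {
    psql "$DB" <<'SQL'
CREATE SCHEMA IF NOT EXISTS dev;
CREATE TABLE IF NOT EXISTS dev.users (LIKE public.users INCLUDING ALL);
TRUNCATE dev.users;
INSERT INTO dev.users SELECT * FROM public.users;
UPDATE dev.users SET email = 'dev+' || id || '@example.com';
SQL
    pg_dump -d "$DB" -T public.users > devs_use_this.pg
}
```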

Then devs could do: set search_path = dev, public;

Or you could simply move the sanitized table back:

alter table dev.users set schema public;

If you are going to rename, it may be simpler to not make an identical copy 
of the affected tables (i.e. with indexes) but just a data-only copy:

create table dev.users as select * from public.users;

pg_dump herokudb --schema-only | psql devdb
pg_dump herokudb --data-only -T public.users | psql devdb
psql devdb -c 'insert into public.users select * from dev.users'
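
Glued together, with cleanup on both sides, the data-only variant might look 
like this. It assumes a dev schema already exists on the production side; the 
database names are as above, and the UPDATE is the illustrative one from the 
question.

```shell
#!/bin/sh
# Data-only variant: build a sanitized copy of users on the production
# server, restore schema + data into devdb, then slot the sanitized rows
# into the (still empty) real table.  Names and the UPDATE are illustrative.
refresh_devdb_data_only() {
    psql herokudb -c 'CREATE TABLE dev.users AS SELECT * FROM public.users'
    psql herokudb -c "UPDATE dev.users SET email = 'dev+' || id || '@example.com'"
    pg_dump herokudb --schema-only | psql devdb
    pg_dump herokudb --data-only -T public.users | psql devdb
    psql devdb -c 'INSERT INTO public.users SELECT * FROM dev.users'
    # drop the scratch copies on both sides
    psql devdb -c 'DROP TABLE dev.users'
    psql herokudb -c 'DROP TABLE dev.users'
}
```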


- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Run statements before pg_dump in same transaction?

2017-03-23 Thread Adrian Klaver

On 03/23/2017 12:06 PM, François Beausoleil wrote:

Hi all!

For development purposes, we dump the production database to local. It's fine 
because the DB is small enough. The company's growing and we want to reduce 
risks. To that end, we'd like to anonymize the data before it leaves the 
database server.

One solution we thought of would be to run statements prior to pg_dump, but 
within the same transaction, something like this:

BEGIN;
UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of 
"password" */', ...;
-- launch pg_dump as usual, ensuring a ROLLBACK at the end
-- pg_dump must run with the *same* connection, obviously

-- if not already done by pg_dump
ROLLBACK;

Is there a ready-made solution for this? Our DB is hosted on Heroku, and we 
don't have 100% flexibility in how we dump.


The only thing I could find is:

https://devcenter.heroku.com/articles/heroku-postgres-backups

Direct database-to-database copies

So copy your production server to a second server on Heroku, anonymize 
the data on the second server and then dump that data.
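
With the Heroku CLI that might be sketched as follows; the app name, the 
SCRUB_DATABASE_URL attachment (a second Heroku Postgres database), and 
sanitize.sql are all hypothetical.

```shell
#!/bin/sh
# Copy production to a second Heroku database, sanitize it there, and only
# dump the sanitized copy.  myapp, SCRUB_DATABASE_URL, and sanitize.sql
# are placeholders.
refresh_scrub_copy() {
    # direct database-to-database copy, entirely inside Heroku
    heroku pg:copy DATABASE_URL SCRUB_DATABASE_URL --app myapp --confirm myapp
    # anonymize the copy; sanitize.sql holds the UPDATE statements
    heroku pg:psql SCRUB_DATABASE_URL --app myapp < sanitize.sql
    # only the sanitized copy ever leaves the platform
    pg_dump "$(heroku config:get SCRUB_DATABASE_URL --app myapp)" > dev_dump.pg
}
```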




I searched for "postgresql anonymize data dump before download"[1] and 
variations, but I didn't see anything highly relevant.

Thanks!
François

PS: Cross-posted to http://dba.stackexchange.com/q/168023/3935

  [1]: https://duckduckgo.com/?q=postgresql+anonymize+data+dump+before+download






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Run statements before pg_dump in same transaction?

2017-03-23 Thread Alban Hertroys

> On 23 Mar 2017, at 20:23, John R Pierce  wrote:
> 
> On 3/23/2017 12:06 PM, François Beausoleil wrote:
>> For development purposes, we dump the production database to local. It's 
>> fine because the DB is small enough. The company's growing and we want to 
>> reduce risks. To that end, we'd like to anonymize the data before it leaves 
>> the database server.
> 
> how about piping the /output/ of pg_dump to GPG or a similar file/stream 
> encryption module?

John, anonymising data is not about encrypting, but about making data 
anonymous. This is usually done to create data sets useful for development or 
statistical analysis, whilst preventing people from obtaining sensitive 
information from the data set. For example, companies want to know how many 
facebook users are interested in buying drones, but those companies should not 
know who exactly those people are.

ISTR that there are some tools for this purpose, but the details escape me.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Run statements before pg_dump in same transaction?

2017-03-23 Thread John R Pierce

On 3/23/2017 12:06 PM, François Beausoleil wrote:

For development purposes, we dump the production database to local. It's fine 
because the DB is small enough. The company's growing and we want to reduce 
risks. To that end, we'd like to anonymize the data before it leaves the 
database server.


how about piping the /output/ of pg_dump to GPG or a similar file/stream 
encryption module?
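
A minimal sketch of that pipeline; proddb and the recipient key are 
placeholders. As noted elsewhere in the thread, this protects the dump in 
transit and at rest, but it does not anonymize the data.

```shell
#!/bin/sh
# Encrypt the dump stream with GPG; proddb and the recipient address
# are placeholders.
encrypted_dump() {
    pg_dump proddb | gpg --encrypt --recipient dev-team@example.com > dump.sql.gpg
}
```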



--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Run statements before pg_dump in same transaction?

2017-03-23 Thread John R Pierce

On 3/23/2017 12:06 PM, François Beausoleil wrote:

BEGIN;
UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of 
"password" */', ...;
-- launch pg_dump as usual, ensuring a ROLLBACK at the end
-- pg_dump must run with the *same* connection, obviously

-- if not already done by pg_dump
ROLLBACK;


There is no way to get pg_dump to use the same connection as your 
script; it's a standalone program, and will open its own connection, 
hence run its own transactions.



--
john r pierce, recycling bits in santa cruz



[GENERAL] Run statements before pg_dump in same transaction?

2017-03-23 Thread François Beausoleil
Hi all!

For development purposes, we dump the production database to local. It's fine 
because the DB is small enough. The company's growing and we want to reduce 
risks. To that end, we'd like to anonymize the data before it leaves the 
database server.

One solution we thought of would be to run statements prior to pg_dump, but 
within the same transaction, something like this:

BEGIN;
UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* 
hash of "password" */', ...;
-- launch pg_dump as usual, ensuring a ROLLBACK at the end
-- pg_dump must run with the *same* connection, obviously

-- if not already done by pg_dump
ROLLBACK;

Is there a ready-made solution for this? Our DB is hosted on Heroku, and we 
don't have 100% flexibility in how we dump.

I searched for "postgresql anonymize data dump before download"[1] and 
variations, but I didn't see anything highly relevant.

Thanks!
François

PS: Cross-posted to http://dba.stackexchange.com/q/168023/3935

  [1]: https://duckduckgo.com/?q=postgresql+anonymize+data+dump+before+download


