Hey Chris,

Thank you for your reply; this is the way to do this task properly. In my case I don't need to clean any data, since it already comes validated from the db, so I might get away with using dblink and some functions. If that fails, I'll do a dump/restore into temporary tables.
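If it comes to that, the fallback would look roughly like this (the table, column, and file names are just placeholders, not my actual schema):

-- in the old database: export a table
COPY customers TO '/tmp/customers.csv' WITH (FORMAT csv);

-- in the new database: load it into a staging table
CREATE TEMP TABLE staging_customers (id integer, name text, phones_csv text);
COPY staging_customers FROM '/tmp/customers.csv' WITH (FORMAT csv);

(Server-side COPY needs superuser; psql's \copy does the same thing from the client.)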

So far I've figured out the right package that has dblink.so (it used to be dblink.sql), activated it as the postgres user with CREATE EXTENSION dblink; and worked out the syntax of the dblink() call that lets me connect via the socket -- FROM dblink('dbname=fmcc host=/var/run/postgresql port=5432 password=test'...
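For reference, the full statement I'm experimenting with looks something like this (the remote query and the column definition list are placeholders for my real tables):

SELECT *
  FROM dblink('dbname=fmcc host=/var/run/postgresql port=5432 password=test',
              'SELECT id, name FROM customers_old')
       AS t(id integer, name text);

(dblink() returns SETOF record, so the AS t(...) column definition list is required.)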

The only problem is that I'm getting this interesting error; for some reason dblink *really wants to do password verification*:

ERROR:  password is required
DETAIL:  Non-superuser cannot connect if the server does not request a password.
HINT:  Target server's authentication method must be changed.
********** Error **********

ERROR: password is required
SQL state: 2F003
Detail: Non-superuser cannot connect if the server does not request a password.
Hint: Target server's authentication method must be changed.
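If I read the hint correctly, the local socket connection is authenticated by peer (or trust), so the server never actually asks for the password, and dblink refuses that for non-superusers. Assuming I can edit pg_hba.conf, one way out seems to be making that particular connection use password authentication, something like:

# pg_hba.conf -- force a real password check for the dblink connection
# TYPE    DATABASE    USER         METHOD
local     fmcc        migration    md5

followed by SELECT pg_reload_conf(); (the "migration" role is just a placeholder for whatever user the connection string ends up using). The other way around it would be to run the dblink queries as a superuser.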

Alex.

On 2018-07-20 04:34 PM, Christopher Browne wrote:
On Fri, 20 Jul 2018 at 13:42, Alex Volkov via talk <talk@gtalug.org> wrote:
Hey Everyone,

I'm looking for a tool to transform (possibly migrate) data from one
postgres db to another.

I have two postgres databases -- old (not normalized) has all the data
and new (normalized with some major schema changes) has no data.

The new db has more tables and the table relationships are structured
differently; some data residing in the old db is in a text column in CSV
format, whereas in the new database it's a separate table, and so on.

I've been thinking of writing a script that would just transform data
from one format to the other, but before writing a bunch of code that's
going to be run exactly once, I'm wondering if there is a tool out there
which I can use to express one-way transformation rules for these databases.

I have done this a few times ;-)

The first time, it was pretty much "scripts manipulating text files"...

These days, I seek to get the data out in some sort of "logical form",
where, as much as possible, internal details such as object IDs get
thrown out, and the data then gets loaded into some tables, and then
cleansed and loaded into the target schema.

Often, what happens is that the data files represent data that I
receive from someone else (that used to operate a domain name
registry...), and so the notion of using dblink isn't ever in my head.

But then, in effect, the data flows...

- Loaded into raw_this, and raw_that, and raw_other...

- Some cleansing is done to make sure that invalid data is
fixed or thrown out, giving me tables cleansed_this, cleansed_that,
cleansed_other...

Finally, the data from the "cleansed" tables gets loaded into the
new schema essentially via a series of INSERT statements.
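In SQL terms, a sketch of that flow might look like the following (all
the table and column names here are invented for illustration):

-- 1. raw: load the export as-is, everything as text
CREATE TABLE raw_customer (old_id text, name text, phones_csv text);
-- ... COPY / \copy the dump file into raw_customer ...

-- 2. cleansed: cast, trim, and drop rows that fail validation
CREATE TABLE cleansed_customer AS
SELECT old_id::integer AS old_id,
       trim(name)      AS name,
       phones_csv
  FROM raw_customer
 WHERE name IS NOT NULL AND name <> '';

-- 3. target: a series of INSERTs; the old CSV column becomes its own table
INSERT INTO customer (old_id, name)
SELECT old_id, name FROM cleansed_customer;

INSERT INTO customer_phone (customer_id, phone)
SELECT c.id, trim(p)
  FROM cleansed_customer cc
  JOIN customer c ON c.old_id = cc.old_id
  CROSS JOIN LATERAL unnest(string_to_array(cc.phones_csv, ',')) AS p;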

By having that intermediate stage, you have some ability to
audit the data, to know that all the data loaded was in good
form, or to have specific identification of data discarded because
it was bad, or to identify all the data that got modified to
clean it up (e.g. - I have lots of postal addresses, so I have to
clean up country codes and the like).

