"David E. Wheeler" <[email protected]> writes:
> I've discovered something a bit disturbing at $work. We're migrating (slowly)
> from Oracle to PostgreSQL, and in some cases are using oracle_fdw to copy
> data over. Alas, there are a fair number of text values in the Oracle
> database that, although the database is UTF-8, are actually something else
> (CP1252 or Latin1). When we copy from an oracle_fdw foreign table into a
> PostgreSQL table, PostgreSQL does not complain, but ends up storing the
> mis-encoded strings, even though the database is UTF-8.
> I assume that this is because the foreign table, as a table, is assumed by
> the system to have valid data, and therefore additional character encoding
> validation is skipped, yes?
Probably not so much "assumed" as "nobody thought about it". In
e.g. plperl we expend the cycles to do encoding validity checking on
*every* string entering the system from Perl. I'm not sure why foreign
tables ought to get a pass on that, especially when you consider the
communication overhead that the encoding check would be amortized
against.
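(For concreteness, the conversion step plperl goes through looks roughly
like the sketch below. The wrapper name is invented for illustration, but
pg_any_to_server() is a real backend entry point, and it validates the
bytes as a side effect of converting them into the server encoding.)

    #include "postgres.h"
    #include "mb/pg_wchar.h"

    /*
     * Illustrative wrapper: strings handed back from Perl are assumed
     * to be UTF-8, so convert them into the server encoding.  The
     * conversion path verifies the byte sequence and ereports ERROR
     * if it is not valid UTF-8.
     */
    static char *
    perl_string_to_server(const char *utf8_str, int len)
    {
        return pg_any_to_server(utf8_str, len, PG_UTF8);
    }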
Now, having said that, I think it has to be the responsibility of the FDW
to apply any required check ... which makes this a bug report against
oracle_fdw, not the core system. (FWIW, contrib/file_fdw depends on the
COPY code, which will check encoding.)
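(A rough sketch of what that could look like in the FDW's scan path,
assuming the fetched value arrives as a NUL-terminated C string; the
helper name is invented, but pg_verifymbstr() is the backend's standard
validity check against the current database encoding.)

    #include "postgres.h"
    #include "mb/pg_wchar.h"

    /*
     * Illustrative check an FDW could apply to every string fetched
     * from the remote server before building a tuple with it.  With
     * noError = false, pg_verifymbstr() ereports ERROR on any byte
     * sequence that is invalid in the database encoding.
     */
    static void
    check_remote_string(const char *value)
    {
        pg_verifymbstr(value, strlen(value), false);
    }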
regards, tom lane