On Fri, 9 Oct 2009, Tom Lane wrote:

> what do we do with rows that fail encoding conversion? For logging to a file we could/should just decree that we write out the original, allegedly-in-the-client-encoding data. I'm not sure what we do about logging to a table though. The idea of storing bytea is pretty unpleasant but there might be little choice.

I think this detail can be punted on: documented, with the error logged, but not actually handled perfectly. In most use cases I've seen here, saving the rows to the "reject" file/table is a convenience rather than a hard requirement. You can always dig them out of the original input again if you see an encoding error in the logs, and it's rare that you can completely automate that recovery anyway.
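For illustration only, here is a minimal Python sketch of that punt, assuming rows arrive as raw byte lines: a row that fails decoding is logged and kept as its original bytes rather than repaired. split_rows and write_rejects are hypothetical names, and this is not how COPY itself is implemented.

```python
import logging

log = logging.getLogger("reject_sketch")


def split_rows(raw_lines, client_encoding="utf-8"):
    """Hypothetical helper: decode each raw input line.

    Lines that fail encoding conversion are not repaired; the error is
    logged and the original bytes are kept untouched for the reject file.
    """
    good, rejects = [], []
    for lineno, raw in enumerate(raw_lines, start=1):
        try:
            good.append(raw.decode(client_encoding))
        except UnicodeDecodeError as exc:
            log.warning("line %d: encoding conversion failed: %s", lineno, exc)
            # Keep the original, allegedly-in-client-encoding bytes as-is.
            rejects.append(raw)
    return good, rejects


def write_rejects(path, rejects):
    # Binary mode, so the rejected bytes are written back out exactly as they arrived.
    with open(path, "wb") as f:
        for raw in rejects:
            f.write(raw)
```

With split_rows([b"1,foo\n", b"2,\xff\n"]), the first line decodes and the second (0xFF is never valid UTF-8) lands in the rejects list unchanged, with a warning in the log.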

The main purpose of the reject file/table is to collect rows for review that you might fix by hand or with a systematic update (e.g. adding ",\N" for a missing column when warranted) before trying a re-import. I suspect the users of this feature would be OK with knowing that it can't be 100% accurate in the face of encoding errors. It's more important that in the usual cases, such as bad delimiters and missing columns, you can easily manipulate the rejects as simple text. Making that harder just for this edge case wouldn't match the priorities of the users of this feature I've encountered.
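As a rough sketch of the kind of systematic update meant above, this hypothetical Python helper appends a ",\N" NULL marker for each trailing column missing from a rejected text row. It assumes a plain delimited line with no quoting or escaped delimiters; deciding when padding is warranted is left to whoever reviews the rejects.

```python
def pad_missing_columns(line, expected_cols, delimiter=",", null_marker="\\N"):
    # Hypothetical helper: append NULL markers (\N) for trailing columns
    # missing from a rejected row.
    # Assumes a plain delimited line with no quoting or escaped delimiters;
    # whether padding is actually warranted is a judgment for the reviewer.
    newline = "\n" if line.endswith("\n") else ""
    body = line[: len(line) - len(newline)]
    have = len(body.split(delimiter))
    if have >= expected_cols:
        return line  # nothing missing; leave the row unchanged
    body += (delimiter + null_marker) * (expected_cols - have)
    return body + newline
```

For example, pad_missing_columns("42,widget\n", 3) returns the text 42,widget,\N followed by a newline, ready to go back into a re-import attempt.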

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)