Re: [HACKERS] Support UTF-8 files with BOM in COPY FROM

Andrew Dunstan Mon, 26 Sep 2011 05:08:46 -0700


On 09/26/2011 07:12 AM, Magnus Hagander wrote:

On Mon, Sep 26, 2011 at 06:58, Itagaki Takahiro
<itagaki.takah...@gmail.com>  wrote:

Hi,

I'd like to support UTF-8 text or CSV files that have a BOM (byte order mark)
in the COPY FROM command. The BOM will be automatically detected and ignored
if the file encoding is UTF-8. WIP patch attached.
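A minimal sketch of the detection step described above, written in Python rather than PostgreSQL's C for brevity. It is not the patch's code: the helper name `strip_utf8_bom` and the encoding-name normalization are assumptions for illustration. The three bytes EF BB BF (U+FEFF encoded as UTF-8) are skipped only at the very start of the input, and only when the declared file encoding is UTF-8; otherwise the bytes pass through untouched.

```python
import codecs

# UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes, file_encoding: str) -> bytes:
    """Drop a leading UTF-8 BOM, but only when the file encoding is UTF-8.

    Hypothetical helper for illustration; a real implementation would
    apply this check once, to the first bytes of the COPY input stream,
    not to every buffer it reads.
    """
    # Accept spellings such as "UTF8", "utf-8" and "UTF_8".
    normalized = file_encoding.upper().replace("-", "").replace("_", "")
    if normalized == "UTF8" and data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

For example, `strip_utf8_bom(b"\xef\xbb\xbfid,name\n", "UTF8")` yields `b"id,name\n"`, while the same bytes declared as `LATIN1` are returned unchanged.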

I'm only thinking about COPY FROM for reads, but if someone wants to add a
BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
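COPY TO WITH BOM is only floated here as a possible option, not existing syntax. A rough Python sketch of what the write side could do, under that assumption: emit the BOM once, before the first row, and only for UTF-8 output. The function `encode_copy_output` and its `with_bom` flag are hypothetical; rejecting the flag for non-UTF-8 output is a design choice of this sketch, not something the thread settles.

```python
import codecs
from typing import Iterable


def encode_copy_output(lines: Iterable[str], file_encoding: str,
                       with_bom: bool = False) -> bytes:
    """Encode already-formatted COPY output lines, optionally with a BOM.

    with_bom models the hypothetical COPY TO ... WITH BOM option; this
    sketch only accepts it for UTF-8 output, where a BOM is meaningful.
    """
    normalized = file_encoding.upper().replace("-", "").replace("_", "")
    is_utf8 = normalized == "UTF8"
    if with_bom and not is_utf8:
        raise ValueError("a BOM can only be written for UTF-8 output")
    text = "".join(line + "\n" for line in lines)
    body = text.encode("utf-8" if is_utf8 else file_encoding)
    # The BOM appears exactly once, at the start of the file.
    return (codecs.BOM_UTF8 if with_bom else b"") + body
```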

Comments welcome.

I like it in general. But if we're looking at the BOM, shouldn't we
also look at it and *reject* the file if it's a BOM for a non-UTF-8
encoding? Say, if the BOM claims it's UTF-16?
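The explicit check suggested above could look roughly like this Python sketch (the helper name `detect_foreign_bom` is hypothetical). It reports which non-UTF-8 Unicode BOM, if any, starts the input, so the caller can reject the file. The UTF-32 patterns are tested first because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs
from typing import Optional

# Longer BOMs first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
FOREIGN_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_foreign_bom(data: bytes) -> Optional[str]:
    """Return the name of a leading UTF-16/UTF-32 BOM, or None if absent."""
    for bom, name in FOREIGN_BOMS:
        if data.startswith(bom):
            return name
    return None
```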

It should be rejected as invalidly encoded anyway, as a non-UTF-8 BOM is not valid UTF-8. We shouldn't check in non-Unicode cases where the sequence might be valid in those encodings (e.g. ISO-8859-1).
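A small Python illustration of both halves of this argument, with `is_valid_utf8` standing in (as an assumption) for the server's input validation. The bytes 0xFE and 0xFF never occur in well-formed UTF-8, so every UTF-16 and UTF-32 BOM already fails validation without a dedicated BOM check. In ISO-8859-1, by contrast, the UTF-8 BOM bytes EF BB BF are the ordinary characters "ï»¿", so treating them specially there could misinterpret legitimate data.

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Stand-in for the server's UTF-8 input validation."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every UTF-16/UTF-32 BOM contains 0xFE and 0xFF, which are never valid in UTF-8.
for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE,
            codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE):
    assert not is_valid_utf8(bom)

# In ISO-8859-1 the UTF-8 BOM bytes decode to three ordinary characters.
assert codecs.BOM_UTF8.decode("latin-1") == "\u00ef\u00bb\u00bf"  # "ï»¿"
```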


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
