David Christensen <da...@endpoint.com> wrote:

> Is that only when the default client encoding is set to UTF8
> (PGCLIENTENCODING, whatever), or will it be coded to work with the
> following:
>
> $ psql -f <file>
> Where <file> is:
> <BOM>
> SET CLIENT ENCODING 'utf8';
Sure. The client encoding is declared in the body of the file, but the
BOM is at the head of the file. So we should always ignore a BOM
sequence at the head of the file, no matter what client encoding is
used.

The attached patch replaces the BOM with white space, but it does not
change the client encoding automatically. I think we can always ignore
the client encoding when doing the replacement, because no SQL command
can start with a BOM sequence. If we don't ignore the sequence,
execution of the script is bound to fail with a syntax error.

This patch does nothing about the COPY and \copy commands. It might be
possible to add BOM handling code around AllocateFile() in CopyFrom()
to support "COPY FROM 'utf8file-with-bom.tsv'", but we need another
approach for "COPY FROM STDIN". For example,

    $ cat utf8bom-1.tsv utf8bom-2.tsv | psql -c "COPY FROM STDIN"

might contain a BOM sequence in the middle of the input stream.
Anyway, those changes would come in another patch in the future.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
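For readers without the attachment at hand, the replacement described
above can be sketched roughly as follows. This is only an illustration
of the idea, not the code in the attached patch: the helper name
blank_utf8_bom() is hypothetical, and it assumes the first chunk of the
script has already been read into a writable buffer. It checks for the
three-byte UTF-8 BOM (EF BB BF) and overwrites it with spaces, without
consulting the client encoding.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the attached patch.
 *
 * If the buffer begins with the UTF-8 byte order mark (EF BB BF),
 * overwrite those three bytes with spaces so that the SQL lexer only
 * sees leading white space. The client encoding is deliberately not
 * consulted.
 *
 * Returns 1 if a BOM was blanked out, 0 otherwise.
 */
static int
blank_utf8_bom(char *buf, size_t len)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};

    if (len >= 3 && memcmp(buf, bom, 3) == 0)
    {
        memset(buf, ' ', 3);
        return 1;
    }
    return 0;
}
```

Such a helper would be called once, on the first line read from the
file. That is also why it does not help with the concatenated COPY
input mentioned above, where a BOM can appear in the middle of the
stream.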
psql-utf8bom_20091021.patch