Jim Nasby <j...@nasby.net> writes:
> Wait... I thought that was one of the objections... that we wanted to
> leave a BOM in something like a COPY untouched?

I think most of us are okay with stripping a BOM that appears at the
*beginning* of a text file (assuming there's reason to believe the file
is in UTF8 encoding). BOM sequences embedded later in the file are a lot
more debatable, and I for one don't want to assume those can be dropped.
I don't know of any legitimate usage of such cases, and think it's
probably better to report an encoding error.
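
As a rough sketch of that policy (illustrative Python, not PostgreSQL's
actual code; the function name strip_leading_bom is hypothetical, and the
input is assumed to be otherwise valid UTF8): drop a single BOM at the very
start, and treat any later occurrence as an encoding error.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one UTF-8 BOM at the start; reject a BOM anywhere else.

    Hypothetical helper for illustration only. Assumes `data` is
    otherwise valid UTF-8, where EF BB BF can only encode U+FEFF.
    """
    if data.startswith(BOM):
        data = data[len(BOM):]
    if BOM in data:
        # An embedded BOM is not silently dropped; it is reported.
        raise ValueError("unexpected BOM inside UTF-8 data")
    return data
```

Only the first three bytes are ever removed; everything after them is
passed through untouched or rejected, never quietly altered.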
> Uh... could we just treat BOM as another whitespace character?

A BOM is *most certainly not* whitespace. The only even semi-legitimate
usage it has in UTF8 is as a file encoding marker. You can bet that the
user whose text editor made the file did not think he had whitespace at
the front. Anyway, your proposition that leading whitespace is ignorable
fails completely for data files.
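
Both points are easy to check. The snippet below is only an illustration in
Python: first_field is a hypothetical helper, and the tab-delimited line is
an assumed stand-in for a row of a data file.

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF, the character a UTF-8 BOM decodes to


def first_field(line: str, delimiter: str = "\t") -> str:
    """Return the first delimited field of a data line, unmodified."""
    return line.rstrip("\n").split(delimiter)[0]


# U+FEFF is a format character (category Cf), not whitespace,
# so whitespace-trimming routines leave it in place.
print(unicodedata.category(BOM))         # Cf
print(BOM.isspace())                     # False
print((BOM + "x").strip() == BOM + "x")  # True

# Leading spaces in a data line belong to the first field;
# trimming them would change the stored value.
print(repr(first_field("  padded\t42\n")))  # '  padded'
```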
regards, tom lane
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)