> 
> I don't know what the best solution is here.  The BOM encoded as UTF-8
> is valid data in other encodings.  Of course, there is your point that
> such data cannot be at the start of an SQL command.
> 

Is the UTF-8 BOM ( EF BB BF ) actually valid data in any other multi-byte 
encoding (other than its intended use in UTF-8)?

I realize that for a single-byte encoding, such as latin-1, it would be legal as 
data, since any byte other than 00 is legal, although never legal outside a 
quoted string in a SQL command or psql command.
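
To make the latin-1 case concrete, here is a small sketch (plain Python, 
nothing to do with psql's own code) showing that the three BOM bytes decode 
without error as three ordinary printable characters in latin-1:

```python
# The UTF-8 encoding of U+FEFF, i.e. the "UTF-8 BOM".
UTF8_BOM = b"\xef\xbb\xbf"

# latin-1 maps every byte value to the code point with the same number,
# so this decode cannot fail.
as_latin1 = UTF8_BOM.decode("latin-1")

# The same three bytes decoded as UTF-8 give the single character U+FEFF.
as_utf8 = UTF8_BOM.decode("utf-8")

print([hex(ord(ch)) for ch in as_latin1])
print([hex(ord(ch)) for ch in as_utf8])
```

So in a latin-1 file those bytes are indistinguishable from real data, which 
is why the position at the very start of the file is the only useful hint.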

Certainly, no psql command input file can start with those bytes, or you would 
get an error (unless it is changed so the BOM is ignored).
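
As a rough illustration of what "ignoring the BOM" could mean, here is a 
minimal sketch in Python.  The function name strip_utf8_bom is made up for 
this example and is not anything in psql; it only drops the three bytes when 
they are the very first bytes of the input:

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper).

    Only a BOM at offset 0 is removed; the same bytes anywhere else
    are left alone, since there they may be real data.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


script = UTF8_BOM + b"SELECT 1;\n"
print(strip_utf8_bom(script))
```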

As to zero-width non-breaking space:  the BOM is supposed to be treated as such 
if it appears in the middle of a string, but at the start it is just the BOM and 
isn't considered part of the data, if I'm reading the spec right.  Perhaps the 
lexers should allow it as white space (along with other Unicode space 
characters, and the similar U+2060 WORD JOINER).
It's not really important, since using the BOM character in the middle of a 
file is "deprecated" according to the Unicode standard.

And what if you see a UTF-16 or UTF-32 BOM: FF FE or FE FF, or FF FE 00 00 or 
00 00 FE FF?  Give an error saying UTF-16 and UTF-32 are not supported?

Or is there a plan to read and convert the UTF-16 or UTF-32 to UTF-8, so psql 
and PostgreSQL understand it?
(BTW, that would actually be nice on Windows, where UTF-16 is common).

If we accept UTF-8 BOM, we should at least detect the other BOM sequences and 
give an error or warning.
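
A sketch of that detection, again in Python and with invented names 
(detect_bom and check_script_encoding are hypothetical, not psql functions). 
One detail worth noting: the UTF-32 little-endian BOM FF FE 00 00 begins with 
the UTF-16 little-endian BOM FF FE, so the four-byte patterns have to be 
tested first:

```python
from typing import Optional

# Longest patterns first, so FF FE 00 00 is not mistaken for UTF-16LE.
# A UTF-16LE file whose first character after the BOM is U+0000 would be
# misread as UTF-32LE; this sketch assumes that does not occur in SQL scripts.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for pattern, name in BOMS:
        if data.startswith(pattern):
            return name
    return None


def check_script_encoding(data: bytes) -> bytes:
    """Strip a UTF-8 BOM; reject UTF-16/UTF-32 input with a clear error."""
    name = detect_bom(data)
    if name is None:
        return data
    if name == "UTF-8":
        return data[3:]
    raise ValueError(f"input appears to be {name}, which is not supported")
```

For example, check_script_encoding(b"\xff\xfeS\x00") raises a ValueError 
naming UTF-16LE instead of producing a confusing syntax error.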

Overall, from my point of view as a user, having psql deal with the BOM (at 
least the UTF-8 one) would be friendlier than the current behavior, as some 
editors (Notepad, for example) automatically add the BOM to the beginning of 
Unicode files, and it's not obvious without dumping the file in hex.



