Re: [HACKERS] EOL characters and multibyte encodings

Andrew Dunstan Fri, 22 Jun 2007 05:15:08 -0700


William ZHANG wrote:


It's safe, because you'll be dealing with prosrc inside the backend,
therefore using a backend-legal encoding, and those don't have any ASCII
aliasing problems (all bytes of an MB character must have high bit set).


The lower byte of some characters in BIG5, GBK, and GB18030 may be less than
0x7F, so it doesn't have the high bit set. Fortunately, they don't use 0x0D and
0x0A (CR and LF).

Those are client-only encodings, precisely for this sort of reason, and thus not relevant to the present discussion. As Tom points out above, when the language handler gets the code it will be encoded in the relevant backend encoding, which can't be any of these.
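To make the aliasing point concrete, here is a small Python sketch (not PostgreSQL code; the helper name is made up for illustration). It lists any bytes below 0x80 that appear inside a multibyte character for a given encoding. It relies on Python's own "big5" and "utf-8" codecs, which are assumed to match the encodings under discussion closely enough for the purpose.

```python
def ascii_range_trail_bytes(text, encoding):
    """Return the bytes below 0x80 that occur inside multibyte characters.

    Hypothetical helper for illustration only. A non-empty result means
    the encoding can alias ASCII bytes inside a multibyte character.
    """
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:  # only look inside multibyte characters
            found.extend(b for b in encoded if b < 0x80)
    return found


sample = "\u529f"  # a CJK character that Big5 encodes as 0xA5 0x5C

# Big5: the second byte is 0x5C, which is also ASCII backslash.
print([hex(b) for b in ascii_range_trail_bytes(sample, "big5")])

# UTF-8: every byte of a multibyte character has the high bit set.
print([hex(b) for b in ascii_range_trail_bytes(sample, "utf-8")])
```

With an encoding of the second kind, a byte-wise search for 0x0D or 0x0A can only ever match a real CR or LF.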

(Side note: the restriction by the R parser to unix-only line endings is a dreadful piece of design. As Jon Postel rightly said, the best rule is "Be liberal in what you accept and conservative in what you send." Just about every parser for every language has been able to handle this, so why must R be different?)
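As a rough sketch of what "liberal in what you accept" could look like for line endings, the Python below folds CRLF and bare CR into LF at the byte level. It is not what PL/R or the backend actually does, and the function name is hypothetical. It assumes the source is in an encoding where multibyte characters never contain 0x0D or 0x0A, which holds for UTF-8.

```python
def normalize_eol(src):
    """Convert CRLF and bare CR line endings to LF in a bytes object.

    Hypothetical helper for illustration only. CRLF is replaced first so
    that a CRLF pair becomes one LF rather than two.
    """
    return src.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Mixed line endings around a multibyte character, encoded as UTF-8.
body = "x <- 1\r\ny <- '\u529f'\rz <- 2\n".encode("utf-8")
print(normalize_eol(body).decode("utf-8"))
```

Note that this also rewrites CR bytes inside string literals, so a real handler would have to decide whether that is acceptable.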


cheers

andrew
