Re: [HACKERS] EOL characters and multibyte encodings

Joe Conway Thu, 21 Jun 2007 15:55:12 -0700

Tom Lane wrote:

Joe Conway <[EMAIL PROTECTED]> writes:
My first thought on fixing this issue was to simply replace all instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the R parser. As far as I know, any instances of '\r' embedded in a syntactically valid R statement must be escaped (i.e. literally the characters "\" and "r"), so that should not be a problem. But I am concerned about how this potentially plays against multibyte characters. Is it safe to do this, or do I need to use a mb-aware replace algorithm?
It's safe, because you'll be dealing with prosrc inside the backend,
therefore using a backend-legal encoding, and those don't have any ASCII
aliasing problems (all bytes of an MB character must have high bit set).
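To make the point concrete, here is a minimal sketch in plain C, assuming the function source is available as an ordinary NUL-terminated byte string. The name cr_to_lf_inplace is hypothetical, and this is not PL/R's actual code. Because every byte of a multibyte character in a server-side encoding has its high bit set, a byte equal to 0x0D can only be a real carriage return, so a naive byte loop never splits a multibyte character.

```c
#include <stddef.h>

/* Hypothetical helper: replace every '\r' byte with '\n', in place.
 * A plain byte-by-byte scan is safe for backend-legal encodings, because
 * all bytes of a multibyte character have the high bit set, so 0x0D
 * cannot appear inside one. */
void cr_to_lf_inplace(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == '\r')
            *s = '\n';
}
```

Note that a CRLF pair becomes two newlines under this scheme, which is the line-number doubling Tom objects to.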


Great -- I wasn't sure about that.

However I dislike doing it exactly that way because line numbers in the
R script will all get doubled.  Unless R never reports errors in terms
of line numbers, you'd be better off to either delete the \r characters
or replace them with spaces.
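The "delete the \r characters" option can be sketched as a single in-place compaction pass. This is a plain C illustration under the same assumptions (NUL-terminated byte string, hypothetical helper name strip_cr_inplace). Each CRLF collapses to one LF, so line numbering is preserved; overwriting each '\r' with ' ' instead would keep the string length unchanged at the cost of trailing whitespace.

```c
#include <stddef.h>

/* Hypothetical helper: remove every '\r' byte, in place.
 * The write pointer never overtakes the read pointer, so no extra
 * buffer is needed. CRLF therefore becomes a single LF. */
void strip_cr_inplace(char *s)
{
    char *dst = s;

    for (const char *src = s; *src != '\0'; src++)
        if (*src != '\r')
            *dst++ = *src;
    *dst = '\0';
}
```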

Good point. But I need to be able to deal with Apple EOLs too -- IIRC those can be *only* '\r'. So I guess I need to do a look-ahead whenever I run into '\r', see if it is followed by '\n', and then munge the string accordingly.
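One way the look-ahead could work is sketched below in plain C. It assumes a NUL-terminated byte string, and normalize_eol_inplace is a hypothetical name, not PL/R's actual implementation. A '\r' is always emitted as '\n', and if the next byte is '\n' that byte is skipped, so CRLF (Windows) and lone CR (classic Mac OS) both become exactly one LF.

```c
#include <stddef.h>

/* Hypothetical helper: normalize CRLF and CR-only line endings to LF,
 * in place. Every end-of-line becomes exactly one '\n', so line numbers
 * reported by the R parser should match the original source.
 * Byte-wise scanning relies on backend encodings never using ASCII
 * byte values inside multibyte characters. */
void normalize_eol_inplace(char *s)
{
    char       *dst = s;
    const char *src = s;

    while (*src != '\0')
    {
        if (*src == '\r')
        {
            *dst++ = '\n';      /* CR or CRLF -> one LF */
            src++;
            if (*src == '\n')   /* look ahead: drop the LF of a CRLF pair */
                src++;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
}
```

Since the output is never longer than the input, the rewrite can be done in the same buffer.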


Joe
