Steve,

You're right and I'm wrong. I was confused by the Unicode code point numbers, which differ from the actual byte encodings used for UTF8. Indeed, all the multi-byte higher-order stuff is stuffed into 128-255 in the UTF8 encoding, so a straight byte-for-byte replacement would work (for UTF8 and the various one-byte Latin code pages, that is).

Paul

On 21-Jun-07, at 10:30 AM, Stephen Woodbridge wrote:

Hmmmm, I am probably wrong on this, but I thought 0x00-0x7F are standard UTF8 characters with a constant meaning that is the same as ASCII for those bytes, and that every byte of a multi-byte character had to have the high-order bit set to indicate it was part of a multi-byte sequence.

I was not under the impression that you could have 0x00-0x7F as part of a multi-byte sequence. I am not an expert in this area and probably just know enough to mislead you ;) but I think it is worthwhile getting some additional insight into this. I for one would like to see a multi-byte UTF8 sequence with \r embedded in it.

-Steve


Paul Ramsey wrote:
Danger, Will Robinson. All values are fair game in bytes 2, 3 and 4 of the UTF encodings, so yes, it's possible you'll wreck multi-byte characters by doing a simple replacement on the byte array. Better to use an encoding-aware string replace function (not knowing C, I don't know what that would be, but there must be some in the PgSQL code base).
P
On 21-Jun-07, at 7:03 AM, Joe Conway wrote:
Obe, Regina wrote:
Joe,
Can you take a look at it again? It was messed up in my Firefox too. I think originally I had it looking right in Firefox, but then it didn't look right in IE, so I changed it to look right in IE but forgot to check back in Firefox. Hopefully this time I have made all browser masters happy.

http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut02
The tutorial looks perfect now in Firefox on Fedora Core 7.

BTW, I have confirmed on the R-devel list that the R engine expects \n for EOL, and \r will cause a syntax error, on all platforms. I will probably fix this by simply replacing \r with \n in PL/R functions. My only reservation is whether this might cause issues for installations with multibyte characters. Does anyone know if it is possible for a multibyte character to include a byte equal to 13 (\r), i.e. is the simple replacement of \r safe in all locales?

Thanks,

Joe

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users