Re: [postgis-users] Re: [Plr-general] Tutorial on PLR and PostGIS, more on carriage returns

Michael Fuhr Fri, 22 Jun 2007 00:28:27 -0700

On Thu, Jun 21, 2007 at 02:50:02PM -0700, Paul Ramsey wrote:
> You're right and I'm wrong, I was confused by the UTF code numbers,  
> which differ from the actual byte encodings used for UTF8.  Indeed,  
> all the multi-byte higher-order stuff is stuffed into 128-255 in the  
> UTF8 encoding, so a straight byte-swap would work (for UTF8 and the  
> various one-byte latin code pages, that is).


Additionally, the leading and trailing bytes of multibyte UTF-8
sequences use disjoint ranges, and the value of the leading byte
indicates how many trailing bytes follow.  Section 2.5 of The Unicode
Standard discusses encoding form design principles; Section 3.9
contains the formal definitions.  Table 3-7 shows the byte ranges
allowed in each position (single, leading, trailing).

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
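
To make the byte ranges concrete, here is a rough Python sketch that
classifies individual bytes by position.  The function names
(classify_utf8_byte, byte_roles) are made up for illustration and are
not part of PL/R, PostGIS, or any library.  The sketch looks at each
byte in isolation, so it does not enforce the tighter second-byte
limits that apply after 0xE0, 0xED, 0xF0, and 0xF4.

```python
def classify_utf8_byte(b):
    """Return (role, trailing_count) for a single byte value 0..255.

    Hypothetical helper for illustration only.
    """
    if b <= 0x7F:
        return ("single", 0)      # ASCII, including CR (0x0D) and LF (0x0A)
    if b <= 0xBF:
        return ("trailing", 0)    # 0x80..0xBF: continuation bytes only
    if 0xC2 <= b <= 0xDF:
        return ("leading", 1)     # starts a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return ("leading", 2)     # starts a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return ("leading", 3)     # starts a 4-byte sequence
    return ("invalid", 0)         # 0xC0, 0xC1, 0xF5..0xFF are never valid


def byte_roles(text):
    """Encode text as UTF-8 and classify every resulting byte."""
    return [classify_utf8_byte(b) for b in text.encode("utf-8")]


if __name__ == "__main__":
    # e-acute is 2 bytes, the euro sign is 3 bytes, CR and LF are 1 byte each.
    for role in byte_roles("\u00e9\u20ac\r\n"):
        print(role)
```

Since every byte of a multibyte sequence is 0x80 or above, the values
0x0D and 0x0A can only ever show up as real carriage returns and line
feeds, which is why a plain byte-level replacement is safe for UTF-8.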

-- 
Michael Fuhr
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
