[power-pro] Re: unicode muti-word code points (Bruce)

silvermoonwoman2001 Tue, 01 Sep 2009 09:26:42 -0700

--- In [email protected], "brucexs" <bswit...@...> wrote:
>
> --- In [email protected], "entropyreduction" 
> <alancampbelllists+yahoo@> wrote:
> >
> > These any help?
> > 
> > Unicode UTF-8 encoding
> > http://www1.tip.nl/~t876506/utf8tbl.html
> >
> 
> Here it is using ands and ors instead of subtracts and adds.  The subtracts 
> work in this case only because you know the exact bits that are being removed 
> (whereas &0x3f removes, e.g., the top 2 bits, regardless of what they are).
> 
> Note that &, |, <<, >> are understood by PowerPro to operate on numbers.  
> Since PowerPro stores integers as (base 10) strings, it automatically 
> converts the numbers to their binary form before applying the operator, then 
> converts back to strings.
> 
> This function converts a UTF8 string representing a single code point to that 
> code point in Unicode.  Note that utf8 can be stored in normal PowerPro 
> strings.  Unicode cannot, so the result is returned as a number.
> 
> // split lines throughout by Yahoo...
> // test samples are from wikipedia article on utf8
> 
> win.debug("50 => 50      ",  cvtutf8("\x50").convertbase(10,16))
> 
> win.debug("C2A2 => 00A2     ", cvtutf8("\xC2\xA2").convertbase(10,16))
> 
> win.debug("E282AC => 20AC    ", cvtutf8("\xE2\x82\xAC").convertbase(10,16))
> 
> win.debug("F0A4ADA2 => 024B62    ", cvtutf8("\xF0\xA4\xAD\xA2").convertbase(10,16))
> 
> //****************************************************
> function cvtutf8(u8)
> 
> local b1=u8[0].tonum
> if (b1 <= 0x7f)
> quit (b1)
> 
> 
> local b2 = u8[1].tonum
> if (b1<=0xDf) 
> quit ((b1&0x1f)<<6 | (b2 & 0x3f ))     //   110y yyyy  10xx xxxx 
> 
> local b3 = u8[2].tonum
> if (b1<=0xef)
> quit ((b1&0xf)<<12 | (b2&0x3f)<<6 | (b3&0x3f) )   //1110zzzz 10yyyyyy 10xxxxxx
> 
> local b4 = u8[3].tonum
> quit  ((b1&0x7)<<18 | (b2&0x3f)<<12 | (b3&0x3f)<<6 | (b4&0x3f))    //11110uuu 10zzzzzz 10yyyyyy 10xxxxxx
>
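
For anyone who wants to check the bit arithmetic outside PowerPro, here is a 
rough Python sketch of the same masking logic. It is not Bruce's code: the name 
cvt_utf8 and the use of a bytes argument are my own assumptions, and like the 
original it does no validation of the input.

```python
def cvt_utf8(u8: bytes) -> int:
    """Decode the first UTF-8 sequence in u8 to its code point (no validation)."""
    b1 = u8[0]
    if b1 <= 0x7F:                      # 0xxxxxxx
        return b1
    b2 = u8[1]
    if b1 <= 0xDF:                      # 110yyyyy 10xxxxxx
        return (b1 & 0x1F) << 6 | (b2 & 0x3F)
    b3 = u8[2]
    if b1 <= 0xEF:                      # 1110zzzz 10yyyyyy 10xxxxxx
        return (b1 & 0x0F) << 12 | (b2 & 0x3F) << 6 | (b3 & 0x3F)
    b4 = u8[3]                          # 11110uuu 10zzzzzz 10yyyyyy 10xxxxxx
    return (b1 & 0x07) << 18 | (b2 & 0x3F) << 12 | (b3 & 0x3F) << 6 | (b4 & 0x3F)


# Same samples as the win.debug lines above, checked against Python's decoder.
for sample in (b"\x50", b"\xC2\xA2", b"\xE2\x82\xAC", b"\xF0\xA4\xAD\xA2"):
    assert cvt_utf8(sample) == ord(sample.decode("utf-8"))

# Masking vs. subtracting: both agree when the top bits are the expected ones,
# but only the mask stays in range if they are not.
assert 0xA2 - 0x80 == 0xA2 & 0x3F       # proper continuation byte
assert 0x50 - 0x80 < 0 <= 0x50 & 0x3F   # not a continuation byte
```

The last two lines illustrate the point about subtraction: it only works when 
the bits being removed are known to be set, while the mask clears them 
regardless.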


Thanks to both of you. Works fine. I will study and endeavor to understand it. 
Also, I wasn't aware that case keywords worked outside of the case function 
(nice!).

Regards,
Sheri
