--- In [email protected], "brucexs" <bswit...@...> wrote:
>
> --- In [email protected], "entropyreduction"
> <alancampbelllists+yahoo@> wrote:
> >
> > These any help?
> >
> > Unicode UTF-8 encoding
> > http://www1.tip.nl/~t876506/utf8tbl.html
> >
>
> Here it is using and's and or's instead of subtracts and adds. The
> subtracts work in this case only because you know the exact bits that
> are being removed, whereas &0x3f removes (e.g.) the top 2 bits
> regardless of what they are.
>
> Note that &, |, <<, >> are understood by PowerPro to operate on numbers.
> Since PowerPro stores integers as (base 10) strings, it automatically
> converts the numbers to their binary form before applying the operator,
> then converts back to strings.
>
> This function converts a UTF-8 string representing a single code point
> to that code point in Unicode. Note that UTF-8 can be stored in normal
> PowerPro strings. Unicode cannot, so the result is returned as a number.
>
> // test samples are from wikipedia article on utf8
>
> win.debug("50 => 50 ", cvtutf8("\x50").convertbase(10,16))
>
> win.debug("C2A2 => 00A2 ", cvtutf8("\xC2\xA2").convertbase(10,16))
>
> win.debug("E282AC => 20AC ", cvtutf8("\xE2\x82\xAC").convertbase(10,16))
>
> win.debug("F0A4ADA2 => 024B62 ", cvtutf8("\xF0\xA4\xAD\xA2").convertbase(10,16))
>
> //****************************************************
> function cvtutf8(u8)
>
> local b1 = u8[0].tonum
> if (b1 <= 0x7f)
>   quit (b1) // 0xxxxxxx
>
> local b2 = u8[1].tonum
> if (b1 <= 0xdf)
>   quit ((b1&0x1f)<<6 | (b2&0x3f)) // 110yyyyy 10xxxxxx
>
> local b3 = u8[2].tonum
> if (b1 <= 0xef)
>   quit ((b1&0xf)<<12 | (b2&0x3f)<<6 | (b3&0x3f)) // 1110zzzz 10yyyyyy 10xxxxxx
>
> local b4 = u8[3].tonum
> quit ((b1&0x7)<<18 | (b2&0x3f)<<12 | (b3&0x3f)<<6 | (b4&0x3f)) // 11110uuu 10zzzzzz 10yyyyyy 10xxxxxx
>
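For anyone following along without PowerPro handy, here is a rough Python sketch of the same mask-and-shift idea as the quoted cvtutf8. It is only an illustration, not something checked against PowerPro: the function name just mirrors the one above, it takes a Python bytes object rather than a PowerPro string, and like the quoted version it assumes well-formed input (no check of continuation bytes or sequence length). The last lines also show the point about subtracting versus masking.

```python
def cvtutf8(u8: bytes) -> int:
    """Decode a single UTF-8 sequence to its code point (no validation)."""
    b1 = u8[0]
    if b1 <= 0x7F:                       # 0xxxxxxx
        return b1

    b2 = u8[1]
    if b1 <= 0xDF:                       # 110yyyyy 10xxxxxx
        return (b1 & 0x1F) << 6 | (b2 & 0x3F)

    b3 = u8[2]
    if b1 <= 0xEF:                       # 1110zzzz 10yyyyyy 10xxxxxx
        return (b1 & 0x0F) << 12 | (b2 & 0x3F) << 6 | (b3 & 0x3F)

    b4 = u8[3]                           # 11110uuu 10zzzzzz 10yyyyyy 10xxxxxx
    return (b1 & 0x07) << 18 | (b2 & 0x3F) << 12 | (b3 & 0x3F) << 6 | (b4 & 0x3F)


if __name__ == "__main__":
    # Same samples as the win.debug lines in the quoted post.
    for sample in (b"\x50", b"\xC2\xA2", b"\xE2\x82\xAC", b"\xF0\xA4\xAD\xA2"):
        print(sample.hex().upper(), "=>", format(cvtutf8(sample), "04X"))

    # Subtracting 0x80 and masking with 0x3F agree on a real continuation byte...
    print(0xA2 - 0x80 == 0xA2 & 0x3F)
    # ...but differ once the top two bits are not exactly "10".
    print(0xE2 - 0x80 == 0xE2 & 0x3F)
```

The results can be cross-checked against Python's own decoder with ord(sample.decode("utf-8")).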
Thanks to both of you. Works fine. I will study it and endeavor to understand it. Also, I wasn't aware that case keywords worked outside of the case function (nice!).

Regards,
Sheri
