[REBOL] UTF-8

Alain Goyé Sun, 17 Oct 2004 11:39:14 -0700

Hi all,

I got interested in manipulating Unicode with REBOL and tried the UTF-8 script by Jan 
Skibinski.


There seems to be an error in the encode function: it did not convert my test 
case correctly. The first letter of the Khmer alphabet, whose code point is U+1780, 
should become #{E19E80} in UTF-8, according to my understanding (based on 
http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html).
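
To make the expected value concrete: U+1780 is binary 0001 0111 1000 0000, which falls in the three-byte range (U+0800 to U+FFFF). Split into 4 + 6 + 6 bits, that is 0001 | 011110 | 000000, and adding the prefixes 1110, 10 and 10 gives 11100001 10011110 10000000, i.e. E1 9E 80. The sketch below performs the same split for a single code point. It is only an illustration: utf8-3 is a hypothetical helper that is not part of Jan Skibinski's script, and it assumes REBOL 2, where a char! holds one byte and an exact integer division returns an integer.

```rebol
; Hypothetical helper for illustration only (not from the UTF-8 script).
; Assumes REBOL 2 and a code point in the range U+0800 to U+FFFF.
utf8-3: func [
    cp [integer!]
    /local out
][
    out: make binary! 3
    ; lead byte 1110xxxx: top 4 bits of the code point
    append out to char! 224 or ((cp and -4096) / 4096)
    ; first continuation byte 10xxxxxx: middle 6 bits
    append out to char! 128 or ((cp and 4032) / 64)
    ; second continuation byte 10xxxxxx: low 6 bits
    append out to char! 128 or (cp and 63)
    out
]

; 6016 is U+1780 in decimal
print mold utf8-3 6016
```

Masking before dividing (cp and -4096, cp and 4032) keeps each division exact, so the result stays an integer that can be combined with or.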

In case it is helpful to someone, this version should work (though it is not 
optimized and was tested only with k=2 on U+1780 :-) :

    encode: func [
        k [integer!]
        ucs [string!]
        /local c f m x result [string!]
    ][
        result: make string! length? ucs
        f: pick fetch k
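        parse/all ucs [any [c: k skip ( 
            ; f (picked from fetch) is assumed to return the k-byte code point at c
            either 128 > x: f c [ 
                ; ASCII range: store one byte, not the digits of the integer
                insert tail result to char! x
            ][
                ; multi-byte: insert at the old tail, so bytes are built backwards
                result: tail result
                m: 64
                until [
                    ; continuation byte 10xxxxxx from the low 6 bits
                    insert result to char! x and 63 or 128
                    ; drop those 6 bits; stop once the rest fits in the lead byte
                    (m: m / 2) > x: x and -64 / 64
                ]
                ; lead byte: remaining bits plus the prefix assumed to be in udata
                insert result to char! x or pick udata 1 + length? result
            ]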
        )]]
        head result
    ]
