Hi all,
I got interested in manipulating Unicode with REBOL and tried the UTF-8 script by Jan
Skibinski.
There seems to be an error in the encode function: it did not correctly convert my
test case, the first letter of the Khmer alphabet (code point U+1780), which should
become #{E19E80} in UTF-8, according to my understanding (based on
http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html).
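For reference, the expected bytes can be worked out by hand. A code point in the range U+0800..U+FFFF takes the three-byte form 1110xxxx 10xxxxxx 10xxxxxx, and its 16 bits are split 4 + 6 + 6 across the x positions:

```latex
\begin{aligned}
\texttt{U+1780} &= 0001\;0111\;1000\;0000_2
  = \underbrace{0001}_{4}\;\underbrace{011110}_{6}\;\underbrace{000000}_{6} \\
\text{byte}_1 &= 1110\,0001_2 = \mathrm{E1}_{16} \\
\text{byte}_2 &= 10\,011110_2 = \mathrm{9E}_{16} \\
\text{byte}_3 &= 10\,000000_2 = \mathrm{80}_{16}
\end{aligned}
```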
In case it helps someone, this version should work (though it is not optimized, and I
have only tested it with k=2 on U+1780 :-) :
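encode: func [
    k [integer!]    ; bytes per input character (2 for UCS-2)
    ucs [string!]   ; UCS-encoded input
    /local c f m x result [string!]
][
    result: make string! length? ucs
    f: pick fetch k   ; k-byte reader; fetch comes from the UTF-8 script
    parse/all ucs [any [c: k skip (
        either 128 > x: f c [
            ; ASCII: a single byte (convert, else insert would add the digits)
            insert tail result to char! x
        ][
            ; multi-byte: emit continuation bytes from the low end,
            ; each one inserted in front of the previous one
            result: tail result
            m: 64
            until [
                insert result to char! x and 63 or 128   ; 10xxxxxx
                ; drop the 6 bits just emitted; stop once the rest
                ; fits in the lead byte's payload (5, 4, 3... bits)
                (m: m / 2) > x: x and -64 / 64
            ]
            ; lead byte: marker from udata, indexed by the continuation count
            insert result to char! x or pick udata 1 + length? result
        ]
    )]]
    head result
]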
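To see how the until loop arrives at those bytes, here is a hand trace of the multi-byte branch for U+1780 = 6016 (decimal). It assumes that the third entry of udata (picked when two continuation bytes have been written) is the three-byte lead marker 224 (#E0), since that is what the expected output requires:

```latex
\begin{aligned}
&\text{iter 1: } 6016 \mathbin{\texttt{and}} 63 = 0,\;
  0 \mathbin{\texttt{or}} 128 = 128 = \mathrm{80}_{16};\;
  x = (6016 \mathbin{\texttt{and}} -64)/64 = 94;\;
  m = 32,\ 32 > 94 \text{ is false} \\
&\text{iter 2: } 94 \mathbin{\texttt{and}} 63 = 30,\;
  30 \mathbin{\texttt{or}} 128 = 158 = \mathrm{9E}_{16};\;
  x = (94 \mathbin{\texttt{and}} -64)/64 = 1;\;
  m = 16,\ 16 > 1 \text{ is true (stop)} \\
&\text{lead: } 2 \text{ continuation bytes} \Rightarrow
  1 \mathbin{\texttt{or}} 224 = 225 = \mathrm{E1}_{16}
\end{aligned}
```

Because each byte is inserted in front of the previous one, the result reads #{E19E80}.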