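The unicode plugin's character functions (such as length) appear to compute the
length by dividing the number of bytes in the UTF-16 encoding by 2. That is
correct only for the Basic Multilingual Plane: a character outside it takes two
16-bit code units (a surrogate pair), so it gets counted twice. Counting
characters with regex and the utf8 option works fine, though.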

;needs 4.9k for multiline strings
local str="2F862" ;;a CJK Compatibility Ideograph
local instr="\xf0\xaf\xa1\xa2"
;teststr(str, instr)
str ++=" and 10401" ;;Deseret Capital Letter Long E
instr ++= "\xf0\x90\x90\x81"
local ustring=unicode.from_utf8(instr)
local ulength=ustring.length
local rlength=regex.pcrematchcount(?".", ustring.to_utf8, "utf8")
messagebox("OK", ???xend
Code Points: &(str)
unicode.length (incorrect): &(ulength)
regex char matches (correct): &(rlength)
xend)
quit
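
For anyone who wants to check the arithmetic outside PowerPro, here is a minimal sketch in Python (not PowerPro script) using the same two characters as the test string. The variable names are mine, and it only illustrates the suspected "UTF-16 bytes / 2" calculation; I have not looked at how the plugin actually computes length.

```python
# Same characters as the script above: U+2F862 followed by U+10401.
s = "\U0002F862\U00010401"

utf8_bytes = s.encode("utf-8")        # the bytes assigned to instr in the script
utf16_bytes = s.encode("utf-16-le")   # little-endian, no byte order mark

code_points = len(s)                  # Python counts code points
utf16_units = len(utf16_bytes) // 2   # the suspected "bytes / 2" calculation

print(utf8_bytes)    # b'\xf0\xaf\xa1\xa2\xf0\x90\x90\x81'
print(code_points)   # 2
print(utf16_units)   # 4, because each character is a surrogate pair
```

If the plugin really does divide the UTF-16 byte count by 2, it would report 4 for this string, while the regex count gives 2.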

