This is a little late, but there was a discussion about how slow a simple
offset() becomes when the text contains Unicode characters.
Geoff Canyon and Brian Milby found a faster solution by setting the
itemDelimiter to the search string.
They even provided a way to find the positions of substrings within the search
string, which the offset() function does by design.
Here I propose a variant of the offset() approach that searches the
UTF16-encoded text; it is easily adaptable to UTF32 if necessary.
To test (as in Brian's testStack), add a Unicode character to the text to be
searched, e.g. at the end. Any non-ASCII character will do to see the speed
penalty of simple offset(). I used ð (Icelandic d); any Chinese character
works as well.
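
A minimal sketch of such a timing test, assuming the allOffsets() function from this post is in the message path. The handler name timeOffsets, the sample sentence and the repeat count are made up for illustration and are not taken from Brian's testStack.

```livecode
-- Hypothetical helper: times a plain offset() loop against allOffsets().
-- Returns: plainMs,plainCount,utf16Ms,utf16Count
function timeOffsets pNeedle, pText
   local tStart, tPos, tNewPos, tPlainCount, tPlainMs, tHits, tUtf16Ms, tCount
   -- plain offset() loop on the text as it is
   put the milliseconds into tStart
   put 0 into tPos
   put 0 into tPlainCount
   repeat forever
      put offset(pNeedle, pText, tPos) into tNewPos
      if tNewPos = 0 then exit repeat
      add tNewPos to tPos
      add 1 to tPlainCount
   end repeat
   put the milliseconds - tStart into tPlainMs
   -- the UTF16-based allOffsets() from this post
   put the milliseconds into tStart
   put allOffsets(pNeedle, pText, false) into tHits
   put the milliseconds - tStart into tUtf16Ms
   -- allOffsets() returns 0 when nothing was found
   if tHits is 0 then
      put 0 into tCount
   else
      put the number of items of tHits into tCount
   end if
   return tPlainMs & comma & tPlainCount & comma & tUtf16Ms & comma & tCount
end timeOffsets

on mouseUp
   local tText
   repeat 20000 times
      put "the quick brown fox jumps over the lazy dog " after tText
   end repeat
   -- one non-ASCII character at the end (a Chinese character, U+4E2D)
   put numToCodepoint(20013) after tText
   put timeOffsets("fox", tText)
end mouseUp
```

Running it once with and once without the numToCodepoint() line should show how much the single non-ASCII character costs the plain loop.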
Kind regards
Bernd
-------------------------------------------
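function allOffsets pDelim, pString, pCaseSensitive
   local tNewPos, tPos, tResult
   -- search in the UTF16 binary data: 2 bytes per code unit
   put textEncode(pDelim,"UTF16") into pDelim
   put textEncode(pString,"UTF16") into pString
   set the caseSensitive to pCaseSensitive is true
   put 0 into tPos
   repeat forever
      -- skip tPos bytes, so overlapping matches are found as well
      put offset(pDelim, pString, tPos) into tNewPos
      if tNewPos = 0 then exit repeat
      add tNewPos to tPos
      -- a match at an even byte offset straddles two characters: not a real hit
      if tPos mod 2 = 0 then next repeat
      -- convert the byte offset to a character offset
      put tPos div 2 + tPos mod 2 & comma after tResult
   end repeat
   if tResult is empty then return 0
   else return char 1 to -2 of tResult
end allOffsets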
-----------------------------------------
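
The UTF32 adaptation mentioned above could look roughly like this. It is a sketch only: the name allOffsetsUTF32 is made up, and it assumes that textEncode() accepts "UTF-32" and, as with UTF16, returns the data without a byte order mark. Each character then occupies exactly 4 bytes, so only byte offsets 1, 5, 9, ... are real character starts.

```livecode
-- Hypothetical UTF32 variant of allOffsets(); assumes no byte order mark
function allOffsetsUTF32 pDelim, pString, pCaseSensitive
   local tNewPos, tPos, tResult
   put textEncode(pDelim, "UTF-32") into pDelim
   put textEncode(pString, "UTF-32") into pString
   set the caseSensitive to pCaseSensitive is true
   put 0 into tPos
   repeat forever
      put offset(pDelim, pString, tPos) into tNewPos
      if tNewPos = 0 then exit repeat
      add tNewPos to tPos
      -- ignore matches that do not start on a 4-byte boundary
      if tPos mod 4 <> 1 then next repeat
      -- convert the byte offset to a character offset
      put (tPos + 3) div 4 & comma after tResult
   end repeat
   if tResult is empty then return 0
   return char 1 to -2 of tResult
end allOffsetsUTF32
```

It is called like the UTF16 version, e.g. allOffsetsUTF32("a", "banana", false). The price is twice the memory for the encoded text; the gain is that characters outside the Basic Multilingual Plane no longer shift the reported positions.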