The recent testing of the Parse1 and Parse2 algorithms must, I think, have been on ascii rather than utf-8 text.
I tested on the English translation of Les Miserables, to ensure at least a sprinkling of multi-byte characters in the text, and a longish file: 3.4 MB. I searched for the string ‘Valjean’, which obviously occurs very frequently. The searches were applied first to the raw binary text as read from the utf-8 encoded file, without decoding, and then to the utf-8 decoded text.

Parse 0: using itemdelimiter ‘Valjean’ (case insensitive)
Parse 1: using offset with skips
Parse 2: using offset, truncating the text, with 0 skip

Results:

Searches on raw text
  Parse 0        10 ms
  Parse 1         9 ms
  Parse 2       708 ms

Searches on utf-8 text
  Parse 0      4402 ms
  Parse 1    225469 ms
  Parse 2      3453 ms

The winner for long utf-8 text is Parse 2; for raw text, Parse 1 and Parse 0 are equivalent.

The results demonstrate the dramatic decay in performance with long utf-8 text. For most searches I would think one could use the raw text, as long as one is searching for an ascii string. False positives, where the string of single bytes occurs inside multibyte characters, should not arise: every byte of a multibyte utf-8 character has its high bit set, so it can never equal an ascii byte.

Neville
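
P.S. To illustrate the raw-text idea outside LiveCode, here is a minimal sketch in Python (not the Parse 0/1/2 handlers themselves). It assumes the search string is pure ascii and the data is valid utf-8; the function name find_ascii and the sample sentence are made up for the example. It searches the undecoded bytes, then converts each byte offset to a character offset only for the hits.

```python
def find_ascii(raw: bytes, needle: str) -> list[int]:
    """Return the byte offsets of every case-insensitive, non-overlapping
    match of an ascii needle in utf-8 encoded bytes (hypothetical helper)."""
    # encode("ascii") raises UnicodeEncodeError if the needle is not pure ascii
    key = needle.encode("ascii").lower()
    # bytes.lower() only changes ascii A-Z, so multibyte sequences are untouched
    hay = raw.lower()
    hits, start = [], 0
    while True:
        pos = hay.find(key, start)
        if pos == -1:
            return hits
        hits.append(pos)
        start = pos + len(key)  # resume the search after this match


# Made-up sample containing several multibyte characters
sample = "Jean Valjean rêvait. «Valjean!» cria Cosette à VALJEAN."
raw = sample.encode("utf-8")

byte_hits = find_ascii(raw, "Valjean")

# A match always starts on a character boundary, so the prefix decodes cleanly;
# its decoded length is the 0-based character offset of the match.
char_hits = [len(raw[:b].decode("utf-8")) for b in byte_hits]
```

The byte offsets and character offsets differ wherever a multibyte character precedes the match, so the conversion step is only needed if character positions are wanted; for simply counting or testing for occurrences the byte search alone is enough.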