Hi All I feel my IQ goes up by at lest 10 points when I read some of Mark's detailed answers concerning Livecode internals - and the thought processes that go into them.
Mark's and many others' answers are worth the price of admission to this list - thanks a lot Mark for such an erudite exposition of this and other arcane Livecode concepts. I have "Joel on Software" but a" Mark on software" would be a great read as well. Lagi On 12 October 2017 at 18:16, Mark Waddingham via use-livecode < use-livecode@lists.runrev.com> wrote: > On 2017-10-12 18:38, Peter Reid via use-livecode wrote: > >> I agree that the redundant indexing is an expensive approach, however >> I have found that this abnormal structure of a repeat for each loop >> can be a lot faster than the other loop forms in some circumstances. >> >> I can't find the reference that first highlighted to me the lack of >> guaranteed sequence of chunks, but I've assumed this was a restriction >> for some time. It would be great if this is incorrect as I've got some >> heavyweight looping in several of my apps that would benefit from >> this! >> >> Can anyone from LiveCode give a definitive answer to this please? >> > > Yes - the previous posts are correct. > > If you are using 'repeat for each' to iterate over something which has an > order, then the iteration will follow that order. > > Specifically: > > - string chunks (char, item, line, word, trueWord, sentence, paragraph, > token, codepoint, codeunit) are all ordered > > - byte chunks are ordered > > - key chunks are unordered (they use the hash-order of the array, which > can change arbitrarily even within a session) > > - element chunks are ordered for arrays which are sequences*, unordered > for all other arrays > > * An array is a sequence if: it has only integer keys, and keys range from > 1 up to the number of elements in the array. > e.g. If the keys are 1, 2, 3, 4 - then that is a sequence; 0, 1, 2, 3 is > not; 1, 3, 4 is not. > > Just to reiterate something Richard mentioned - the performance advantage > of a 'repeat for each' loop is completely lost if you use a normal chunk > within the loop to fetch another part (or the same part!) of the container: > > put 0 into tIndex > repeat for each word tWord in tString > add 1 to tIndex > put word somefunc(tIndex) of tString into tSameWord > ... > end repeat > > This is because the chunk evaluation must step through somefunc(tIndex) > words in the string *again*. (It takes N steps to access the N'th > string/byte chunk in a container - repeat for each gets its performance > advantage because it remembers where it got the previous chunk from so it > can fetch the next, so it is just 1 step each iteration). > > The only types of access which are guaranteed not to have this N step > behavior are codeunit, byte and array accesses: > > - a LiveCode string is (internally) an array of codeunits; which means > that it takes only one step to get any codeunit in it. > > - a LiveCode binary string is (internally) an array of bytes; so it only > takes one step to get any byte from it > > - LiveCode arrays are hash-tables, so you can look up any key in one step > > Note: If you are using the 'byte' chunk, make sure that the thing they are > accessing *is* actually a binary string before using it on them. The > conversion from string -> binary string (which exists for compatibility > with pre-7 scripts) will cause a performance bump. > > In terms of the codeunit/codepoint/character chunks - the engine *tries* > to ensure it does as little work as possible when accessing these. > Internally, the engine sets flags on a string so that it can optimize these > lookups. In particular: > > 1) If a string contains only Unicode codepoints from the first 64k > Unicode codes *without* surrogates (they give access to everything above > 64k) then codepoint access on that string will be 1 step. > > 2) If a string contains characters which are only composed of single > codepoints in the first 64k Unicode code *without* surrogates, then char > access on that string will be 1 step. > > In particular, if the string you are processing actually came from a > native string; then (1) and (2) are guaranteed to be the case. However, if > you have arbitrary Unicode strings, then it won't generally be the case: > > a) any characters which have a Unicode code > 64k must be represented by > two codes < 64k (surrogate pairs); this means that the engine has to step > through the string codeunit by codeunit to count the codepoints so it can > find the one you asked for (N step again) > > b) any characters which are composed of multiple codepoints are in the > same boat > > Case (b) can arise quite often even in Roman script languages. For > example, in Unicode you can write e-acute as EITHER e,combining-acute (two > codepoints) OR as e-with-acute (one codepoint). For some languages, most > characters will be multiple codepoints - e.g. the Indic languages. There > are also a lot of esoteric languages (for scholarly / academic purposes) > which are entirely defined by Unicode codepoints > 64k. > > Upshot is that if you are doing char by char string processing of general > Unicode strings, then try and use repeat for each char; and if you *can't* > do that for some reason, then try and work out how you can do the > processing by using the codeunit chunk. > > However, as Knuth is famed for saying - "Premature optimization is the > root of all evil" - don't spend time trying to optimize an algorithm until > you have a working version and can intimately analyse where the slow parts > are. Trying to optimize ahead of time will likely just cause a lot of time > being spent needlessly, and potential hair loss when figuring out why the > (probably much more complicated code) doesn't work quite right, or why it > didn't produce the speed improvement you were expecting! > > Warmest Regards, > > Mark. > > -- > Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/ > LiveCode: Everyone can create apps > > _______________________________________________ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > _______________________________________________ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode