Uli:

>  Problem: Looping over the words in a variable can be rather slow
>since HC has to scan the string from the start each time, i.e. a loop
>that looks at the first 3 words scans:
>
>  word 1
>  word 1, word 2
>  word 1, word 2, word 3
[ Actually, the entire string is read every loop...see today's post to the
HC List.]
>
>thus the time to fetch each word grows with its position, and the whole
>loop adds up quadratically.
>
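
In rough C terms (a sketch only; none of these names come from the HC
source), the naive lookup looks something like this: every request for
word N starts scanning at offset 0, so a loop over n words touches the
string on the order of n^2 times.

  #include <ctype.h>
  #include <stddef.h>

  /* Hypothetical helper: find word `num` (1-based) by scanning `s`
     from the start on every call.  Calling it for words 1..n in a
     loop therefore costs O(n^2) character reads in total.          */
  static const char *nth_word(const char *s, int num, size_t *len)
  {
      for (int i = 0; i < num; i++) {
          while (*s && isspace((unsigned char)*s)) s++;   /* skip blanks  */
          if (!*s) return NULL;                           /* out of words */
          if (i == num - 1) {                             /* the one we want */
              const char *start = s;
              while (*s && !isspace((unsigned char)*s)) s++;
              *len = (size_t)(s - start);
              return start;
          }
          while (*s && !isspace((unsigned char)*s)) s++;  /* skip this word */
      }
      return NULL;
  }
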
>  Solutions:
>
>1) Since every script that loops over all words of a text calls
>"the number of words", which also scans the whole string once, why
>not have that call generate a look-up table, i.e. an array where
>entry n holds the start and end offset of word n.
>
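
A sketch of what solution 1 might look like in C (the struct and function
names are mine, not HC's): while counting the words once, record each
word's start and end offset so that later chunk references become a plain
array lookup.

  #include <ctype.h>
  #include <stdlib.h>

  typedef struct { size_t start, end; } WordSpan;  /* offsets into the text */

  typedef struct {
      WordSpan *spans;   /* spans[n-1] covers word n                */
      size_t    count;   /* what "the number of words" would return */
  } WordIndex;

  /* One pass over `s`: count the words and remember their offsets. */
  static void build_word_index(const char *s, WordIndex *ix)
  {
      size_t cap = 16, i = 0;
      ix->spans = malloc(cap * sizeof *ix->spans);
      ix->count = 0;
      while (s[i]) {
          while (s[i] && isspace((unsigned char)s[i])) i++;
          if (!s[i]) break;
          size_t start = i;
          while (s[i] && !isspace((unsigned char)s[i])) i++;
          if (ix->count == cap) {
              cap *= 2;
              ix->spans = realloc(ix->spans, cap * sizeof *ix->spans);
          }
          ix->spans[ix->count++] = (WordSpan){ start, i };
      }
  }

With the index in hand, "word k of tVar" is just spans[k-1] and "the number
of words" is count; the open questions are the ones Rob raises below about
lines, items, and a changing itemDelimiter.
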
>2) Have each variable remember the start and end offsets of the last
>word retrieved, along with its number. When a sequence of words is
>parsed, this information would be checked first, allowing parsing to
>continue relative to the last word retrieved.
>
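
Solution 2, by contrast, attaches a small cache to each variable, roughly
like this (again a sketch with invented names, reusing nth_word from the
snippet above; any write to the variable would have to reset lastNum):

  /* Per-variable cache: where the last word handed out began and
     ended, and which word it was.  lastNum == 0 means no cache.   */
  typedef struct {
      char  *text;       /* the variable's value          */
      int    lastNum;    /* number of the last word read  */
      size_t lastStart;  /* offset where that word began  */
      size_t lastEnd;    /* offset just past that word    */
  } Variable;

  /* Resume scanning after the cached word when the request is for
     that word or a later one; otherwise fall back to a full scan. */
  static const char *get_word(Variable *v, int num, size_t *len)
  {
      const char *s    = v->text;
      int         base = 0;                    /* words already behind us */
      if (v->lastNum && num == v->lastNum) {   /* exact repeat: free      */
          *len = v->lastEnd - v->lastStart;
          return v->text + v->lastStart;
      }
      if (v->lastNum && num > v->lastNum) {    /* continue from the cache */
          s    = v->text + v->lastEnd;
          base = v->lastNum;
      }
      const char *w = nth_word(s, num - base, len);
      if (w) {
          v->lastNum   = num;
          v->lastStart = (size_t)(w - v->text);
          v->lastEnd   = v->lastStart + *len;
      }
      return w;
  }

A repeat loop that walks words 1, 2, 3... then costs one pass over the
string in total instead of one pass per word.
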
>  The second one would optimize sequential access to chunks the
>most, but the first one would be better for dynamic loops, since it
>would still give a speedup if you only needed every second item.

Rob:

Looking at this a little closer, I don't think the extra overhead would
justify indexing every string the first time it is read, just in case a
second word is needed right afterward.  And, BTW, what if the script
chunks the string by line or item...and what if the itemDelimiter is
changed after the string is indexed...and what if the reference is "word
someNum of line anotherNum of item someThingElse"?

In the case of non-looping chunk references, I think we should accept the
speed gain we get from using Pascal notation instead of C notation and
focus on speeding up repeat loops.
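
If "Pascal notation" here means length-prefixed strings and "C notation"
means NUL-terminated ones (my reading, not Rob's wording), the gain is
easy to see: a length-prefixed string answers "the number of chars" in one
read, while a NUL-terminated string has to be walked to its end.

  #include <string.h>

  /* C-style: no stored length, so strlen() walks the whole string,
     an O(n) scan every time the length is asked for.               */
  size_t c_length(const char *s) { return strlen(s); }

  /* Pascal-style: the length travels with the bytes, so asking for
     it is a single O(1) read.                                      */
  typedef struct { size_t len; char *bytes; } PString;
  size_t p_length(const PString *s) { return s->len; }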

A repeat loop (generally) processes each chunk in turn.  I think
maintaining a pointer to a current position before/within/after the string
and getting the next/previous char/word/line/item is simpler than
maintaining the offsets of the last word (and line? and item?...and if the
itemDelimiter changes, etc.).  It would also add new capabilities (next &
previous) in addition to being faster (IMFO) than your approach.
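
A cursor of that kind might be sketched like this in C (my names, not an
actual implementation; next_word handles words only, and "previous" would
scan backwards from pos the same way):

  #include <ctype.h>
  #include <stddef.h>

  /* Position marker kept alongside a variable: `pos` sits before,
     within, or after the string, and each "next word" resumes there. */
  typedef struct {
      const char *text;
      size_t      pos;    /* current offset into text */
  } ChunkCursor;

  /* Return the next word after the cursor (NULL at the end of the
     text) and advance the cursor past the word returned.           */
  static const char *next_word(ChunkCursor *c, size_t *len)
  {
      const char *s = c->text + c->pos;
      while (*s && isspace((unsigned char)*s)) s++;    /* skip blanks */
      if (!*s) { c->pos = (size_t)(s - c->text); return NULL; }
      const char *start = s;
      while (*s && !isspace((unsigned char)*s)) s++;   /* end of word */
      *len   = (size_t)(s - start);
      c->pos = (size_t)(s - c->text);
      return start;
  }

A repeat loop over all the words then reads each character once; lines and
items would use their own delimiter test in place of isspace().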


Rob Cozens, CCW
http://www.serendipitysoftware.com/who.html

"And I, which was two fooles, do so grow three;
Who are a little wise, the best fooles bee."

from "The Triple Foole" by  John Donne (1572-1631)
