At 8:04 PM +0100 on 1/19/00, M. Uli Kusterer wrote:

>1) Since every time a script loops over all words of a text it uses
>the "the number of words" function, which also scans the whole string
>once, why not have that generate a look-up table, i.e. an array where
>each item holds the start and end offset of the item with the number
>of the array index.

Because that's eight bytes of pointer-ish data for a five-byte word.
Ouch. And it'd REALLY slow down "the number of words in", which would
be spitting out somewhere around 8MB of data for every 5MB in. For
longer words the ratio would not be as bad, but for short words it would
be even worse.
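A quick back-of-envelope sketch of that overhead (my numbers, not anything from the actual engine): two 4-byte offsets per word is 8 bytes per entry, no matter how short the word is.

```python
# Hypothetical illustration: cost of a start/end offset table over words.
# Each table entry is assumed to be two 4-byte offsets = 8 bytes.

def offset_table_overhead(text: str, bytes_per_entry: int = 8) -> tuple[int, int]:
    """Return (text_size_in_bytes, offset_table_size_in_bytes)."""
    words = text.split()
    return len(text.encode()), len(words) * bytes_per_entry

text = "the quick brown fox jumps " * 200_000   # ~5 MB of short words
print(offset_table_overhead(text))              # → (5200000, 8000000)
```

For a text of short words, the lookup table comes out bigger than the text itself, which is exactly the 8 MB out for 5 MB in above.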

I would, of course, keep track of the last item. And 10 or so in between
-- though that is rather hard to do without knowing the last item; I'd
have to guess based on the byte number.

>
>2) Have each variable remember the start and end offsets of the last
>word retrieved and its number. If a sequence of words is parsed, this
>information would be examined beforehand, thus allowing to go on
>parsing relative to the last word retrieved.

Almost what I plan on doing.
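For concreteness, here's a hedged sketch of that second method -- a variable that remembers the (number, start, end) of the last word it handed out, so that fetching a later word resumes scanning from there instead of rescanning from the top. The names and layout are mine, not the engine's:

```python
# Sketch only: a variable that memoizes the last word retrieved.
# Sequential access ("word 1", "word 2", ...) then costs O(1) amortized
# per word instead of rescanning the whole string each time.

class ChunkedVar:
    def __init__(self, value: str):
        self.value = value
        self.last = None  # (word_number, start, end) of last word retrieved

    def word(self, n: int) -> str:
        s = self.value
        i, num = 0, 0
        if self.last and self.last[0] <= n:
            # Resume at the start of the last-retrieved word.
            num, i = self.last[0] - 1, self.last[1]
        while i < len(s):
            while i < len(s) and s[i].isspace():   # skip word delimiters
                i += 1
            if i >= len(s):
                break
            start = i
            while i < len(s) and not s[i].isspace():
                i += 1
            num += 1
            if num == n:
                self.last = (n, start, i)
                return s[start:i]
        return ""
```

Note that a request for an *earlier* word still falls back to a scan from the beginning; counting backwards from the memoized position (as described below) would fix that.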

>
>  The second one would optimize sequential accessing of chunks the
>most, but the first one would be better for dynamic loops, since it'd
>even speed up if you only needed to get every second item.

I'd count from the best known position, which will seldom be the
beginning. If I know word 576 begins at x, and I want word 573, I'll
start at x and count backwards.

I'll also use those 10 or so that I found when counting the words if I
have them. Once again: Pick the closest point and count from there.
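Picking the closest point is trivial if the counting pass left behind a handful of (word number, byte offset) checkpoints -- something like this sketch, where the checkpoint values are invented for illustration:

```python
# Sketch: given checkpoints recorded while counting words, find the one
# closest to the requested word number, then scan forwards or backwards
# from its byte offset.

def nearest_checkpoint(checkpoints, n):
    """checkpoints: list of (word_number, byte_offset) pairs."""
    return min(checkpoints, key=lambda cp: abs(cp[0] - n))

cps = [(1, 0), (100, 612), (200, 1240), (300, 1866)]
print(nearest_checkpoint(cps, 573))   # → (300, 1866): count forward from there
print(nearest_checkpoint(cps, 120))   # → (100, 612)
```

With a checkpoint at word 576, a request for word 573 would count backwards from it, per the example above.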

>But it
>could dramatically increase RAM requirements whenever the number of
>items etc. is counted,

"increase"? Double (or worse) is the word you're looking for.

>while the second method would add to the
>overall RAM requirements, since every variable would have to carry
>around 3 additional fields with info on the last item retrieved.

3 fields -- 12 bytes -- is not that bad. It's not as though a variable's
own overhead is anywhere near that small to begin with.

>  Both of the above suggestions possibly wouldn't be able to improve
>performance for nested chunks (e.g. "item 2 of line 1") very much,
>though, so a construct like MC's "repeat for each" would definitely
>be required in addition.

Why not? As soon as I get some time to pull off some more magic, they will.

A repeat for each is a Good Thing, though.

>
>  Anyway, does anybody have suggestions what could be done to optimize
>chunks? Anthony?

Cache them. That can even optimize:

        repeat with x = 1 to 5
          repeat with y = 1 to 5
            repeat with z = 1 to 5
              put word z of line y of item x of var
            end repeat
          end repeat
        end repeat

The trickery here would be to realize that "item x of var" can be
treated as a variable for caching just as legitimately as "var" can.


Might also want to implement some LRU caching for those silly nonlinear
accesses :)
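For what it's worth, an LRU cache over those cached chunk positions is only a few lines -- a minimal sketch, assuming we just want to keep the N most recently touched entries:

```python
# Minimal LRU cache sketch for chunk-position entries: when full, the
# least recently used entry is evicted.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)        # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

So even those silly nonlinear access patterns would at least keep their hottest few positions around.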
