>Because that's eight bytes of pointer-ish data for a five-byte word.
>Ouch. And it'd REALLY slow down "the number of words in", which would
>be spitting out somewhere around 8MB of data for every 5MB in. For
>certain things it would not be as bad, and for short words it would be
>even worse.

Anthony,

  another approach would be to remember the length of each word and 
space in a list. Every second length would be a space length, and 
calculating the offset would in most cases (i.e. for words longer 
than 4 characters) speed up the process a good bunch. E.g.:

Text "This house is just too cool to be pulled down"

would be complemented by:
4,1,5,1,2,1,4,1,3,1,4,1,2,1,2,1,6,1,4

19 * 4 = 76 bytes at fixed offsets to scan instead of 45 ... ewwwww?!
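
  In C++, the bookkeeping might look roughly like this (a sketch; 
the names are made up, and it assumes single spaces and a text that 
starts with a word, like the example above):

#include <string>
#include <vector>
#include <cstddef>

// Interleaved lengths: word, space, word, space, ... -- assumes the
// text starts with a word and uses single spaces.
std::vector<std::size_t> buildLengths(const std::string& text)
{
    std::vector<std::size_t> lengths;
    std::size_t runStart = 0;
    bool inWord = true;
    for (std::size_t i = 1; i <= text.size(); ++i) {
        bool nowWord = (i < text.size() && text[i] != ' ');
        if (i == text.size() || nowWord != inWord) {
            lengths.push_back(i - runStart);  // close the current run
            runStart = i;
            inWord = nowWord;
        }
    }
    return lengths;
}

// Byte offset of word n (0-based): sum the entries before entry 2*n,
// i.e. fixed-size additions instead of scanning every character.
std::size_t wordOffset(const std::vector<std::size_t>& lengths,
                       std::size_t n)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < 2 * n && i < lengths.size(); ++i)
        offset += lengths[i];
    return offset;
}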

You know, sometimes English is a really inconvenient language ... Of 
course, we could use shorts instead, which would be 38 bytes (I doubt 
anyone has a word longer than 64k characters in a text, right? Of 
course, we'd introduce a limit...), which would already be a gain. Or 
we could just store the start offset of each word.
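
  That variant might look like this (again a sketch with made-up 
names; 32-bit offsets assumed):

#include <string>
#include <vector>
#include <cstdint>
#include <cstddef>

// Start offset of every word. 32-bit offsets cap a text at 4 GB,
// which seems a safe assumption here.
std::vector<std::uint32_t> buildWordStarts(const std::string& text)
{
    std::vector<std::uint32_t> starts;
    bool inWord = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        bool isSpace = (text[i] == ' ');
        if (!isSpace && !inWord)  // first character of a new word
            starts.push_back(static_cast<std::uint32_t>(i));
        inWord = !isSpace;
    }
    return starts;
}
// Word n (0-based) then starts at starts[n] -- one lookup, no summing.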

  I guess we'll have to heed Stroustrup's experience here ... write 
test applications that measure which approach is the most effective, 
and use that.

>I would, of course, keep track of the last item. And 10 or so in between
>-- though it is rather hard to do without knowing the last item; I'd
>have to guess based on the byte number.

  Do you want to globally keep track of the last item (how?) or on a 
per-string/per-variable basis?

>I'd count from the best known position, which will seldom be the
>beginning. If I know word 576 begins at x, and I want word 573, I'll
>start at x and count backwards.
>
>I'll also use those 10 or so that I found when counting the words if I
>have them. Once again: Pick the closest point and count from there.

  Right, why didn't I think of that? Surely that would be an 
advantage; it should save us from scanning a good chunk of bytes.
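
  Something like this, I suppose (a sketch of the "pick the closest 
known position" trick; the checkpoint table is assumed, not anything 
Joker actually has yet):

#include <cstddef>
#include <map>
#include <utility>

// Known positions: word index -> byte offset, collected during
// earlier scans (e.g. the 10 or so saved while counting words).
typedef std::map<std::size_t, std::size_t> Checkpoints;

// Pick the known position closest to the target word. The caller
// then counts forwards or backwards from it instead of from offset 0.
std::pair<std::size_t, std::size_t>
closestCheckpoint(const Checkpoints& cps, std::size_t targetWord)
{
    std::pair<std::size_t, std::size_t> best(0, 0); // fall back to start
    std::size_t bestDist = targetWord;              // distance from start
    for (Checkpoints::const_iterator it = cps.begin();
         it != cps.end(); ++it) {
        std::size_t dist = (it->first > targetWord)
                               ? it->first - targetWord
                               : targetWord - it->first;
        if (dist <= bestDist) {
            best = *it;
            bestDist = dist;
        }
    }
    return best;
}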

>3 fields -- 12 bytes -- is not that bad. It's not like a variable is
>exactly that small.

  The trouble is that these fields are only used by a small subset 
of variables, and there might be hundreds of variables in a script. 
That could mean around 12 KB more memory used per script. It quickly 
adds up.
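
  One way to dodge that (just my idea, nothing from this thread): 
keep the cache fields in a side table keyed by variable, so only the 
variables that actually see chunk access pay for them -- a sketch, 
assuming we can key on the variable's address:

#include <cstddef>
#include <map>

struct ChunkCache {
    std::size_t endOffset;  // byte just past the last chunk read
    std::size_t index;      // which item/word/line it was
    int         type;       // item, word, line, ...
};

// Keyed by the variable's address; variables that never see chunk
// access never get an entry, so they cost nothing extra.
std::map<const void*, ChunkCache> gChunkCaches;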

>Why not? As soon as I get some time to pull off some more magic, they will.

  How do you intend to optimize nested chunks this way? I don't see 
how I'd have Joker cache a chunk's last value ... I could probably do 
it for loops, by having the loop code cache it, but if I want to keep 
the last read chunk's offset with its variable, so that even several 
accesses in a row (outside a loop) are optimized ... My instincts 
tell me it's possible, but it'll take some time until my brain 
realizes how.

  OTOH, the stuff at the top of this message sounds sensible. Just 
remembering the end of the last chunk read, along with what type it 
was (item, word, line etc.) and which item number was the last one, 
should make it possible to parse from the middle as well as from the 
start, and would at least speed up sequential access. Thanks, Anthony!
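
  As a record, that might look like this (a sketch with placeholder 
names):

#include <cstddef>

enum ChunkType { kItem, kWord, kLine };

// What we remember about the last chunk read from a string.
struct LastChunk {
    ChunkType   type;       // item, word, line etc.
    std::size_t index;      // which one it was
    std::size_t endOffset;  // byte just past its last character
};

// Sequential reads ("word 1", "word 2", ...) can resume counting at
// endOffset; anything else falls back to scanning from the start
// (or from the closest checkpoint, as above).
std::size_t scanStartFor(const LastChunk& last, ChunkType type,
                         std::size_t index)
{
    if (type == last.type && index >= last.index)
        return last.endOffset;  // count forward from the cached spot
    return 0;                   // no useful cache; start at offset 0
}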

>Might also want to implement some LRU caching for those silly nonlinear
>accesses :)

  LRU ???

Cheers,
-- M. Uli Kusterer

------------------------------------------------------------
              http://www.weblayout.com/witness
        'The Witnesses of TeachText are everywhere...'

--- HELP SAVE HYPERCARD: ---
Details at: http://www.hyperactivesw.com/SaveHC.html
Sign: http://www.giguere.uqam.ca/petition/hcpetition.html
