On 11/23/04 10:17 PM, Richard Gaskin wrote:
If any of you have time to improve the buffering method below, I'd be interested in any significant changes to your test results.
If we want the buffering method to be as fast as possible, so as to test the method itself rather than the script that runs it, then we can speed up the script by rewriting method #3 like this:
  put the millisecs into t
  --
  put 0 into tWordCount3
  open file tFile for text read
  put empty into tBuffer
  repeat
    read from file tFile for 32000
    put tBuffer before it -- stores only 1 line from previous read
    if it is empty then exit repeat
    if the number of lines in it > 1 then
      put last line of it into tBuffer
      delete last line of it
    else
      put empty into tBuffer
    end if
    --
    repeat for each line l in it
      add the number of words of l to tWordCount3
    end repeat
  end repeat
  --
  put the millisecs - t into t3
  close file tFile
This script assumes that the last line in each 32K block is incomplete, which will almost always be the case. If the line isn't incomplete, it doesn't hurt anything to treat it like it is.
The problem is, I'm getting a slightly different word count than with your original method. I didn't debug it because it's getting late, but the count is off by just a little, and I suspect it has to do with the very last line in the file. At any rate, the difference in speed is substantial: in my test the original took about 850 milliseconds and the revised version above took about 125. This would probably change your benchmarks a bit.
I added a "close file" command for completeness. If I get a chance, I'll try to figure out why my count is off, if someone else doesn't do it first.
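As a side note (my own conjecture, not something established in the thread): one way a chunked word count can drift by a few is if a read boundary falls inside a word and the partial text is counted immediately instead of being held back, so the split word is counted twice. A tiny Python illustration with made-up text and chunk size:

```python
text = "supercalifragilistic expialidocious wordcount"
chunk_size = 25

# Naive: count each fixed-size chunk independently, splitting words
# wherever the boundary happens to fall.
naive = sum(len(text[i:i + chunk_size].split())
            for i in range(0, len(text), chunk_size))

correct = len(text.split())
print(naive, correct)   # the naive count is higher: 4 vs. 3
```

Holding back the trailing fragment until the next read, as the script above does for lines, avoids exactly this double count.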
Good work, Jacque. I knew there would be a way to change the "repeat with" to a "repeat for each", and moving the last line to the buffer and walking through "it" instead looks like the way to go.
However, my results differ from yours -- I'm getting an accurate word count, but slower speed than before:
200 MB free RAM
---------------
Read all:      5.881 secs
Read buffered: 8.575 secs
Either there was something wrong with the first time I ran the tests, or there's something wrong with how I've copied your version in. And of course I have a spare 200 MB of RAM -- I've got too much to do to go through launching all my other apps just to put the squeeze on a test. :)
We still don't know enough about the original poster's business specifics to say whether this is at all useful to him, but assuming it will be to others down the road, the next logical questions are:
1. How can we generalize this so one handler can be used to feed lines to another handler for processing?
2. Can we further generalize it to use other chunk types (words, items, tokens)?
3. Once we solve #1 and #2, should we request an addition to the engine for this? If you think this is fast now, wait until you see what the engine can do with it. It'll be like life before and after the split and combine commands.
--
Richard Gaskin
Fourth World Media Corporation
___________________________________________________________
[EMAIL PROTECTED]  http://www.FourthWorld.com
_______________________________________________
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
