Re: Not Shy...

Richard Gaskin Tue, 08 Jun 2004 14:34:09 -0700

Bob Nelson wrote:

...so let's dive in with both feet.

HyperCard was my friend - and remained my friend until the advent of OS X
and a new machine that won't boot into OS 9.x any more.  Sad, since I used
it for all sorts of cool tricks, especially the massaging of copious amounts
of data that needed a good "cleaning" before dropping it into MySQL or
FileMaker.

A new project came along and prompted me to go hunting.  One of the old
HyperCard sites recommended Revolution or SuperCard, so I've been demo'ing
Revolution for a couple of days to see how the package operates compared to
other options - including RealBASIC.

So I've got a little script that does a moderately simple thing:  Grab a web
page, bring it back, strip useless data out of it (right now, I've just got
it stripping out the extra returns and leading spaces per line) and the next
step will be to kill the HTML on the page so I can mine the data...

My layout and code are fairly simple:

Two fields and two buttons so I can work through the example - the first
field is the 'holder' of the remote URL which has been retrieved
(Imported_Raw) and the second field will be the restructured output when I'm
done.

Here's the code, for those who want to dive deeper...

on mouseUp
  put 0 into i
  repeat forever
    add 1 to i
    if char 1 of line i of field "Imported_Raw" is numToChar(13) then
      delete line i of field "Imported_Raw"
      put "Ate one return at line " & i & " of " & the number of lines of field "Imported_Raw" & " total lines."
      subtract 1 from i
    end if
    repeat while char 1 of line i of field "Imported_Raw" = " "
      delete char 1 of line i of field "Imported_Raw"
      put "Ate one space at line " & i & " of " & the number of lines of field "Imported_Raw" & " total lines."
    end repeat
    if line i of field "Imported_Raw" is the last line of field "Imported_Raw" then
      exit repeat
    end if
  end repeat
end mouseUp


Here's what I noticed about execution:

1.  Importing the URL is awesome - a great feature that makes my life soooo
much easier for this project!  And fast, too!
2.  The page I grabbed consisted of 140,000 lines of code.  After removing
extra line feeds, the number of lines is around 80,000.
3.  This script runs VERY slowly compared to essentially the same script in
HyperCard running under 9.2.1 -- as an example, Revolution has been running
this script for more than 18 hours and still hasn't finished processing.
(And that's running on a Dual 2 GHz, 4 GB RAM, OS X most current version
with all updates.)  Under HC, the similar script executed in about an hour -
running on an iMac G3/233 with 1 GB and OS 9.2.1 -- any comments regarding
execution speed?
4.  I don't see any mechanisms for determining progress of the operation --
although I may well have missed something.  Are there any progress
bars, etc., that one can use in Revolution?
5.  Looking through all the examples I can find, as well as documentation, I
noted that there aren't many examples related to text manipulation - and
importing/exporting text, etc., in/out of your stack.  I'm sure I missed
something on this front, as I'm sure people would be doing this all the
time...  Can anyone point me in a direction?

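I think there may be issues with the original code. For example, the exit test compares the text of line i with the text of the last line rather than their positions, so if the last line of the file is empty then any empty line will cause it to exit prematurely.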
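
A minimal sketch of one way around that, assuming the intent is simply to stop after the final line: test the line number instead of the line's text. This is only the exit test, meant to replace the last "if" block in the original loop, and it has not been run against the data in question.

```
-- Sketch only: stop once i has reached the last line number,
-- rather than comparing the text of line i to the text of the last line.
if i >= the number of lines of field "Imported_Raw" then
  exit repeat
end if
```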

I've revised the handler below, with comments to help describe the admittedly liberal rewrite. My note there about the use of the mod operator to update the progress bar incrementally is weak -- ideally you should divide the data size by the number of useful scrollbar increments to get the value to use with the mod operator.
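
A minimal sketch of that calculation, assuming a hypothetical helper named progressInterval and a target of about 100 visible steps (both are illustrative choices, not part of the handler):

```
-- Hypothetical helper: returns how many lines to process between
-- progress bar updates so the bar moves roughly pSteps times in total.
function progressInterval pNumLines, pSteps
  -- Guard against a zero or negative step count:
  if pSteps < 1 then put 1 into pSteps
  -- Integer division; never return less than 1 so "mod" stays valid:
  return max(1, pNumLines div pSteps)
end progressInterval
```

It could be called once before the loop, for example "put progressInterval(tNumLines, 100) into tInterval", with the loop test then written as "if i mod tInterval = 0 then ...". For 140,000 lines of input and 100 steps the interval works out to 1400.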

I also added a simple timing mechanism (the references to milliseconds at the top and bottom) so you can see how fast it is and compare it with similar additions to your existing script.

Even as it is, the handler below should be a few orders of magnitude faster than what you have above. But if you raise the mod value for the scrollbar update even higher you should see it gain another big speed increase.

--
 Richard Gaskin
 Fourth World Media Corporation
 ___________________________________________________
 Rev tools and more:  http://www.fourthworld.com/rev

------------------------------------------------------------


on mouseUp
  -- Get initial timing:
  put the milliseconds into s
  --
  -- Always much faster to work in a variable than field data:
  put fld "Imported_Raw" into tData
  --
  -- Since we'll use the number of lines often, let's get it only once:
  put the number of lines of tData into tNumLines
  --
  -- Progress indicator -
  -- Add a scrollbar object, set the style to "progress":
  set the endValue of scrollbar 1 to tNumLines
  put 0 into i
  -- The "repeat for each" construct is often two or three orders of
  -- magnitude faster than any other form, since it parses the chunk
  -- referenced in it as it goes while keeping a pointer into the data.
  -- In order to maintain its place in the data it must treat the data
  -- as read-only, so we'll copy the data into another var for output:
  repeat for each line tLine in tData
    add 1 to i
    -- Update our progress bar
    -- Since the time it takes the OS to redraw the scrollbar can cut
    -- into our total processing time significantly, rather than
    -- updating it in each iteration we'll update it just every 20
    -- lines:
    if i mod 20 = 0 then set the thumbposition of scrollbar 1 to i
    --
    -- Using the constant "cr" is much faster than calling the numToChar
    -- function, which adds up a lot in a repeat, so we could use:
    --    if char 1 of tLine is cr then
    -- ...instead of:
    --    if char 1 of tLine is numToChar(13)
    --
    -- But since we're already parsing by lines that's done for us, all
    -- we need to do is see if the line is empty:
    --    if tLine is empty then
    --      put "Ate one return at line " & i & " of " & tNumLines & \
    --        " total lines."
    --      next repeat
    --    end if
    --
    -- Unless you really need to know how many spaces are removed,
    -- you can replace both that empty-line check and this loop:
    --    repeat while char 1 of tLine = " "
    --      delete char 1 of tLine
    --      put "Ate one space at line " & i & " of " & tNumLines & \
    --         " total lines."
    --    end repeat
    --
    -- ...in just two lines:
    get word 1 to (the number of words of tLine) of tLine
    if it is empty then next repeat
    --
    -- Now we just copy the trimmed text to an output var:
    put it &cr after tOutputData
    --
  end repeat
  -- Show completed progress in case your data isn't evenly divisible
  -- by 20:
  set the thumbposition of scrollbar 1 to tNumLines
  --
  -- Remove the trailing return left by the last "put ... after":
  delete last char of tOutputData
  put tOutputData into fld "Processed_Data"
  --
  -- Display elapsed time:
  put the milliseconds - s
end mouseUp

_______________________________________________
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
