From the dept of "Please don't let my wife or boss know that I've been wasting time on this"....

Chipp Walters wrote:
Interesting note: I found the following results:
function altAsciiScrub2 pText
   put replacetext(pText,"[" & numToChar(129) & "-" & numToChar(255) & "]","") into pText
   put replacetext(pText,"[" & numToChar(1) & "-" & numToChar(31) & "]","") into pText
   return pText
end altAsciiScrub2

executed in 19 ticks on my QuadCore Vista 64 machine

function altAsciiScrub1 pText
   repeat for each char c in pText
      get charToNum(c)
      if it > 128 or it < 32 then
         next repeat
      end if
      put c after t
   end repeat
   return t
end altAsciiScrub1

executed in 17 ticks on my QuadCore Vista 64 machine

repeat for each is really fast.


Looking at this I couldn't help wondering "does it make a difference if there are a lot of high-code characters to delete?" and "regex is a bit slower, but does the setup of regex pay off if the string is long enough?" (Chipp didn't specify what kind of input he wanted to work on, or what he tested with.)

And I also wondered, "what other ways could we trade off setup over a long repeat"?

(BTW strictly speaking ASCII is 0-127; both the routines above allow 128 through. I only mention it because it took me a while to figure out why my routines sometimes returned different results from the above two; the difference depended on whether there was a character with code 128 in the test string.)
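
To make the boundary concrete, here is a minimal sketch of the repeat-for-each loop with the upper limit at 127 instead of 128. The name asciiScrubStrict is made up for illustration, and this is only one way of writing the test, not something from Chipp's post:

   function asciiScrubStrict pText
      local tOut
      repeat for each char c in pText
         get charToNum(c)
         -- keep only codes 32 through 127; the quoted version tests "it > 128",
         -- which lets code 128 through
         if it >= 32 and it <= 127 then put c after tOut
      end repeat
      return tOut
   end asciiScrubStrict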

So I implemented another couple of options. The first takes Devin's suggestion of calling replace with the characters you don't want, thus only calling numtochar/chartonum a fixed number of times, regardless of the length of the string.

   function asciiScrub3 pText
      set the caseSensitive to true
      -- delete any characters below space
      repeat with i = 0 to 31
         replace numtochar(i) with empty in pText
      end repeat
      -- delete any characters above ASCII
      repeat with i = 128 to 255
         replace numtochar(i) with empty in pText
      end repeat
      return pText
   end asciiScrub3

As you'd expect, this is much slower than the above approaches for a short string; but it is slightly faster than either of them for a large string - more so if there are a lot of high-code characters in the string.

In my work, I'm often dealing with non-ASCII characters; but it rarely suffices to delete them, I generally need to convert them to something else (either the relevant ASCII character, or to another character set). The built-in functions in Rev are often frustratingly just off the mark for this, so I tend to just run the characters through an array, blessing each time I do so how fast repeat for each is, and how fast arrays are. So I naturally wondered how that approach would work even when I just wanted to delete the characters over 128:
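
As a rough illustration of the convert-rather-than-delete case, the same array idea can carry substitutions as well as identity mappings. This is only a sketch: the function name asciiFold and the three substitutions are hypothetical, and the character codes assume ISO Latin-1 (on a Mac the native codes may well differ):

   function asciiFold pText
      local tMap, tOut
      set the caseSensitive to true
      -- characters we keep map to themselves
      repeat with i = 32 to 127
         put numToChar(i) into tMap[numToChar(i)]
      end repeat
      -- hypothetical substitutions, assuming ISO Latin-1 codes
      put "e" into tMap[numToChar(233)]    -- e-acute
      put "u" into tMap[numToChar(252)]    -- u-umlaut
      put "ss" into tMap[numToChar(223)]   -- sharp s
      -- any character without an entry yields empty, so it is dropped
      repeat for each char c in pText
         put tMap[c] after tOut
      end repeat
      return tOut
   end asciiFold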


   function asciiScrub4 pText
      set the caseSensitive to true
      -- set up array to map characters we want to retain to themselves
      put empty into a
      repeat with i = 32 to 127
         get numtochar(i)
         put it into a[it]
      end repeat
      -- filter the string through the array set up above
      put empty into t
      repeat for each char c in pText
         put a[c] after t
      end repeat
      return t
   end asciiScrub4

Again, as you'd expect, the setup time costs on a short string. But on large blocks of text, this turned out to be the fastest method - not quite twice as fast as the original two, but getting there.

As for composition, it makes less of a difference than I thought. All the functions are slightly faster if the source string contains more high-code characters (i.e. if the output string is shorter), and there doesn't seem to be a very significant difference in how the different routines are affected by this.
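
For anyone wanting to repeat this kind of comparison, here is a sketch of how test input with a controlled share of high-code characters might be built and timed. The names makeTestText and timeScrub4, the length and the percentage are all made up for illustration, and this is not the harness used for the results described above:

   function makeTestText pLength, pHighPercent
      local tText
      repeat pLength times
         if random(100) <= pHighPercent then
            put numToChar(127 + random(128)) after tText   -- codes 128..255
         else
            put numToChar(31 + random(96)) after tText     -- codes 32..127
         end if
      end repeat
      return tText
   end makeTestText

   on timeScrub4
      local tText, tStart
      -- 100000 characters, roughly a quarter of them high-code
      put makeTestText(100000, 25) into tText
      put the milliseconds into tStart
      get asciiScrub4(tText)
      -- with no destination, put writes to the message box
      put "asciiScrub4:" && (the milliseconds - tStart) && "ms"
   end timeScrub4

Varying the second parameter of makeTestText is one way to check the effect of composition, and swapping the function called in timeScrub4 lets the other routines be timed on identical input.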

Mostly what I demonstrated was that all the approaches are so fast that, especially on a short string, it's hard to find any significant difference in the relative timings - you'd have to be doing a vast amount of processing to justify spending any time doing better than whatever approach you came up with first. And that sometimes I'll do almost anything to avoid the work I should be getting on with...

- Ben
