Re: AW: AW: How trim: Bug in RegExp engine

Wouter Tue, 25 Oct 2005 17:33:22 -0700

On 25 Oct 2005, at 23:39, Thomas Fischer wrote:

-snip-

3. It seems that regular expressions are to be avoided in time-sensitive parts of the script anyway. Playing around a little bit, I found that the RegExp solution I suggested took far more time than any other solution (by about a factor of 10 compared with the fastest solution). Probably this should be optimized in an updated version. It seems that regular expressions in Perl are faster by a factor of 6 (and again 8 times faster on my PC laptop). On the other hand, this shows that those cumbersome repeat loops are surprisingly fast.
The fastest is:

  repeat while char 1 of testString is space
    delete char 1 of testString
  end repeat
taking about 1.8 microseconds per round (11 ticks for 100000 repeats), and with a string whiteSpace = tab && return a loop with
  repeat while char 1 of testString is in whiteSpace
takes about twice as long.

  word 1 to -1 of testString
removing whitespace at the front and the end simultaneously is only a little slower, while using token
  token 1 to -1 of testString
surprisingly takes three times as long as using "word".


These timing tests are not completely fair because:

  repeat while char 1 of testString is space
    delete char 1 of testString
  end repeat
-> removes only spaces from the front, if any

  word 1 to -1 of testString
-> removes tabs, spaces and returns from the front and back of the string, if any

  token 1 to -1 of testString
-> removes tabs, spaces, hard spaces and returns from the front and back of the string, if any

You are comparing the time it takes to remove spaces from the front only with the time it takes to remove tabs and spaces from both front and back, or tabs, spaces and hard spaces (plus semicolons and returns) from both front and back. To make it more fair, the timing handlers should be equalized on the removal of tabs, spaces and hard spaces from the front and back of a string.

On the other hand, this gives an indication of which way to use in what case.
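
As a rough sketch of what an equalized test could look like: the handler below strips the same set of characters from both ends of the string with repeat loops, and the mouseUp handler times it against "word 1 to -1" on the same sample. The names trimBoth, tWhiteSpace, tSample and tReport are made up for this illustration, and numToChar(160) as the hard space is an assumption (that is the Windows/ISO value; Mac OS Roman uses 202). I have not timed this, so treat it as a starting point only.

```livecode
-- Hypothetical helper: removes every char found in pWhiteSpace
-- from the front and the back of pString.
function trimBoth pString, pWhiteSpace
  -- the "is not empty" guard stops the loop once everything is deleted
  repeat while (pString is not empty) and (char 1 of pString is in pWhiteSpace)
    delete char 1 of pString
  end repeat
  repeat while (pString is not empty) and (char -1 of pString is in pWhiteSpace)
    delete char -1 of pString
  end repeat
  return pString
end trimBoth

on mouseUp
  -- Assumption: 160 is the hard space on this platform (202 on Mac OS Roman)
  put space & tab & return & numToChar(160) into tWhiteSpace
  put space & tab & "some text" & tab & space into tSample

  -- time the repeat-loop trim on front and back
  put the ticks into tStart
  repeat 100000 times
    get trimBoth(tSample, tWhiteSpace)
  end repeat
  put (the ticks - tStart) && "ticks for trimBoth" & return into tReport

  -- time the word chunk on the same sample
  put the ticks into tStart
  repeat 100000 times
    get word 1 to -1 of tSample
  end repeat
  put (the ticks - tStart) && "ticks for word 1 to -1" after tReport

  answer tReport
end mouseUp
```

The function call itself adds some overhead to every round, so for a stricter comparison the two loops could be written inline in the timing handler instead.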


Greetings,
Wouter

PS

For token 1 to -1 of testString -> watch out for semicolons, as a semicolon will be treated as whitespace or as an itemdelimiter for token. The docs state that (semicolon), space, return, and tab are the itemdelimiters for token. As hard spaces are also removed, this listing is not complete and hard space should be added.

To me it seems kind of weird to consider (semicolon), space, (hard space), return, and tab as "itemdelimiters" for token, because they are removed as whitespace and are not really acting as an itemdelimiter. On the other hand, tokens themselves act more like a special kind of itemdelimiter.

