It started with some friendly ribbing. I have a friend who thinks that the
answer to all things is "perl," and he goes on and on about its benefits,
speed, flexibility, etc., etc. Meanwhile, perl code looks to me like
something my cat created when he walked across the keyboard.

Having had enough, and with an actual, simple project to test, I created
equivalent perl and RB apps to process a 100 MB text file. The processing
was fairly simple: if a line was a pure number, a constant string was added
to the beginning of it; otherwise, the text was wrapped in another constant
string and each quote character in it was doubled.

For example, this text:

 1
 Text 1
 2
 Text 2
 3
 Text "3"

Became this text:

 case 1
 r = "Text 1"
 case 2
 r = "Text 2"
 case 3
 r = "Text ""3"""
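
For reference, the rule can be sketched in a few lines of Python. This is
not either of the programs I timed, just a restatement of the transformation.
The names convert_line and convert_text are made up for illustration, and I'm
assuming a "pure number" means a line made up only of digits.

```python
import re

# Assumption: a "pure number" is a line consisting only of ASCII digits.
_NUMBER = re.compile(r"[0-9]+")


def convert_line(line: str) -> str:
    """Convert one input line (without its line ending)."""
    if _NUMBER.fullmatch(line):
        # Numeric line: prefix it with the constant string "case ".
        return "case " + line
    # Text line: double every quote character, then wrap it in r = "...".
    return 'r = "' + line.replace('"', '""') + '"'


def convert_text(text: str) -> str:
    """Convert a whole block of text, one line at a time."""
    return "\n".join(convert_line(line) for line in text.splitlines())


if __name__ == "__main__":
    sample = '1\nText 1\n2\nText 2\n3\nText "3"'
    print(convert_text(sample))
```

Run against the six sample lines, this should reproduce the six output lines
shown above.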

Imagine my surprise when it took RB a minute to do what perl did in 15
seconds. I'm not surprised that perl is faster, only HOW much faster it is.
My argument was going to be, "see, the difference isn't that much, and RB is
much easier to read," but these results make it no contest.

So my question is: Why is RB so much slower? What is it doing that's making
the difference? And could I do something to close the gap?

The code I used for each is as follows:

 perl -lne 'chomp ; if(/^\d+$/){print ('\''case '\'' , $_)}
   else {s|\"|\"\"|g ; print("r = \"" , $_ , "\"\n")}' test.txt > perltest.txt

And for REALbasic:

  #pragma BackgroundTasks false
  #pragma BoundsChecking false
  
  dim fIn, fOut as FolderItem
  dim t as Double = microseconds
  
  fIn = DesktopFolder.Child( "test.txt" )
  fOut = DesktopFolder.Child( "output.txt" )
  
  dim tIn as TextInputStream = fIn.OpenAsTextFile
  dim tOut as TextOutputStream = fOut.CreateTextFile
  
  dim line as string
  while not tIn.EOF
    line =  tIn.ReadLine
    if IsNumeric( line ) then
      tOut.WriteLine( "case " + line )
    else
      line = ReplaceAllB( line, """", """""" )
      tOut.WriteLine( "r = """ + line + """" + chr( 13 ) )
    end if
  wend
  
  tIn.Close
  tOut.Close
  
  t = microseconds - t
  t = t / 1000000
  
  MsgBox( format( t, "#,0.000" ) + " seconds" )
  
BTW, I also created a version that loaded the whole file at once, split it,
modified it without testing (since I knew that every odd line was a number
and every even line was text), and dumped it back to a file. That took 24
seconds!

__________________________________________________________________________
Kem Tekinay                                                 (212) 201-1465
MacTechnologies Consulting                              Fax (914) 242-7294
http://www.mactechnologies.com                        Pager (917) 491-5546
