On Dec 08, 2006, at 22:09 UTC, Daniel L. Taylor wrote:

> Regarding your first implementation:
> 
> * You're reading/writing data repeatedly in small chunks. Horrible use of
> disk I/O. Every read is followed by a write, which means every loop involves
> two head seeks, and the reads/writes are much smaller than optimal for a
> burst.

I don't think so.  There is caching at many levels: the disk itself,
the OS, and then within REALbasic.  At least in theory, you shouldn't
see much of a penalty for this usage pattern at the RB level.

Of course, you may well get a small speedup from reading all the data
into memory, and writing out all the results at once.  Hard to say for
sure, but I wouldn't expect it to be a huge difference.
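
To make the two approaches concrete, here is a rough sketch in Python
rather than REALbasic, purely for illustration.  The function names and
the "case " output format are assumptions made up for this example, not
the code from the original post.  The first version reads and writes one
line at a time; the second reads everything, builds the result in
memory, and writes it once.

```python
import io

def convert_per_line(src, dst):
    """Read one line, write one result, repeat (hypothetical helper)."""
    for line in src:
        text = line.rstrip("\n")
        dst.write("case " + text + "\n")

def convert_in_bulk(src, dst):
    """Read all input first, build the output in memory, write it once."""
    lines = src.read().splitlines()
    parts = []
    for text in lines:
        parts.append("case " + text + "\n")
    dst.write("".join(parts))

# In-memory streams stand in for real files in this sketch.
data = "alpha\nbeta\n"
out_a = io.StringIO()
out_b = io.StringIO()
convert_per_line(io.StringIO(data), out_a)
convert_in_bulk(io.StringIO(data), out_b)
```

Both versions produce the same output; which one is faster on a real
file depends on how much the caching layers absorb, so only measuring
would settle it.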

> * I'm not sure, but I would bet that ( "case " + line ) and ( "r = """ +
> line + """" + chr( 13 ) ) allocate new strings before writing to disk. (I
> doubt the compiler is optimized to recognize what's happening and call Write
> repeatedly or, better yet, call a version that accepts an array of strings
> to write in order.)

All true.  OTOH, if you replace these with multiple calls to Write,
then you have the overhead of more function calls (and I suspect that a
substantial fraction of the time in this code is going into function
call overhead).  I can't guess which would be better in this case.

> * ReplaceAllB( line, """", """""" ) forces a string allocation/copy.

True, but again, alternatives may be worse.

> * chr( 13 ) is a wasted function call that forces yet another memory
> allocation.

This is certainly true.  This function call should be moved out of the
loop.
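
As a minimal sketch of that change, again in Python for illustration
only (the function name and output format are assumptions, not the
original RB code), the constant is computed once before the loop instead
of on every pass:

```python
def emit_assignments(lines):
    """Build 'r = "<line>"' + CR for each line (hypothetical helper)."""
    cr = chr(13)  # hoisted: evaluated once, not once per iteration
    out = []
    for line in lines:
        # Double any embedded quotes, as ReplaceAllB does in the quoted code.
        escaped = line.replace('"', '""')
        out.append('r = "' + escaped + '"' + cr)
    return "".join(out)

result = emit_assignments(['plain', 'has "quotes"'])
```

The loop body does the same work as before; only the repeated call that
always returns the same value has moved out.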

> So the data probably ended up copied around 3 or 4x with all the related
> memory allocations. Given all of that, it's a testament to split and RB that
> you got close to Perl's speed!

Heh, that's a good point.

> It would be easy to hand Perl its lunch using C for this example. I'm not
> sure you can do it in RB because the language lacks the structures and
> compiler optimizations necessary to efficiently treat and manipulate a block
> of memory as an array of values.

Mainly because of function-call overhead.  We can hope that at some
point, the compiler will be able to inline certain small functions
(memoryblock accessors spring immediately to mind), or reduce the
overhead of function calls in general.

Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC     "Making the Internet a Better Place"
http://www.verex.com/
