On Dec 08, 2006, at 22:09 UTC, Daniel L. Taylor wrote:

> Regarding your first implementation:
>
> * You're reading/writing data repeatedly in small chunks. Horrible use of
> disk I/O. Every read is followed by a write, which means every loop involves
> two head seeks, and the reads/writes are much smaller than optimal for a
> burst.

I don't think so. There is caching at many levels: the disk itself, the OS, and then within REALbasic. At least in theory, you shouldn't see too much penalty for this usage pattern at the RB level. Of course, you may well get a small speedup from reading all the data into memory, and writing out all the results at once. Hard to say for sure, but I wouldn't expect it to be a huge difference.

> * I'm not sure, but I would bet that ( "case " + line ) and ( "r = """ +
> line + """" + chr( 13 ) ) allocate new strings before writing to disk. (I
> doubt the compiler is optimized to recognize what's happening and call Write
> repeatedly or, better yet, call a version that accepts an array of strings
> to write in order.)

All true. OTOH, if you replace these with multiple calls to Write, then you have the overhead of more function calls (and I suspect that a substantial fraction of the time in this code is going into function call overhead). I can't guess which would be better in this case.

> * ReplaceAllB( line, """", """""" ) forces a string allocation/copy.

True, but again, alternatives may be worse.

> * chr( 13 ) is a wasted function call that forces yet another memory
> allocation.

This is certainly true. This function call should be moved out of the loop.

> So the data probably ended up copied around 3 or 4x with all the related
> memory allocations. Given all of that, it's a testament to split and RB that
> you got close to Perl's speed!

Heh, that's a good point.

> It would be easy to hand Perl its lunch using C for this example. I'm not
> sure you can do it in RB because the language lacks the structures and
> compiler optimizations necessary to efficiently treat and manipulate a block
> of memory as an array of values.

Mainly because of function-call overhead. We can hope that at some point, the compiler will be able to inline certain small functions (memoryblock accessors spring immediately to mind), or reduce the overhead of function calls in general.

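To make two of the points above concrete (moving the chr( 13 ) call out of the loop, and writing the results out at once instead of piece by piece), here is a minimal sketch. It is in Python rather than REALbasic, and it is not the code under discussion: the function name `emit_cases`, the carriage return after the "case" line, and the exact output layout are assumptions made up for illustration, since the original loop is not shown in this message.

```python
import io


def emit_cases(lines, out):
    """Write a 'case <line>' / 'r = "<line>"' pair for each input line.

    Hypothetical helper: the output layout is an assumption, not the
    original poster's format.
    """
    cr = chr(13)  # hoisted: computed once instead of once per iteration
    parts = []
    for line in lines:
        # Double any embedded quotes, as ReplaceAllB( line, """", """""" ) does.
        escaped = line.replace('"', '""')
        parts.append("case " + line + cr)
        parts.append('r = "' + escaped + '"' + cr)
    # One write for the whole batch instead of one write per fragment.
    out.write("".join(parts))


if __name__ == "__main__":
    buf = io.StringIO()
    emit_cases(["alpha", 'say "hi"'], buf)
    print(repr(buf.getvalue()))
```

Whether batching like this beats many small writes depends on how much buffering already happens underneath, which is the caveat raised above; the sketch only shows the shape of the change, not a measured improvement.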
Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC "Making the Internet a Better Place"
http://www.verex.com/
