Mark,

It sounds like you could get some benefit from the parallelization that cluster processing offers, but you would have to rework your process to achieve it, since the steps as you describe them run one after another.
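
As a rough illustration of what that rework might involve, here is a minimal sketch that splits the input records into independent shards, each of which could in principle be handed to a separate node or process. The function name split_into_shards is hypothetical, and the sketch assumes every record can be processed without looking at any other record, which may not hold for your data or for the vendor library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split records into `shards` groups of near-equal
// size, preserving the original order within and across groups.
// Assumption: each record can be processed independently of the others.
std::vector<std::vector<std::string>> split_into_shards(
    const std::vector<std::string>& records, std::size_t shards) {
    assert(shards > 0);
    std::vector<std::vector<std::string>> out(shards);
    const std::size_t base = records.size() / shards;   // minimum per shard
    const std::size_t extra = records.size() % shards;  // first `extra` shards get one more
    std::size_t next = 0;
    for (std::size_t s = 0; s < shards; ++s) {
        const std::size_t count = base + (s < extra ? 1 : 0);
        out[s].assign(records.begin() + next, records.begin() + next + count);
        next += count;
    }
    return out;
}
```

Each shard would then go through the existing steps on its own, and the outputs would be appended together at the end, much as the current script already appends the processed files.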

The current process looks like it would benefit from I/O speedups such as RAID or a caching disk controller.
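
Before spending money on hardware, it may be worth confirming where the time actually goes. The following is a minimal sketch of a timing helper, assuming you can wrap the read phase and the processing phase of your program in separate callables; the name time_ms is hypothetical and not part of any library.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
double time_ms(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If the read and write phases dominate, faster disks should help; if the per-record work dominates, they probably will not.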


If you were to move to a server-class system, I'd still suggest that you look at your process and code to see whether you could do more in RAM. Perhaps you could do that on a desktop system with lots of RAM and get most of the gains. That would mean less hardware investment, but more coding for you.
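
As a sketch of what "more in RAM" could look like, the function below reads an entire input stream into memory in one pass and then splits it into lines, so that later steps work on memory rather than going back to disk. The name load_lines is hypothetical, and the sketch assumes the 50 - 100 MB file fits comfortably in memory.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the whole stream into memory with a single
// bulk copy, then split it into lines on '\n'.
// Assumption: the input is small enough to hold in RAM.
std::vector<std::string> load_lines(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // bulk read of the entire input
    const std::string data = buffer.str();

    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\n', start);
        if (end == std::string::npos) end = data.size();  // last line, no newline
        lines.emplace_back(data, start, end - start);
        start = end + 1;
    }
    return lines;
}
```

Passing a std::ifstream opened on the text file would load it once; the validation and splitting could then run over the returned vector.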



Good luck!



        Kevin




Mark Freeze wrote:
You guys are way ahead of me on some of the hardware questions... However,
to try and answer some of them:
 I have a script that controls the following actions:
 1. Runs a C++ program that I wrote that opens a text file (the 50 - 100 MB
file that I mentioned), reads each line sequentially and splits the data
into two output files after performing numerous tasks on the data (e.g.
checking the validity of the zip code, making sure it matches the state,
calculating amounts due, etc.).
 2. Makes the second file into a dbase file
 3. Runs another C++ program on the first file that examines each record in
the file and compares it to another database (using proprietary code
libraries supplied by our software vendor) that corrects any bad info in the
address, adds a zip+4, adds carrier route info, etc.
 4. Looks for another text file to process
 5. Appends all processed text files together
 6. Appends all dbase files into one
 As I said in my previous post, each 100MB text file takes about 1 hr to
run. Most of this time is spent on step 3.
 So, would clustering speed up this sometimes 3 - 4 hr process?
 Thanks,
Mark.
--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/