Mark Freeze wrote:

 2. Makes the second file into a dbase file
 3. Runs another c++ program on the first file that examines each record in
the file and compares it to another database (using proprietary code
libraries supplied by our software vendor) that corrects any bad info in the
address, adds a zip+4, adds carrier route info, etc...

That definitely sounds like something you could parallelize, and you might not even have to re-code your program (depending on a few "ifs"). You could probably just divide the input file into x chunks, where x is the number of nodes you want working in parallel. Then sftp the chunks to the respective nodes and kick off your processor program on each one. When they finish, sftp the chunks back to a central location, concatenate them, and do whatever else needs to be done.
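A rough sketch of that split/ship/process/gather workflow in shell. The node names, paths, and the `./processor` command are all placeholders for your setup, and scp/ssh stand in for whatever transfer and remote-execution tools you prefer; by default the script only echoes the remote commands (dry run) so you can eyeball them before setting RUN= to run for real:

```shell
#!/bin/sh
# Dry run by default: RUN=echo prints the remote commands instead of
# executing them. Set RUN= (empty) to go live.
RUN=${RUN:-echo}
NODES="node1 node2 node3 node4"     # placeholder worker hosts

# Sample input for demonstration -- in practice this is your step-1 output.
seq 1 100 > records.txt

# Split on line boundaries so no record is cut in half.
X=4                                  # one chunk per node
LINES=$(wc -l < records.txt)
PER=$(( (LINES + X - 1) / X ))       # ceiling division
split -l "$PER" records.txt chunk.   # produces chunk.aa, chunk.ab, ...

# Ship one chunk to each node and kick off the processor there.
set -- $NODES
for chunk in chunk.??; do
    node=$1; shift
    $RUN scp "$chunk" "$node:/work/"
    $RUN ssh "$node" "cd /work && ./processor $chunk" &
done
wait

# Gather the processed chunks back from each node.
set -- $NODES
for chunk in chunk.??; do
    node=$1; shift
    $RUN scp "$node:/work/$chunk.out" .
done
# cat chunk.??.out > processed.txt   # final reassembly, in order
```

Because `split` names the chunks in sorted order (chunk.aa, chunk.ab, ...), a plain `cat` of the returned chunks reassembles the file in the original record order.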

Of course that leaves you needing to sort out some things like:

1. How to divide the input file. Possibly your existing step-1 program could do it, or you could use some shell scripting with the split (or head/tail) command.

2. How to move the chunks. I mentioned sftp as an example, but ftp, NFS, Samba, rsync, or pretty much anything would work.

3. Controlling the worker processes. If the processor currently works by watching a directory for input files, you may not have to do anything special; otherwise you'll need to work out a way to kick off the process on the remote nodes (ssh is the usual choice).

4. How to tell when a process is done with its chunk, so you can gather the chunks back up and continue processing.
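For item 4, one simple convention (an assumption on my part, not something your vendor's tool does out of the box) is to have each worker touch a sentinel file once its chunk is finished, and have the gathering script poll for the sentinels. A sketch, with the worker simulated locally by a background subshell:

```shell
#!/bin/sh
# Worker side -- wrap the processor so the sentinel appears only on success:
#   ./processor chunk.aa && touch chunk.aa.done
# Here a background subshell simulates that worker for demonstration.
( sleep 1; echo processed > chunk.aa.out; touch chunk.aa.done ) &

# Gatherer side: block until every expected sentinel file exists.
for chunk in chunk.aa; do            # in practice: chunk.?? across all nodes
    while [ ! -f "$chunk.done" ]; do
        sleep 1                      # poll interval; tune to taste
    done
    echo "$chunk is ready"
done
```

Since the sentinel is only created after the processor exits successfully, you never pick up a half-written output file; on remote nodes you'd poll via ssh or wait for the sentinel to arrive with the chunk.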


HTH.


TTYL,


Phil
--
North Carolina - First In Freedom

Free America - Vote Libertarian
www.lp.org

--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
