Mark Freeze wrote:

 2. Makes the second file into a dbase file
 3. Runs another c++ program on the first file that examines each record in
the file and compares it to another database (using proprietary code
libraries supplied by our software vendor) that corrects any bad info in the
address, adds a zip+4, adds carrier route info, etc...

That definitely sounds like something you could parallelize, and you might not even have to re-code your program (depending on a few "ifs"). You could probably just divide the input file into x chunks, where x is the number of nodes you want working in parallel. Then sftp the chunks to the respective nodes and kick off your processor program on each one. When they finish, sftp the chunks back to a central location, concatenate them, and do whatever else needs to be done.
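A rough sketch of that split/ship/process/gather workflow in shell. The node names, paths, and the `./processor` command are all placeholders for your setup, and scp/ssh stand in for whatever transfer and remote-execution tools you prefer; by default the script only echoes the remote commands (dry run) so you can eyeball them before setting RUN= to run for real:

```shell
#!/bin/sh
# Dry run by default: RUN=echo prints the remote commands instead of
# executing them. Set RUN= (empty) to go live.
RUN=${RUN:-echo}
NODES="node1 node2 node3 node4"     # placeholder worker hosts

# Sample input for demonstration -- in practice this is your step-1 output.
seq 1 100 > records.txt

# Split on line boundaries so no record is cut in half.
X=4                                  # one chunk per node
LINES=$(wc -l < records.txt)
PER=$(( (LINES + X - 1) / X ))       # ceiling division
split -l "$PER" records.txt chunk.   # produces chunk.aa, chunk.ab, ...

# Ship one chunk to each node and kick off the processor there.
set -- $NODES
for chunk in chunk.??; do
    node=$1; shift
    $RUN scp "$chunk" "$node:/work/"
    $RUN ssh "$node" "cd /work && ./processor $chunk" &
done
wait

# Gather the processed chunks back from each node.
set -- $NODES
for chunk in chunk.??; do
    node=$1; shift
    $RUN scp "$node:/work/$chunk.out" .
done
# cat chunk.??.out > processed.txt   # final reassembly, in order
```

Because `split` names the chunks in sorted order (chunk.aa, chunk.ab, ...), a plain `cat` of the returned chunks reassembles the file in the original record order.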

Of course that leaves you needing to sort out some things like:

1. How to divide the input file. Possibly your existing step-1 program could do it, or you could use some shell scripting with the split (or head/tail) command.

2. How to move the chunks. I mentioned sftp as an example, but ftp, NFS, Samba, rsync, or pretty much anything would work.

3. Controlling the worker processes. If the processor currently works by watching a directory for input files, you may not have to do anything special; otherwise you'll need to work out a way to kick off the process on the remote nodes (ssh is the usual choice).

4. How to tell when a process is done with its chunk, so you can gather the chunks back up and continue processing.
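For item 4, one simple convention (an assumption on my part, not something your vendor's tool does out of the box) is to have each worker touch a sentinel file once its chunk is finished, and have the gathering script poll for the sentinels. A sketch, with the worker simulated locally by a background subshell:

```shell
#!/bin/sh
# Worker side -- wrap the processor so the sentinel appears only on success:
#   ./processor chunk.aa && touch chunk.aa.done
# Here a background subshell simulates that worker for demonstration.
( sleep 1; echo processed > chunk.aa.out; touch chunk.aa.done ) &

# Gatherer side: block until every expected sentinel file exists.
for chunk in chunk.aa; do            # in practice: chunk.?? across all nodes
    while [ ! -f "$chunk.done" ]; do
        sleep 1                      # poll interval; tune to taste
    done
    echo "$chunk is ready"
done
```

Since the sentinel is only created after the processor exits successfully, you never pick up a half-written output file; on remote nodes you'd poll via ssh or wait for the sentinel to arrive with the chunk.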


HTH.


TTYL,


Phil
--
North Carolina - First In Freedom

Free America - Vote Libertarian
www.lp.org

--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
