Thanks for the reply, Alex.  The app that I would write would not be that
complex.  It would simply read about 40k records from a database with a
single "SELECT * FROM main", run the conditionals and distance calculations
and sums/averages that we have discussed, and then write about 12k records
back to a database table.  Probably no more than 200-300 lines of code.  A
user would then be able to download the processed data from the database to
their desktop or mobile device AIR app very efficiently and do no processing
on their device so that it is nice and speedy.  But this data has to be
processed once a day for 1400 cities in the US.  If I were using a single
computer to do the work sequentially at 1 minute per city, it would take
almost 24 hours.  I need the data much faster than that.  I want to update
every market in the US at 3am local time, which leaves a window of about 3-4
hours to process all of the data from the East Coast to the West Coast.  So
if I were able to run 4 processes in parallel on a single machine without
too much of a performance loss, it would finish in about 6 hours.  So two
machines could do it in 3 hours.  I
could use a scheduler to run the processes, and all would be good in the
world. (By the way, I would only download updates from a service for each of
the 1400 cities to update my database, but once I have the updates, the
entire dataset has to be reprocessed.)
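
To make the arithmetic above explicit, here is a rough back-of-the-envelope
sketch in Python.  It assumes perfect scaling (no slowdown when processes
share a machine), which is optimistic; the function name and parameters are
just for illustration.

```python
def hours_needed(cities, minutes_per_city, procs_per_machine=1, machines=1):
    """Estimate wall-clock hours, assuming the work divides evenly
    across all worker processes with no contention (an idealization)."""
    workers = procs_per_machine * machines
    return cities * minutes_per_city / workers / 60.0


if __name__ == "__main__":
    # Figures from this message: 1400 cities at about 1 minute each.
    for procs, machines in [(1, 1), (4, 1), (4, 2)]:
        hours = hours_needed(1400, 1, procs, machines)
        print(f"{machines} machine(s) x {procs} process(es): {hours:.1f} hours")
```

Any contention for CPU, disk, or the database would push the real numbers
above these estimates.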

So if it is not possible to run 4 processes in parallel efficiently using C,
I would either have to get an array of machines to do the work, or see how
well something like Apache Spark could handle the workload.  So it basically
boils down to this:  the speed of C plus the ability to run several
processes in parallel versus the scalability of Spark to process big datasets.
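
For what it is worth, the "several processes in parallel" idea is independent
of the language.  Below is a minimal sketch in Python (not C, and not Spark)
that splits the city list into 4 chunks and hands each chunk to its own OS
process.  Here process_city is a hypothetical placeholder for the real
per-city job (read records, run the calculations, write results back), and
the city IDs 0..1399 are made up.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_city(city_id):
    # Hypothetical placeholder: the real job would query the database,
    # run the conditionals/distances/averages, and write rows back.
    return city_id


def process_chunk(city_ids):
    """Process one chunk of cities sequentially inside a worker process."""
    return [process_city(c) for c in city_ids]


if __name__ == "__main__":
    city_ids = list(range(1400))             # made-up IDs for illustration
    chunks = split_into_chunks(city_ids, 4)  # one chunk per process
    with Pool(processes=4) as pool:
        results = pool.map(process_chunk, chunks)
    print([len(r) for r in results])
```

Whether 4 processes really run about 4 times faster depends on how many cores
the machine has and on how much of each run is spent waiting on the database.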

Does that help at all?



--
View this message in context: 
http://apache-flex-users.2333346.n4.nabble.com/Multithreading-tp13274p13284.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.
