Thanks for the reply, Alex. The app I would write would not be that complex. It would simply read about 40k records from a database with a single "SELECT * FROM main", run the conditionals, distance calculations, and sums/averages we have discussed, and then write about 12k records back to a database table. Probably no more than 200-300 lines of code. A user would then be able to download the processed data from the database to their desktop or mobile AIR app very efficiently and do no processing on their device, so that it stays nice and speedy.

But this data has to be processed once a day for 1400 cities in the US. If I used a single computer to do the work linearly at 1 minute per run, it would take almost 24 hours (1400 minutes). I need the data much faster than that: I want to update every market in the US at 3am local time, which gives me about 3-4 hours to process all of the data from the East Coast to the West Coast. So if I could run 4 processes in parallel on a single machine without too much of a performance loss, it would finish in about 6 hours, and two machines could do it in 3 hours. I could use a scheduler to run the processes, and all would be good in the world. (By the way, I would only download updates from a service for each of the 1400 cities to update my database, but once I have the updates, the entire dataset has to be reprocessed.)
So if it is not possible to run 4 processes in parallel efficiently using C, I would either have to get an array of machines to do the work, or see how well something like Apache Spark could handle the workload. It basically boils down to this: the speed of C plus the ability to run several processes in parallel vs. the scalability of Spark to process big datasets. Does that help at all?

--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Multithreading-tp13274p13284.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.
