On 9/14/06, Guillermo Ballester Valor <[EMAIL PROTECTED]> wrote:
> Also note that it have to wait that all threads finished the first step to
> perform the last carry step.
A couple of observations.

Right now I am running the second verification of M44 on a shared machine (now 92% complete). The software slows down greatly if someone else launches a process that starts using CPU time on the same processor as a Glucas thread. The wait for all threads to finish can sometimes bring the whole program to a crawl, because the OS has scheduled multiple threads/processes on the same CPU. So for a dedicated Prime95 system this shouldn't be a problem, but on a typical system, if the user is running multiple processes that use a non-trivial amount of CPU time, it could cause major slowdowns. Tony's idea of leaving 1 or 2 cores free on a larger system, for the OS to schedule other jobs on, could make a big difference.

The other thing you have to be careful with when doing multi-threaded programming is dynamic memory allocation. I have no idea what Prime95 currently uses or plans on using, but I found with my thesis work that some algorithms were actually slower when parallelized because of memory allocation blocking. The standard malloc/new libraries that most people have on their systems treat memory allocation as an atomic operation, so that no two threads try to allocate the same range of memory simultaneously. If you do a lot of memory allocation in threads, the overall process can be slowed down greatly by threads blocking on memory allocation calls. Just something to keep in mind that could affect performance.

George, would it be possible to build some sort of mock-up using both types of designs to get an idea of how the performance might compare without having to implement the whole thing? I can test stuff on 4-CPU Opteron Linux systems if you want.

Jeff.
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
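P.S. To make the allocation point concrete, here is a minimal sketch of one way around it: give each thread a scratch buffer that is allocated once, before the threads start, so that nothing calls malloc inside the hot loop. It assumes the allocator really does serialize calls as described above, which depends on the C library in use. The names (struct job, prealloc_worker, run_preallocated) and the sizes are hypothetical and are not taken from Prime95 or Glucas.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS 4     /* hypothetical thread count */
#define ITERATIONS 1000   /* hypothetical loop count per thread */
#define SCRATCH_LEN 256   /* hypothetical scratch size, in doubles */

struct job {
    double *scratch;  /* allocated once, before the thread starts */
    double result;
};

static void *prealloc_worker(void *arg)
{
    struct job *job = arg;
    double acc = 0.0;

    for (int it = 0; it < ITERATIONS; it++) {
        /* Reuse the same buffer: no malloc/free inside the loop. */
        for (int i = 0; i < SCRATCH_LEN; i++)
            job->scratch[i] = (double)i;
        acc += job->scratch[SCRATCH_LEN - 1];
    }
    job->result = acc;
    return NULL;
}

/* Fills results[] with one value per thread.
   Returns 0 on success, -1 if an allocation failed. */
int run_preallocated(double results[NUM_THREADS])
{
    pthread_t threads[NUM_THREADS];
    struct job jobs[NUM_THREADS];
    int rc = 0;

    /* All allocation happens up front, on a single thread. */
    for (int t = 0; t < NUM_THREADS; t++) {
        jobs[t].scratch = malloc(SCRATCH_LEN * sizeof(double));
        jobs[t].result = 0.0;
        if (jobs[t].scratch == NULL)
            rc = -1;
    }

    if (rc == 0) {
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&threads[t], NULL, prealloc_worker, &jobs[t]);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(threads[t], NULL);
    }

    for (int t = 0; t < NUM_THREADS; t++) {
        results[t] = jobs[t].result;
        free(jobs[t].scratch);  /* free(NULL) is a no-op */
    }
    return rc;
}
```

The comparison case would be the same loop with a malloc/free pair inside it; timing the two on a multi-CPU box would show whether the allocator is a bottleneck on that particular system.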
