Agreed, technically this is no problem; the question is whether it's worth the effort. It would also require a bit more effort on the end-application side, since merging workers needs MPI communication; the rest could be wrapped in a similar way.
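To make "merging workers" concrete: with MPI, each rank would accumulate a partial result over its share of frames (say, a histogram), and the partials would then be combined with a reduction (MPI_Reduce with MPI_SUM). A minimal pure-Python sketch of just the merge step, with no real MPI and with all names (accumulate_frame, merge_workers) made up for illustration, not taken from VOTCA:

```python
# Sketch: merging per-worker partial results, as an MPI_Reduce(SUM) would.
# All names here are hypothetical; this is not VOTCA's actual API.

def accumulate_frame(hist, frame_values, bin_width=0.5, nbins=4):
    """Bin one frame's values into a partial histogram (per worker)."""
    for v in frame_values:
        b = min(int(v / bin_width), nbins - 1)
        hist[b] += 1
    return hist

def merge_workers(partials):
    """Elementwise sum of partial histograms -- the 'merge' that needs
    communication; under MPI this would be one MPI_Reduce with MPI_SUM."""
    merged = [0] * len(partials[0])
    for hist in partials:
        for i, count in enumerate(hist):
            merged[i] += count
    return merged

# Two 'workers', each having processed its own chunk of frames:
w0 = accumulate_frame([0, 0, 0, 0], [0.1, 0.6, 1.2])
w1 = accumulate_frame([0, 0, 0, 0], [0.4, 1.7, 1.8])
total = merge_workers([w0, w1])
print(total)  # [2, 1, 1, 2]
```

Since the merge is a pure elementwise sum, it maps directly onto a single collective call; everything else in the worker stays as it is in the threaded version.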
For really (memory-)expensive calculations, it might also be worth parallelizing within the analysis of a single frame, but that would require quite a bit more thought for each individual application.

On 23 April 2012 at 21:55, Konstantin Koschke <[email protected]> wrote:
> Cheers guys,
>
> On Mon, Apr 23, 2012 at 5:33 PM, Christoph Junghans <[email protected]> wrote:
>> Hi Sikandar,
>>
>> you are correct!
>>
>> Threads make use of the shared-memory nature of the system and hence
>> data is not communicated.
>>
>> I guess it would be possible to replace the thread class with an
>> MPI-based class, but I am not sure how much work it is.
>> Maybe Konstantin can comment on this!
>>
>> Cheers,
>>
>> Christoph
>>
>> On 23 April 2012 at 08:31, Sikandar Mashayak <[email protected]> wrote:
>>> Hi,
>>>
>>> As per my understanding, VOTCA applications are thread-enabled, allowing
>>> us to use multiple cores on the same machine. Since the implementation
>>> is based on shared memory, I am not sure if we can use it on HPC
>>> systems to make use of multiple nodes, which requires an implementation
>>> based on distributed memory.
>>>
>>> Please correct me if I am missing anything.
>>>
>>> Thanks,
>>> Sikandar
>
> Threads are limited to shared-memory architectures and VOTCA is based
> on threads; thus, you're right.
> Adding MPI as a communication interface is in principle possible.
> However, I don't see a practical application, and feel free to correct me here:
> - VOTCA uses a simple parallelization over time frames, i.e., if you
>   wanted to analyze a trajectory, you could also split it up into chunks
>   of size "frames/cores" and you'd be able to compute your
>   quantities in a "trivially parallel" way
> - on a current server node with 8-12 cores, this already runs into IO
>   limits: in order to use all CPU power, you want to decrease IO (little
>   input data) and increase FLOPS (complicated inner calculation loops);
>   if your actual calculation is too simple, VOTCA will be limited by
>   transferring data from disk to memory (IO limit); this is what
>   I observe most of the time
> - if you were to generalize this approach to more heterogeneous
>   architectures (threads + MPI), you'd be hitting fast-interconnect limits
>
> Do you have a minimal example that you'd like to be sped up? The first
> step would be to make sure that your calculation is limited by the CPU.
>
> Best,
> Konstantin
>
> --
> You received this message because you are subscribed to the Google Groups "votca" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at http://groups.google.com/group/votca?hl=en.
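The "frames/cores" splitting Konstantin describes above needs no VOTCA code at all; it is just balanced chunking of frame indices, after which each chunk can be analyzed independently. A sketch (the function name split_frames is made up for illustration):

```python
# Sketch of the "trivially parallel" scheme: split N frames over C cores
# into near-equal contiguous chunks. Names are illustrative only.

def split_frames(n_frames, n_cores):
    """Return (start, stop) index pairs, one chunk per core,
    handing the remainder out one extra frame at a time."""
    base, extra = divmod(n_frames, n_cores)
    chunks, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

# 10 frames over 4 cores -> chunks of 3, 3, 2, 2 frames:
print(split_frames(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each chunk could then be fed to an independent process (or even a separate job), which is exactly why the per-frame analysis is IO-bound long before communication becomes the bottleneck.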
