Hi Konstantin,

I think I get your point. I am just restating what you said to make sure I understand it correctly.
Yes, the parallelization used in VOTCA is trivially parallel: multiple frames are processed simultaneously by different cores, and averages are computed at the end, once all cores have finished their computation. If the computations performed on each frame are not intensive, then performance is mainly limited by the I/O of frame reading.

Would MPI-enabling make sense if the per-frame analysis is computationally intensive and the number of frames to process is quite large? In that case, it would be useful to use as many nodes as the HPC cluster offers, keeping the same trivially parallel algorithm.

Thanks
Sikandar

On Mon, Apr 23, 2012 at 3:55 PM, Konstantin Koschke <[email protected]> wrote:

> Cheers guys,
>
> On Mon, Apr 23, 2012 at 5:33 PM, Christoph Junghans <[email protected]> wrote:
> > Hi Sikandar,
> >
> > you are correct!
> >
> > Threads make use of the shared memory nature of the system and hence
> > data is not communicated.
> >
> > I guess it would be possible to replace the thread class with a
> > mpi-based class, but I am not sure how much work it is.
> > Maybe Konstantin can comment on this!
> >
> > Cheers,
> >
> > Christoph
> >
> >
> > On 23 April 2012 08:31, Sikandar Mashayak <[email protected]> wrote:
> >> Hi
> >>
> >> As per my understanding, votca applications are thread enabled allowing
> >> us to use multiple cores on the same machine. Since, the implementation
> >> is based on the shared memory, I am not sure if we can use it on HPC
> >> to make use of multiple nodes, which require implementation based on the
> >> distributed memory.
> >>
> >> Please correct me, if I am missing anything.
> >>
> >> Thanks
> >> Sikandar
> >>
>
> Threads are limited to shared memory architectures and VOTCA is based
> on threads, thus, you're right.
> Adding MPI as a communication interface is in principle possible.
> However, I don't see a practical application and feel free to correct
> me here:
> - VOTCA uses a simple parallelization over time frames, i.e., if you
>   wanted to analyze a trajectory, you could also split it up into chunks
>   of the size "frames/cores" and you'd be able to compute your
>   quantities in a "trivially parallel" way
> - on a current server node with 8-12 cores, this runs already into IO
>   limits: in order to use all CPU power, you want to decrease IO (little
>   input data) and increase FLOPS (complicated inner calculation loops);
>   if your actual calculation is too simple, VOTCA will be limited by
>   transferring data from the disk to the memory (IO limit); this is what
>   I observe most of the time
> - if you were to generalize this approach to more heterogeneous
>   architectures (Threads+MPI), you'd be hitting fast-interconnect limits
>
> Do you have a minimal example that you'd like to be sped up? The first
> step would be to make sure that your calculation is limited by the
> CPU.
>
> Best
> Konstantin
>
> --
> You received this message because you are subscribed to the Google Groups
> "votca" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/votca?hl=en.
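
For reference, here is a minimal sketch of the "frames/cores" splitting discussed in this thread. It is written in Python only to illustrate the structure and is not VOTCA code: the names split_frames, analyze_frame, partial_sum and trajectory_average are made up for this example, and the "trajectory" is a toy list of numbers rather than real frames. Each chunk is processed independently, and the chunks only meet in the final reduction that forms the average.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_workers):
    """Return (start, stop) index ranges covering all frames, one per worker."""
    base, extra = divmod(n_frames, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional frame each.
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty ranges when n_workers > n_frames
            ranges.append((start, stop))
        start = stop
    return ranges


def analyze_frame(frame):
    """Hypothetical per-frame observable: here just the mean of the frame's values."""
    return sum(frame) / len(frame)


def partial_sum(frames, start, stop):
    """Process one chunk on its own; return (sum of observables, frame count)."""
    total = sum(analyze_frame(frames[i]) for i in range(start, stop))
    return total, stop - start


def trajectory_average(frames, n_workers):
    """Average the per-frame observable; workers share nothing until the end."""
    ranges = split_frames(len(frames), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda r: partial_sum(frames, *r), ranges))
    # Final reduction: combine the partial sums and counts.
    total = sum(t for t, _ in results)
    count = sum(c for _, c in results)
    return total / count


# Toy "trajectory": 10 frames, each a short list of numbers.
frames = [[float(i), float(i) + 2.0] for i in range(10)]
print(split_frames(10, 4))
print(trajectory_average(frames, 4))
```

Because the workers exchange nothing until the final sum, the same index ranges could in principle be handed to separate jobs or MPI ranks instead of threads. Whether that pays off depends on the point raised above: the per-frame work (analyze_frame here) has to dominate over reading the frames.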
