Agreed, technically there's no problem; the question is whether it's worth the effort.
It would also require a bit more effort on the end-application side,
since merging the workers needs MPI communication; the rest could be
wrapped in a similar way.
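
For illustration, here is a minimal Python sketch of that merge step (the
histogram data and merge function are hypothetical, not VOTCA code). With
threads the merge is a plain in-memory sum over the workers' partial
results; across nodes the same element-wise reduction would instead have
to go through MPI, conceptually an MPI_Reduce with MPI_SUM:

```python
# Hypothetical per-worker partial histograms, e.g. RDF bin counts each
# worker accumulated over its share of the trajectory frames.
partial_histograms = [
    [4, 2, 0, 1],   # worker 0
    [3, 5, 2, 0],   # worker 1
    [1, 0, 4, 2],   # worker 2
]

def merge(partials):
    """Element-wise sum of partial histograms.

    With threads this is a trivial shared-memory sum; on a distributed
    cluster the same operation would be an MPI reduction instead of a
    local loop.
    """
    merged = [0] * len(partials[0])
    for hist in partials:
        for i, count in enumerate(hist):
            merged[i] += count
    return merged

print(merge(partial_histograms))  # [8, 7, 6, 3]
```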

For really (memory-)expensive calculations, it might also be worth
parallelizing within the analysis of a single frame, but that would
require quite a bit more thinking for each individual application.
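
A minimal Python sketch of that idea (the frame data and analysis
function are hypothetical stand-ins, not VOTCA's API): the particles of
one frame are split across threads and the partial results merged
afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-frame data: one coordinate per particle.
frame = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

def analyze_slice(coords):
    # Stand-in for the expensive per-particle part of a frame analysis.
    return sum(x * x for x in coords)

def analyze_frame(frame, n_workers=4):
    """Split one frame's particles across workers and merge the results."""
    chunk = (len(frame) + n_workers - 1) // n_workers
    slices = [frame[i:i + chunk] for i in range(0, len(frame), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(analyze_slice, slices)
    return sum(partials)

print(analyze_frame(frame))  # 51.0
```

The catch, as noted above, is that the right way to split a frame (by
particle, by pair, by spatial domain) depends on the individual analysis.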

On 23 April 2012 21:55, Konstantin Koschke
<[email protected]> wrote:
> Cheers guys,
>
> On Mon, Apr 23, 2012 at 5:33 PM, Christoph Junghans <[email protected]> 
> wrote:
>> Hi Sikandar,
>>
>> you are correct!
>>
>> Threads make use of the shared memory nature of the system and hence
>> data is not communicated.
>>
>> I guess it would be possible to replace the thread class with an
>> MPI-based class, but I am not sure how much work it is.
>> Maybe Konstantin can comment on this!
>>
>> Cheers,
>>
>> Christoph
>>
>>
>> On 23 April 2012 08:31, Sikandar Mashayak <[email protected]> wrote:
>>> Hi
>>>
>>> As per my understanding, VOTCA applications are thread-enabled,
>>> allowing us to use multiple cores on the same machine. Since the
>>> implementation is based on shared memory, I am not sure whether we can
>>> use it on HPC systems to make use of multiple nodes, which would
>>> require an implementation based on distributed memory.
>>>
>>> Please correct me if I am missing anything.
>>>
>>> Thanks
>>> Sikandar
>>>
>
> Threads are limited to shared-memory architectures, and VOTCA is based
> on threads, so you're right.
> Adding MPI as a communication interface is in principle possible.
> However, I don't see a practical application (feel free to correct me
> here):
> - VOTCA uses a simple parallelization over time frames, i.e., if you
> wanted to analyze a trajectory, you could also split it up into chunks
> of size "frames/cores" and compute your quantities in a "trivially
> parallel" way
> - on a current server node with 8-12 cores, this already runs into IO
> limits: in order to use all the CPU power, you want to decrease IO
> (little input data) and increase FLOPS (complicated inner calculation
> loops); if your actual calculation is too simple, VOTCA will be limited
> by transferring data from disk to memory (the IO limit); this is what I
> observe most of the time
> - if you were to generalize this approach to more heterogeneous
> architectures (Threads+MPI), you'd be hitting fast-interconnect limits
>
> Do you have a minimal example that you'd like to be sped up? The first
> step would be to make sure that your calculation is limited by the
> CPU.
>
> Best
> Konstantin
>
> --
> You received this message because you are subscribed to the Google Groups 
> "votca" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/votca?hl=en.
>
