Hi,

There's no need to get into all of the details of what I am doing, as
that would take too much time.  I essentially have two aims with these
changes:

The first is to minimise the code that a user of the multi-processor
package needs to write, shifting as much as possible into the package
itself.  I'm also trying to simplify the code paths by creating a
well-defined abstraction interface between the multi package and the
user via a clean API (so that, in the end, the user never needs to
touch the Processor_box object).  I am testing these changes with the
multi/test_implementation.py script, which also serves to teach the
user the minimum they need to do to use the package.

The second is to implement additional features so that a user can
better handle rank x to rank y processor data transfers (on all
fabrics).  This will be via the data_upload() and data_fetch() API
functions; I think I'll have the back ends (behind the API) set up a
series of Slave_commands to transfer the data.  As the data container
is behind the API and not visible to the user, it can be either a
global or a class object.  We can change this back end at any time
and, thanks to the API abstraction, the user code will never need to
change.  This will be tested via the multi/test_implementation2.py
script.
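
For example, the data_upload() back end might look something like the
sketch below.  Note that the Slave_command stub, the run() signature,
and the queue_command() call are all stand-ins for the real multi
internals - this is an illustration of the idea, not the final code:

class Slave_command(object):
    """Stand-in for the package's real multi.Slave_command base class."""
    def run(self, processor, completed):
        raise NotImplementedError

# The hidden data container - one per Processor instance, hence one per
# node.  As it sits behind the API, it could later be swapped for a
# class object without any user code changing.
_data_store = {}

class Data_upload_command(Slave_command):
    """Slave_command for caching a named value on the slave."""

    def __init__(self, name=None, value=None):
        self.name = name
        self.value = value

    def run(self, processor, completed):
        # Executed on the slave:  store the value for later fetching.
        _data_store[self.name] = self.value

def data_upload(name=None, value=None, rank=None):
    """Queue the transfer of the value to the targeted Processors."""
    # queue_command() is hypothetical - rank targeting would filter which
    # slaves receive the command (rank=None meaning all of them).
    queue_command(Data_upload_command(name=name, value=value), rank=rank)

def data_fetch(name=None):
    """Return the named value from this Processor's data store."""
    return _data_store[name]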

One last thing I am doing is cleaning up the code.  This is mainly by
eliminating dead, unused code (methods, variables, etc.), completing
your TODOs and FIXMEs, and adding a lot of comments and docstrings.

Cheers,

Edward





On 22 March 2012 10:50, Gary Thompson <[email protected]> wrote:
> Hi Ed
> sorry that I didn't get to this earlier, things have been a bit hectic:
> Arnouts had a baby, we had a complete power cut for one day this week,
> and I have had to do helium fills as well.  Anyway, some thoughts.
>
> Setting data on the remote machine as a cache is a good idea.
>
> Setting up a remote set of constants is easy once the multi processor is
> configured, as all you need to do is queue a multi.Slave_command that will
> save some state on the remote machine, either in a class or module variable
> or in a global.
>
> So my thought is that there is no need to add any specific storage API
> to the package; the easiest thing to do would be to just add a
> Slave_command that you can queue which sets a class or global variable
> on the target machine (see the sketch after the list below).  This
> means that all the intelligence is in the add-on class rather than in
> the main multi processor package.  I see several good things in this:
>
> 1. less API
> 2. less code to maintain
> 3. more flexibility and more modularity
> 4. modules that use the multi processor API are more isolated, as they
> can save data in their own namespace rather than having problems with
> names clashing in a dict-based storage area
> 5. it's a better use of what Python gives us
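>
> As a rough sketch of what I mean (the class name here is made up, and
> the run() signature and how the command gets queued are assumptions,
> so check them against the real code):
>
> from multi import Slave_command
>
> # State cached on each slave, in the add-on module's own namespace.
> remote_constants = None
>
> class Set_constants_command(Slave_command):
>     """Slave_command which saves some state on the remote machine."""
>
>     def __init__(self, constants):
>         self.constants = constants
>
>     def run(self, processor, completed):
>         # Executed on the slave - save the state as a module variable.
>         global remote_constants
>         remote_constants = self.constants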
>
>
> I hope this helps.  I am now working my way back through the backlog.
>
> regards
> gary
>
>
>
> On 03/21/2012 09:50 AM, Edward d'Auvergne wrote:
>>
>> Hi Gary,
>>
>> I think I'll start to modify the design of the multi-processor
>> package.  What is required is a data storage container within each
>> Processor instance (on each node).  As the Processor is a singleton
>> and there is only one per node, this container would be unique.
>> There would need to be a function within the multi-processor API that
>> calling code on the master can use to send data to all slaves to be
>> stored in this data container.  As the parallelisation code is at the
>> level of the function call, almost all data used by the slaves is
>> identical - the only differences being a few parameters.  This could
>> be used both at the level of the initialisation of the target function
>> class, to send invariant data once at the start, and at the level of
>> the target function call, to send data that changes per function call
>> (i.e. the model parameters).  The slave_command objects will then be
>> sent to the slaves, and the slaves can then access the data within
>> these command objects and the Processor.data_container objects, again
>> probably via an API function.  If you don't think this is a good idea,
>> or if you can see that you have implemented something similar that I
>> have missed, please say.
>>
>> For the API (multi/__init__.py), I am thinking of the following pair
>> of optional functions:
>>
>> def data_fetch(name=None):
>>     """API function for obtaining data from the Processor instance's data store.
>>
>>     This is for fetching data from the data store of the Processor instance.
>>
>>
>>     @keyword name:  The name of the data structure to fetch.
>>     @type name:     str
>>     @return:        The value of the associated data structure.
>>     @rtype:         anything
>>     """
>>
>>
>> def data_upload(name=None, value=None, rank=None):
>>     """API function for sending data to be stored on the Processor of the given rank.
>>
>>     This can be used for transferring data from Processor instance i to
>>     the data store of Processor instance j.
>>
>>
>>     @keyword name:  The name of the data structure to store.
>>     @type name:     str
>>     @keyword value: The data structure.
>>     @type value:    anything
>>     @keyword rank:  An optional argument to send data only to the
>>                     Processor of the given rank.  If None, then the data
>>                     will be sent to all Processor instances.
>>     @type rank:     None or int
>>     """
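>>
>> As a usage example (the 'atomic_pos' name is just for illustration):
>>
>> # On the master, at target function class initialisation, send the
>> # invariant data once to all slaves:
>> data_upload(name='atomic_pos', value=atomic_pos)
>>
>> # On a slave, inside the target function, pull the data back out of
>> # the Processor's data store:
>> atomic_pos = data_fetch(name='atomic_pos')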
>>
>> The parallelised model-free code will be unaffected as the
>> parallelisation is at a much higher level and does not need this
>> mechanism.  Any feedback would be appreciated.
>>
>> Cheers,
>>
>> Edward
>
>
>>
>>
>>
>> On 14 March 2012 16:17, Edward d'Auvergne<[email protected]>  wrote:
>>>
>>> Hi Gary,
>>>
>>> Before I start hacking into the multi-processor package, I was
>>> wondering if you know of a way of pre-sending data to slave processors
>>> using the current design?  The reason is that I would like to have
>>> the parallelisation at the lowest level of the target function.  But
>>> there is a massive quantity of data which doesn't change at the target
>>> function level and which would be better transmitted to, and stored
>>> on, the slaves prior to optimisation (atomic positions, bond vectors,
>>> base NMR data, missing data flags, etc.).  This is required to keep
>>> the data transmission of the slave_command objects from killing
>>> scalability.  Any ideas?
>>>
>>> Cheers,
>>>
>>> Edward
>
>
>
> --
> -------------------------------------------------------------------
> Dr Gary Thompson                  [Homans Lab Research Coordinator]
>
> Astbury Centre for Structural Molecular Biology,
> University of Leeds,
> Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
> email: [email protected]                   Fax  +44-113-3431935
> -------------------------------------------------------------------
>

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
