Re: AW: Apache Tuscany doubts

Simon Laws Tue, 01 Jul 2008 06:38:47 -0700

On Fri, Jun 27, 2008 at 4:38 PM, Raymond Feng <[EMAIL PROTECTED]> wrote:


> Thanks Simon for the quick and nice response. Just want to clarify a bit on
> the data transfer: In Tuscany SCA, the data can be in any format, XML or
> binary. What matters is the data formats that the communication protocol
> (binding) can handle. We also have a databinding framework to enable
> transparent data transformation across formats.
>
> For example, you can model the data transfer service as:
>
> DataTransfer
>   byte[] receiveData(...);
> or
>   InputStream receiveData();
> or
>   Image receiveData();
>
> Thanks,
> Raymond
>
> --------------------------------------------------
> From: "Simon Laws" <[EMAIL PROTECTED]>
> Sent: Friday, June 27, 2008 6:18 AM
> To: <[email protected]>
> Subject: Re: AW: Apache Tuscany doubts
>
>  On Fri, Jun 27, 2008 at 1:56 AM, Malte Marquarding <
>> [EMAIL PROTECTED]> wrote:
>>
>>  Hi Raymond,
>>>
>>> The system we envisage is roughly as follows.
>>>
>>> We are running a set of radio telescopes at a remote site. The high level
>>> exposure of the various parts of the system should be via service components.
>>> The actual implementation is specific to the problem. We have the
>>> telescope
>>> control (of the hardware). The data generated from these telescopes has
>>> to
>>> be processed on a supercomputer (hence implementation in C++), through a
>>> set of services like Calibration, Imaging, Analysis. It also has some
>>> control queue (running the end-to-end process, referencing the various
>>> services), archiving, logging, user access (virtual observatory)
>>> components
>>> etc.
>>>
>>> I am investigating Tuscany for this.
>>>
>>> Another, unrelated problem I have is that I can't see (with SDOs and SCA)
>>> how to handle the transfer of the data. The data output of the C++
>>> services
>>> is tens to hundreds of terabytes. I was thinking of having a DataMoving
>>> service encapsulating something like GridFTP. Has anyone got a suggestion
>>> of
>>> how to handle this in an SCA context?
>>>
>>> I can give more details if necessary.
>>>
>>> Cheers,
>>> Malte.
>>>
>>> On Fri, Jun 27, 2008 at 4:16 AM, Raymond Feng <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > Can you describe the use cases you have in mind? It will help us better
>>> > understand what you want to achieve.
>>> >
>>> >
>>>
>>>
>> Hi Malte
>>
>> The bindings we have implemented to date are intended to operate in the
>> typical SOA environment where you pass data to a component and ask it to
>> do
>> something. We tend to talk in terms of XML documents which will be fine
>> for
>> the control messages you need but not suitable for the telescope data
>> itself.
>>
>> To try and understand the subtleties of your scenario I'm going to take
>> the
>> components you suggested and invent some operations that we might expect
>> to
>> find there...
>>
>> TelescopeControl
>>  PerformObservation(ObservationParameters, ObservationId)
>>   // I assume you give the telescope a job to do, i.e. point at the sky and
>> record the results against a given ID
>>   // and then callback when the task is complete
>> DataTransfer
>>  Transfer(FromLocation, ToLocation)
>>   // just manages the task of moving large datasets across the network. As
>> you say GridFTP could be a candidate here.
>> Calibration
>>  Run(ObservationId)
>>  GetDatasetLocation(ID)
>> Imaging
>>  Run(ObservationId)
>>  GetDatasetLocation(ID)
>> Analysis
>>  Run(ObservationId)
>>  GetDatasetLocation(ID)
>> Archive
>>  GetDatasetLocation(ID)
>> Logging
>>  // are you logging control messages here?
>> Coordination
>>  DoSomething()
>>  // coordinate the activities of the application, for example, it might do
>>  TelescopeControl.PerformObservation(someParams, "reading1")
>>  // when task is complete
>>  fromLocation = TelescopeControl.GetDatasetLocation("reading1")
>>  toLocation = Calibration.GetDatasetLocation("reading1")
>>  DataTransfer.Transfer(fromLocation, toLocation)
>>  Calibration.Run("reading1")
>>
>> etc.
>>
>> The Calibration, Analysis, Imaging components are a bit tricky to
>> visualize.
>> Are they closely related or stand alone? Do they always have to run in
>> sequence in the same order? What sort of infrastructure do they rely on?
>> For
>> example, you mention a supercomputer, so are we talking MPI collectives and
>> Condor-like schedulers? In which case an SCA component such as "Calibrate"
>> may just wrap the task of creating and submitting JSDL to the scheduler
>> rather than representing the Calibration code itself.  You may even resort
>> to a more generic "ComputeEngine" component that allows you to dynamically
>> configure jobs to be run.
>>
>> Personally I would like Tuscany to be able to slot right in here so that
>> the
>> analytical components could be supported in HPC environments with
>> appropriate SCA implementation types, bindings and integration with the
>> underlying HPC and grid infrastructure. We are not really there yet. I've
>> done some work on a LoadBalancer demo that shows Tuscany running in a
>> Tomcat
>> cluster but scenarios like yours can really help us all think what
>> features
>> would be appropriate.
>>
>> Regards
>>
>> Simon
>>
>>
Hi Raymond

Yeah, I'm not saying that SCA components would never be at either end of the
data transfer, but that we don't have the HPC-type bindings to support this
at the moment. I was proposing that an incremental step would be to use a
service to orchestrate the data transfer, using a well-known mechanism like
GridFTP or UPS, and build our experience so that we can write a suitable
binding.

Assuming that we have a suitable high-performance databinding in the future,
the question is how to use it. I wasn't thinking that we would wire the
telescope to the analysis directly for data transport purposes, as which
analysis and compute engine is chosen seems to be subject to an external
process. Neither was I thinking of wiring the telescope to the analysis via
the controller component for data transfer purposes. Control messages yes, but
not data transfer. Hence I was suggesting that a data transfer component
would control how the data is moved from the telescope to the compute
engine. In this latter case you could imagine a CallableReference being passed
to the compute engine so that it can pull down the required data from the
telescope. So I think the types of components that we have could remain
fairly constant, but we could experiment with increasing the level of
Tuscany/SCA support as we learn more.
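
To make the pull model a bit more concrete, here is a minimal Java sketch of
the idea. It deliberately does not use the Tuscany API: DatasetSource is a
hypothetical stand-in for whatever a CallableReference would resolve to, and
ComputeEngine and DataTransfer are invented names for the components
discussed above. The in-memory buffer is only for illustration; real
telescope data would be streamed to storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class TransferSketch {

    /** Hypothetical handle to the telescope's data; stands in for a passed reference. */
    interface DatasetSource {
        InputStream open(String observationId) throws IOException;
    }

    /** Hypothetical compute engine that pulls its input through the handle it is given. */
    static final class ComputeEngine {
        // Illustration only: real data volumes would go to disk, not memory.
        private final ByteArrayOutputStream staged = new ByteArrayOutputStream();

        long pull(DatasetSource source, String observationId) throws IOException {
            long copied = 0;
            byte[] buffer = new byte[8192];
            try (InputStream in = source.open(observationId)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    staged.write(buffer, 0, n);
                    copied += n;
                }
            }
            return copied;
        }
    }

    /** Hypothetical data transfer component: decides who pulls from whom, moves no bytes itself. */
    static final class DataTransfer {
        long transfer(DatasetSource from, ComputeEngine to, String observationId) throws IOException {
            return to.pull(from, observationId);
        }
    }

    /** Wires a fake 20000-byte telescope dataset to an engine and returns the bytes pulled. */
    static long demo() {
        byte[] fakeObservation = new byte[20000];
        DatasetSource telescope = id -> new ByteArrayInputStream(fakeObservation);
        try {
            return new DataTransfer().transfer(telescope, new ComputeEngine(), "reading1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes pulled: " + demo());
    }
}
```

The point of the shape is that the controller only ever sees the
transfer(...) call, a control message, while the bytes flow directly between
the two ends.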

Thoughts?

Simon
