Thanks Simon for the quick and nice response. Just want to clarify a bit on the data transfer: In Tuscany SCA, the data can be in any format, XML or binary. What matters is the data formats that the communication protocol (binding) can handle. We also have a databinding framework to enable transparent data transformation across formats.

For example, you can model the data transfer service as:

DataTransfer
   byte[] receiveData(...);
or
   InputStream receiveData();
or
   Image receiveData();
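
To make the stream-based variant a bit more concrete, here is a minimal plain-Java sketch. It uses no Tuscany or SCA API; InMemoryDataTransfer, DataTransferDemo and countBytes are hypothetical names made up for illustration, and the in-memory byte array only stands in for a real data source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// The stream-based variant of the service interface listed above.
interface DataTransfer {
    InputStream receiveData();
}

// Hypothetical implementation: serves a fixed in-memory payload.
class InMemoryDataTransfer implements DataTransfer {
    private final byte[] payload;

    InMemoryDataTransfer(byte[] payload) {
        this.payload = payload.clone();
    }

    @Override
    public InputStream receiveData() {
        return new ByteArrayInputStream(payload);
    }
}

class DataTransferDemo {
    // Reads the stream in small chunks, so the caller never needs
    // a buffer as large as the whole payload.
    static long countBytes(DataTransfer service) {
        long total = 0;
        byte[] buffer = new byte[4];
        try (InputStream in = service.receiveData()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        DataTransfer service = new InMemoryDataTransfer(new byte[10]);
        System.out.println(countBytes(service)); // prints 10
    }
}
```

The point of the InputStream signature is that the consumer pulls data in chunks; whether a given binding can actually carry such a stream is a separate question.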

Thanks,
Raymond

--------------------------------------------------
From: "Simon Laws" <[EMAIL PROTECTED]>
Sent: Friday, June 27, 2008 6:18 AM
To: <[email protected]>
Subject: Re: AW: Apache Tuscany doubts

On Fri, Jun 27, 2008 at 1:56 AM, Malte Marquarding <[EMAIL PROTECTED]> wrote:

Hi Raymond,

The system we envisage is roughly as follows.

We are running a set of radio telescopes at a remote site. The high-level
exposure of the various parts of the system should be via service components.
The actual implementation is specific to the problem. We have the telescope
control (of the hardware). The data generated from these telescopes has to
be processed on a supercomputer (hence the implementation in C++), through a
set of services like Calibration, Imaging, Analysis. It also has some
control queue (running the end-to-end process, referencing the various
services), archiving, logging, user access (virtual observatory) components
etc.

I am investigating Tuscany for this.

Another unrelated problem I have is that I can't see (with SDOs and SCA) how
to handle the transfer of the data. The data output of the C++ services
is tens to hundreds of terabytes. I was thinking of having a DataMoving
service encapsulating something like GridFTP. Has anyone got a suggestion
of how to handle this in an SCA context?

I can give more details if necessary.

Cheers,
Malte.

On Fri, Jun 27, 2008 at 4:16 AM, Raymond Feng <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Can you describe the use cases you have in mind? It will help us better
> understand what you want to achieve.
>
>


Hi Malte

The bindings we have implemented to date are intended to operate in the
typical SOA environment where you pass data to a component and ask it to do
something. We tend to talk in terms of XML documents, which will be fine for
the control messages you need but are not suitable for the telescope data
itself.

To try and understand the subtleties of your scenario I'm going to take the components you suggested and invent some operations that we might expect to
find there...

TelescopeControl
 PerformObservation(ObservationParameters, ObservationId)
   // I assume you give the telescope a job to do, i.e. point at the sky and
   // record the results against a given ID,
   // and then call back when the task is complete
DataTransfer
 Transfer(FromLocation, ToLocation)
   // just manages the task of moving large datasets across the network. As
   // you say, GridFTP could be a candidate here.
Calibration
 Run(ObservationId)
 GetDatasetLocation(ID)
Imaging
 Run(ObservationId)
 GetDatasetLocation(ID)
Analysis
 Run(ObservationId)
 GetDatasetLocation(ID)
Archive
 GetDatasetLocation(ID)
Logging
   // are you logging control messages here?
Coordination
 DoSomething()
   // coordinate the activities of the application, for example, it might do
   TelescopeControl.PerformObservation(someParams, "reading1")
   // when task is complete
   fromLocation = TelescopeControl.GetDatasetLocation("reading1")
   toLocation = Calibration.GetDatasetLocation("reading1")
   DataTransfer.Transfer(fromLocation, toLocation)
   Calibration.Run("reading1")

etc.
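
As a rough illustration only, the coordination sequence above could be sketched in plain Java as follows. None of this is Tuscany API. The interfaces mirror the invented operations, getDatasetLocation on TelescopeControl is an assumption taken from the sequence (it is not in the operation list), the location strings are made up, and the flow is synchronous instead of using a completion callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interfaces mirroring the invented operations above.
interface TelescopeControl {
    void performObservation(String parameters, String observationId);
    String getDatasetLocation(String observationId); // assumed operation
}

interface DataTransfer {
    void transfer(String fromLocation, String toLocation);
}

interface Calibration {
    void run(String observationId);
    String getDatasetLocation(String observationId);
}

// Coordinates the services; only locations and IDs pass through it, never the data.
class Coordination {
    private final TelescopeControl telescope;
    private final DataTransfer dataTransfer;
    private final Calibration calibration;

    Coordination(TelescopeControl telescope, DataTransfer dataTransfer, Calibration calibration) {
        this.telescope = telescope;
        this.dataTransfer = dataTransfer;
        this.calibration = calibration;
    }

    void observeAndCalibrate(String parameters, String observationId) {
        telescope.performObservation(parameters, observationId);
        // A real system would wait for a completion callback at this point.
        String from = telescope.getDatasetLocation(observationId);
        String to = calibration.getDatasetLocation(observationId);
        dataTransfer.transfer(from, to);
        calibration.run(observationId);
    }
}

class CoordinationDemo {
    // Wires up stub services that just record what they were asked to do.
    static List<String> runDemo() {
        final List<String> log = new ArrayList<>();
        TelescopeControl telescope = new TelescopeControl() {
            public void performObservation(String parameters, String id) {
                log.add("observe " + id);
            }
            public String getDatasetLocation(String id) {
                return "telescope-store/" + id; // made-up location
            }
        };
        DataTransfer mover = (from, to) -> log.add("transfer " + from + " -> " + to);
        Calibration calibration = new Calibration() {
            public void run(String id) {
                log.add("calibrate " + id);
            }
            public String getDatasetLocation(String id) {
                return "calibration-store/" + id; // made-up location
            }
        };
        new Coordination(telescope, mover, calibration).observeAndCalibrate("someParams", "reading1");
        return log;
    }

    public static void main(String[] args) {
        for (String entry : runDemo()) {
            System.out.println(entry);
        }
    }
}
```

Note that the large datasets move directly between storage locations; the coordinator only exchanges small control messages, which is the kind of traffic the existing bindings are suited to.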

The Calibration, Analysis, Imaging components are a bit tricky to visualize.
Are they closely related or stand-alone? Do they always have to run in
sequence in the same order? What sort of infrastructure do they rely on? For
example, you mention a supercomputer, so are we talking MPI collectives and
Condor-like schedulers? In which case an SCA component such as "Calibrate"
may just wrap the task of creating and submitting JSDL to the scheduler
rather than representing the Calibration code itself.  You may even resort
to a more generic "ComputeEngine" component that allows you to dynamically
configure jobs to be run.
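
To sketch what such a wrapper might look like (a plain-Java illustration under stated assumptions, not Tuscany or any scheduler's API): JobScheduler, CalibrateComponent and RecordingScheduler are hypothetical names, and the job description is a placeholder string where a real component would build a JSDL document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scheduler facade; a real one might accept a JSDL document.
interface JobScheduler {
    String submit(String jobDescription); // returns a job ID
}

// Hypothetical component: it does not run the calibration code itself,
// it only describes the job and hands it to the scheduler.
class CalibrateComponent {
    private final JobScheduler scheduler;

    CalibrateComponent(JobScheduler scheduler) {
        this.scheduler = scheduler;
    }

    String run(String observationId) {
        // Placeholder description format, not real JSDL.
        String description = "task=calibrate;observation=" + observationId;
        return scheduler.submit(description);
    }
}

// Stub scheduler that records submissions instead of running anything.
class RecordingScheduler implements JobScheduler {
    final List<String> submitted = new ArrayList<>();

    @Override
    public String submit(String jobDescription) {
        submitted.add(jobDescription);
        return "job-" + submitted.size();
    }
}

class CalibrateDemo {
    public static void main(String[] args) {
        RecordingScheduler scheduler = new RecordingScheduler();
        String jobId = new CalibrateComponent(scheduler).run("reading1");
        System.out.println(jobId + " <- " + scheduler.submitted.get(0));
    }
}
```

A generic "ComputeEngine" would follow the same shape, with the task name passed in as a parameter instead of being fixed.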

Personally I would like Tuscany to be able to slot right in here so that the
analytical components could be supported in HPC environments with
appropriate SCA implementation types, bindings and integration with the
underlying HPC and grid infrastructure. We are not really there yet. I've
done some work on a LoadBalancer demo that shows Tuscany running in a Tomcat
cluster, but scenarios like yours can really help us all think about what
features would be appropriate.

Regards

Simon
