Re: [Taverna-hackers] [MYGRID] a "would love to use Taverna but.." story

Stian Soiland-Reyes Mon, 30 Mar 2009 01:25:55 -0700

On Wed, Mar 25, 2009 at 18:20, Richard Holland
<[email protected]> wrote:
> I'm already aware of how to fix the data reference problem for local
> activities (i.e. not webservices) using the new T2 Platform, as it is
> fairly easy to write new kinds of data reference handlers for it which
> can intercept the various file operations, but this is no good for the
> T2 GUI which does not yet use the T2 Platform (and so plugins are
> incompatible between the two), and neither does it solve the webservices
> issue (which would require both a new T2 GUI plugin and new kinds of
> webservices, just as I've been doing to solve the Globus issue). Also


The cleanest solution would be for the services to return URIs instead
of the actual data. The problem is that as far as I know there is not
yet any official way in SOAP to say that "The actual result is the
value behind this URI" - so we have to invent something ourselves - for
instance a simple subclass of xsd:anyURI that we can call .. say..
URIRef.. or some not-so-silly name.

It should then be possible to patch the XML splitter of wsdl-activity
to recognize this, and register it as a real URI (and not just a
string that looks like a URI) in Taverna.  The idea is then that the
XML splitter activity recognizes the type URIRef and will therefore
not try to dereference the passed URI, but just insert the link
directly.  If you pass such an output to any non-conforming service,
the URI will be dereferenced (read: downloaded) and the actual value
inserted into the message.
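
As a rough illustration of that rule (this is not Taverna or
wsdl-activity code; the names URIRef, prepare_input and the
accepts_references flag are made up for the sketch), the decision
could look something like this:

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class URIRef:
    """A URI that stands for the data behind it (hypothetical type name)."""
    uri: str


def prepare_input(value: Union[URIRef, bytes],
                  accepts_references: bool,
                  fetch: Callable[[str], bytes]) -> Union[str, bytes]:
    """Decide what to put into the outgoing message for one input.

    accepts_references is an assumed flag saying whether the receiving
    service declared the input as a URIRef; fetch is whatever function
    downloads the bytes behind a URI.
    """
    if isinstance(value, URIRef):
        if accepts_references:
            # Conforming service: pass the link through untouched.
            return value.uri
        # Non-conforming service: dereference and send the real value.
        return fetch(value.uri)
    # Plain values are passed along as they are.
    return value
```

The download only happens when the receiving side cannot take a
reference, which is the whole point of the exercise.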


Note that from an activity point of view the difference between
the T2 platform and the Taverna 2.0 workbench is not that big: there
are some different parameters in the interface you have to implement,
and your Maven dependencies are different, but code-wise there's no big
change - so it should be quite easy to support both the platform and
the workbench.

Note that in the meantime you can do this using shim services, even
in an unmodified Taverna 1 - if you are connecting two "compatible"
services, just pass the URI directly. If not - insert the local worker
"Fetch image from web page" (for binaries) or "Fetch web page" (for
text).


The actual URIs can just be in the style of:
  
http://myservice.university.ac.uk:8080/myService/data/E1D67277-82CC-47C4-A13F-06654AAECBCD

..the trick on the service side is to support two things - if the URI
starts with the same prefix as those it can generate  (in this case
http://myservice.university.ac.uk:8080/myService/data/) - it can just
chop off the prefix and look for the last bit in its local file store, for
instance in /var/tmp/myServiceData/E1D67277-82CC-47C4-A13F-06654AAECBCD.
 (But do remember to secure your service so you can't ask for
../../../../../etc/passwd !)
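
A minimal sketch of that lookup, assuming the identifiers are always
UUID-shaped like the example above (the function name local_path_for
and the strict pattern are my own choices for illustration):

```python
import re
from pathlib import Path
from typing import Optional

# Prefix and directory taken from the example above.
DATA_PREFIX = "http://myservice.university.ac.uk:8080/myService/data/"
DATA_DIR = Path("/var/tmp/myServiceData")

# Assumption: identifiers are UUID-shaped, so anything containing
# "/" or ".." can never match.
_ID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def local_path_for(uri: str) -> Optional[Path]:
    """Map one of our own data URIs to a file in the local store.

    Returns None if the URI was not generated by this service, and
    raises ValueError if the part after the prefix is not a plain
    identifier (e.g. a ../../etc/passwd style attempt).
    """
    if not uri.startswith(DATA_PREFIX):
        return None
    identifier = uri[len(DATA_PREFIX):]
    if not _ID_PATTERN.fullmatch(identifier):
        raise ValueError("not a valid data identifier: %r" % identifier)
    return DATA_DIR / identifier
```

Whitelisting the identifier format is usually simpler to get right
than trying to strip out "bad" path components afterwards.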

If the URI is "external" - it will download from the given URI - this
would then also support service-to-service data transfer.
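
Put together, the routing between "our own" and "external" URIs could
be sketched as below. The names dereference, read_local and
http_download are made up for this example, and limiting external
references to http/https is an extra precaution I am assuming you
would want:

```python
from typing import Callable
from urllib.parse import urlsplit
from urllib.request import urlopen

OWN_PREFIX = "http://myservice.university.ac.uk:8080/myService/data/"


def http_download(uri: str) -> bytes:
    """Fetch an external reference, allowing only http and https."""
    if urlsplit(uri).scheme not in ("http", "https"):
        raise ValueError("unsupported URI scheme: %r" % uri)
    with urlopen(uri, timeout=30) as response:
        return response.read()


def dereference(uri: str,
                read_local: Callable[[str], bytes],
                download: Callable[[str], bytes] = http_download) -> bytes:
    """Return the data behind uri.

    Our own URIs are served from the local store via read_local
    (which receives the part after the prefix and must validate it);
    everything else is downloaded, which is what gives you
    service-to-service transfer.
    """
    if uri.startswith(OWN_PREFIX):
        return read_local(uri[len(OWN_PREFIX):])
    return download(uri)
```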

To support both referenced and non-referenced inputs (to avoid clients
having to do an upload to a third party site), you can use xsd:choice
in the XML schemas.

You still have the question of how to support big uploads - you can do
that by having a simple REST-service at say
http://myservice.university.ac.uk:8080/myService/uploads (to avoid the
SOAP overhead) and do a POST there of the big data.

The URL for this can be retrieved from some special method
(getUploadService() ? ) on the endpoint - we can look at how
WSRF-services in Globus are doing a very similar thing for inspiration.

The returned URLs (sent with a 201 Created and a Location header
pointing to
http://myservice.university.ac.uk:8080/myService/uploads/53780139-01E5-4F86-8AA0-EDBD1507ED57
) can be used as inputs with the SOAP service. You can avoid
porn/spam-abuse etc. by making those URLs only be downloadable by your
own service - the easiest would be to just block any download and use
the same trick as before, but look in the upload directory
/var/tmp/myServiceDataUploads.
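
To make the upload side concrete, here is a small sketch using only
the Python standard library. The names store_upload and UploadHandler
are invented for the example, the absolute upload directory is
assumed, and a real service would also want size limits and
authentication:

```python
import uuid
from http.server import BaseHTTPRequestHandler
from pathlib import Path
from typing import BinaryIO

UPLOAD_PATH = "/myService/uploads"
UPLOAD_DIR = Path("/var/tmp/myServiceDataUploads")  # assumed absolute path
CHUNK_SIZE = 64 * 1024


def store_upload(stream: BinaryIO, length: int,
                 upload_dir: Path, base_url: str) -> str:
    """Copy length bytes from stream into a new file; return its URL."""
    identifier = str(uuid.uuid4()).upper()
    upload_dir.mkdir(parents=True, exist_ok=True)
    remaining = length
    with open(upload_dir / identifier, "wb") as out:
        # Copy in chunks so big uploads are never held in memory.
        while remaining > 0:
            chunk = stream.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break
            out.write(chunk)
            remaining -= len(chunk)
    return "%s/%s" % (base_url, identifier)


class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != UPLOAD_PATH:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", "0"))
        host = self.headers.get("Host", "localhost")
        location = store_upload(self.rfile, length, UPLOAD_DIR,
                                "http://%s%s" % (host, UPLOAD_PATH))
        self.send_response(201)  # Created
        self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Block all downloads: uploads are only read server-side.
        self.send_error(403, "Uploads are not downloadable")
```

You could try it out with http.server.HTTPServer(("", 8080),
UploadHandler).serve_forever() and POST a file at /myService/uploads.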


The advantage of going for simple HTTP is that you can then do
cross-service data referencing (avoiding the slow ADSL link down to
the Taverna user), even if the two services have never heard about
each other before, are implemented in different languages, etc.

I suggest initially only supporting "raw" data (binaries and strings)
in this 'big' upload - but you could extend this to support
XSD-described XML documents as well - although I believe creating and
parsing such documents would bring back the majority of the problems
you wanted to avoid in the first place.  (You can still have multiple
URI references in a single SOAP message, just put the references at
the 'right' level)


When dereferencing server-side you can use something like the Download
Manager in the Taverna 2 platform, which can avoid double-downloading
the same URI.
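
This is not how the Taverna 2 Download Manager is implemented, just a
sketch of the "don't download the same URI twice" idea; the class name
CachingDownloader is made up, and the cache here keeps whatever the
fetch function returns (for big data that should be a local file path
rather than the bytes themselves):

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class CachingDownloader(Generic[T]):
    """Remember the result of fetching each URI so it is fetched once."""

    def __init__(self, fetch: Callable[[str], T]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, uri: str) -> T:
        # Holding the lock during the fetch keeps the sketch simple:
        # two requests for the same URI can never download it twice,
        # at the price of serialising all downloads.
        with self._lock:
            if uri not in self._cache:
                self._cache[uri] = self._fetch(uri)
            return self._cache[uri]
```

A production version would probably lock per URI and expire old
entries, but the interface would stay the same.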

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

