On Wed, Mar 25, 2009 at 18:20, Richard Holland <[email protected]> wrote:

> I'm already aware of how to fix the data reference problem for local
> activities (i.e. not webservices) using the new T2 Platform, as it is
> fairly easy to write new kinds of data reference handlers for it which
> can intercept the various file operations, but this is no good for the
> T2 GUI which does not yet use the T2 Platform (and so plugins are
> incompatible between the two), and neither does it solve the webservices
> issue (which would require both a new T2 GUI plugin and new kinds of
> webservices, just as I've been doing to solve the Globus issue). Also

The cleanest solution would be for the services to return URIs instead of the actual data. The problem is that, as far as I know, there is not yet any official way in SOAP to say "the actual result is the value behind this URI", so we have to invent something ourselves - for instance a simple type derived from xsd:anyURI that we can call, say, URIRef, or some not-so-silly name. It should then be possible to patch the XML splitter of wsdl-activity to recognize this type and register the value as a real URI (and not just a string that looks like a URI) in Taverna.

The idea is that the XML splitter activity recognizes the type URIRef and therefore does not try to dereference the passed URI, but just inserts the link directly. If you pass such an output to any non-conforming service, the URI will be dereferenced (read: downloaded) and the actual value inserted into the message.

Note that from an activity point of view the difference between the T2 platform and the Taverna 2.0 workbench is not that big. There are some different parameters in the interface you have to implement, and your Maven dependencies are different, but code-wise there's no big change, so it should be quite easy to support both the platform and the workbench.

In the meantime you can even do this using shim services, even in an unmodified Taverna 1. If you are connecting two "compatible" services, just pass the URI directly. If not, insert the local worker "Fetch image from web page" (for binaries) or "Fetch web page" (for text).

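The pass-through-or-dereference rule above could be sketched roughly like this. It is only an illustration in Python, not wsdl-activity code: the URIRef class, the prepare_input function, the accepts_references flag and the fetch callback are all made-up names, and how Taverna would actually learn that a port takes the URIRef type is left open.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class URIRef:
    """Hypothetical marker: 'the real value lives behind this URI'."""
    uri: str


def prepare_input(value: Union[URIRef, str, bytes],
                  accepts_references: bool,
                  fetch: Callable[[str], bytes]) -> Union[str, bytes]:
    """Decide what goes into the outgoing message for one input.

    accepts_references is an assumption: True when the receiving
    service declares the input as the URIRef type.
    """
    if not isinstance(value, URIRef):
        return value          # ordinary data, insert as-is
    if accepts_references:
        return value.uri      # conforming service: pass the link along
    return fetch(value.uri)   # non-conforming service: download first
```

A real fetch callback could be as simple as `lambda uri: urllib.request.urlopen(uri).read()`; it is passed in here so that the decision logic stays independent of how the download is done.
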
The actual URIs can just be in the style of:

http://myservice.university.ac.uk:8080/myService/data/E1D67277-82CC-47C4-A13F-06654AAECBCD

The trick on the service side is to support two things. If the URI starts with the same prefix as those the service can generate (in this case http://myservice.university.ac.uk:8080/myService/data/), it can just chop off the prefix and look for the last bit in its local file store, for instance in /var/tmp/myServiceData/E1D67277-82CC-47C4-A13F-06654AAECBCD. (But do remember to secure your service so you can't ask for ../../../../../etc/passwd !) If the URI is "external", it will download from the given URI - this would then also support service-to-service data transfer.

To support both referenced and non-referenced inputs (to avoid clients having to do an upload to a third party site), you can use xsd:choice in the XML schemas.

You still have the question of how to support big uploads. You can do that by having a simple REST service at, say, http://myservice.university.ac.uk:8080/myService/uploads (to avoid the SOAP overhead) and doing a POST of the big data there. The URL for this can be retrieved from some special method (getUploadService() ?) on the endpoint - we can look at how WSRF services in Globus are doing a very similar thing for inspiration. The returned URLs (sent with a 201 Created and a Location header pointing to http://myservice.university.ac.uk:8080/myService/uploads/53780139-01E5-4F86-8AA0-EDBD1507ED57 ) can be used as inputs to the SOAP service.

You can avoid porn/spam abuse etc. by making those URLs downloadable only by your own service - the easiest would be to just block any download and use the same trick as before, but look in the upload directory /var/tmp/myServiceDataUploads.

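As a rough sketch of the prefix trick for the data URIs, including the ../ check, something like the following Python would do. The function name local_path_for and the constants are made up for illustration; the prefix and directory are the example values from above. The same function could be pointed at the upload prefix and upload directory.

```python
import os
from typing import Optional

# Example values from this mail, not a real deployment.
DATA_PREFIX = "http://myservice.university.ac.uk:8080/myService/data/"
DATA_DIR = "/var/tmp/myServiceData"


def local_path_for(uri: str,
                   prefix: str = DATA_PREFIX,
                   data_dir: str = DATA_DIR) -> Optional[str]:
    """Map one of our own URIs to a file inside data_dir.

    Returns None for "external" URIs (the caller should download those).
    Raises ValueError if the URI has our prefix but points outside data_dir.
    """
    if not uri.startswith(prefix):
        return None
    name = uri[len(prefix):]
    root = os.path.realpath(data_dir)
    # realpath collapses ".." and follows symlinks before we compare.
    candidate = os.path.realpath(os.path.join(root, name))
    inside = os.path.commonpath([root, candidate]) == root
    if not name or not inside or candidate == root:
        raise ValueError("illegal data reference: %r" % uri)
    return candidate
```

Resolving the path first and then checking that it is still under the data directory is one way to block the ../../etc/passwd case; only accepting names that look like the UUIDs the service hands out would be a stricter alternative.
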
The advantage of going for simple HTTP is that you can then do cross-service data referencing (avoiding the slow ADSL link down to the Taverna user), even if the two services have never heard of each other before, are implemented in different languages, etc.

I suggest initially only supporting "raw" data (binaries and strings) in this 'big' upload. You could extend this to support XSD-described XML documents as well, although I believe creating and parsing such documents would bring back the majority of the problems you wanted to avoid in the first place. (You can still have multiple URI references in a single SOAP message, just put the references at the 'right' level.)

When dereferencing server-side you can use something like the Download Manager in the Taverna 2 platform, which can avoid double-downloading the same URI.

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
