[Taverna-hackers] Specification of Taverna REST support

Mark V Thu, 08 Apr 2010 21:56:40 -0700

Hi,
Thanks for posting the T2 REST outline.  I have some thoughts which I
hope are useful.  Sorry to make it so long, but I'm beginning to
despair about workflow tooling for REST consumers.
Though I am working towards using Taverna, I'm not experienced with
Taverna.  With that in mind I hope you find the suggestions useful and
the argument compelling.

Overall I can't see much in the current proposal that is worth
following - sorry to be so brutal.
Much more could be gained by giving T2 the ability to run scripting
languages such as Ruby, which has several well-developed and
battle-tested frameworks, each aimed at making it easy to consume
HTTP (and other protocol) services, RESTful or not.

One inescapable issue is what is meant by a RESTful service.
Opinions range:
From, anything over HTTP (and only HTTP).
To, only what Roy Fielding has described (inventor's prerogative).

Reader's Digest version of what follows:
 - Offering a Ruby (my preference), Python, etc. 'service' will get
you 90% towards accommodating additional RESTful (and not RESTful)
services.  Is this currently possible or on a Roadmap?
 - OpenID authentication and OAuth authorization are important [3].
 - Generic input and output ports based on script variables for
passing data to and from any Ruby, Python etc scripts using
'convention-over-configuration'.
 - A special case is providing URI input and output ports.  That is,
protocol agnostic URI specific input and output ports, e.g. populate
an input port's URI template with data from other input ports
(iterating over arrays, etc).

On Fri, Apr 9, 2010 at 1:59 AM, Alan Williams <[email protected]> wrote:
> Hope you find this useful,
>
> Alan
>
> Hi!
>
> My colleague Stuart Owen has been specifying how we can build a
> lightweight support for REST services in Taverna [1] .

Lightweight is good.  I do think that fewer HTTP specifics in T2 would
facilitate the development of more RESTful services.
Having said that, I didn't find the specification in [4] sensible, nor
the example at [5] compelling.
Both do reflect a view I had until I wrestled with [1].
I think Fielding addresses the issues.

> We've scheduled
> the first iteration of implementing this to be made available for
> Taverna 2.2 this spring.
>
>
> As some of you know, you can access most REST services already using
> the 'Get page from URL' local worker, but this would mean half your
> workflow is spent constructing an URL.

Why would anyone do that?
Could they not just write some Beanshell script (javascript), using ExtJS?
Or R-project script using RCurl or HTTPRequest?
And if they are code averse why are they going to be interested in a
Get service? Just document that putting this URI in here and output
ports named, x,y,z will contain the data 1,2,3 - they don't need to
know that in v1 it came from a single GET, in v1.3 three GETs and in
v2 one HTTP, four FTP and one SMTP request!

> Also this worker can only use
> the HTTP GET method, and with no way to specify other parameters such
> as the 'Accepts' header.

I _really_ think there is much to be gained by abstracting these sorts
of protocol specific concerns away from the RESTful service component.
There are many well designed and battle tested frameworks available
for accessing RESTful services.
These frameworks are in languages like Ruby, Python, Perl, etc.  T2
would benefit by throwing open some doors to those languages in the
same way it opens the door to javascript and R.
IMO it is much more valuable to provide some way of leveraging all
that prior art/experience.
This is not to say there is nothing REST specific to do.  For example,
user friendly (GUI) provision of OpenID/OAuth support, 'automagically'
consuming URI template input ports given other input port data, robust
and efficient mapping between input/output ports and script variables.

>
> He has worked on this proposal together with several local users, and
> we had a meeting where we discussed this approach. There are of course
> plenty of more possibilities, but the goal of this planned work is to
> build something that should make it just a little bit easier to do
> REST services in Taverna.

I do like the URI parameter-inputs/port-name mapping idea, it seems
natural, and even appears to satisfy a Fielding rule! ;)
In case it is implied, I would not force input ports to match URI
template parameters at the outset.

A key to RESTful design is to decouple the client and the server.
Making the input-port data available during the life of a REST client
script gives the provider flexibility to consume that data at
different points in time and so change, at a later date, the order in
which data is 'discovered' via some hypertext driven pathway.

I think the 4-protocol-specific-services (or even N services) is a bad
idea. The user should not have to know if the URI they provide is
first consumed with a POST or GET, and the script provider should be
free to change this by providing an updated client script.
I'd really prefer an approach where T2 provides ways for Ruby, Python
etc script authors to run their client code, and those authors are
free to choose to use/abuse the 'RESTful' definition, and how their
client code interacts with their service.
I think having T2 worry about headers and response codes, etc.
constrains service providers (e.g to HTTP) - it seems to me the issue
is not whether T2 is happy with the data or protocol, but whether the
script processing the response knows what to do with it (given its
knowledge of media types, etc.; see Fielding's post [1]).
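A minimal sketch of that separation, assuming a script-side registry that
picks a fetcher from the URI scheme.  ResourceClient, its methods and the
'mem' scheme are hypothetical names for illustration; only URI.parse and
Net::HTTP.get come from Ruby's standard library.

```ruby
require 'uri'
require 'net/http'

# Hypothetical registry mapping URI schemes to fetchers.  The workflow only
# hands over a URI; which protocol gets used is the script author's concern.
class ResourceClient
  def initialize
    @fetchers = {}
  end

  # Associate a scheme (e.g. 'http') with a block that takes a URI object.
  def register(scheme, &fetcher)
    @fetchers[scheme] = fetcher
    self
  end

  # Parse the URI and delegate to whichever fetcher handles its scheme.
  def fetch(uri_string)
    uri = URI.parse(uri_string)
    fetcher = @fetchers.fetch(uri.scheme) do
      raise ArgumentError, "no fetcher registered for scheme #{uri.scheme.inspect}"
    end
    fetcher.call(uri)
  end
end

client = ResourceClient.new
client.register('http') { |uri| Net::HTTP.get(uri) }
# A made-up 'mem' scheme that never touches the network.
client.register('mem') { |uri| "in-memory data for #{uri.host}#{uri.path}" }

puts client.fetch('mem://cache/genes/42')
```

A script author could swap the block registered for a scheme, or add an
FTP one, without the workflow user seeing any change beyond the URI.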

>
> Part of the challenge is that most REST services are not formally
> described,

Roy might disagree [1] - any REST service is formally described.
Or at least Roy Fielding seems to think there is a formal description
of any and all RESTful services - if you don't match the
definition/rules he asks us to call it something else (WADL?).

I think you meant to say that many services claim to be RESTful, when
they are not.  Consequently, there are a plethora of 'interpretations'
claiming to be RESTful.
In my experience the problem is services describing themselves as
RESTful when the inventor indicates they clearly are not.
This situation is unavoidable while REST knowledge and experience
spreads, and is vigorously debated, among developers.

My point is this: T2's 'challenge' is to accommodate services that
claim to be RESTful (but are not) without shutting out services that
are or, at least, are striving to be RESTful.
Frankly, I really think T2 would benefit by sidestepping this whole
issue and providing scripting language components that are convenient
and generic - let the different service providers defend their choices
and their work.

> even though there are several proposed standards for doing
> so (WADL, WSDL2). This solution will therefore require the user to
> specify URL pattern for the service themselves. However, we're also
> allowing for future linking with BioCatalogue, which can have richer
> descriptions of REST services.
>
> (See for example http://sandbox.biocatalogue.org/rest_methods/41 of
> such descriptions which we in the future could show from the
> 'Available services' panel, and show help about using the BioCatalogue
> plugin. )

Okay, this is a case in point.  Ignore Fielding's definition/rules for
the moment.
It would be useful if T2 allowed for a service that took a different approach.
It may or may not be a REST service, may or may not use HTTP, so could
be very different to this service.

>
> One of the other challenges with REST support is how to deal with the
> output data, for instance using XPath for XML output, or a similar
> JSONPath for JSON output.

Again, I'd avoid even touching prescriptions on this topic.
Can't the REST service script provider be left to consume this data
themselves, per the Beanshell and RShell service conventions for
port/variable name mapping?
Of course the REST service script provider is free to document that an
output port named 'chew_this' will contain an XML document from hell.

>
>
> Feel free to have a look at Stuart's specification [1] and provide us
> with any feedback. We're also interested in example services that you
> envision using from Taverna.
>
>
> [1] http://www.mygrid.org.uk/dev/wiki/display/developer/T2+REST+Support
>

Currently I'm trying to move my understanding closer to Fielding's
definitions/descriptions - he is bright, and has thought about this
subject.
So far, when I've understood the practical effects of what he's
described, there have been substantial benefits - YMMV.
A definition which has influenced my thinking is [1]. It is succinct
enough for me to periodically evaluate what I'm currently doing or
thinking.

It took me a long time to wrap my mind around the _practical_
implications behind Roy's description/rules.
I'm certainly not a guru so bear in mind that my understanding is evolving.
Anyway, as I have tried to force my ideas to conform to what Roy
describes, a direct (beneficial) consequence has been to eliminate
prior design decisions that I now realize were poor, or forced me into
situations where there was no good solution open to me.
When that improvement process stops I'll probably stop moving my
implementation/architecture towards his definitions.

It will be exciting to have Taverna's REST Service allow the full
range of what people think RESTful behavior/architecture is, at least
this would allow legacy service ideas (such as mine) to evolve to be
RESTful.
In fact I think it'd be good to explicitly decline offering a Taverna
view of how a RESTful service behaves - or what is RESTful - beyond
some minimal things, like starting with a URI (i.e. HTTP optional),
and settling on a defined template format, see [2].

I think great flexibility (hence functionality) could be achieved with
a few conveniences provided by a Taverna RESTful service GUI component:
 A) OpenID authentication and OAuth authorization, two and three legged [3].
 B)  A _single_ URI entry point:
    The URI could be a parameterized template (For example see [2]),
where the value from any input port name matching the template field
is substituted into the template.  Or the URI could be complete.  It
would be useful to have a couple of behaviors for populating a
template (synchronized, left-to-right, right-to-left) by iterating
over array data in other input ports.
 C) Delegate loading/running RESTful client's script code
(Jython/JRuby, etc), to a minimal script.  Ideally the full power of
these languages would be exposed, allowing all their pre-existing and
emerging frameworks to be used.
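Item B can be sketched roughly as follows.  This assumes {name}
placeholders and string port names; the helper names are hypothetical,
and the cross product is only one guess at what an ordered iteration
over list ports might mean.

```ruby
require 'erb' # provides ERB::Util.url_encode

# Fill {name} fields in a URI template from a hash of input-port values.
# Fields with no matching port are left untouched.
def expand_template(template, ports)
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    ports.key?(name) ? ERB::Util.url_encode(ports[name].to_s) : "{#{name}}"
  end
end

# 'Synchronized' iteration: the i-th value of every list port is used
# together.  Array#transpose raises IndexError if the lists differ in length.
def expand_synchronized(template, list_ports)
  names = list_ports.keys
  list_ports.values.transpose.map do |row|
    expand_template(template, names.zip(row).to_h)
  end
end

# Cross-product iteration, varying the last port fastest.
def expand_product(template, list_ports)
  names = list_ports.keys
  first, *rest = list_ports.values
  first.product(*rest).map do |row|
    expand_template(template, names.zip(row).to_h)
  end
end

ports = { 'species' => %w[human mouse], 'gene' => %w[BRCA1 TP53] }
template = 'http://example.org/{species}/genes/{gene}'
puts expand_synchronized(template, ports)
puts expand_product(template, ports)
```

With the two-element lists above, the synchronized form yields two URIs
and the cross product yields four.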

T2 could easily say:
"The T2 convention is to make the data stored in script variables
available to output ports whose port name matches the variable name.
RESTful API service provided scripts are responsible for consuming
their responses and making them available via variable/port names they
document for end users."

I think this is how the Beanshell and Rshell services work?
To avoid difficulties tracking variables when their count explodes
there could be a prefix (t2_out_) naming convention, e.g. (Ruby)
@t2_out_yellow.  This way it is left to the RESTful script author to
consume their output formats, they eat their own dog food rather than
force everyone else to consume JSON, XML, etc.  This would of course
still allow a RESTful provider to dump a whole HTTP response, or XML
document, in an output port and have another T2 component consume it...
horses for courses.
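A rough sketch of how the prefix convention might be harvested on the
Ruby side, assuming the script runs as methods on some object.  The
T2OutputPorts module and ExampleScript class are hypothetical;
instance_variables and instance_variable_get are standard Ruby.

```ruby
# Every instance variable whose name starts with @t2_out_ becomes an
# output port named after the remainder of the variable name.
module T2OutputPorts
  PREFIX = '@t2_out_'

  def self.collect(script_context)
    script_context.instance_variables.each_with_object({}) do |ivar, ports|
      name = ivar.to_s
      next unless name.start_with?(PREFIX)

      ports[name[PREFIX.length..-1]] = script_context.instance_variable_get(ivar)
    end
  end
end

# Stand-in for a provider's client script.
class ExampleScript
  def run
    @t2_out_yellow = 'banana'   # exported as output port 'yellow'
    @scratch = 'not exported'   # no prefix, so it stays private
    self
  end
end

output_ports = T2OutputPorts.collect(ExampleScript.new.run)
puts output_ports.inspect
```

Variables without the prefix never reach a port, which keeps scratch
state out of the workflow.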

This may be going off topic...
Providing a T2 RESTful service component that fits this (RESTful)
service description would be much easier, and 90% complete, if Taverna
was just able to run Ruby/Python, etc. scripts.
In the tradition of "write the code you wish you had".
I have in mind the following (Ruby) script:

#-------- start Ruby script -----------

require 'my-restful-lib'
require 'my-restful-lib/t2'

puts "This is what that T2 passed in via the env: #{ENV['T2_URI_INPUT']}"
puts "This is my oauth key from the T2 GUI component: #{ENV['T2_OAUTH_KEY']}"

# Do something with ENV['T2_INPUT_PORT_NAMES']

resp = MyRestfulLib::T2.grab ENV['T2_URI_INPUT'],
         :oauth_key => ENV['T2_OAUTH_KEY']

# Do something with resp
# Export value to T2 RESTful service output port names....
# How -  using system ENV, other?

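# Use && here: a bare & is a bitwise AND and binds tighter than the
# comparisons.  to_i covers a content length reported as a string.
if resp.code == 200 && resp.header[:content_length].to_i < 20
  @t2_interesting_data = resp.body.wow
else
  @t2_interesting_data = 'boring'
end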

puts "#{@t2_interesting_data}"

#--------- end Ruby script ------------

Of course I have just used 'puts' and ENV.  It is not clear to me what
the best way is to pass data between T2 and scripting frameworks such
as Ruby/Python (JRuby/Jython)...
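One possible answer, sketched under loud assumptions: input ports arrive
as environment variables with a T2_IN_ prefix, and output ports leave as
a single JSON object on standard output.  Neither convention is an
existing Taverna interface, and the function names are hypothetical.

```ruby
require 'json'

INPUT_PREFIX = 'T2_IN_' # assumed naming convention, not a Taverna feature

# Collect every T2_IN_<PORT> environment variable into a hash keyed by
# the lower-cased port name.
def read_input_ports(env = ENV)
  env.to_h.each_with_object({}) do |(key, value), ports|
    next unless key.start_with?(INPUT_PREFIX)

    ports[key[INPUT_PREFIX.length..-1].downcase] = value
  end
end

# Emit all output ports as one JSON object so the host can parse them.
def write_output_ports(ports, io = $stdout)
  io.puts JSON.generate(ports)
end

inputs = read_input_ports('T2_IN_SPECIES' => 'human', 'PATH' => '/bin')
write_output_ports('greeting' => "hello #{inputs['species']}")
```

Environment variables are size-limited by the operating system, so large
port values would probably need files or standard input instead.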

The advantages of this approach are (per Fielding's rules):
 - The T2 RESTful service (T2 service) does not force providers to adopt
any single communication protocol (the result formatting might resemble
HTTP content, but the API client code could have obtained that data
using FTP, etc.)
 - T2 service providers have a free hand with their media types
 - T2 service providers have a free hand with URI construction within
their scripts
 - T2 service providers have a free hand following hypertext within
their scripts (HATEOAS)
 - The T2 service does not require definition of fixed resource names
or hierarchies
 - The T2 service does not require any “typed” resources be
significant to the client's RESTful API library
 - The T2 service can initialize any RESTful API with no prior
knowledge beyond the initial URI (i.e. populated template)

The main work/issues might be:
 - Implementing JRuby, Jython, etc. script support (bringing
additional benefits beyond REST services)
 - Settling on conventions for passing data between these scripts and
the T2 RESTful service ports
 - OAuth

Two significant advantages of this proposal are:
I) Apart from populating an input URI with values from matching input
port names, this 'service' would really just be a Ruby/Python
(JRuby/Jython) 'service', so there would likely be much greater bang
for the buck, in addition to having access to existing RESTful
frameworks, e.g. Ruby's EventMachine or Python's Twisted.
II) The detail of any RESTful API contents and behavior are abstracted
away from T2, and those headaches are left where they belong - with the
RESTful API service providers.

I deliberately have not emphasized WADL, etc.
More than a year ago I tried to implement a simple RESTful process
using WADL - it didn't work. At the time I wasn't as familiar with
Roy's 'rules', maybe WADL is now REST capable?
I do know that there are some who swear _by_ WADL and others who just
swear _at_ WADL :)
In what I have suggested, those issues are left to the writer of the
RESTful library (MyRestfulLib::T2 in the example above).
Mischief follows:
In fact in the above proposal it is _possible_ that no data is ever
obtained across the network, nor HTTP ever used, and the 'response' is
just manufactured in-memory ;)

Thanks for the opportunity to provide feedback, and I appreciate any comments.

Regards
Mark

[1]  http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
[2]  http://tools.ietf.org/html/draft-zyp-json-schema-02#section-6.1.1
[3]  http://oauth.net
[4]  http://www.mygrid.org.uk/dev/wiki/display/developer/T2+REST+Support
[5]  http://sandbox.biocatalogue.org/rest_methods/41

> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>
> _______________________________________________
> taverna-hackers mailing list
> [email protected]
> Web site: http://www.taverna.org.uk
> Mailing lists: http://www.taverna.org.uk/about/contact-us/
> Developers Guide: http://www.taverna.org.uk/developers/
>
>
