On Wed, Oct 6, 2010 at 08:27, Richard Holland <[email protected]> wrote:
> Hello all, had a conversation with Carole here in Hannover yesterday. I had 
> some suggestions for future improvements to Taverna and she said I should 
> post these to the list:

Thanks for all the great ideas! I agree with the other comments, so
I'll just add a few points.

> 1. Internal parallelisation of workflows. The ability to specify that a 
> particular subsection within a workflow should implicitly divide the input, 
> run several instances in parallel on some kind of back-end grid (LSF, SGE, 
> Condor, or an EC2 approach, etc.), and recombine the output. A subsection 
> could just be one component, or several chained together, or the entire 
> workflow. The current strategy as per App4Andy I believe is just the last one 
> of the three - i.e. the entire workflow.

As others have pointed out, this only makes sense when invoking local services.

At the ECCB 2010 conference we got the feeling that many still like
the idea of plugging their own tools and scripts into Taverna (as they
don't want to throw away the work they've currently got), so we're
going to keep supporting this in addition to our current work on
making more easy-to-plug-together components.


The challenge here is that these tools are typically standalone Perl
and Python scripts that make various assumptions about file paths and
are not built to be run in parallel. (Paths would need to be
parameterised so that instances don't step on each other's toes.)
Additionally, there are of course the environment requirements, like a
Perl script using various CPAN modules that need to be installed,
compiled and linked against C libraries, which in turn need to be
installed for that particular operating system.
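
To illustrate the path parameterisation point, here is a minimal Python
sketch. It assumes the existing script can be wrapped as a callable
that takes an input path and an output path; `run_isolated` and
`upper_case_tool` are hypothetical names made up for this example, not
part of Taverna or App4Andy.

```python
import tempfile
from pathlib import Path


def run_isolated(tool, data, base_dir=None):
    """Run `tool` in a fresh working directory so parallel runs never share paths.

    `tool` is a hypothetical callable taking (input_path, output_path).
    """
    # mkdtemp creates a new, uniquely named directory on every call.
    workdir = Path(tempfile.mkdtemp(prefix="wf-run-", dir=base_dir))
    input_path = workdir / "input.txt"
    output_path = workdir / "output.txt"
    input_path.write_text(data)
    tool(input_path, output_path)
    return output_path


def upper_case_tool(input_path, output_path):
    # Stand-in for an existing script; it only touches the paths it is given.
    output_path.write_text(input_path.read_text().upper())
```

Because every invocation gets its own directory, two instances of the
same tool can run side by side without overwriting each other's files.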

This leads to a choice between giving at least some kind of
description of these requirements, or just letting the user take full
control of making sure that everything is installed. Either way, it
lowers the chance of your workflow being shareable or portable.
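
As a rough idea of what "some kind of description" could look like,
here is a small Python sketch that checks a declared list of
requirements before a run. The dictionary layout and the function name
`missing_requirements` are assumptions for illustration only; Taverna
does not define such a format.

```python
import importlib.util
import shutil


def missing_requirements(requirements):
    """Return the entries of a requirements description that are not available.

    `requirements` is a hypothetical dict with two optional keys:
    "executables" (names looked up on PATH) and "python_modules"
    (top-level module names).
    """
    missing = []
    for exe in requirements.get("executables", []):
        # shutil.which returns None when the program is not on PATH.
        if shutil.which(exe) is None:
            missing.append(("executable", exe))
    for module in requirements.get("python_modules", []):
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(module) is None:
            missing.append(("python_module", module))
    return missing
```

A server could run such a check up front and refuse a workflow with a
clear message, rather than failing halfway through a run.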

There's also the paradigm shift between procedural scripting and a
dataflow-oriented description of the workflow, as in Taverna. In
App4Andy we decided to stay close to the original script, so the
actual data generally got exchanged via files and MySQL databases,
while the workflow took more of an orchestration role, simply passing
file paths and database references.

This approach makes it easy to adapt your scripts to be run from a
workflow, but tricky to include other external services or existing
workflows, which might be your motivation for going for a workflow in
the first place. Moving to a full-blown Taverna-approach would mean
changing all the scripts to deal with the actual data in individual
iterations, but the advantage of that would also be that one could do
as you suggest, and let Taverna take control of the parallelism and
splitting of data among instances, thanks to guidance from the
workflow designer.

I think the way forward in cloud parallelism would be to allow you to
build your workflow for parallelism, so you can do, say, the typical
map-reduce pattern from your workflow. The challenge is that adapters
would have to modify their existing tools and maintain the cloud
installation.
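
For reference, the split / run in parallel / recombine pattern can be
sketched in a few lines of Python. This is only an illustration of the
pattern, not how Taverna implements iteration: a local thread pool
stands in for the grid or cloud back-end, and `split`, `gc_count` and
`map_reduce` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, chunk_size):
    """Divide the input list into chunks of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def gc_count(sequences):
    # Map step: count G and C bases in one chunk of sequences.
    return sum(seq.count("G") + seq.count("C") for seq in sequences)


def map_reduce(items, chunk_size, map_fn, reduce_fn, max_workers=4):
    """Apply `map_fn` to each chunk in parallel, then combine with `reduce_fn`."""
    chunks = split(items, chunk_size)
    # Assumption: a thread pool stands in for LSF, SGE, Condor or EC2 workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_fn, chunks))
    return reduce_fn(partials)


total = map_reduce(["GATTACA", "CCGG", "AT", "GC"], 2, gc_count, sum)
```

The map function only ever sees its own chunk, which is exactly the
property the existing scripts would need to have before the workflow
engine could distribute them.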


> 2. Simplified plugin APIs to make it quicker and easier for Taverna plugin 
> novices to understand what they're meant to be doing. This has definitely 
> improved recently in terms of documentation, but still some way to go in 
> terms of the API itself.

Thanks. We're aware of the continuing challenge, but hope it is
improving. For instance, at the ECCB we met with Martijn van Iersel,
developer of BridgeDB, and after two 30-minute sessions following the
plugin tutorial and a few Eclipse tricks, he had a Taverna workflow
running with a new plugin which could do various bio-identifier
translations using BridgeDB. (I think this interesting plugin should
make its way out to the public in the near future.) The annoying thing
is that a few 'Eclipse tricks' were still needed, and those are not
always easy to document.

We're hoping for API access to be easier once we've transitioned to
the OSGi platform, which we're working on this autumn.


> 6. The killer feature would be the ability to install a plugin on your 
> desktop or grid front-end client and have it propagate to the back-end server 
> or multiple grid instances along with the workflow, in the case that it 
> hasn't already been installed back there. You'd need to think of some way of 
> sandboxing the plugins distributed in this way so that they can only affect 
> the workflows of the user that submitted them.

Supposedly OSGi should make it easier to do this. We could have done
dynamic plugin installation already with Raven, but never did it
because of the security concerns. Sandboxing plugins is quite tricky
since they tend to need either disk or network access, but some kind
of 'access control' system, similar to how you install Android apps on
a mobile phone, together with some kind of signing and pre-approval of
certain workflows or plugin providers, could go a long way.
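
To make the access control idea a bit more concrete, here is a small
Python sketch of an Android-style permission check. It is purely
illustrative: `PluginManifest` and `check_install` are hypothetical
names, the signer is just a string, and no real signature verification
is done here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginManifest:
    """Hypothetical description a plugin would ship with."""
    name: str
    signer: str
    permissions: frozenset = frozenset()


def check_install(manifest, trusted_signers, granted):
    """Return (allowed, reasons) for installing a plugin for one user.

    `trusted_signers` are the pre-approved plugin providers and `granted`
    is the set of permissions the user has agreed to.
    """
    reasons = []
    if manifest.signer not in trusted_signers:
        reasons.append(f"untrusted signer: {manifest.signer}")
    # Anything the plugin asks for beyond what was granted blocks the install.
    for permission in sorted(manifest.permissions - granted):
        reasons.append(f"permission not granted: {permission}")
    return (not reasons, reasons)
```

A real system would also have to enforce the granted permissions at
run time, which is the hard part of sandboxing.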

I agree this would be a very cool feature, especially for a cloud
installation, as we could provide a generic Taverna Server image which
anyone could start up with their credentials, and then the workflow
submission could define and provide which extra stuff is needed for
execution; plugins and data, possibly even binaries, in a way
customizing the instance on the fly.


> 8. Taverna server (App4Andy is a great start by the way) to offer the 
> possibility to upload arbitrary workflows via the desktop client and execute 
> them on the server/grid/cloud, rather than choosing from a predefined 
> selection. On uploading it could make an auto-generated but editable web 
> interface for obtaining workflow input, monitoring progress, downloading 
> results/provenance/etc. The workflow could be a one-off, or could be stored 
> there, and kept private, or shared via user/group notions, etc. This is all 
> in Knime already (except the cloud bit).

Alex Nenadic has been working on a portlet that does many of these
things, allowing you to customize the input form, etc. I'm not sure
what the release plan for that is at the moment, but we've got
something in the brew!

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
Developers Guide: http://www.taverna.org.uk/developers/
