On Wed, Oct 6, 2010 at 08:27, Richard Holland <[email protected]> wrote:
> Hello all, had a conversation with Carole here in Hannover yesterday. I had
> some suggestions for future improvements to Taverna and she said I should
> post these to the list:
Thanks for all the great ideas! I agree with the other comments, so I'll just add a few of my own.

> 1. Internal parallelisation of workflows. The ability to specify that a
> particular subsection within a workflow should implicitly divide the input,
> run several instances in parallel on some kind of back-end grid (LSF, SGE,
> Condor, or an EC2 approach, etc.), and recombine the output. A subsection
> could just be one component, or several chained together, or the entire
> workflow. The current strategy as per App4Andy I believe is just the last one
> of the three - i.e. the entire workflow.

As others have pointed out, this only makes sense when invoking local services. At the ECCB 2010 conference we got the feeling that many people still like the idea of plugging their own tools and scripts into Taverna (as they don't want to throw away the work they've already got), so we're going to keep supporting this in addition to our current work on making components that are easier to plug together.

The challenge here is that these tools are typically standalone Perl and Python scripts that make various assumptions about file paths, and they are typically not built to be run in parallel. (Paths would need to be parameterised to avoid stepping on each other's toes.) Additionally, there are of course the requirements on the environment, such as a Perl script using various modules from CPAN that need to be installed, compiled and linked with C libraries that also need to be installed for that particular operating system.

This leads to the challenge of either giving at least some kind of description of these requirements, or just letting the user take full control to make sure that everything is installed, but in both cases lowering the chance of your workflow being shareable or portable.

There's also the paradigm shift between procedural scripting and a dataflow-oriented description of the workflow, as in Taverna.
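To make the point about parameterised paths concrete, here is a minimal sketch in Python of the divide / run-in-parallel / recombine idea. It is not Taverna code: `split`, `run_tool` and `run_parallel` are hypothetical names, the "tool" is a stand-in that just upper-cases its input, and local threads stand in for a grid back-end. The only thing it is meant to show is that each instance gets its own working directory, so instances cannot overwrite each other's files.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split(items, n_chunks):
    """Divide the input into at most n_chunks contiguous, non-empty chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_tool(chunk, workdir):
    """Hypothetical stand-in for a wrapped script.

    All file paths are derived from workdir (the parameterised path),
    so parallel instances never touch each other's files.
    """
    workdir = Path(workdir)
    infile = workdir / "input.txt"
    outfile = workdir / "output.txt"
    infile.write_text("\n".join(chunk))
    # A real wrapper would invoke the existing script here; this one
    # only upper-cases each input line.
    lines = infile.read_text().splitlines()
    outfile.write_text("\n".join(line.upper() for line in lines))
    return outfile.read_text().splitlines()


def run_parallel(items, n_chunks=4):
    """Split the input, run one instance per chunk, recombine in input order."""
    chunks = split(items, n_chunks)
    with tempfile.TemporaryDirectory() as base:
        workdirs = []
        for i in range(len(chunks)):
            workdir = Path(base) / f"instance-{i}"
            workdir.mkdir()
            workdirs.append(workdir)
        # Threads are an assumption for the sketch; a grid submission
        # (LSF, SGE, Condor, ...) would take their place.
        with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
            results = list(pool.map(run_tool, chunks, workdirs))
    return [line for part in results for line in part]
```

Calling `run_parallel(["a", "b", "c", "d", "e"], n_chunks=2)` runs two instances, one on three items and one on two, and returns the results in the original input order.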
In App4Andy we decided to stay close to the original scripts, so the actual data generally got exchanged via files and MySQL databases, while the workflow took more of an orchestration role, simply passing file paths and database references. This approach makes it easy to adapt your scripts to be run from a workflow, but tricky to include other external services or existing workflows, which might be your motivation for going for a workflow in the first place.

Moving to a full-blown Taverna approach would mean changing all the scripts to deal with the actual data in individual iterations, but the advantage would be that one could do as you suggest and let Taverna take control of the parallelism and the splitting of data among instances, with guidance from the workflow designer.

I think the way forward in cloud parallelism would be to allow you to build your workflow for parallelism, so that you can do, say, the typical map-reduce pattern from your workflow. The challenge is that adapters would have to modify their existing tools and maintain the cloud installation.

> 2. Simplified plugin APIs to make it quicker and easier for Taverna plugin
> novices to understand what they're meant to be doing. This has definitely
> improved recently in terms of documentation, but still some way to go in
> terms of the API itself.

Thanks. We're aware of the continuing challenge, but hope it is improving. For instance, at ECCB we met with Martijn van Iersel, developer of BridgeDB, and after two 30-minute sessions following the plugin tutorial, plus a few Eclipse tricks, he had a Taverna workflow running with a new plugin which could do various bio-identifier translations using BridgeDB. (I think this interesting plugin should make its way out to the public in the near future.)

Now the annoying thing is that a few 'Eclipse tricks' were still needed, and those are not always easy to document.
We're hoping for API access to be easier once we've transitioned to the OSGi platform, which we're working on this autumn.

> 6. The killer feature would be the ability to install a plugin on your
> desktop or grid front-end client and have it propagate to the back-end server
> or multiple grid instances along with the workflow, in the case that it
> hasn't already been installed back there. You'd need to think of some way of
> sandboxing the plugins distributed in this way so that they can only affect
> the workflows of the user that submitted them.

Supposedly OSGi should make it easier to do this. We could have done dynamic plugin installation already with Raven, but never did because of the security concerns. Sandboxing plugins is quite tricky since they tend to need either disk or network access, but some kind of 'access control' system similar to how you install Android apps on a mobile phone, together with some kind of signing and pre-approval of certain workflows or plugin providers, could go a long way.

I agree this would be a very cool feature, especially for a cloud installation, as we could provide a generic Taverna Server image which anyone could start up with their credentials, and then the workflow submission could define and provide whatever extra stuff is needed for execution: plugins and data, possibly even binaries, in a way customizing the instance on the fly.

> 8. Taverna server (App4Andy is a great start by the way) to offer the
> possibility to upload arbitrary workflows via the desktop client and execute
> them on the server/grid/cloud, rather than choosing from a predefined
> selection. On uploading it could make an auto-generated but editable web
> interface for obtaining workflow input, monitoring progress, downloading
> results/provenance/etc. The workflow could be a one-off, or could be stored
> there, and kept private, or shared via user/group notions, etc. This is all
> in Knime already (except the cloud bit).
Alex Nenadic has been working on a portlet that does many of these things, allowing you to customize the input form, etc. I'm not sure what the release plan for that is at the moment, but we've got something in the brew!

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
Developers Guide: http://www.taverna.org.uk/developers/
