Re: [Taverna-hackers] Ideas

Richard Holland Wed, 06 Oct 2010 04:37:08 -0700

On 6 Oct 2010, at 12:52, Donal K. Fellows wrote:

> On 06/10/2010 08:27, Richard Holland wrote:
>> Hello all, had a conversation with Carole here in Hannover yesterday.
>> I had some suggestions for future improvements to Taverna and she
>> said I should post these to the list:
> 
> The summary for the "tl;dr" crowd:
>  Some of these things we're going to be doing, some we'd like to do
>  but don't yet know enough, some are ferociously difficult, and some I
>  simply don't have the expertise to talk about. :-)
> 
> For the interested...
> 
>> 1. Internal parallelisation of workflows. The ability to specify that
>> a particular subsection within a workflow should implicitly divide
>> the input, run several instances in parallel on some kind of back-end
>> grid (LSF, SGE, Condor, or an EC2 approach, etc.), and recombine the
>> output. A subsection could just be one component, or several chained
>> together, or the entire workflow. The current strategy as per
>> App4Andy I believe is just the last one of the three - i.e. the
>> entire workflow.
> 
> Automatic parallelization is one of those things that is Known To Be
> Difficult. If the workflow processor were a call to a SOAP or REST based
> service, autoparallel would turn the workflow into a distributed denial
> of service attack tool. :-)
> 
> The only way to make real progress with parallelizing a workflow
> requires knowledge of the specific workflow, the data it is processing,
> the resources that it relies on, etc. The App4Andy work was a success in
> large part because we did this first.


Agreed. I wouldn't necessarily suggest automatically deciding which tasks
should be parallelised. But being able to right-click on one in the
interface and say 'parallelise this because I know it is slow' would be super.
So would feedback on processing times, so that the interface could hint at
which tasks you might like to parallelise. You could have an API call for each
task indicating whether it is permitted to run in parallel or not - the default
would be yes, but the web service one would be no, and anything marked no could
not be selected as parallelisable in the interface (hence disallowing DDoS).
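
A minimal sketch of the per-task flag described above, written in Python rather
than against Taverna's actual API. The names Task, parallel_permitted,
web_service_task and execute are all hypothetical; a thread pool stands in for
whatever grid back-end would really be used.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from time import perf_counter
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical workflow task; not a Taverna class."""
    name: str
    run: Callable[[str], str]
    parallel_permitted: bool = True  # the default is yes
    last_seconds: float = 0.0        # timing feedback for the interface


def web_service_task(name: str, run: Callable[[str], str]) -> Task:
    # Web-service tasks opt out so they cannot be fanned out at a remote server.
    return Task(name, run, parallel_permitted=False)


def execute(task: Task, items: List[str], parallelise: bool = False,
            workers: int = 4) -> List[str]:
    """Run the task over every item, in parallel only if that is permitted."""
    if parallelise and not task.parallel_permitted:
        raise ValueError(f"{task.name} may not be parallelised")
    start = perf_counter()
    if parallelise:
        # Divide the input, run instances concurrently, recombine in order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(task.run, items))
    else:
        results = [task.run(item) for item in items]
    task.last_seconds = perf_counter() - start
    return results


local = Task("reverse_sequence", lambda s: s[::-1])
remote = web_service_task("remote_lookup", lambda s: s.upper())
```

Here execute(local, ["ab", "cd"], parallelise=True) runs both items
concurrently, while the same call on remote is refused. The recorded
last_seconds value is the kind of figure an interface could show as a hint.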

> 
>> 3. Reporting functionality. The ability to dynamically generate PDFs
>> or spreadsheets or Word docs etc. based on workflow output via some
>> kind of graphic designer for report templates is really valuable.
>> Knime and Pipeline Pilot both have this feature.
> 
> One of the things I want to see is more community involvement in Taverna
> so that it isn't just a project done in one place. Reporting is one of
> the areas where I think contributions would be very valuable.
> 
>> 5. Still need a better way of delegating security credentials to the
>> server/grid instances so the workflow can log into things on your
>> behalf.
> 
> That's an area where there will be more work done; I plan to have a
> sketch of a strawman proposal hammered out later this week, including
> having a key requirement to not be tightly bound to Java clients. (The
> big problem is that existing solutions for this tend to rely on
> *everything* using the same auth system; that's just not what happens on
> the ground.)
> 
>> 6. The killer feature would be the ability to install a plugin on
>> your desktop or grid front-end client and have it propagate to the
>> back-end server or multiple grid instances along with the workflow,
>> in the case that it hasn't already been installed back there. You'd
>> need to think of some way of sandboxing the plugins distributed in
>> this way so that they can only affect the workflows of the user that
>> submitted them.
> 
> A seriously neat idea, but *very* difficult to get right.
> 
>> 7. Have the ability (maybe via myExperiment) to log every execution
>> of a workflow including a reference to the input data, the structure
>> of the workflow at the time of the execution, and a reference to the
>> results, provenance, etc. This is very useful for lab notebook
>> concepts and also for reproducing work at a later date.
> 
> Another seriously neat idea. I'd need to understand what "every" means
> to you here (I can think of a few conflicting use-cases ;-)) and there are
> some issues with where the data is actually stored, but it would fit in
> with some of the concepts of the Next-Gen Workbench too.

"Every" meaning every execution the user deems important. It is obviously
pointless for simple back-end repeated tasks that do data processing
behind the scenes, but very useful for not-very-repeated but high-value
workflows. Whether a workflow gets tracked should be user-definable.

(I brought this one up because it was a specific request from a customer of 
ours who already has this functionality built as an in-house developed custom 
add-on to Pipeline Pilot; it interacts with the server instance via API calls).
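
As a rough illustration of that kind of per-run record, here is a sketch in
Python. It is not how the Pipeline Pilot add-on or myExperiment works; the
function names, the record fields and the in-memory list used as the log are
all assumptions made for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(text: str) -> str:
    """SHA-256 of the workflow definition, to pin its structure at run time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def log_run(log, workflow_definition, input_refs, result_refs, tracked=True):
    """Append one run record to `log` if the user marked the workflow as tracked.

    `log` is any list-like store; a real system would use a database or
    a remote service instead (assumption).
    """
    if not tracked:
        return None
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workflow_sha256": fingerprint(workflow_definition),
        "inputs": list(input_refs),    # references to the data, not copies
        "results": list(result_refs),
    }
    log.append(json.dumps(record, sort_keys=True))
    return record
```

Storing references and a hash rather than the data itself keeps each record
small, at the cost of relying on the referenced files staying where they are.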

> 
> I'd rate the probability of this being done as high.
> 
>> 8. Taverna server (App4Andy is a great start by the way) to offer the
>> possibility to upload arbitrary workflows via the desktop client and
>> execute them on the server/grid/cloud, rather than choosing from a
>> predefined selection. On uploading it could make an auto-generated
>> but editable web interface for obtaining workflow input, monitoring
>> progress, downloading results/provenance/etc. The workflow could be a
>> one-off, or could be stored there, and kept private, or shared via
>> user/group notions, etc. This is all in Knime already (except the
>> cloud bit).
> 
> We can do some of this now (the auto-interface isn't sophisticated, but
> it does exist) and other things - the integration with the workbench in
> particular - are on the technical feature roadmap. This *will* be done.
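
For readers wondering what an auto-generated input interface might look like
at its simplest, here is a sketch in Python. It is not Taverna Server's
implementation; input_form, the list of port names and the "/runs" URL are
hypothetical.

```python
from html import escape


def input_form(workflow_name, input_ports, action="/runs"):
    """Build a plain HTML form with one text field per workflow input port."""
    rows = []
    for port in input_ports:
        safe = escape(port, quote=True)  # port names may contain markup characters
        rows.append(f'  <label>{safe} <input type="text" name="{safe}"></label>')
    return "\n".join([
        f'<form method="post" action="{escape(action, quote=True)}">',
        f"  <h1>{escape(workflow_name)}</h1>",
        *rows,
        '  <button type="submit">Run</button>',
        "</form>",
    ])
```

Because the result is ordinary HTML text, it could be saved and edited by hand
afterwards, which is one way to read "auto-generated but editable".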

Thanks for the responses - good to know what is on the map and what is not. The
most important of all these things really is number 1 (parallelisation), as
everything else can be worked around/hacked by those who are desperate enough.
:)

cheers
Richard

> 
> Donal.
> <donal_k_fellows.vcf>
> ------------------------------------------------------------------------------
> Beautiful is writing same markup. Internet Explorer 9 supports
> standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2 & L3.
> Spend less time writing and  rewriting code and more time creating great
> experiences on the web. Be a part of the beta today.
> http://p.sf.net/sfu/beautyoftheweb
> _______________________________________________
> taverna-hackers mailing list
> [email protected]
> Web site: http://www.taverna.org.uk
> Mailing lists: http://www.taverna.org.uk/about/contact-us/
> Developers Guide: http://www.taverna.org.uk/developers/

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/


