Re: [Taverna-hackers] Provenance model docs

Guzmán Llambías - INCO Thu, 20 May 2010 07:45:30 -0700

Hi Stian!!

Thank you very much for the detailed explanation! I'm very interested in the 
results of the provenance queries and the OPM graphs. Are those results in 
RDF? You sent me a reference to a sample query, but are there any docs I 
could read about it? How does Taverna support OPM?


Thanks for everything!

Regards,
Guzmán

----- Original Message ----- 
From: "Stian Soiland-Reyes" <[email protected]>
To: "List for general discussion and hacking of the Taverna project" 
<[email protected]>; "Paolo Missier" 
<[email protected]>
Sent: Wednesday, May 19, 2010 4:27 AM
Subject: Re: [Taverna-hackers] Provenance model docs


> On Wed, May 19, 2010 at 03:35, Guzman Llambias - INCO
> <[email protected]> wrote:
>
>> I've been looking for some provenance model docs for T2, without
>> luck. Could you please point me in the right direction?
>
> Hi!
>
> I'll try to write this up as a wiki page.. but here goes a quick draft:
>
>
> The provenance model for Taverna 2 is quite different from the model
> of Taverna 1, as we now focus on the lineage/origin of data. So we
> want to easily check which inputs caused a given output, which
> upstream outputs gave those inputs, and so on.
>
>
> It might be easiest to understand what we're capturing by looking at
> how we store the provenance. This is done internally in a Derby
> database, but it can also be configured to store in a MySQL database.
>
>
> Here's the current database schema for provenance as of Taverna 2.1:
>
> http://www.mygrid.org.uk/dev/wiki/display/developer/Provenance+schema+in+2.1.2
>
>
> However, this schema does not capture all aspects of workflow
> executions, so I'm in the process of refactoring the database schema
> to:
>
> http://www.mygrid.org.uk/dev/wiki/display/developer/Provenance+schema+in+2.2.0
>
> .. I'll update this page to reflect reality once that's done.
>
>
> Note that this database is not meant to be exposed directly, but it's
> possible to query the database using a 'lineage query', and export the
> provenance as an OPM (Open Provenance Model) graph.
>
>
> See
> http://code.google.com/p/mygrid-labs/source/browse/provenance-client/trunk/src/main/resources/testQuery1.xml
> for an example of a query. It selects runs of workflow
> ac41d494-f77c-4dd5-919c-47272aa6a848 (the dataflow identifier found
> inside the .t2flow file): in particular the run identified as
> ae1e2b6b-3bc5-4c93-a250-c4dd0210c3b3, plus any runs since 2009-10-08.
>
> The result graph contains the origin details for whatever the <select>
> element names. In this case that is the output port "value" on the
> processor "String_constant" inside the nested workflow
> "Nested_workflow", the workflow output port "out", and all output
> ports of the service "Beanshell".
>
> The <focus> element, specified in a similar fashion, picks which parts
> of what led to the <select> outputs you want details for.
>
> Paolo Missier (copied) should be able to fill in with details on how
> to run such queries.
>
>
>
> The rough way Provenance works internally in Taverna is this:
>
> When running a workflow, the WorkflowInstanceFacade will trundle
> through the workflow's processors and insert a new dispatch stack
> layer, IntermediateProvenance, into each one. It is placed near the
> top of the stack, below Parallelize but above ErrorBounce, meaning
> that it should see the actual inputs from the processor input ports,
> and the actual outputs delivered to the processor output ports, at
> the time the execution is finished.
>
> When a job is received (i.e. all data is available on the processor
> input ports and an available thread has been identified by
> Parallelize), IntermediateProvenance records the input data and
> (soon) the execution start time. Similarly, on the way up it records
> the output data (which might be the error document registered or
> bounced by the ErrorBounce layer), all stored in a hashmap per
> iteration.
>
> On the way up, this bean of provenance information is sent to the
> provenance database, where it is stored in the tables as explained in
> the schema above.
>
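
To make the down/up recording described above concrete, here is a minimal sketch in Python. It is not Taverna's code: `ProvenanceLayer`, `error_bounce`, `ProvenanceRecord` and the port names `in`/`out` are hypothetical stand-ins, and a plain list stands in for the provenance database. It only illustrates the idea that a layer sitting above ErrorBounce sees the real inputs on the way down and the final outputs (possibly an error document) on the way up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

PortMap = Dict[str, str]  # port name -> value (hypothetical simplification)


@dataclass
class ProvenanceRecord:
    """What the provenance layer saw for one job (one iteration)."""
    processor: str
    inputs: PortMap
    outputs: PortMap = field(default_factory=dict)


class ProvenanceLayer:
    """Hypothetical stand-in for the IntermediateProvenance layer."""

    def __init__(self, processor: str, downstream: Callable[[PortMap], PortMap]):
        self.processor = processor
        self.downstream = downstream
        self.records: List[ProvenanceRecord] = []  # stands in for the database

    def __call__(self, inputs: PortMap) -> PortMap:
        # Way down: record the inputs before handing the job to deeper layers.
        record = ProvenanceRecord(self.processor, dict(inputs))
        outputs = self.downstream(inputs)
        # Way up: record whatever the deeper layers delivered.
        record.outputs = dict(outputs)
        self.records.append(record)
        return outputs


def error_bounce(invoke: Callable[[PortMap], PortMap]) -> Callable[[PortMap], PortMap]:
    """Hypothetical stand-in for ErrorBounce: failures become error documents."""
    def layer(inputs: PortMap) -> PortMap:
        try:
            return invoke(inputs)
        except Exception as exc:
            # Assumption: a single output port called "out".
            return {"out": f"ERROR: {exc}"}
    return layer


def to_upper(inputs: PortMap) -> PortMap:
    return {"out": inputs["in"].upper()}


def failing(inputs: PortMap) -> PortMap:
    raise ValueError("service unavailable")


ok_layer = ProvenanceLayer("To_upper", error_bounce(to_upper))
bad_layer = ProvenanceLayer("Failing", error_bounce(failing))
ok_layer({"in": "abc"})
bad_layer({"in": "abc"})
```

Because the recording layer wraps the error-handling layer in this sketch, the record for the failing processor holds the error document rather than an exception, which matches the ordering described in the quoted text.
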
> There is then a ProvenanceAccess layer, where one can query
> different aspects of the stored provenance, like which input and
> output values an intermediate processor dealt with. You can also ask
> for the 'lineage' of a data value, which should give you a trace of
> which input values it depends on throughout the workflow, or export
> the whole thing (or the selection from such a query) as an OPM graph.
>
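
The 'lineage' idea above can be sketched as a backwards walk over recorded invocations. This is only an illustration under stated assumptions, not the ProvenanceAccess API: the `Invocation` tuple and the `lineage` function are hypothetical, each invocation is assumed to have a single output, and plain strings stand in for data references.

```python
from collections import namedtuple

# One row per recorded invocation: which values went in, which value came out.
# (Hypothetical shape; the real schema is on the wiki pages linked above.)
Invocation = namedtuple("Invocation", ["processor", "inputs", "output"])


def lineage(value, invocations):
    """Return the set of values with no recorded producer that `value` depends on."""
    produced_by = {inv.output: inv for inv in invocations}
    sources = set()
    seen = set()
    stack = [value]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        inv = produced_by.get(current)
        if inv is None:
            # Nothing in the records produced it, so treat it as a workflow input.
            sources.add(current)
        else:
            # Keep walking upstream through the inputs of the producing invocation.
            stack.extend(inv.inputs)
    return sources


records = [
    Invocation("Concat", ("a", "b"), "ab"),
    Invocation("Upper", ("ab",), "AB"),
    Invocation("Reverse", ("c",), "c_rev"),
]

upstream_of_AB = lineage("AB", records)
upstream_of_c_rev = lineage("c_rev", records)
```

Here `upstream_of_AB` contains only `a` and `b`, since `c` feeds a different branch. The intermediate value `ab` is visited during the walk but is not reported, because it has a recorded producer.
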
>
> In order to populate the new table ServiceInvocation there will be a
> new, lightweight provenance layer inserted between each of the deeper
> dispatch layers. It will then be able to record individual retries,
> failovers and looping.
>
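
As a rough picture of why a recorder placed below the retry logic can see individual attempts, here is a small sketch. All names (`RecordingLayer`, `retry`, `FlakyService`) are hypothetical and the log tuples are not the ServiceInvocation schema; the point is only that a layer placed beneath a retrying layer is called once per attempt.

```python
class RecordingLayer:
    """Hypothetical lightweight recorder: logs every call that passes through it."""

    def __init__(self, name, downstream, log):
        self.name = name
        self.downstream = downstream
        self.log = log

    def __call__(self, inputs):
        try:
            result = self.downstream(inputs)
        except Exception as exc:
            self.log.append((self.name, "failed", str(exc)))
            raise  # let the layer above decide whether to retry
        self.log.append((self.name, "ok", result))
        return result


def retry(downstream, attempts):
    """Hypothetical retry layer: call downstream up to `attempts` times (>= 1)."""
    def layer(inputs):
        last_error = None
        for _ in range(attempts):
            try:
                return downstream(inputs)
            except Exception as exc:
                last_error = exc
        raise last_error
    return layer


class FlakyService:
    """Fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.remaining = failures

    def __call__(self, inputs):
        if self.remaining > 0:
            self.remaining -= 1
            raise RuntimeError("timeout")
        return inputs["in"] + "!"


log = []
stack = retry(RecordingLayer("invoke", FlakyService(2), log), attempts=3)
result = stack({"in": "hi"})
```

A recorder above `retry` would only see the one successful result, while the one below it logs two failed attempts followed by the success.
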
>
>
> -- 
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> taverna-hackers mailing list
> [email protected]
> Web site: http://www.taverna.org.uk
> Mailing lists: http://www.taverna.org.uk/about/contact-us/
> Developers Guide: http://www.taverna.org.uk/developers/
> 


