Hi Stian!!

Thank you very much for the detailed explanation! I'm very interested in the 
results of the provenance queries and the OPM graphs. Are those results in 
RDF? You sent me a reference to a sample query, but are there any docs I 
could read about it? How does Taverna support OPM?

Thanks for everything!

Regards,
Guzmán

----- Original Message ----- 
From: "Stian Soiland-Reyes" <[email protected]>
To: "List for general discussion and hacking of the Taverna project" 
<[email protected]>; "Paolo Missier" 
<[email protected]>
Sent: Wednesday, May 19, 2010 4:27 AM
Subject: Re: [Taverna-hackers] Provenance model docs


> On Wed, May 19, 2010 at 03:35, Guzman Llambias - INCO
> <[email protected]> wrote:
>
>> I've been looking for some provenance model docs for T2, without
>> luck. Could you please guide me a bit in order to find some?
>
> Hi!
>
> I'll try to write this up as a wiki page.. but here goes a quick draft:
>
>
> The provenance model for Taverna 2 is quite different from the model
> of Taverna 1, as we now focus on the lineage/origin of data. So we
> want to easily check which inputs caused a given output, which
> upstream outputs gave those inputs, and so on.
>
>
> It might be easier to understand what we're capturing by looking at how
> we store the provenance. This is done internally in a Derby database,
> but it can also be configured to store in a MySQL database.
>
>
> Here's the current database schema for provenance as of Taverna 2.1:
>
> http://www.mygrid.org.uk/dev/wiki/display/developer/Provenance+schema+in+2.1.2
>
>
> However, this schema does not capture all aspects of workflow
> executions, so I'm in the process of refactoring the database schema
> to:
>
> http://www.mygrid.org.uk/dev/wiki/display/developer/Provenance+schema+in+2.2.0
>
> .. I'll update this page to reflect reality once that's done.
>
>
> Note that this database is not meant to be exposed directly, but it's
> possible to query the database using a 'lineage query', and export the
> provenance as an OPM (Open Provenance Model) graph.
>
>
> See 
> http://code.google.com/p/mygrid-labs/source/browse/provenance-client/trunk/src/main/resources/testQuery1.xml
> for an example of a query. It will select runs of workflow
> ac41d494-f77c-4dd5-919c-47272aa6a848 (the dataflow identifier found
> inside the .t2flow file), and in particular it will select the run
> identified as ae1e2b6b-3bc5-4c93-a250-c4dd0210c3b3, in addition to any
> runs since 2009-10-08.
>
> The result graph will contain details of the origin of the ports named
> in the <select> element. In this case these are the output port "value"
> on the processor "String_constant" inside the nested workflow
> "Nested_workflow", the workflow output port "out", and all output
> ports of the service "Beanshell".
>
> The <focus> element selects which of the steps leading to the <select>
> outputs you want details for, specified in a similar
> fashion.
>
> Paolo Missier (copied) should be able to fill in with details on how
> to run such queries.
>
>
>
> The rough way Provenance works internally in Taverna is this:
>
> When running a workflow, the WorkflowInstanceFacade will trundle
> through the workflow's processors, and insert a new Dispatch stack
> layer, IntermediateProvenance. This is placed near the top, below
> Parallelize but above ErrorBounce, meaning that it should see the
> actual inputs from the processor input ports, and the actual output
> delivered to the processor output ports, at the time the execution is
> finished.
>
> When a job is received (i.e. all data is available on the
> processor input ports and an available thread has been identified by
> Parallelize), IntermediateProvenance records the input data and
> (soon) execution start time. Similarly on the way up, it will record
> the output data (which might be the error document registered or
> bounced by the ErrorBounce layer), all stored in a hashmap per
> iteration.
>
> On the way up, this bean of provenance information is sent to the
> provenance database, where it is stored in the tables as explained in
> the schema above.
>
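
A rough sketch of the recording step described above, in Java. This is
not Taverna's actual code: the class ProvenanceLayerSketch, its methods
receiveJob and receiveResult, and the in-memory list standing in for the
provenance database are all hypothetical names chosen for illustration.
It only shows the idea of capturing inputs on the way down, outputs on
the way up, and handing over one record per iteration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the real IntermediateProvenance layer.
class ProvenanceLayerSketch {

    // One record per iteration: port name -> value, for inputs and outputs.
    static final class IterationRecord {
        final Map<String, String> inputs = new HashMap<>();
        final Map<String, String> outputs = new HashMap<>();
    }

    // Records for jobs that have started but not yet finished,
    // keyed by an iteration label such as "0" or "1,2" (assumed format).
    private final Map<String, IterationRecord> pending = new HashMap<>();

    // Stand-in for the provenance database.
    private final List<IterationRecord> stored = new ArrayList<>();

    // On the way down: all input port values are available.
    void receiveJob(String iteration, Map<String, String> inputPortValues) {
        IterationRecord record = new IterationRecord();
        record.inputs.putAll(inputPortValues);
        pending.put(iteration, record);
    }

    // On the way up: record the outputs, then pass the record to storage.
    void receiveResult(String iteration, Map<String, String> outputPortValues) {
        IterationRecord record = pending.remove(iteration);
        if (record == null) {
            throw new IllegalStateException("No job recorded for iteration " + iteration);
        }
        record.outputs.putAll(outputPortValues);
        stored.add(record);
    }

    List<IterationRecord> stored() {
        return stored;
    }

    public static void main(String[] args) {
        ProvenanceLayerSketch layer = new ProvenanceLayerSketch();
        layer.receiveJob("0", Map.of("in", "hello"));
        layer.receiveResult("0", Map.of("out", "HELLO"));
        IterationRecord record = layer.stored().get(0);
        System.out.println(record.inputs + " -> " + record.outputs);
    }
}
```

Keying the pending records by iteration keeps the inputs and outputs of
parallel iterations apart until each one completes.
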
> There is then a ProvenanceAccess layer, where one might query
> different aspects of the stored provenance, like which input and
> output values an intermediate processor dealt with. You can also ask
> for the 'lineage' of a data value, which should give you a trace of
> which input values it depends on throughout the workflow, or export
> the whole thing (or the result of such a query) as an OPM graph.
>
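
To illustrate what a lineage trace amounts to, here is a minimal sketch
in Java. It does not use the ProvenanceAccess API, whose methods are not
described in this thread. The class LineageSketch, the "processor:port"
labels and the sample dependency edges are assumptions made up for the
example; only the port and processor names are borrowed from the query
discussed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Taverna's lineage query implementation.
class LineageSketch {

    // derivedFrom maps a port to the upstream ports its value was computed from.
    // Returns every port reachable by walking those edges backwards from start.
    static Set<String> lineage(Map<String, List<String>> derivedFrom, String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(start);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            for (String upstream : derivedFrom.getOrDefault(current, List.of())) {
                // Only follow each upstream port once, so shared ancestors are not revisited.
                if (visited.add(upstream)) {
                    toVisit.push(upstream);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up dependency edges for a tiny workflow.
        Map<String, List<String>> derivedFrom = Map.of(
                "workflow:out", List.of("Beanshell:out"),
                "Beanshell:out", List.of("Beanshell:in"),
                "Beanshell:in", List.of("String_constant:value"));
        System.out.println(lineage(derivedFrom, "workflow:out"));
    }
}
```

The real provenance store also tracks iterations and nested workflows,
which this sketch leaves out.
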
>
> In order to populate the new table ServiceInvocation there will be a
> new, lightweight provenance layer inserted between each
> of the deeper dispatch layers. It will then be able to record
> individual retries, failovers and looping.
>
>
>
> -- 
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> taverna-hackers mailing list
> [email protected]
> Web site: http://www.taverna.org.uk
> Mailing lists: http://www.taverna.org.uk/about/contact-us/
> Developers Guide: http://www.taverna.org.uk/developers/
> 


