On Fri, Jan 28, 2011 at 17:57, Guzman Llambias - INCO
<[email protected]> wrote:
> Sorry, I used a bad term to explain my solution. When I talk about
> streaming I'm not talking about video streaming or something like
> that. My use case is the following:
>
> We have a system that produces a very big input data file, that
> Taverna will use it as Input data (you can see it as a list of data
> elements). Instead of waiting for the hole file to be produced, we are
> trying to split the data in elements and give them to Taverna as they
> are available. It takes time to produce each data element so, Taverna
> can start working with the elements it has. When the third party
> system finished it's job, it will close the streaming to tell Taverna
> that no more inputs will be sent, so Taverna can know when to finish
> the experiment.

The BioMart activity in Taverna can do this kind of "pipelining".

To summarize, BioMart allows the user to configure a database query
towards a MART service, which then sends back the result of the query
in a kind of comma separated plain text format.

The activity splits this result both by row and by column, and
outputs a list of values at each port.
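
As a rough illustration of that split (plain Python, not the actual
BioMart activity code; the function name, port names and sample rows
are made up for the example):

```python
def split_columns(text, port_names, sep=","):
    """Split delimited text into one list of values per output port."""
    ports = {name: [] for name in port_names}
    for row in text.splitlines():
        if not row:
            continue  # skip blank lines
        # The n-th field of every row goes to the n-th port.
        for name, value in zip(port_names, row.split(sep)):
            ports[name].append(value)
    return ports


result = split_columns("g1,alpha\ng2,beta\n", ["gene_id", "gene_name"])
```

Here result maps each hypothetical port name to the list of values
found in its column.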

As the format is quite easy to parse, it is possible to read the
incoming data line by line and push the individual list items out of
the activity, even before all of the data has been sent over the wire.
Taverna has pipelining capabilities, so all the steps below the
BioMart service in the workflow (which are expecting single items)
will start processing the individual values as soon as they arrive.
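
The idea can be sketched with a generator that hands over each row as
soon as its line has been read. This is only a simulation in plain
Python: stream_items, the port names and the in-memory source standing
in for the network stream are all assumptions made for the example.

```python
import io


def stream_items(lines, port_names, sep=","):
    """Yield (row_index, {port: value}) for each line as soon as it is read."""
    index = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield index, dict(zip(port_names, line.split(sep)))
        index += 1


# io.StringIO stands in for a response that is still arriving.
source = io.StringIO("g1,alpha\ng2,beta\n")
seen = []
for index, item in stream_items(source, ["gene_id", "gene_name"]):
    # A "downstream step" that works on single items, one at a time.
    seen.append((index, item["gene_name"].upper()))
```

The loop body runs once per line, so the downstream step never has to
wait for the end of the input.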


When you define such an activity you declare, say, 'depth 1' on the
output port (a list), but 'depth 0' (single items) as the *granular
depth*, the depth at which the activity will return items. It can
then call callBack.receiveResults() with an index array saying which
position the value should have in the parent list. (Typically just
[0], [1], etc., but you could return items out of order, or use longer
index arrays if the difference between depth and granular depth was
greater than one.)
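
To make the index arrays concrete, here is a small sketch of how
values could be slotted into a (possibly nested) list by index. It is
plain Python, not Taverna code, and the helper name place is made up.

```python
def place(nested, index, value):
    """Store value at the position given by index, growing lists as needed."""
    target = nested
    # Walk (and create) the intermediate lists for all but the last position.
    for position in index[:-1]:
        while len(target) <= position:
            target.append([])
        target = target[position]
    last = index[-1]
    while len(target) <= last:
        target.append(None)  # placeholder for items not yet received
    target[last] = value


# Depth 1, granular depth 0: one-element indexes, arriving out of order.
flat = []
place(flat, [1], "b")
place(flat, [0], "a")

# Depth 2, granular depth 0: two-element indexes.
grid = []
place(grid, [1, 0], "c")
place(grid, [0, 1], "b")
place(grid, [0, 0], "a")
```

The length of the index array is the difference between the port
depth and the granular depth, which is why the second case needs two
positions per item.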

At the very end, the service must output the 'full list' at the empty
index [], so it needs to keep around the T2References for the items
returned earlier. An ever-running service could simply never return
the list. The downside is that any downstream service expecting a
whole list would never run, as the list is never finished.
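
A toy model of that behaviour, again in plain Python rather than the
Taverna API (the OutputPort class and its method names are invented
for the example, and plain strings stand in for references):

```python
class OutputPort:
    """Toy stand-in for an output port of depth 1 and granular depth 0."""

    def __init__(self):
        self.items = {}        # position -> item sent earlier
        self.item_events = []  # what a per-item downstream step sees
        self.list_events = []  # what a list-consuming downstream step sees

    def receive(self, index, value):
        if index:              # non-empty index: a single streamed item
            self.items[index[0]] = value
            self.item_events.append(value)
        else:                  # empty index []: the finished list
            self.list_events.append(value)

    def finish(self):
        """Emit the complete list, built from the items kept around."""
        full = [self.items[i] for i in sorted(self.items)]
        self.receive([], full)


port = OutputPort()
port.receive([0], "a")
port.receive([1], "b")
pending = list(port.list_events)  # still empty: no list delivered yet
port.finish()
```

Until finish() is called, only the per-item consumer has seen any
data. A service that never calls it leaves the list consumer waiting
forever.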



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
Developers Guide: http://www.taverna.org.uk/developers/
