On Fri, Jan 28, 2011 at 17:57, Guzman Llambias - INCO <[email protected]> wrote:
> Sorry, I used a bad term to explain my solution. When I talk about
> streaming I'm not talking about video streaming or something like
> that. My use case is the following:
>
> We have a system that produces a very big input data file, which
> Taverna will use as input data (you can see it as a list of data
> elements). Instead of waiting for the whole file to be produced, we are
> trying to split the data into elements and give them to Taverna as they
> become available. It takes time to produce each data element, so Taverna
> can start working with the elements it has. When the third party
> system has finished its job, it will close the stream to tell Taverna
> that no more inputs will be sent, so Taverna can know when to finish
> the experiment.

The BioMart activity in Taverna can do this kind of "pipelining".

To summarize, BioMart lets the user configure a database query against a
MART service, which then sends back the result of the query in a kind of
comma-separated plain text format. The activity splits this by both row
and column, and outputs a list of values at each port. As the format is
quite easy to parse, it is possible to read the incoming data line by line
and push the individual list items out of the activity, even before the
whole data has been sent over the wire.

Taverna has pipelining capabilities, so all the steps below the BioMart
service in the workflow (which are expecting single items) will start
processing the individual values as soon as they arrive.

When you define such an activity you declare, say, 'depth 1' on the output
port (a list), but 'depth 0' (single items) for the *granular depth* - the
depth at which the activity will return items. It can then use
callBack.receiveResults() with an index array saying which position the
value should have in the parent list. (Typically just [0], [1], etc. - but
you could return items out of order, or use higher dimensions if the
difference between depth and granular depth were larger.)

At the very end, the service must output the 'full list' at the empty
index [] - so it needs to keep around the t2references for the outputs
returned earlier - but an ever-running service could simply never return
the list. The 'downside' to this is that any service downstream expecting
a list would never be run (as the list is never finished).

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
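
PS: As a rough illustration of the granular depth idea described above, here is a minimal, self-contained Java sketch. It does not use the real Taverna classes: StreamCallback, streamRows and PipeliningSketch are hypothetical names made up for this example, and plain strings stand in for the references a real activity would pass around. It only shows the order of calls: one item per index [0], [1], ... as each row arrives, then the full list at the empty index [].

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipeliningSketch {

    /** Hypothetical stand-in for the activity callback; NOT the real Taverna interface. */
    interface StreamCallback {
        void receiveResult(Object value, int[] index);
    }

    /**
     * Reads rows as they arrive and pushes each one out at granular depth 0,
     * indexed by its position in the parent list. Once the input is closed,
     * the complete depth 1 list is sent at the empty index.
     */
    static void streamRows(BufferedReader in, StreamCallback callback) throws IOException {
        // Keep every emitted item so the full list can be returned at the end.
        List<String> emitted = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            int position = emitted.size();
            emitted.add(line);
            callback.receiveResult(line, new int[] { position });
        }
        // End of stream: hand over the finished list at index [].
        callback.receiveResult(new ArrayList<>(emitted), new int[0]);
    }

    /** Runs the sketch on in-memory data and records every callback as text. */
    static List<String> run(String data) {
        List<String> log = new ArrayList<>();
        try {
            streamRows(new BufferedReader(new StringReader(data)),
                    (value, index) -> log.add(Arrays.toString(index) + " -> " + value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : run("geneA\ngeneB\ngeneC\n")) {
            System.out.println(event);
        }
    }
}
```

Running main prints "[0] -> geneA", "[1] -> geneB", "[2] -> geneC" and finally "[] -> [geneA, geneB, geneC]". A consumer expecting single items could act on the first three events immediately, while one expecting a list has to wait for the last event, which an ever-running source would never send.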
_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
Developers Guide: http://www.taverna.org.uk/developers/
