On Thu, May 28, 2009 at 14:03, Silvia Giuliani <[email protected]> wrote:
> Dear all
>
> We have constructed a workflow for analysing microarray data using Soaplab
> web services (workflow attached). The idea is that the user supplies a
> number of Affymetrix CEL files (about 30Mb) via Taverna which are then sent
> to the web services for analysis. Our problem though is that we rapidly run
> out of memory (Java Heap space) when two or more files are supplied. We have
> assigned more memory to Java, and this helps, but clearly we are a long way
> from our goal of being able to analyse tens of files. The solution would be
> to use references instead of loading the files into memory but we can't find
> anything in the manual that shows us how to do this. Any clues?

Have you tried this workflow in the latest 2.1b1? See
http://www.myexperiment.org/packs/60 - this version should be more memory
efficient, as it dumps large data to a database stored on disk.

As for using references, this would require changing the services to work
with references instead of (or in addition to) the full data. The easiest
way to do this is to accept a URL instead of the real data, like

  http://myservice/outputs/ed73c5d4-717f-4ddd-8263-6791aa85c07c.xml

If the service also returns data in this way, pipelining between such
services means that the big data is never transferred back and forth to
Taverna. If the service is clever, it can even recognize that the URL is one
of its 'own' and just access the file directly without any downloading.

There could be a kind of 'upload' method for getting started with the first
service inputs, but it should check somehow that the data is real (as
expected by the service) to avoid abuse of the service by people who like to
share illegal and obscene stuff.

Instead of a URL you could use some internal identification scheme, but note
that your references would then not work with other services doing the same
trick, and you would have to provide some kind of download method.

I would recommend making 'mirrored' methods or services for supporting
references in case you imagine clients who would not need the references.

There's unfortunately no agreed-upon service standard for saying that a
reference is in place of the 'real' data, so currently you would have to
introduce a shim into the workflow that converts the URL to a T2reference if
you are connecting the output to another service or want to see it locally.
Such a beanshell script can be as easy as:

  output = new URL(input);

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
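For the service side, the "recognize its own URLs" idea could be sketched roughly as below. This is only an illustration under assumptions: the class name ReferenceResolver, the base URL http://myservice/outputs and the local "outputs" directory are hypothetical and are not part of Soaplab or Taverna. The sketch maps a reference under the service's own output URL to a file on disk and falls back to a normal download for anything else.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: decides whether a reference points at this service's own output. */
final class ReferenceResolver {
    private final String ownBase; // public URL prefix of our outputs, always ends with "/"
    private final Path localDir;  // directory on disk that backs ownBase (an assumption)

    ReferenceResolver(String ownBaseUrl, Path localDir) {
        String base = URI.create(ownBaseUrl).normalize().toString();
        this.ownBase = base.endsWith("/") ? base : base + "/";
        this.localDir = localDir.toAbsolutePath().normalize();
    }

    /** Local file behind one of our own URLs, or empty if the URL is foreign. */
    Optional<Path> toLocalPath(String reference) {
        // normalize() collapses "." and ".." segments before the prefix check
        String ref = URI.create(reference).normalize().toString();
        if (!ref.startsWith(ownBase)) {
            return Optional.empty();
        }
        String name = ref.substring(ownBase.length());
        // Accept plain file names only: no sub-paths, queries, fragments or dot-files
        if (!name.matches("[A-Za-z0-9_-][A-Za-z0-9._-]*")) {
            return Optional.empty();
        }
        return Optional.of(localDir.resolve(name));
    }

    /** Opens the data: straight from disk for our own URLs, by download otherwise. */
    InputStream open(String reference) throws IOException {
        Optional<Path> local = toLocalPath(reference);
        if (local.isPresent()) {
            return Files.newInputStream(local.get());
        }
        return URI.create(reference).toURL().openStream();
    }

    public static void main(String[] args) {
        ReferenceResolver r =
            new ReferenceResolver("http://myservice/outputs", Paths.get("outputs"));
        // Own URL: resolves to a path under the local output directory
        System.out.println(r.toLocalPath(
            "http://myservice/outputs/ed73c5d4-717f-4ddd-8263-6791aa85c07c.xml"));
        // Foreign URL: not resolved locally, open() would download it instead
        System.out.println(r.toLocalPath("http://otherservice/outputs/data.xml"));
    }
}
```

The prefix comparison is deliberately strict (exact scheme, host and path prefix), so a URL that merely looks similar is treated as foreign and downloaded like any other, which seems the safer default.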
