The Tool activity does (as you have found out) create a new subfolder for each execution.
Taverna does, however, pass values by reference within the workflow, and the Tool activity uses symlinking to avoid duplicating/copying files when the values come from a previous Tool activity. The command used is normally /bin/ln -s, but this can be customized under Preferences -> Tool Invocation to do hard-linking or copying instead. So if you feed one file output as a file input to the next step, it is, efficiency-wise, the same as if it was in the same folder (while allowing a different filename). This should matter quite a bit for your GB-sized values.

I would generally advise against using a whole folder (unless we are talking hundreds of files, or files whose filenames are undetermined before running), as it basically means data flows outside your workflow, making the workflow harder to understand and maintain. You would also lose the benefits a dataflow system gives, such as the ability to track/inspect those intermediate values, or to redirect them to alternate steps just by rewiring the workflow.

Now, I am not sure why you want to do this; perhaps your tool outputs very many files? It would not be practical to modify the Tool service to declare all those files in advance. Some have used "tar" or "zip" to pass a single archive, but since you talked about GBs I don't think that would be particularly efficient.

The Tool activity can consume a list of values, using Advanced -> File list. This takes a list of values (say from a Taverna iteration over the previous step), stores all of those values within the temporary directory, and writes an ordered list of their filenames to the specified file. There is, however, no equivalent mechanism for output lists or folders.

An obvious workaround is to change your commands to move the outputs to a different folder, e.g. add a few lines:

    mkdir -p ../julian
    mv *.csv ../julian/

And similarly, to copy/link them in for the next step:

    ln -s ../julian/*.csv .
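To make the workaround concrete, here is a sketch of how the two Tool activity scripts interact, assuming a POSIX shell. The `usecase_step1`/`usecase_step2` directories stand in for the per-execution temporary directories Taverna creates, and `../julian` and the `*.csv` pattern are just the placeholder names from the example above:

```shell
#!/bin/sh
# Sketch of the "shared sibling folder" workaround (illustration only).
# Each Tool activity runs in its own temporary directory; both directories
# share the same parent, so "../julian" is visible to both of them.
set -e

base=$(mktemp -d)               # stands in for Taverna's temp area

# --- script of the first Tool activity (runs in its own job directory) ---
mkdir "$base/usecase_step1"
cd "$base/usecase_step1"
echo "a,b,c" > result.csv       # placeholder for the real tool's output
mkdir -p ../julian              # shared folder: persists across runs!
mv *.csv ../julian/

# --- script of the second Tool activity (a different job directory) ---
mkdir "$base/usecase_step2"
cd "$base/usecase_step2"
ln -s ../julian/*.csv .         # link the files in instead of copying GBs
cat result.csv                  # reads "a,b,c" through the symlink
```

Note that the symlinks are relative (`../julian/...`), so they only resolve as long as both directories keep the same parent.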
...but now you would have to add a clean-up step to the beginning (and end) of the workflow, as you have effectively made a global variable that persists across runs, leaving old results hanging around. You would also have to be careful not to run the same workflow (or two derived workflows) at the same time, as they would be stepping on each other's toes.

A cleaner solution, if you just want to pass the working directory, is to add "pwd" to the script. If there is no other output, then the value of STDOUT is the folder, which you can pass as a parameter to the next step. If you have other outputs, then do this via a file, e.g. add the line:

    pwd > directory.txt

and add directory.txt as a File Output "directory". In the next step, connect and add "directory" as a String Replacement input, and insert into its script:

    ln -s %%directory%%/*.csv .

Now your data would not be flowing "outside" the workflow, although the workflow would only be passing a reference to a (unique) directory. Note that the directory is persisted until you delete the workflow run. Note also that these methods obviously make your workflow much more OS-specific.

On 26 November 2013 17:26, stylz2k <[email protected]> wrote:
> On 26.11.2013 14:56, Stian Soiland-Reyes wrote:
>
> I see you have got several pointers to other ways to do it. Personally I
> would go for the Tool activity if the jar in question might otherwise cause
> Taverna to crash.
>
> Have you checked if the OutOfMemory happens Taverna-side (possibly making
> Taverna unstable) or with your subprocess? You are only allowing it 512 MB
> with your command below.
>
> Also try to look at the end of taverna.log; click Advanced -> Show logs and
> folders.
>
> On 25 Nov 2013 20:06, "stylz2k" <[email protected]> wrote:
>>
>> Hi everyone,
>>
>> I'm trying to create a workflow for processing data from next-generation
>> sequencing. So I have big files.
>>
>> Input ~ 15 GB.
>>
>> Basically I'm trying to use a Beanshell script that looks like the
>> following:
>>
>>     String command = "java -Xms256m -Xmx512m -jar /path/to/Programm.jar";
>>     Process proc = null;
>>     Runtime rt = Runtime.getRuntime();
>>     proc = rt.exec(command
>>         + input1
>>         + input2
>>         + input3
>>         + output);
>>     int exitVal = proc.waitFor();
>>
>> When running the workflow I get the following error:
>>
>>     java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>
>> My Linux machine has 16 GB of memory and I have already tried to give
>> Taverna more memory at start-up.
>>
>> On gnome-terminal the command runs without any errors. Are there any
>> suggestions?
>>
>> I'm really thankful for any help on this topic.
>>
>> Best Regards,
>>
>> Julian
>
> Hi,
>
> I'm now going for the external Tool activity, and it works! Thanks to
> everyone who answered this question. You are great!
>
> Unfortunately there are new questions. (Who guessed...)
>
> I successfully created a Tool activity, which runs perfectly.
> Now I want to use the created output files with a second Tool activity (and
> more to come...) in the same workflow.
>
> I discovered that Taverna saves the output files to:
>
>     folder1 = /a/directory/i/can/choose/usecase2041533162701613279dir
>
> In my attempts the second Tool activity creates a new folder and creates a
> softlink to the files from folder1.
> Of course this will work, but is it possible to run a different Tool
> activity in the same folder?
>
> I'm asking because I need the output files from every Tool activity. Is
> there any way of collecting all output files in one folder?
> Or is it possible to execute the complete workflow on folder1?
>
> Thanks for any suggestions.
>
> Best,
>
> Julian
--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718

------------------------------------------------------------------------------
taverna-users mailing list
[email protected]
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/about/contact-us/
