The Tool activity does (as you have found out) create a new subfolder
for each execution.

Taverna does, however, pass values by reference within the workflow, and
the Tool activity uses symlinking to avoid duplicating/copying files
when the values come from a previous Tool activity. The command used is
normally /bin/ln -s, but this can be customized under Preferences ->
Tool Invocation to do hard-linking or copying instead.
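To illustrate what that pass-by-reference looks like on disk, here is a
minimal sketch (the directory names are invented stand-ins, not
Taverna's actual naming scheme):

```shell
# Sketch: pass a value from one job directory to the next by
# symlinking, as the Tool activity does by default with /bin/ln -s.
dirA=$(mktemp -d)   # stands in for the first step's working directory
dirB=$(mktemp -d)   # stands in for the second step's working directory
echo "some data" > "$dirA/result.csv"
ln -s "$dirA/result.csv" "$dirB/input.csv"   # a reference, not a copy
cat "$dirB/input.csv"                        # reads through the link
```

The second step sees an ordinary file, but no GB-sized data is copied.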

So if you feed one step's file output as a file input to the next step,
it is, efficiency-wise, the same as if the file were in the same folder
(just allowing a different filename). This should matter quite a lot
for your GB-sized values.

I would generally advise against passing a whole folder (unless we are
talking hundreds of files, or files whose filenames are undetermined
before running), as it would basically mean that data is flowing
outside your workflow, making the workflow harder to understand and
maintain.

You would also lose the benefits a dataflow system gives you, such as
the ability to track/inspect those intermediate values, or to redirect
them to alternate steps by simply rewiring the workflow.


Now, I am not sure why you want to do this; perhaps your tool outputs a
great many files? It would not be practical to modify the Tool service
to declare all those files in advance. Some have used "tar" or "zip" to
pass a single archive, but since you talked about GBs I don't think
that would be particularly efficient.


The Tool activity can consume a list of values, using Advanced -> File
list. This takes a list of values (say, from a Taverna iteration over
the previous step), stores each of those values within the temporary
directory, and writes an ordered list of their filenames to the
specified file. There is, however, no equivalent mechanism for output
lists or folders.
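As a rough sketch of what a script ends up seeing from that File list
mechanism (the item and list filenames here are invented, not Taverna's
actual naming):

```shell
# Sketch: each list item becomes a file in the job directory, and an
# ordered list of their filenames is written to an agreed-upon file.
dir=$(mktemp -d); cd "$dir"
printf 'first\n'  > item0.txt
printf 'second\n' > item1.txt
printf '%s\n' item0.txt item1.txt > filelist.txt  # what Taverna writes
# The tool script can then process the inputs in list order:
combined=$(while read -r f; do cat "$f"; done < filelist.txt)
echo "$combined"
```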


An obvious workaround is to change your commands to move the outputs
into a different folder, e.g. add a few lines:

  mkdir -p ../julian
  mv *.csv ../julian/

and similarly, to copy/link them in for the next step:

  ln -s ../julian/*.csv .

But now you would have to add a clean-up step at the beginning (and
end) of the workflow, as you have effectively made a global variable
that persists across runs, leaving old results hanging around.

You would also have to be careful not to run the same workflow (or two
derived workflows) at the same time, as they would be stepping on each
other's toes.
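Such a clean-up step could be as simple as the following sketch (the
folder name is taken from the example above; the initial cd into a
scratch area is only there to make the sketch self-contained):

```shell
# Sketch: reset the shared folder so results from a previous run do
# not leak into this one; run as the workflow's first (or last) step.
cd "$(mktemp -d)"
mkdir -p ../julian && touch ../julian/stale.csv  # pretend old results
rm -rf ../julian       # discard anything from previous runs
mkdir -p ../julian     # start with an empty shared folder
```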


A cleaner solution, if you just want to pass the working directory, is
to add "pwd" to the script. If there is no other output, then the value
of STDOUT will be the folder, which you can pass as a parameter to the
next step. If you have other outputs, then do this via a file, e.g. add
the line:

  pwd > directory.txt

and add directory.txt as a File Output "directory".

And in the next step, connect "directory" as a String Replacement
input and insert into its script:

  ln -s %%directory%%/*.csv .


Now your data would not be flowing "outside" the workflow, although the
workflow would only be passing a reference to a (unique) directory.
Note that the directory persists until you delete the workflow run.
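Putting the pieces together, an end-to-end sketch of the two steps
(mktemp stands in for Taverna's per-step job directories, and reading
directory.txt back simulates the %%directory%% substitution that
Taverna would do for you):

```shell
# Step 1: the producing tool writes its output and records its folder.
step1=$(mktemp -d)
cd "$step1"
echo "1,2,3" > out.csv      # the tool's real output
pwd > directory.txt         # declared as File Output "directory"

# Step 2: Taverna would replace %%directory%% with step 1's folder;
# here we read it from the file to simulate that substitution.
directory=$(cat "$step1/directory.txt")
step2=$(mktemp -d)
cd "$step2"
ln -s "$directory"/*.csv .  # link the CSVs in without copying
cat out.csv
```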

Note that these methods obviously make your workflow much more OS-specific.

On 26 November 2013 17:26, stylz2k <[email protected]> wrote:
> Am 26.11.2013 14:56, schrieb Stian Soiland-Reyes:
>
> I see you have got several pointers to other ways to do it. Personally I
> would go for the Tool activity if the jar in question might otherwise cause
> Taverna to crash.
>
> Have you checked if the OutOfMemoryError happens Taverna-side (possibly
> making Taverna unstable) or in your subprocess? You are only allowing it
> 512 MB with your command below.
>
> Also try to look at the end of taverna.log, click Advanced -> Show logs and
> folders.
>
> On 25 Nov 2013 20:06, "stylz2k" <[email protected]> wrote:
>>
>> Hi everyone,
>>
>> I'm trying to create a workflow for processing data from next generation
>> sequencing. So I have big files.
>>
>> Input ~ 15 GB.
>>
>>
>> Basically I'm trying to use a beanshell script that looks like the
>> following:
>>
>> String command = "java -Xms256m -Xmx512m -jar /path/to/Programm.jar";
>> Process proc = null;
>> Runtime rt = Runtime.getRuntime();
>> proc = rt.exec(command
>>      + input1
>>      + input2
>>      + input3
>>      + output);
>> int exitVal = proc.waitFor();
>>
>> When running the workflow I get the following error:
>>
>> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>
>> My linux machine has 16 gb of memory and I have already tried to give
>> taverna more memory at the start.
>>
>> On gnome-terminal the command runs without any errors. Are there any
>> suggestions?
>>
>> I'm really thankful for any help on this topic.
>>
>> Best Regards,
>>
>> Julian
>>
>>
>>
> Hi,
>
> I'm now going for the external Tool activity, and it works! Thanks to
> everyone who answered this question. You are great!
>
> Unfortunately there are new questions. (Who guessed...)
>
> I successfully created a tool activity, which runs perfectly.
> Now I want to use the created output files with a second tool activity (and
> more to come...) in the same workflow.
>
> I discovered that Taverna saves the output files to:
>
> folder1 =/a/directoy/i/can/choose/usecase2041533162701613279dir
>
> In my attempts the second tool activity creates a new folder and creates a
> softlink to the files from folder1.
> Of course this will work but is it possible to run a different tool activity
> in the same folder?
>
> I'm asking because I need the output files from every tool activity. Is
> there any way of collecting all output files in one folder?
> Or is it possible to execute the complete workflow on folder1 ?
>
> Thanks for any suggestions.
>
>
>
>
> Best,
>
> Julian
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
> Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
> _______________________________________________
> taverna-users mailing list
> [email protected]
> [email protected]
> Web site: http://www.taverna.org.uk
> Mailing lists: http://www.taverna.org.uk/about/contact-us/
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
