[galaxy-dev] Workflows and unknown number of output datasets

Louise-Amélie Schmitt Fri, 27 May 2011 01:28:54 -0700

Hi everyone

First, let me thank the whole Galaxy Team: the conference was really great.

Kanwei asked me to send the tool I told him about. It's one of these tools for which you can't know the exact number of output datasets before the tool runs. Here are the files. It's really simple actually, but it would be nice if I could integrate it in workflows without it having to be the final step. Would it even be possible?

And please ignore the idiotic comments in the files, I was really tired that day.


Cheers,
L-A

# -*- coding: UTF-8 -*-

import os, sys, string

mpxdata         = sys.argv[1]
barcodes        = sys.argv[2]
output1         = sys.argv[3]
output1id       = sys.argv[4]
newfilepath     = sys.argv[5]

# Building the command line
cmd = "java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar"
cmd+= " deBarcoding F1="
cmd+= mpxdata
cmd+= " BL="
cmd+= barcodes
cmd+= " DR=\""
cmd+= newfilepath
cmd+= "\""

# Executing deBarcoding
status = os.system(cmd)

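# In the unlikely event of a fire, please use the nearest emergency exit
if status != 0:
        # On Unix, os.system returns an encoded wait status (exit code 1
        # becomes 256), which sys.exit would truncate to 0. Exiting with a
        # message writes it to stderr and returns exit code 1 instead.
        sys.exit("Demultiplexing failed.")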

oldnames=[]

# Reconstructing the output file names as deBarcoding writes them
bc = open(barcodes, "r")
for l in bc:
        l = l.split()
        # split() returns an empty list for blank lines, so test the list
        # itself instead of indexing it
        if l:
                oldnames.append(l[0] + ".txt")
bc.close()

newnames=[]
foundnames=[]

# Creating the required paths for multiple outputs.
# Only the files deBarcoding actually wrote are kept, so that old and new
# names stay paired even when a barcode produced no file.
if os.path.isdir(newfilepath):
        for f in oldnames:
                if os.path.isfile(os.path.join(newfilepath, f)):
                        name = os.path.splitext(f)[0]
                        s = "primary_"
                        s+= output1id
                        s+= "_"
                        s+= name.replace("_", "-")
                        s+= "_visible_fastq"
                        foundnames.append(os.path.join(newfilepath, f))
                        newnames.append(os.path.join(newfilepath, s))

# Nothing to hand over to Galaxy if deBarcoding wrote no file at all
if not newnames:
        sys.exit("Demultiplexing produced no output files.")

# Setting the first file as the mandatory output file defined in the xml
newnames[0] = output1

# Moving everything where it will be seen properly by Galaxy
for old, new in zip(foundnames, newnames):
        os.rename(old, new)

# Ta-da!
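
For readers skimming the script above, here is a minimal sketch of the file name it builds for each extra output: primary_<output id>_<sample name>_visible_<extension>, with underscores in the sample name turned into dashes. The helper name galaxy_output_name and the example values are hypothetical and not part of the posted tool. The pattern itself is taken from the script.

```python
def galaxy_output_name(output_id, sample, ext="fastq"):
    """Build the file name the script gives to each extra output.

    Hypothetical helper, not part of the posted tool. Underscores in the
    sample name are replaced by dashes, as the script does, because the
    underscore separates the fields of the file name.
    """
    designation = sample.replace("_", "-")
    return "primary_%s_%s_visible_%s" % (output_id, designation, ext)


# Made-up dataset id and sample name, for illustration only
print(galaxy_output_name("42", "sample_1"))
```

In the script, each file named this way is moved into the directory passed as $__new_file_path__, except the first one, which takes the place of the declared output.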

<tool id="debarcoding" name="Demultiplexer">
	<description>Demultiplexes multiplexed data (who would have guessed?)</description>
	<command interpreter="python">debarcoding.py $mpxdata $barcodes $output1 $output1.id $__new_file_path__</command>
	<inputs>
		<param type="data" format="gz" name="mpxdata" label="Compressed Sequence"/>
		<param type="data" format="bc" name="barcodes" label="Barcode Set"/>

	</inputs>
	<outputs>
		<data format="fastq" name="output1" metadata_source="mpxdata" />
	</outputs>

	<help>
**Program:** debarcoding.py (v1.0.0)

**Author:**  This is a wrapper for Wave's deBarcoding java tool

**Summary:** This tool demultiplexes data according to a list of barcodes containing a column with the sample name and a second column with the barcode sequence.

**Usage:**   Here is an example of the java tool's usage::

 Two failed PE runs (multiplex 2 and multiplex8) yielded SE data that can be used for assembly together with the corrected PE ones. To this end, the deBarcoding script from wave was used:

 javasol deBarcoding

 where javasol is an alias to:

 java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar

 The command lines were (thanks to Wave for the clarifications):

 For the mplex num. 3, some pre-processing was necessary
 
 R  src/getBarcodeSequencing.R
 javasol deBarcoding F1=s_1_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane1"
 javasol deBarcoding F1=s_2_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane2"
 javasol deBarcoding F1=s_3_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane3"
 cat seq_lane1/111.txt seq_lane2/111.txt seq_lane3/111.txt >sequences/111.txt
 cat seq_lane1/112.txt seq_lane2/112.txt seq_lane3/112.txt >sequences/112.txt
 cat seq_lane1/1251.txt seq_lane2/1251.txt seq_lane3/1251.txt >sequences/1251.txt
 cat seq_lane1/1303.txt seq_lane2/1303.txt seq_lane3/1303.txt >sequences/1303.txt
 cat seq_lane1/93ep.txt seq_lane2/93ep.txt seq_lane3/93ep.txt >sequences/93ep.txt
 rename ".txt" "_s_sequence.txt" *.txt
 gzip *.txt

        </help>

</tool>
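
The help text describes the barcode list as one column with the sample name and a second column with the barcode sequence, and the usage example shows one <sample>.txt file per sample in the output directory. Under those assumptions, the set of files to expect can be sketched as follows. The helper name expected_outputs and the barcode sequences in the demo data are made up for illustration.

```python
import os


def expected_outputs(barcode_lines, out_dir):
    """Return the paths deBarcoding is expected to write, one per sample.

    Hypothetical helper: assumes whitespace-separated lines with the
    sample name in the first column, and one '<sample>.txt' per sample.
    """
    paths = []
    for line in barcode_lines:
        fields = line.split()
        if fields:  # skip blank lines
            paths.append(os.path.join(out_dir, fields[0] + ".txt"))
    return paths


# Made-up barcode list: sample names from the usage example, invented sequences
demo = ["111 ACGT\n", "\n", "93ep TTAG\n"]
print(expected_outputs(demo, "seq_lane1"))
```

The number of paths is only known once the barcode list has been read, which is why the tool cannot declare its outputs up front in the XML.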

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
