Hi everyone,

First, let me thank the whole Galaxy Team; the conference was really great.
Kanwei asked me to send the tool I told him about. It's one of those tools where you can't know the exact number of output datasets before the tool runs. Here are the files. It's really simple, actually, but it would be nice if I could integrate it into workflows without it having to be the final step. Would that even be possible?
And please ignore the idiotic comments in the files, I was really tired that day.
Cheers, L-A
# -*- coding: UTF-8 -*-
import os, sys, string

mpxdata = sys.argv[1]
barcodes = sys.argv[2]
output1 = sys.argv[3]
output1id = sys.argv[4]
newfilepath = sys.argv[5]

# Building the command line
cmd = "java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar"
cmd+= " deBarcoding F1="
cmd+= mpxdata
cmd+= " BL="
cmd+= barcodes
cmd+= " DR=\""
cmd+= newfilepath
cmd+= "\""

# Executing deBarcoding
status = os.system(cmd)

# In the unlikely event of a fire, please use the nearest emergency exit
if status != 0:
    print "Demultiplexing failed."
    sys.exit(status)

oldnames=[]

# Reconstructing the output file names as deBarcoding writes them
bc = open(barcodes, "r")
for l in bc.readlines():
    l = l.split()
    if l[0] != "":
        oldnames.append(l[0])

for i in range(len(oldnames)):
    oldnames[i] = oldnames[i] + ".txt"

newnames=[]

# Creating the required paths for multiple outputs
if os.path.isdir(newfilepath):
    for f in oldnames:
        if os.path.isfile(newfilepath+"/"+f):
            name = os.path.splitext(f)[0]
            s = "primary_"
            s+= output1id
            s+= "_"
            s+= string.replace(name, "_", "-")
            s+= "_visible_fastq"
            newnames.append(newfilepath+"/"+s)

# Adding the appropriate prefixes to the old filenames
for i in range(len(oldnames)):
    oldnames[i] = newfilepath+"/"+oldnames[i]

# Setting the first file as the mandatory output file defined in the xml
newnames[0] = output1

# Moving everything where it will be seen properly by Galaxy
for i in range(len(oldnames)):
    os.rename(oldnames[i],newnames[i])

# Ta-da!
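One caveat about the renaming step in the script: it pairs oldnames and newnames by index, so the two lists would fall out of step if deBarcoding writes no file for one of the barcodes, and a blank line in the barcode list would make l[0] fail. Below is a minimal sketch of a more defensive pairing, assuming the same primary_<id>_<name>_visible_fastq naming as above. The helper names read_sample_names and plan_renames are made up for illustration and are not part of the tool.

```python
import os


def read_sample_names(barcode_lines):
    """Return the first column of every non-blank line of a barcode list."""
    names = []
    for line in barcode_lines:
        fields = line.split()
        if fields:  # skip blank lines instead of failing on fields[0]
            names.append(fields[0])
    return names


def plan_renames(sample_names, out_dir, output1, output1_id):
    """Pair each file that actually exists in out_dir with its Galaxy name.

    Returns a list of (old_path, new_path) tuples. The first existing file
    is mapped to output1, the single output declared in the tool XML.
    """
    pairs = []
    for name in sample_names:
        old_path = os.path.join(out_dir, name + ".txt")
        if not os.path.isfile(old_path):
            continue  # nothing was written for this barcode
        if not pairs:
            new_path = output1
        else:
            # Same naming scheme as the script: underscores become dashes.
            label = name.replace("_", "-")
            new_name = "primary_%s_%s_visible_fastq" % (output1_id, label)
            new_path = os.path.join(out_dir, new_name)
        pairs.append((old_path, new_path))
    return pairs
```

The caller would then loop over the returned pairs and call os.rename on each one, so a missing file is skipped rather than shifting every later name.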
<tool id="debarcoding" name="Demultiplexer">
  <description>Demultiplexes multiplexed data (who would have guessed?)</description>
  <command interpreter="python">debarcoding.py $mpxdata $barcodes $output1 $output1.id $__new_file_path__</command>
  <inputs>
    <param type="data" format="gz" name="mpxdata" label="Compressed Sequence"/>
    <param type="data" format="bc" name="barcodes" label="Barcode Set"/>
  </inputs>
  <outputs>
    <data format="fastq" name="output1" metadata_source="mpxdata" />
  </outputs>
  <help>
**Program:** debarcoding.py (v1.0.0)

**Author:** This is a wrapper for Wave's deBarcoding Java tool

**Summary:** This tool demultiplexes data according to a list of barcodes containing a column with the sample name and a second column with the barcode sequence.

**Usage:** Here is an example of the Java tool's usage::

    Two failed PE runs (multiplex 2 and multiplex 8) yielded SE data that can be used for assembly together with the corrected PE ones. To this end, the deBarcoding script from Wave was used:

    javasol deBarcoding

    where javasol is an alias to:

    java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar

    The command lines were (thanks to Wave for the clarifications):

    For the mplex num. 3, some pre-processing was necessary

    R src/getBarcodeSequencing.R

    javasol deBarcoding F1=s_1_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane1"
    javasol deBarcoding F1=s_2_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane2"
    javasol deBarcoding F1=s_3_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR="seq_lane3"

    cat seq_lane1/111.txt seq_lane2/111.txt seq_lane3/111.txt >sequences/111.txt
    cat seq_lane1/112.txt seq_lane2/112.txt seq_lane3/112.txt >sequences/112.txt
    cat seq_lane1/1251.txt seq_lane2/1251.txt seq_lane3/1251.txt >sequences/1251.txt
    cat seq_lane1/1303.txt seq_lane2/1303.txt seq_lane3/1303.txt >sequences/1303.txt
    cat seq_lane1/93ep.txt seq_lane2/93ep.txt seq_lane3/93ep.txt >sequences/93ep.txt

    rename ".txt" "_s_sequence.txt" *.txt
    gzip *.txt
  </help>
</tool>
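The cat lines in the help text all follow one pattern: for each sample, the per-lane files are concatenated in lane order into one file. A rough Python equivalent is sketched below, assuming every lane directory holds a <sample>.txt file. The helper merge_lanes is hypothetical and not part of the wrapper or of deBarcoding.

```python
import os
import shutil


def merge_lanes(lane_dirs, sample, dest_dir):
    """Concatenate <lane_dir>/<sample>.txt from each lane into dest_dir.

    Lanes are appended in the order given, like `cat lane1/x lane2/x > dest/x`.
    Returns the path of the merged file.
    """
    dest_path = os.path.join(dest_dir, sample + ".txt")
    with open(dest_path, "wb") as dest:
        for lane_dir in lane_dirs:
            src_path = os.path.join(lane_dir, sample + ".txt")
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dest)  # stream, so large files are fine
    return dest_path
```

With the directories from the help text, merge_lanes(["seq_lane1", "seq_lane2", "seq_lane3"], "111", "sequences") would correspond to the first cat line.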