[galaxy-dev] stdout and stderr while using pbs

2011-05-27 Thread shashi shekhar
Hi All,

I am running Galaxy jobs through PBS, and the jobs write to both stderr and
stdout. How can I handle this situation? Is there a way to check the standard
error before anything is displayed in the browser?
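
To make it concrete, this is roughly the kind of wrapper I have in mind (only a
sketch; the command, the option names and the file names are made up): capture
stderr to a file and only report it when the command really fails.

    #!/usr/bin/env python
    # Rough sketch only: run the real command, send its stderr to a log file,
    # and only pass the captured stderr on when the exit code is non-zero, so
    # harmless warnings do not make the job look failed.  The command and the
    # file names here are placeholders, not any real tool.
    import subprocess, sys

    err_log = open("tool_stderr.txt", "w")
    status = subprocess.call(["my_tool", "--input", "in.dat", "--output", "out.dat"],
                             stderr=err_log)
    err_log.close()

    if status != 0:
        sys.stderr.write(open("tool_stderr.txt").read())
        sys.exit(status)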



Regards
shashi shekhar
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] GATK integration

2011-05-27 Thread Jan Haas
Hey,

I know the GATK tools are considered to be in alpha status, but maybe someone
can help me. First, is there documentation on which dependencies are needed and
where to put them (just to confirm I've done it correctly)?
Second, when I run the Unified Genotyper, I get the following error:

[Wed May 25 15:57:04 CEST 2011] net.sf.picard.sam.CreateSequenceDictionary 
REFERENCE=/var/folders/gr/grLYUw45FFS4KKMXzK5R6TI/-Tmp-/tmp7m3sg0/gatk_input.fasta
OUTPUT=/var/folders/gr/grLYUw45FFS4KKMXzK5R6TI/-Tmp-/tmp7m3sg0/dict6511818973877179087.tmp
TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 
TMP_DIR=/var/folders/gr/grLYUw45FFS4KKMXzK5R6TI/-Tmp-/ngs2 VERBOSITY=INFO 
QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 
MAX_RECORDS_IN_RAM=50 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed May 25 15:57:04 CEST 2011] net.sf.picard.sam.CreateSequenceDictionary done.
Runtime.totalMemory()=129957888
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at net.sf.picard.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:178)
    at net.sf.picard.reference.IndexedFastaSequenceFile.getSequence(IndexedFastaSequenceFile.java:157)
    at net.sf.picard.reference.IndexedFastaSequenceFile.nextSequence(IndexedFastaSequenceFile.java:234)
    at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:133)
    at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:165)
    at org.broadinstitute.sting.gatk.datasources.simpleDataSources.ReferenceDataSource.<init>(ReferenceDataSource.java:131)
    at org.broadinstitute.sting.gatk.AbstractGenomeAnalysisEngine.openReferenceSequenceFile(AbstractGenomeAnalysisEngine.java:577)
    at org.broadinstitute.sting.gatk.AbstractGenomeAnalysisEngine.initializeDataSources(AbstractGenomeAnalysisEngine.java:318)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:90)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:97)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:244)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:87)


I assume that increasing the Java heap size will solve the problem, but how can
I do that in Galaxy?
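
To make the question more concrete, this is the kind of change I imagine in the
tool's wrapper script (only a sketch; the environment variable, the 4g default
and the jar path are placeholders I made up, not real Galaxy settings):

    # Sketch only: give the GATK java call an explicit heap limit instead of
    # relying on the JVM default.  GATK_JAVA_HEAP, the jar path and the 4g
    # value are invented for this example.
    import os

    heap = os.environ.get("GATK_JAVA_HEAP", "4g")
    cmd = "java -Xmx%s -jar /path/to/GenomeAnalysisTK.jar -T UnifiedGenotyper" % heap
    print(cmd)  # the real wrapper would execute this command instead of printing it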

Thanks for your help!

Jan





[galaxy-dev] Workflows and unknown number of output datasets

2011-05-27 Thread Louise-Amélie Schmitt

Hi everyone

First, let me thank the whole Galaxy Team; the conference was really great.

Kanwei asked me to send the tool I told him about. It is one of those
tools for which the exact number of output datasets cannot be known
before the tool runs. Here are the files. It is really simple actually, but it
would be nice if I could integrate it into workflows without it having to
be the final step. Would that even be possible?
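
In case it is useful as context: as far as I understand, Galaxy only picks up
the extra files if they follow the primary_* naming pattern, which is what the
script below builds. A tiny made-up example of one such name (the id 42 and the
sample name are invented):

    # Made-up example of one extra-output filename the wrapper constructs;
    # "42" stands for the id of the mandatory output dataset and "sample-A"
    # for a sample name from the barcode list (underscores become dashes).
    print "primary_%s_%s_visible_%s" % (42, "sample-A", "fastq")
    # -> primary_42_sample-A_visible_fastq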


And please ignore the idiotic comments in the files, I was really tired 
that day.


Cheers,
L-A
# -*- coding: UTF-8 -*-

import os, sys, string

mpxdata = sys.argv[1]
barcodes= sys.argv[2]
output1 = sys.argv[3]
output1id   = sys.argv[4]
newfilepath = sys.argv[5]

# Building the command line
cmd = "java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar"
cmd += " deBarcoding F1="
cmd += mpxdata
cmd += " BL="
cmd += barcodes
cmd += " DR=\""
cmd += newfilepath
cmd += "\""

# Executing deBarcoding
status = os.system(cmd)

# In the unlikely event of a fire, please use the nearest emergency exit
if status != 0:
    print "Demultiplexing failed."
    sys.exit(status)

oldnames = []

# Reconstructing the output file names as deBarcoding writes them
bc = open(barcodes, "r")
for l in bc.readlines():
    l = l.split()
    if l and l[0] != "":
        oldnames.append(l[0])
for i in range(len(oldnames)):
    oldnames[i] = oldnames[i] + ".txt"

newnames = []

# Creating the required paths for multiple outputs
if os.path.isdir(newfilepath):
    for f in oldnames:
        if os.path.isfile(newfilepath + "/" + f):
            name = os.path.splitext(f)[0]
            s = "primary_"
            s += output1id
            s += "_"
            s += string.replace(name, "_", "-")
            s += "_visible_fastq"
            newnames.append(newfilepath + "/" + s)

# Adding the appropriate prefixes to the old filenames
for i in range(len(oldnames)):
    oldnames[i] = newfilepath + "/" + oldnames[i]

# Setting the first file as the mandatory output file defined in the xml
newnames[0] = output1

# Moving everything where it will be seen properly by Galaxy
for i in range(len(oldnames)):
    os.rename(oldnames[i], newnames[i])

# Ta-da!
<tool id="debarcoding" name="Demultiplexer">
	<description>Demultiplexes multiplexed data (who would have guessed?)</description>
	<command interpreter="python">debarcoding.py $mpxdata $barcodes $output1 $output1.id $__new_file_path__</command>
	<inputs>
		<param type="data" format="gz" name="mpxdata" label="Compressed Sequence"/>
		<param type="data" format="bc" name="barcodes" label="Barcode Set"/>

	</inputs>
	<outputs>
		<data format="fastq" name="output1" metadata_source="mpxdata" />
	</outputs>

	<help>
**Program:** debarcoding.py (v1.0.0)

**Author:**  This is a wrapper for Wave's deBarcoding java tool

**Summary:** This tool demultiplexes data according to a list of barcodes containing a column with the sample name and a second column with the barcode sequence.

**Usage:**   Here is an example of the java tool's usage::

 Two failed PE runs (multiplex 2 and multiplex 8) yielded SE data that can be used for assembly together with the corrected PE ones. To this end, the deBarcoding script from Wave was used:

 javasol deBarcoding

 where javasol is an alias to:

 java -cp /g/steinmetz/projects/solexa/java/solexaJ/bin/:/g/steinmetz/projects/solexa/software/picard/trunk/dist/picard-1.18.jar:/g/steinmetz/projects/solexa/software/picard/trunk/dist/sam-1.18.jar

 The command lines were (thanks to Wave for the clarifications):

 For the mplex num. 3, some pre-processing was necessary:

 R  src/getBarcodeSequencing.R
 javasol deBarcoding F1=s_1_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR=seq_lane1
 javasol deBarcoding F1=s_2_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR=seq_lane2
 javasol deBarcoding F1=s_3_LESAFFRE_sequence.txt.gz BL=barcodeList.txt DR=seq_lane3
 cat seq_lane1/111.txt seq_lane2/111.txt seq_lane3/111.txt > sequences/111.txt
 cat seq_lane1/112.txt seq_lane2/112.txt seq_lane3/112.txt > sequences/112.txt
 cat seq_lane1/1251.txt seq_lane2/1251.txt seq_lane3/1251.txt > sequences/1251.txt
 cat seq_lane1/1303.txt seq_lane2/1303.txt seq_lane3/1303.txt > sequences/1303.txt
 cat seq_lane1/93ep.txt seq_lane2/93ep.txt seq_lane3/93ep.txt > sequences/93ep.txt
 rename .txt _s_sequence.txt *.txt
 gzip *.txt

</help>

</tool>

Re: [galaxy-dev] Filter data and cut column bugs

2011-05-27 Thread Peter Cock
On Mon, May 23, 2011 at 3:34 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
> On Mon, May 23, 2011 at 3:09 PM, Anton Nekrutenko an...@bx.psu.edu wrote:
>> Dear Peter:
>>
>> Yes, that would help.
>
> One possibility would be to have all new bugs CC'd to the dev
> mailing list? Not sure if everyone here would like that or not...
>
>> But, your patches are definitely not unnoticed. We'll apply them
>> (likely at the conference) ...

Kanwei has just applied the fix for issues 535 and 537, thanks!

https://bitbucket.org/galaxy/galaxy-central/issue/535/
Filter data on any column tool complains about hash comment lines

https://bitbucket.org/galaxy/galaxy-central/issue/537/
Filter data on any column tool casts unused columns

That leaves these two pending,

https://bitbucket.org/galaxy/galaxy-central/issue/534/
Cut column tool messes up # header lines

https://bitbucket.org/galaxy/galaxy-central/issue/536/
Filter data on any column tool shows nonsense % vs total on stdout

While I'm looking over minor bugs where I submitted a
patch, the following should also be fairly quick to review:

FASTQ to Tabular tool doesn't accept plain FASTQ
https://bitbucket.org/galaxy/galaxy-central/issue/436/

Conditional does not work with scripts in different path
https://bitbucket.org/galaxy/galaxy-central/issue/159/

Regards,

Peter