[galaxy-user] Problems with large gzipped fasta files

2013-02-05 Thread Jim Robinson

Hi,

I am having a lot of difficulty uploading some large gzipped fastqs (~10 GB) 
to the public server. I have tried both ftp and pulling by http URL. The 
upload succeeds, however I get an error as it tries to gunzip it. I have 
tried more than 10 times now and succeeded once. These files are correct and 
complete, and gunzip properly locally. The error shown is usually this:


empty
format: txt, database: ?
Problem decompressing gzipped data

However on 2 occasions (both ftp uploads) I got the traceback below.   
Am I missing some obvious trick?   I searched the archives and see 
references to problems with large gzipped files but no solutions.
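
For what it's worth, the local check is essentially a full streaming decompress, 
roughly like the sketch below (the file name is only a placeholder):

    # Minimal sketch: stream-decompress a large .gz locally to confirm it is
    # intact before uploading; the read loop raises an error on corrupt data.
    import gzip

    with gzip.open("reads.fastq.gz", "rb") as fh:
        while fh.read(16 * 1024 * 1024):   # read 16 MiB chunks until EOF
            pass
    print("decompressed cleanly")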


Thanks

Jim


Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 384, in <module>
    __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 373, in __main__
    add_file( dataset, registry, json_file, output_path )
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 270, in add_file
    line_count, converted_path = sniff.convert_newlines( dataset.path, in_place=in_place )
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy/datatypes/sniff.py", line 106, in convert_newlines
    shutil.move( temp_name, fname )
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 84, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
IOError: [Errno 5] Input/output error

Re: [galaxy-user] Error / Nebula and RepeatMasker

2013-02-05 Thread Sarah Maman

Hi Alban, Marie-Stephane and Bjoern,


* For the IntersectBed tool, running is OK. I've just deleted the ";" in the xml 
file and your tool runs. My cluster adds one ";", so a double ";;" gives 
an error.


I think that this new error is due to the files tested:

Differing number of VCF fields encountered at line: 38.  Exiting...

* For the RepeatMasker tool, thanks to Marie-Stephane (CONGRATULATIONS 
MARIE-STEPHANE!), we need to make sure the bash commands are wrapped in 
Python calls. So the xml file has been modified in this way:
- Specify perl before the RepeatMasker command.
- Specify the RepeatMasker path.
- Wrap each bash command in a Python call: #os.system("cp $gff_file 
$output_gff;") instead of cp $gff_file $output_gff;
So the code is:

<command>
## The command is a Cheetah template which allows some Python based syntax.
## Lines starting with ## are comments. Galaxy will turn newlines into spaces.

## create temp directory
#import tempfile, os
#set $dirname = os.path.abspath(tempfile.mkdtemp())
#set $input_filename = os.path.split(str($query))[-1]
#set $output_basename = os.path.join($dirname, $input_filename)
perl /usr/local/bioinfo/bin/RepeatMasker -parallel 8 $nolow $noint $norna

#if str($species) != "all":
   $species
#end if
-dir $dirname
#if $adv_opts.adv_opts_selector == "advanced":
   #if str($adv_opts.gc) != "0":
   -gc $adv_opts.gc
   #end if
   $adv_opts.gccalc
   #set $output_files_list = str($adv_opts.output_files).split(',')
   #if "gff" in $output_files_list:
   -gff
   #end if
   #if "html" in $output_files_list:
   -html
   #end if
   $adv_opts.slow_search
   $adv_opts.quick_search
   $adv_opts.rush_search
   $adv_opts.only_alus
   $adv_opts.is_only
#else:
   ## Set defaults
   -gff
## End of advanced options:
#end if
$query
> /dev/null 2> /dev/null;

## Copy the output files to galaxy
#if $adv_opts.adv_opts_selector == "advanced":
   #if "summary" in $output_files_list:
   ## Write out the summary file (default)
   #set $summary_file = $output_basename + '.tbl'
   #os.system("cp $summary_file $output_summary;")
   #end if
   #if "gff" in $output_files_list:
   ## Write out the gff file (default)
   #set $gff_file = $output_basename + '.out.gff'
   #os.system("cp $gff_file $output_gff;")
   #end if
   #if "html" in $output_files_list:
   ## Write out the html file
   #set $html_file = $output_basename + '.out.html'
   #os.system("cp $html_file $output_html;")
   #end if
#else:
   ## Write out the summary file (default)
   #set $summary_file = $output_basename + '.tbl'
   #os.system("cp $summary_file $output_summary;")
   ## Write out the gff file (default)
   #set $gff_file = $output_basename + '.out.gff'
   #os.system("cp $gff_file $output_gff;")
## End of advanced options:
#end if
## Write out mask sequence file
#set $mask_sequence_file = $output_basename + '.masked'
#os.system("cp $mask_sequence_file $output_mask;")
## Write out standard file (default)
## The default '.out' file from RepeatMasker has a 3-line header and spaces rather
## than tabs. Remove the header and replace the whitespace with tabs.
#set $standard_file = $output_basename + '.out'
#os.system("tail -n +4 $standard_file | tr -s ' ' '\t' > $output_std;")
## Delete all temporary files
#os.system("rm $dirname -r;")
</command>


* For the FilterControl tool, our cluster is configured to kill jobs that 
use more than 4 GB of memory.
I haven't managed to modify the qsub options in my Galaxy instance, so I've changed 
the option -Xmx6g to -Xmx4g. Maybe some treatments won't run for lack of memory.
If you have any idea how to add options to Galaxy's qsub, could you please help 
me?
I would like to add these options to qsub: qsub -l mem=6G -l h_vmem=8G


* ChIPMunk: Sorry, but all the ChIPMunk files (xml, pl, sh) didn't have execution 
rights... So I just did chmod a+x on these files and the ChIPMunk tool is OK in my 
Galaxy instance.


Thanks a lot for all your explanations, Alban, Marie-Stephane and Bjoern.

Thanks in advance for your help with qsub and the IntersectBed tool (test file),

Sarah






alermine wrote:

Hi Sarah,

I'll try to debug, point by point:

- Here is the error I get when running the tool FilterControl 
*** glibc detected *** java: double free or corruption (!prev): 0x7fe56800ecd0 ***


I think you have a misconfiguration of the memory of your Java install here 
(relative to the needs of the tool).


If you look at the FilterControlPeaks.sh file, java is called with 
the option -Xmx6g. So your Java install has to be allowed to use 6G 
of memory (by default it's 1024M).


- Here is the error I get when running the tool IntersectBed 


/work/galaxy/database/pbs/galaxy_4129.sh: line 13: Erreur de syntaxe
près du symbole inattendu « ;; »
/work/galaxy/database/pbs/galaxy_4129.sh: line 13: `bedtools intersect
-f 0.05



I don't understand this one. The intersectBed tool is only composed of 
an xml which simply calls bedtools.


Check the command by typing 'bedtools intersect' in a terminal; if it 
sends you back 

Re: [galaxy-user] Error / Nebula and RepeatMasker

2013-02-05 Thread alermine
Hi all,

 * For the FilterControl tool, our cluster is configured to kill jobs that use 
 more than 4 GB of memory.
 I haven't managed to modify the qsub options in my Galaxy instance, so I've 
 changed the option -Xmx6g to -Xmx4g. Maybe some treatments won't run for lack 
 of memory.
The memory that you specify with -Xmx6g is the memory used by Java and NOT the 
memory used for the job launched by Galaxy on the cluster (see 
http://www.auditmypc.com/java-memory-xmx512.asp).
So you have to specify the min and max usable memory in your Java 
configuration.
 If you have any idea how to add options to Galaxy's qsub, could you please 
 help me?
 I would like to add these options to qsub: qsub -l mem=6G -l h_vmem=8G
Try specifying this in universe_wsgi.ini: runner:///queue/-l 
nodes=1:ppn=1,mem=6gb,h_vmem=8gb/
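
For example (just a sketch, assuming the old-style [galaxy:tool_runners] section; 
the tool id and queue name are only illustrative):

    [galaxy:tool_runners]
    # illustrative tool id and queue name; the native qsub options sit in the last URL segment
    filter_control = pbs:///galaxyqueue/-l nodes=1:ppn=1,mem=6gb,h_vmem=8gb/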

If it's not understood by Galaxy, then to add new qsub options you have to make some 
modifications in the source code:

1. Identify your scheduler (pbs, drmaa, sge)
2. Edit the python script which creates jobs for your scheduler:

GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/pbs.py
GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py
GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/sge.py

3. Search the script for the function which parses your scheduler options (e.g. 
for pbs.py, the function is named def determine_pbs_options( self, url ):)

4. Modify the parsing step so that this function understands the h_vmem option (see the sketch below)
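
As a rough illustration of the parsing in steps 3 and 4 (a simplified, hypothetical 
sketch, not the actual determine_pbs_options code), the runner URL can be split into 
server, queue and native options like this:

    # Simplified, hypothetical sketch of pulling the native options out of a
    # runner URL of the form pbs://<server>/<queue>/<native options>/
    # -- not the real Galaxy code.
    def parse_pbs_url(url):
        body = url[len("pbs://"):]
        server, queue, native = (body.split("/", 2) + ["", ""])[:3]
        return server, queue, native.rstrip("/")

    server, queue, native = parse_pbs_url("pbs:///galaxyqueue/-l mem=6gb,h_vmem=8gb/")
    # server == "", queue == "galaxyqueue", native == "-l mem=6gb,h_vmem=8gb"
    # the native string is then what ends up as extra qsub resource requests (-l ...)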

Hope this helps you,

++,

Alban

--
Alban Lermine
Unité 900: INSERM - Mines ParisTech - Institut Curie
 Bioinformatics and Computational Systems Biology of Cancer
11-13 rue Pierre et Marie Curie (1er étage) - 75005 Paris - France
Tel: +33 (0) 1 56 24 69 84




Re: [galaxy-user] Problems with large gzipped fasta files

2013-02-05 Thread Jennifer Jackson

Hi Jim,

Your message was misthreaded (perhaps a reply to another thread, with 
just the subject line changed?), but I was able to dig it out.


At this time, there are no known issues with FTP upload to the public 
Main server. Any issues you found previously were either related to 
a problem with the original file content (a compression problem) or a 
transitory issue with the FTP server that has since been resolved (there 
have been a handful in the last few years).


The instructions to follow are here:
http://wiki.galaxyproject.org/FTPUpload

I am not exactly sure what your issue is, but is there any chance that you have 
more than one file per archive? That will certainly cause an issue, though 
usually just the first file loads and the remainder do not.
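
If in doubt, one quick local check (just a sketch; the file name is a placeholder) 
is to see whether the .gz is actually a multi-file tar archive rather than a single 
compressed fastq:

    # Sketch: report whether the upload is a tar archive with several members
    # or a plain single-file gzip stream; the file name is a placeholder.
    import tarfile

    path = "reads.fastq.gz"
    if tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as tf:
            names = tf.getnames()
        print("tar archive with %d member(s): %s" % (len(names), names[:5]))
    else:
        print("plain gzip stream (one compressed file)")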


Please send more details if this continues. Does the failure occur at 
the FTP stage or at the point where you move from the FTP holding area 
into a history?


Thanks!

Jen
Galaxy team





--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org