Re: [galaxy-user] FASTQ to FASTQSanger using Groomer question

2011-03-22 Thread Eric Cabot
Illumina's technical support team told me two weeks ago that CASAVA 1.8 
will not be released for at least six weeks. That makes it at least a 
month from now.


Do you know anyone outside the company that has used it yet? Beyond the 
moaning within the cited seqanswers thread, I'd be interested in hearing 
any first-hand impressions.


Eric


Peter Cock wrote:

On Mon, Mar 21, 2011 at 1:42 PM, David K Crossman dkcro...@uab.edu wrote:

Hello!



I am fairly new to using Galaxy and have a question about
the FASTQ Groomer feature.  I have 4 RNA-Seq raw data files that were just
recently generated from Illumina’s NGS instruments.


Very recently? If they are already using Illumina's CASAVA v1.8 pipeline
then the FASTQ files will already be in the Sanger FASTQ format:
http://seqanswers.com/forums/showthread.php?t=8895

Peter

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/



Re: [galaxy-user] FASTQ to FASTQSanger using Groomer question

2011-03-21 Thread Daniel Blankenberg
Hi David,

Your files appear to be of the Sanger FASTQ variant. As you have noticed, the 
info blurb produced by the Groomer tool should be used to confirm the input 
type. While the 'Illumina 1.3+' FASTQ format does encode scores using a 
different ASCII range, it is my understanding that the scripts provided by the 
manufacturer to create FASTQ files were updated to write out Sanger-encoded 
quality scores.

The correct Grooming path for your data is Sanger -> Sanger.  Please let us 
know if we can provide further assistance.
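The check described above can be sketched as a simple range test: a variant is plausible only if every observed quality character falls inside that variant's ASCII range. This is a minimal Python sketch, assuming the full ranges from the Quality Score Comparison legend (Sanger 33 to 126, Solexa 59 to 126, Illumina 1.3+ 64 to 126). It looks only at quality characters, and `plausible_variants` is a hypothetical helper, not Galaxy's own code.

```python
# Full ASCII ranges per FASTQ variant, from the Quality Score Comparison
# legend (assumed here; real tools may also inspect the sequence lines).
VARIANT_ASCII_RANGES = {
    "sanger": (33, 126),        # Phred+33, scores 0..93
    "solexa": (59, 126),        # Solexa+64, scores -5..62
    "illumina1.3+": (64, 126),  # Phred+64, scores 0..62
}


def plausible_variants(min_ascii: int, max_ascii: int) -> list[str]:
    """Return variants whose ASCII range contains [min_ascii, max_ascii]."""
    return [
        name
        for name, (lo, hi) in VARIANT_ASCII_RANGES.items()
        if lo <= min_ascii and max_ascii <= hi
    ]


# The range Groomer reported for tn-read1: '#'(35) to 'I'(73).
print(plausible_variants(ord("#"), ord("I")))  # ['sanger']
```

Note that a range test can only rule variants out: a file whose characters all sit between 64 and 104 would be consistent with all three, so an ambiguous result still needs other evidence.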

Thanks for using Galaxy,

Dan


On Mar 21, 2011, at 9:42 AM, David K Crossman wrote:

 Hello!
  
 I am fairly new to using Galaxy and have a question about the 
 FASTQ Groomer feature.  I have 4 RNA-Seq raw data files that were just 
 recently generated from Illumina’s NGS instruments.  I am aware that the 
 first step to perform in Galaxy is FASTQ Groomer to convert the format to 
 FASTQ Sanger.  I presume that I would choose Illumina 1.3+ in the “Input 
 FASTQ quality scores type” box.  However, if I look at the raw data reads, I 
 notice that Line 4 (which encodes the quality values for sequence in Line 2) 
 has values outside of the Illumina 1.3+ range (some of them fall only in the 
 Sanger range).  I am enclosing the Quality Score Comparison figure along with 
 some of the raw RNA-Seq data:
 Quality Score Comparison (ASCII codes 33 to 126, '!' to '~'; tick marks at 33, 59, 64, 73, 104, 126)
 S - Sanger        Phred+33,  93 values (0, 93)  (0 to 60 expected in raw reads)
 I - Illumina 1.3  Phred+64,  62 values (0, 62)  (0 to 40 expected in raw reads)
 X - Solexa        Solexa+64, 67 values (-5, 62) (-5 to 40 expected in raw reads)
 Diagram adapted from http://en.wikipedia.org/wiki/FASTQ_format
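The legend boils down to one subtraction per character: the score is the ASCII code minus the variant's offset. A minimal Python sketch, assuming the offsets listed in the legend; `decode_quality` is a hypothetical name used only for illustration.

```python
# ASCII offsets from the legend. Sanger is Phred+33 and Illumina 1.3+ is
# Phred+64; Solexa also uses offset 64, but its scores are on the Solexa
# (odds-based) scale, not the Phred scale.
OFFSETS = {"sanger": 33, "illumina1.3+": 64, "solexa": 64}


def decode_quality(qual: str, variant: str) -> list[int]:
    """Turn a FASTQ quality line into integer scores for the given variant."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in qual]


# The same characters mean very different scores under each encoding.
print(decode_quality("#5I", "sanger"))        # [2, 20, 40]
print(decode_quality("#5I", "illumina1.3+"))  # [-29, -11, 9]
```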
 RNA-Seq raw data
 @HWI-ST156_294:7:1:1058:2165:0/1
 CACCAACTCACAGCCACTCCGTGAGGCCAGCAAGGCAAGAACATTCATCTC
 +
 FGGHHHGFHHFHHEGHCGGGEB.EE9D?DD4FFFCBB/.C=D
  
 @HWI-ST156_294:7:1:1184:2191:0/1
 CGTAAATCCATGTCTGACTTCTGGATAGCAAACACCAGCACCGCGTGGATG
 +
 EE;E=ECEEBE@=GBFGF/GFFCFA;:@8AEABBA#
  
 @HWI-ST156_294:7:1:1018:2200:0/1
 NCTGATTAAGGATAATGAGTAGTAGAACTAATGATGTTATTCCTTGG
 +
 ###
  
 @HWI-ST156_294:7:1:1225:2217:0/1
 GTGACTACACAAAGCACCCTTCTAAACCAGACCATTCTGGAGAATGA
 +
 FFCEFFFE?FEBDC?987::,3:-9145,DA:C9;+?
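Groomer's "Input ASCII range" is essentially the smallest and largest character seen on the quality lines. A minimal Python sketch of that scan, assuming unwrapped four-line records like the ones above; `quality_ascii_range` is a hypothetical helper, not Galaxy's implementation.

```python
import io
from typing import TextIO


def quality_ascii_range(handle: TextIO) -> tuple[int, int]:
    """Return the (min, max) ASCII codes seen on all quality lines.

    Assumes unwrapped four-line FASTQ records (header, sequence, '+',
    quality); blank lines are skipped.
    """
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    if not lines or len(lines) % 4 != 0:
        raise ValueError("expected one or more complete four-line records")
    lo, hi = 255, 0
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record #{i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        for ch in qual:
            code = ord(ch)
            lo, hi = min(lo, code), max(hi, code)
    return lo, hi


# Hypothetical one-read example (not taken from the data above).
example = io.StringIO("@read1\nACGT\n+\n#5AI\n")
print(quality_ascii_range(example))  # (35, 73)
```

The length check is there because a quality line must have exactly one character per base; a mismatch usually means a truncated or wrapped file.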
  
  
 As a test, I ran FASTQ Groomer twice, once with Sanger and once with 
 Illumina 1.3+ as the input quality scores type. These are the results I 
 got:
  
 FASTQ Groomer on tn-read1 (using Sanger as input)
 6.1 Gb
 format: fastqsanger, database:mm9
 Info: Groomed 45868679 sanger reads into sanger reads. Based upon quality and 
 sequence, the input data is valid for: sanger Input ASCII range: '#'(35) - 
 'I'(73) Input decimal range: 2 - 40
  
 FASTQ Groomer on tn-read1 (using Illumina1.3+ as input)
 6.1 Gb
 format: fastqsanger, database:mm9
 Info: Groomed 45868679 illumina reads into sanger reads. Based upon quality 
 and sequence, the input data is valid for: sanger Input ASCII range: '#'(35) 
 - 'I'(73) Input decimal range: -29 - 9
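Both Groomer runs saw the same characters, '#'(35) to 'I'(73); only the assumed offset changed, which is why both decimal ranges span the same 38 points. A minimal Python sketch reproducing the two reported ranges; `decimal_range` is a hypothetical helper. Because a Phred score is -10 log10(p) with p at most 1, it can never be negative, so the -29 to 9 range is itself a sign that the Illumina 1.3+ interpretation is wrong for this file.

```python
def decimal_range(min_char: str, max_char: str, offset: int) -> tuple[int, int]:
    """Convert an ASCII character range into a score range for an offset."""
    return ord(min_char) - offset, ord(max_char) - offset


# Same input characters, read with two different offsets.
print(decimal_range("#", "I", 33))  # (2, 40)   Sanger, Phred+33
print(decimal_range("#", "I", 64))  # (-29, 9)  Illumina 1.3+, Phred+64
```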
  
 Which one is right (I presume the Illumina 1.3+ one, but I can’t find any 
 sort of explanation)?  I noticed that the “input decimal range” had different 
 values depending on which input type was chosen, although both ranges spanned 
 the same width.  What would happen downstream in TopHat if Sanger was used 
 instead of Illumina 1.3+ for these files?  Is there any other reading 
 material/websites/etc… out there that might help me better understand the 
 quality scores and which to use?  Any info/help would be greatly appreciated.
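For genuinely Illumina 1.3+ data, grooming to Sanger only moves the ASCII offset from 64 to 33, so every character code drops by 31 while the scores stay the same. A minimal Python sketch of that shift, with a range check added as an assumption for illustration; it says nothing about how TopHat itself treats mis-encoded qualities, and `illumina13_to_sanger` is hypothetical, not Galaxy's code.

```python
def illumina13_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Scores are unchanged; only the offset moves from 64 to 33. Characters
    below '@' (ASCII 64) cannot occur in valid Phred+64 data, so this
    sketch rejects them instead of emitting negative scores.
    """
    out = []
    for ch in qual:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"{ch!r} (ASCII {ord(ch)}) is not valid Phred+64")
        out.append(chr(score + 33))
    return "".join(out)


print(illumina13_to_sanger("@Th"))  # !5I  (scores 0, 20, 40)
```

Given a Sanger-encoded quality line such as one starting with '#', the check fails on the first character, which matches the negative values in the Illumina 1.3+ Groomer run above.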
  
 Thanks,
 David
  
  
 David K. Crossman, Ph.D.
 Systems Biologist/Analyst/Statistician
 Heflin Center for Genomic Science
 University of Alabama at Birmingham
 720 20th Street South
 Kaul Room 420
 Birmingham, AL 35294-0024
 (205) 996-4045
 (205) 996-4056 (fax)
  

Re: [galaxy-user] FASTQ to FASTQSanger using Groomer question

2011-03-21 Thread Peter Cock
On Mon, Mar 21, 2011 at 1:42 PM, David K Crossman dkcro...@uab.edu wrote:
 Hello!



     I am fairly new to using Galaxy and have a question about
 the FASTQ Groomer feature.  I have 4 RNA-Seq raw data files that were just
 recently generated from Illumina’s NGS instruments.

Very recently? If they are already using Illumina's CASAVA v1.8 pipeline
then the FASTQ files will already be in the Sanger FASTQ format:
http://seqanswers.com/forums/showthread.php?t=8895

Peter
