Re: [galaxy-user] How to use collapsed sequence files in mapping and displaying

2012-04-16 Thread Jennifer Jackson

Hi Jun,

There isn't an automatic way to interpret this 'count' number from the 
sequence identifier when visualizing a BAM/SAM file, but it can be done 
in a BED file with some text manipulation.
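As an illustration of what interpreting the count involves, here is a minimal sketch. It assumes the collapse tool names each unique sequence in a "rank-count" form (for example "17-2045", where 2045 is the number of occurrences); the function name is hypothetical:

```python
def count_from_read_name(read_name: str) -> int:
    """Return the occurrence count encoded in a collapsed read identifier.

    Assumes the "rank-count" naming (e.g. "17-2045"): the count is the
    part after the last dash.
    """
    _rank, sep, count = read_name.rpartition("-")
    if not sep or not count.isdigit():
        raise ValueError(f"no count found in read name {read_name!r}")
    return int(count)


print(count_from_read_name("17-2045"))  # 2045
```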


Note: BED data does not contain sequence data (BAM/SAM data does). Just 
something to be aware of when planning visualization priorities. If you 
want to zoom to the nucleotide/sequence level and see sequence data in 
your track, then this method is probably not the right choice.


If you do choose to do this, after first converting BAM/SAM to Interval, 
the count could be placed into the 'score' attribute of a BED dataset. 
BED data displays at UCSC in shades of grey based on score values (the 
track line needs useScore=1 for this). See column #5 "score" here:

http://genome.ucsc.edu/FAQ/FAQformat.html#format1
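To fit raw counts into the 0-1000 score range, one possible scaling expression is sketched below. It assumes counts are positive integers and uses a log scale (a hypothetical choice, not something Galaxy or UCSC prescribes) so that a few very abundant sequences do not wash every other feature out to light grey. The function name is hypothetical:

```python
import math


def count_to_bed_score(count: int, max_count: int) -> int:
    """Map a read count onto the 0-1000 range of the BED score column.

    Log scaling (a hypothetical choice) keeps low-count reads visible
    next to very abundant ones; the result is clamped to 0-1000.
    """
    if count < 1 or max_count < 1:
        raise ValueError("counts must be positive integers")
    scaled = 1000 * math.log1p(count) / math.log1p(max_count)
    return max(0, min(1000, round(scaled)))


print(count_to_bed_score(2045, 2045))  # 1000
```

A linear scale (count / max_count * 1000) works just as well when counts span a narrow range.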

The basic idea would be to use tools from the tool group "Text 
Manipulation" to manipulate the data. The general path would be the 
following (tune as needed):


Starting with an Interval file:

- Parse out the count data from the sequence name with "Convert 
delimiters to TAB" by "Dashes" to isolate the count from the latter half 
of the first column (sequence identifier). This new column of data will 
become your "score" column.


- Optional. You may want/need to perform a calculation on the 'score' 
value to make it fit the 0-1000 grey scale that UCSC offers. To do so, 
use "Compute" and your own scaling expression.


- "Add column" to create a "name" column. A "." (dot) works as a NULL value.

- "Cut" columns to create a BED format of 6 columns in the proper order:
http://wiki.g2.bx.psu.edu/Learn/Datatypes#Bed

- Click the pencil icon to 'Edit Attributes', set the datatype to "bed" 
and save. Then set or double-check all 6 column attributes and save. 
Finally, set the database/build if it became unassigned during processing.
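Outside Galaxy, the same steps can be sketched as one short script. This is a minimal sketch under a hypothetical input layout: a tab-separated interval file with chrom, start, end, strand and the collapsed read name (for example "17-2045") in the first five columns. Check the columns of your own Interval dataset and adjust the indices. It writes BED6 (chrom, start, end, name, score, strand) with "." as the name, as in the recipe, and log-scales counts into 0-1000:

```python
import csv
import io
import math
import sys
from typing import Iterable, Iterator, List


def interval_to_bed6(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield BED6 rows whose score column encodes the collapsed read count.

    Hypothetical input columns: chrom, start, end, strand, read_name,
    where read_name uses the "rank-count" form (e.g. "17-2045").
    """
    rows = list(rows)  # two passes: find the largest count, then scale
    # Like "Convert delimiters to TAB" on dashes: keep the count part.
    counts = [int(row[4].rpartition("-")[2]) for row in rows]
    max_count = max(counts, default=1)
    for row, count in zip(rows, counts):
        chrom, start, end, strand = row[0], row[1], row[2], row[3]
        # Like the optional "Compute" step: scale the count into 0-1000.
        score = round(1000 * math.log1p(count) / math.log1p(max_count))
        # Like "Add column" + "Cut": "." as a NULL name, BED6 column order.
        yield [chrom, start, end, ".", str(score), strand]


sample = "chr1\t100\t122\t+\t1-500\nchr2\t40\t61\t-\t2-3\n"
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerows(interval_to_bed6(csv.reader(io.StringIO(sample), delimiter="\t")))
```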


Best wishes for your project,

Jen
Galaxy team


On 4/16/12 8:58 AM, Jun Lu wrote:

I found that there is a "collapse" tool under FASTA manipulation, which
significantly shortens Bowtie mapping time for small RNA reads, since
these tend to include many reads of identical length and sequence after
adaptor clipping.
The tool gives each unique sequence a new name that includes the number
of times (occurrences) that sequence appeared in the data.
The question is: after mapping with Bowtie, how can I recover this
"occurrence" information when displaying the results in the Genome
Browser? The current setting shows only one mapped read for each unique
sequence, no matter how many times that sequence occurred.
Should I write custom code to expand the resulting SAM file based on
the occurrence counts?

All runs were executed on the Galaxy main server.
Any suggestion is appreciated.
Jun

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

http://lists.bx.psu.edu/


--
Jennifer Jackson
http://galaxyproject.org

[galaxy-user] How to use collapsed sequence files in mapping and displaying

2012-04-16 Thread Jun Lu
I found that there is a "collapse" tool under FASTA manipulation, which 
significantly shortens Bowtie mapping time for small RNA reads, since 
these tend to include many reads of identical length and sequence after 
adaptor clipping.
The tool gives each unique sequence a new name that includes the number 
of times (occurrences) that sequence appeared in the data.
The question is: after mapping with Bowtie, how can I recover this 
"occurrence" information when displaying the results in the Genome 
Browser? The current setting shows only one mapped read for each unique 
sequence, no matter how many times that sequence occurred.
Should I write custom code to expand the resulting SAM file based on 
the occurrence counts?


All runs were executed on the Galaxy main server.
Any suggestion is appreciated.
Jun
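On the question of expanding the SAM file instead: a minimal sketch, assuming the alignment QNAME (column 1) keeps the collapsed "rank-count" name produced by the collapse tool. Header lines pass through unchanged; each alignment line is written once per original occurrence, with a hypothetical "_N" suffix added to QNAME so the copies stay distinguishable. The function name is hypothetical:

```python
from typing import Iterable, Iterator


def expand_collapsed_sam(lines: Iterable[str]) -> Iterator[str]:
    """Repeat each SAM alignment line once per original read occurrence.

    Assumes QNAME has the collapsed "rank-count" form (e.g. "17-2045").
    Header lines ("@...") are passed through; blank lines are dropped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        qname = fields[0]
        count = int(qname.rpartition("-")[2])
        for i in range(1, count + 1):
            fields[0] = f"{qname}_{i}"  # hypothetical unique suffix per copy
            yield "\t".join(fields)


sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "1-3\t0\tchr1\t100\t255\t22M\t*\t0\t0\tACGTACGTACGTACGTACGTAC\t*\n",
]
for out in expand_collapsed_sam(sam):
    print(out)
```

This keeps the mapping speed-up from collapsing, but the expanded file grows back to the full read count, which is the size cost the BED/score approach avoids.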
