I want to intersect a few hundred genomic intervals (predicted translocation 
sites from SVDetect) with low-mappability regions of the genome (we are 
working with mm9 right now).

UCSC has an excellent mappability track that exactly matches our sequencing 
data (50 bp k-mers), but it seems very difficult to get that data into Galaxy. 
I want a BED file that summarizes intervals of low mappability (i.e. less than 
0.5 on the scale used by UCSC). The UCSC Table Browser has a limit of 10M 
lines, which seems to cover just part of chromosome 1. It would be very messy 
to fetch the whole genome piece by piece this way and then concatenate the 
pieces back together.

UCSC Help suggests downloading the mappability data for the whole genome as a 
bigWig file and then converting it to BED. I gave this a try, but the result is 
a 4 GB file with intervals of just one or two base pairs. Again, it would take 
a lot of work to get back to the nicer BED that I could make with the UCSC 
tools over smaller genomic regions. It is also very painful to upload such a 
huge file to Galaxy, and I would rather not write my own parsers to filter and 
smooth it.
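
For reference, the filter-and-smooth step described above could be sketched 
roughly as follows. This is only an illustration, not a tested pipeline: it 
assumes the bigWig has already been converted to a four-column bedGraph 
(chrom, start, end, score) sorted by chromosome and start, and the function 
name low_mappability_intervals and the max_gap parameter are hypothetical 
names introduced here.

```python
import io


def low_mappability_intervals(lines, threshold=0.5, max_gap=0):
    """Yield merged (chrom, start, end) tuples whose score is below threshold.

    Assumes bedGraph-style input sorted by chrom, then start.
    Low-scoring intervals separated by at most max_gap bases are merged.
    """
    current = None  # interval being extended: [chrom, start, end]
    for line in lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, score = line.split()[:4]
        start, end, score = int(start), int(end), float(score)
        if score >= threshold:
            continue  # mappable enough; not part of a low-mappability run
        if current and current[0] == chrom and start - current[2] <= max_gap:
            current[2] = max(current[2], end)  # extend the open interval
        else:
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


# Tiny made-up example (not real mm9 data).
demo = (
    "chr1\t0\t10\t1.0\n"
    "chr1\t10\t12\t0.25\n"
    "chr1\t12\t13\t0.33\n"
    "chr1\t13\t20\t1.0\n"
    "chr1\t20\t22\t0.1\n"
    "chr2\t5\t6\t0.2\n"
)
merged = list(low_mappability_intervals(io.StringIO(demo)))
smoothed = list(low_mappability_intervals(io.StringIO(demo), max_gap=10))

# Write the merged intervals as three-column BED lines.
for chrom, start, end in merged:
    print(f"{chrom}\t{start}\t{end}")
```

Because the function reads line by line, it could be pointed at an open file 
handle for the full 4 GB bedGraph without loading it into memory, and the much 
smaller BED output would be the only file to upload to Galaxy.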

Any other suggestions? Maybe someone else knows where to find a mappability 
file (for mm9) that has nice intervals in a Galaxy-compatible format.

—Stuart Brown


