[galaxy-dev] Query Fastq files for particular sequence elements

2012-06-14 Thread Jane Dorweiler
Greetings all,

I've been trying to find a way to query FASTQ files for particular sequence
elements. Our data were mapped using BWA by our collaborator, and
repetitive elements were 'ignored', but we are now interested in determining
whether a couple of specific repetitive elements of interest are
differentially represented in the raw read files. Has anyone developed
tools for this kind of query that I'm simply missing as I explore the
available tools?

In the short term, I've written a very crude Python script to begin
exploring the question, but I'm sure there has to be a much better way.

If there are no such tools available, I'm hopeful that someone might have
some helpful suggestions, or that perhaps it could be explored during the
upcoming conference and/or training day in July.


Thanks and Best Regards,  Jane
-- 


Jane E. Dorweiler, PhD
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Query Fastq files for particular sequence elements

2012-06-14 Thread Hans-Rudolf Hotz

Hi Jane

I recommend mapping the data again yourself.

Alternatively, you might want to play with 'grep' (if you have the Galaxy
Unix tools installed on your Galaxy server), or use the tool 'Select
lines that match an expression'. I would run a FASTQ to Tabular conversion
on your data first. Or you can try the EMBOSS tool 'fuzznuc' on the FASTA
version of your data.
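
For readers who want to try the grep-style approach outside Galaxy, here is a minimal Python sketch of the same idea: count the reads that contain a given element on either strand. It is not the script mentioned in the original post. It assumes plain four-line FASTQ records and exact matches only (no mismatches or indels), and the motif name and sequence in the demo are purely hypothetical placeholders.

```python
import gzip
import io

# Translation table used to complement a DNA sequence.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Tiny in-memory FASTQ used only for the demo below (hypothetical reads).
DEMO_FASTQ = (
    "@r1\nACGTTTAGGGTTAGGG\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nCCCTAACCCTAACCCT\n+\nIIIIIIIIIIIIIIII\n"
    "@r3\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_sequences(handle):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip().upper()


def count_motif_reads(handle, motifs):
    """Count reads containing each motif on either strand.

    `motifs` maps a label to a DNA sequence. Returns (total_reads, counts),
    where counts maps each label to the number of matching reads.
    """
    targets = {
        name: (seq.upper(), reverse_complement(seq.upper()))
        for name, seq in motifs.items()
    }
    counts = dict.fromkeys(motifs, 0)
    total = 0
    for read in read_sequences(handle):
        total += 1
        for name, (forward, reverse) in targets.items():
            if forward in read or reverse in read:
                counts[name] += 1
    return total, counts


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Hypothetical motif; replace with the repetitive element of interest.
    motifs = {"telomere_like": "TTAGGG"}
    total, counts = count_motif_reads(io.StringIO(DEMO_FASTQ), motifs)
    for name, n in counts.items():
        print(f"{name}: {n} of {total} reads ({100 * n / total:.2f}%)")
```

To compare samples, run the count on each raw read file (via `open_fastq`) and compare the fraction of matching reads rather than the raw counts, since library sizes differ. A short motif can match by chance and a long one can be missed because of sequencing errors, so this is a rough first look, not a substitute for remapping.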


...but assuming you are talking about 'big' fastq files, mapping the 
data again yourself is most likely the way to go.


Regards, Hans


