Hi Richard,

 >This looks, in part, a way to store a huge multiple sequence alignment with
 >a reference sequence (the first character in ( ) being the DNA base in a
 >reference DNA molecule, but due to the inequal lengths in each VLA, it would
 >seem that gaps are not stored, or stored elsewhere in some way, which would
 >be necessary to reconstruct the alignment.
 >Is that right?

Yes, I'm looking for a way to store a large multiple alignment. However,
the alignment was generated between a genomic sequence (the DNA
string in my example) and a huge number of short reads (from
next-generation sequencing).

 >I'm curious because efficient retrieval of such multiple sequence alignments
 >is an issue for a colleague.  I think he eventually stored each base of ~50
 >full genomes (10^6 bases) as a separate mysql row with an index position.  I
 >thought it would never work due to overhead but seems fast enough for his
 >purposes (selecting arbitrary alignments of several Kbp for web display).

For the multiple alignment you are describing, I think PyTables
could be very useful. In that case you know a priori the number
of genomes and the length of the alignment, so you can build a
table that stores, position by position, all the nucleotides of a column.
In my case I don't know a priori the depth at each position: each
column could contain a variable number of bases.
Moreover, I need a fast way to store this information, since the
volume of short reads can be very large (> 8-9 GB).
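
To make the variable-depth problem concrete, here is a minimal sketch of one generic way to hold columns of unequal length: all bases go into one flat byte buffer, and a second array of offsets marks where each column starts and ends. This is only an illustration under my own assumptions, not how PyTables stores variable-length rows internally; the class name `PileupColumns` and its methods are hypothetical, and it uses only the Python standard library.

```python
from array import array


class PileupColumns:
    """Hypothetical container: ragged per-position columns of read bases.

    Layout (an assumption for illustration): one flat buffer holding every
    base, column after column, plus an offsets array so that column i is
    bases[offsets[i]:offsets[i + 1]].
    """

    def __init__(self):
        self.bases = bytearray()        # one byte per aligned read base
        self.offsets = array("Q", [0])  # unsigned 64-bit column boundaries

    def append(self, column):
        """Append the bases covering the next reference position."""
        self.bases.extend(column.encode("ascii"))
        self.offsets.append(len(self.bases))

    def __len__(self):
        """Number of reference positions stored so far."""
        return len(self.offsets) - 1

    def depth(self, i):
        """Number of read bases stacked at reference position i."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.offsets[i + 1] - self.offsets[i]

    def column(self, i):
        """Return the bases at reference position i as a string."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.bases[start:end].decode("ascii")


if __name__ == "__main__":
    pileup = PileupColumns()
    # Made-up columns; an empty string is a position with no read coverage.
    for col in ["AAA", "", "CCT", "G"]:
        pileup.append(col)
    for pos in range(len(pileup)):
        print(pos, pileup.depth(pos), pileup.column(pos))
```

Since both buffers are plain fixed-type arrays that only grow at the end, they could in principle be written to disk in chunks as appendable arrays, which may be cheaper than storing one variable-length row per position. Whether that is actually faster for 8-9 GB of reads would have to be measured.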

Ernesto



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
