Hi Richard,

>This looks, in part, like a way to store a huge multiple sequence alignment with
>a reference sequence (the first character in ( ) being the DNA base in a
>reference DNA molecule), but due to the unequal lengths in each VLA, it would
>seem that gaps are not stored, or stored elsewhere in some way, which would
>be necessary to reconstruct the alignment. Is that right?

Yes, I'm looking for a way to store a large multiple alignment. However, the alignment was generated between a genomic sequence (the DNA string in my example) and a huge number of short reads (from next-generation sequencing).

>I'm curious because efficient retrieval of such multiple sequence alignments
>is an issue for a colleague. I think he eventually stored each base of ~50
>full genomes (10^6 bases) as a separate mysql row with an index position. I
>thought it would never work due to overhead but seems fast enough for his
>purposes (selecting arbitrary alignments of several Kbp for web display).

For the multiple alignment you describe, I think PyTables could be very useful. You know a priori the number of genomes and the length of the alignment, so you can build a table that stores, position by position, all the nucleotides of a column. In my case I don't know a priori the depth at each position: each column can contain a variable number of bases. Moreover, I need a fast way to store this information, since the short-read data can be very large (> 8-9 GB).

Ernesto

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
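
To make the variable-depth case above concrete, here is a minimal sketch of one way to hold columns of unequal length: a single flat byte buffer plus an offsets array. It deliberately does not use the PyTables API (in PyTables the variable-length case is what a VLArray is meant for); it only illustrates the layout with the Python standard library. The class name `RaggedColumns` and the toy `pileup` data are hypothetical and not taken from the thread.

```python
from array import array


class RaggedColumns:
    """Variable-depth alignment columns: one flat byte buffer plus offsets."""

    def __init__(self):
        self.bases = bytearray()        # all read bases, column after column
        self.offsets = array("Q", [0])  # column i spans offsets[i]:offsets[i + 1]

    def append(self, column):
        """Append the read bases covering one reference position (may be empty)."""
        self.bases += column.encode("ascii")
        self.offsets.append(len(self.bases))

    def __len__(self):
        # Number of columns stored so far.
        return len(self.offsets) - 1

    def depth(self, i):
        """Number of read bases stored at reference position i."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.offsets[i + 1] - self.offsets[i]

    def column(self, i):
        """Return the read bases stored at reference position i as a string."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.bases[start:end].decode("ascii")


# Hypothetical toy data: one string of read bases per reference position.
pileup = ["AAAA", "CC", "", "TTG"]

cols = RaggedColumns()
for read_bases in pileup:
    cols.append(read_bases)
```

With this layout each base costs one byte plus a small per-column offset, and any column can be fetched without scanning the ones before it. A fixed-depth alignment, as in the 50-genome case, would not need the offsets at all, since every column has the same length.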