Re: [Pytables-users] using vlarray

llewelr Sun, 13 Dec 2009 11:29:17 -0800

Ernesto,

I agree; I think PyTables would be a good solution for his full (gapped) multiple sequence alignment.

Regarding your problem, hopefully others with more experience can help with optimization. Personally, I've had trouble in similar situations where I had large amounts of variable-length data that had to be added incrementally or, even worse, updated afterward.

Is Brent's suggestion helpful, though? Since it seems you lose the ordering of the sequences in the short reads when storing them as VLAs anyway, could you just store counts of nucleotides? Or will you be using this table to try to build scaffolds (linked sequences) over the short reads? In that case you must be relying on the start and stop genomic positions stored elsewhere.
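
Brent's suggestion can be sketched as a per-position count matrix. This is a minimal illustration, not code from the thread: it assumes each read is given as a hypothetical `(start, sequence)` pair that is ungapped relative to the reference, and it ignores gaps and ambiguous bases.

```python
import numpy as np

# Hypothetical layout: columns 0..3 hold counts of A, C, G, T at each
# reference position. Gaps and ambiguous bases (e.g. "N") are skipped.
BASES = "ACGT"
BASE_INDEX = {b: i for i, b in enumerate(BASES)}


def count_bases(ref_length, reads):
    """Accumulate per-position nucleotide counts.

    `reads` is an iterable of (start, sequence) pairs, where `start` is the
    0-based reference position of the first base. Each read is assumed to be
    ungapped relative to the reference (a simplifying assumption).
    """
    counts = np.zeros((ref_length, len(BASES)), dtype=np.uint32)
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            idx = BASE_INDEX.get(base)
            # Skip non-ACGT symbols and bases falling outside the reference.
            if idx is not None and 0 <= pos < ref_length:
                counts[pos, idx] += 1
    return counts


reads = [(0, "ACG"), (1, "CGT"), (2, "GTA")]
counts = count_bases(5, reads)
print(counts[2])  # [0 0 3 0]
```

The result has a fixed shape (reference length × 4) no matter how deep the coverage is, so it could be written to an ordinary PyTables `CArray` or `EArray`; the price, as noted above, is that read identity and order are lost. The per-base Python loop is only for clarity and would be slow at the data sizes discussed here.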

Rich

On Dec 13, 2009 8:41am, Ernesto <[email protected]> wrote:

Hi Richard,

>This looks, in part, like a way to store a huge multiple sequence alignment with
>a reference sequence (the first character in ( ) being the DNA base in a
>reference DNA molecule), but due to the unequal lengths of each VLA, it would
>seem that gaps are not stored, or are stored elsewhere in some way, which would
>be necessary to reconstruct the alignment.
>Is that right?

Yes, I'm looking for a way to store a big multiple alignment. However, the alignment has been generated between a genomic sequence (the DNA string in my example) and a huge number of short reads (from next-generation sequencing).

>I'm curious because efficient retrieval of such multiple sequence alignments
>is an issue for a colleague. I think he eventually stored each base of ~50
>full genomes (10^6 bases) as a separate MySQL row with an index position. I
>thought it would never work due to the overhead, but it seems fast enough for
>his purposes (selecting arbitrary alignments of several Kbp for web display).

For the multiple alignment you are describing, I think PyTables could be very useful. In effect, you know a priori the number of genomes and the length of the alignment. Therefore, you can build a table that stores, position by position, all the nucleotides of a column.
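
When the number of genomes and the alignment length are known in advance, the column-by-column table can be a plain fixed-shape 2-D array. A minimal NumPy sketch, assuming toy gapped sequences (`-` marks a gap); on disk the same layout could presumably go into a 2-D PyTables array, but chunking and compression settings are left out.

```python
import numpy as np

# Toy data: three aligned sequences of equal length ('-' = gap).
aligned = ["ACGT-ACGTA", "ACGTTACG-A", "AC-TTACGTA"]
n_genomes = len(aligned)
length = len(aligned[0])

# One row per alignment column, one byte per genome. Both dimensions are
# known up front, so the array has a fixed shape.
columns = np.empty((length, n_genomes), dtype="S1")
for g, seq in enumerate(aligned):
    columns[:, g] = np.frombuffer(seq.encode("ascii"), dtype="S1")

# A single alignment column is one row ...
print(columns[4].tobytes().decode())  # -TT

# ... and a window of positions is a slice, read back per genome.
window = columns[2:6]
print(["".join(window[:, g].astype(str)) for g in range(n_genomes)])
# ['GT-A', 'GTTA', '-TTA']
```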

In my case I don't know a priori the depth of each column; each column could contain a variable number of bases.

Moreover, I need a fast way to store this information, since the short-read data could be huge (> 8-9 GB).
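
One alternative to one VLArray row per column (a technique swapped in here, not something proposed in the thread) is to pack all columns into a single flat byte array plus an offsets array, in the style of compressed sparse row storage. A minimal NumPy sketch, assuming the per-column bases are already available as strings; `pack_columns` and `get_column` are hypothetical helpers.

```python
import numpy as np


def pack_columns(columns):
    """Pack variable-depth alignment columns into (flat, offsets).

    `columns` is a sequence of strings, one per reference position, holding
    the read bases stacked on that position (the same content one might put
    in each VLArray row). Column i is flat[offsets[i]:offsets[i + 1]].
    """
    depths = np.fromiter((len(c) for c in columns), dtype=np.int64,
                         count=len(columns))
    offsets = np.zeros(len(columns) + 1, dtype=np.int64)
    # offsets[i + 1] = total number of bases in columns 0..i
    np.cumsum(depths, out=offsets[1:])
    flat = np.frombuffer("".join(columns).encode("ascii"), dtype="S1")
    return flat, offsets


def get_column(flat, offsets, i):
    """Return the bases stacked on reference position i as a str."""
    return flat[offsets[i]:offsets[i + 1]].tobytes().decode("ascii")


# Toy pileup: depth varies from column to column (one column is empty).
pileup = ["AAC", "G", "", "TTTT", "CA"]
flat, offsets = pack_columns(pileup)
print(offsets.tolist())              # [0, 3, 4, 4, 8, 10]
print(get_column(flat, offsets, 3))  # TTTT
```

Both arrays have a fixed dtype, so they could be appended in large batches (for example to two PyTables `EArray`s) rather than row by row; whether that is actually faster than a `VLArray` at the 8-9 GB scale would have to be measured.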

Ernesto


_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users