Re: What strategy for random accession of records in massive FASTA file?

Steve Holden Fri, 14 Jan 2005 15:00:06 -0800

Bengt Richter wrote:

On 12 Jan 2005 14:46:07 -0800, "Chris Lasher" <[EMAIL PROTECTED]> wrote:

[...]

Others have probably solved your basic problem, or pointed
the way. I'm just curious.

Given that the information content is 2 bits per character
that is taking up 8 bits of storage, there must be a good reason
for storing and/or transmitting them this way? I.e., it is easy
to think up a count-prefixed compressed format packing 4:1 in
subsequent data bytes (except for the last byte, which may hold
fewer than four 2-bit codes).
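A minimal sketch of the count-prefixed 4:1 packing described above, assuming the sequences contain only A, C, G and T. The 2-bit code assignment, the 4-byte little-endian count and the helper names pack/unpack are illustrative choices, not an existing format; real FASTA data with N, lowercase or other IUPAC codes would need more than 2 bits per base.

```python
import struct

# Illustrative 2-bit code table (an assumption, not a standard):
# A=0, C=1, G=2, T=3. Other symbols are not handled by this sketch.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Return a 4-byte little-endian base count followed by data bytes
    holding four 2-bit codes each (the last byte may be partly unused)."""
    out = bytearray(struct.pack("<I", len(seq)))
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # base j goes in bits 2j..2j+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """Inverse of pack(): the count prefix says how many 2-bit codes
    in the final byte are real bases rather than padding."""
    (n,) = struct.unpack_from("<I", data, 0)
    return "".join(
        BASES[(data[4 + k // 4] >> (2 * (k % 4))) & 0b11] for k in range(n)
    )


seq = "GATTACAGC"
packed = pack(seq)
print(len(seq), len(packed))  # 9 bases -> 4-byte count + 3 data bytes
assert unpack(packed) == seq
```

The count prefix is what makes the partly filled last byte unambiguous: without it, padding bits in that byte would decode as extra A's.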

I'm wondering how the data is actually used once records are
retrieved (but I'm too lazy to explore the biopython.org link).

Revealingly honest.

Of course, adopting an encoding that only used two bits per base would make it impossible to use the re module to search the sequences for patterns, for example. So the work of continuously translating between representations might militate against more efficient representations. Or, of course, it might not :-)
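To illustrate that translation cost, here is a hypothetical sketch: with one byte per base, re can scan a record directly, but with a 2-bit packing the bytes have to be decoded back to text first, because a motif can start at any of the four 2-bit positions inside a byte and so has no single byte-level pattern. The pack/unpack helpers (this time without a count prefix, passing the length separately) and the example sequence are illustrative assumptions; GAATTC, the EcoRI recognition site, is just a sample motif.

```python
import re

# Illustrative 2-bit code table (assumption): A=0, C=1, G=2, T=3.
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def pack(seq: str) -> bytes:
    """Hypothetical helper: four bases per byte, no length prefix."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Hypothetical helper: decode the first n bases back to text."""
    return "".join(BASES[(data[k // 4] >> (2 * (k % 4))) & 0b11] for k in range(n))


seq = "TTGAATTCGGAATTCA"
motif = re.compile("GAATTC")

# One byte per base: the regex runs on the stored text as-is.
direct = [m.start() for m in motif.finditer(seq)]

# Packed storage: every search pays for a full decode first.
packed = pack(seq)
via_decode = [m.start() for m in motif.finditer(unpack(packed, len(seq)))]

print(direct)  # [2, 9]
assert direct == via_decode
```

Whether that extra decode step matters depends on how often records are searched versus merely stored or transmitted.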

it's-only-storage-ly y'rs  - steve
--
Steve Holden               http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/
Holden Web LLC      +1 703 861 4237  +1 800 494 3119
--
http://mail.python.org/mailman/listinfo/python-list
