Bengt Richter wrote:
[...] On 12 Jan 2005 14:46:07 -0800, "Chris Lasher" <[EMAIL PROTECTED]> wrote:
Others have probably solved your basic problem, or pointed the way. I'm just curious.
Given that the information content is 2 bits per character that is taking up 8 bits of storage, there must be a good reason for storing and/or transmitting them this way? I.e., it is easy to think up a count-prefixed compressed format packing 4:1 in subsequent data bytes (except for the last byte, which may hold fewer than four 2-bit codes).
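For the curious, here is a minimal sketch of what such a count-prefixed 4:1 format might look like. Nothing here comes from the thread or from Biopython: the names `pack` and `unpack`, the 4-byte big-endian count, and the A/C/G/T code assignment are all assumptions made for illustration. It also assumes the sequence contains only the four letters A, C, G and T, so anything else (an `N`, say) raises a `KeyError`.

```python
import struct

# Hypothetical 2-bit code assignment; any fixed mapping would do.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Return a 4-byte big-endian base count followed by 4 bases per byte."""
    out = bytearray(struct.pack(">I", len(seq)))
    for i in range(0, len(seq), 4):
        byte = 0
        # The first base of each group goes in the two highest bits.
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODES[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data):
    """Invert pack(): read the count, then decode 2 bits at a time."""
    (count,) = struct.unpack(">I", data[:4])
    bases = []
    for byte in data[4:]:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    # The count prefix tells us how much of the last byte is padding.
    return "".join(bases[:count])
```

With this layout `pack("ACGTAC")` yields six bytes (four for the count, two for the data), and `unpack` recovers the original string; the count prefix is what lets the decoder ignore the unused bits in the final byte.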
I'm wondering how the data is actually used once records are retrieved (but I'm too lazy to explore the biopython.org link).
Revealingly honest.
Of course, adopting an encoding that only used two bits per base would make it impossible to use the re module to search for patterns in them, for example. So the work of continuously translating between representations might militate against more efficient representations. Or, of course, it might not :-)
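To make the translation cost concrete, here is a rough sketch, assuming a hypothetical packing of four bases per byte with A=0, C=1, G=2, T=3 and the first base in the high bits. The helper `unpack_bases` is invented for this example. A motif can straddle a byte boundary, so the packed bytes have to be decoded back to text before the re module is of any use:

```python
import re

BASES = "ACGT"  # hypothetical 2-bit codes: A=0, C=1, G=2, T=3


def unpack_bases(packed, count):
    """Decode `count` bases from bytes holding four 2-bit codes each."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:count])


# "ACGTAC" in the assumed packed form: 0x1B holds ACGT, 0x10 holds AC plus padding.
packed = bytes([0x1B, 0x10])

# The pattern below would match bases that sit partly in the first byte and
# partly in the second, so we translate back to one character per base first.
text = unpack_bases(packed, 6)
match = re.search(r"GT[AC]+", text)
```

Here `match.group()` is `"GTAC"` starting at offset 2, a hit that spans both packed bytes. Whether the decode step costs more than the storage saved depends entirely on how often the data is searched.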
it's-only-storage-ly y'rs - steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119