Ivan Vilata i Balaguer wrote:
> ----- Forwarded message from PyTables <[EMAIL PROTECTED]> -----
>
> #152: creating a custom filter
> ---------------------------------+------------------------------------------
>  Reporter:  [EMAIL PROTECTED]    |       Owner:  somebody
>      Type:  enhancement          |      Status:  new
>  Priority:  major                |   Component:  PyTables
>   Version:  trunk                |    Keywords:
> ---------------------------------+------------------------------------------
> Can I get some pointers on how to create my own filter for PyTables? It is
> fairly clear to me how this should be done at HDF5 API level, but in
> PyTables a list of filters seems to be quite hard-wired. I am considering
> PyTables for a database that stores large amount of genomic sequence (5
> letter alphabet - 4 nucleotides - ACTG plus N for "unknown" nucleotide).
> Such sequences can be efficiently encoded with 2 bits per nucleotide for
> storage (4x compression), but processing them is more convenient in
> decoded "byte per nucleotide" form. Encoding/decoding looks like a natural
> filter procedure. For typical real sequences, further zlib compression of
> bit-encoded data has no benefit or even inflates the data. The best
> specialized and very costly algorithms give < 15% additional compression.
> So, bit encoding is all that is needed. I will be willing to contribute
> the filter code back to PyTables source.

You might want to consider allowing IUPAC ambiguity codes other than N.
Once you allow N, you have five symbols and can no longer store each
nucleotide in two bits. With four bits per nucleotide you can store the
full set of ambiguity codes (ACGT/WSKMRY/BDHV/N), and you can do it
elegantly by letting each bit represent one of A, C, G, and T.

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
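
P.S. To make the four-bit suggestion concrete, here is a rough sketch in
plain Python of an encoding where each bit stands for one of A, C, G, T
and two codes are packed per byte. It is only an illustration, not a
PyTables or HDF5 filter: the bit assignment (A=1, C=2, G=4, T=8) and the
names pack(), unpack() and compatible() are my own assumptions.

```python
# Bit assignment is an arbitrary choice for illustration: A=1, C=2, G=4, T=8.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# IUPAC nucleotide codes and the set of bases each one stands for.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Each code becomes the OR (here: sum) of the bits of the bases it allows.
ENCODE = {code: sum(BASE_BITS[b] for b in bases)
          for code, bases in IUPAC_SETS.items()}
DECODE = {value: code for code, value in ENCODE.items()}


def pack(seq):
    """Pack a string of IUPAC codes into bytes, two codes per byte."""
    nibbles = [ENCODE[c] for c in seq.upper()]
    if len(nibbles) % 2:
        nibbles.append(0)  # padding; 0 is not used by any IUPAC code
    return bytes((hi << 4) | lo
                 for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack(data, length):
    """Inverse of pack(); `length` is the original number of codes."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    return "".join(DECODE[n] for n in nibbles[:length])


def compatible(code_a, code_b):
    """True if the two codes share at least one possible base."""
    return bool(ENCODE[code_a] & ENCODE[code_b])
```

A side benefit of this layout is that "could these two positions match?"
becomes a single bitwise AND, as compatible() shows. The original length
has to be stored somewhere alongside the packed bytes, since an odd-length
sequence is padded with a zero nibble.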