Ivan Vilata i Balaguer wrote:
> ----- Forwarded message from PyTables <[EMAIL PROTECTED]> -----
> 
> #152: creating a custom filter
> ---------------------------------+------------------------------------------
> Reporter:  [EMAIL PROTECTED]          |       Owner:  somebody
>     Type:  enhancement           |      Status:  new     
> Priority:  major                 |   Component:  PyTables
>  Version:  trunk                 |    Keywords:          
> ---------------------------------+------------------------------------------
>  Can I get some pointers on how to create my own filter for PyTables? It is
>  fairly clear to me how this should be done at HDF5 API level, but in
>  PyTables a list of filters seems to be quite hard-wired. I am considering
>  PyTables for a database that stores large amount of genomic sequence (5
>  letter alphabet - 4 nucleotides - ACTG plus N for "unknown" nucleotide).
>  Such sequences can be efficiently encoded with 2 bits per nucleotide for
>  storage (4x compression), but processing them is more convenient in
>  decoded "byte per nucleotide" form. Encoding/decoding looks like a natural
>  filter procedure. For typical real sequences, further zlib compression of
>  bit-encoded data has no benefit or even inflates the data. The best
>  specialized and very costly algorithms give < 15% additional compression.
>  So, bit encoding is all that is needed. I will be willing to contribute
>  the filter code back to PyTables source.

You might want to consider allowing IUPAC ambiguity codes other than N.
Once you allow N, the alphabet has five symbols, so you already can't
store each nucleotide in two bits. With four bits, you can store the
full complement of ambiguity codes (ACGT/WSKMRY/BDHV/N), and you can even
do it in an elegant way where each bit represents one of ACGT.
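
To make the four-bit idea concrete, here is a rough sketch in plain Python. The particular bit assignment (A=1, C=2, G=4, T=8), the names ENCODE, DECODE, pack and unpack, and the choice to pack two codes per byte are all my own assumptions for illustration; none of this is PyTables or HDF5 API, and a real filter would have to be written against the HDF5 filter interface.

```python
# Assumed bit assignment: one bit per base (any fixed assignment would do).
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# IUPAC nucleotide codes, each listed with the bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# A symbol's 4-bit code is the OR (here: sum) of the bits of its bases.
ENCODE = {sym: sum(BASE_BITS[b] for b in bases) for sym, bases in IUPAC.items()}
DECODE = {code: sym for sym, code in ENCODE.items()}


def pack(seq):
    """Pack a sequence into bytes, two 4-bit codes per byte (hypothetical helper)."""
    codes = [ENCODE[ch] for ch in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad with the unused code 0 ("no base")
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack(data, length):
    """Inverse of pack(); length is needed to drop the padding nibble."""
    codes = []
    for byte in data:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return "".join(DECODE[c] for c in codes[:length])
```

One nice property of this layout is that two symbols are compatible exactly when the bitwise AND of their codes is non-zero, e.g. ENCODE["R"] & ENCODE["A"] for R (A or G) against A. The cost is that storage is half a byte per nucleotide, so the saving over byte-per-nucleotide is 2x rather than 4x.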



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
