> Date: Tue, 12 Feb 2008 09:56:48 +0100
> From: [EMAIL PROTECTED]
> Just one question: if your alphabet has 5 letters, how are you planning
> to use only 2 bits to encode it?  Or maybe the N is never actually
> appearing in data?

Almost 2 bits.
This is how 2-bit encoding is done in NCBI BLAST databases:
http://blast.wustl.edu/blast/ncbi20ntfmt.html

It relies on the assumption that degenerate symbols (non-ACTG) are relatively
rare. So it stores a run of 2-bit encodings and, at the end, an array with the
positions of the degenerate symbols. To keep things simple, I can just
cut-and-paste the code from the NCBI Toolkit if I decide to implement this filter.
Thanks for the references to the source code. I will start looking at them.
Andrey
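
A rough sketch of the idea described above (2 bits per base, plus a side
list for the rare degenerate symbols). This is only an illustration in
plain Python: the names pack/unpack and the in-memory layout are
hypothetical and do not reproduce the actual NCBI on-disk format or the
NCBI Toolkit code.

```python
# Hypothetical illustration: 2-bit packing with an exception list.
# Not the NCBI format; names and layout are assumptions for this sketch.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a nucleotide string into 2 bits per base.

    Returns (packed bytes, exceptions, length), where exceptions is a list
    of (position, symbol) pairs for every symbol that is not A, C, G or T.
    """
    packed = bytearray((len(seq) + 3) // 4)  # four bases per byte
    exceptions = []
    for i, sym in enumerate(seq):
        code = CODES.get(sym)
        if code is None:
            # Degenerate symbol: remember it and store placeholder bits.
            exceptions.append((i, sym))
            code = 0
        # First base of each group of four goes in the two highest bits.
        packed[i // 4] |= code << (6 - 2 * (i % 4))
    return bytes(packed), exceptions, len(seq)


def unpack(packed, exceptions, length):
    """Rebuild the original string from the output of pack()."""
    out = [
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 3]
        for i in range(length)
    ]
    # Put the degenerate symbols back at their recorded positions.
    for pos, sym in exceptions:
        out[pos] = sym
    return "".join(out)
```

As long as degenerate symbols stay rare, the exception list is short and
the cost stays close to 2 bits per base; the length has to be stored
because the last byte may be only partly used.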

>
> I guess that your proposal should be very useful for people using
> PyTables in genomics (there are already some users in that field).  If
> any of them are reading this, it'd be interesting if they gave their
> opinion on this matter, so that we can build a richer panorama.
>
> Just one question: if your alphabet has 5 letters, how are you planning
> to use only 2 bits to encode it?  Or maybe the N is never actually
> appearing in data?  Otherwise, I guess you need at least 3 bits.  Also,
> users should be warned that using the filter on non-ACTGN data will
> render it useless... well, good documentation on the valid input domain
> should do the trick.
>
> So, well, there are no pointers as such to add a new filter to PyTables,
> but you can for instance look for "bzip2", which is the latest added
> compressor, and copy from what you get (I guess your filter is simpler
> since it doesn't have external dependencies like bzip2)::
>
>   debian/ptrepack.1 -- ptrepack manual page
>   doc/xml/usersguide.xml -- documentation!
>   setup.py -- pyrex_extnames and Extension entry
>   src/_comp_bzip2.pyx -- bzip2 extension
>   src/H5ARRAY.c -- add support to H5ARRAYmake
>   src/H5TB-opt.c -- add support to H5TBOmake_table
>   src/H5VLARRAY.c -- add support to H5VLARRAYmake
>   src/H5Zbzip2.c -- define registration function and implement filter
>   src/H5Zbzip2.h -- declare filter id and registration function
>   src/utils.c -- include header
>   src/utilsExtension.pyx -- initialize and register, whichLibVersion
>   tables/filters.py -- all_complibs, docstrings
>   tables/scripts/ptrepack.py -- usage string
>   tables/tests/test_all.py -- print_versions
>   tables/tests/test_....py -- VERY IMPORTANT: add some tests!
>
> Now I'm listing the changesets related to the addition of bzip2
> support.  Since files have been changed several times, I don't think
> this will be of much help, but here they are: 764, 765, 767, 844, 1256,
> 1446, 1451, 1462, 1471, 2515, 3051 (try
> http://www.pytables.org/trac/changeset/NUMBER).
>
> Thanks for your support, and good luck!
>
> PS: I'm closing the ticket since we don't have much to do with it, but
> feel free to reopen it when you have a patch.  In the meantime, please
> use the list for support.
>
> ::
>
>       Ivan Vilata i Balaguer >qo<   http://www.carabos.com/
>              Cárabos Coop. V.  V  V   Enjoy Data
>                                 ""

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users