> Date: Tue, 12 Feb 2008 09:56:48 +0100
> From: [EMAIL PROTECTED]
> Just one question: if your alphabet has 5 letters, how are you planning
> to use only 2 bits to encode it? Or maybe the N is never actually
> appearing in data?
Almost 2 bits. This is how the 2-bit encoding is done in NCBI BLAST
databases: http://blast.wustl.edu/blast/ncbi20ntfmt.html

It relies on the assumption that degenerate symbols (anything other than
A, C, G and T) are relatively rare. So it stores a run of 2-bit codes and,
at the end, an array with the positions of the degenerate symbols. To keep
things simple, I could just cut and paste the code from the NCBI Toolkit if
I decide to implement this filter.

Thanks for the references to the source code. I will start looking at
them.

Andrey

>
> I guess that your proposal should be very useful for people using
> PyTables in genomics (there are already some users in that field). If
> any of them are reading this, it'd be interesting if they gave their
> opinion on this matter, so that we can build a richer panorama.
>
> Just one question: if your alphabet has 5 letters, how are you planning
> to use only 2 bits to encode it? Or maybe the N is never actually
> appearing in data? Otherwise, I guess you need at least 3 bits. Also,
> users should be warned that using the filter on non-ACTGN data will
> render it useless... well, good documentation on the valid input domain
> should do the trick.
>
> So, well, there are no pointers as such to add a new filter to PyTables,
> but you can for instance look for "bzip2", which is the most recently
> added compressor, and copy from what you find (I guess your filter is
> simpler since it doesn't have external dependencies like bzip2)::
>
>   debian/ptrepack.1 -- ptrepack manual page
>   doc/xml/usersguide.xml -- documentation!
>   setup.py -- pyrex_extnames and Extension entry
>   src/_comp_bzip2.pyx -- bzip2 extension
>   src/H5ARRAY.c -- add support to H5ARRAYmake
>   src/H5TB-opt.c -- add support to H5TBOmake_table
>   src/H5VLARRAY.c -- add support to H5VLARRAYmake
>   src/H5Zbzip2.c -- define registration function and implement filter
>   src/H5Zbzip2.h -- declare filter id and registration function
>   src/utils.c -- include header
>   src/utilsExtension.pyx -- initialize and register, whichLibVersion
>   tables/filters.py -- all_complibs, docstrings
>   tables/scripts/ptrepack.py -- usage string
>   tables/tests/test_all.py -- print_versions
>   tables/tests/test_....py -- VERY IMPORTANT: add some tests!
>
> Now I'm listing the changesets related to the addition of bzip2
> support. Since files have been changed several times, I don't think
> this will be of much help, but here they are: 764, 765, 767, 844, 1256,
> 1446, 1451, 1462, 1471, 2515, 3051 (try
> http://www.pytables.org/trac/changeset/NUMBER).
>
> Thanks for your support, and good luck!
>
> PS: I'm closing the ticket since we don't have much to do with it, but
> feel free to reopen it when you have a patch. In the meantime, please
> use the list for support.
>
> ::
>
>   Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
>   Cárabos Coop. V.  V  V   Enjoy Data
>                      ""

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
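The "almost 2 bits" scheme Andrey describes (a packed 2-bit stream plus a trailing list of degenerate positions) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the NCBI Toolkit code or its on-disk format: the code table (A=0, C=1, G=2, T=3), the bit order inside each byte, storing degenerate symbols as a placeholder `A`, and the helper names `pack_2bit`/`unpack_2bit` are all hypothetical choices made for this example. The ncbi20ntfmt page linked above remains the reference for the real layout.

```python
from typing import List, Tuple

# Assumed 2-bit code table (illustrative; not necessarily NCBI's assignment).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> Tuple[bytes, int, List[Tuple[int, str]]]:
    """Hypothetical helper: pack `seq` at 2 bits per symbol.

    Returns the packed bytes, the sequence length (the last byte may be
    partly padding) and a list of (position, symbol) pairs for every
    symbol outside ACGT.
    """
    packed = bytearray((len(seq) + 3) // 4)  # four symbols per byte
    degenerate: List[Tuple[int, str]] = []
    for i, ch in enumerate(seq):
        code = CODE.get(ch)
        if code is None:
            # Store code 0 ('A') as a placeholder; the real symbol is
            # kept in the exception list.
            degenerate.append((i, ch))
            code = 0
        # The first symbol of each byte goes into the two high-order bits.
        packed[i // 4] |= code << (6 - 2 * (i % 4))
    return bytes(packed), len(seq), degenerate


def unpack_2bit(packed: bytes, length: int,
                degenerate: List[Tuple[int, str]]) -> str:
    """Hypothetical inverse of pack_2bit."""
    out = [BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
           for i in range(length)]
    # Patch the degenerate symbols back in from the exception list.
    for pos, ch in degenerate:
        out[pos] = ch
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTNNACGT"
    packed, n, exceptions = pack_2bit(seq)
    print(len(packed), exceptions)  # 3 [(4, 'N'), (5, 'N')]
    assert unpack_2bit(packed, n, exceptions) == seq
```

The trade-off behind Ivan's "at least 3 bits" remark: a fixed-width code covering A, C, G, T and N needs ceil(log2 5) = 3 bits, i.e. about 3n/8 bytes for n symbols, while this sketch costs about n/4 bytes plus one (position, symbol) record per degenerate symbol. It only wins while degenerate symbols stay rare, which is exactly the assumption stated in the reply.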