On Feb 17, 2009, at 8:30 PM, Andrew Dalke wrote:
Thanks for the link. I've read [Andrew Smellie's] paper now. It's very
nicely written, clear, with good background, description,
and results,
  ...
Andrew Smellie didn't take what I thought would be
the obvious next step and use a trie (probably a
Patricia trie) to store things.
  ...
Now I want to try it out! :)

I made a first pass implementation of my trie idea using
Python.  It's attached. My quick summary is:


fingerprint   size of  time for  size of  time for  time for
   size         bits     bits     trie     trie      trie (no hits)
===========   =======  ========  =======  ========  ========
  98837        14 MB     0.05s    59 MB    0.6s      0.04s
 198819        28 MB     0.09s   105 MB    0.7s      0.07s

Based on the paper, it seems his C code requires about
160 MB for 200,000 compounds so there's a clear savings
due to at least one of the two memory savings techniques
I tried.

I expect faster performance for the regular "&" based
screens if I were to do the test directly in C. I'm
guessing a factor of 3.

I expect an even better performance increase and reduced
memory use were I to rewrite the trie in C. There I
expect about 10x-30x performance, which brings them
to the same times. But that's a scientific wild-ass guess.

There's no real conclusion you can draw from these numbers
other than that I've managed to save a lot of memory and
that this is an avenue worth considering.



                                Andrew
                                da...@dalkescientific.com



Reply via email to