On Feb 12, 2009, at 8:46 AM, Greg Landrum wrote:
I'm either not understanding completely or I disagree. The queries were constructed by fragmenting the molecules I searched through, so I'd expect lots of substructure hits (and a lower screen-out rate than arbitrary queries against arbitrary molecules).
Ahh, of course. But I don't think fingerprint screens give, say, 0.001% false positive rates. I think they are more in line with what you found. But if the bit distributions were really uncorrelated for molecules where one is not a substructure of the other, then I would expect extremely low false positive rates. 2048 bits should give a lot of discrimination power if the bits weren't correlated.
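To put a rough number on the uncorrelated case, here is a minimal sketch in Python. It assumes fingerprints are stored as plain integers used as bitsets, and that every bit of an unrelated target is set independently with the same probability. The function names and the example figures (40 query bits, 30% target bit density) are made up for illustration and are not measurements from this thread.

```python
def passes_screen(query_fp: int, target_fp: int) -> bool:
    """Substructure screen: every bit set in the query must also be set in the target."""
    return (query_fp & target_fp) == query_fp


def chance_pass_rate(query_bits_set: int, target_density: float) -> float:
    """Probability that an unrelated target passes the screen purely by chance.

    ASSUMPTION: each target bit is set independently with probability
    target_density. Real fingerprint bits are correlated, so real false
    positive rates are much higher than this idealized figure.
    """
    return target_density ** query_bits_set


# Hypothetical numbers: a query that sets 40 bits, screened against
# targets with 30% of their bits set.
rate = chance_pass_rate(40, 0.30)
print(f"idealized chance pass rate: {rate:.3g}")
```

Under those assumptions the chance of an accidental pass is far below anything seen in practice, which is the point: the gap between this idealized figure and observed rates is a sign of how correlated the bits are.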
That's a good idea to add to the list of things to look into. It's also relatively easy to do because it probably just involves increasing the minimum path length included in fingerprints (at least as a first step).
Again, I don't have experience with that, but it means that there's less ability to handle unlikely atom types. Yes, the larger subgraphs will include them. Don't know.
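As a toy illustration of what raising the minimum path length does, here is a sketch in Python. It is not the RDKit implementation: the function name, the use of CRC32 as the hash, the tiny 64-bit fingerprint, and the atoms/bonds input format are all assumptions chosen to keep the example self-contained.

```python
import zlib


def path_fingerprint(atoms, bonds, min_path=1, max_path=3, n_bits=64):
    """Toy linear-path fingerprint (hypothetical, for illustration only).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs
    Path length is counted in bonds. Returns an int used as a bitset.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    fp = 0

    def extend(path):
        nonlocal fp
        n_bonds = len(path) - 1
        if n_bonds >= min_path:
            forward = "-".join(atoms[a] for a in path)
            backward = "-".join(atoms[a] for a in reversed(path))
            # Use one canonical direction so a path and its reverse set the same bit.
            label = min(forward, backward)
            fp |= 1 << (zlib.crc32(label.encode()) % n_bits)
        if n_bonds < max_path:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fp


# Ethanol heavy atoms: C-C-O
atoms, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
fp_short = path_fingerprint(atoms, bonds, min_path=1)  # C-C, C-O, C-C-O
fp_long = path_fingerprint(atoms, bonds, min_path=2)   # C-C-O only
```

With the higher minimum, the short paths (which are the ones most widely shared between molecules) no longer set bits, so what remains is a subset of the original fingerprint. The trade-off is that a rare atom type is then only represented through the longer paths that happen to pass through it.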
Looking at MACCS is a good idea. I'll also put that on the list.
Is this list on a wiki? ;)

Andrew
da...@dalkescientific.com