On Thu, Feb 12, 2009 at 2:55 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> On Feb 12, 2009, at 8:46 AM, Greg Landrum wrote:
>> I'm either not understanding completely or I disagree. The queries
>> were constructed by fragmenting the molecules I searched through, so
>> I'd expect lots of substructure hits (and a lower screen-out rate than
>> arbitrary queries against arbitrary molecules).
>
> Ahh, of course.
>
> But I don't think fingerprint screens give, say, 0.001% false positive rates.
> I think they are more in line with what you found. But if the bit
> distributions were really uncorrelated for molecules where one is
> not a substructure of the other, then I would expect extremely
> low false positive rates. 2048 bits should give a lot of
> discrimination power if the bits weren't correlated.

Agreed, the bit correlation experiment should be done.
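
To make the argument concrete, here is a minimal sketch of the screen, of the
pass rate one would expect if bits really were independent, and of one way the
correlation between two bits could be measured. Fingerprints are represented
as plain Python integers; the function names, the 30% bit density and the
50-bit query are illustrative assumptions, not RDKit code or measured values.

```python
def passes_screen(query_fp, target_fp):
    """Substructure screen: every bit set in the query must be set in the target."""
    return (query_fp & target_fp) == query_fp


def expected_pass_rate(n_query_bits, target_density):
    """Chance that an unrelated target passes the screen, ASSUMING each target
    bit is set independently with probability target_density."""
    return target_density ** n_query_bits


def bit_correlation(fps, i, j):
    """Pearson (phi) correlation between bit i and bit j across a list of
    fingerprints. Returns 0.0 if either bit is constant across the set."""
    n = len(fps)
    a = [(fp >> i) & 1 for fp in fps]
    b = [(fp >> j) & 1 for fp in fps]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = mean_a * (1.0 - mean_a)
    var_b = mean_b * (1.0 - mean_b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    # Hypothetical numbers: a query with 50 bits set, targets at 30% density.
    print(expected_pass_rate(50, 0.3))
    # Two bits that always switch on together are perfectly correlated.
    print(bit_correlation([0b11, 0b00, 0b11, 0b00], 0, 1))
```

Under the independence assumption the pass rate for unrelated molecules is
vanishingly small, so an observed false positive rate far above that would
point to correlated bits.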

>> That's a good idea to add to the list of things to look into. It's
>> also relatively easy to do because it probably just involves
>> increasing the minimum path length included in fingerprints (at least
>> as a first step).
>
> Again, I don't have experience with that, but it means
> that there's less ability to handle unlikely atom types.
> Yes, the larger subgraphs will include them. Don't know.

I suspect the less common atom types aren't a big concern since the
larger subgraphs will include them and any subgraph isomorphism
involving them will go very quickly (since most things will be
screened out in the atom-atom mapping phase).
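
For reference, the minimum-path-length idea discussed above can be sketched
with a toy path fingerprint. This is not the RDKit fingerprinter: the molecule
is a hand-written adjacency list, bond orders are ignored, and the names
enumerate_paths and path_fingerprint, the CRC32 hashing and the default sizes
are all assumptions made for illustration.

```python
import zlib


def enumerate_paths(adjacency, min_bonds, max_bonds):
    """Return all simple paths (tuples of atom indices) that contain between
    min_bonds and max_bonds bonds, each path stored in one direction only."""
    paths = set()

    def extend(path):
        n_bonds = len(path) - 1
        if n_bonds >= min_bonds:
            # Keep a single canonical direction for each path.
            paths.add(min(path, path[::-1]))
        if n_bonds == max_bonds:
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:
                extend(path + (nbr,))

    for start in adjacency:
        extend((start,))
    return paths


def path_fingerprint(symbols, adjacency, min_bonds=1, max_bonds=7, n_bits=2048):
    """Hash the atom-symbol string of every enumerated path into one bit."""
    fp = 0
    for path in enumerate_paths(adjacency, min_bonds, max_bonds):
        forward = "-".join(symbols[a] for a in path)
        backward = "-".join(symbols[a] for a in reversed(path))
        label = min(forward, backward)  # direction-independent label
        fp |= 1 << (zlib.crc32(label.encode()) % n_bits)
    return fp


if __name__ == "__main__":
    # Ethanol heavy atoms: C-C-O
    symbols = ["C", "C", "O"]
    adjacency = {0: [1], 1: [0, 2], 2: [1]}
    fp_all = path_fingerprint(symbols, adjacency, min_bonds=1)
    fp_long = path_fingerprint(symbols, adjacency, min_bonds=2)
    # Raising the minimum length can only remove bits, never add them.
    print((fp_long & fp_all) == fp_long)
```

Raising min_bonds drops the short, very common paths, which is the first-step
experiment described above. The rare atom types still show up in the longer
paths that contain them.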

>
>> Looking at MACCS is a good idea. I'll also put that on the list.
>
> Is this list on a wiki? ;)

Not yet, but I just put up the page for it:
http://code.google.com/p/rdkit/wiki/SubstructureSearchOptimization

Now I just need to populate it.

-greg
