Hi Paul,

On Wed, Nov 20, 2019 at 5:32 PM Paul Zierep via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi,
> In the original ECFP paper (Rogers, D.; Hahn, M.
> "Extended-Connectivity Fingerprints." *J. Chem. Inf. and Model.* *50*:742-54
> (2010)), it says "that the relationship between fingerprint features and
> the substructures may not always be one-to-one" (especially for the FCFPs,
> but also for the ECFPs).
>
> I was wondering whether, in the RDKit implementation of Morgan fingerprints
> (speaking of the non-hashed/non-folded type, of course), it is possible for
> one feature to encode different, non-identical substructures.
>

Yes. The function that takes the atom environments and hashes them to
produce an integer is not perfect and can produce collisions.
I believe this is fairly rare, but it certainly can happen.
Here's a concrete example:
https://github.com/rdkit/rdkit/issues/814

There's a longer discussion of this topic here:
https://sourceforge.net/p/rdkit/mailman/message/36438523/
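
For anyone who wants to look for such collisions in their own data, here is a minimal sketch, not a definitive method. It assumes the older `AllChem.GetMorganFingerprint` call with its `bitInfo` argument (newer RDKit releases also offer `rdFingerprintGenerator`). The names `find_collisions` and `morgan_environments` are made up for this illustration. The fragment SMILES is only a rough description of an environment, so a reported id is a hint worth inspecting by hand rather than proof of a hash collision.

```python
from collections import defaultdict


def find_collisions(pairs):
    """Return {feature_id: descriptions} for ids seen with more than one description."""
    seen = defaultdict(set)
    for feature_id, description in pairs:
        seen[feature_id].add(description)
    return {fid: descs for fid, descs in seen.items() if len(descs) > 1}


def morgan_environments(smiles_list, radius=2):
    """Yield (feature_id, description) pairs from unfolded Morgan fingerprints."""
    # Imported here so find_collisions stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        info = {}
        # bitInfo is filled with feature_id -> ((atom_idx, radius), ...)
        AllChem.GetMorganFingerprint(mol, radius, bitInfo=info)
        for feature_id, hits in info.items():
            for atom_idx, rad in hits:
                if rad == 0:
                    # No bonds at radius 0, so describe the atom by its symbol.
                    desc = mol.GetAtomWithIdx(atom_idx).GetSymbol() + " (radius 0)"
                else:
                    bonds = Chem.FindAtomEnvironmentOfRadiusN(mol, rad, atom_idx)
                    submol = Chem.PathToSubmol(mol, bonds)
                    # Non-isomeric SMILES, since chirality is not hashed by default.
                    desc = Chem.MolToSmiles(submol, isomericSmiles=False)
                yield feature_id, desc
```

Calling `find_collisions(morgan_environments(my_smiles))` on a list of SMILES strings (here `my_smiles` is a placeholder) returns the feature ids that were seen with more than one distinct description.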

-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
