Hi Paul,

On Wed, Nov 20, 2019 at 5:32 PM Paul Zierep via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net> wrote:
> Hi,
>
> in the original paper on ECFPs (Rogers, D.; Hahn, M.
> “Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.* *50*:742-54
> (2010).), it says "that the relationship between fingerprint features and
> the substructures may not always be one-to-one" (especially for the FCFPs,
> but also for the ECFPs).
>
> I was wondering whether, in the RDKit implementation of the Morgan fingerprints
> (speaking of the non-hashed/unfolded type, of course), it is possible that
> one feature encodes different, non-identical substructures.

Yes. The function that takes an atom environment and hashes it to produce an
integer is not collision-free: two different environments can occasionally end
up with the same identifier. I believe this is fairly rare, but it certainly can
happen. Here's a concrete example:
https://github.com/rdkit/rdkit/issues/814

There's a longer discussion of this topic here:
https://sourceforge.net/p/rdkit/mailman/message/36438523/

-greg
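To get a rough feel for "fairly rare, but it certainly can happen", the sketch below applies the birthday bound to hashed environment identifiers. It assumes the unfolded identifiers are 32-bit integers and that the hash spreads distinct environments uniformly over that range. This is an idealization, not a description of RDKit's actual hash function, and the function name `collision_probability` is hypothetical. It only estimates the chance that *some* pair among `n` distinct environments shares an identifier.

```python
import math


def collision_probability(n_environments: int, hash_bits: int = 32) -> float:
    """Birthday-bound estimate that at least two of n distinct atom
    environments receive the same hashed identifier.

    Assumption (idealized): the hash behaves like a uniform random
    function onto 2**hash_bits values. Real hash functions can do
    better or worse than this on particular inputs.
    """
    space = 2 ** hash_bits
    # Number of unordered pairs of distinct environments that could collide.
    pairs = n_environments * (n_environments - 1) / 2
    # P(no collision) ~ exp(-pairs / space); -expm1(-x) computes 1 - exp(-x)
    # accurately even when x is very small.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    for n in (1_000, 10_000, 77_163, 1_000_000):
        print(f"{n:>9,d} distinct environments -> "
              f"P(any collision) ~ {collision_probability(n):.4f}")
```

Under this model a collision somewhere becomes roughly a coin flip once a dataset contains about 77,000 distinct environments. Any single pair of environments, however, is still extremely unlikely to collide, which is consistent with collisions being rare in practice but not impossible.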