This is a really good question.

I must admit that I find the ECFP behavior as published to be somewhat
weird.
It doesn't make sense to me that the chiral versions of the Morgan-2
fingerprints for CCC[CH](C)CCO, CCC[C@@H](C)CCO, and CCC[C@H](C)CCO would
be identical.
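
For anyone who wants to check this locally, here is a minimal sketch that compares the three SMILES above with and without chirality. It assumes a reasonably recent RDKit and uses the `rdFingerprintGenerator` API; the helper name `morgan2_similarities` and the 2048-bit size are illustrative choices, not anything from the original thread.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# The three SMILES from the message: unspecified centre and both enantiomers.
SMILES = ["CCC[CH](C)CCO", "CCC[C@@H](C)CCO", "CCC[C@H](C)CCO"]


def morgan2_similarities(include_chirality):
    """Pairwise Tanimoto similarities of radius-2 Morgan bit vectors."""
    gen = rdFingerprintGenerator.GetMorganGenerator(
        radius=2, includeChirality=include_chirality, fpSize=2048
    )
    fps = [gen.GetFingerprint(Chem.MolFromSmiles(s)) for s in SMILES]
    return {
        (SMILES[i], SMILES[j]): DataStructs.TanimotoSimilarity(fps[i], fps[j])
        for i in range(len(SMILES))
        for j in range(i + 1, len(SMILES))
    }


chiral = morgan2_similarities(True)
achiral = morgan2_similarities(False)

for pair in chiral:
    print(pair, "chiral:", chiral[pair], "achiral:", achiral[pair])
```

With chirality switched off, all three fingerprints are the same, so every pair has similarity 1.0. The values with chirality switched on depend on how the installed RDKit version handles stereo, so they are best inspected rather than assumed.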

However, as you point out, we have tried to reproduce the details of the
published algorithm, and the way chirality is currently handled does not
do that. I don't think "fixing" the current behavior would be a great
idea, but it would make sense to add an option to use the original
chirality rules (along with some documentation explaining them).
Here's the github issue: https://github.com/rdkit/rdkit/issues/2818

I didn't notice this discrepancy when I did the original comparison of
similarities between RDKit's MorganFP and Pipeline Pilot's ECFP
implementation many years ago because I ran both of them with chirality
turned off.
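
The enantiomer-series test described in the quoted message below can be sketched roughly as follows. This is not the attached notebook (which is not reproduced here), just an approximation of it: the helper names `enantiomer_pair` and `enantiomer_similarity`, the 2048-bit fingerprint size, and the range of radii are all assumptions made for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Chain length n on one side, n + 1 on the other, as in the series
# butan-2-ol, hexan-3-ol, octan-4-ol, decan-5-ol.
NAMES = {1: "butan-2-ol", 2: "hexan-3-ol", 3: "octan-4-ol", 4: "decan-5-ol"}


def enantiomer_pair(n):
    """Return SMILES for both enantiomers of the alcohol with chains n and n + 1."""
    left, right = "C" * n, "C" * (n + 1)
    return f"{left}[C@H](O){right}", f"{left}[C@@H](O){right}"


def enantiomer_similarity(n, radius, include_chirality=True):
    """Tanimoto similarity between the Morgan bit vectors of an enantiomer pair."""
    gen = rdFingerprintGenerator.GetMorganGenerator(
        radius=radius, includeChirality=include_chirality, fpSize=2048
    )
    fp_a, fp_b = (
        gen.GetFingerprint(Chem.MolFromSmiles(smi)) for smi in enantiomer_pair(n)
    )
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


# Similarity of each enantiomer pair at radii 0 to 3, chirality switched on.
table = {
    (name, radius): enantiomer_similarity(n, radius)
    for n, name in NAMES.items()
    for radius in range(4)
}

for (name, radius), sim in table.items():
    print(f"{name:12s} radius={radius} similarity={sim:.3f}")
```

Under the rules in the paper, the similarity should reach 1.0 once both chains extend beyond the fingerprint radius. Calling `enantiomer_similarity(n, radius, include_chirality=False)` gives the chirality-off baseline, where the enantiomers are indistinguishable.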

Thanks for pointing this out, Ansgar!
-greg




On Mon, Nov 25, 2019 at 1:09 PM Schuffenhauer, Ansgar <
ansgar.schuffenha...@novartis.com> wrote:

> Dear all
>
>
>
> I have observed some unexpected behaviour with the chiral version of the
> Morgan fingerprints in RDKit.
>
>
>
> When reading the Rogers paper (http://doi.org/10.1021/ci100050t) I find:
>
> “If the atom is a possible stereoatom but is not yet disambiguated, and
> all attachment atoms have different identifiers, then the atom is marked as
> disambiguated, and a stereochemical flag is appended to the array,
> depending on the marked stereochemistry. (Step 4 is only performed if
> stereochemical fingerprints are requested.)”
>
>
>
> In this respect I believe that the RDKit implementation does not exactly
> follow the ECFP paper.
>
> As a test I calculated the pairwise similarity between the enantiomers of
> butan-2-ol, hexan-3-ol, octan-4-ol, decan-5-ol, ...
>
> Eventually both alkyl chains should grow too long to be distinguished
> within the fingerprint radius. The chirality of the chiral center should
> therefore no longer be recognised, and the similarity of the enantiomers'
> fingerprints should become 1 once the chains outgrow the fingerprint
> radius.
>
>
>
> Strangely, that doesn’t happen. As can be seen in the attached notebook,
> all fingerprints with radius > 0 always give similarities < 1.0 for
> the enantiomer pairs.
>
>
>
> This contrasts with the Pipeline Pilot implementation, where the
> similarity of the enantiomers does become 1.0 once the chains outgrow
> the fingerprint radius. For reference I have also added fingerprints and
> similarity values obtained at different ECFP diameters.
>
>
>
> Is this difference in behaviour intentional? Until now I had always assumed
> that RDKit Morgan and Pipeline Pilot ECFP would give identical similarity
> results.
>
>
>
>
>
> With best regards
>
>
>
>
>
> *Ansgar Schuffenhauer*
>
> Senior Investigator I
>
> T +41 79 608 9063
>
> ansgar.schuffenha...@novartis.com
>
>
>
> *Novartis Pharma AG*
>
> NIBR
>
>
>
> Novartis Campus
>
> Virchow 16-4.249.09
>
> 4056 Basel
>
> Switzerland
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>