Hi Rebecca,

It looks like the standardiser modifies the molecule but does not update
computed values like valences.
If you call UpdatePropertyCache(), everything seems fine:

In [22]: m = Chem.MolFromSmiles("C(=O)(c1ccc(cc1)O)O")

In [23]: std_m = rules.run(m)

In [24]: std_m.UpdatePropertyCache()

In [25]: rdMolDescriptors.GetHashedAtomPairFingerprint(std_m)
Out[25]: <rdkit.DataStructs.cDataStructs.IntSparseIntVect at 0x20f36579a30>
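
The per-atom numbers in the quoted report below can be checked by hand. Here is a rough sketch in plain Python (no RDKit needed) that repeats the arithmetic of the failing check. The lists are copied from the report; the helper name passes_invariant is made up for illustration, and the sketch ignores the aromatic/hybridisation branches of the C++ code.

```python
# Per-atom values copied from the quoted report (standardised molecule,
# before UpdatePropertyCache() has been called).
atoms = ['C', 'O', 'C', 'C', 'C', 'C', 'C', 'C', 'O', 'O']
explicit_valence = [4, 2, 4, 3, 3, 4, 3, 3, 1, 1]
explicit_hs = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
degree = [3, 1, 3, 2, 2, 3, 2, 2, 1, 1]


def passes_invariant(valence, n_explicit_hs, deg):
    """Hypothetical helper: (explicit valence - explicit Hs) >= degree."""
    return valence - n_explicit_hs >= deg


# Evaluate the condition for every atom.
results = [
    passes_invariant(v, h, d)
    for v, h, d in zip(explicit_valence, explicit_hs, degree)
]

# Collect (index, symbol) for atoms that would trip the check.
failing = [(i, sym) for i, (sym, ok) in enumerate(zip(atoms, results)) if not ok]
print(failing)

# ASSUMPTION: a refreshed explicit valence for the OH oxygens would count
# the explicit hydrogen as well, i.e. 2 instead of 1.
print(passes_invariant(2, 1, 1))
```

Only the two hydroxyl oxygens fail, which matches the report. If the assumption in the last comment holds, the same atoms pass once the cached valence is recomputed.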

@Francis: do you think that it would make sense to modify standardiser to
call UpdatePropertyCache() on the results before returning them?
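
Until something like that exists, a small wrapper on the user side would probably do. This is only a sketch: run_and_refresh is a hypothetical name (not part of standardiser or RDKit), and it assumes the standardising callable returns an RDKit molecule.

```python
def run_and_refresh(standardise, mol, strict=True):
    """Apply `standardise` to `mol`, then refresh cached per-atom values.

    `standardise` is any callable returning a molecule, for example
    `rules.run` from the session above. This helper is hypothetical and
    is not part of standardiser or RDKit.
    """
    result = standardise(mol)
    # Recompute cached properties such as valences on the returned molecule.
    result.UpdatePropertyCache(strict)
    return result
```

With the session above this would be called as std_m = run_and_refresh(rules.run, m), after which the fingerprint call should behave as in In [25].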

-greg


On Wed, Apr 11, 2018 at 5:02 PM, Rebecca Mackenzie - UKRI STFC <
rebecca.macken...@stfc.ac.uk> wrote:

> Hi there,
>
>
> I typically use the Python standardiser package
> (https://pypi.python.org/pypi/standardiser) when preparing molecules for
> machine learning, and I have found GetHashedAtomPairFingerprintAsBitVect
> to be a very good tool for generating input to support vector machines and
> neural networks.
>
>
> However, this fingerprint function can fail if the input has been
> standardised.
>
>
> #An example:
>
> m = Chem.MolFromSmiles("C(=O)(c1ccc(cc1)O)O")
> fp = rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(m)
>
> #standardise the molecule, using standardiser v0.1.9 https://pypi.python.org/pypi/standardiser
> std_m = rules.run(m)
> fp = rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(std_m)
>
> Produces the following error:
>
> RuntimeError                              Traceback (most recent call last)
> <ipython-input-221-b73abfdb31ef> in <module>()
>       4 #standardise the molecule, using standardiser v0.1.9 https://pypi.python.org/pypi/standardiser
>       5 std_m = rules.run(m)
> ----> 6 fp = rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(std_m)
>
> RuntimeError: Invariant Violation
>         explicit valence exceeds atom degree
>         Violation occurred on line 32 in file Code/GraphMol/Fingerprints/AtomPairs.cpp
>         Failed Expression: val >= atom->getDegree()
>         RDKIT: 2017.09.3
>         BOOST: 1_63
>
> After a few hours of digging, I have found the cause. The relevant source
> code
> (https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/AtomPairs.cpp):
>
> unsigned int res = 0;
>
> if (atom->getIsAromatic()) {
>     res = 1;
> } else if (atom->getHybridization() != Atom::SP3) {
>     unsigned int val = static_cast<unsigned int>(atom->getExplicitValence());
>     val -= atom->getNumExplicitHs();
>     CHECK_INVARIANT(val >= atom->getDegree(),
>                     "explicit valence exceeds atom degree");
>     res = val - atom->getDegree();
>
> From what I can gather, standardisation adds explicit hydrogens (and
> offers no way to turn this off), whereas default sanitisation does not.
>
>
> When iterating over the atoms of the standardised molecule
> (mol.GetAtoms()) and checking the explicit valence
> (a.GetExplicitValence()), explicit hydrogens (a.GetNumExplicitHs()) and
> degree (a.GetDegree()), it is easy to see how the error is caused:
>
> Standardised molecule
> Atoms: ['C', 'O', 'C', 'C', 'C', 'C', 'C', 'C', 'O', 'O']
> Explicit Valence: [4, 2, 4, 3, 3, 4, 3, 3, 1, 1]
> Explicit Hs: [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
> Valence (Explicit Valence - Explicit Hs): [4, 2, 4, 2, 2, 4, 2, 2, 0, 0]
> Degrees: [3, 1, 3, 2, 2, 3, 2, 2, 1, 1]
> Valence >= Degrees: [True, True, True, True, True, True, True, True, False, False]
>
>
> Should GetHashedAtomPairFingerprint be changed so that explicit hydrogens
> are added before this check is performed, so that standardised molecules
> no longer trigger the error?
>
> System info:
> Python 3.6.3
> RDKit 2017.09.03
> standardiser 0.1.9
>
> Kind Regards,
>
>
> Rebecca Mackenzie
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>