On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDKit install we get consistent canonicalization between reaction
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'
The 2013 RDKit didn't preserve the atom order between labeled and unlabeled
versions of the same molecule. It looked that way in many cases, but in a few
cases the slight change to the initial atom invariants caused by the atom
label ended up changing the output SMILES.
I no longer have an older version of RDKit installed, but going through my
notes, here is one of the failure cases:
core =>
Cc1cc2c3c(c1)C[N@]([*])CCN(C)CC[N@@]([*])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2
syntax =>
Cc1cc2c3c(c1)C[N@]([*:1])CCN(C)CC[N@@]([*:2])Cc1cc(C)cc(c1OCCCO3)C[N@](C)CCN(C)CC[N@@](C)C2
canonical =>
Cc1cc2c3c(c1)C[N@]([*:2])CCN(C)CC[N@@]([*:1])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2
For my project I ended up canonicalizing the unlabeled molecule, then using
_smilesAtomOutputOrder to find where the "*" atoms were located in the SMILES
string, using CanonicalRankAtoms() to find which of them were symmetrical,
and building my own canonical labeling on top of the canonical unlabeled
SMILES.
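A minimal sketch of that relabeling step, written in plain Python so it runs without RDKit. It assumes you already have (a) the atom output order of the canonical unlabeled SMILES (in RDKit, what the _smilesAtomOutputOrder property records after Chem.MolToSmiles()), (b) the indices of the "*" atoms, and (c) symmetry classes such as those from Chem.CanonicalRankAtoms(mol, breakTies=False). The helper name and the toy input values are hypothetical, and this is not necessarily how I implemented it.

```python
def relabel_dummies(output_order, dummy_atoms, sym_class):
    """Hypothetical helper: number the '*' atoms 1, 2, ... in the order they
    appear in the canonical unlabeled SMILES, and report which of the new
    labels sit on symmetry-equivalent positions."""
    new_labels = {}
    for atom_idx in output_order:
        if atom_idx in dummy_atoms:
            # Labels follow order of appearance in the canonical SMILES,
            # so they do not depend on the input atom order.
            new_labels[atom_idx] = len(new_labels) + 1
    # Group the new labels by symmetry class; a group with more than one
    # label marks positions the unlabeled SMILES cannot tell apart.
    by_class = {}
    for atom_idx, label in new_labels.items():
        by_class.setdefault(sym_class[atom_idx], []).append(label)
    equivalent = sorted(sorted(g) for g in by_class.values() if len(g) > 1)
    return new_labels, equivalent


# Toy input (made up for illustration): 5 atoms, '*' atoms at indices 1 and 4,
# both in symmetry class 5.
output_order = [4, 0, 2, 3, 1]
sym_class = [0, 5, 2, 2, 5]
labels, equivalent = relabel_dummies(output_order, {1, 4}, sym_class)
print(labels)      # {4: 1, 1: 2}
print(equivalent)  # [[1, 2]]
```

How the symmetric groups are then resolved (for example by looking at what each label is attached to in the full structure) depends on the application; the sketch leaves that open.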
> I can understand why canonicalization can be different between versions but
> I’m not sure whether this change in behaviour is expected?
While it is possible to generate a canonical labeling which preserves the same
atom order as the canonical unlabeled SMILES (as I did above), that's more
complicated. It's easier to include the label as part of the atom invariant and
use the regular canonicalization mechanism.
Cheers,
Andrew
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss