Re: [Rdkit-discuss] Canonicalisation with reaction labels

Andrew Dalke Fri, 16 Dec 2016 06:28:05 -0800

On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDkit install we get consistent canonicalization between reaction 
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'


The 2013 RDKit didn't actually preserve the atom order between labeled and 
unlabeled atoms.

It appeared to in many cases, but in a few cases the slight change to the 
initial atom invariants, caused by the atom label, ended up affecting the 
SMILES.

I no longer have an older version of RDKit installed. Going through my notes, 
here was one of the failure cases:

core      => Cc1cc2c3c(c1)C[N@]([*])CCN(C)CC[N@@]([*])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2
syntax    => Cc1cc2c3c(c1)C[N@]([*:1])CCN(C)CC[N@@]([*:2])Cc1cc(C)cc(c1OCCCO3)C[N@](C)CCN(C)CC[N@@](C)C2
canonical => Cc1cc2c3c(c1)C[N@]([*:2])CCN(C)CC[N@@]([*:1])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2

For my project I ended up canonicalizing with unlabeled atoms, then using 
_smilesAtomOutputOrder to figure out where the "*" atoms were located in the 
SMILES string and CanonicalRankAtoms() to figure out which were symmetrical, 
and finally coming up with my own canonical labeling on top of the canonical 
unlabeled SMILES.
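
A rough sketch of that approach with a current RDKit might look like the 
following. It is not the code from the project: label_wildcards is a 
hypothetical helper name, numbering the "*" atoms by their output order is a 
simplification, and it assumes each wildcard atom has no isotope or charge, so 
that it is written as either "*" or "[*]" in the canonical SMILES. It also 
assumes _smilesAtomOutputOrder can be read as a string like "[3,0,1,]".

```python
import ast
import re

from rdkit import Chem


def label_wildcards(core_smiles):
    """Canonicalize an unlabeled core, then number its "*" atoms.

    Returns (labeled_smiles, symmetry_classes), where symmetry_classes[i]
    is the tie-preserving canonical rank of the i-th "*" atom written.
    """
    mol = Chem.MolFromSmiles(core_smiles)
    if mol is None:
        raise ValueError("could not parse %r" % (core_smiles,))

    # Canonical SMILES of the unlabeled core. As a side effect this
    # records the order in which the atoms were written.
    canonical = Chem.MolToSmiles(mol)
    order = ast.literal_eval(mol.GetProp("_smilesAtomOutputOrder"))

    # With breakTies=False, symmetry-equivalent atoms share a rank.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))

    # Wildcard atoms (atomic number 0), in SMILES output order.
    wildcards = [i for i in order if mol.GetAtomWithIdx(i).GetAtomicNum() == 0]
    symmetry_classes = [ranks[i] for i in wildcards]

    # Assumption: every wildcard appears as "*" or "[*]" in the output,
    # so the n-th match in the string is the n-th wildcard atom written.
    labels = iter(range(1, len(wildcards) + 1))
    labeled = re.sub(r"\[\*\]|\*",
                     lambda m: "[*:%d]" % next(labels),
                     canonical)
    return labeled, symmetry_classes
```

Because the labels are substituted into the canonical unlabeled SMILES, the 
atom order of the core is the same with and without labels. Two "*" atoms 
with the same symmetry class are interchangeable, which is what a real 
labeling scheme would need to take into account.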


> I can understand why canonicalization can be different between versions but 
> I’m not sure whether this change in behaviour is expected?

While it is possible to generate a canonical labeling which preserves the same 
atom order as the canonical unlabeled SMILES (as I did above), that's more 
complicated. It's easier to include the label as part of the atom invariant and 
use the regular canonicalization mechanism.

Cheers,


                                Andrew
                                da...@dalkescientific.com


