On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDkit install we get consistent canonicalization between reaction 
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'

The 2013 RDKit didn't actually preserve the atom order between labeled and 
unlabeled atoms.

It looked like it for many cases, but there were a few cases where the slight 
change to the initial atom invariants, caused by the atom label, ended up 
affecting the SMILES.

I no longer have an older version of RDKit installed. Going through my notes, 
here was one of the failure cases:

core =>       
 syntax    => 
 canonical => 

For my project I ended up canonicalizing with unlabeled atoms, using 
_smilesAtomOutputOrder to figure out where the "*" atoms were located in the 
SMILES string, using CanonicalRankAtoms() to figure out which were symmetrical, 
and coming up with my own canonical labeling on top of the canonical unlabeled 
SMILES.
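
The final step of that approach, writing labels on top of an already canonical 
unlabeled SMILES, can be sketched as plain string rewriting. This is only an 
illustration, not the code from the project: label_wildcards() is a 
hypothetical helper, and it assumes the labels have already been worked out 
(for example from _smilesAtomOutputOrder and the symmetry classes) and are 
passed in the order the "*" atoms appear in the string.

```python
import re

# Matches an unlabeled wildcard atom written either as "*" or as "[*]".
# Assumption: the input contains no other wildcard forms such as "[*:1]".
_WILDCARD = re.compile(r"\[\*\]|\*")


def label_wildcards(unlabeled_smiles, labels):
    """Rewrite the i-th wildcard in the string as [*:labels[i]].

    Hypothetical helper: `labels` must list one label per wildcard,
    in left-to-right order of the wildcards in `unlabeled_smiles`.
    """
    found = _WILDCARD.findall(unlabeled_smiles)
    if len(found) != len(labels):
        raise ValueError(
            "expected %d labels, got %d" % (len(found), len(labels))
        )
    remaining = iter(labels)
    # Each match consumes the next label, so the atom order is untouched.
    return _WILDCARD.sub(lambda m: "[*:%d]" % next(remaining), unlabeled_smiles)


print(label_wildcards("[*]C1CCNCC1", [1]))
```

Because only the wildcard tokens are rewritten, the rest of the string keeps 
the atom order of the unlabeled canonical SMILES.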

> I can understand why canonicalization can be different between versions but 
> I’m not sure whether this change in behaviour is expected?

While it is possible to generate a canonical labeling which preserves the same 
atom order as the canonical unlabeled SMILES (as I did above), that's more 
complicated. It's easier to include the label as part of the atom invariant and 
use the regular canonicalization mechanism.
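
As a toy illustration of why putting the label into the atom invariant can 
change the result: the sketch below ranks atoms by sorting small invariant 
tuples. The tuple contents (degree, atomic number, label) and the 
initial_ranks() function are assumptions made for this example; they are not 
RDKit's actual invariants or algorithm.

```python
def initial_ranks(atoms, use_label):
    """Rank atoms by their invariant tuple; equal invariants share a rank."""

    def invariant(atom):
        base = (atom["degree"], atom["atomic_num"])
        # Optionally append the label so it takes part in the ordering.
        return base + (atom["label"],) if use_label else base

    distinct = sorted({invariant(a) for a in atoms})
    return [distinct.index(invariant(a)) for a in atoms]


# Two wildcard atoms (atomic number 0) with different labels, plus a carbon.
atoms = [
    {"degree": 1, "atomic_num": 0, "label": 2},
    {"degree": 1, "atomic_num": 0, "label": 1},
    {"degree": 3, "atomic_num": 6, "label": 0},
]

print(initial_ranks(atoms, use_label=False))  # the two wildcards tie
print(initial_ranks(atoms, use_label=True))   # the label breaks the tie
```

Without the label the two wildcard atoms start out tied; with it they start 
out distinct, so later refinement and tie-breaking steps in a canonicalizer 
can proceed differently.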



Rdkit-discuss mailing list
