Hello all,

I have come across an unexpected behaviour in RDKit when reading MOL blocks of the same compound in V2000 and V3000 format. In particular, RDKit seems to perceive the stereochemistry of the compound differently depending on the format.
The original compound is given as a V3000 MOL block:

  ACCLDraw01272318022D

  0  0  0     0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 16 18 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C 12.5458 -11.8979 0 0
M  V30 2 C 13.2916 -12.2506 0 0
M  V30 3 C 13.9706 -11.7808 0 0
M  V30 4 C 13.9024 -10.9587 0 0
M  V30 5 C 13.1566 -10.6061 0 0
M  V30 6 N 12.4789 -11.0754 0 0
M  V30 7 N 12.985 -9.2285 0 0 CFG=3
M  V30 8 C 12.1695 -9.1051 0 0
M  V30 9 C 12.0846 -7.9654 0 0 CFG=2
M  V30 10 C 12.6712 -8.5447 0 0
M  V30 11 C 13.4045 -8.1659 0 0 CFG=1
M  V30 12 C 13.2699 -7.3511 0 0
M  V30 13 N 12.4544 -7.2277 0 0 CFG=3
M  V30 14 C 12.0751 -6.4952 0 0
M  V30 15 H 14.2246 -8.2516 0 0
M  V30 16 H 11.2754 -7.8051 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 2 3 4
M  V30 4 1 4 5
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 5 7
M  V30 8 1 8 7
M  V30 9 1 9 8
M  V30 10 1 9 10
M  V30 11 1 10 11
M  V30 12 1 11 7
M  V30 13 1 11 12
M  V30 14 1 9 13
M  V30 15 1 13 12
M  V30 16 1 13 14
M  V30 17 1 11 15 CFG=3
M  V30 18 1 9 16 CFG=3
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(2 9 11)
M  V30 END COLLECTION
M  V30 END CTAB
M  END

which can also be drawn (in Biovia Draw) as:

[image: image.png]

The same compound converted to V2000:

1
  -OEChem-01262310192D

 16 18  0     1  0  0  0  0  0999 V2000
   15.8326   -5.9013    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.5785   -6.2540    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.2575   -5.7842    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.1893   -4.9622    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.4435   -4.6095    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.7658   -5.0788    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   16.2719   -3.2319    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   16.6914   -2.1693    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   16.5568   -1.3545    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.7413   -1.2311    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   15.3714   -1.9688    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   15.4564   -3.1085    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.9581   -2.5482    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.3620   -0.4986    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.6914   -2.1693    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   15.3714   -1.9688    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
 11 13  1  0  0  0  0
 10 11  1  0  0  0  0
 11 12  1  0  0  0  0
  8 13  1  0  0  0  0
  9 10  1  0  0  0  0
 10 14  1  0  0  0  0
  7 12  1  0  0  0  0
  8  9  1  0  0  0  0
  7  8  1  0  0  0  0
  5  7  1  0  0  0  0
  8 15  1  6  0  0  0
 11 16  1  6  0  0  0
M  END

which is also drawn as:

[image: image.png]

However, if I read the two blocks with RDKit and convert them to SMILES, I get two structures with different stereochemistry:

from rdkit import Chem

# v3000 and v2000 hold the two MOL blocks above as strings
mols = [v3000, v2000]
for mol in mols:
    m = Chem.MolFromMolBlock(mol)
    print(Chem.MolToSmiles(m))

Output:

CN1C[C@@H]2C[C@H]1CN2c1ccccn1
CN1C[C@@H]2C[C@@H]1CN2c1ccccn1

I have inspected the two MOL blocks but could not figure out why the two formats behave differently, given that Biovia renders them identically. Any hints? Is this a bug in RDKit?

Thanks,
--
*Gianmarco*
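For anyone comparing the two blocks by hand, here is a minimal, stdlib-only sketch that lists every wedged or hashed bond together with its 2D length, for both V2000 and V3000 CTABs. It rests on an assumption worth checking against the RDKit documentation: that RDKit assigns tetrahedral stereo from the wedge/hash bonds plus the 2D coordinates rather than from the V2000 atom parity column, so wedge geometry is one place where two blocks that render the same could still differ. `wedged_bond_lengths` and `WedgeBond` are hypothetical helpers, not RDKit API; the parser only covers the fields used in this post (no V3000 line continuations) and expects the V2000 lines to keep their fixed column widths.

```python
import math
from dataclasses import dataclass


@dataclass
class WedgeBond:
    """A wedged or hashed bond and its 2D length (hypothetical helper type)."""
    begin: int     # 1-based index of the atom at the narrow end of the wedge
    end: int       # 1-based index of the atom at the wide end
    kind: str      # "wedge" (up) or "hash" (down)
    length: float  # 2D distance between the two atoms


def _parse_v2000(lines):
    # Counts line: atom count in columns 1-3, bond count in columns 4-6.
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    coords = {}
    for i, line in enumerate(lines[4:4 + n_atoms], start=1):
        # Atom line: x in columns 1-10, y in 11-20; z is ignored for a 2D depiction.
        coords[i] = (float(line[0:10]), float(line[10:20]))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        a1, a2, stereo = int(line[0:3]), int(line[3:6]), int(line[9:12])
        # V2000 single-bond stereo: 1 = wedge (up), 6 = hash (down).
        if stereo in (1, 6):
            bonds.append((a1, a2, "wedge" if stereo == 1 else "hash"))
    return coords, bonds


def _parse_v3000(lines):
    coords, bonds, section = {}, [], None
    prefix = "M  V30 "
    for line in lines:
        fields = line[len(prefix):].split() if line.startswith(prefix) else []
        if not fields:
            continue
        if fields[0] == "BEGIN":
            section = fields[1]
        elif fields[0] == "END":
            section = None
        elif section == "ATOM":
            # "index symbol x y z aamap [KEY=VALUE ...]"
            coords[int(fields[0])] = (float(fields[2]), float(fields[3]))
        elif section == "BOND":
            # "index type atom1 atom2 [KEY=VALUE ...]"; CFG=1 wedge, CFG=3 hash.
            cfg = next((f for f in fields[4:] if f.startswith("CFG=")), None)
            if cfg in ("CFG=1", "CFG=3"):
                bonds.append((int(fields[2]), int(fields[3]),
                              "wedge" if cfg == "CFG=1" else "hash"))
    return coords, bonds


def wedged_bond_lengths(molblock):
    """List wedged/hashed bonds with their 2D lengths.

    Assumes the block starts at the title line (3 header lines, then counts).
    """
    lines = molblock.splitlines()
    parse = _parse_v3000 if "V3000" in lines[3] else _parse_v2000
    coords, bonds = parse(lines)
    result = []
    for a1, a2, kind in bonds:
        (x1, y1), (x2, y2) = coords[a1], coords[a2]
        result.append(WedgeBond(a1, a2, kind, math.hypot(x2 - x1, y2 - y1)))
    return result
```

Assuming `v3000` and `v2000` hold the blocks above as strings that start at the title line, `for b in wedged_bond_lengths(v2000): print(b)` should report the hashed bonds 8-15 and 11-16 with length 0.0, because H15 and H16 are written at exactly the same coordinates as C8 and C11; the two hashed bonds in the V3000 block (11-15 and 9-16) both come out at about 0.82.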
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss