Hello all,

I have come across unexpected behaviour in RDKit when reading MOL blocks
of the same compound in V2000 and V3000 formats. In particular, RDKit seems
to perceive the stereochemistry of the compound differently depending on
the format.

The original compound is a V3000 CTAB:

  ACCLDraw01272318022D

  0  0  0     0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 16 18 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C 12.5458 -11.8979 0 0
M  V30 2 C 13.2916 -12.2506 0 0
M  V30 3 C 13.9706 -11.7808 0 0
M  V30 4 C 13.9024 -10.9587 0 0
M  V30 5 C 13.1566 -10.6061 0 0
M  V30 6 N 12.4789 -11.0754 0 0
M  V30 7 N 12.985 -9.2285 0 0 CFG=3
M  V30 8 C 12.1695 -9.1051 0 0
M  V30 9 C 12.0846 -7.9654 0 0 CFG=2
M  V30 10 C 12.6712 -8.5447 0 0
M  V30 11 C 13.4045 -8.1659 0 0 CFG=1
M  V30 12 C 13.2699 -7.3511 0 0
M  V30 13 N 12.4544 -7.2277 0 0 CFG=3
M  V30 14 C 12.0751 -6.4952 0 0
M  V30 15 H 14.2246 -8.2516 0 0
M  V30 16 H 11.2754 -7.8051 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 2 3 4
M  V30 4 1 4 5
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 5 7
M  V30 8 1 8 7
M  V30 9 1 9 8
M  V30 10 1 9 10
M  V30 11 1 10 11
M  V30 12 1 11 7
M  V30 13 1 11 12
M  V30 14 1 9 13
M  V30 15 1 13 12
M  V30 16 1 13 14
M  V30 17 1 11 15 CFG=3
M  V30 18 1 9 16 CFG=3
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(2 9 11)
M  V30 END COLLECTION
M  V30 END CTAB
M  END

Which can also be represented (in Biovia Draw) as:

[image: image.png]

The same compound converted into V2000:

1
  -OEChem-01262310192D

 16 18  0     1  0  0  0  0  0999 V2000
   15.8326   -5.9013    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.5785   -6.2540    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.2575   -5.7842    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.1893   -4.9622    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.4435   -4.6095    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.7658   -5.0788    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   16.2719   -3.2319    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   16.6914   -2.1693    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   16.5568   -1.3545    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.7413   -1.2311    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   15.3714   -1.9688    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   15.4564   -3.1085    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.9581   -2.5482    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.3620   -0.4986    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.6914   -2.1693    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   15.3714   -1.9688    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
 11 13  1  0  0  0  0
 10 11  1  0  0  0  0
 11 12  1  0  0  0  0
  8 13  1  0  0  0  0
  9 10  1  0  0  0  0
 10 14  1  0  0  0  0
  7 12  1  0  0  0  0
  8  9  1  0  0  0  0
  7  8  1  0  0  0  0
  5  7  1  0  0  0  0
  8 15  1  6  0  0  0
 11 16  1  6  0  0  0
M  END

Which is also represented as:

[image: image.png]

However, if I read the two MOL blocks with RDKit and convert them to
SMILES, I get different stereochemistry (V3000 result first, then V2000):

from rdkit import Chem

# v3000 and v2000 are strings holding the two MOL blocks shown above
mols = [v3000, v2000]
for mol in mols:
    m = Chem.MolFromMolBlock(mol)
    print(Chem.MolToSmiles(m))

CN1C[C@@H]2C[C@H]1CN2c1ccccn1
CN1C[C@@H]2C[C@@H]1CN2c1ccccn1

I have inspected the CTABs but could not figure out why the two formats
behave differently, given that they are rendered the same way in Biovia
Draw.
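
One check that does not depend on RDKit is whether any atoms in the V2000
block sit on exactly the same 2D coordinates, since a wedge bond between two
coincident atoms has zero length and so no direction on the page. The sketch
below is only a diagnostic under stated assumptions: overlapping_atoms_v2000
is a hypothetical helper name, it assumes the standard fixed-width V2000
layout (counts on line 4, x and y in columns 1-10 and 11-20), and it does not
establish that overlapping atoms are the cause of the SMILES difference.

```python
from collections import defaultdict


def overlapping_atoms_v2000(molblock):
    """Return groups of 1-based atom indices that share identical x/y coordinates.

    Hypothetical helper; assumes a well-formed, fixed-width V2000 mol block.
    """
    lines = molblock.splitlines()
    # Line 4 of a V2000 mol block is the counts line; columns 1-3 hold the atom count.
    n_atoms = int(lines[3][0:3])
    by_xy = defaultdict(list)
    for idx, line in enumerate(lines[4:4 + n_atoms], start=1):
        # x and y are fixed-width fields of 10 characters each.
        x = float(line[0:10])
        y = float(line[10:20])
        by_xy[(x, y)].append(idx)
    # Keep only coordinates used by more than one atom.
    return [atoms for atoms in by_xy.values() if len(atoms) > 1]


# Small made-up block (not the compound above): atom 3 sits on top of atom 2.
demo = "\n".join([
    "demo",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  6",
    "M  END",
])
print(overlapping_atoms_v2000(demo))  # [[2, 3]]
```

In the V2000 block above, atoms 8 and 15 are listed with identical
coordinates, as are atoms 11 and 16, so both explicit hydrogens coincide
with the stereocentres they are wedged from. In the V3000 block the two
hydrogens have their own coordinates.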

Any hints? Is this a bug in RDKit?

Thanks,

-- 
*Gianmarco*