Hello,
I have noticed an issue with InChI generation in a rather specific
situation. Splitting a molecule into its unconnected fragments and
recombining them can yield a different InChI from the original, whereas
the two ought to be identical:
    new_mol = reduce(Chem.CombineMols, Chem.GetMolFrags(old_mol, asMols=True))
    old_inchi = Chem.MolToInchi(old_mol)
    new_inchi = Chem.MolToInchi(new_mol)
I've attached an SD file containing some molecules (actually
different versions of the same compound) that exhibit the problem, and
some code to demonstrate it. The actual application is a custom
desalting procedure, but I hope this serves as an illustration. I
can provide other examples if necessary.
I'm running the 2012_12_1 release, and see the same results on Mac
OS X and Linux.
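When two InChIs that should match do not, it can help to see which layers differ rather than which characters. The following is only a minimal sketch in plain Python: the helper names `split_inchi_layers` and `differing_inchi_layers` are hypothetical (they are not part of RDKit), the code assumes each layer label occurs at most once in the string, and the example strings are the standard InChIs of L- and D-alanine, chosen purely for illustration and unrelated to the attached compound.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into an ordered list of (label, text) pairs.

    The first two '/'-separated parts are labelled 'version' and 'formula';
    every later part is labelled by its one-letter prefix (c, h, q, p, b, t, m, s, ...).
    """
    parts = inchi.split("/")
    layers = [("version", parts[0])]
    if len(parts) > 1:
        layers.append(("formula", parts[1]))
    for part in parts[2:]:
        layers.append((part[:1], part[1:]))
    return layers


def differing_inchi_layers(inchi_a, inchi_b):
    """Return the labels of layers that are missing from one string or differ.

    Assumption: each label appears once per InChI (no repeated sub-layers).
    """
    layers_a = split_inchi_layers(inchi_a)
    layers_b = split_inchi_layers(inchi_b)
    dict_a, dict_b = dict(layers_a), dict(layers_b)
    # Keep the layer order of the first string, then add labels only in the second.
    labels = [label for label, _ in layers_a]
    labels += [label for label, _ in layers_b if label not in dict_a]
    return [label for label in labels if dict_a.get(label) != dict_b.get(label)]


# Illustrative input only: L-alanine versus D-alanine.
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(differing_inchi_layers(l_ala, d_ala))  # -> ['m']
```

Applied to the InChIs of the original and recombined molecules, a report like this would show whether the discrepancy is confined to one layer.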
Francis
--
Dr Francis L Atkinson
Chemogenomics Group
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge UK
(01223) 494473
#! /Library/Frameworks/EPD64.framework/Versions/Current/bin/python

from __future__ import print_function

from functools import reduce  # built in on Python 2, must be imported on Python 3

from rdkit import Chem

######

old_mols = [x for x in Chem.SDMolSupplier("404631.sdf")]

print("Checking that all input mols have same InChI: {}\n".format(len({Chem.MolToInchi(x) for x in old_mols}) == 1))

print("-" * 100)

for n, old_mol in enumerate(old_mols):

    print("Starting mol no. {}...".format(n))

    # Generate new mol by splitting into unconnected components and recombining...
    new_mol = reduce(Chem.CombineMols, Chem.GetMolFrags(old_mol, asMols=True))

    # Compare InChIs from old and new mols, marking each differing position with a "v"...
    old_inchi = Chem.MolToInchi(old_mol)
    new_inchi = Chem.MolToInchi(new_mol)
    differences = "".join(["v" if (old_inchi[i] != new_inchi[i]) else " " for i in range(0, min(len(old_inchi), len(new_inchi)))])
    print("{}\n{}\n{}".format(differences, old_inchi, new_inchi))

    # Passing the new mol through a molblock seems to fix the problem...
    new_inchi_2 = Chem.MolToInchi(Chem.MolFromMolBlock(Chem.MolToMolBlock(new_mol)))
    print("Checking whether old and new InChIs are the same (after passage thru molblock): {}".format(old_inchi == new_inchi_2))

    print("-" * 100)
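The difference marker in the script only covers positions up to the length of the shorter InChI, so any extra trailing characters in the longer string go unmarked. A minimal sketch of a variant that also flags the overhang is given below; the function name `mark_differences` is hypothetical, and the two example strings are the standard InChIs of methane and methanol, used only as illustration.

```python
def mark_differences(a, b, marker="v"):
    """Return a string with `marker` at every position where a and b differ.

    Positions beyond the end of the shorter string count as differences.
    """
    longest = max(len(a), len(b))
    marks = []
    for i in range(longest):
        same = i < len(a) and i < len(b) and a[i] == b[i]
        marks.append(" " if same else marker)
    return "".join(marks)


# Illustrative input only: methane versus methanol.
methane = "InChI=1S/CH4/h1H4"
methanol = "InChI=1S/CH4O/c1-2/h2H,1H3"
print("{}\n{}\n{}".format(mark_differences(methane, methanol), methane, methanol))
```

The returned string always has the length of the longer input, so it lines up under both InChIs when printed as above.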
Marvin 02211109112D
17 15 0 0 0 0 999 V2000
-0.7607 -10.6459 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.0457 -10.2343 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.6692 -10.6459 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
1.3843 -10.2343 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
2.0993 -10.6459 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
2.8142 -10.2343 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
-1.4740 -10.2352 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.6692 -11.4731 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.3843 -9.4072 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.0993 -11.4731 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.5317 -10.6451 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.2440 -10.2326 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.8132 -9.4072 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.1244 -9.8084 0.0000 Sb 0 0 0 0 0 0 0 0 0 0 0 0
5.2971 -9.8084 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.5359 -9.0942 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.5359 -10.5227 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3 4 1 0 0 0 0
4 9 1 1 0 0 0
1 2 1 0 0 0 0
5 10 1 1 0 0 0
4 5 1 0 0 0 0
6 11 1 0 0 0 0
11 12 1 0 0 0 0
5 6 1 0 0 0 0
6 13 1 6 0 0 0
2 3 1 0 0 0 0
1 7 1 0 0 0 0
3 8 1 1 0 0 0
14 15 2 0 0 0 0
14 16 2 0 0 0 0
14 17 1 0 0 0 0
M END
> <molregno>
404631
$$$$
Marvin 01251111452D
17 15 0 0 0 0 999 V2000
10.5491 -5.7444 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
10.9657 -5.0258 0.0000 Sb 0 0 0 0 0 0 0 0 0 0 0 0
10.5491 -4.3071 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
11.7949 -5.0258 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5833 -5.3375 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.2978 -4.9250 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
4.0123 -5.3375 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.7267 -4.9250 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
5.4412 -5.3375 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
4.7267 -4.1000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
5.4412 -6.1625 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.1557 -4.9250 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
6.8702 -5.3375 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
6.1557 -4.1000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.8702 -6.1625 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
7.5846 -4.9250 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.2991 -5.3375 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8 9 1 0 0 0 0
8 10 1 6 0 0 0
9 11 1 6 0 0 0
5 6 1 0 0 0 0
9 12 1 0 0 0 0
1 2 1 0 0 0 0
12 13 1 0 0 0 0
6 7 1 0 0 0 0
12 14 1 6 0 0 0
2 3 2 0 0 0 0
13 15 1 1 0 0 0
7 8 1 0 0 0 0
13 16 1 0 0 0 0
2 4 2 0 0 0 0
16 17 1 0 0 0 0
M END
> <molregno>
404631
$$$$
Marvin 01311110422D
17 15 0 0 0 0 999 V2000
19.4700 -21.6663 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
19.0581 -22.3811 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
18.2343 -22.3811 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
17.8223 -21.6663 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
16.9985 -21.6663 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
16.5866 -22.3811 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
19.0581 -20.9556 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
19.4700 -23.0960 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
17.8223 -23.0960 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
18.2343 -20.9556 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
16.5866 -20.9556 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
15.7627 -22.3811 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
15.3508 -23.0960 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
21.2500 -19.9698 0.0000 Sb 0 0 0 0 0 0 0 0 0 0 0 0
20.4250 -19.9698 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
21.6625 -19.2540 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
21.6625 -20.6815 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 1 0 0 0 0
1 7 1 0 0 0 0
2 8 1 1 0 0 0
3 9 1 1 0 0 0
4 10 1 1 0 0 0
5 11 1 6 0 0 0
12 13 1 0 0 0 0
6 12 1 0 0 0 0
14 15 2 0 0 0 0
14 16 2 0 0 0 0
14 17 1 0 0 0 0
M END
> <molregno>
404631
$$$$
Marvin 08191110492D
17 15 0 0 0 0 999 V2000
0.3417 -24.0125 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0561 -23.6000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1.7706 -24.0125 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4851 -23.6000 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
3.1996 -24.0125 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
3.9140 -23.6000 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
4.6285 -24.0125 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
5.3430 -23.6000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.0574 -24.0125 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
4.6285 -24.8375 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.9140 -22.7750 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.1996 -24.8375 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.4851 -22.7750 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.6583 -25.5945 0.0000 Sb 0 0 0 0 0 0 0 0 0 0 0 0
1.2419 -24.8783 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.2419 -26.3066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.4829 -25.5956 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6 7 1 0 0 0 0
3 4 1 0 0 0 0
7 8 1 0 0 0 0
8 9 1 0 0 0 0
4 5 1 0 0 0 0
7 10 1 1 0 0 0
2 3 1 0 0 0 0
6 11 1 6 0 0 0
5 6 1 0 0 0 0
5 12 1 6 0 0 0
1 2 1 0 0 0 0
4 13 1 6 0 0 0
14 16 2 0 0 0 0
14 15 2 0 0 0 0
14 17 1 0 0 0 0
M END
> <molregno>
404631
$$$$
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss