Re: [Rdkit-discuss] counting stereocenters

2017-09-14 Thread Greg Landrum
Hi Daniel,
this is an oddity that happens with molecules constructed from InChIs.
CalcNumAtomStereoCenters() returns sensible results if you call it on
molecules rebuilt from SMILES:
In [24]: mol_list2 = [Chem.MolFromSmiles(Chem.MolToSmiles(mol,True)) for mol in mol_list]

In [25]: [CalcNumUnspecifiedAtomStereoCenters(mol) for mol in mol_list2]
Out[25]: [0, 1, 1]

In [26]: [CalcNumAtomStereoCenters(mol) for mol in mol_list2]
Out[26]: [2, 2, 1]


I will try to track this down.

-greg
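
The workaround in the reply above can be sketched as a small standalone
script. This is only an illustration under stated assumptions, not an
official RDKit recipe: the helper names count_stereocenters and
missing_stereo are hypothetical, and the counts you get may differ between
RDKit versions. The first helper round-trips an InChI-derived molecule
through isomeric SMILES before counting; the second is a plain-Python filter
that flags entries with at least one unspecified stereocenter.

```python
def count_stereocenters(inchi):
    """Return (total, unspecified) atom stereocenter counts for an InChI string.

    Hypothetical helper: the molecule is round-tripped through isomeric SMILES
    before counting, as suggested in the reply above.
    """
    # Imported inside the function so the pure-Python filter below
    # can be used (and tested) without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    mol = Chem.MolFromInchi(inchi)
    if mol is None:
        raise ValueError("could not parse InChI: %r" % (inchi,))

    # Rebuild the molecule from SMILES so the descriptors are computed
    # on a SMILES-derived molecule rather than the InChI-derived one.
    mol = Chem.MolFromSmiles(Chem.MolToSmiles(mol, isomericSmiles=True))

    total = rdMolDescriptors.CalcNumAtomStereoCenters(mol)
    unspecified = rdMolDescriptors.CalcNumUnspecifiedAtomStereoCenters(mol)
    return total, unspecified


def missing_stereo(records):
    """Return the names of records that have unspecified stereocenters.

    `records` is an iterable of (name, total, unspecified) tuples.
    """
    return [name for name, _total, unspecified in records if unspecified > 0]
```

Feeding in the counts reported in Out[25] and Out[26] above, that is
("stereo_inchi", 2, 0), ("am_inchi", 2, 1) and ("unspec_inchi", 1, 1),
missing_stereo flags am_inchi and unspec_inchi.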


On Thu, Sep 14, 2017 at 4:47 PM, Daniel Hitchcock wrote:

> Hi All,
>
> I have a list of compounds (InChi strings) that I need to filter.
> Basically I need to identify which molecules are missing stereo information.
>
> I came across "CalcNumUnspecifiedAtomStereoCenters", which sounded
> exactly like what I needed, but unfortunately all it does is return 0s, as
> does the "CalcNumAtomStereoCenters" function. I've viewed the molecules
> using MolToImage(mol).show() to verify the stereo information is accurate,
> and it's all there.
>
> Here is the code I used. It's in python 3.5.2, and
> rdkit.Chem.rdMolDescriptors._CalcNumUnspecifiedAtomStereoCenters_version
> is 1.0.0
>
> """
> Three molecules with stereochemistry.
> stereo_inchi - 2 stereocenters specified
> am_inchi - 1 stereocenter specified, the other ambiguous
> unspec_inchi - 1 stereocenter, unmentioned in the InChi string
>
> Program should output:
> 2 0
> 1 1
> 0 1
> """
> from rdkit.Chem import MolFromInchi, MolToInchi
> from rdkit.Chem.AllChem import CalcNumAtomStereoCenters, CalcNumUnspecifiedAtomStereoCenters
> stereo_inchi = 'InChI=1S/C10H10O6/c1-5(9(12)13)16-8-4-6(10(14)15)2-3-7(8)11/h2-4,7-8,11H,1H2,(H,12,13)(H,14,15)/t7-,8-/m1/s1'
> am_inchi = 'InChI=1S/C11H21NO5/c1-8(13)5-11(16)17-9(6-10(14)15)7-12(2,3)4/h8-9,13H,5-7H2,1-4H3/t8?,9-/m1/s1'
> unspec_inchi = 'InChI=1S/C14H27NO4/c1-5-6-7-8-9-14(18)19-12(10-13(16)17)11-15(2,3)4/h12H,5-11H2,1-4H3'
> mol_list = [stereo_inchi, am_inchi, unspec_inchi]
> mol_list = [MolFromInchi(mol) for mol in mol_list]
> for mol in mol_list:
>     print(CalcNumAtomStereoCenters(mol), CalcNumUnspecifiedAtomStereoCenters(mol))
>
>
> Thanks in advance!
>
> Cheers,
>
> -daniel
>
> --
> Daniel Hitchcock, PhD
> www.linkedin.com/pub/daniel-hitchcock/24/7b8/858/
> Research Scientist I
> Metabolomics Platform
> The Broad Institute of MIT and Harvard
> 415 Main St, Cambridge, 02142
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


[Rdkit-discuss] counting stereocenters

2017-09-14 Thread Daniel Hitchcock

Hi All,

I have a list of compounds (InChi strings) that I need to filter. 
Basically I need to identify which molecules are missing stereo information.


I came across "CalcNumUnspecifiedAtomStereoCenters", which sounded
exactly like what I needed, but unfortunately all it does is return 0s,
as does the "CalcNumAtomStereoCenters" function. I've viewed the
molecules using MolToImage(mol).show() to verify the stereo information
is accurate, and it's all there.


Here is the code I used. It's in python 3.5.2, and 
rdkit.Chem.rdMolDescriptors._CalcNumUnspecifiedAtomStereoCenters_version 
is 1.0.0


"""
Three molecules with stereochemistry.
stereo_inchi - 2 stereocenters specified
am_inchi - 1 stereocenter specified, the other ambiguous
unspec_inchi - 1 stereocenter, unmentioned in the InChi string

Program should output:
2 0
1 1
0 1
"""
from rdkit.Chem import MolFromInchi, MolToInchi
from rdkit.Chem.AllChem import CalcNumAtomStereoCenters, CalcNumUnspecifiedAtomStereoCenters
stereo_inchi = 'InChI=1S/C10H10O6/c1-5(9(12)13)16-8-4-6(10(14)15)2-3-7(8)11/h2-4,7-8,11H,1H2,(H,12,13)(H,14,15)/t7-,8-/m1/s1'
am_inchi = 'InChI=1S/C11H21NO5/c1-8(13)5-11(16)17-9(6-10(14)15)7-12(2,3)4/h8-9,13H,5-7H2,1-4H3/t8?,9-/m1/s1'
unspec_inchi = 'InChI=1S/C14H27NO4/c1-5-6-7-8-9-14(18)19-12(10-13(16)17)11-15(2,3)4/h12H,5-11H2,1-4H3'

mol_list = [stereo_inchi, am_inchi, unspec_inchi]
mol_list = [MolFromInchi(mol) for mol in mol_list]
for mol in mol_list:
    print(CalcNumAtomStereoCenters(mol), CalcNumUnspecifiedAtomStereoCenters(mol))


Thanks in advance!

Cheers,

-daniel

--
Daniel Hitchcock, PhD
www.linkedin.com/pub/daniel-hitchcock/24/7b8/858/
Research Scientist I
Metabolomics Platform
The Broad Institute of MIT and Harvard
415 Main St, Cambridge, 02142
