Re: [Rdkit-discuss] isotopic SMILES
On Feb 7, 2017, at 01:17, Curt Fischer wrote:
> I am confused by this behavior:
>
> >>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
> >>> print(Chem.MolToSmiles(labeled_etoh))
> C[C]O
> >>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
> C[13C]O
>
> 1. Why are there any brackets at all in the first output? Why not just 'CCO'?

The middle atom in "CCO" has two hydrogens. The middle atom in "C[C]O" has no hydrogens.

> 2. Is there any documentation anywhere that the "isomericSmiles" argument is
> also an "isotopicSmiles" argument?

I don't believe so. A search via DuckDuckGo of rdkit.org finds only two irrelevant matches.

> I am also confused about when Chem.MolToSmiles() puts in H atoms in the
> output.

SMILES has a short-hand notation to represent hydrogens. "[CH4]" and "C" are both methane. When an atom is described using brackets, the number of hydrogens must be specified with the H notation. When an atom is described without brackets, the number of hydrogens is based on the permitted valence values. C has a valence of 4 and -C- has two single bonds, so the middle carbon of CCO has two bonds to hydrogens to complete the valence.

The output mechanism prefers to use the short-hand notation if possible. That isn't possible if the sum of hydrogens and bond orders does not match one of the permitted valences, or if there is an isotope, charge, chirality, etc., which requires the use of []s.

> >>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
> >>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))
> C[13CH](O)C[13C](=O)O
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))
> C[13C](O)C[13C](=O)O
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))
> CC(O)CC(=O)O
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))
> C[C](O)CC(=O)O
>
> 3. Why are there no brackets for three_hb1 output, but there are for
> three_hb2?

I think you mean "for the isomericSmiles=False output"? The first three_hb1 output has brackets. The isotope notation requires []s, so the option of using the short-hand notation doesn't exist. In that case the number of hydrogens must be specified, as otherwise it means the atom has no hydrogens.

> 4. As far as I can tell, the two three_hb molecules are identical. Why
> aren't all Hs removed during canonicalization?

The second atom in three_hb1 has 1 hydrogen and three single bonds. The second atom in three_hb2 has 0 hydrogens and three single bonds. They are different structures, so they have different SMILES.

Cheers,

Andrew
da...@dalkescientific.com

--
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
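
The hydrogen rule described in the reply above can be sketched in plain Python. This is only an illustration, not RDKit's implementation: the valence table assumes the usual "organic subset" values from the OpenSMILES convention, aromatic atoms are ignored, and the function names (implicit_h_count, needs_brackets) are made up for this example.

```python
# Normal valences assumed for atoms written without brackets
# (OpenSMILES "organic subset"; aromatic atoms are not handled here).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(element, bond_order_sum):
    """Hydrogens implied for an atom written WITHOUT brackets."""
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # the bonds already exceed every normal valence


def needs_brackets(element, bond_order_sum, h_count,
                   isotope=None, charge=0, chiral=False):
    """True if the atom cannot be written in the short-hand form."""
    if isotope is not None or charge != 0 or chiral:
        return True  # these properties can only be written inside []
    if element not in NORMAL_VALENCES:
        return True
    # Short-hand is only valid if it implies the actual hydrogen count.
    return implicit_h_count(element, bond_order_sum) != h_count


# Middle atom of CCO: two single bonds, so two implied hydrogens.
print(implicit_h_count("C", 2))                 # 2
# The same atom with 0 hydrogens (parsed from [13C], isotope dropped)
# cannot use the short-hand, hence "[C]".
print(needs_brackets("C", 2, 0))                # True
# three_hb1, second atom: three single bonds plus 1 H fits valence 4.
print(needs_brackets("C", 3, 1))                # False
# three_hb2, second atom: three single bonds plus 0 H does not.
print(needs_brackets("C", 3, 0))                # True
# Any isotope label forces brackets, whatever the hydrogen count.
print(needs_brackets("C", 3, 1, isotope=13))    # True
```

The last two calls mirror the three_hb1/three_hb2 outputs: dropping the isotope label removes one reason for brackets, but the hydrogen count parsed from the input stays with the atom.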
[Rdkit-discuss] chirality assignment
Dear All,

I'm generating conformations of a molecule:

C1C2C3OC3C1C13OC21C1CC3C2OC21

This molecule has many chiral centers and 10 possible isomers. The EmbedMolecule function of RDKit_2015_03_1 can generate every isomer, but RDKit_2016_09_3 fails for 9 of the 10. For example:

RDKit_2015_03_1
>>> mol=Chem.MolFromSmiles('C1[C@H]2[C@@H]3O[C@@H]3[C@@H]1[C@@]13O[C@@]21[C@@H]1C[C@H]3[C@@H]2O[C@@H]21')
>>> Chem.FindMolChiralCenters( copy(m), includeUnassigned=True )
[(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, 'R'), (8, 'R'), (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]
>>> m=Chem.AddHs(mol)
>>> AllChem.EmbedMolecule( m, randomSeed = 256, maxAttempts = 1, clearConfs = False )
0

RDKit_2016_09_3
>>> mol=Chem.MolFromSmiles('C1[C@H]2[C@@H]3O[C@@H]3[C@@H]1[C@@]13O[C@@]21[C@@H]1C[C@H]3[C@@H]2O[C@@H]21')
>>> m=Chem.AddHs(mol)
>>> Chem.FindMolChiralCenters( copy(m), includeUnassigned=True )
[(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, 'R'), (8, 'R'), (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]
>>> AllChem.EmbedMolecule( m, randomSeed = 256, maxAttempts = 100, clearConfs = False )
-1

Two chiral centers in this molecule are stereo-dependent (the 6th and 8th atoms). A conformation of the molecule can be generated when these atoms are left without a stereo assignment, but their chiralities then remain unassigned.

RDKit_2016_09_3
>>> mol=Chem.MolFromSmiles('C1[C@H]2[C@@H]3O[C@@H]3[C@@H]1C13OC21[C@@H]1C[C@H]3[C@@H]2O[C@@H]21')
>>> Chem.FindMolChiralCenters( copy(m), includeUnassigned=True )
[(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'), (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]
>>> m=Chem.AddHs(mol)
>>> AllChem.EmbedMolecule( m, randomSeed = 256, maxAttempts = 100, clearConfs = False )
0
>>> Chem.AssignAtomChiralTagsFromStructure(m)
>>> Chem.FindMolChiralCenters( copy(m), includeUnassigned=True )
[(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'), (9, 'S'), (11, 'R'), (12, 'S'), (14, 'R')]
>>> opt = AllChem.UFFOptimizeMolecule( m, maxIters = 1, confId=0)
>>> ff = AllChem.UFFGetMoleculeForceField( m, confId = 0 )
>>> ff.Minimize()
0
>>> Chem.AssignAtomChiralTagsFromStructure(m)
>>> Chem.FindMolChiralCenters( copy(m), includeUnassigned=True )
[(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'), (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]

How can I assign the chiralities of these atoms in RDKit_2016_09_3?

Regards,
Rintarou

Suzuki, Rintarou
National Agriculture and Food Research Organization
Tsukuba, Japan
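
The three FindMolChiralCenters listings in the last session are easier to compare side by side. The following is a small plain-Python sketch for doing so; it uses only the lists printed in the message and does not call RDKit. The helper name changed_centers is hypothetical.

```python
def changed_centers(before, after):
    """Return {atom_idx: (label_before, label_after)} for labels that differ."""
    before_map = dict(before)
    after_map = dict(after)
    return {
        idx: (before_map.get(idx), after_map.get(idx))
        for idx in sorted(set(before_map) | set(after_map))
        if before_map.get(idx) != after_map.get(idx)
    }


# Lists copied from the RDKit_2016_09_3 session in the message above.
from_smiles = [(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'),
               (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]
after_embed = [(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'),
               (9, 'S'), (11, 'R'), (12, 'S'), (14, 'R')]
after_minimize = [(1, 'S'), (2, 'S'), (4, 'R'), (5, 'R'), (6, '?'), (8, '?'),
                  (9, 'R'), (11, 'S'), (12, 'S'), (14, 'R')]

print(changed_centers(from_smiles, after_embed))     # {9: ('R', 'S'), 11: ('S', 'R')}
print(changed_centers(from_smiles, after_minimize))  # {}
```

In the posted output, atoms 6 and 8 stay '?' throughout, while the labels reported for atoms 9 and 11 swap after embedding and return to their original values after minimization.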
[Rdkit-discuss] isotopic SMILES
Hello RDKit users,

What behavior should we expect for Chem.MolToSmiles() when dealing with isotopically substituted molecules?

I am confused by this behavior:

>>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
>>> print(Chem.MolToSmiles(labeled_etoh))
C[C]O
>>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
C[13C]O

1. Why are there any brackets at all in the first output? Why not just 'CCO'?
2. Is there any documentation anywhere that the "isomericSmiles" argument is also an "isotopicSmiles" argument?

I am also confused about when Chem.MolToSmiles() puts in H atoms in the output.

>>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
>>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
>>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))
C[13CH](O)C[13C](=O)O
>>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))
C[13C](O)C[13C](=O)O
>>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))
CC(O)CC(=O)O
>>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))
C[C](O)CC(=O)O

3. Why are there no brackets for three_hb1 output, but there are for three_hb2?
4. As far as I can tell, the two three_hb molecules are identical. Why aren't all Hs removed during canonicalization?

Curt
[Rdkit-discuss] Looking for a bit of testing for py27 on windows
Dear all,

I'd like to try an experiment with the Windows build of the new RDKit patch release (2016.09.4): instead of using the (ancient) recommended compiler for the conda build, I have done a build using the most recent version of Visual Studio (VS2015). It would make life significantly easier if we could use this as the standard solution for doing Windows builds.

I've tested on both Windows 10 and Windows 7, but before I put it on the normal conda site, I'd like to be sure that it works for others too.

If you're a conda user and are willing to help out, please try creating a new conda environment with Python 2.7 and installing the RDKit like this: "conda install -c greglandrum rdkit". If conda has problems finding boost, you may need to run "conda install -c rdkit boost" first.

If you do give it a try, please let me know whether it works or not. One thing to try if you do encounter problems: install the VS2015 redistributable DLLs from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=48145 (these are normal DLLs, nothing odd).

Thanks, in advance, for any feedback!
-greg
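
For anyone trying the build, a quick way to report "whether it works or not" is a short import check run inside the new environment. The script below is only a suggested sketch, not part of the request above; the function name rdkit_smoke_test is made up, and it assumes that a broken install (for example a missing DLL) surfaces as an ImportError.

```python
import sys


def rdkit_smoke_test():
    """Return (ok, message): does RDKit import and parse a trivial SMILES?"""
    try:
        from rdkit import Chem, rdBase
    except ImportError as exc:
        # On Windows, missing runtime DLLs typically show up here.
        return False, "import failed: %s" % exc
    mol = Chem.MolFromSmiles("CCO")
    if mol is None:
        return False, "RDKit imported but could not parse 'CCO'"
    return True, "RDKit %s on Python %s: parsed %d atoms" % (
        rdBase.rdkitVersion, sys.version.split()[0], mol.GetNumAtoms())


if __name__ == "__main__":
    ok, message = rdkit_smoke_test()
    print(message)
```

The printed message includes the RDKit and Python versions, which is useful context to paste into a reply along with the Windows version.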