Re: [Rdkit-discuss] MolFromPDBBlock and heterocycles
Hi Steven, The PDB reader in the RDKit doesn’t determine any bond orders - everything is read as a single bond. In order to set the bond orders, you need to call the AssignBondOrdersFromTemplate() function using a reference molecule generated from SMILES (or SDF). Here is some example code from the docs: >>> from rdkit.Chem import AllChem >>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2c2)c3c3)N") >>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem', 'test_data', '4DJU_lig.pdb')) >>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 >>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 22 Now assign the bond orders based on the template molecule >>> newMol = AllChem.AssignBondOrdersFromTemplate(template, mol) >>> len([1 for b in newMol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 Note that the template molecule should have no explicit hydrogens else the algorithm will fail. Hope this helps. Best, Sereina 2016-09-07 17:16 GMT+02:00 Steven Combs: > Hello! > > I have a pdb block that I am working with, which is attached to this > email. The ligand has aromatic ring structures in it; however, when it is > read into RDKit and converted into a smiles string, the aromatic rings are > converted into aliphatic rings. Any thoughts? > > Here is the python code: > > def extract_data( filename): > extracted_info = "" > with open(filename) as f: > for line in f.readlines(): > if "HETATM" in line: > extracted_info += ( line) > return extracted_info > > for index, filename in enumerate(solution_pdb_filenames): > row = extract_data( filename) > m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False ) > Chem.SetHybridization(m) > Chem.SetAromaticity(m) > Chem.SanitizeMol(m, > sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) > #not needed since sanitizing during read in, but trying to figure out if it > actually worked > print ("Parsing file " + str(index) + " of " + > str(len(solution_pdb_filenames))) > print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True)) > > The output smile string is: > > [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H] > )([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([ > H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H]) > [H])[C]([H])([H])[H] > > Steven Combs > > > > > -- > > ___ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > > -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] MolFromPDBBlock and heterocycles
Hi Steven, The PDB reader in the RDKit doesn’t determine any bond orders - everything is read as a single bond. In order to set the bond orders, you need to call the AssignBondOrdersFromTemplate() function using a reference molecule generated from SMILES (or SDF). Here is some example code from the docs: >>> from rdkit.Chem import AllChem >>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2c2)c3c3)N") >>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem', >>> 'test_data', '4DJU_lig.pdb')) >>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 >>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 22 Now assign the bond orders based on the template molecule >>> newMol = AllChem.AssignBondOrdersFromTemplate(template, mol) >>> len([1 for b in newMol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 Note that the template molecule should have no explicit hydrogens else the algorithm will fail. Hope this helps. Best, Sereina On 07 Sep 2016, at 17:16, Steven Combswrote: > Hello! > > I have a pdb block that I am working with, which is attached to this email. > The ligand has aromatic ring structures in it; however, when it is read into > RDKit and converted into a smiles string, the aromatic rings are converted > into aliphatic rings. Any thoughts? > > Here is the python code: > > def extract_data( filename): > extracted_info = "" > with open(filename) as f: > for line in f.readlines(): > if "HETATM" in line: > extracted_info += ( line) > return extracted_info > > for index, filename in enumerate(solution_pdb_filenames): > row = extract_data( filename) > m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False ) > Chem.SetHybridization(m) > Chem.SetAromaticity(m) > Chem.SanitizeMol(m, > sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) #not needed since > sanitizing during read in, but trying to figure out if it actually worked > print ("Parsing file " + str(index) + " of " + > str(len(solution_pdb_filenames))) > print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True)) > > The output smile string is: > > [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H])([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])[H])[C]([H])([H])[H] > > Steven Combs > > > -- > ___ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] drawing code take 2
:( Here's one where AddHs() really breaks things. It is an unpleasant molecule to draw. So it looks like the issue is that - for CID 112084 call to AddHs() changed the layout (arguably not for the better), whereas - for CID 260719 it didn't change the layout/shorten the bonds when it really should have. For reference, CID260719.ob.svg is the other toolkit's rendering of the same file with (atom indexes changed to green from OB's default red). -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu alatis_output_Structure3D_CID_260719.sdf Description: application/vnd.kinar signature.asc Description: OpenPGP digital signature -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] MolFromPDBBlock and heterocycles
Hello! I have a pdb block that I am working with, which is attached to this email. The ligand has aromatic ring structures in it; however, when it is read into RDKit and converted into a smiles string, the aromatic rings are converted into aliphatic rings. Any thoughts? Here is the python code: def extract_data( filename): extracted_info = "" with open(filename) as f: for line in f.readlines(): if "HETATM" in line: extracted_info += ( line) return extracted_info for index, filename in enumerate(solution_pdb_filenames): row = extract_data( filename) m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False ) Chem.SetHybridization(m) Chem.SetAromaticity(m) Chem.SanitizeMol(m, sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) #not needed since sanitizing during read in, but trying to figure out if it actually worked print ("Parsing file " + str(index) + " of " + str(len(solution_pdb_filenames))) print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True)) The output smile string is: [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H])([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])[H])[C]([H])([H])[H] Steven Combs test.pdb Description: Protein Databank data -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] MolWt of substructure hit?
Hi, Thanks for this, the clue that I needed was that there's a method: " matches = mol.GetSubstructMatches(pat) " This should work fine for what I need. Cheers, Steve. -Original Message- From: Andrew Dalke [mailto:da...@dalkescientific.com] Sent: 07 September 2016 12:10 To: Stephen O'haganCc: rdkit-discuss@lists.sourceforge.net Subject: Re: [Rdkit-discuss] MolWt of substructure hit? On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote: > How would I find the molecular weight (fraction) of that substructure within > a compounds expressed as a SMILES string, e.g.: I don't know if a built-in function which does this. It's possible to write one. Here's a function which will compute the molecular weight given the molecule and the atom indices for the fragment. def get_fragment_molwt(mol, atom_indices): assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check molwt = 0.0 for atom_index in atom_indices: atom = mol.GetAtomWithIdx(atom_index) molwt += atom.GetMass() return molt If you want to include the hydrogen mass, then use this variant: from rdkit import Chem _H_mass = Chem.Atom(1).GetMass() def get_fragment_molwt(mol, atom_indices): assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check molwt = 0.0 for atom_index in atom_indices: atom = mol.GetAtomWithIdx(atom_index) molwt += atom.GetMass() + atom.GetTotalNumHs() * _H_mass return molt Here's an example of how to use the function: #== from rdkit import Chem def get_fragment_molwt(): ... as above ... smiles = "CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1" smarts = "[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]" mol = Chem.MolFromSmiles(smiles) assert mol is not None, smiles pat = Chem.MolFromSmarts(smarts) assert pat is not None, smarts matches = mol.GetSubstructMatches(pat) molwt = MolWt(mol) for match_no, match in enumerate(matches, 1): fragment_molwt = get_fragment_molwt(mol, match) print("#{}: {:.2%}".format(match_no, fragment_molwt/molwt)) #== If I don't include the hydrogens in the fragment weight calculation then I get: #1: 37.32% #2: 37.32% #3: 37.32% ... If I include the hydrogens, then I get: #1: 40.15% #2: 39.64% #3: 40.15% ... Cheers, Andrew da...@dalkescientific.com -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] MolWt of substructure hit?
On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote: > How would I find the molecular weight (fraction) of that substructure within > a compounds expressed as a SMILES string, e.g.: I don't know if a built-in function which does this. It's possible to write one. Here's a function which will compute the molecular weight given the molecule and the atom indices for the fragment. def get_fragment_molwt(mol, atom_indices): assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check molwt = 0.0 for atom_index in atom_indices: atom = mol.GetAtomWithIdx(atom_index) molwt += atom.GetMass() return molt If you want to include the hydrogen mass, then use this variant: from rdkit import Chem _H_mass = Chem.Atom(1).GetMass() def get_fragment_molwt(mol, atom_indices): assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check molwt = 0.0 for atom_index in atom_indices: atom = mol.GetAtomWithIdx(atom_index) molwt += atom.GetMass() + atom.GetTotalNumHs() * _H_mass return molt Here's an example of how to use the function: #== from rdkit import Chem def get_fragment_molwt(): ... as above ... smiles = "CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1" smarts = "[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]" mol = Chem.MolFromSmiles(smiles) assert mol is not None, smiles pat = Chem.MolFromSmarts(smarts) assert pat is not None, smarts matches = mol.GetSubstructMatches(pat) molwt = MolWt(mol) for match_no, match in enumerate(matches, 1): fragment_molwt = get_fragment_molwt(mol, match) print("#{}: {:.2%}".format(match_no, fragment_molwt/molwt)) #== If I don't include the hydrogens in the fragment weight calculation then I get: #1: 37.32% #2: 37.32% #3: 37.32% ... If I include the hydrogens, then I get: #1: 40.15% #2: 39.64% #3: 40.15% ... Cheers, Andrew da...@dalkescientific.com -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] MolWt of substructure hit?
Hi, Supposing I have identified a substructure as a SMARTS string, e.g. [#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6] - In general, this may have wild card atoms. How would I find the molecular weight (fraction) of that substructure within a compounds expressed as a SMILES string, e.g.: CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1 I may or may not wish to count multiple hits in one target. Cheers, Steve. -- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss