Re: [Rdkit-discuss] MolFromPDBBlock and heterocycles

2016-09-07 Thread Sereina Riniker
Hi Steven,

The PDB reader in the RDKit doesn’t determine any bond orders - everything
is read as a single bond.
In order to set the bond orders, you need to call the
AssignBondOrdersFromTemplate() function using a reference molecule
generated from SMILES (or SDF).

Here is some example code from the docs:

>>> from rdkit.Chem import AllChem
>>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2c2)c3c3)N")
>>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem',
'test_data', '4DJU_lig.pdb'))
>>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
8
>>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
22

Now assign the bond orders based on the template molecule
>>> newMol = AllChem.AssignBondOrdersFromTemplate(template, mol)
>>> len([1 for b in newMol.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
8

Note that the template molecule should have no explicit hydrogens
else the algorithm will fail.

Hope this helps.

Best,
Sereina


2016-09-07 17:16 GMT+02:00 Steven Combs :

> Hello!
>
> I have a pdb block that I am working with, which is attached to this
> email. The ligand has aromatic ring structures in it; however, when it is
> read into RDKit and converted into a smiles string, the aromatic rings are
> converted into aliphatic rings. Any thoughts?
>
> Here is the python code:
>
> def extract_data( filename):
> extracted_info = ""
> with open(filename) as f:
> for line in f.readlines():
> if "HETATM" in line:
> extracted_info += ( line)
> return extracted_info
>
> for index, filename in enumerate(solution_pdb_filenames):
> row = extract_data( filename)
> m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False )
> Chem.SetHybridization(m)
> Chem.SetAromaticity(m)
> Chem.SanitizeMol(m, 
> sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL)
> #not needed since sanitizing during read in, but trying to figure out if it
> actually worked
> print ("Parsing file " + str(index) + " of " +
> str(len(solution_pdb_filenames)))
> print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True))
>
> The output smile string is:
>
> [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H]
> )([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([
> H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])
> [H])[C]([H])([H])[H]
>
> Steven Combs
>
>
>
> 
> --
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolFromPDBBlock and heterocycles

2016-09-07 Thread Sereina
Hi Steven,

The PDB reader in the RDKit doesn’t determine any bond orders - everything is 
read as a single bond. 
In order to set the bond orders, you need to call the 
AssignBondOrdersFromTemplate() function using a reference molecule generated 
from SMILES (or SDF).

Here is some example code from the docs:

>>> from rdkit.Chem import AllChem
>>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2c2)c3c3)N")
>>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem', 
>>> 'test_data', '4DJU_lig.pdb'))
>>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
8
>>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
22

Now assign the bond orders based on the template molecule
>>> newMol = AllChem.AssignBondOrdersFromTemplate(template, mol)
>>> len([1 for b in newMol.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
8

Note that the template molecule should have no explicit hydrogens
else the algorithm will fail.

Hope this helps.

Best,
Sereina


On 07 Sep 2016, at 17:16, Steven Combs  wrote:

> Hello!
> 
> I have a pdb block that I am working with, which is attached to this email. 
> The ligand has aromatic ring structures in it; however, when it is read into 
> RDKit and converted into a smiles string, the aromatic rings are converted 
> into aliphatic rings. Any thoughts?
> 
> Here is the python code:
> 
> def extract_data( filename):
> extracted_info = ""
> with open(filename) as f:
> for line in f.readlines():
> if "HETATM" in line:
> extracted_info += ( line)
> return extracted_info
> 
> for index, filename in enumerate(solution_pdb_filenames):
> row = extract_data( filename)
> m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False ) 
> Chem.SetHybridization(m)
> Chem.SetAromaticity(m)
> Chem.SanitizeMol(m, 
> sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) #not needed since 
> sanitizing during read in, but trying to figure out if it actually worked
> print ("Parsing file " + str(index) + " of " + 
> str(len(solution_pdb_filenames)))
> print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True))
> 
> The output smile string is:
> 
> [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H])([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])[H])[C]([H])([H])[H]
> 
> Steven Combs
> 
> 
> --
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] drawing code take 2

2016-09-07 Thread Dimitri Maziuk
:(

Here's one where AddHs() really breaks things. It is an unpleasant
molecule to draw.

So it looks like the issue is that
- for CID 112084 call to AddHs() changed the layout (arguably not for
the better), whereas
- for CID 260719 it didn't change the layout/shorten the bonds when it
really should have.

For reference, CID260719.ob.svg is the other toolkit's rendering of the
same file with (atom indexes changed to green from OB's default red).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


alatis_output_Structure3D_CID_260719.sdf
Description: application/vnd.kinar


signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] MolFromPDBBlock and heterocycles

2016-09-07 Thread Steven Combs
Hello!

I have a pdb block that I am working with, which is attached to this email.
The ligand has aromatic ring structures in it; however, when it is read
into RDKit and converted into a smiles string, the aromatic rings are
converted into aliphatic rings. Any thoughts?

Here is the python code:

def extract_data( filename):
extracted_info = ""
with open(filename) as f:
for line in f.readlines():
if "HETATM" in line:
extracted_info += ( line)
return extracted_info

for index, filename in enumerate(solution_pdb_filenames):
row = extract_data( filename)
m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False )
Chem.SetHybridization(m)
Chem.SetAromaticity(m)
Chem.SanitizeMol(m,
sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) #not needed since
sanitizing during read in, but trying to figure out if it actually worked
print ("Parsing file " + str(index) + " of " +
str(len(solution_pdb_filenames)))
print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True))

The output smile string is:

[H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H])([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])[H])[C]([H])([H])[H]

Steven Combs


test.pdb
Description: Protein Databank data
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolWt of substructure hit?

2016-09-07 Thread Stephen O'hagan
Hi, 

Thanks for this, the clue that I needed was that there's a method:

" matches = mol.GetSubstructMatches(pat) "

This should work fine for what I need.

Cheers,
Steve.

-Original Message-
From: Andrew Dalke [mailto:da...@dalkescientific.com] 
Sent: 07 September 2016 12:10
To: Stephen O'hagan 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MolWt of substructure hit?

On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote:
> How would I find the molecular weight (fraction) of that substructure within 
> a compounds expressed as a SMILES string, e.g.:

I don't know if a built-in function which does this. It's possible to write 
one. Here's a function which will compute the molecular weight given the 
molecule and the atom indices for the fragment.

def get_fragment_molwt(mol, atom_indices):
assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check
molwt = 0.0
for atom_index in atom_indices:
atom = mol.GetAtomWithIdx(atom_index)
molwt += atom.GetMass()
return molt

If you want to include the hydrogen mass, then use this variant:

from rdkit import Chem

_H_mass = Chem.Atom(1).GetMass()
def get_fragment_molwt(mol, atom_indices):
assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check
molwt = 0.0
for atom_index in atom_indices:
atom = mol.GetAtomWithIdx(atom_index)
molwt += atom.GetMass() + atom.GetTotalNumHs() * _H_mass
return molt


Here's an example of how to use the function:

#==
from rdkit import Chem

def get_fragment_molwt():
  ... as above ...

smiles = "CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1"
smarts = 
"[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]"

mol = Chem.MolFromSmiles(smiles)
assert mol is not None, smiles

pat = Chem.MolFromSmarts(smarts)
assert pat is not None, smarts

matches = mol.GetSubstructMatches(pat)

molwt = MolWt(mol)
for match_no, match in enumerate(matches, 1):
fragment_molwt = get_fragment_molwt(mol, match)
print("#{}: {:.2%}".format(match_no, fragment_molwt/molwt)) #==


If I don't include the hydrogens in the fragment weight calculation then I get:

#1: 37.32%
#2: 37.32%
#3: 37.32%
   ...

If I include the hydrogens, then I get:

#1: 40.15%
#2: 39.64%
#3: 40.15%
   ...

Cheers,

Andrew
da...@dalkescientific.com



--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolWt of substructure hit?

2016-09-07 Thread Andrew Dalke
On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote:
> How would I find the molecular weight (fraction) of that substructure within 
> a compounds expressed as a SMILES string, e.g.:

I don't know if a built-in function which does this. It's possible to write 
one. Here's a function which will compute the molecular weight given the 
molecule and the atom indices for the fragment.

def get_fragment_molwt(mol, atom_indices):
assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check
molwt = 0.0
for atom_index in atom_indices:
atom = mol.GetAtomWithIdx(atom_index)
molwt += atom.GetMass()
return molt

If you want to include the hydrogen mass, then use this variant:

from rdkit import Chem

_H_mass = Chem.Atom(1).GetMass()
def get_fragment_molwt(mol, atom_indices):
assert len(atom_indices) == len(set(atom_indices)) # quick duplicate check
molwt = 0.0
for atom_index in atom_indices:
atom = mol.GetAtomWithIdx(atom_index)
molwt += atom.GetMass() + atom.GetTotalNumHs() * _H_mass
return molt


Here's an example of how to use the function:

#==
from rdkit import Chem

def get_fragment_molwt():
  ... as above ...

smiles = "CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1"
smarts = 
"[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]"

mol = Chem.MolFromSmiles(smiles)
assert mol is not None, smiles

pat = Chem.MolFromSmarts(smarts)
assert pat is not None, smarts

matches = mol.GetSubstructMatches(pat)

molwt = MolWt(mol)
for match_no, match in enumerate(matches, 1):
fragment_molwt = get_fragment_molwt(mol, match)
print("#{}: {:.2%}".format(match_no, fragment_molwt/molwt))
#==


If I don't include the hydrogens in the fragment weight calculation then I get:

#1: 37.32%
#2: 37.32%
#3: 37.32%
   ...

If I include the hydrogens, then I get:

#1: 40.15%
#2: 39.64%
#3: 40.15%
   ...

Cheers,

Andrew
da...@dalkescientific.com



--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] MolWt of substructure hit?

2016-09-07 Thread Stephen O'hagan
Hi,

Supposing I have identified a substructure as a SMARTS string, e.g.

[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]

- In general, this may have wild card atoms.

How would I find the molecular weight (fraction) of that substructure within a 
compounds expressed as a SMILES string, e.g.:

CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1

I may or may not wish to count multiple hits in one target.

Cheers,
Steve.
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss