Greg Landrum wrote:
Dear Markus,
On Thu, Jul 8, 2010 at 2:56 PM, markus kossner <m.koss...@tu-bs.de> wrote:
Dear Cedric,
months ago I did some extraction of frameworks with the RDKit. I'm
neither sure, if my script follows the exact Murcko paper, nor is the
snippet well tested.
At least it could provide a starting point for your further work. I sent
an email to your personal address with the script attached.
Greg, if this is of general interest i could certainly send the code to
the discuss list.
I think it's definitely of general interest, so please send it.
-greg
O.K, here's my little script called Frames.py.
Basically what the code does is a walk from every Atom that is the
neighbor of a ring away from that ring. If we can not reach another
ring on the way, the atom chain will be deleted. However, as soon as we
reach another ring, the chain will become a connecting chain in the
scaffold. Ring Atoms [found by GetRingInfo()] as well as connecting
non-ring chains are stored in a Chem.EditableMol-Object representing the
Scaffold of the molecule.
There is a function, that can return this scaffold in two forms that
differ in the interpretation of what a 'Scaffold' is:
- with mode='RedScaff' a Scaffold will be the pure molecular core
ignoring bond or atom types!
this is done by running over the Scaffold (...the editable mol from
above) and changing any atom to carbon and any bond to a single bond
- with mode='Scaff' a Scaffold will be sensitive of element and bond type.
With mode='Scaff' Cyclohexyl-, phenyl or pyridyl- mojeties will be
different ring species.
mode='RedScaff' will interpret them as six-membered rings and make all
of them become a cyclohexane in the scaffold representation.
Usage of the script:
simply supply a .sdf file after the script name.
The program will generate a .sdf file called frames.sdf containing the
cluster ID of the frame it belongs to as a property tag.
Additionally the raw frames are output to 'frames.smi'. This is a simple
enumeration of the frames detected.
Please note, that the data is preliminary stored in memory. This means,
that very large input files may cause the consumption of large amounts
of memory at runtime!
Cheers, Markus
#!/usr/bin/env python2.6
# encoding: utf-8
import os,sys
from Chem import AllChem as Chem
def flatten(x):
"""flatten(sequence) -> list
Returns a single, flat list which contains all elements retrieved
from the sequence and all nested sub-sequences (iterables).
Examples:
>>> [1, 2, [3,4], (5,6)]
[1, 2, [3, 4], (5, 6)]
>>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
[1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""
result = []
for el in x:
if hasattr(el, "__iter__") and not isinstance(el, basestring):
result.extend(flatten(el))
else:
result.append(el)
return result
def GetFrame(mol,mode='RedScaff'):
'''return a ganeric molecule defining the reduced scaffold of the molecule.
mode can be 'RedScaff' and 'Scaff'. The second mode will turn every atom to a carbon and every bond to a single bond!'''
ring=mol.GetRingInfo()
RingAtoms= flatten(ring.AtomRings())
NonRingAtoms=[ atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIdx() not in RingAtoms ]
RingNeighbors=[]
Paths=[]
for NonRingAtom in NonRingAtoms:
for neighbor in mol.GetAtomWithIdx(NonRingAtom).GetNeighbors():
if neighbor.GetIdx() in RingAtoms:
RingNeighbors.append(NonRingAtom)
Paths.append([neighbor.GetIdx(),NonRingAtom]) #The ring Atoms having a non ring Nieghbor will be the start of a walk
break
PosLinkers=[x for x in NonRingAtoms if x not in RingNeighbors] #Only these Atoms are Possible Linkers of two rings
Framework=[ x for x in RingAtoms ]
Linkers=[]
while len(Paths)>0:
NewPaths=[]
for P in Paths:
if P==None:
print 'ooh, there is still a bug somewere '
else:
for neighbor in mol.GetAtomWithIdx(P[-1]).GetNeighbors():
if neighbor.GetIdx() not in P:
if neighbor.GetIdx() in NonRingAtoms:
n=P[:]
n.append(neighbor.GetIdx())
NewPaths.append(n[:])
elif neighbor.GetIdx() in RingAtoms:
n=P[:]
n.append(neighbor.GetIdx())
Linkers.append(n)
Framework=Framework+P[:]
Paths=NewPaths[:]
#print 'Linkers:',Linkers
#print 'RingAtoms:',RingAtoms
if mode=='Scaff':
Framework=list(set(Framework))
todel=[]
NonRingAtoms.sort(reverse=True)
em=Chem.EditableMol(mol)
BondsToAdd=[ sorted([i[0],i[-1]]) for i in Linkers ]
mem=[]
for i in BondsToAdd:
if i not in mem:
em.AddBond(i[0],i[1],Chem.BondType.SINGLE)
mem.append(i)
for i in NonRingAtoms:
todel.append(i)
for i in todel:
em.RemoveAtom(i)
m=em.GetMol()
return m
if mode=='RedScaff':
Framework=list(set(Framework))
todel=[]
NonRingAtoms.sort(reverse=True)
for i in NonRingAtoms:
if i != None:
if i not in Framework:
todel.append(i)
em=Chem.EditableMol(mol)
for i in todel:
em.RemoveAtom(i)
m=em.GetMol()
#===================================#
#Now do the flattening of atoms and bonds!
#Any heavy atom will become a carbon and any bond will become a single bond!! #
#===================================#
for atom in m.GetAtoms(): #
atom.SetAtomicNum(6) #
atom.SetFormalCharge(0) #
for bond in m.GetBonds(): #
bond.SetBondType(Chem.BondType.SINGLE) #
Chem.SanitizeMol(m) #
#===================================#
return m
if len(sys.argv) < 2:
print "No input file provided: Frames.py <file-to-process.sdf>"
sys.exit(1)
suppl=Chem.SDMolSupplier(sys.argv[1])
FrameDict={}
for mol in suppl:
if mol == 'None': continue
try:
m=GetFrame(mol,mode='RedScaff')
if FrameDict.has_key(Chem.MolToSmiles(m)):
FrameDict[Chem.MolToSmiles(m)].append(mol)
else:
FrameDict[Chem.MolToSmiles(m)]=[mol,]
except:
print '--------------------------------'
print 'could not process the molecule with the following name:'
print mol.GetProp('_Name')
counter=0
w=open('frames.smi','w')
d=Chem.SDWriter('frames.sdf')
for key,item in FrameDict.items():
counter+=1
#d=Chem.SDWriter(str(counter)+'.sdf')
for i in item:
i.SetProp('Scaffold',key)
i.SetProp('Cluster',str(counter))
d.write(i)
print key,len(item)
w.write(key+'\n')
w.close
print 'number of Frames found: %d' %(counter)
------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss