Greg Landrum wrote:
Dear Markus,

On Thu, Jul 8, 2010 at 2:56 PM, markus kossner <m.koss...@tu-bs.de> wrote:
Dear Cedric,
months ago I did some extraction of frameworks with the RDKit. I'm
neither sure, if my script follows the exact Murcko paper, nor is the
snippet well tested.
At least it could provide a starting point for your further work. I sent
an email to your personal address with the script attached.
Greg, if this is of general interest i could certainly send the code to
the discuss list.

I think it's definitely of general interest, so please send it.

-greg
O.K, here's my little script called Frames.py.
Basically what the code does is a walk from every Atom that is the neighbor of a ring away from that ring. If we can not reach another ring on the way, the atom chain will be deleted. However, as soon as we reach another ring, the chain will become a connecting chain in the scaffold. Ring Atoms [found by GetRingInfo()] as well as connecting non-ring chains are stored in a Chem.EditableMol-Object representing the Scaffold of the molecule.

There is a function, that can return this scaffold in two forms that differ in the interpretation of what a 'Scaffold' is: - with mode='RedScaff' a Scaffold will be the pure molecular core ignoring bond or atom types! this is done by running over the Scaffold (...the editable mol from above) and changing any atom to carbon and any bond to a single bond
- with mode='Scaff' a Scaffold will be sensitive of element and bond type.
With mode='Scaff' Cyclohexyl-, phenyl or pyridyl- mojeties will be different ring species. mode='RedScaff' will interpret them as six-membered rings and make all of them become a cyclohexane in the scaffold representation.

Usage of the script:
simply supply a .sdf file after the script name.
The program will generate a .sdf file called frames.sdf containing the cluster ID of the frame it belongs to as a property tag. Additionally the raw frames are output to 'frames.smi'. This is a simple enumeration of the frames detected.

Please note, that the data is preliminary stored in memory. This means, that very large input files may cause the consumption of large amounts of memory at runtime!

Cheers, Markus

#!/usr/bin/env python2.6
# encoding: utf-8


import os,sys
from Chem import AllChem as Chem

def flatten(x):
    """flatten(sequence) -> list
    Returns a single, flat list which contains all elements retrieved
    from the sequence and all nested sub-sequences (iterables).
    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

def GetFrame(mol,mode='RedScaff'):
	'''return a ganeric molecule defining the reduced scaffold of the molecule.
	mode can be 'RedScaff' and 'Scaff'. The second mode will turn every atom to a carbon and every bond to a single bond!'''
	ring=mol.GetRingInfo()
	RingAtoms= flatten(ring.AtomRings())
	NonRingAtoms=[ atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIdx() not in RingAtoms ]
	RingNeighbors=[]
	Paths=[]
	for NonRingAtom in NonRingAtoms:
		for neighbor in mol.GetAtomWithIdx(NonRingAtom).GetNeighbors():
			if neighbor.GetIdx() in RingAtoms:
				RingNeighbors.append(NonRingAtom)
				Paths.append([neighbor.GetIdx(),NonRingAtom]) #The ring Atoms having a non ring Nieghbor will be the start of a walk
				break
	PosLinkers=[x for x in NonRingAtoms if x not in RingNeighbors] #Only these Atoms are Possible Linkers of two rings
	Framework=[ x for x in RingAtoms ]
	Linkers=[]
	while len(Paths)>0:
		NewPaths=[]
		for P in Paths:
			if P==None:
				print 'ooh, there is still a bug somewere '
			else:
				for neighbor in mol.GetAtomWithIdx(P[-1]).GetNeighbors():
					if neighbor.GetIdx() not in P:
						if neighbor.GetIdx() in NonRingAtoms:
							n=P[:]
							n.append(neighbor.GetIdx())
							NewPaths.append(n[:])
						elif neighbor.GetIdx() in RingAtoms:
							n=P[:]
							n.append(neighbor.GetIdx())
							Linkers.append(n)
							Framework=Framework+P[:]

		Paths=NewPaths[:]
	#print 'Linkers:',Linkers
	#print 'RingAtoms:',RingAtoms
	if mode=='Scaff':
		Framework=list(set(Framework))
		todel=[]
		NonRingAtoms.sort(reverse=True)
		em=Chem.EditableMol(mol)
		BondsToAdd=[ sorted([i[0],i[-1]]) for i in Linkers ]
		mem=[]
		for i in BondsToAdd:
			if i not in mem:
				em.AddBond(i[0],i[1],Chem.BondType.SINGLE)
				mem.append(i)
		for i in NonRingAtoms:
			todel.append(i)
		for i in todel:
			em.RemoveAtom(i)
		m=em.GetMol()
		return m

	if mode=='RedScaff':
		Framework=list(set(Framework))
		todel=[]
		NonRingAtoms.sort(reverse=True)
		for i in NonRingAtoms:
			if i != None:
				if i not in Framework:
					todel.append(i)
		em=Chem.EditableMol(mol)
		for i in todel:
			em.RemoveAtom(i)
		m=em.GetMol()
		#===================================#
		#Now do the flattening of atoms and bonds! 
		#Any heavy atom will become a carbon and any bond will become a single bond!!		#
		#===================================#
		for atom in m.GetAtoms():                                                 #
			atom.SetAtomicNum(6)                                                    #
			atom.SetFormalCharge(0)                                                #
		for bond in m.GetBonds():                                                   #
			bond.SetBondType(Chem.BondType.SINGLE)                 #
		Chem.SanitizeMol(m)                                                          #
		#===================================#
		return m

if len(sys.argv) < 2:
	print "No input file provided: Frames.py <file-to-process.sdf>"
	sys.exit(1)

suppl=Chem.SDMolSupplier(sys.argv[1])

FrameDict={}

for mol in suppl:
	if mol == 'None': continue
	try:
		m=GetFrame(mol,mode='RedScaff')
		if FrameDict.has_key(Chem.MolToSmiles(m)):
			FrameDict[Chem.MolToSmiles(m)].append(mol)
		else:
			FrameDict[Chem.MolToSmiles(m)]=[mol,]
	except:
		print '--------------------------------'
		print 'could not process the molecule with the following name:'
		print mol.GetProp('_Name')

counter=0
w=open('frames.smi','w')
d=Chem.SDWriter('frames.sdf')
for key,item in FrameDict.items():
	counter+=1
	#d=Chem.SDWriter(str(counter)+'.sdf')
	for i in item:
		i.SetProp('Scaffold',key)
		i.SetProp('Cluster',str(counter))
		d.write(i)
	print key,len(item)
	w.write(key+'\n')
w.close
print 'number of Frames found: %d' %(counter)
------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to