Re: [Rdkit-discuss] framework generation rdkit

markus kossner Thu, 08 Jul 2010 23:57:18 -0700

Greg Landrum wrote:

Dear Markus,


On Thu, Jul 8, 2010 at 2:56 PM, markus kossner <m.koss...@tu-bs.de> wrote:

Dear Cedric,
months ago I did some extraction of frameworks with the RDKit. I'm
neither sure, if my script follows the exact Murcko paper, nor is the
snippet well tested.
At least it could provide a starting point for your further work. I sent
an email to your personal address with the script attached.
Greg, if this is of general interest i could certainly send the code to
the discuss list.


I think it's definitely of general interest, so please send it.

-greg

O.K, here's my little script called Frames.py.

Basically what the code does is a walk from every Atom that is theneighbor of a ring away from that ring. If we can not reach anotherring on the way, the atom chain will be deleted. However, as soon as wereach another ring, the chain will become a connecting chain in thescaffold. Ring Atoms [found by GetRingInfo()] as well as connectingnon-ring chains are stored in a Chem.EditableMol-Object representing theScaffold of the molecule.

There is a function, that can return this scaffold in two forms thatdiffer in the interpretation of what a 'Scaffold' is:- with mode='RedScaff' a Scaffold will be the pure molecular coreignoring bond or atom types!this is done by running over the Scaffold (...the editable mol fromabove) and changing any atom to carbon and any bond to a single bond

- with mode='Scaff' a Scaffold will be sensitive of element and bond type.

With mode='Scaff' Cyclohexyl-, phenyl or pyridyl- mojeties will bedifferent ring species.mode='RedScaff' will interpret them as six-membered rings and make allof them become a cyclohexane in the scaffold representation.


Usage of the script:
simply supply a .sdf file after the script name.

The program will generate a .sdf file called frames.sdf containing thecluster ID of the frame it belongs to as a property tag.Additionally the raw frames are output to 'frames.smi'. This is a simpleenumeration of the frames detected.

Please note, that the data is preliminary stored in memory. This means,that very large input files may cause the consumption of large amountsof memory at runtime!


Cheers, Markus

#!/usr/bin/env python2.6
# encoding: utf-8


import os,sys
from Chem import AllChem as Chem

def flatten(x):
    """flatten(sequence) -> list
    Returns a single, flat list which contains all elements retrieved
    from the sequence and all nested sub-sequences (iterables).
    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

def GetFrame(mol,mode='RedScaff'):
	'''return a ganeric molecule defining the reduced scaffold of the molecule.
	mode can be 'RedScaff' and 'Scaff'. The second mode will turn every atom to a carbon and every bond to a single bond!'''
	ring=mol.GetRingInfo()
	RingAtoms= flatten(ring.AtomRings())
	NonRingAtoms=[ atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIdx() not in RingAtoms ]
	RingNeighbors=[]
	Paths=[]
	for NonRingAtom in NonRingAtoms:
		for neighbor in mol.GetAtomWithIdx(NonRingAtom).GetNeighbors():
			if neighbor.GetIdx() in RingAtoms:
				RingNeighbors.append(NonRingAtom)
				Paths.append([neighbor.GetIdx(),NonRingAtom]) #The ring Atoms having a non ring Nieghbor will be the start of a walk
				break
	PosLinkers=[x for x in NonRingAtoms if x not in RingNeighbors] #Only these Atoms are Possible Linkers of two rings
	Framework=[ x for x in RingAtoms ]
	Linkers=[]
	while len(Paths)>0:
		NewPaths=[]
		for P in Paths:
			if P==None:
				print 'ooh, there is still a bug somewere '
			else:
				for neighbor in mol.GetAtomWithIdx(P[-1]).GetNeighbors():
					if neighbor.GetIdx() not in P:
						if neighbor.GetIdx() in NonRingAtoms:
							n=P[:]
							n.append(neighbor.GetIdx())
							NewPaths.append(n[:])
						elif neighbor.GetIdx() in RingAtoms:
							n=P[:]
							n.append(neighbor.GetIdx())
							Linkers.append(n)
							Framework=Framework+P[:]

		Paths=NewPaths[:]
	#print 'Linkers:',Linkers
	#print 'RingAtoms:',RingAtoms
	if mode=='Scaff':
		Framework=list(set(Framework))
		todel=[]
		NonRingAtoms.sort(reverse=True)
		em=Chem.EditableMol(mol)
		BondsToAdd=[ sorted([i[0],i[-1]]) for i in Linkers ]
		mem=[]
		for i in BondsToAdd:
			if i not in mem:
				em.AddBond(i[0],i[1],Chem.BondType.SINGLE)
				mem.append(i)
		for i in NonRingAtoms:
			todel.append(i)
		for i in todel:
			em.RemoveAtom(i)
		m=em.GetMol()
		return m

	if mode=='RedScaff':
		Framework=list(set(Framework))
		todel=[]
		NonRingAtoms.sort(reverse=True)
		for i in NonRingAtoms:
			if i != None:
				if i not in Framework:
					todel.append(i)
		em=Chem.EditableMol(mol)
		for i in todel:
			em.RemoveAtom(i)
		m=em.GetMol()
		#===================================#
		#Now do the flattening of atoms and bonds! 
		#Any heavy atom will become a carbon and any bond will become a single bond!!		#
		#===================================#
		for atom in m.GetAtoms():                                                 #
			atom.SetAtomicNum(6)                                                    #
			atom.SetFormalCharge(0)                                                #
		for bond in m.GetBonds():                                                   #
			bond.SetBondType(Chem.BondType.SINGLE)                 #
		Chem.SanitizeMol(m)                                                          #
		#===================================#
		return m

if len(sys.argv) < 2:
	print "No input file provided: Frames.py <file-to-process.sdf>"
	sys.exit(1)

suppl=Chem.SDMolSupplier(sys.argv[1])

FrameDict={}

for mol in suppl:
	if mol == 'None': continue
	try:
		m=GetFrame(mol,mode='RedScaff')
		if FrameDict.has_key(Chem.MolToSmiles(m)):
			FrameDict[Chem.MolToSmiles(m)].append(mol)
		else:
			FrameDict[Chem.MolToSmiles(m)]=[mol,]
	except:
		print '--------------------------------'
		print 'could not process the molecule with the following name:'
		print mol.GetProp('_Name')

counter=0
w=open('frames.smi','w')
d=Chem.SDWriter('frames.sdf')
for key,item in FrameDict.items():
	counter+=1
	#d=Chem.SDWriter(str(counter)+'.sdf')
	for i in item:
		i.SetProp('Scaffold',key)
		i.SetProp('Cluster',str(counter))
		d.write(i)
	print key,len(item)
	w.write(key+'\n')
w.close
print 'number of Frames found: %d' %(counter)

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] framework generation rdkit

Reply via email to