Hello everyone,

I have been using RDKit for a while now. My focus is the transformation of 
molecules into simplified reduced forms. Using SMARTS, I specify 
molecular substructures and patterns and transform these parts into 
pseudoatoms. Afterwards I would like to get the maximum common 
substructure (MCS) of the reduced graphs. This is done to compare 
molecules and their MCS in small datasets of molecules.

SMILES strings:

CCC(C)C(C(=O)O)N    Isoleucine    reduced form: 
[Zn][Zn][Zn]([Zn])[Zn]([Nb])[Mo]

CC(C)CC(C(=O)O)N    Leucine       reduced form: 
[Zn][Zn]([Zn])[Zn][Zn]([Nb])[Mo]


I would like to know how to merge a run of connected Zn atoms 
(Zn-Zn-Zn-...) into a single -Zn- atom in the reduced graph. These linker 
atoms (Zn) represent only carbon atoms, so they can be compressed 
together. The number of linker atoms plays no role in the reduced form, 
and it gives misleading results when reduced graphs with different 
numbers of adjacent carbon atoms are compared with each other.
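To make the comparison problem concrete, here is a minimal sketch that runs RDKit's rdFMCS.FindMCS on the two reduced forms listed above. It assumes the default FindMCS settings (atoms compared by element, bonds by order). Because the branch sits at a different linker atom in the two graphs, the MCS cannot cover either graph completely, although both amino acids are just "linker + Nb + Mo".

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Reduced forms copied from the examples above
reduced = {
    "isoleucine": "[Zn][Zn][Zn]([Zn])[Zn]([Nb])[Mo]",
    "leucine": "[Zn][Zn]([Zn])[Zn][Zn]([Nb])[Mo]",
}
mols = [Chem.MolFromSmiles(smi) for smi in reduced.values()]

# Default settings: atoms must match by element, bonds by bond order
result = rdFMCS.FindMCS(mols)
print(result.numAtoms, result.numBonds, result.smartsString)
```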

I started simplifying my molecules into reduced graphs with the following 
code:

#---------------------------------------------------
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
print("\nModules imported successfully\n")


def molecule(smiles):
    mol = Chem.MolFromSmiles(smiles)
    Draw.MolToFile(mol, 'pictures/molecule.png')
    # The file holds one SMARTS pattern per line; skip empty lines and
    # strip the trailing newline before parsing
    with open('pseudo_negative_ionizable', 'r') as handle:
        lines = [line.strip() for line in handle if line.strip()]

    pseudo = Chem.MolFromSmarts('[Mo]')
    for line in lines:
        repl = Chem.MolFromSmarts(line)
        # replaceAll=True: replace every match of the pattern at once
        mol_new = AllChem.ReplaceSubstructs(mol, repl, pseudo, True)
        mol_new_smi = Chem.MolToSmiles(mol_new[0])
        print(mol_new_smi)
        # Round-trip through SMILES before applying the next pattern
        mol = Chem.MolFromSmiles(mol_new_smi)

..... (analogous definitions for every other pseudoatom, each given as SMARTS patterns)


Now, instead of many -Zn-Zn- atoms, I want only a single -Zn- atom to 
appear in my result.
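One possible way to do this collapsing, sketched under some assumptions: the reduced form is available as a SMILES string, all linker atoms carry the symbol Zn, and the bonds in the reduced graph are single bonds. The helper name collapse_linkers is hypothetical, not an RDKit function. It repeatedly merges two bonded Zn atoms into one by editing an RWMol until no Zn-Zn bond is left.

```python
from rdkit import Chem


def collapse_linkers(smiles, symbol="Zn"):
    """Return a SMILES in which each connected run of `symbol` atoms is one atom.

    Hypothetical helper for illustration; assumes single bonds throughout.
    """
    rw = Chem.RWMol(Chem.MolFromSmiles(smiles))
    while True:
        # Look for any bond whose two ends are both linker atoms
        pair = None
        for bond in rw.GetBonds():
            a, b = bond.GetBeginAtom(), bond.GetEndAtom()
            if a.GetSymbol() == symbol and b.GetSymbol() == symbol:
                pair = (a.GetIdx(), b.GetIdx())
                break
        if pair is None:
            break  # no adjacent linker atoms left
        keep, drop = pair
        # Move the other neighbours of `drop` over to `keep`
        nbrs = [n.GetIdx() for n in rw.GetAtomWithIdx(drop).GetNeighbors()]
        for n in nbrs:
            if n != keep and rw.GetBondBetweenAtoms(keep, n) is None:
                rw.AddBond(keep, n, Chem.BondType.SINGLE)
        rw.RemoveAtom(drop)  # also removes the bonds of `drop`
    Chem.SanitizeMol(rw)
    return Chem.MolToSmiles(rw)


ile = collapse_linkers("[Zn][Zn][Zn]([Zn])[Zn]([Nb])[Mo]")
leu = collapse_linkers("[Zn][Zn]([Zn])[Zn][Zn]([Nb])[Mo]")
print(ile, leu, ile == leu)
```

With this, the isoleucine and leucine examples both reduce to the same three-atom graph (one Zn bonded to Nb and Mo), so the length and branching of the carbon linker no longer influence the comparison.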


Another problem appears when I transform bigger molecules that contain 
ring systems into reduced forms: the RDKit functions split these 
molecules. The output is a molecule whose parts are separated by dots. 
Is there any way to avoid this and keep the molecule (reduced form) 
together?

Example:

O=C(Nc1ccc(-n2ccccc2=O)cn1)C1CC(O)(c2ccccc2Cl)CN1C(=O)Nc1ccc(Cl)cc1

--> reduced form: [Sc].[V].[Co].[Zn].[Zn].[Sc][Co][Ni].[V][Co][Ni][Hf]
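A likely cause (an assumption, not verified here against the RDKit source) is that ReplaceSubstructs reconnects the replacement only to the neighbours of the first atom in each match, so a ring with several attachment points falls apart. A possible workaround is to do the replacement by hand on an RWMol and keep every bond that leaves the match. The sketch below uses a hypothetical helper collapse_matches; the pseudoatom symbol Sc and the benzene pattern are only illustrative, and all new bonds are assumed to be single bonds.

```python
from rdkit import Chem


def collapse_matches(mol, smarts, symbol):
    """Replace each non-overlapping match of `smarts` by one `symbol` atom,
    keeping all bonds between the match and the rest of the molecule.

    Hypothetical helper for illustration. Matches that cover only part of
    an aromatic ring are not handled and may fail during sanitization.
    """
    patt = Chem.MolFromSmarts(smarts)
    rw = Chem.RWMol(mol)
    rep = {}  # original atom index -> index of the pseudoatom replacing it
    for match in mol.GetSubstructMatches(patt):
        if any(i in rep for i in match):
            continue  # skip matches that overlap an earlier one
        atom = Chem.Atom(symbol)
        atom.SetNoImplicit(True)  # pseudoatoms carry no implicit hydrogens
        new_idx = rw.AddAtom(atom)
        for i in match:
            rep[i] = new_idx
    # Re-create every bond that crosses the border of a match
    for bond in mol.GetBonds():
        a, b = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        if a not in rep and b not in rep:
            continue  # bond outside all matches stays as it is
        ra, rb = rep.get(a, a), rep.get(b, b)
        if ra != rb and rw.GetBondBetweenAtoms(ra, rb) is None:
            rw.AddBond(ra, rb, Chem.BondType.SINGLE)
    # Remove matched atoms from the highest index down so that the
    # remaining original indices stay valid
    for i in sorted(rep, reverse=True):
        rw.RemoveAtom(i)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


mol = Chem.MolFromSmiles("Cc1ccc(O)cc1")
red = collapse_matches(mol, "c1ccccc1", "Sc")
print(Chem.MolToSmiles(red), len(Chem.GetMolFrags(red)))
```

In this small example the ring has two substituents, and both stay attached to the single pseudoatom, so the reduced form remains one fragment.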

I am very much looking forward to your help.

Thanks & regards,
Jessica

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
