Hi RDkitters,

    Calling out to the RDKit C++ hackers. I'm mostly an RDKit Python programmer,
but I'm extending the atomic_data.cpp file with some pseudo atoms. It's partly
successful, but I have problems with the depiction. The peptide/molecule
cartridge from www.proteax.com can output condensed molfiles, where the
standard amino acids are replaced with pseudo atoms (good for an overview and
for large proteins/peptides). To be able to load them into RDKit, I added the
following to atomic_data.cpp:

113    Ala    1.9    2    5.0    71.08    6    300    71.08    2 \n" 
"114    Arg    1.9    2    6.6    155.18    5    300    155.18    3 \n \ 
115    Asn    1.9    2    5.7    114.1    6    300    114.1    2 \n \ 
116    Asp    1.9    2    5.6    114.08    6    300    114.08    3 \n \ 
117    Cys    1.9    2    5.5    70.07    6    300    70.07    3 \n \ 
118    Gln    1.9    2    6.0    128.11    6    300    128.11    2 \n \ 
119    Glu    1.9    2    5.9    128.13    6    300    128.13    3 \n \ 
120    Gly    1.9    2    4.5    57.05    6    300    57.05    2 \n \ 
121    His    1.9    2    6.1    136.13    5    300    136.13    3 \n \ 
122    Ile    1.9    2    6.2    113.16    6    300    113.16    2 \n \ 
123    Leu    1.9    2    6.2    113.16    6    300    113.16    2 \n \ 
124    Lys    1.9    2    6.4    127.16    5    300    127.16    3 \n \ 
125    Met    1.9    2    6.2    131.2    6    300    131.2    2 \n \ 
126    Phe    1.9    2    6.4    147.17    6    300    147.17    2 \n \ 
127    Pro    1.9    2    5.6    97.12    6    300    97.12    2 \n \ 
128    Ser    1.9    2    5.2    87.08    6    300    87.08    2 \n \ 
129    Thr    1.9    2    5.6    101.1    6    300    101.1    2 \n \ 
130    Trp    1.9    2    6.8    186.21    6    300    186.21    2 \n \ 
131    Tyr    1.9    2    6.5    162.17    6    300    162.17    3 \n \ 
132    Val    1.9    2    5.9    99.13    6    300    99.13    2 \n ";


After recompiling, I can load the condensed molfiles (an example is attached).
SMARTS and SMILES are not fully supported, but I can get around that by
referring directly to the atomic number, e.g.:

from rdkit import Chem
mol = Chem.MolFromSmarts('[#128]')

Unfortunately, depictions of the molecules get the wrong symbols for atomic
numbers above 127. The symbols are OK inside the Python session:

mol.GetAtoms()[0].GetSymbol() returns the correct symbol, 'Ser'.

The problem is that in depictions the atomic number gets "wrapped", so the
atom comes out as '*'.

I traced it down to a conversion in MolDrawing.py:

mol = Chem.Mol(mol.ToBinary())

and it can be reproduced in the Python session:


mol = Chem.Mol(mol.ToBinary())
mol.GetAtoms()[0].GetSymbol()

returns '*', which is the pseudo atom with atomic number 0.

My guess is that it has something to do with the atomic number being limited
to 7 bits in the ToBinary() function: 128 is 10000000 in binary, and keeping
only the lowest 7 bits gives 0000000 = 0. Is there a C++ guru who knows
ToBinary() and can help and suggest a solution?
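If the 7-bit guess is right, it also makes a testable prediction for the other
pseudo atoms. The pure-Python sketch below (not RDKit code; wrap_7bit and
symbol_after_roundtrip are hypothetical helper names) just masks each atomic
number to its lowest 7 bits and looks up the resulting symbol: Ala through Pro
(113-127) would survive, Ser (128) would turn into '*' as observed, and Thr,
Trp, Tyr and Val (129-132) would come out as H, He, Li and Be.

```python
# Hypothesis sketch only (not RDKit code): which symbols would the pseudo
# atoms get if just the lowest 7 bits of the atomic number survived ToBinary()?

# Pseudo atoms added to atomic_data.cpp (atomic number -> symbol).
PSEUDO_ATOMS = {
    113: "Ala", 114: "Arg", 115: "Asn", 116: "Asp", 117: "Cys",
    118: "Gln", 119: "Glu", 120: "Gly", 121: "His", 122: "Ile",
    123: "Leu", 124: "Lys", 125: "Met", 126: "Phe", 127: "Pro",
    128: "Ser", 129: "Thr", 130: "Trp", 131: "Tyr", 132: "Val",
}

# The standard symbols that wrapped numbers 128..132 would land on.
LOW_SYMBOLS = {0: "*", 1: "H", 2: "He", 3: "Li", 4: "Be"}


def wrap_7bit(atomic_num: int) -> int:
    """Keep only the lowest 7 bits, e.g. 128 (0b10000000) -> 0."""
    return atomic_num & 0x7F


def symbol_after_roundtrip(atomic_num: int) -> str:
    """Symbol expected after a (hypothetical) 7-bit round trip."""
    wrapped = wrap_7bit(atomic_num)
    if wrapped in PSEUDO_ATOMS:
        return PSEUDO_ATOMS[wrapped]
    return LOW_SYMBOLS.get(wrapped, f"#{wrapped}")


if __name__ == "__main__":
    for num, sym in PSEUDO_ATOMS.items():
        seen = symbol_after_roundtrip(num)
        flag = "" if seen == sym else "  <-- wrong"
        print(f"{num:3d} {sym} -> {seen}{flag}")
```

Round-tripping a molfile that contains Thr through Chem.Mol(mol.ToBinary())
and checking whether it comes back as 'H' would be a quick way to confirm or
rule out the guess.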

Also, it would be best to start the atomic numbering of the pseudo atoms at
147, so that it matches the MDL ISIS/Draw and Proteax pseudo atom numbering,
but RDKit assumes that the atomic numbers are consecutive. I guess it's
possible to add a lot of placeholders between 112 and 147, but is there
another solution?
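One way to try the placeholder idea without typing 34 dummy rows by hand is to
generate them. The sketch below assumes each row is the atomic number followed
by the same columns as in the table above, and that a placeholder row only
needs a unique symbol. The dummy symbols ("Zz113" and so on), their zero values
and the valence of -1 are made-up assumptions, not anything RDKit defines, and
only the first three amino acids are listed.

```python
# Hypothetical sketch: build atomic_data.cpp-style rows so that the pseudo
# atoms start at 147 while the table's numbering stays consecutive.
# Placeholder symbols ("Zz113", ...) and their values are assumptions.

LAST_STANDARD = 112  # last atomic number already in atomic_data.cpp
PSEUDO_START = 147   # ISIS/Draw and Proteax numbering for pseudo atoms

# Columns after the atomic number, copied from the rows added above
# (only the first three shown; extend with the rest in the same way).
PSEUDO_ROWS = [
    "Ala    1.9    2    5.0    71.08    6    300    71.08    2",
    "Arg    1.9    2    6.6    155.18    5    300    155.18    3",
    "Asn    1.9    2    5.7    114.1    6    300    114.1    2",
]


def placeholder_row(num: int) -> str:
    """Dummy entry: unique symbol, zero data, valence -1 (all assumed)."""
    return f"Zz{num}    0    0    0    0    0    0    0    -1"


def build_rows(pseudo_rows, start=PSEUDO_START, last=LAST_STANDARD):
    """Return (atomic_number, row_text) pairs covering last+1 onwards."""
    if start <= last:
        raise ValueError("pseudo atoms must start after the standard table")
    rows = [(n, placeholder_row(n)) for n in range(last + 1, start)]
    rows += [(start + i, r) for i, r in enumerate(pseudo_rows)]
    return rows


def as_cpp_lines(rows):
    """Format rows like the continued string literal in atomic_data.cpp."""
    return [f"{num}    {text} \\n\\" for num, text in rows]


if __name__ == "__main__":
    for line in as_cpp_lines(build_rows(PSEUDO_ROWS)):
        print(line)
```

Whether RDKit accepts such dummy rows would need testing. Also, if the 7-bit
guess about ToBinary() is right, renumbering from 147 would put every pseudo
atom above 127, so the depiction problem would have to be solved first.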


Bonus question: What is the rB0 in the atomic_data.cpp file? 

P.S. Don't use the above pseudo atoms for anything but testing; they need more
testing with regard to the behavior of the combinations of outer-shell
electrons and allowed valences.

Esben Jannik Bjerrum
cand.pharm, Ph.D


Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk

Attachment: condensed_molfile.mol
Description: Binary data

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
