Hi RDKitters,
Calling out to the C++ RDKit hackers. I'm mostly an RDKit Python programmer,
but I'm extending the atomic_data.cpp file with some pseudo atoms. It's partly
successful, but I get problems with the depiction. The peptide/molecule
cartridge from www.proteax.com can output condensed molfiles, where the
standard amino acids are replaced with pseudo atoms (good for overview and
for large proteins/peptides). To be able to load them into RDKit I added the
following to atomic_data.cpp:
113 Ala 1.9 2 5.0 71.08 6 300 71.08 2 \n"
"114 Arg 1.9 2 6.6 155.18 5 300 155.18 3 \n \
115 Asn 1.9 2 5.7 114.1 6 300 114.1 2 \n \
116 Asp 1.9 2 5.6 114.08 6 300 114.08 3 \n \
117 Cys 1.9 2 5.5 70.07 6 300 70.07 3 \n \
118 Gln 1.9 2 6.0 128.11 6 300 128.11 2 \n \
119 Glu 1.9 2 5.9 128.13 6 300 128.13 3 \n \
120 Gly 1.9 2 4.5 57.05 6 300 57.05 2 \n \
121 His 1.9 2 6.1 136.13 5 300 136.13 3 \n \
122 Ile 1.9 2 6.2 113.16 6 300 113.16 2 \n \
123 Leu 1.9 2 6.2 113.16 6 300 113.16 2 \n \
124 Lys 1.9 2 6.4 127.16 5 300 127.16 3 \n \
125 Met 1.9 2 6.2 131.2 6 300 131.2 2 \n \
126 Phe 1.9 2 6.4 147.17 6 300 147.17 2 \n \
127 Pro 1.9 2 5.6 97.12 6 300 97.12 2 \n \
128 Ser 1.9 2 5.2 87.08 6 300 87.08 2 \n \
129 Thr 1.9 2 5.6 101.1 6 300 101.1 2 \n \
130 Trp 1.9 2 6.8 186.21 6 300 186.21 2 \n \
131 Tyr 1.9 2 6.5 162.17 6 300 162.17 3 \n \
132 Val 1.9 2 5.9 99.13 6 300 99.13 2 \n ";
After recompiling, I can load the condensed molfiles (an example is attached).
SMARTS and SMILES are not fully supported, but I can refer directly to the
atomic number to get around that, e.g.
mol = Chem.MolFromSmarts('[#128]')
Unfortunately, the drawings of the mols get the wrong symbols for atomic
numbers above 127. The symbols are OK inside the Python session:
mol.GetAtoms()[0].GetSymbol() returns the correct symbol 'Ser'
The problem is that in depictions the atomic number gets "wrapped", so it comes
out as '*'.
I traced it down to a conversion in MolDrawing.py:
mol = Chem.Mol(mol.ToBinary())
and it can be reproduced in the python session:
mol = Chem.Mol(mol.ToBinary())
mol.GetAtoms()[0].GetSymbol()
returns '*', which is the pseudo atom with atomic number 0.
My guess is that it has something to do with the atomic number being limited to
7 bits in the ToBinary() function? 128 = 10000000 => 0000000 = 0? Is there a
C++ guru who knows ToBinary() and can help and suggest a solution?
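To make the guess concrete, here is a minimal sketch of the arithmetic in plain Python. It only assumes that the low 7 bits of the atomic number survive serialization. That is my hypothesis, not something I have confirmed in the RDKit source, and wrap_7bit is a made-up helper name.

```python
# Hypothesis only: assumes the serializer keeps just the low 7 bits of the
# atomic number. This is NOT taken from the RDKit source code.
def wrap_7bit(atomic_num):
    """Return what is left of atomic_num when only the low 7 bits are kept."""
    return atomic_num & 0x7F


# A few of the pseudo-atom numbers from the table above.
pseudo_atoms = {126: "Phe", 127: "Pro", 128: "Ser", 129: "Thr", 132: "Val"}

for num, symbol in pseudo_atoms.items():
    print(f"{symbol}: {num} ({num:08b}) -> {wrap_7bit(num)}")
```

If the hypothesis holds, 127 and below come through unchanged, while 128 collapses to 0 (the '*' pseudo atom) and 129 to 1.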
Also, it would be best to start the atomic numbering of the pseudo atoms from
147 so that it matches the MDL ISIS Draw and Proteax pseudo atom numbering, but
RDKit assumes that the atomic numbers are consecutive. I guess it's possible
to add a lot of placeholders between 112 and 147, but is there another
solution?
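For the placeholder route, a rough sketch of how the filler rows for 113 to 146 could be generated instead of typed by hand. The "X<n>" symbols, the placeholder_rows helper, and the reuse of the Gly row for the numeric fields are all assumptions for illustration. I have not tested whether RDKit accepts such rows.

```python
# Sketch only: the "X<n>" symbols and the reuse of one template row for the
# numeric fields are assumptions; whether RDKit accepts such rows is untested.
def placeholder_rows(first, last, template):
    """Build one data row per atomic number in [first, last].

    The numeric fields are copied from `template`; only the atomic number
    and the symbol are replaced.
    """
    fields = template.split()[2:]  # drop the template's number and symbol
    return [" ".join([str(n), f"X{n}"] + fields) for n in range(first, last + 1)]


template = "120 Gly 1.9 2 4.5 57.05 6 300 57.05 2"
rows = placeholder_rows(113, 146, template)

# Join with the same ' \n \' line continuation used in the string above.
print(" \\n \\\n".join(rows))
```

The amino acid rows would then be renumbered to start at 147 after the generated block.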
Bonus question: What is the rB0 in the atomic_data.cpp file?
P.S. Don't use the above pseudo atoms for anything but testing; they need more
testing with regard to the behavior of the combinations of outer shell
electrons and allowed valences.
Esben Jannik Bjerrum
cand.pharm, Ph.D
/Sent from my Ubuntu Touch Phone
Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk

