On May 27, 2011, at 1:25 PM, Greg Landrum wrote:
> That is definitely wrong according to the Daylight theory manual:
> "Isotopic specifications are indicated by preceding the atomic symbol
> with a number equal to the desired integral atomic mass.

Yes, and I think they are being imprecise, but since SMILES is meant for
"normal" chemistry, it's in an area where imprecision doesn't make much
difference.

Where does it make a difference? High resolution mass spec, for one. The
mass of 28Si is not 28.00000 but 27.9769265325.

I've been looking at how RDKit handles isotopes/mass, and I think there are
some good examples of how its current approach can cause confusion.

For those who haven't reviewed the code, RDKit turns "[Si]" into an Atom
instance with a mass of 28.086, that being the abundance-weighted average
mass of silicon.

To generate the isomeric SMILES, RDKit looks at the mass. If it differs by
more than 0.1 amu from the integral atomic mass (28 in this case) then it
puts in the atomic mass. Otherwise it omits the mass. Thus, since

  | 28.086 - 28 | <= 0.1

  Input: [Si]
gives
  Output: [Si]

Suppose I have isotopically pure silicon [28Si]. RDKit turns this into an
Atom with mass 28.0000. If I generate the isomeric SMILES I get that

  | 28.0000 - 28 | <= 0.1

which means no mass number will be displayed in the output, so

  Input: [28Si]
gives
  Output: [Si]

I tested this with PubChem compound CID 21732668. It has an isomeric
SMILES of

  F[28Si](F)(F)P([28Si](F)(F)F)[28Si](F)(F)F

RDKit converts that into an isomeric SMILES of

  F[Si](F)(F)P([Si](F)(F)F)[Si](F)(F)F

In other words, the generated SMILES is no longer isotopically pure. I
believe this is wrong.

As it stands, the only way to tell if a given atom is supposed to be
isotopically pure is to see if

  atom.GetMass() == int(atom.GetMass())

This will only fail for Tc, Pm, Po, At, and the other elements which have
only very unstable isotopes, and hence where the idea of an
abundance-weighted average mass makes no sense.

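The round trip described above can be sketched without RDKit at all. The
following is only a toy model of the rule as I described it, not RDKit's
actual code: the names parse_mass and isomeric_atom and the AVERAGE_MASS
table are hypothetical, and the only numbers used are the ones quoted in
this message.

```python
# Toy model of the behavior described in this message; NOT RDKit code.
# All names here are hypothetical and exist only for illustration.
AVERAGE_MASS = {"Si": 28.086}  # average mass of silicon quoted above


def parse_mass(symbol, isotope=None):
    """Mass stored on the atom: the integer isotope if one was given,
    otherwise the abundance-weighted average mass."""
    if isotope is not None:
        return float(isotope)
    return AVERAGE_MASS[symbol]


def isomeric_atom(symbol, mass, nominal):
    """Write the mass only when it is more than 0.1 amu away from the
    nominal (integral) mass of the element."""
    if abs(mass - nominal) > 0.1:
        return "[%d%s]" % (round(mass), symbol)
    return "[%s]" % symbol


print(isomeric_atom("Si", parse_mass("Si"), 28))      # [Si]
print(isomeric_atom("Si", parse_mass("Si", 28), 28))  # [Si], label lost
print(isomeric_atom("Si", parse_mass("Si", 29), 28))  # [29Si]
```

In this model [28Si] and [Si] produce the same output, while [29Si]
survives only because 29 happens to be more than 0.1 amu away from 28.
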
So for purposes of the first bit in the MACCS definition, I propose using
something like:

  def has_specified_isotope(mol):
      # An integral mass means the input gave an explicit isotope;
      # unspecified atoms carry a non-integral average mass.
      for atom in mol.GetAtoms():
          mass = atom.GetMass()
          if mass == int(mass):
              return True
      return False

BTW, checking out of curiosity, I see that elements 106 (Sg) and higher
have an isotopic mass defect which is greater than 0.1 amu. If RDKit
supported Sg then it would always turn

  Input: [Sg]
into
  Output: [106Sg]

when making the isomeric SMILES.

  http://en.wikipedia.org/wiki/Isotopes_of_seaborgium
  http://en.wikipedia.org/wiki/Seaborgium

PubChem does not have any of the reported Sg containing molecules. In fact:

  Failed to decode the following as a Molecular Formula or a CID: SgO3

It seems that no molecule containing Sg is in PubChem.

> We can agree to change it, but it's certainly consistent with what
> Daylight says in the theory manual.

The problem above arises because RDKit uses an average mass when no mass is
specified. The object model in the manual only allows integer masses, and
the Daylight API agrees with that. I therefore don't see how RDKit's
behavior is consistent.

Drop support for a default mass based on abundances and what do you use as
the default mass?

				Andrew
				da...@dalkescientific.com