On Fri, May 27, 2011 at 3:47 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> On May 27, 2011, at 1:25 PM, Greg Landrum wrote:
>> That is definitely wrong according to the Daylight theory manual:
>> "Isotopic specifications are indicated by preceding the atomic symbol
>> with a number equal to the desired integral atomic mass."
>
> Yes, and I think they are being imprecise, but since SMILES is
> meant for "normal" chemistry, it's in an area where imprecision
> doesn't make much difference.
>
> Where does it make a difference? High resolution mass spec,
> for one. The mass of 28Si is not 28.00000 but 27.9769265325.

No arguments here. But that doesn't address the [0Si] question.

> I've been looking at how RDKit handles isotopes/mass, and
> I think there are some good examples of how its current
> approach can cause confusion.

There is a lot of room for improvement in the way the RDKit handles
isotopes. (I'm being polite to myself.) When I have a free day for
RDKit backend work, I need to go back and re-examine the way this is
done.

> For those who haven't reviewed the code, RDKit turns "[Si]"
> into an Atom instance with mass of 28.086, that being the
> abundance-weighted average mass of silicon.

Correct.

> To generate the isomeric SMILES, RDKit looks at the mass.
> If it's more than 0.1 amu difference from the integral
> atomic mass (28 in this case) then it puts in the atomic
> mass. Otherwise it omits the mass.
>
> Thus, since || 28.086 - 28 || <= 0.1
>
>   Input: [Si] gives Output: [Si]
>
>
> Suppose I have isotopically pure silicon [28Si].
> RDKit turns this into an Atom with mass 28.0000.
> If I generate the isomeric SMILES I get that
>
>   || 28.0000 - 28 || <= 0.1
>
> which means no atomic mass will be displayed
> in the output, so
>
>   Input: [28Si] gives Output: [Si]
>
> I tested this with PubChem compound CID 21732668.
> It has an isomeric SMILES of
>
>   F[28Si](F)(F)P([28Si](F)(F)F)[28Si](F)(F)F
>
> RDKit converts that into an isomeric SMILES of
>
>   F[Si](F)(F)P([Si](F)(F)F)[Si](F)(F)F
>
> In other words, the generated SMILES is no longer isotopically
> pure.
>
>
> I believe this is wrong.

You will get no argument from me. It's wrong.

> As it stands, the only way to tell if a given atom is supposed
> to be isotopically pure is to see if
>
>   atom.GetMass() == int(atom.GetMass())
>
> This will only fail for Tc, Pm, Po, At, and the other elements
> which have only very unstable isotopes, and hence where the
> idea of an "average mass" makes no sense.
>
>
> So for purposes of the first bit in the MACCS definition,
> I propose using something like:
>
> def has_specified_isotope(mol):
>     # An exactly integral mass means an isotope was given explicitly.
>     for atom in mol.GetAtoms():
>         mass = atom.GetMass()
>         if mass == int(mass):
>             return True
>     return False
>
>
> BTW, checking out of curiosity, I see that elements 106 (Sg)
> and higher have an isotopic mass defect which is greater than
> 0.1 amu. If RDKit supported Sg then it would always turn
>
>   Input: [Sg] into Output: [106Sg]
>
> when making the isomeric SMILES.
>
> http://en.wikipedia.org/wiki/Isotopes_of_seaborgium
> http://en.wikipedia.org/wiki/Seaborgium
>
> PubChem does not have any of the reported Sg-containing
> molecules. In fact:
>
>   Failed to decode the following as a Molecular Formula or a CID:
>   SgO3
>
> It seems that no molecule containing Sg is in PubChem.
>
>
>> We can agree to change it, but it's certainly consistent with what
>> Daylight says in the theory manual.
>
> The problem above arises because RDKit uses an average mass when
> no mass is specified. The object model in the manual only allows
> integer masses, and the Daylight API agrees with that. I therefore
> don't see how RDKit's behavior is consistent.

It's consistent to within roundoff error if you specify an isotope.
The theory manual says if you don't specify anything, it's
"unspecified mass". I interpreted that to mean "average atomic mass".

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
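
A minimal sketch of the round trip discussed in this thread, written in
plain Python rather than against the RDKit API. The names AVERAGE_MASS,
stored_mass, and write_atom are hypothetical, and using the rounded
average mass as the "integral atomic mass" is an assumption; only the
28.086 value and the 0.1 amu tolerance come from the messages above. It
illustrates why [28Si] comes back as [Si], and how the proposed
integer-mass check can still tell the two cases apart before output.

```python
# Hypothetical model of the behaviour described in the thread; not RDKit code.
AVERAGE_MASS = {"Si": 28.086}  # average mass quoted in the thread


def stored_mass(symbol, isotope=None):
    """Mass kept on the atom: the integer label if given, else the average."""
    return float(isotope) if isotope is not None else AVERAGE_MASS[symbol]


def write_atom(symbol, mass, tolerance=0.1):
    """Emit an isotope label only if mass is more than `tolerance` from the reference."""
    reference = round(AVERAGE_MASS[symbol])  # assumed "integral atomic mass"
    if abs(mass - reference) > tolerance:
        return f"[{round(mass)}{symbol}]"
    return f"[{symbol}]"


def has_specified_isotope(masses):
    """The proposed test from the thread, applied to a list of stored masses."""
    return any(mass == int(mass) for mass in masses)


natural = stored_mass("Si")       # 28.086
pure_28 = stored_mass("Si", 28)   # 28.0
pure_29 = stored_mass("Si", 29)   # 29.0

print(write_atom("Si", natural))  # [Si]
print(write_atom("Si", pure_28))  # [Si]   (the 28 label is lost)
print(write_atom("Si", pure_29))  # [29Si]
print(has_specified_isotope([natural]))  # False
print(has_specified_isotope([pure_28]))  # True
```

Storing the isotope label separately from the mass, instead of inferring
it from the mass, would avoid the ambiguity altogether; the sketch only
mirrors the behaviour as described here.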