On May 27, 2011, at 1:25 PM, Greg Landrum wrote:
> That is definitely wrong according to the Daylight theory manual:
> "Isotopic specifications are indicated by preceding the atomic symbol
> with a number equal to the desired integral atomic mass.

Yes, and I think they are being imprecise, but since SMILES is
meant for "normal" chemistry, it's in an area where imprecision
doesn't make much difference.

Where does it make a difference? High resolution mass spec,
for one. The mass of 28Si is not 28.00000 but 27.9769265325.
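
To put a number on that, here is a small sketch of the arithmetic. The
only input is the 28Si mass quoted above; the variable names are mine.

```python
# Exact mass of 28Si as quoted in this message, versus the nominal
# (integral) mass that an integer-only model would store.
exact_28si = 27.9769265325
nominal_28si = 28.0

# Absolute difference in amu, and the same difference in parts per million.
defect = nominal_28si - exact_28si
ppm_error = defect / exact_28si * 1e6

print("difference: %.10f amu" % defect)
print("relative error: %.0f ppm" % ppm_error)
```

The relative error comes out at several hundred ppm. High resolution
instruments are usually quoted as accurate to a few ppm, so the
difference is easily visible there even though it is irrelevant for
most other uses of SMILES.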



I've been looking at how RDKit handles isotopes/mass, and
I think there are some good examples of how its current
approach can cause confusion.

For those who haven't reviewed the code, RDKit turns "[Si]"
into an Atom instance with a mass of 28.086, that being the
abundance-weighted average atomic mass of silicon.

To generate the isomeric SMILES, RDKit looks at the mass.
If it differs by more than 0.1 amu from the integral
atomic mass (28 in this case) then it puts in the atomic
mass. Otherwise it omits the mass.

Thus, since || 28.086 - 28 || <= 0.1

 Input: [Si]    gives Output: [Si]


Suppose I have isotopically pure silicon [28Si].
RDKit turns this into an Atom with mass 28.0000.
If I generate the isomeric SMILES I get that

    || 28.0000 - 28 || <= 0.1

which means no atomic mass will be displayed
in the output, so

 Input: [28Si]    gives Output: [Si]
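
The rule as I have described it can be sketched in a few lines. This
is not RDKit's code; smiles_atom() and its parameters are names I made
up to illustrate the 0.1 amu test, and it takes the reference integral
mass as an argument rather than looking it up.

```python
def smiles_atom(symbol, mass, reference_mass, tolerance=0.1):
    """Illustrative only: write the isotope when the stored mass is
    more than `tolerance` amu away from the reference integral mass."""
    if abs(mass - reference_mass) > tolerance:
        return "[%d%s]" % (round(mass), symbol)
    return "[%s]" % symbol

# Default silicon (average mass) and isotopically pure 28Si both fall
# inside the tolerance, so they produce the same output.
print(smiles_atom("Si", 28.086, 28))   # [Si]
print(smiles_atom("Si", 28.0, 28))     # [Si]

# Only an isotope other than the reference one is written out.
print(smiles_atom("Si", 29.0, 28))     # [29Si]
```

The first two calls show the problem: once the mass is stored as a
float, "no isotope given" and "28Si given" cannot be told apart by a
tolerance test against 28.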


I tested this with Pubchem compound CID 21732668.
It has an isomeric SMILES of

 F[28Si](F)(F)P([28Si](F)(F)F)[28Si](F)(F)F

RDKit converts that into an isomeric SMILES of

 F[Si](F)(F)P([Si](F)(F)F)[Si](F)(F)F

In other words, the generated SMILES no longer records that the
silicon is isotopically pure.


I believe this is wrong.


As it stands, the only way to tell if a given atom is supposed
to be isotopically pure is to see if

 atom.GetMass() == int(atom.GetMass())

This will only fail for Tc, Pm, Po, At, and the other elements
which have only very unstable isotopes, and hence where the
idea of an abundance-weighted average mass makes no sense.


So for purposes of the first bit in the MACCS definition,
I propose using something like:

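def has_specified_isotope(mol):
    # An integral mass means the isotope was given explicitly;
    # the default (average) mass is almost never an integer.
    for atom in mol.GetAtoms():
        mass = atom.GetMass()
        if mass == int(mass):
            return True
    return False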
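
Here is a way to try that check without RDKit installed. FakeAtom and
FakeMol are hypothetical stand-ins that provide only the GetAtoms() and
GetMass() calls used above, and the 98.0 used for Tc is an assumed
default mass, chosen to show the failure case I mentioned.

```python
class FakeAtom:
    """Hypothetical stand-in for an atom; only GetMass() is provided."""
    def __init__(self, mass):
        self._mass = mass

    def GetMass(self):
        return self._mass


class FakeMol:
    """Hypothetical stand-in for a molecule built from a list of masses."""
    def __init__(self, masses):
        self._atoms = [FakeAtom(m) for m in masses]

    def GetAtoms(self):
        return self._atoms


def has_specified_isotope(mol):
    # Same test as above: an integral mass is read as "isotope given".
    return any(a.GetMass() == int(a.GetMass()) for a in mol.GetAtoms())


print(has_specified_isotope(FakeMol([28.086])))  # False: default [Si]
print(has_specified_isotope(FakeMol([28.0])))    # True: [28Si]
print(has_specified_isotope(FakeMol([98.0])))    # True: assumed default for Tc
```

The last line is a false positive: if the default mass of an element is
itself an integer, the test cannot tell the default from an explicit
isotope.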




BTW, checking out of curiosity, I see that elements 106 (Sg)
and higher have an isotopic mass defect which is greater than
0.1 amu. If RDKit supported Sg then it would always turn

 Input: [Sg]    into Output: [106Sg]

when making the isomeric SMILES.

http://en.wikipedia.org/wiki/Isotopes_of_seaborgium
http://en.wikipedia.org/wiki/Seaborgium

PubChem does not have any of the reported Sg-containing
molecules. In fact:

 Failed to decode the following as a Molecular Formula or a CID:
 SgO3

It seems that no molecule containing Sg is in PubChem.



> We can agree to change it, but it's certainly consistent with what
> Daylight says in the theory manual.

The problem above arises because RDKit uses an average mass when
no mass is specified. The object model in the manual only allows
integer masses, and the Daylight API agrees with that. I therefore
don't see how RDKit's behavior is consistent.

If you drop support for a default mass based on abundances,
what do you use as the default mass?



                                Andrew
                                da...@dalkescientific.com


