On Fri, May 27, 2011 at 3:47 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> On May 27, 2011, at 1:25 PM, Greg Landrum wrote:
>> That is definitely wrong according to the Daylight theory manual:
>> "Isotopic specifications are indicated by preceding the atomic symbol
>> with a number equal to the desired integral atomic mass."
>
> Yes, and I think they are being imprecise, but since SMILES is
> meant for "normal" chemistry, it's in an area where imprecision
> doesn't make much difference.
>
> Where does it make a difference? High resolution mass spec,
> for one. The mass of 28Si is not 28.00000 but 27.9769265325.

No arguments here. But that doesn't address the [0Si] question.

> I've been looking at how RDKit handles isotopes/mass, and
> I think there are some good examples of how its current
> approach can cause confusion.

There is a lot of room for improvement in the way the RDKit handles
isotopes (I'm being polite to myself).
When I have a free day for RDKit backend work, I need to go back and
re-examine the way this is done.

> For those who haven't reviewed the code, RDKit turns "[Si]"
> into an Atom instance with a mass of 28.086, that being the
> abundance-weighted average mass of silicon.

Correct.

> To generate the isomeric SMILES, RDKit looks at the mass.
> If it differs by more than 0.1 amu from the integral
> atomic mass (28 in this case) then it writes the atomic
> mass. Otherwise it omits the mass.
>
> Thus, since || 28.086 - 28 || <= 0.1
>
>  Input: [Si]    gives Output: [Si]
>
>
> Suppose I have isotopically pure silicon [28Si].
> RDKit turns this into an Atom with mass 28.0000.
> If I generate the isomeric SMILES I get that
>
>    || 28.0000 - 28 || <= 0.1
>
> which means no atomic mass will be displayed
> in the output, so
>
>  Input: [28Si]    gives Output: [Si]
>
> I tested this with PubChem compound CID 21732668.
> It has an isomeric SMILES of
>
>  F[28Si](F)(F)P([28Si](F)(F)F)[28Si](F)(F)F
>
> RDKit converts that into an isomeric SMILES of
>
>  F[Si](F)(F)P([Si](F)(F)F)[Si](F)(F)F
>
> In other words, the generated SMILES is no longer isotopically
> pure.
>
>
> I believe this is wrong.

You will get no argument from me. It's wrong.
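
For anyone following along, the rule described above can be sketched in
plain Python without RDKit. This is only an illustration of the 0.1 amu
comparison as Andrew describes it, not the actual RDKit code:
`needs_isotope_label` and its parameters are hypothetical names, and the
29.0 value assumes [29Si] would be stored as an integral mass in the same
way as [28Si].

```python
def needs_isotope_label(mass, nominal_mass, tolerance=0.1):
    """Return True if the described rule would write the mass into the SMILES.

    Hypothetical helper: compares the stored atom mass with the element's
    integral atomic mass, as described in the thread.
    """
    return abs(mass - nominal_mass) > tolerance


# Masses taken from the thread for silicon (integral atomic mass 28).
print(needs_isotope_label(28.086, 28))  # [Si], average mass      -> False
print(needs_isotope_label(28.0, 28))    # [28Si], stored as 28.0  -> False
print(needs_isotope_label(29.0, 28))    # assumed mass for [29Si] -> True
```

The first two calls both return False, so [Si] and [28Si] come out
identically, which is the problem described above.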

> As it stands, the only way to tell if a given atom is supposed
> to be isotopically pure is to see if
>
>  atom.GetMass() == int(atom.GetMass())
>
> This will only fail for Tc, Pm, Po, At, and the other elements
> which have only very unstable isotopes, and hence where the
> idea of "average abundance" makes no sense.
>
>
> So for purposes of the first bit in the MACCS definition,
> I propose using something like:
>
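> def has_specified_isotope(mol):
>     # An integral mass means an isotope was given explicitly;
>     # unspecified atoms carry a non-integral average mass.
>     for atom in mol.GetAtoms():
>         mass = atom.GetMass()
>         if mass == int(mass):
>             return True
>     return False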
>
>
>
>
> BTW, checking out of curiosity, I see that elements 106 (Sg)
> and higher have an isotopic mass defect which is greater than
> 0.1 amu. If RDKit supported Sg then it would always turn
>
>  Input: [Sg]    into Output: [106Sg]
>
> when making the isomeric SMILES.
>
> http://en.wikipedia.org/wiki/Isotopes_of_seaborgium
> http://en.wikipedia.org/wiki/Seaborgium
>
> PubChem does not have any of the reported Sg-containing
> molecules. In fact:
>
>  Failed to decode the following as a Molecular Formula or a CID:
>  SgO3
>
> It seems that no molecule containing Sg is in PubChem.
>
>
>
>> We can agree to change it, but it's certainly consistent with what
>> Daylight says in the theory manual.
>
> The problem above arises because RDKit uses an average mass when
> no mass is specified. The object model in the manual only allows
> integer masses, and the Daylight API agrees with that. I therefore
> don't see how RDKit's behavior is consistent.

It's consistent to within roundoff error if you specify an isotope.
The theory manual says if you don't specify anything, it's
"unspecified mass". I interpreted that to mean "average atomic mass".
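
One way to keep "unspecified" distinct from "specified", sketched here
purely as an illustration and not as a description of what the RDKit does
or will do, is to store the isotope as an optional integer rather than
inferring it from the mass. `AtomRecord`, `isotope`, and `smiles_token`
are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomRecord:
    """Hypothetical atom record; not an RDKit class."""
    symbol: str
    isotope: Optional[int] = None  # None means "unspecified mass"

    def smiles_token(self) -> str:
        # Write the isotope only when one was explicitly given.
        if self.isotope is None:
            return f"[{self.symbol}]"
        return f"[{self.isotope}{self.symbol}]"


print(AtomRecord("Si").smiles_token())              # [Si]
print(AtomRecord("Si", isotope=28).smiles_token())  # [28Si]
```

With the isotope held separately, [28Si] survives a round trip regardless
of how close its mass is to the average.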

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
