On Dec 27, 2010, at 5:41 AM, Greg Landrum wrote:
> Heh, I was wondering if you were going to take that one up. Knowing
> how much you enjoy (ab)using dot disconnects it seemed likely. :-)

And take it up I did. Here's the essay I just wrote about the technique.

http://dalkescientific.com/writings/diary/archive/2010/12/28/reordering_smiles.html

I managed to work around the bug for the first and second versions
of the algorithm, but the workaround failed for the third, so I
switched to OpenBabel for that one.

One of the features I would like in a toolkit is the ability to say:

  format_atom(atom)
  format_bond(bond)

and get back the appropriate SMILES for that atom or bond. This
would include the logic for representing "[CH4]" vs "C", and
if there's a single bond between two aromatic atoms then it would
return "-" instead of "".
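To make the request concrete, here is a rough sketch of what such helpers
might decide. It is plain Python with no toolkit calls; the argument lists
(symbol, hcount, bond_order_sum, and so on) are hypothetical and are not
any toolkit's API. It only handles aliphatic atoms and ignores chirality
and atom classes.

```python
# Normal valences of the SMILES organic subset (aliphatic atoms only).
ORGANIC_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implied_hcount(symbol, bond_order_sum):
    """Hydrogens a bare symbol would imply, or None if brackets are required."""
    valences = ORGANIC_VALENCES.get(symbol)
    if valences is None:
        return None  # outside the organic subset: must be bracketed
    for valence in valences:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bond order sum exceeds every normal valence


def format_atom(symbol, hcount, bond_order_sum=0, charge=0, isotope=None):
    """Return the SMILES for one aliphatic atom, bracketed only when needed.

    hcount is the total hydrogen count; bond_order_sum covers the bonds
    to non-hydrogen neighbors (hypothetical inputs for this sketch).
    """
    if (charge == 0 and isotope is None
            and implied_hcount(symbol, bond_order_sum) == hcount):
        return symbol
    parts = ["["]
    if isotope is not None:
        parts.append(str(isotope))
    parts.append(symbol)
    if hcount == 1:
        parts.append("H")
    elif hcount > 1:
        parts.append("H%d" % hcount)
    if charge:
        sign = "+" if charge > 0 else "-"
        parts.append(sign if abs(charge) == 1 else "%s%d" % (sign, abs(charge)))
    parts.append("]")
    return "".join(parts)


def format_bond(order, is_aromatic=False, both_atoms_aromatic=False):
    """Return the SMILES bond symbol, or "" when the bond is implied."""
    if is_aromatic:
        return ""  # implied between two lowercase (aromatic) atoms
    if order == 1:
        # A single bond between two aromatic atoms must be written out.
        return "-" if both_atoms_aromatic else ""
    if order == 2:
        return "="
    if order == 3:
        return "#"
    raise ValueError("unsupported bond order: %r" % (order,))
```

With these definitions, format_atom("C", 4) gives "C" while
format_atom("C", 3) gives "[CH3]", and format_bond(1,
both_atoms_aromatic=True) gives "-".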

I ended up writing those myself, and found out that reporting
the isotope number is hard. As far as I can tell, the closest
solution is:

  mass = atom.GetMass()
  # An integral mass suggests that an explicit isotope was given
  if mass == int(mass):
    print("isotope is %d" % int(mass))
  else:
    print("isotope not specified")

but it isn't perfect, since the test also passes for these atoms,
none of which specifies an isotope:

[Tc] [Pm] [Po] [At] [Rn] [Fr] [Ra] [Ac] [Np] [Pu] 
[Am] [Cm] [Bk] [Cf] [Es] [Fm] [Md] [No] [Lr]

Those aren't common in drugs, but it would still be nice to know
if there was a user-specified isotope number or not.
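The false positives can be reproduced without a toolkit. The sketch below
is plain Python; the DEFAULT_MASS table and both function names are
hypothetical, and the mass values are illustrative assumptions rather than
numbers read from any toolkit. It shows that even comparing against the
element's default mass cannot tell "[98Tc]" from "[Tc]" when the two
masses coincide.

```python
# Illustrative default masses. These values are assumptions made for
# this example, not numbers taken from a toolkit.
DEFAULT_MASS = {"C": 12.011, "O": 15.999, "Tc": 98.0, "Po": 209.0}


def guess_isotope(mass):
    """The heuristic from the text: an integral mass means an isotope."""
    return int(mass) if mass == int(mass) else None


def guess_isotope_strict(symbol, mass):
    """Also require the mass to differ from the element's default mass."""
    if mass == int(mass) and mass != DEFAULT_MASS.get(symbol):
        return int(mass)
    return None


# False positive: an unlabeled Tc atom looks like it has isotope 98.
tc_plain = guess_isotope(DEFAULT_MASS["Tc"])

# The stricter test removes that false positive, but now an explicit
# isotope equal to the default mass is reported as "not specified".
tc_strict = guess_isotope_strict("Tc", 98.0)
```

Either way the mass alone is ambiguous, so the only reliable fix would be
for the parser to record separately whether an isotope was written.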


                                Andrew
                                [email protected]



_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
