Following a recent brief discussion about hypervalent halogen salt handling
in RDKit (chlorates, periodates etc.), I've been thinking about my
preferences for representing hypervalent structures in general,
including more common groups like phosphorus(V) compounds, sulfoxides,
sulfones etc., as well as how RDKit should sanitize them.

It might be useful to have a general discussion about how RDKit should
handle these systems. Unfortunately, a 'one size fits all' solution that
everyone agrees on is probably not achievable.

A brief summary of my thoughts:
- we have to use the dative bond representation for nitro compounds because
N has no accessible d-orbitals, so the hypervalent -N(=O)=O representation
is 'wrong'
- P, S, Cl (and higher congeners) do have accessible d-orbitals, so
hypervalent representations for these compounds are not intrinsically
wrong; whether we use dative bond or hypervalent representations is a
matter of convention (and interoperability), e.g. C[S+]([O-])C or CS(=O)C
for DMSO.

My personal preference is to use hypervalent representations in the
majority of cases, e.g.
chlorate O=Cl(=O)[O-] instead of [O-][Cl+2]([O-])[O-]
periodate O=I(=O)(=O)[O-] instead of [O-][I+3]([O-])([O-])[O-]
iodosobenzene c1ccccc1I=O instead of c1ccccc1[I+][O-]
dimethylsulfone CS(=O)(=O)C instead of C[S+2]([O-])([O-])C
trimethylphosphine oxide CP(=O)(C)C instead of C[P+]([O-])(C)C
etc. etc.
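
As a quick sanity check on pairs like these, here is a minimal sketch in
plain Python (no RDKit needed). It only confirms that each hypervalent
SMILES and its dative bond counterpart carry the same net formal charge and
have balanced parentheses; it does not parse SMILES properly or check
valences. The names PAIRS, net_charge() and balanced() are made up for this
illustration, and the phosphine oxide dative form is written with balanced
parentheses.

```python
import re

# (name, hypervalent SMILES, dative bond SMILES), taken from the list above.
PAIRS = [
    ("chlorate", "O=Cl(=O)[O-]", "[O-][Cl+2]([O-])[O-]"),
    ("periodate", "O=I(=O)(=O)[O-]", "[O-][I+3]([O-])([O-])[O-]"),
    ("iodosobenzene", "c1ccccc1I=O", "c1ccccc1[I+][O-]"),
    ("dimethylsulfone", "CS(=O)(=O)C", "C[S+2]([O-])([O-])C"),
    ("trimethylphosphine oxide", "CP(=O)(C)C", "C[P+]([O-])(C)C"),
]

# Charge suffix of a bracket atom: "+", "-", "+2", "--" and so on.
_CHARGE = re.compile(r"\[[^\]]*?([+-]+)(\d*)\]")


def net_charge(smiles):
    """Sum the formal charges written on bracket atoms in a SMILES string."""
    total = 0
    for signs, digits in _CHARGE.findall(smiles):
        sign = 1 if signs[0] == "+" else -1
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


def balanced(smiles):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


for name, hypervalent, dative in PAIRS:
    ok = (balanced(hypervalent) and balanced(dative)
          and net_charge(hypervalent) == net_charge(dative))
    print(name, "consistent" if ok else "MISMATCH")
```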

There are also a few cases which come down purely to personal preference
and I generally use these guidelines:
- salt anions have any residual negative charge on O where possible, so
thiosulfate ends up as O=S([O-])([O-])=S rather than O=S(=O)([O-])[S-]
- carbanions adjacent to sulfonyl or phosphoryl groups have the charge on
the carbon
- sulfur and phosphorus ylides are represented as charge separated, e.g.
dimethylsulfonium methylide is C[S+](C)[CH2-] rather than CS(C)=C.

Currently, RDKit will convert the hypervalent representation of the halogen
acids into dative bond form, leave sulfur compounds untouched, and for
phosphorus only convert the 'metaphosphate' structures [C,N]=P(C)=O to
[C,N]=[P+](C)[O-].
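
To make the kind of conversion described above concrete, here is a toy
sketch in plain Python. It is not RDKit's implementation and does not use
RDKit's API; the atom/bond dictionaries and the to_dative() function are
hypothetical names invented for this illustration. It rewrites each double
bond from a chosen central element to a neutral oxygen as a single bond
with +1 on the central atom and -1 on the oxygen.

```python
import copy


def to_dative(atoms, bonds, centers=("Cl", "Br", "I")):
    """Return copies of atoms/bonds with center=O double bonds made dative."""
    atoms = copy.deepcopy(atoms)
    bonds = copy.deepcopy(bonds)
    for bond in bonds:
        if bond["order"] != 2:
            continue
        a, b = bond["atoms"]
        # Try both orientations: either end may be the central atom.
        for center, oxygen in ((a, b), (b, a)):
            if (atoms[center]["symbol"] in centers
                    and atoms[oxygen]["symbol"] == "O"
                    and atoms[oxygen]["charge"] == 0):
                bond["order"] = 1
                atoms[center]["charge"] += 1
                atoms[oxygen]["charge"] -= 1
                break
    return atoms, bonds


# Chlorate written as O=Cl(=O)[O-]: atom 1 is Cl, atoms 0, 2 and 3 are O.
chlorate_atoms = [
    {"symbol": "O", "charge": 0},
    {"symbol": "Cl", "charge": 0},
    {"symbol": "O", "charge": 0},
    {"symbol": "O", "charge": -1},
]
chlorate_bonds = [
    {"atoms": (0, 1), "order": 2},
    {"atoms": (1, 2), "order": 2},
    {"atoms": (1, 3), "order": 1},
]

dative_atoms, dative_bonds = to_dative(chlorate_atoms, chlorate_bonds)
for atom in dative_atoms:
    print(atom["symbol"], atom["charge"])
```

The result corresponds to [O-][Cl+2]([O-])[O-]. Passing centers=("S", "P")
would apply the same rewrite to sulfoxides, sulfones and phosphine oxides,
which is the opposite direction to the one I prefer.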

As an experiment, I've created a modified version of MolOps.cpp which does
all of my preferred conversions above (with the exception of moving charge
in thiosulfates from S to O if the input structure was already
hypervalent). It has changes to the functions phosphorusCleanup(),
halogenCleanup(), cleanUp() and a new function sulfurCleanup(). If anyone
is interested (and with Greg's permission), I'll share a Google Drive link
to the file so others can try it out.

Note that a few tests will fail with the new MolOps.cpp:
- testMMFFForceField (does some checks on dative bond forms which
presumably now get converted)
- graphmolMolOpsTest (builds perchlorates etc. and expects the result to be
in dative bond form)
- pythonTestDirChem (not sure what's wrong with this one - I can't find
what it does!)

Apologies for the length of all this...

Chris Earnshaw