Glad there's some interest!
At the moment the default behaviour of RDKit would be to leave the input
representation of your structures unchanged.
My modified code would change a dative bond representation of the sulfinate
to the hypervalent form (C[S+]([O-])[O-] -> CS(=O)[O-]), but would do nothing
with the P-H phosphinic acid. Ideally it should convert C[PH+](*)[O-] to
C[PH](*)=O, but it doesn't - I'll try to have a look at this.
Of course there's a tautomerization question for these phosphorus compounds
as well (I did quite a lot of work with some of them decades ago when I was
a 'real' chemist!) and you may well want to convert e.g. CP(O)(O) to
C[PH](=O)O at some point. That's a separate issue from sanitization -
probably best handled with a specific reaction transformation when you need it.
On 23 November 2017 at 17:49, Stephen Roughley <s.d.rough...@googlemail.com> wrote:
> This is great! Maybe this should be an option in SanitizeMol? Also, a
> couple of others to watch out for - phosphinic acids (e.g. C[PH]=O ) and
> sulfinic acids (e.g. CS(=O)[OH]).
> I'm not sure what RDKit currently does with those, but it would be worth
> incorporating them into any solution / test set?
> On Thu, Nov 23, 2017 at 5:27 PM, Chris Earnshaw <cgearns...@gmail.com> wrote:
>> Following a recent brief discussion about hypervalent halogen salt
>> handling in RDKit (chlorates, periodates etc.), I've been thinking about my
>> preferences for representation of hypervalent structures in general,
>> including more common groups like phosphorus(V) compounds, sulfoxides,
>> sulfones etc., as well as how they should be sanitized by RDKit.
>> It might be useful to have a general discussion about how RDKit should
>> handle these systems. A 'one size fits all' solution which everyone agrees
>> on is, unfortunately, likely to be quite impossible.
>> A brief summary of my thoughts:
>> - we have to use the dative bond representation for nitro compounds
>> because N has no accessible d-orbitals, so the hypervalent -N(=O)=O
>> representation is 'wrong'
>> - P, S, Cl (and higher congeners) do have accessible d-orbitals, so
>> hypervalent representations for these compounds are not intrinsically
>> wrong, it's a matter of convention (and interoperability) whether we use
>> dative bond or hypervalent representations, e.g. C[S+]([O-])C or CS(=O)C
>> for DMSO.
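To make the valence argument above concrete, here is a minimal Python sketch of simple octet bookkeeping. It is an illustrative assumption, not RDKit's actual valence model: an atom with formal charge q is treated like a neutral atom with (valence electrons - q) electrons, and a bond-order sum (counting bonds to H too) above 8 minus that number marks a hypervalent drawing. The names `octet_valence`, `is_hypervalent`, `is_acceptable` and `EXPANDED_OCTET_OK` are hypothetical.

```python
from typing import Sequence

# Valence-shell electron counts for the elements discussed in this thread only;
# the simplified formula below is not meant for other elements.
VALENCE_ELECTRONS = {"N": 5, "O": 6, "P": 5, "S": 6, "Cl": 7, "Br": 7, "I": 7}

# Convention from the post (assumed here): P, S and the heavier halogens may be
# drawn hypervalent, but N may not.
EXPANDED_OCTET_OK = {"P", "S", "Cl", "Br", "I"}


def octet_valence(element: str, charge: int) -> int:
    """Bonds an atom can form without exceeding an octet (simplified model).

    A charged atom is treated as isoelectronic with a neutral atom that has
    (valence electrons - charge) electrons, e.g. S+ behaves like P.
    """
    return 8 - (VALENCE_ELECTRONS[element] - charge)


def is_hypervalent(element: str, charge: int, bond_orders: Sequence[int]) -> bool:
    """True if the summed bond orders exceed the octet valence."""
    return sum(bond_orders) > octet_valence(element, charge)


def is_acceptable(element: str, charge: int, bond_orders: Sequence[int]) -> bool:
    """Octet-compliant atoms always pass; hypervalent ones only for P, S, halogens."""
    return (not is_hypervalent(element, charge, bond_orders)
            or element in EXPANDED_OCTET_OK)


# DMSO drawn as C[S+]([O-])C: S+ with three single bonds
print(is_hypervalent("S", 1, [1, 1, 1]))   # False
# DMSO drawn as CS(=O)C: neutral S with bond-order sum 4
print(is_hypervalent("S", 0, [1, 2, 1]))   # True
print(is_acceptable("S", 0, [1, 2, 1]))    # True
# Nitro drawn as CN(=O)=O: bond-order sum 5 on neutral N
print(is_acceptable("N", 0, [1, 2, 2]))    # False
# Nitro drawn as C[N+](=O)[O-]: N+ with bond-order sum 4
print(is_acceptable("N", 1, [1, 2, 1]))    # True
```

Both DMSO drawings pass, but only the dative nitro drawing does, which is the distinction the two bullets above draw between N and its heavier congeners.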
>> My personal preference is to use hypervalent representations in the
>> majority of cases, e.g.
>> chlorate O=Cl(=O)[O-] instead of [O-][Cl+2]([O-])[O-]
>> periodate O=I(=O)(=O)[O-] instead of [O-][I+3]([O-])([O-])[O-]
>> iodosobenzene c1ccccc1I=O instead of c1ccccc1[I+][O-]
>> dimethylsulfone CS(=O)(=O)C instead of C[S+2]([O-])([O-])C
>> trimethylphosphine oxide CP(=O)(C)C instead of C[P+]([O-])(C)C
>> etc. etc.
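The preferred conversions listed above share one pattern: each X(+)-O(-) single bond on P, S or a heavier halogen becomes X=O, with one unit of positive charge removed from X and the negative charge removed from O, so the total charge is unchanged. The sketch below applies that rule to a toy atom/bond table. It is a hypothetical illustration (the `Atom` class and `collapse_dative` function are made up for this example), not the code in the modified MolOps.cpp and not an RDKit API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Atom:
    element: str
    charge: int = 0


# Toy bond table: (atom index, atom index) -> bond order.
Bonds = Dict[Tuple[int, int], int]

# Central atoms allowed to take the hypervalent form (nitrogen deliberately excluded).
EXPANDABLE = {"P", "S", "Cl", "Br", "I"}


def collapse_dative(atoms: List[Atom], bonds: Bonds) -> None:
    """In place: turn each X(+)-O(-) single bond into X=O for expandable X."""
    for i, j in sorted(bonds):
        for center, other in ((i, j), (j, i)):
            x, o = atoms[center], atoms[other]
            if (x.element in EXPANDABLE and x.charge > 0
                    and o.element == "O" and o.charge == -1
                    and bonds[(i, j)] == 1):
                bonds[(i, j)] = 2   # single bond becomes a double bond
                x.charge -= 1       # central atom gives up one + charge
                o.charge += 1       # oxygen gives up its - charge
                break


# Chlorate drawn as [O-][Cl+2]([O-])[O-]
atoms = [Atom("O", -1), Atom("Cl", 2), Atom("O", -1), Atom("O", -1)]
bonds = {(0, 1): 1, (1, 2): 1, (1, 3): 1}
collapse_dative(atoms, bonds)
print([a.charge for a in atoms])  # [0, 0, 0, -1]
print(bonds)                      # {(0, 1): 2, (1, 2): 2, (1, 3): 1}, i.e. O=Cl(=O)[O-]
```

Because only O(-) partners are collapsed, a charge-separated ylide (whose anionic partner is carbon) is left alone, and nitro groups keep their dative form because N is not in `EXPANDABLE`.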
>> There are also a few cases which come down purely to personal preference
>> and I generally use these guidelines:
>> - salt anions have any residual negative charge on O where possible, so
>> thiosulfate ends up as O=S([O-])([O-])=S rather than O=S(=O)([O-])[S-]
>> - carbanions adjacent to sulfonyl or phosphoryl groups have the charge on
>> the carbon
>> - sulfur and phosphorus ylids are represented as charge separated, e.g.
>> trimethylsulfonium ylide is C[S+](C)[CH2-] rather than CS(C)=C.
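The thiosulfate guideline amounts to a resonance shift, X(-)-Y=O to X=Y-O(-), which moves a negative charge from a terminal, singly bonded anion onto a doubly bonded oxygen on the same centre. Here is a hypothetical sketch on a toy atom/bond table (the `Atom` class, `MOVABLE_ANIONS` and `move_charge_to_oxygen` are made up for illustration; this is not RDKit or MolOps.cpp code). Restricting the shift to sulfur anions is an assumption, chosen so that carbanions next to sulfonyl or phosphoryl groups keep their charge on carbon, as the carbanion guideline requires.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Atom:
    element: str
    charge: int = 0


# Toy bond table: (atom index, atom index) -> bond order.
Bonds = Dict[Tuple[int, int], int]

# Anions whose charge may be moved onto oxygen. Carbon is deliberately absent
# so that carbanions next to sulfonyl/phosphoryl groups keep their charge.
MOVABLE_ANIONS = {"S"}


def neighbours(bonds: Bonds, k: int) -> List[Tuple[Tuple[int, int], int]]:
    """(bond key, neighbour index) pairs for atom k."""
    return [(b, b[1] if b[0] == k else b[0]) for b in bonds if k in b]


def move_charge_to_oxygen(atoms: List[Atom], bonds: Bonds) -> None:
    """In place: X(-)-Y=O  ->  X=Y-O(-) for terminal, singly bonded anions X."""
    for k, x in enumerate(atoms):
        if x.element not in MOVABLE_ANIONS or x.charge != -1:
            continue
        nbrs = neighbours(bonds, k)
        if len(nbrs) != 1 or bonds[nbrs[0][0]] != 1:
            continue  # only terminal atoms joined by a single bond
        bond_xy, y = nbrs[0]
        for bond_yo, o in neighbours(bonds, y):
            if (atoms[o].element == "O" and atoms[o].charge == 0
                    and bonds[bond_yo] == 2):
                bonds[bond_xy] = 2      # X-Y becomes X=Y
                bonds[bond_yo] = 1      # Y=O becomes Y-O
                x.charge = 0
                atoms[o].charge = -1
                break


# Thiosulfate drawn as O=S(=O)([O-])[S-]
atoms = [Atom("O"), Atom("S"), Atom("O"), Atom("O", -1), Atom("S", -1)]
bonds = {(0, 1): 2, (1, 2): 2, (1, 3): 1, (1, 4): 1}
move_charge_to_oxygen(atoms, bonds)
print([a.charge for a in atoms])  # [-1, 0, 0, -1, 0]
print(bonds)  # {(0, 1): 1, (1, 2): 2, (1, 3): 1, (1, 4): 2}, i.e. [O-]S(=O)([O-])=S
```

The result, [O-]S(=O)([O-])=S, is the same structure as O=S([O-])([O-])=S, and the total charge of -2 is preserved.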
>> Currently, RDKit will convert the hypervalent representation of the
>> halogen acids into dative bond form, leave sulfur compounds untouched, and
>> for phosphorus only convert the 'metaphosphate' structures [C,N]=P(C)=O to
>> As an experiment, I've created a modified version of MolOps.cpp which
>> does all of my preferred conversions above (with the exception of moving
>> charge in thiosulfates from S to O if the input structure was already
>> hypervalent). It has changes to the functions phosphorusCleanup(),
>> halogenCleanup(), cleanUp() and a new function sulfurCleanup(). If anyone
>> is interested (and with Greg's permission), I'll share a Google drive link
>> to the file so others can try it out.
>> Note that a few tests will fail with the new MolOps.cpp:
>> - testMMFFForceField (does some checks on dative bond forms which
>> presumably now get converted)
>> - graphmolMolOpsTest (builds perchlorates etc. and expects the result to
>> be in dative bond form)
>> - pythonTestDirChem (not sure what's wrong with this one - I can't find
>> what it does!)
>> Apologies for the length of all this...
>> Chris Earnshaw
>> Rdkit-discuss mailing list