Hi George,
Don’t quote me on this but I’m guessing the reason Indigo does better is
probably because they pass the input coordinates and wedge/hatch labels to the
InChI API, whereas RDKit sets the winding. That is, in Indigo’s case InChI
will look at the depiction and interpret where the stereo centres are, whilst
RDKit tells it explicitly. Basically RDKit is actually round tripping through
its object model whilst Indigo isn’t.
Anyways a little on InChI and stereochemistry….
TL;DR: 200/90,000 (0.2%) ain’t bad.
When stereochemistry is validated (for
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL287254) the tetrahedral
centre on the ring will be removed - it does not have a configuration. In that
depiction it doesn’t matter whether the bond is up or down because the centre
is dependent on the configuration of the other stereo centre in the ring. Since
the other stereo centre doesn’t have a configuration it doesn’t matter what
configuration this one has. Formally, a stereo centre is not a stereo centre if
there is a permutation that inverts only its [the stereo centre’s]
configuration. Clearly this is only the case when both have a configuration.
For a more concise example consider these structures.
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8?
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3
Load the first one into your favourite structure diagram editor and invert the
wedge/hatch bond. Generating a new InChI will give the same InChI string.
Hmm... but didn’t we invert the stereo centre? If we got the same InChI it must
not matter which configuration it is. But the InChI does encode this, as seen
above. I’m not sure how useful it is that the above two are not the same,
though clearly we do want the two stereoisomers to be different:
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8+
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8-
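To make it easier to see that the four strings above differ only in the /t
layer, here is a rough sketch that splits an InChI on "/" and keys each layer
by its one-letter prefix. The helper name inchi_layers is made up for this
example, it uses only the Python standard library (no InChI or RDKit calls),
and it assumes each prefix occurs at most once, which is true for these
strings but not for InChIs with isotopic or fixed-H sections.

```python
def inchi_layers(inchi):
    """Split an InChI string into {prefix: body}. Simplified, hypothetical helper."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0],
              "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            # Assumption: each one-letter layer prefix appears at most once.
            layers[part[0]] = part[1:]
    return layers


# The four 1,4-dimethylcyclohexane strings from this email.
BASE = "InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3"
EXAMPLES = [BASE + "/t7-,8?", BASE, BASE + "/t7-,8+", BASE + "/t7-,8-"]

parsed = [inchi_layers(s) for s in EXAMPLES]

# Formula, connectivity (/c) and hydrogens (/h) are identical in all four.
same_skeleton = len({(p["formula"], p["c"], p["h"]) for p in parsed}) == 1

# Only the tetrahedral layer (/t) differs; None means the layer is absent.
tetrahedral = [p.get("t") for p in parsed]

print(same_skeleton)
print(tetrahedral)
```

The first two entries are the pair I’m unsure about (one centre marked "?"
versus no stereo layer at all); the last two are the real stereoisomers.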
Anyways this isn’t really a problem with RDKit, check out the OEChem release
notes:
http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-csharp/releasenotes/version1_9_2.html
They did have a page showing exactly what they disagree with but that seems to
have gone missing… thankfully PubChem also do it :-)…
CID 2375263 :
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3
(notice standard InChI = 1S).
If you feed InChI the depiction it will add a stereo configuration for the
double bond, so you’ll get one of the following (depending on depiction):
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9+
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9-
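If you want to check whether two toolkits disagree only on stereo, one crude
approach is to drop the stereo layers and compare what is left. The sketch
below does that for the three strings above. strip_stereo is a hypothetical
helper written for this email, not part of any toolkit, and it assumes the
stereo layers are exactly those starting with /b, /t, /m or /s in the main
section of the string.

```python
# Layer prefixes treated as stereo: double bond, tetrahedral, and the
# /m and /s flags that accompany tetrahedral stereo.
STEREO_PREFIXES = ("b", "t", "m", "s")


def strip_stereo(inchi):
    """Return the InChI without its stereo layers. Lossy, for comparison only."""
    parts = inchi.split("/")
    # parts[0] is "InChI=1S" and parts[1] is the formula; always keep both.
    kept = parts[:2] + [p for p in parts[2:] if p[:1] not in STEREO_PREFIXES]
    return "/".join(kept)


# CID 2375263 as stored by PubChem (no double-bond layer).
PUBCHEM = ("InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)"
           "17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3")

# What you get when InChI perceives the double bond from a depiction.
FROM_DEPICTION = [PUBCHEM + "/b22-9+", PUBCHEM + "/b22-9-"]

for inchi in FROM_DEPICTION:
    stereo_only = inchi != PUBCHEM and strip_stereo(inchi) == PUBCHEM
    print(inchi.rsplit("/", 1)[1], "differs from PubChem only in stereo:", stereo_only)
```

This only tells you where the disagreement is, not which string is right.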
Testing last August I found ~250,000 (0.5%) differences in PubChem-Compound.
The InChI is great but it’s not perfect and there will always be differences
based on what toolkits agree on. There is of course an argument that the InChI
is something they can agree on… InChI version 2 is where things get really
fun.
J
On 30 Jan 2014, at 19:55, George Papadatos <[email protected]> wrote:
> I agree; that's why I tried to minimise 'doctoring' as much as I could in
> this case.
> George
>
>
> On 30 January 2014 19:46, Dimitri Maziuk <[email protected]> wrote:
> On 01/30/2014 01:07 PM, George Papadatos wrote:
> > OK just to add some fuel to this fire: A colleague of mine and I looked at
> > the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
> > rdkit nodes.
>
> > Rdkit had 10 times more discrepancies
>
> If it's any consolation OpenBabel stereo perception does not do CIP
> ordering so any input that didn't have correct stereochemistry or it was
> removed during whatever processing you did, its output InChi will have a
> wrong stereo layer. I expect with properly doctored input you'll get
> 100% discrepancies there.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
> ------------------------------------------------------------------------------
> WatchGuard Dimension instantly turns raw network data into actionable
> security intelligence. It gives you real-time visual feedback on key
> security issues and trends. Skip the complicated setup - simply import
> a virtual appliance and go from zero to informed in seconds.
> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>