Hi George,

Don’t quote me on this, but I’m guessing the reason Indigo does better is 
probably that they pass the input coordinates and wedge/hatch labels to the 
InChI API, whereas RDKit sets the winding. That is, in Indigo’s case InChI 
will look at the depiction and interpret where the stereo centres are, whilst 
RDKit tells it explicitly. Basically, RDKit is actually round-tripping through 
its object model whilst Indigo isn’t.

Anyways a little on InChI and stereochemistry….

TL;DR: 200/90,000 (0.2%) ain’t bad.

When stereochemistry is validated (for 
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL287254) the tetrahedral 
centre on the ring will be removed - it does not have a configuration. In that 
depiction it doesn’t matter whether the bond is up or down because the centre 
is dependent on the configuration of the other stereo centre in the ring. Since 
the other stereo centre doesn’t have a configuration it doesn’t matter what 
configuration this one has. Formally, a stereo centre is not a stereo centre if 
there is a permutation that inverts only its [the stereo centre's] 
configuration. Clearly this is only the case when both have a configuration.

For a more concise example consider these structures.

InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8?
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3

Load the first one into your favourite structure diagram editor and invert the 
wedge/hatch bond. Generating a new InChI will give the same InChI string. 
Hmm... but didn't we just invert the stereo centre? If we got the same InChI it 
must not matter which configuration it is. But the InChI does encode this, as 
seen above. Clearly we do want the two stereoisomers to be different, but I’m 
not sure how useful it is that the above two are not the same.

InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8+
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8-
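
To see exactly where pairs like these differ, it can help to split the strings 
into layers and compare them one by one. A minimal sketch in plain Python (no 
toolkit needed); the function names and the 'version'/'formula' keys are made 
up for illustration, and it assumes each layer prefix occurs only once, which 
holds for the strings in this mail but not for every InChI:

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict of {layer prefix: layer body}.

    'version' and 'formula' are hypothetical keys used here because those
    two segments carry no prefix letter of their own.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    for part in parts[1:]:
        if not part:
            continue
        if part[0].islower():
            # Layer prefixes are single lower-case letters (c, h, b, t, ...).
            # Assumption: one occurrence per prefix; a repeated layer
            # (e.g. stereo repeated after /i) would overwrite the earlier one.
            layers[part[0]] = part[1:]
        else:
            layers["formula"] = part
    return layers


def differing_layers(a, b):
    """Return {prefix: (body in a, body in b)} for layers that differ."""
    la, lb = inchi_layers(a), inchi_layers(b)
    return {
        key: (la.get(key), lb.get(key))
        for key in sorted(set(la) | set(lb))
        if la.get(key) != lb.get(key)
    }
```

Calling differing_layers() on the first pair above reports only the /t layer 
('7-,8?' versus no layer at all), and on the second pair only '7-,8+' versus 
'7-,8-'.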

Anyways this isn’t really a problem with RDKit, check out the OEChem release 
notes: 
http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-csharp/releasenotes/version1_9_2.html

They did have a page showing exactly what they disagree with, but that seems to 
have gone missing… thankfully PubChem also do it :-)…

CID 2375263 :  
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3
  (notice standard InChI = 1S). 

If you feed InChI the depiction it will add a stereo configuration for the 
double bond so you’ll get one of the following (depending on depiction)

InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9+
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9-
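
When counting discrepancies it can be useful to check whether two InChIs 
disagree only in stereo. A rough sketch, again plain Python with a made-up 
function name: it simply drops every /b, /t, /m and /s segment, so it is a 
string-level comparison and not a substitute for regenerating the InChI 
without stereo in a toolkit:

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # double bond, tetrahedral, and their flags


def strip_stereo(inchi):
    """Return the InChI string with all stereo segments removed.

    Hypothetical helper: works purely on the text, segment by segment.
    """
    head, *rest = inchi.split("/")
    kept = [seg for seg in rest if not (seg and seg[0] in STEREO_PREFIXES)]
    return "/".join([head] + kept)


def same_apart_from_stereo(a, b):
    """True if the two strings differ but become identical once stereo is dropped."""
    return a != b and strip_stereo(a) == strip_stereo(b)
```

Applied to either of the two /b strings above, strip_stereo() gives back the 
PubChem string for CID 2375263, so same_apart_from_stereo() is true for each of 
them against the PubChem one.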

Testing last August I found ~250,000 (0.5%) differences in PubChem-Compound. 
The InChI is great but it’s not perfect and there will always be differences 
based on what toolkits agree on. There is of course an argument that the InChI 
is something they can agree on… InChI version 2 is where things get really 
fun.

J

On 30 Jan 2014, at 19:55, George Papadatos <[email protected]> wrote:

> I agree; that's why I tried to minimise 'doctoring' as much as I could in 
> this case. 
> George
> 
> 
> On 30 January 2014 19:46, Dimitri Maziuk <[email protected]> wrote:
> On 01/30/2014 01:07 PM, George Papadatos wrote:
> > OK just to add some fuel to this fire: A colleague of mine and I looked at
> > the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
> > rdkit nodes.
> 
> > Rdkit had 10 times more discrepancies
> 
> If it's any consolation OpenBabel stereo perception does not do CIP
> ordering so any input that didn't have correct stereochemistry or it was
> removed during whatever processing you did, its output InChi will have a
> wrong stereo layer. I expect with properly doctored input you'll get
> 100% discrepancies there.
> 
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
> 
> 
> ------------------------------------------------------------------------------
> WatchGuard Dimension instantly turns raw network data into actionable
> security intelligence. It gives you real-time visual feedback on key
> security issues and trends.  Skip the complicated setup - simply import
> a virtual appliance and go from zero to informed in seconds.
> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss