OK, just to add some fuel to this fire: a colleague of mine and I looked at
the InChI round trip using KNIME 2.9 and the latest versions of the Indigo and
RDKit nodes. We took ~90,000 InChIs from chembl_17, converted them to mols
(sanitise + remove Hs), discarded the ones that failed to convert, and then
converted the rest back to InChIs (standard ones, no extra parameters). We
then compared the Indigo and RDKit InChIs against the original input InChIs
stored in ChEMBL.
RDKit had about 10 times more discrepancies than Indigo: 200 failures, as
opposed to 21. RDKit's failure rate (~0.2%) was also confirmed using
~1 million InChIs.
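For anyone who wants to repeat the check outside KNIME, here is a minimal sketch of the round trip described above in plain Python. It is not the KNIME workflow itself; the function names (audit_inchi_roundtrip, rdkit_converters) are hypothetical, the converters are passed in so the same audit could wrap another toolkit, and the choice to compute the mismatch rate over successfully parsed InChIs only is an assumption.

```python
from typing import Callable, Iterable, Optional


def audit_inchi_roundtrip(
    inchis: Iterable[str],
    to_mol: Callable[[str], Optional[object]],
    to_inchi: Callable[[object], str],
) -> dict:
    """Round-trip each InChI through a toolkit and tally the outcomes.

    Hypothetical helper: to_mol must return None when parsing fails,
    to_inchi must return a standard InChI for a molecule.
    """
    parse_failures = []  # InChIs the toolkit could not read at all
    mismatches = []      # (original, regenerated) pairs that differ
    total = 0
    for inchi in inchis:
        total += 1
        mol = to_mol(inchi)
        if mol is None:
            parse_failures.append(inchi)
            continue
        regenerated = to_inchi(mol)
        if regenerated != inchi:
            mismatches.append((inchi, regenerated))
    # Assumption: the rate is taken over InChIs that parsed successfully.
    converted = total - len(parse_failures)
    rate = len(mismatches) / converted if converted else 0.0
    return {
        "total": total,
        "parse_failures": parse_failures,
        "mismatches": mismatches,
        "mismatch_rate": rate,
    }


def rdkit_converters():
    """Wire the audit to RDKit; imported lazily so the audit has no hard dependency."""
    from rdkit import Chem

    def to_mol(inchi):
        # Sanitise and remove Hs, mirroring the settings described above.
        return Chem.MolFromInchi(inchi, sanitize=True, removeHs=True)

    def to_inchi(mol):
        # Standard InChI, no extra options.
        return Chem.MolToInchi(mol)

    return to_mol, to_inchi
```

Usage would be along the lines of `to_mol, to_inchi = rdkit_converters()` followed by `audit_inchi_roundtrip(my_inchis, to_mol, to_inchi)`, where `my_inchis` stands for whatever list of InChIs you load (e.g. from ChEMBL).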

I had a closer look at a couple of cases here:
http://nbviewer.ipython.org/gist/madgpap/8715974

It seems that there is more than one reason for the failures. I totally
understand Greg's caution about the InChI-to-mol conversion, but given the
difference between RDKit and Indigo, there might be room for improvement. Any
insights would be very much appreciated.
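One way to sort the failures by cause is to check which InChI layer actually differs between the original and regenerated strings (e.g. connectivity /c, hydrogens /h, or the stereo layers /t, /m, /s). The sketch below assumes standard InChIs, which have no /f or /r layers; the key names it produces ("formula", "i:h" for the isotopic hydrogen sublayer, etc.) are my own hypothetical convention, not part of any library.

```python
from typing import Dict, List

# Standard InChIs have no /f or /r layers, so only the isotopic layer (/i)
# starts a section whose sublayers reuse main-layer prefixes (/h, /b, /t, ...).
SECTION_START = "i"


def inchi_layers(inchi: str) -> Dict[str, str]:
    """Split a standard InChI into layers keyed by their prefix letter."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI: {inchi!r}")
    fields = inchi[len("InChI="):].split("/")
    layers = {"version": fields[0]}
    if len(fields) > 1:
        layers["formula"] = fields[1]  # the main layer formula has no prefix
    section = ""
    for field in fields[2:]:
        if not field:
            continue  # skip empty fields rather than indexing into them
        prefix, body = field[0], field[1:]
        if prefix == SECTION_START:
            section = SECTION_START + ":"
            layers[prefix] = body
        else:
            layers[section + prefix] = body
    return layers


def differing_layers(original: str, regenerated: str) -> List[str]:
    """Return the layer keys whose content differs, in order of appearance."""
    a, b = inchi_layers(original), inchi_layers(regenerated)
    keys = list(a) + [k for k in b if k not in a]
    return [k for k in keys if a.get(k) != b.get(k)]
```

Bucketing the failure pairs by `differing_layers` would show whether they cluster in the stereo layers, the hydrogen layer, or elsewhere.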

Btw, the KNIME workflow and the full list of failures are available if you'd like them.

Cheers,

George



On 30 January 2014 04:11, Greg Landrum <[email protected]> wrote:

> Yeah, I have been tempted several times to remove the InChI->RDKit
> functionality entirely.
>
>
>
> On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov <[email protected]> wrote:
>
>> Thank you, Greg!
>> Very nice explanation and I think this issue has confused people before
>> me as well. I am going to have to keep reminding myself about it as the
>> subject comes up every now and then.
>>
>> Igor
>> On Jan 29, 2014 10:59 PM, "Greg Landrum" <[email protected]> wrote:
>>
>>> Hi Igor,
>>>
>>> On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov <
>>> [email protected]> wrote:
>>>
>>>> Greg et al,
>>>>
>>>> Here is a little script that demonstrates a problem with fingerprints
>>>> after the roundtrip through InChI.
>>>> My input mol file is also attached.
>>>> As you can see the similarity between "before" and "after" is not 1 in
>>>> 45 out of 100 cases.
>>>> In one case it is as low as 0.29. Could someone take a look and tell me
>>>> what I'm doing wrong?
>>>>
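Igor's script isn't reproduced here, so the following is only a sketch of the kind of before/after fingerprint check being described. It assumes Morgan fingerprints (radius 2, 2048 bits), which may not be what the original script used, and the function names are hypothetical. The Tanimoto calculation is written out on sets of on-bit indices so it can be checked without RDKit; RDKit's own `DataStructs.TanimotoSimilarity` could be used instead.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Convention chosen here: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def roundtrip_similarity(mol, radius=2, n_bits=2048):
    """Compare a molecule's fingerprint with that of its InChI round trip.

    Assumption: Morgan bit-vector fingerprints; returns None if the
    InChI cannot be read back.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    back = Chem.MolFromInchi(Chem.MolToInchi(mol))
    if back is None:
        return None
    fp_before = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    fp_after = AllChem.GetMorganFingerprintAsBitVect(back, radius, nBits=n_bits)
    return tanimoto(fp_before.GetOnBits(), fp_after.GetOnBits())
```

A similarity below 1 here does not by itself mean anything went wrong: as explained in the reply, the round trip can legitimately return a different tautomer, which changes the fingerprint.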
>>>
>>> Ah! Now I see what you're doing and understand the problem.
>>>
>>> It's really important when using InChI to remember that InChI is
>>> designed to be an identifier, not an interchange format. The InChI
>>> algorithm modifies the molecule as part of its canonicalization step. This
>>> modification includes standardizing tautomers.
>>>
>>> Here's an example of the type of substructure modification that happens
>>> in your molecules:
>>> input SMILES c1ccccc1C(=O)Nc1ccccc1, on being converted to InChI and back,
>>> yields: OC(=Nc1ccccc1)c1ccccc1
>>>
>>> Basically: If you think you know what your molecules are, you probably
>>> should be building them from SMILES or CTAB, not InChI.
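In practice, following this advice could mean keeping the original SMILES (or CTAB) as the structure of record and using InChI only as a lookup key, never converting it back into a molecule. A minimal sketch, with hypothetical function names; the identifier function is injected so that any identifier (standard InChI, InChIKey, ...) can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_identifier(
    smiles_list: Iterable[str], identify: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group input SMILES under an identifier such as a standard InChI.

    The original SMILES are kept as-is; the identifier is used only as a key.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for smi in smiles_list:
        groups[identify(smi)].append(smi)
    return dict(groups)


def rdkit_inchi(smiles: str) -> str:
    """Standard InChI for a SMILES string via RDKit (imported lazily)."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToInchi(mol)
```

With `identify=rdkit_inchi`, the amide and imidic acid SMILES from the example above would be expected to share a group, since the round trip shows standard InChI does not distinguish them, while both original input strings are preserved.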
>>>
>>> Apologies that I didn't think of this before; I was just focusing on the
>>> stereochemistry.
>>>
>>> -greg
>>>
>>
>
>
> ------------------------------------------------------------------------------
> WatchGuard Dimension instantly turns raw network data into actionable
> security intelligence. It gives you real-time visual feedback on key
> security issues and trends.  Skip the complicated setup - simply import
> a virtual appliance and go from zero to informed in seconds.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>