Hi Igor,
Thanks for the quick reply.
I just did that in my workflow. The number of discrepancies increased from 200
to 950 :(
George


On 30 January 2014 19:19, Igor Filippov <[email protected]> wrote:

> George,
>
> Have you added coordinates to the mols converted from InChI?
> It made a huge difference for the examples I've tried.
>
> Igor
>
>
> On Thu, Jan 30, 2014 at 2:07 PM, George Papadatos <[email protected]> wrote:
>
>> OK, just to add some fuel to this fire: a colleague of mine and I looked
>> at the InChI roundtrip using KNIME 2.9 and the latest versions of the Indigo
>> and RDKit nodes. We used ~90,000 InChIs from chembl_17, converted them to
>> mols (sanitise + remove Hs), removed the ones that failed to convert, and
>> then converted back to InChIs (standard ones, no extra parameters). We
>> compared the InChIs produced by Indigo and RDKit against the original
>> input InChIs that are stored in ChEMBL.
>> RDKit had roughly 10 times more discrepancies: 200 failures as opposed to
>> 21 from Indigo. This rate (~0.2%) was also confirmed using ~1 million InChIs.
>>
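
A rough sketch of the round-trip comparison described above, assuming the RDKit Python API rather than the KNIME nodes actually used in the thread. The helper names `rdkit_roundtrip` and `count_discrepancies` and the `add_coords` flag are hypothetical and only illustrate the idea; this does not reproduce the real workflow.

```python
def rdkit_roundtrip(inchi, add_coords=False):
    """InChI -> mol -> InChI with RDKit; returns None if parsing fails."""
    # Lazy import: the counting helper below works without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromInchi(inchi)  # sanitizes and removes Hs by default
    if mol is None:
        return None
    if add_coords:
        # Assumption: "adding coordinates" means generating 2D coordinates.
        AllChem.Compute2DCoords(mol)
    return Chem.MolToInchi(mol)


def count_discrepancies(inchis, roundtrip):
    """Count inputs that convert but come back as a different InChI."""
    converted = 0
    mismatches = []
    for original in inchis:
        result = roundtrip(original)
        if result is None:  # failed conversions are excluded from the rate
            continue
        converted += 1
        if result != original:
            mismatches.append(original)
    rate = len(mismatches) / converted if converted else 0.0
    return mismatches, rate
```

With a list of input InChIs, `count_discrepancies(inchis, rdkit_roundtrip)` returns the mismatching inputs and their fraction; 200 mismatches out of roughly 90,000 converted structures corresponds to a rate of about 0.002, i.e. the ~0.2% quoted above.
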
>> I had a closer look at a couple of cases here:
>> http://nbviewer.ipython.org/gist/madgpap/8715974
>>
>> It seems that there is more than one reason for the failures. I totally
>> understand Greg's caution about the inchi2mol conversion, but given the
>> difference between RDKit and Indigo, there might be room for improvement. Any
>> insights would be very much appreciated.
>>
>> Btw, the KNIME workflow and full list of fails are available to you.
>>
>> Cheers,
>>
>> George
>>
>>
>>
>> On 30 January 2014 04:11, Greg Landrum <[email protected]> wrote:
>>
>>> Yeah, I have been tempted several times to remove the InChI->RDKit
>>> functionality entirely
>>>
>>>
>>>
>>> On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov <
>>> [email protected]> wrote:
>>>
>>>> Thank you, Greg!
>>>> Very nice explanation and I think this issue has confused people before
>>>> me as well. I am going to have to keep reminding myself about it as the
>>>> subject comes up every now and then.
>>>>
>>>> Igor
>>>> On Jan 29, 2014 10:59 PM, "Greg Landrum" <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Greg et al,
>>>>>>
>>>>>> Here is a little script that demonstrates a problem with fingerprints
>>>>>> after the roundtrip through InChI.
>>>>>> My input mol file is also attached.
>>>>>> As you can see the similarity between "before" and "after" is not 1
>>>>>> in 45 out of 100 cases.
>>>>>> In one case it is as low as 0.29. Could someone take a look and tell
>>>>>> me what I'm doing wrong?
>>>>>>
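
The attached script is not reproduced here. A minimal sketch of this kind of before/after check, assuming RDKit's default topological fingerprint (`Chem.RDKFingerprint`), might look like the following; the fingerprint type used in the original script is not stated, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def roundtrip_similarity(mol):
    """Similarity between a molecule and its InChI round trip (RDKit)."""
    from rdkit import Chem  # lazy import; tanimoto() above is pure Python

    after = Chem.MolFromInchi(Chem.MolToInchi(mol))
    if after is None:  # the InChI could not be converted back
        return None
    fp_before = Chem.RDKFingerprint(mol)
    fp_after = Chem.RDKFingerprint(after)
    return tanimoto(fp_before.GetOnBits(), fp_after.GetOnBits())
```

A value below 1.0 means the round-tripped structure is not bit-for-bit the same molecule as far as this fingerprint is concerned.
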
>>>>>
>>>>> Ah! Now I see what you're doing and understand the problem.
>>>>>
>>>>> It's really important when using InChI to remember that InChI is
>>>>> designed to be an identifier, not an interchange format. The InChI
>>>>> algorithm modifies the molecule as part of its canonicalization step. This
>>>>> modification includes standardizing tautomers.
>>>>>
>>>>> Here's an example of the type of substructure modification that
>>>>> happens in your molecules:
>>>>> the input SMILES c1ccccc1C(=O)Nc1ccccc1, on being converted to InChI and
>>>>> back, yields: OC(=Nc1ccccc1)c1ccccc1
>>>>>
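
The example above can be tried with a short sketch like this one, assuming an RDKit build with InChI support. The function name `inchi_roundtrip` is hypothetical, and the exact tautomer that comes back may depend on the RDKit and InChI library versions, so no output is shown.

```python
def inchi_roundtrip(smiles):
    """Return (SMILES before, SMILES after, do the two InChIs match?)."""
    from rdkit import Chem  # requires RDKit built with InChI support

    mol = Chem.MolFromSmiles(smiles)
    inchi = Chem.MolToInchi(mol)
    back = Chem.MolFromInchi(inchi)
    if back is None:  # the InChI could not be converted back to a molecule
        return Chem.MolToSmiles(mol), None, False
    # Canonical SMILES may differ (different tautomer drawn), while the
    # InChI, which normalizes mobile hydrogens, is expected to be unchanged.
    return Chem.MolToSmiles(mol), Chem.MolToSmiles(back), inchi == Chem.MolToInchi(back)


if __name__ == "__main__":
    before, after, same_inchi = inchi_roundtrip("c1ccccc1C(=O)Nc1ccccc1")
    print("before:    ", before)
    print("after:     ", after)
    print("same InChI:", same_inchi)
```

If the two SMILES strings differ while the InChIs agree, the round trip has changed the drawn tautomer without changing the identifier, which is the behaviour described above.
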
>>>>> Basically: If you think you know what your molecules are, you probably
>>>>> should be building them from SMILES or CTAB, not InChI.
>>>>>
>>>>> Apologies that I didn't think of this before; I was just focusing on
>>>>> the stereochemistry.
>>>>>
>>>>> -greg
>>>>>
>>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> WatchGuard Dimension instantly turns raw network data into actionable
>>> security intelligence. It gives you real-time visual feedback on key
>>> security issues and trends.  Skip the complicated setup - simply import
>>> a virtual appliance and go from zero to informed in seconds.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>>
>>
>>
>