George,

Have you added coordinates to the mols converted from InChI?
It made a huge difference for the examples I've tried.

Igor


On Thu, Jan 30, 2014 at 2:07 PM, George Papadatos <[email protected]> wrote:

> OK, just to add some fuel to this fire: a colleague of mine and I looked at
> the InChI round trip using KNIME 2.9 and the latest versions of the Indigo
> and RDKit nodes. We used ~90,000 InChIs from chembl_17, converted them to
> mols (sanitise + remove Hs), removed the ones that failed to convert, and
> then converted back to InChIs (standard ones, no extra parameters). We
> compared the InChIs produced by Indigo and RDKit against the original input
> InChIs that are stored in ChEMBL.
> RDKit had about 10 times more discrepancies: 200 failures as opposed to 21
> from Indigo. The RDKit rate (~0.2%) was also confirmed using ~1 million InChIs.
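
To make the arithmetic behind those figures explicit, here is a small sketch. It assumes the denominator is the full ~90,000 input InChIs; the message does not say how many survived the conversion step, so the real rates may be slightly higher. The names `TOTAL_INCHIS` and `failure_rate_percent` are made up for illustration.

```python
# Figures quoted in the message; the total is approximate ("~90,000").
TOTAL_INCHIS = 90_000
RDKIT_DISCREPANCIES = 200
INDIGO_DISCREPANCIES = 21


def failure_rate_percent(discrepancies: int, total: int) -> float:
    """Return the share of round trips that changed the InChI, in percent."""
    return 100.0 * discrepancies / total


rdkit_rate = failure_rate_percent(RDKIT_DISCREPANCIES, TOTAL_INCHIS)
indigo_rate = failure_rate_percent(INDIGO_DISCREPANCIES, TOTAL_INCHIS)
ratio = RDKIT_DISCREPANCIES / INDIGO_DISCREPANCIES

print(f"RDKit:  {rdkit_rate:.2f}%")   # about 0.22%
print(f"Indigo: {indigo_rate:.3f}%")  # about 0.023%
print(f"Ratio:  {ratio:.1f}x")        # about 9.5x, i.e. roughly 10 times
```

So "10 times more" and "~0.2%" are consistent with each other under this assumption.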
>
> I had a closer look at a couple of cases here:
> http://nbviewer.ipython.org/gist/madgpap/8715974
>
> It seems that there is more than one reason for the failures. I totally
> understand Greg's caution about the inchi2mol conversion, but given the
> difference between RDKit and Indigo, there might be room for improvement.
> Any insights would be very much appreciated.
>
> Btw, the KNIME workflow and full list of fails are available to you.
>
> Cheers,
>
> George
>
>
>
> On 30 January 2014 04:11, Greg Landrum <[email protected]> wrote:
>
>> Yeah, I have been tempted several times to remove the InChI->RDKit
>> functionality entirely.
>>
>>
>>
>> On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov <[email protected]> wrote:
>>
>>> Thank you, Greg!
>>> Very nice explanation and I think this issue has confused people before
>>> me as well. I am going to have to keep reminding myself about it as the
>>> subject comes up every now and then.
>>>
>>> Igor
>>> On Jan 29, 2014 10:59 PM, "Greg Landrum" <[email protected]> wrote:
>>>
>>>> Hi Igor,
>>>>
>>>> On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov <[email protected]> wrote:
>>>>
>>>>> Greg et al,
>>>>>
>>>>> Here is a little script that demonstrates a problem with fingerprints
>>>>> after the roundtrip through InChI.
>>>>> My input mol file is also attached.
>>>>> As you can see the similarity between "before" and "after" is not 1 in
>>>>> 45 out of 100 cases.
>>>>> In one case it is as low as 0.29. Could someone take a look and tell
>>>>> me what I'm doing wrong?
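
For readers without the attached script, here is a minimal sketch of what a similarity below 1 means. It assumes the similarity in question is Tanimoto computed on fingerprint on-bits, which the message does not state. The bit indices are invented for illustration and do not come from any real molecule.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        # Two empty fingerprints are treated as identical by convention here.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical on-bits before and after a round trip that altered the structure.
before = {1, 5, 9, 12, 20}
after_unchanged = {1, 5, 9, 12, 20}
after_changed = {1, 5, 9, 33, 41}  # two bits lost, two new bits set

print(tanimoto(before, after_unchanged))  # identical bit sets give 1.0
print(tanimoto(before, after_changed))    # 3 shared bits out of 7 distinct bits
```

Any change to the bonding pattern moves bits in and out of the set, so even a small local rewrite of the structure pulls the score below 1.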
>>>>>
>>>>
>>>> Ah! Now I see what you're doing and understand the problem.
>>>>
>>>> It's really important when using InChI to remember that InChI is
>>>> designed to be an identifier, not an interchange format. The InChI
>>>> algorithm modifies the molecule as part of its canonicalization step. This
>>>> modification includes standardizing tautomers.
>>>>
>>>> Here's an example of the type of substructure modification that happens
>>>> in your molecules:
>>>> The input SMILES c1ccccc1C(=O)Nc1ccccc1, on being converted to InChI
>>>> and back, yields: OC(=Nc1ccccc1)c1ccccc1
>>>>
>>>> Basically: If you think you know what your molecules are, you probably
>>>> should be building them from SMILES or CTAB, not InChI.
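
The round trip Greg describes can be reproduced with a short sketch like the one below. It assumes an RDKit build with InChI support. The helper name `inchi_round_trip` is made up for illustration, and the exact SMILES that comes back may differ between RDKit versions, so no output is claimed here.

```python
from rdkit import Chem


def inchi_round_trip(smiles):
    """Convert SMILES -> mol -> InChI -> mol and report both ends of the trip."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    inchi = Chem.MolToInchi(mol)
    back = Chem.MolFromInchi(inchi)
    if back is None:
        raise ValueError(f"could not rebuild a molecule from: {inchi}")
    # Canonical SMILES before and after, plus the InChI of each molecule.
    return Chem.MolToSmiles(mol), inchi, Chem.MolToSmiles(back), Chem.MolToInchi(back)


smiles_in, inchi_in, smiles_out, inchi_out = inchi_round_trip("c1ccccc1C(=O)Nc1ccccc1")
print("SMILES before:", smiles_in)
print("SMILES after: ", smiles_out)
print("InChI unchanged:", inchi_in == inchi_out)
```

The InChI strings match because the identifier treats the amide hydrogen as mobile. The SMILES strings, and therefore any fingerprints built from them, are free to differ.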
>>>>
>>>> Apologies that I didn't think of this before; I was just focusing on
>>>> the stereochemistry.
>>>>
>>>> -greg
>>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> WatchGuard Dimension instantly turns raw network data into actionable
>> security intelligence. It gives you real-time visual feedback on key
>> security issues and trends.  Skip the complicated setup - simply import
>> a virtual appliance and go from zero to informed in seconds.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
>