Yeah, I have been tempted several times to remove the InChI->RDKit
functionality entirely.


On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov <[email protected]> wrote:

> Thank you, Greg!
> Very nice explanation and I think this issue has confused people before me
> as well. I am going to have to keep reminding myself about it as the
> subject comes up every now and then.
>
> Igor
> On Jan 29, 2014 10:59 PM, "Greg Landrum" <[email protected]> wrote:
>
>> Hi Igor,
>>
>> On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov <[email protected]> wrote:
>>
>>> Greg et al,
>>>
>>> Here is a little script that demonstrates a problem with fingerprints
>>> after the roundtrip through InChI.
>>> My input mol file is also attached.
>>> As you can see the similarity between "before" and "after" is not 1 in
>>> 45 out of 100 cases.
>>> In one case it is as low as 0.29. Could someone take a look and tell me
>>> what I'm doing wrong?
>>>
>>
>> Ah! Now I see what you're doing and understand the problem.
>>
>> It's really important when using InChI to remember that InChI is designed
>> to be an identifier, not an interchange format. The InChI algorithm
>> modifies the molecule as part of its canonicalization step. This
>> modification includes standardizing tautomers.
>>
>> Here's an example of the type of substructure modification that happens
>> in your molecules:
>> input SMILES c1ccccc1C(=O)Nc1ccccc1, on being converted to InChI and
>> back, yields: OC(=Nc1ccccc1)c1ccccc1
>>
>> Basically: If you think you know what your molecules are, you probably
>> should be building them from SMILES or CTAB, not InChI.
>>
>> Apologies that I didn't think of this before; I was just focusing on the
>> stereochemistry.
>>
>> -greg
>>
>
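The roundtrip effect described in the quoted message can be checked with a short script. This is only a minimal sketch, assuming an RDKit build with InChI support. The helper name `inchi_roundtrip` is made up for illustration, and `Chem.RDKFingerprint` is an assumed stand-in, since the thread does not say which fingerprint the original script used. The exact SMILES and similarity printed may vary between RDKit versions, so no expected output is given.

```python
from rdkit import Chem, DataStructs


def inchi_roundtrip(mol):
    """Convert a molecule to InChI and parse it back (hypothetical helper)."""
    return Chem.MolFromInchi(Chem.MolToInchi(mol))


# The amide example from the quoted message.
smiles = "c1ccccc1C(=O)Nc1ccccc1"
before = Chem.MolFromSmiles(smiles)
after = inchi_roundtrip(before)

# Assumption: RDKit's topological fingerprint stands in for whatever
# fingerprint the original script used.
fp_before = Chem.RDKFingerprint(before)
fp_after = Chem.RDKFingerprint(after)
similarity = DataStructs.TanimotoSimilarity(fp_before, fp_after)

print("before:", Chem.MolToSmiles(before))
print("after: ", Chem.MolToSmiles(after))
print("Tanimoto similarity:", similarity)

# Both structures map to the same identifier, which is all InChI promises.
print("same InChI:", Chem.MolToInchi(before) == Chem.MolToInchi(after))
```

If the two SMILES differ while the InChI strings match, the roundtrip has swapped in a different tautomer of the same compound, which is why the fingerprints of "before" and "after" can disagree.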
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
