[Apologies, resending, as my previous reply did not go to rdkit; added some 
more info, too].

Thank you, James.

My mistake was to assume that, because RGroupDecompose() can somehow tell that 
molecules in my set containing this literal substructure (unsubstituted N):

[cid:image004.png@01D95D5E.053F3E00]

match the original core I used:

[cid:image005.png@01D95D5E.053F3E00]

it would then stick to the core I specified for the R-group decomposition, 
rather than create a tautomer of it, with different R-group labels, to match 
the target molecule's pattern instead of the core pattern. But OK, I imagine 
there is a reason for this.

I tried specifying core_mol with R labels on:

[cid:image006.png@01D95D5E.053F3E00]

--> then none of the molecules in the alternative tautomeric form match, even 
when the N is unsubstituted :/

In practice, I think I must convert all N-unsubstituted molecules to the 
tautomeric form I want, before running RGroupDecompose().

I tried CanonicalTautomer(), but it does not do that consistently; in fact, it 
more often converts the desired tautomer (NH attached to the benzene ring) into 
the other one.
I will probably need to do this via a reaction.
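
A minimal sketch of that pre-processing step, assuming the input molecules have no explicit H atoms in the graph. Rather than a reaction, it uses a simpler stand-in: find an N-unsubstituted 2H-indazole with a SMARTS pattern, move the hydrogen count from N2 to N1, and re-sanitize. The function name to_1h_indazole and the SMARTS are illustrative choices, not something from RDKit or from this thread.

```python
from rdkit import Chem

# Hypothetical pattern: N1 (bare, on the benzene ring) first, then N2 carrying the H.
TWO_H_INDAZOLE = Chem.MolFromSmarts("[nX2;H0]1[nX3;H1]cc2ccccc12")


def to_1h_indazole(mol):
    """Return a copy of mol with an unsubstituted 2H-indazole redrawn as 1H."""
    match = mol.GetSubstructMatch(TWO_H_INDAZOLE)
    if not match:
        # N-substituted or already 1H: leave untouched.
        return Chem.Mol(mol)
    rw = Chem.RWMol(mol)
    n1 = rw.GetAtomWithIdx(match[0])
    n2 = rw.GetAtomWithIdx(match[1])
    # Move the hydrogen from N2 to N1.
    n1.SetNumExplicitHs(1)
    n1.SetNoImplicit(True)
    n2.SetNumExplicitHs(0)
    n2.SetNoImplicit(True)
    # Re-kekulize and re-perceive aromaticity for the new H placement.
    Chem.SanitizeMol(rw)
    return rw.GetMol()


fixed = to_1h_indazole(Chem.MolFromSmiles("c1ccc2n[nH]cc2c1"))
print(Chem.MolToSmiles(fixed))
```

Only the first matching ring system is converted, so a molecule with several indazoles would need the function applied repeatedly.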

Thanks again for your input.

From: James Wallace <james.wall...@evotec.com>
Sent: 22 March 2023 14:20
To: Giovanni Tricarico <giovanni.tricar...@glpg.com>
Subject: RE: invalid core SMILES returned by RGroupDecompose

I've found some similar behaviour with respect to the tautomer, but when part 
of my query molecule is a bridged ring. In that case, instead of matching the 
structure as presented, it matches the bridged ring as a whole, as well as 
matching smaller rings represented by the bridge.

Being able to force a 'complete' match, so to speak, would help here.

As for your core, I have seen this before: the aromaticity check seems to fail 
when an [nH] is present in that kind of structure, which confuses the 
kekulize/dekekulize code. The only workaround I found was to build the molecule 
with the added option sanitize=False, so:

from rdkit import Chem

mol = Chem.MolFromSmiles(
    "[nH]1c2c([*:5])c([*:6])c([*:7])c([*:1])c2c([*:2])n1[*:3]",
    sanitize=False)

But that's not ideal.
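
A short sketch contrasting the two parsing modes for the core SMILES quoted in this thread. One likely reason the default parse fails, offered as a reading of the SMILES rather than a confirmed diagnosis: both ring nitrogens have three connections ([nH] and n bearing [*:3]), so no Kekulé structure can be assigned to the ring system. The variable names are illustrative.

```python
from rdkit import Chem

core_smiles = "[nH]1c2c([*:5])c([*:6])c([*:7])c([*:1])c2c([*:2])n1[*:3]"

# Default parsing sanitizes the molecule, which includes kekulization.
strict = Chem.MolFromSmiles(core_smiles)

# Skipping sanitization returns a molecule object, but without valence
# checks, aromaticity perception or ring information.
relaxed = Chem.MolFromSmiles(core_smiles, sanitize=False)

print("sanitized parse ok:", strict is not None)
print("unsanitized parse ok:", relaxed is not None)
```

A molecule built this way has not been validated, so downstream calls that rely on computed properties may still fail.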

From: Giovanni Tricarico <giovanni.tricar...@glpg.com>
Sent: 22 March 2023 10:04
To: Rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] invalid core SMILES returned by RGroupDecompose

Hello,
I tried out RGroupDecompose on a set of indazoles, using "c1ccc2[nH]ncc2c1" as 
core molecule.
Most of them gave a valid core SMILES:

n1c([*:2])c2c([*:1])c([*:7])c([*:6])c([*:5])c2n1[*:4]

However, some gave this core SMILES:

[nH]1c2c([*:5])c([*:6])c([*:7])c([*:1])c2c([*:2])n1[*:3]

which RDKit itself then refuses to convert to a molecule (oddly, other software 
such as Dotmatics Vortex does accept it).
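
A minimal sketch of how one might flag such cores automatically, assuming a small made-up set of indazoles in place of the real data. It runs RGroupDecompose with asSmiles=True and checks whether each returned "Core" string can be parsed back by RDKit. The helper is_parsable and the example SMILES are illustrative.

```python
from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import RGroupDecompose

core = Chem.MolFromSmiles("c1ccc2[nH]ncc2c1")

# Made-up example inputs, not the molecules from the original data set.
smiles = ["Cc1ccc2[nH]ncc2c1", "Clc1n[nH]c2ccccc12"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# rows: one dict per matched molecule; unmatched: indices of molecules
# that did not match the core.
rows, unmatched = RGroupDecompose([core], mols, asSmiles=True)


def is_parsable(smi):
    """True if RDKit can build a sanitized molecule from the SMILES."""
    return Chem.MolFromSmiles(smi) is not None


report = [(row["Core"], is_parsable(row["Core"])) for row in rows]
for core_smi, ok in report:
    print(core_smi, "OK" if ok else "NOT PARSABLE")
```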

[cid:image007.png@01D95D5E.053F3E00]

Any idea what may be going wrong?

I noticed that the tautomeric form of the indazole ring is different in the 
molecules that gave rise to the 'wrong' core: the H (or other substituent) is 
on the nitrogen atom that is not attached to the benzene ring.

[In fact, that also raises the question of why a tautomer of the original core 
was matched by RGroupDecompose, and how one would instead force the matching of 
the chosen tautomer only].
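
One possible way to separate the two forms before decomposition, sketched under the assumption that a plain substructure pre-filter is acceptable: a SMARTS query that requires three connections on the nitrogen attached to the benzene ring and two on the other nitrogen. This is a filter applied before RGroupDecompose, not an option of RGroupDecompose itself; the pattern and the names are illustrative.

```python
from rdkit import Chem

# Hypothetical query: N1 (on the benzene ring) carries H or a substituent
# (3 connections), N2 is bare (2 connections).
ONE_H_INDAZOLE = Chem.MolFromSmarts("c1ccc2[nX3][nX2]cc2c1")


def is_1h_form(mol):
    """True if the molecule contains an indazole drawn in the 1H form."""
    return mol.HasSubstructMatch(ONE_H_INDAZOLE)


examples = {
    "1H-indazole": "c1ccc2[nH]ncc2c1",
    "1-methylindazole": "Cn1ncc2ccccc21",
    "2H-indazole": "c1ccc2n[nH]cc2c1",
    "2-methylindazole": "Cn1cc2ccccc2n1",
}
flags = {name: is_1h_form(Chem.MolFromSmiles(smi))
         for name, smi in examples.items()}
for name, flag in flags.items():
    print(name, flag)
```

Molecules flagged False could then be handled separately, for example decomposed against a second core drawn in the 2H form.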

Thanks

Giovanni Tricarico
Principal Scientist Computational Chemistry


Galapagos
Generaal De Wittelaan L11 A3
2800 Mechelen
Belgium
T: +32 15 6514 30
www.glpg.com

This e-mail and its attachment(s) (if any) may contain confidential and/or 
proprietary information and is intended for its addressee(s) only. Any 
unauthorized use of the information contained herein (including, but not 
limited to, alteration, reproduction, communication, distribution or any other 
form of dissemination) is strictly prohibited. If you are not the intended 
addressee, please notify the originator promptly and delete this e-mail and its 
attachment(s) (if any) subsequently. Neither Galapagos nor any of its 
affiliates shall be liable for direct, special, indirect or consequential 
damages arising from alteration of the contents of this message (by a third 
party) or as a result of a virus being passed on.

Please find our information on data protection 
here <https://www.evotec.com/en/about/site-information/privacy-policy>.

Evotec (UK) Ltd is a limited company registered in England and Wales.  
Registration number:2674265.  Registered Office:  114 Innovation Drive, Milton 
Park, Abingdon, Oxfordshire, OX14 4RZ, United Kingdom

STATEMENT OF CONFIDENTIALITY.



This email and any attachments may contain confidential, proprietary, 
privileged and/or private information.

If received in error, please notify us immediately by reply email and then 
delete this email and any attachments from your system. Thank you.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
