Re: [Rdkit-discuss] GetSubstructMatches and unique match

Jean-Marc Nuzillard Tue, 05 May 2020 07:38:33 -0700

Dear Paolo,

this answers my question as well, but in an unexpected way.


Best,

Jean-Marc


On 05/05/2020 at 14:52, Paolo Tosco wrote:

Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
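
The gist is not reproduced in this thread. As a rough sketch of one possible approach (not necessarily what the gist does), one can take the uniquified matches reported in the original message and keep only the pairs of matches that share no atom. The helper name `disjoint_pairs` is an assumption made for illustration; the match tuples are copied from the original post.

```python
from itertools import combinations

# Uniquified matches of 'C~C~C(~C)~C' in borneol, copied from the original post.
matches = [
    (1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
    (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
    (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
    (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
    (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
    (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
    (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
    (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
    (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7),
]


def disjoint_pairs(matches):
    """Return every pair of matches with no atom index in common.

    Hypothetical helper for illustration; it is not part of RDKit.
    """
    return [
        (a, b)
        for a, b in combinations(matches, 2)
        if set(a).isdisjoint(b)
    ]


# Each printed pair is a candidate split of the molecule into two
# non-overlapping occurrences of the pattern.
for first, second in disjoint_pairs(matches):
    print(first, second)
```

With RDKit at hand, `matches` could instead be taken directly from mol.GetSubstructMatches(pat). The pairwise comparison is quadratic in the number of matches, which is fine for 35 matches but may need pruning for larger molecules.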

Cheers,
p.

On 05/05/2020 11:52, Quoc-Tuan DO wrote:
Dear Paolo,

Thank you for your reply.
I understand now. At first I did not use the uniquify option at all, and then I only tried uniquify=True; I thought the default would be uniquify=False.
Actually, my problem is to find 2 distinct isoprene units (the pattern) in borneol (the SMILES), as the latter is a monoterpene.
Do you have any idea how I can do this?

Thanks in advance for your time.

Best regards,

QT



On 04/05/2020 at 19:53, Paolo Tosco wrote:
Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:
Dear All,
Please could you help with the following problem (I could not find answers in the discussion list)?

from rdkit import Chem

# Isoprene skeleton, with '~' standing for any bond type
pattern = 'C~C~C(~C)~C'
# Borneol
smiles = 'O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'

pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:
((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
 (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
 (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
 (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
 (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
 (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
 (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
 (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
 (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))
I expect to have only 2 matches with uniquify=True as I only have 2 units of the pattern.
GetSubstructMatches() will report all matches of the pattern against your molecule. In your case, there are 35 matches, each constituted by a different set of atom indices.
Furthermore, with or without uniquify, I have the same answers.
If you set uniquify=False, you actually get 70 matches, so twice as many answers. This time, matches can be constituted by the same indices, provided they are in a different permutation.
I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.
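
As an illustration of the difference described above (the gist itself is not reproduced here), the sketch below groups matches by their set of atom indices, so that matches differing only in atom order fall into the same group. The function name `group_by_atom_set` and the four sample tuples are assumptions made for illustration: each second tuple is the preceding one with its last two atoms swapped, which is still a valid mapping because the two atoms attached to the branch point of the pattern are interchangeable. This mimics the effect of uniquify=True; it is not RDKit's internal implementation.

```python
from collections import defaultdict


def group_by_atom_set(matches):
    """Group match tuples that involve exactly the same atoms.

    Hypothetical helper for illustration; it is not part of RDKit.
    """
    groups = defaultdict(list)
    for match in matches:
        # A frozenset ignores the order of the atom indices.
        groups[frozenset(match)].append(match)
    return dict(groups)


# Illustrative sample in the style of a uniquify=False result:
# the same atoms appear twice, in two different permutations.
non_unique = [
    (1, 2, 3, 4, 8), (1, 2, 3, 8, 4),
    (2, 3, 4, 9, 10), (2, 3, 4, 10, 9),
]

groups = group_by_atom_set(non_unique)
print(len(non_unique), "matches,", len(groups), "distinct atom sets")
```

Keeping one tuple per group leaves a single match for each distinct atom set, which is in line with the factor of two between the 70 and 35 matches reported above.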

Cheers,
p.
I also expected that there should be 2 "independent" lists, but here there is always at least one common atom between each list.
Is there something misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
