Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-11 Thread Greenpharma S.A.S.
Dear Paolo,
Thank you very much. I'll test this and get back to you.
Have a nice day.
Best regards,
Quoc-Tuan

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-10 Thread Paolo Tosco

Dear Quoc-Tuan,

I think I have come up with a reasonably fast algorithm that seems to be 
more robust:


https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-06 Thread Quoc-Tuan DO

Dear Paolo,

Thank you again for your code, and sorry for bothering you again. It works 
fine for monoterpenes, but not for diterpenes, sesquiterpenes or 
triterpenes.


pattern: C~C~C(~C)~C

mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C

=> ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))

It should find 4 distinct units.

mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C

=> ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))

It should find 6 distinct units.

I tried a SMARTS version of the pattern, [#6]~[#6]~[#6](~[#6])~[#6], 
but got the same results as with the SMILES one.


What do you think? Is there something missing in the query?

Thanks for your time,

Best regards,

QT
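The query itself may not be what is missing: GetSubstructMatches() returns all matches (up to its maxMatches limit, 1000 by default), and the difficulty lies in picking non-overlapping ones afterwards. A greedy pass that keeps each match not overlapping the ones already kept can stop short of the maximum, because an early match may block two later ones. The sketch below contrasts such a greedy pass with an exhaustive branch-and-bound search for the largest set of pairwise-disjoint matches. It is a hypothetical illustration, not a description of either of Paolo's gists; `greedy_disjoint`, `max_disjoint` and the toy index sets are made up for the example.

```python
def greedy_disjoint(matches):
    """Keep each match that shares no atom with the matches kept so far."""
    used, kept = set(), []
    for match in matches:
        if used.isdisjoint(match):
            kept.append(match)
            used.update(match)
    return kept


def max_disjoint(matches):
    """Return a largest list of pairwise-disjoint matches (branch and bound)."""
    sets = [frozenset(m) for m in matches]
    all_atoms = frozenset().union(*sets)
    size = min((len(s) for s in sets), default=1)
    best = []

    def search(start, used, chosen):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        # Upper bound on what this branch can still add: limited by the
        # number of candidates left and by the atoms not yet used.
        room = min(len(sets) - start, len(all_atoms - used) // size)
        if len(chosen) + room <= len(best):
            return
        for i in range(start, len(sets)):
            if used.isdisjoint(sets[i]):
                chosen.append(matches[i])
                search(i + 1, used | sets[i], chosen)
                chosen.pop()

    search(0, frozenset(), [])
    return best


# Hypothetical atom-index sets chosen so that the greedy pass goes wrong:
# the first match overlaps both of the others and blocks them.
toy = [(2, 3, 4, 5, 6), (0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
print(greedy_disjoint(toy))  # [(2, 3, 4, 5, 6)]
print(max_disjoint(toy))     # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
```

Both functions accept the tuples returned by `mol.GetSubstructMatches(pat)` directly. On the borneol match list from the original question, taken in the order shown, the greedy pass keeps only (1, 2, 3, 4, 8) and then finds nothing disjoint from it, whereas (2, 1, 5, 6, 7) and (8, 3, 4, 9, 10) are two disjoint matches. The exhaustive search is exponential in the worst case, so for long match lists a dedicated exact-cover or integer-programming solver may be a better fit.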




Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Jean-Marc Nuzillard

Dear Paolo,

this answers my question as well, but in an unexpected way.

Best,

Jean-Marc


--
Jean-Marc Nuzillard
CNRS Research Director

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Paolo Tosco

Dear Quoc-Tuan,

this should do what you need:

https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409

Cheers,
p.


Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-05 Thread Quoc-Tuan DO

  
  
Dear Paolo,

Thank you for your reply.

I understand now... At first I did not use the uniquify option, and then
I used only uniquify=True. I thought the default would be uniquify=False.

Actually, my problem is to find 2 distinct isoprene units (the pattern)
in borneol (the SMILES), since the latter is a monoterpene.

Do you have any idea how I can do this?

Thanks in advance for your time.

Best regards,

QT
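One way to phrase this is as an exact cover problem: pick pairwise-disjoint matches whose union is exactly the set of carbon atoms. The sketch below is a plain backtracking search over the 35 unique matches posted in the original question, assuming atoms 1-10 are the carbons (atom 0 is the hydroxyl oxygen in that SMILES). It is an illustration, not the algorithm in Paolo's gist, and `exact_covers` and `BORNEOL_MATCHES` are hypothetical names.

```python
# Unique matches of C~C~C(~C)~C in borneol, copied from the original
# question (atom 0 is the hydroxyl oxygen; atoms 1-10 are the carbons).
BORNEOL_MATCHES = [
    (1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
    (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
    (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
    (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
    (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
    (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
    (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
    (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
    (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7),
]


def exact_covers(matches, atoms):
    """Yield every way of covering `atoms` with pairwise-disjoint matches.

    Backtracking: take the lowest-numbered uncovered atom and try each match
    that contains it and lies entirely within the still-uncovered atoms.
    Assumes one match per distinct atom set, as with uniquify=True.
    """
    match_sets = [frozenset(m) for m in matches]

    def search(uncovered, chosen):
        if not uncovered:
            yield [tuple(sorted(s)) for s in chosen]
            return
        pivot = min(uncovered)
        for s in match_sets:
            if pivot in s and s <= uncovered:
                chosen.append(s)
                yield from search(uncovered - s, chosen)
                chosen.pop()

    yield from search(frozenset(atoms), [])


for cover in exact_covers(BORNEOL_MATCHES, range(1, 11)):
    print(cover)
# [(1, 2, 5, 6, 7), (3, 4, 8, 9, 10)]
# [(1, 5, 6, 7, 8), (2, 3, 4, 9, 10)]
```

Two partitions come out, so the skeleton can be covered by two isoprene units in more than one way; which one corresponds to the biosynthetic units is a chemical question the code does not settle. With RDKit, `matches` would be the result of `mol.GetSubstructMatches(pat)` and `atoms` could be `[a.GetIdx() for a in mol.GetAtoms() if a.GetSymbol() == 'C']`. The search is exponential in the worst case, and when some carbons belong to no isoprene unit (an acetyl group, for instance) no exact cover exists at all.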
  







Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Jean-Marc Nuzillard

Dear Quoc-Tuan,

GetSubstructMatches() tries to find isoprene at all positions where this 
is possible.


You may want to test your SMARTS and its matching with structures at 
this great place:

https://smartsview.zbh.uni-hamburg.de/

Maybe you would prefer to know whether borneol
follows the isoprene rule or not by trying to cover its structure
with two non-overlapping isoprene units.
I really would like to know how to write that with SMARTS.

Jean-Marc


On 04/05/2020 at 10:10, Greenpharma S.A.S. wrote:


Dear All,

Could you please help with the following problem? (I could not find an 
answer in the discussion list.)


from rdkit import Chem

pattern='C~C~C(~C)~C'

smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'

pat = Chem.MolFromSmiles(pattern)
mol = Chem.MolFromSmiles(smiles)
res = mol.GetSubstructMatches(pat, uniquify=True)


The results are:

((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10), 
(2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9), 
(2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7), 
(3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10), 
(6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10), 
(7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10), 
(8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8), 
(9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8), 
(10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))



I expect to have only 2 matches with uniquify=True as I only have 2 
units of the pattern. Furthermore, with or without uniquify, I have 
the same answers. I also expected that there should be 2 "independent" 
lists but here, there is always at least one common atom between each 
list.


Is there something misunderstood or misused?

Thanks in advance for your help and explanations.

Best regards,

Quoc-Tuan




Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-04 Thread Paolo Tosco

Dear Quoc-Tuan,

On 04/05/2020 09:10, Greenpharma S.A.S. wrote:


> Dear All,
>
> Could you please help with the following problem? (I could not find an
> answer in the discussion list.)
>
> from rdkit import Chem
>
> pattern='C~C~C(~C)~C'
>
> smiles='O[C@H]1C[C@H]2C([C@@]1(C)CC2)(C)C'
>
> pat = Chem.MolFromSmiles(pattern)
> mol = Chem.MolFromSmiles(smiles)
> res = mol.GetSubstructMatches(pat, uniquify=True)
>
> The results are:
>
> ((1, 2, 3, 4, 8), (1, 5, 4, 3, 9), (1, 5, 4, 3, 10), (1, 5, 4, 9, 10),
> (2, 1, 5, 4, 6), (2, 1, 5, 4, 7), (2, 1, 5, 6, 7), (2, 3, 4, 5, 9),
> (2, 3, 4, 5, 10), (2, 3, 4, 9, 10), (3, 4, 5, 1, 6), (3, 4, 5, 1, 7),
> (3, 4, 5, 6, 7), (5, 4, 3, 2, 8), (6, 5, 4, 3, 9), (6, 5, 4, 3, 10),
> (6, 5, 4, 9, 10), (7, 5, 4, 3, 9), (7, 5, 4, 3, 10), (7, 5, 4, 9, 10),
> (7, 8, 3, 2, 4), (8, 3, 4, 5, 9), (8, 3, 4, 5, 10), (8, 3, 4, 9, 10),
> (8, 7, 5, 1, 4), (8, 7, 5, 1, 6), (8, 7, 5, 4, 6), (9, 4, 3, 2, 8),
> (9, 4, 5, 1, 6), (9, 4, 5, 1, 7), (9, 4, 5, 6, 7), (10, 4, 3, 2, 8),
> (10, 4, 5, 1, 6), (10, 4, 5, 1, 7), (10, 4, 5, 6, 7))
>
> I expect to have only 2 matches with uniquify=True as I only have 2
> units of the pattern.


GetSubstructMatches() reports all matches of the pattern in your 
molecule. In your case there are 35 matches, and no two of them are 
made up of the same set of atom indices.



> Furthermore, with or without uniquify, I have the same answers.

If you set uniquify=False, you actually get 70 matches, twice as many. 
This time, two matches can consist of the same atom indices, provided 
they are in a different permutation.
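A minimal pure-Python sketch of the distinction described above, assuming (as a simplification, not RDKit's actual implementation) that uniquify=True keeps one match per distinct set of atom indices. The factor of two comes from the symmetry of the pattern: the two terminal carbons on the branch atom, C(~C)~C, can be mapped in either order. Note also that uniquify already defaults to True in RDKit's GetSubstructMatches(), which is why passing it explicitly made no difference. The helper `uniquify_matches` is a hypothetical name; the swapped tuples below are the alternative orderings of two borneol matches from the output above.

```python
def uniquify_matches(matches):
    """Keep only the first match seen for each distinct set of atom indices.

    A conceptual sketch of what uniquify=True means, not RDKit's code.
    """
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(match)  # the same atoms in any order give the same key
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Two borneol matches from the output above, each followed by the same atoms
# with the two terminal branch carbons swapped (both orderings appear when
# uniquify=False).
non_unique = [(8, 3, 4, 9, 10), (8, 3, 4, 10, 9), (2, 1, 5, 6, 7), (2, 1, 5, 7, 6)]
print(uniquify_matches(non_unique))  # [(8, 3, 4, 9, 10), (2, 1, 5, 6, 7)]
```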


I have uploaded a gist here:

https://gist.github.com/ptosco/6d70cec235361fbaddc7cbc2cf9c3b5d

that hopefully will make this clearer.

Cheers,
p.

> I also expected that there should be 2 "independent" lists but here,
> there is always at least one common atom between each list.
>
> Is there something misunderstood or misused?
>
> Thanks in advance for your help and explanations.
>
> Best regards,
>
> Quoc-Tuan


