Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-20 Thread theozh
Hi Paolo,

argh... I thought that setting  params.sanitize=False  meant you don't want 
any sanitization.
Apparently the mols do need to be sanitized, just with the aromatization step 
skipped.
I guess I'm slowly starting to understand...
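
A minimal sketch of the difference between full sanitization and sanitization 
without the aromaticity step, using Kekulé-form benzene as a stand-in molecule 
(the helper names parse_unsanitized and count_aromatic_atoms are made up for 
this illustration; the RDKit calls are the ones used elsewhere in this thread):

```python
from rdkit import Chem

# Kekule-form benzene: uppercase atoms, so nothing is flagged aromatic on input.
KEKULE_BENZENE = "C1=CC=CC=C1"


def parse_unsanitized(smiles):
    """Parse SMILES with sanitization switched off (hypothetical helper)."""
    params = Chem.SmilesParserParams()
    params.sanitize = False
    return Chem.MolFromSmiles(smiles, params)


def count_aromatic_atoms(mol):
    """Count atoms that carry the aromatic flag (hypothetical helper)."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


# Full sanitization: aromaticity is perceived.
full = parse_unsanitized(KEKULE_BENZENE)
Chem.SanitizeMol(full)

# Every sanitization step except aromaticity perception.
partial = parse_unsanitized(KEKULE_BENZENE)
Chem.SanitizeMol(partial, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)

print("full:", count_aromatic_atoms(full))        # full: 6
print("partial:", count_aromatic_atoms(partial))  # partial: 0
```

The XOR simply clears the one aromaticity bit from the SANITIZE_ALL mask, so 
all other steps still run.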

I tried to find something similar to what you explained here and in your 
GitHub example in the RDKit documentation or on the web... without much success. 
Wouldn't this be an essential step in substructure search, or even an FAQ?

Well, I have now tried a larger list and got as many hits as I expected. Happy ending!

Thank you very much for your kind help!
Theo.

On 20.05.2020 at 15:22, Paolo Tosco wrote:
> Hi Theo,
>
> that's because you omitted the sanitization step completely, so the molecule 
> is missing crucial information for the SubstructureMatch to do a proper job.
>
> If you put back sanitization, only leaving out the aromatization step, things 
> work as expected.
> Also, you do not need to create pattern again from SMILES, you can make a 
> copy of the molecule that you have already created and sanitized using the 
> Chem.Mol copy constructor.
>
> from rdkit import Chem
>
> smiles_strings = '''
> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
> '''
>
> smiles_list = smiles_strings.splitlines()[1:]
> print(smiles_list)
>
> params = Chem.SmilesParserParams()
> params.sanitize=False
>
> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
> for m in mols:
>     Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
>
> pattern = Chem.Mol(mols[0])
>
> query_params = Chem.AdjustQueryParameters()
> query_params.makeBondsGeneric = True
> query_params.aromatizeIfPossible = False
> query_params.adjustDegree = False
> query_params.adjustHeavyDegree = False
> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>
> matches = [idx for idx,m in enumerate(mols) if 
> m.HasSubstructMatch(pattern_generic_bonds)]
> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
>
> $ python3 SubstructMatch2.py
>
> ['N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3', 'C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3']
> 2 of 2: [0, 1]
>
> Cheers,
> p.
>
> On 20/05/2020 09:50, theozh wrote:
>> from rdkit import Chem
>>
>> smiles_strings = '''
>> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
>> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
>> '''
>>
>> smiles_list = smiles_strings.splitlines()[1:]
>> print(smiles_list)
>>
>> params = Chem.SmilesParserParams()
>> params.sanitize=False
>>
>> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
>>
>> pattern = Chem.MolFromSmiles(smiles_list[0],params)
>>
>> query_params = Chem.AdjustQueryParameters()
>> query_params.makeBondsGeneric = True
>> query_params.aromatizeIfPossible = False
>> query_params.adjustDegree = False
>> query_params.adjustHeavyDegree = False
>> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>>
>> matches = [idx for idx,m in enumerate(mols) if 
>> m.HasSubstructMatch(pattern_generic_bonds)]
>> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-20 Thread Paolo Tosco

Hi Theo,

that's because you omitted the sanitization step completely, so the 
molecule is missing crucial information that the substructure match 
needs to do a proper job.


If you put back sanitization, only leaving out the aromatization step, 
things work as expected.
Also, you do not need to create the pattern again from SMILES; you can 
copy the molecule that you have already created and sanitized using 
the Chem.Mol copy constructor.


from rdkit import Chem

smiles_strings = '''
N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
'''

# drop the empty first line produced by the leading newline
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)

# parse without any sanitization ...
params = Chem.SmilesParserParams()
params.sanitize = False

mols = [Chem.MolFromSmiles(x, params) for x in smiles_list]
# ... then run every sanitization step except aromaticity perception
for m in mols:
    Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)

# copy the already sanitized molecule instead of parsing the SMILES again
pattern = Chem.Mol(mols[0])

# make every bond of the pattern generic and leave atom degrees unconstrained
query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)

matches = [idx for idx, m in enumerate(mols)
           if m.HasSubstructMatch(pattern_generic_bonds)]

print("{} of {}: {}".format(len(matches), len(smiles_list), matches))

$ python3 SubstructMatch2.py

['N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3', 'C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3']
2 of 2: [0, 1]
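
For a larger list, the same recipe could be wrapped in small functions. This is 
only a sketch under the assumptions of the script above; the function names 
parse_kekule, generic_bond_query and find_hits are hypothetical and not part of 
RDKit:

```python
from rdkit import Chem


def parse_kekule(smiles):
    """Parse SMILES and sanitize everything except aromaticity perception."""
    params = Chem.SmilesParserParams()
    params.sanitize = False
    mol = Chem.MolFromSmiles(smiles, params)
    if mol is None:
        raise ValueError("could not parse SMILES: {!r}".format(smiles))
    Chem.SanitizeMol(mol, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
    return mol


def generic_bond_query(mol):
    """Return a copy of mol in which every bond is a generic query bond."""
    query_params = Chem.AdjustQueryParameters()
    query_params.makeBondsGeneric = True
    query_params.aromatizeIfPossible = False
    query_params.adjustDegree = False
    query_params.adjustHeavyDegree = False
    return Chem.AdjustQueryProperties(mol, query_params)


def find_hits(query_smiles, target_smiles_list):
    """Return the indices of the targets that contain the query structure."""
    query = generic_bond_query(parse_kekule(query_smiles))
    return [idx for idx, smiles in enumerate(target_smiles_list)
            if parse_kekule(smiles).HasSubstructMatch(query)]


smiles_list = [
    "N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3",
    "C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3",
]
hits = find_hits(smiles_list[0], smiles_list)
print("{} of {}: {}".format(len(hits), len(smiles_list), hits))
```

Note that generic bonds make the query permissive: any bond order matches, so 
hits only share the atom connectivity of the query.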

Cheers,
p.



Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-20 Thread theozh
Hi Paolo,

sorry, I made a typo (makeBondGeneric instead of makeBondsGeneric); that's why 
the bonds weren't UNSPECIFIED.
With that fixed, the code seems to work fine for the following two SMILES: the 
first structure is found in the second one.

C12=CC=CN1NCCC2
and
C12C=CC=C(C=C3)C=1N3NCC2
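
A typo like this produces no error because assigning to a misspelled attribute 
name just creates a new attribute. A minimal sketch of a guard against that, 
using a plain Python stand-in object rather than an RDKit class (the helper 
set_existing and the stand-in are hypothetical; whether a given RDKit parameter 
object accepts arbitrary attributes is an assumption based on the behaviour 
described above):

```python
from types import SimpleNamespace


def set_existing(obj, **options):
    """Set attributes on obj, refusing names the object does not already have."""
    for name, value in options.items():
        if not hasattr(obj, name):
            raise AttributeError("unknown option: {!r}".format(name))
        setattr(obj, name, value)
    return obj


# Stand-in for a parameter object; NOT an RDKit class.
params = SimpleNamespace(makeBondsGeneric=False, adjustDegree=True)

params.makeBondGeneric = True      # typo: silently creates a new attribute
print(params.makeBondsGeneric)     # prints False, the real option is untouched

set_existing(params, makeBondsGeneric=True)       # correct name: accepted
try:
    set_existing(params, makeBondsGenric=True)    # another typo: rejected
except AttributeError as err:
    print(err)
```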

However, there is another example where it still doesn't work with this code. 
See my code below.
The two SMILES

N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
and
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3

actually describe the identical structure, but were drawn differently in 
ChemDraw. As a consequence the SMILES are different, which shouldn't be a 
problem. But if I put these SMILES into the code below, the first one doesn't 
match the second one, nor the other way around.
I must be doing something horribly wrong.
Do I have to canonicalize the SMILES first?
Isn't there a good tutorial on substructure search with RDKit that covers all 
its options, frequently asked questions and plenty of examples?

best,
Theo.


### start of code
from rdkit import Chem

smiles_strings = '''
N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
'''

smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)

params = Chem.SmilesParserParams()
params.sanitize=False

mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]

pattern = Chem.MolFromSmiles(smiles_list[0],params)

query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)

matches = [idx for idx,m in enumerate(mols) if 
m.HasSubstructMatch(pattern_generic_bonds)]
print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
### end of code


On 19.05.2020 at 18:30, Paolo Tosco wrote:
> Hi Theo,
>
> I don't think the RDKit version should make a difference; did you notice that 
> rdmolops.AdjustQueryProperties() does not modify the molecule in place, but 
> rather returns a modified copy?
>
> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)
>
> That might be the reason. Also, only pattern_generic_bonds will have 
> UNSPECIFIED bonds, the mols will still have SINGLE and DOUBLE bonds.
>
> Feel free to contact me off-list if you need help with the above.
>
> Cheers,
> p.
>
> On 19/05/2020 17:01, theozh wrote:
>> Hi Paolo,
>>
>> thank you very much for your detailed answer.
>> I tried to reproduce your last suggestion (but I don't have Jupyter 
>> Notebook).
>> However, my bonds are still SINGLE and DOUBLE instead of UNSPECIFIED.
>> Does this maybe depend on the RDKit Version, I have 2019.03... ?
>>
>> Maybe, I should update and need to investigate further.
>> Theo.
>>
>>
>> On 19.05.2020 at 16:44, Paolo Tosco wrote:
>>> Hi Theo,
>>>
>>> the lack of match is due to different aromaticity flags on atoms and bonds 
>>> in the larger molecule.
>>>
>>> This gist provides some explanation and a possible solution:
>>>
>>> https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
>>>
>>> Cheers,
>>> p.
>>>

Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-19 Thread Paolo Tosco

Hi Theo,

I don't think the RDKit version should make a difference; did you notice 
that rdmolops.AdjustQueryProperties() does not modify the molecule in 
place, but rather returns a modified copy?


pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)

That might be the reason. Also, only pattern_generic_bonds will have 
UNSPECIFIED bonds; the mols will still have SINGLE and DOUBLE bonds.
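
A short sketch of that copy behaviour, using benzene as a stand-in pattern 
(the helper bond_types is made up for this illustration):

```python
from rdkit import Chem

# Default sanitization, so the ring bonds are perceived as aromatic.
pattern = Chem.MolFromSmiles("C1=CC=CC=C1")

query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False

# The adjusted molecule is the return value; `pattern` itself is left unchanged.
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern, query_params)


def bond_types(mol):
    """Return the distinct bond types found in mol (hypothetical helper)."""
    return sorted({str(bond.GetBondType()) for bond in mol.GetBonds()})


print("original:", bond_types(pattern))
print("adjusted:", bond_types(pattern_generic_bonds))
```

Calling Chem.AdjustQueryProperties(pattern, query_params) without keeping the 
return value therefore has no visible effect on pattern.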


Feel free to contact me off-list if you need help with the above.

Cheers,
p.


Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-19 Thread theozh
Hi Paolo,

thank you very much for your detailed answer.
I tried to reproduce your last suggestion (although I don't have Jupyter Notebook).
However, my bonds are still SINGLE and DOUBLE instead of UNSPECIFIED.
Could this depend on the RDKit version? I have 2019.03...

Maybe I should update; I need to investigate further.
Theo.


On 19.05.2020 at 16:44, Paolo Tosco wrote:
> Hi Theo,
>
> the lack of match is due to different aromaticity flags on atoms and bonds in 
> the larger molecule.
>
> This gist provides some explanation and a possible solution:
>
> https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
>
> Cheers,
> p.
>
> On 19/05/2020 14:13, theozh wrote:
>> Dear RDKit-users,
>>
>> I would like to do a very simple substructure search.
>> The chapter 3.5 "Substructure Searching" in RDKit Documentation (2019.09.1) 
>> is pretty short and doesn't point to a solution. So far, I've learned that 
>> you can create your search pattern via Chem.MolFromSmiles() or 
>> Chem.MolFromSmarts().
>>
>> In the below copy minimal example, I want to use the first SMILES in 
>> the list as search pattern. I expect 2 matches but I get either 1 or 0 
>> matches. So, I'm doing something wrong. What am I missing?
>> Is it about implicit/explicit aromatic and aliphatic bonds or some 
>> explicit/implicit hydrogen?
>> How to find the first structure in both SMILES?
>>
>> thank you for any hints,
>> Theo.
>>
>> ### simple substructure search (but doesn't find what is expected)
>> from rdkit import Chem
>>
>> smiles_strings = '''
>> C12=CC=CN1NCCC2
>> C12=CC=CC(C=C3)=C1N3NCC2
>> '''
>> smiles_list = smiles_strings.splitlines()[1:]
>> print(smiles_list)
>>
>> pattern = Chem.MolFromSmiles(smiles_list[0])  # MolFromSmiles
>> matches = [x for x in smiles_list if 
>> Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>> print(len(matches))   # result: 1, why not 2?
>>
>> pattern = Chem.MolFromSmarts(smiles_list[0])  # MolFromSmarts
>> matches = [x for x in smiles_list if 
>> Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
>> print(len(matches))   # result: 0, why not 2?
>> ### end of code


Re: [Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-19 Thread Paolo Tosco

Hi Theo,

the lack of match is due to different aromaticity flags on atoms and 
bonds in the larger molecule.


This gist provides some explanation and a possible solution:

https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
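
Independently of the gist, the aromaticity difference can be inspected 
directly. A minimal sketch with the two SMILES from the original question and 
default sanitization (the helper aromatic_atom_indices is made up for this 
illustration):

```python
from rdkit import Chem

small = Chem.MolFromSmiles("C12=CC=CN1NCCC2")
large = Chem.MolFromSmiles("C12=CC=CC(C=C3)=C1N3NCC2")


def aromatic_atom_indices(mol):
    """Return the indices of atoms flagged aromatic (hypothetical helper)."""
    return [atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIsAromatic()]


for name, mol in (("small", small), ("large", large)):
    print(name, "aromatic atoms:", aromatic_atom_indices(mol))

# The pattern matches itself but not the larger molecule, where a carbon that
# is aliphatic in the pattern sits in an aromatic ring.
print(small.HasSubstructMatch(small))
print(large.HasSubstructMatch(small))
```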

Cheers,
p.



[Rdkit-discuss] Substructure search issue with aliphatic/aromatic bonds

2020-05-19 Thread theozh
Dear RDKit-users,

I would like to do a very simple substructure search.
Chapter 3.5 "Substructure Searching" in the RDKit Documentation (2019.09.1) is 
pretty short and doesn't point to a solution. So far, I've learned that you can 
create your search pattern via Chem.MolFromSmiles() or Chem.MolFromSmarts().

In the copy-and-paste minimal example below, I want to use the first SMILES in 
the list as the search pattern. I expect 2 matches but I get either 1 or 0 
matches. So I'm doing something wrong. What am I missing?
Is it about implicit/explicit aromatic and aliphatic bonds or some 
explicit/implicit hydrogens?
How can I find the first structure in both SMILES?

thank you for any hints,
Theo.

### simple substructure search (but doesn't find what is expected)
from rdkit import Chem

smiles_strings = '''
C12=CC=CN1NCCC2
C12=CC=CC(C=C3)=C1N3NCC2
'''
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)

pattern = Chem.MolFromSmiles(smiles_list[0])  # MolFromSmiles
matches = [x for x in smiles_list if 
Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
print(len(matches))   # result: 1, why not 2?

pattern = Chem.MolFromSmarts(smiles_list[0])  # MolFromSmarts
matches = [x for x in smiles_list if 
Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
print(len(matches))   # result: 0, why not 2?
### end of code
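
One likely contributor to the result of 0 with MolFromSmarts is that SMILES and 
SMARTS do not share the same semantics: in SMARTS an uppercase C means an 
aliphatic carbon, so it cannot match an atom that sanitization has flagged as 
aromatic. A minimal sketch with benzene as a stand-in target:

```python
from rdkit import Chem

benzene = Chem.MolFromSmiles("c1ccccc1")

aliphatic_c = Chem.MolFromSmarts("C")    # SMARTS: aliphatic carbon only
aromatic_c = Chem.MolFromSmarts("c")     # SMARTS: aromatic carbon only
any_c = Chem.MolFromSmarts("[#6]")       # SMARTS: any carbon, by atomic number

print(benzene.HasSubstructMatch(aliphatic_c))  # False
print(benzene.HasSubstructMatch(aromatic_c))   # True
print(benzene.HasSubstructMatch(any_c))        # True
```

Writing atoms as [#6], [#7] and so on in a SMARTS pattern avoids the 
aliphatic/aromatic distinction at the atom level; bond types still have to be 
compatible.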

