Hi Theo,

The match fails because the atoms and bonds in the larger molecule carry different aromaticity flags from the corresponding atoms and bonds in the pattern.

This gist provides some explanation and a possible solution:

https://gist.github.com/ptosco/e410e45278b94e8f047ff224193d7788
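
For a quick illustration, here is a minimal sketch of one possible workaround. It is my own assumption for this example and may differ from what the gist does. The query is a hand-written SMARTS that matches on element only ([#6], [#7]) and accepts any bond type (~), so aromaticity flags no longer matter. The names generic_smarts, generic_pattern and generic_matches are made up for this example.

```python
from rdkit import Chem

# The two SMILES from the original post.
smiles_list = ["C12=CC=CN1NCCC2", "C12=CC=CC(C=C3)=C1N3NCC2"]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

# Inspect how aromaticity was perceived in each parsed molecule.
for mol in mols:
    n_aromatic = sum(atom.GetIsAromatic() for atom in mol.GetAtoms())
    print(Chem.MolToSmiles(mol), "aromatic atoms:", n_aromatic)

# Hand-written generic query (an assumption for illustration):
# same ring topology as the first SMILES, but atoms are matched by
# atomic number only and every bond, ring closures included, is "~".
generic_smarts = "[#6]12~[#6]~[#6]~[#6]~[#7]~1~[#7]~[#6]~[#6]~[#6]~2"
generic_pattern = Chem.MolFromSmarts(generic_smarts)

generic_matches = [
    smi for smi, mol in zip(smiles_list, mols)
    if mol.HasSubstructMatch(generic_pattern)
]
print(len(generic_matches))
```

Note that such a query is deliberately loose: it ignores bond orders altogether, so on a larger data set it may return hits that a stricter pattern would reject. As for the MolFromSmarts() attempt in your code: in SMARTS an uppercase C means an aliphatic carbon, which is why that pattern does not even match the molecule it was written from once the pyrrole ring has been perceived as aromatic.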

Cheers,
p.

On 19/05/2020 14:13, theozh wrote:
Dear RDKit-users,

I would like to do a very simple substructure search.
Chapter 3.5, "Substructure Searching", in the RDKit Documentation (2019.09.1) is 
pretty short and doesn't point to a solution. So far, I've learned that you can create 
your search pattern via Chem.MolFromSmiles() or Chem.MolFromSmarts().

In the copy-and-paste minimal example below, I want to use the first SMILES in the 
list as the search pattern. I expect 2 matches, but I get either 1 or 0 matches, so I'm 
doing something wrong. What am I missing?
Is it about implicit/explicit aromatic and aliphatic bonds, or about 
explicit/implicit hydrogens?
How can I find the first structure in both SMILES?

Thank you for any hints,
Theo.

### simple substructure search (but doesn't find what is expected)
from rdkit import Chem

smiles_strings = '''
C12=CC=CN1NCCC2
C12=CC=CC(C=C3)=C1N3NCC2
'''
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)

pattern = Chem.MolFromSmiles(smiles_list[0])  # MolFromSmiles
matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
print(len(matches))   # result: 1, why not 2?

pattern = Chem.MolFromSmarts(smiles_list[0])  # MolFromSmarts
matches = [x for x in smiles_list if Chem.MolFromSmiles(x).HasSubstructMatch(pattern)]
print(len(matches))   # result: 0, why not 2?
### end of code


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

