Hi Theo,
That's because you omitted the sanitization step completely, so the
molecule is missing crucial information that the substructure matcher
needs to do a proper job.
If you put sanitization back and leave out only the aromatization step,
things work as expected.
Also, you do not need to create the pattern again from SMILES; you can
make a copy of the molecule that you have already created and sanitized
using the Chem.Mol copy constructor.
from rdkit import Chem
smiles_strings = '''
N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
'''
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)
params = Chem.SmilesParserParams()
params.sanitize=False
mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
for m in mols:
    Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
pattern = Chem.Mol(mols[0])
query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
matches = [idx for idx,m in enumerate(mols) if
           m.HasSubstructMatch(pattern_generic_bonds)]
print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
$ python3 SubstructMatch2.py
['N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3', 'C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3']
2 of 2: [0, 1]
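If you want to convince yourself of what the partial sanitization and the copy constructor do, here is a minimal sketch rather than a definitive check. It assumes the first SMILES from the script above; the variable names (n_aromatic_partial, n_aromatic_full, pattern_copy) are just illustrative. It counts aromatic atoms after sanitizing without the aromatization step, compares that with a fully sanitized parse, and confirms that editing the copy leaves the original untouched.

```python
from rdkit import Chem

smiles = 'N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3'

# Parse without sanitization, then run every step except aromaticity perception.
params = Chem.SmilesParserParams()
params.sanitize = False
m = Chem.MolFromSmiles(smiles, params)
Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)

# The Kekule form from the input SMILES is kept: no atom is flagged aromatic.
n_aromatic_partial = sum(1 for a in m.GetAtoms() if a.GetIsAromatic())

# For comparison, the default parse sanitizes fully and perceives aromaticity.
full = Chem.MolFromSmiles(smiles)
n_aromatic_full = sum(1 for a in full.GetAtoms() if a.GetIsAromatic())

# Chem.Mol(m) creates an independent copy; changing it does not affect m.
pattern_copy = Chem.Mol(m)
pattern_copy.GetAtomWithIdx(0).SetAtomicNum(6)  # illustrative edit on the copy only

print(n_aromatic_partial, n_aromatic_full)
print(m.GetAtomWithIdx(0).GetAtomicNum(), pattern_copy.GetAtomWithIdx(0).GetAtomicNum())
```

The first count should be zero and the second one non-zero, which is the difference that leaving out SANITIZE_SETAROMATICITY is meant to produce.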
Cheers,
p.
On 20/05/2020 09:50, theozh wrote:
from rdkit import Chem
smiles_strings = '''
N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
'''
smiles_list = smiles_strings.splitlines()[1:]
print(smiles_list)
params = Chem.SmilesParserParams()
params.sanitize=False
mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
pattern = Chem.MolFromSmiles(smiles_list[0],params)
query_params = Chem.AdjustQueryParameters()
query_params.makeBondsGeneric = True
query_params.aromatizeIfPossible = False
query_params.adjustDegree = False
query_params.adjustHeavyDegree = False
pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
matches = [idx for idx,m in enumerate(mols) if
m.HasSubstructMatch(pattern_generic_bonds)]
print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss