What are the steps one must take to use an input structure (from
a SMILES string) as a substructure query? It looks like I need
to remove explicit hydrogens [* see footnote]. Is there anything
else? And what is the right way to remove explicit hydrogens?
I'm working again on a project to do substructure searches.
In the data flow, the user sketches a structure, which gets
turned into a SMILES string. Let's suppose that sketched
SMILES is c1cc2ccccc2[nH]1 .
The dead-simple, brute-force solution is:
from rdkit import Chem

query = Chem.MolFromSmiles("c1cc2ccccc2[nH]1")
for mol in dataset:
    if mol.HasSubstructMatch(query):
        print("Match!", mol.GetProp("_Name"))
However, that doesn't work like I hoped it would. Consider these:
>>> mol = Chem.MolFromSmiles("c1c(C)c2c(N)cccc2[nH]1")
>>> mol.HasSubstructMatch(query)
True
>>> mol = Chem.MolFromSmiles("Fn1cc(Cl)c2ccccc21")
>>> mol.HasSubstructMatch(query)
False
I expected the sketched structure to match both structures,
and not just the first. The failure appears to be because
of the explicit hydrogen in the '[nH]' term. If I remove the
'H' then the match is fine.
>>> for atom in query.GetAtoms():
...     print(atom.GetNumExplicitHs(), end=" ")
... else:
...     print()
...
0 0 0 0 0 0 0 0 1
>>> for atom in query.GetAtoms():
...     atom.SetNumExplicitHs(0)
...
>>> mol.HasSubstructMatch(query)
True
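The manual reset above can be wrapped in a small helper. This is only a sketch: strip_explicit_h_counts is a hypothetical name, not an RDKit function, and it assumes that zeroing the explicit hydrogen counts on a copy of the query is the behaviour wanted. It relies only on the Chem.Mol copy constructor, GetAtoms, GetNumExplicitHs and SetNumExplicitHs.

```python
from rdkit import Chem


def strip_explicit_h_counts(query):
    """Return a copy of `query` with every explicit hydrogen count set to 0.

    Hypothetical helper (not part of RDKit). The input molecule is left
    untouched so the sketched structure can still be shown or written out.
    """
    stripped = Chem.Mol(query)  # copy, so the caller's molecule is unchanged
    for atom in stripped.GetAtoms():
        atom.SetNumExplicitHs(0)
    return stripped


query = Chem.MolFromSmiles("c1cc2ccccc2[nH]1")
relaxed = strip_explicit_h_counts(query)

# Explicit hydrogen counts per atom, before and after the reset.
explicit_before = [a.GetNumExplicitHs() for a in query.GetAtoms()]
explicit_after = [a.GetNumExplicitHs() for a in relaxed.GetAtoms()]
print(explicit_before)
print(explicit_after)
```

Working on a copy keeps the original query available for anything that still needs the hydrogen count as the user drew it.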
Is there a description somewhere of how atoms and bonds are
matched when the given substructure comes from a molecule
rather than a SMARTS?
I thought that I could use RemoveHs to automate this step, but that
does not seem to be the case.
>>> query = Chem.MolFromSmiles("c1cc2ccccc2[nH]1")
>>> query2 = Chem.RemoveHs(query)
>>> mol.HasSubstructMatch(query2)
False
>>>
Am I using it incorrectly, or perhaps there is another
function I should be using?
Finally, am I missing anything else in what I need to do
in order to prepare an input substructure as a substructure
query?
Cheers,
Andrew
[*]
RDKit uses a different terminology than Daylight/OEChem.
In RDKit, "O" has implicit hydrogens, "[OH2]" has explicit
hydrogens, and "[2H]O[2H]" has two hydrogen atoms. In Daylight,
the first two are different ways of writing implicit hydrogens,
and the third has two explicit hydrogen atoms.
This is not obvious from the Daylight documentation, and
various toolkits do it in different ways.
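The RDKit side of this terminology can be inspected directly. The snippet below is a minimal sketch, with describe as a hypothetical helper name; it reports, for each of the three spellings of water, how many atoms are in the molecular graph and the explicit and implicit hydrogen counts that RDKit stores on the oxygen.

```python
from rdkit import Chem


def describe(smiles):
    """Return (atoms in graph, explicit Hs on O, implicit Hs on O).

    Hypothetical helper for illustration; assumes the SMILES has one oxygen.
    """
    mol = Chem.MolFromSmiles(smiles)
    oxygen = next(a for a in mol.GetAtoms() if a.GetSymbol() == "O")
    return (mol.GetNumAtoms(),
            oxygen.GetNumExplicitHs(),
            oxygen.GetNumImplicitHs())


for smiles in ("O", "[OH2]", "[2H]O[2H]"):
    print(smiles, describe(smiles))
```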