On Dec 3, 2012, at 4:55 PM, Greg Landrum wrote:
> Yes, it's here:
> http://www.rdkit.org/docs/RDKit_Book.html#atom-atom-matching-in-substructure-queries
Thanks.
It's incomplete, though: it shows neither how bonds are matched nor
how atom aromaticity is handled. Does a "C" in a SMILES query mean
that the atom is specified as aliphatic, so that an aromatic "c"
should not match? I can't determine that from the docs.
I suspect the following shows an incorrect implementation:
>>> from rdkit import Chem
>>> query = Chem.MolFromSmiles("CC")
>>> target = Chem.MolFromSmiles("c1ccccc1C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False
I did not expect a "CC" to match the "cC".
There's also something odd in the following, where a single bond
in the query appears to match a double bond in the target:
>>> query = Chem.MolFromSmiles("CC")
>>> target = Chem.MolFromSmiles("c1cccc1=C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False
... even when I explicitly give a single bond:
>>> query = Chem.MolFromSmiles("C-C")
>>> target = Chem.MolFromSmiles("c1cccc1=C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False
The reason this is important to what I'm doing is that I
am developing new SMARTS patterns for screening. One of my
patterns is "CC". Consider the following case:
My query is: CC1=Cc2ccccc2CN1
My target is: c1ccc2c(c1)C=C3c4ccccc4C(=O)N3[C@@H]2O
>>> query = Chem.MolFromSmiles("CC1=Cc2ccccc2CN1")
>>> target = Chem.MolFromSmiles("c1ccc2c(c1)C=C3c4ccccc4C(=O)N3[C@@H]2O")
Here's the code which does the screening.
>>> screen = Chem.MolFromSmarts("CC")
>>> query.HasSubstructMatch(screen)
True
>>> target.HasSubstructMatch(screen)
False
>>>
This should mean that the target is screened out. However,
RDKit says that the query is actually a substructure of the target:
>>> target.HasSubstructMatch(query)
True
This means the SMARTS pattern "CC" is a false screen.
Based on this, it seems that I can't use SMARTS patterns to define
a screen which is easily compatible with the molecule-based substructure
matcher.
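To spell out the inconsistency: a screen is only sound if "the query
contains the screen" and "the target contains the query" together
imply "the target contains the screen". Here is a minimal sketch of
that check in plain Python, fed with the three True/False results
reported above. The function name is_false_screen is hypothetical and
is not part of RDKit.

```python
def is_false_screen(screen_in_query, screen_in_target, query_in_target):
    """Return True when a screen rejects a target that is a real hit.

    The screen rejects the target when the query contains the screen
    pattern but the target does not. That rejection is wrong if the
    full substructure test would have reported a match.
    """
    rejected = screen_in_query and not screen_in_target
    return rejected and query_in_target


# Results from the session above, for the SMARTS screen "CC":
#   query.HasSubstructMatch(screen)  -> True
#   target.HasSubstructMatch(screen) -> False
#   target.HasSubstructMatch(query)  -> True
print(is_false_screen(True, False, True))
```

With those three values the check prints True, which is the false
rejection described above.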
What I think I can do is:
1) parse the SMILES for the query
2) remove any explicit hydrogens
3) use Chem.MolFragmentToSmiles to turn the de-hydrogenated molecule
into a SMARTS string
4) convert the SMARTS into the actual query
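
The four steps could be sketched roughly as follows. This is only an
illustration of the idea, not what the cartridge does. The helper name
smiles_to_smarts_query is hypothetical, and I am assuming that passing
every atom index to Chem.MolFragmentToSmiles is an acceptable way to
write out the whole molecule. Note that the output is SMILES text
which is then re-read as SMARTS, so an uppercase "C" will only match
aliphatic carbon and an unspecified bond will match single or
aromatic bonds.

```python
def smiles_to_smarts_query(smiles):
    """Build a SMARTS-based query from a SMILES string (steps 1 to 4)."""
    # Imported lazily so this sketch can be loaded where RDKit is absent.
    from rdkit import Chem

    # 1) parse the SMILES for the query
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))

    # 2) remove any explicit hydrogens
    mol = Chem.RemoveHs(mol)

    # 3) write the de-hydrogenated molecule out as a pattern string;
    #    assumption: all atoms are included in the fragment
    atom_ids = [atom.GetIdx() for atom in mol.GetAtoms()]
    pattern = Chem.MolFragmentToSmiles(mol, atomsToUse=atom_ids)

    # 4) convert the string into the actual query, using SMARTS semantics
    query = Chem.MolFromSmarts(pattern)
    if query is None:
        raise ValueError("could not parse pattern as SMARTS: %r" % (pattern,))
    return pattern, query
```

For example, smiles_to_smarts_query("CC1=Cc2ccccc2CN1") returns the
pattern string together with a query molecule that can be passed to
target.HasSubstructMatch().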
But I know that MolFragmentToSmiles is a new API function, and I'm
pretty certain that you do something else in your Postgres cartridge.
We even had an exchange about a year ago on improving the SMARTS
patterns which you use for screening.
So, how do you screen so that you can use an input molecule as a query?
Cheers,
Andrew
[email protected]