Re: [Rdkit-discuss] how to use structure as substructure query

Andrew Dalke Mon, 03 Dec 2012 12:58:54 -0800

On Dec 3, 2012, at 4:55 PM, Greg Landrum wrote:
> Yes, it's here:
> http://www.rdkit.org/docs/RDKit_Book.html#atom-atom-matching-in-substructure-queries


Thanks.

It's incomplete, though: it shows neither how bonds are matched nor
how aromaticity is handled for atoms. Does a "C" in a SMILES query mean
that the atom is specified as aliphatic, so that "c" is not matched? I
can't determine that from the docs.


I suspect the following shows an incorrect implementation:

>>> query = Chem.MolFromSmiles("CC")
>>> target = Chem.MolFromSmiles("c1ccccc1C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False

I did not expect the query "CC" to match the "cC" in the target.
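A minimal sketch of the difference in question, assuming (as the session above suggests) that a query built with MolFromSmiles compares atoms by element only, while a query built with MolFromSmarts treats "C" as aliphatic carbon. The variable names are illustrative only.

```python
from rdkit import Chem

toluene = Chem.MolFromSmiles("c1ccccc1C")

# Query parsed as a molecule: atoms appear to be compared by element,
# so the aliphatic "C" can land on an aromatic ring carbon.
mol_query = Chem.MolFromSmiles("CC")

# Query parsed as SMARTS: "C" means aliphatic carbon, and the implicit
# bond means "single or aromatic".
smarts_query = Chem.MolFromSmarts("CC")

# SMARTS that asks only for "any carbon bonded to any carbon".
loose_query = Chem.MolFromSmarts("[#6]~[#6]")

for label, q in [("mol", mol_query), ("smarts", smarts_query), ("loose", loose_query)]:
    print(label, toluene.HasSubstructMatch(q))
```

Toluene has only one aliphatic carbon, so the SMARTS "CC" cannot match it, whereas the molecule-derived "CC" does.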

Something else looks strange in the following, where
a single bond appears to match a double bond:

>>> query = Chem.MolFromSmiles("CC")
>>> target = Chem.MolFromSmiles("c1cccc1=C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False

... even when I explicitly give a single bond:

>>> query = Chem.MolFromSmiles("C-C")
>>> target = Chem.MolFromSmiles("c1cccc1=C")
>>> target.HasSubstructMatch(query)
True
>>> target = Chem.MolFromSmiles("c1ccccc1")
>>> target.HasSubstructMatch(query)
False


The reason this is important to what I'm doing is that I
am developing new SMARTS patterns for screening. One of my
patterns is "CC". Consider the following case:

My query is:  CC1=Cc2ccccc2CN1
My target is: c1ccc2c(c1)C=C3c4ccccc4C(=O)N3[C@@H]2O

>>> query = Chem.MolFromSmiles("CC1=Cc2ccccc2CN1")
>>> target = Chem.MolFromSmiles("c1ccc2c(c1)C=C3c4ccccc4C(=O)N3[C@@H]2O")

Here's the code which does the screening.

>>> screen = Chem.MolFromSmarts("CC")
>>> query.HasSubstructMatch(screen)
True
>>> target.HasSubstructMatch(screen)
False

This should mean that the target is screened out. However,
RDKit says that the query is actually a substructure of the target:

>>> target.HasSubstructMatch(query)
True

This means that the SMARTS pattern "CC" is a false screen: it rejects
a target which the molecule-based matcher accepts.
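The check above can be wrapped up as a small helper. This is only a sketch; is_false_screen is a hypothetical name, not an RDKit function. It reports True when the query passes the screen, the target fails it, and the full substructure test still succeeds.

```python
from rdkit import Chem

def is_false_screen(screen, query, target):
    """Return True when the screen would wrongly discard a real match."""
    query_passes = query.HasSubstructMatch(screen)
    target_passes = target.HasSubstructMatch(screen)
    real_match = target.HasSubstructMatch(query)
    return query_passes and not target_passes and real_match

query = Chem.MolFromSmiles("CC1=Cc2ccccc2CN1")
target = Chem.MolFromSmiles("c1ccc2c(c1)C=C3c4ccccc4C(=O)N3[C@@H]2O")

strict_screen = Chem.MolFromSmarts("CC")        # aliphatic C, single/aromatic bond
loose_screen = Chem.MolFromSmarts("[#6]~[#6]")  # any carbon, any bond

print(is_false_screen(strict_screen, query, target))
print(is_false_screen(loose_screen, query, target))
```

A screen written with element-only atoms and "any bond" cannot fail in this way for this pair, at the cost of being much less selective.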


Based on this, it seems that I can't use SMARTS patterns to define
a screen which is easily compatible with the molecule-based substructure
matcher.

What I think I can do is:
  1) parse the SMILES for the query
  2) remove any explicit hydrogens
  3) use Chem.MolFragmentToSmiles to turn the de-hydrogenated molecule 
      into a SMARTS string
  4) convert the SMARTS into the actual query

But I know that MolFragmentToSmiles is a new API function, and I'm
pretty certain that you do something else in your Postgres cartridge.
We even had an exchange about a year ago on improving the SMARTS
patterns which you use for screening.

So, how do you screen so that you can use an input molecule as a query?

Cheers,

                                Andrew
                                [email protected]


