I neglected to cc the RDKit list on this earlier. If he can get the list of
matching atoms from the other program, he won't have to mess with SMARTS
matching in RDKit.

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.
---------- Forwarded message ----------
From: "Peter S. Shenkin" <shen...@gmail.com>
Date: Sep 13, 2017 3:15 PM
Subject: Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected
To: "Michał Nowotka" <mmm...@gmail.com>
Cc:

Well, depending on how the substructure results from the other program are
presented, you might not have to deal with SMARTS matching at all yourself.
For example, if you have a SMILES for the structure and a list of atom
indices into that SMILES that constitute the matching substructure (where
the first atom in the SMILES has index 0), you can do the following:

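from rdkit import Chem
from rdkit.Chem import Draw

smi = 'Oc1ccccc1'        # Assume a SMILES
matching_atoms = [0, 1]  # Assume a list of matching atoms (0-based, SMILES order)
mol = Chem.MolFromSmiles(smi)
# Pass the list itself rather than repeating the indices by hand
x = Draw.MolToImage(mol, highlightAtoms=matching_atoms)
display(x)  # display() is available inside a Jupyter notebook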


See attached for the image, from a Jupyter notebook.
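
If the other program hands back atom indices, it may be worth checking them
against the parsed molecule before drawing. Below is a minimal sketch of that,
assuming the indices are 0-based and follow the heavy-atom order of the
SMILES. The function names and the output path argument are hypothetical,
made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def mol_with_checked_atoms(smi, atom_indices):
    """Parse a SMILES and check that every index refers to an atom in it."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        # MolFromSmiles returns None when the SMILES cannot be parsed
        raise ValueError(f"could not parse SMILES: {smi!r}")
    n_atoms = mol.GetNumAtoms()
    bad = [i for i in atom_indices if not 0 <= i < n_atoms]
    if bad:
        raise ValueError(f"atom indices out of range for {smi!r}: {bad}")
    return mol


def save_highlighted(smi, atom_indices, path):
    """Write an image file at `path` with the given atoms highlighted."""
    mol = mol_with_checked_atoms(smi, atom_indices)
    img = Draw.MolToImage(mol, highlightAtoms=list(atom_indices))
    img.save(path)  # MolToImage returns a PIL image
```

Outside a notebook, something like save_highlighted('Oc1ccccc1', [0, 1],
'phenol.png') would write the image to a file instead of calling display().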

If, on the other hand, you have to work from SMARTS, then it seems to me
that you need to understand something about how SMARTS works, and you have
to understand the needed chemical concepts, or at least interact with
someone who does.

Otherwise, it's a bit like trying to do complicated substring matches using
regular expressions, without knowing how regular expressions work.
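
To make the SMARTS route a little more concrete, here is a rough sketch of a
two-tier fallback: first try the query as-is, then try a loosened pattern in
which atoms match on element only and every bond becomes '~' (any bond). It
assumes that Chem.MolToSmarts writes atoms in brackets with explicit bond
symbols between them, which is what the regular expression relies on. The
helper names are hypothetical, and the analogue SMILES in the usage lines is
an invented example, not CHEMBL1999443.

```python
import re

from rdkit import Chem

# Bracket atoms are kept as-is; any run of bond symbols outside
# brackets is replaced by a single '~' (any bond).
_BOND_RUN = re.compile(r"(\[[^\]]*\])|[-=#:/\\]+")


def loosened_smarts(query_smiles):
    """Turn a query SMILES into a SMARTS that ignores bond orders."""
    mol = Chem.MolFromSmiles(query_smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {query_smiles!r}")
    smarts = Chem.MolToSmarts(mol)
    return _BOND_RUN.sub(lambda m: m.group(1) or "~", smarts)


def find_atoms(target, query_smiles):
    """Return (tier, atom_indices), trying the stricter match first."""
    strict = Chem.MolFromSmiles(query_smiles)
    match = target.GetSubstructMatch(strict)
    if match:
        return "strict", match
    loose = Chem.MolFromSmarts(loosened_smarts(query_smiles))
    match = target.GetSubstructMatch(loose)
    if match:
        return "element-and-connectivity", match
    return None, ()


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
# Invented analogue with the carboxyl group embedded in a lactone ring
analogue = Chem.MolFromSmiles("CC(=O)Oc1cccc2ccoc(=O)c12")
tier, atoms = find_atoms(analogue, aspirin)
print(tier, atoms)  # atoms can be passed on as highlightAtoms
```

Note that the loosened tier is very permissive (it would let a cyclohexyl
match a phenyl, for instance), so intermediate tiers may be wanted between
the two, depending on what kind of matching is actually intended.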

-P.


On Sep 13, 2017 12:12 PM, "Michał Nowotka" <mmm...@gmail.com> wrote:

> OK, so what I have is some substructure results from another (non-RDKit)
> cartridge, and I want to use RDKit to generate images of all results
> with the query substructure highlighted and aligned.
> So I have two things: a list of compounds and a query compound.
> Now I need to highlight the query compound for every compound from the
> list and I need to do it at all costs. I can't leave any compound not
> highlighted even if RDKit by default has a different opinion on whether
> the query compound really is a true substructure of a given compound.
>
> So how can I instruct RDKit to ignore aromaticity and other factors,
> preferably one by one, each time going one level deeper, where the last
> resort would be simply matching on the level of two planar graphs? Is
> that possible?
>
> On Wed, Sep 13, 2017 at 4:48 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
> > Your course of action depends upon just what you are really trying to do.
> > If it's only aspirin, then why wouldn't you just do it manually? If it goes
> > beyond aspirin, you have to start by defining in general terms exactly what
> > you want to match to what.
> >
> > For example, given a query molecule (aspirin in this case), if you want all
> > its non-aromatic atoms to match aromatic as well as non-aromatic atoms in
> > the database, you could write a string-alteration routine to munge the
> > SMILES of a query molecule into a SMARTS that would do just that, and then
> > use that SMARTS to match your database molecules. Repeat for each query
> > molecule.
> >
> > But you have to start with a precise definition of just what kind of
> > matching you wish to do. For instance, maybe you don't really want
> > non-aromatic ring atoms in your query to match aromatic rings and vice versa
> > (i.e., a cyclohexyl to match a phenyl); maybe you only want non-ring atoms
> > in the query to match aliphatic as well as aromatic substructures. And so
> > on.
> >
> > -P.
> >
> >
> > On Wed, Sep 13, 2017 at 10:42 AM, Michał Nowotka <mmm...@gmail.com> wrote:
> >>
> >> Is there any flag in RDkit to match both 'normal' aspirin and embedded
> >> aromatic analogues?
> >> The problem is that I can't modify user queries by hand in real time :)
> >>
> >> On Wed, Sep 13, 2017 at 2:12 PM, Chris Earnshaw <cgearns...@gmail.com>
> >> wrote:
> >> > Hi
> >> >
> >> > The problem is due to RDKit perceiving the embedded pyranone in
> >> > CHEMBL1999443 as an aromatic system, which is probably correct. However,
> >> > in the structure of aspirin the carboxyl carbon and singly bonded oxygen
> >> > are non-aromatic, so if you just use the SMILES of aspirin as a query it
> >> > won't match CHEMBL1999443.
> >> >
> >> > You'll need to use a slightly more generic aspirin-like query to allow
> >> > the possibility of matching both 'normal' aspirin and embedded aromatic
> >> > analogues. CC(=O)Oc1ccccc1[#6](=O)[#8] should work OK.
> >> >
> >> > Regards,
> >> > Chris
> >> >
> >> > On 13 September 2017 at 13:40, Michał Nowotka <mmm...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> This problem is probably due to my lack of chemistry knowledge, but
> >> >> please have a look:
> >> >>
> >> >> If I do a substructure search in ChEMBL using aspirin (CHEMBL25) as a
> >> >> query (the ChEMBL API uses the Symyx cartridge):
> >> >>
> >> >>     from chembl_webresource_client.new_client import new_client
> >> >>     res = new_client.substructure.filter(chembl_id='CHEMBL25')
> >> >>
> >> >> One of them will be CHEMBL1999443:
> >> >>
> >> >>     'CHEMBL1999443' in (r['molecule_chembl_id'] for r in res)
> >> >>     >>> True
> >> >>
> >> >> Now I take the molfile:
> >> >>
> >> >>     new_client.molecule.set_format('mol')
> >> >>     mol = new_client.molecule.get('CHEMBL1999443')
> >> >>
> >> >> and load it with aspirin into rdkit:
> >> >>
> >> >>     from rdkit import Chem
> >> >>     m = Chem.MolFromMolBlock(mol)
> >> >>     pattern = Chem.MolFromMolBlock(new_client.molecule.get('CHEMBL25'))
> >> >>
> >> >> If I check if it has an aspirin as a substructure using rdkit, I'm
> >> >> getting false...
> >> >>
> >> >>     m.HasSubstructMatch(pattern)
> >> >>     >>> False
> >> >>
> >> >> Looking at this blog post:
> >> >>
> >> >>
> >> >> https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/002_SMARTS_SubstructureMatching.ipynb
> >> >> I tried to initialize rings and retry:
> >> >>
> >> >>     Chem.GetSymmSSSR(m)
> >> >>     m.HasSubstructMatch(pattern)
> >> >>     >>> False
> >> >>
> >> >>     Chem.GetSymmSSSR(pattern)
> >> >>     m.HasSubstructMatch(pattern)
> >> >>     >>> False
> >> >>
> >> >> But as you can see, without any luck. Is there anything else I can do
> >> >> to get the match anyway?
> >> >> Without a match I can't align and highlight the aspirin substructure
> >> >> in the CHEMBL1999443 image using the GenerateDepictionMatching2DStructure
> >> >> and DrawMolecule functions.
> >> >>
> >> >> Kind regards,
> >> >>
> >> >> Michał Nowotka
> >> >>
> >> >>
> >> >>
> >> >> ------------------------------------------------------------------------------
> >> >> Check out the vibrant tech community on one of the world's most
> >> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> >> _______________________________________________
> >> >> Rdkit-discuss mailing list
> >> >> Rdkit-discuss@lists.sourceforge.net
> >> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >> >
> >> >