Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-14 Thread Greg Landrum
On Thu, Sep 14, 2017 at 12:05 PM, Michał Nowotka  wrote:

> Using
>
> params = AdjustQueryParameters()
> params.makeAtomsGeneric = True
> params.makeBondsGeneric = True
> pattern = AdjustQueryProperties(pattern, params)
>
> Seems to solve my problem - I'm getting a match now (but I haven't yet
> verified if the match is correct and I don't understand why I had to
> set both flags, I'd expect that setting makeBondsGeneric should make
> it work already).
>

I don't think you should need to do "makeAtomsGeneric", but you likely want
to do params.adjustDegree=False.

It's worth being careful with makeBondsGeneric: there you could match a
single bond against a triple bond, which is unlikely to be what you want.
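
A minimal sketch of that caveat, assuming a reasonably recent RDKit. The two
molecules (ethane as query, acetylene as target) are illustrative choices and
do not come from this thread:

```python
from rdkit import Chem

# Query: ethane, i.e. a single C-C bond (illustrative choice).
query = Chem.MolFromSmiles('CC')
# Target: acetylene, a C#C triple bond (illustrative choice).
target = Chem.MolFromSmiles('C#C')

# Plain molecule-as-query matching compares bond types, so this fails.
strict_hit = target.HasSubstructMatch(query)

params = Chem.AdjustQueryParameters()
params.adjustDegree = False      # do not add degree constraints to query atoms
params.makeBondsGeneric = True   # every query bond now matches any bond type
generic_query = Chem.AdjustQueryProperties(query, params)

# With generic bonds the single bond of the query matches the triple bond.
generic_hit = target.HasSubstructMatch(generic_query)

print(strict_hit, generic_hit)
```

The second result is the kind of match that is usually not wanted, which is
why makeBondsGeneric is better kept as a last resort.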




Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-14 Thread Michał Nowotka
Using

from rdkit.Chem import AdjustQueryParameters, AdjustQueryProperties

params = AdjustQueryParameters()
params.makeAtomsGeneric = True
params.makeBondsGeneric = True
pattern = AdjustQueryProperties(pattern, params)

seems to solve my problem: I'm getting a match now. I haven't yet
verified that the match is correct, though, and I don't understand why I
had to set both flags; I'd expect setting makeBondsGeneric alone to make
it work.
Thank you, Greg!


Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Greg Landrum
This isn't a really straightforward one.

One solution (and I think the best one) is to change the aromaticity model
used to match whatever is generating the hits (in your case it's the Symyx
cartridge).
The RDKit has functionality to do this already when you call the
SetAromaticity() function:

In [29]: m2 = Chem.MolFromMolFile('./CHEMBL25.mol',sanitize=False)

In [30]: Chem.SanitizeMol(m2,Chem.SANITIZE_ALL^Chem.SANITIZE_SETAROMATICITY)
Out[30]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [31]: Chem.SetAromaticity(m2,Chem.AROMATICITY_SIMPLE)


The problem here is that the RDKit doesn't have an aromaticity model for
MDL/Symyx. This would be a useful thing to have and can be done quickly. If
someone can describe the aromaticity model to me, or point me to a
description of it, I can add it for the next release (which happens soon).
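
For reference, the three IPython inputs above can be run as a self-contained
script. This is only a sketch: it assumes a Kekulé SMILES for aspirin in place
of the './CHEMBL25.mol' file, and the count_aromatic helper is a hypothetical
name added here to make the effect visible.

```python
from rdkit import Chem

def count_aromatic(mol):
    """Number of atoms currently flagged as aromatic (helper for this sketch)."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())

# Kekule SMILES of aspirin, parsed without any sanitization (assumed input).
m2 = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(=O)O', sanitize=False)

# Run every sanitization step except aromaticity perception.
Chem.SanitizeMol(m2, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
before = count_aromatic(m2)

# Now perceive aromaticity with an explicitly chosen model.
Chem.SetAromaticity(m2, Chem.AROMATICITY_SIMPLE)
after = count_aromatic(m2)

print(before, after)
```

Before SetAromaticity() no atom is flagged as aromatic; afterwards the six
ring carbons are.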

Another solution that I think would work is to read the query molecule in
without doing aromaticity perception (see input line 30 above) and then to
convert all the bonds to either single-or-aromatic or double-or-aromatic
queries using the approaches described here:
http://rdkit.blogspot.ch/2015/08/tuning-substructure-queries.html
and here:
http://rdkit.blogspot.ch/2016/07/tuning-substructure-queries-ii.html

Unfortunately AdjustQueryParameters doesn't have anything that helps with
the kind of bond queries you need, so you'd need to make the bond changes
in your code. If you want to go down this road and it's not clear how to do
so, let me know and I can post some sample code. I'm afraid it's not
completely trivial with bond queries.
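
One possible way to make those bond changes is sketched below. It is not the
sample code offered above (none was posted in this part of the thread); it
assumes that RWMol.ReplaceBond() is available and borrows the query bonds from
two tiny SMARTS patterns. The function name relax_bond_orders and the fused
lactone SMILES are hypothetical; the latter is only a stand-in, not the actual
structure of CHEMBL1999443.

```python
from rdkit import Chem

# Tiny SMARTS patterns that carry the query bonds we want to copy.
_SINGLE_OR_AROMATIC = Chem.MolFromSmarts('*-,:*')
_DOUBLE_OR_AROMATIC = Chem.MolFromSmarts('*=,:*')

def relax_bond_orders(mol):
    """Copy mol, letting single and double bonds also match aromatic bonds."""
    rw = Chem.RWMol(mol)
    # Record index and type first so no Bond object is held across a replacement.
    plan = [(b.GetIdx(), b.GetBondType()) for b in rw.GetBonds()]
    for idx, btype in plan:
        if btype == Chem.BondType.SINGLE:
            rw.ReplaceBond(idx, _SINGLE_OR_AROMATIC.GetBondWithIdx(0))
        elif btype == Chem.BondType.DOUBLE:
            rw.ReplaceBond(idx, _DOUBLE_OR_AROMATIC.GetBondWithIdx(0))
    return rw.GetMol()

# Read the query without aromaticity perception so its bonds stay single/double.
query = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(=O)O', sanitize=False)
Chem.SanitizeMol(query, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
relaxed = relax_bond_orders(query)

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
# Hypothetical target: aspirin skeleton with the carboxyl group in an aromatic lactone.
fused = Chem.MolFromSmiles('CC(=O)Oc1cccc2ccoc(=O)c12')

print(fused.HasSubstructMatch(aspirin))   # plain aspirin as the query
print(fused.HasSubstructMatch(relaxed))   # relaxed bond queries
print(aspirin.HasSubstructMatch(relaxed))
```

Because atoms in a molecule used as a query are compared by element rather
than by aromaticity flag, only the bonds need to be relaxed here. Triple bonds
are deliberately left untouched.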

-greg




Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Michal Krompiec
I'm afraid it won't work in the general case (i.e. you can make it work for
some classes of compounds, but not without unwanted side effects on others)
if the aromaticity model of the other cartridge is different, and that seems
to be the case here.


Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Michał Nowotka
OK, so what I have is some substructure results from another (non-RDKit)
cartridge, and I want to use RDKit to generate images of all results
with the query substructure highlighted and aligned.
So I have two things: a list of compounds and a query compound.
Now I need to highlight the query compound in every compound from the
list, and I need to do it at all costs. I can't leave any compound
unhighlighted, even if RDKit by default has a different opinion on whether
the query compound really is a true substructure of a given compound.

So how can I instruct RDKit to ignore aromaticity and other factors,
preferably one by one, each time going one level deeper, where the last
resort would be simply matching on the level of two planar graphs? Is
that possible?
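
A rough sketch of such a level-by-level fallback, assuming the
AdjustQueryParameters flags mentioned elsewhere in this thread
(adjustDegree, makeBondsGeneric, makeAtomsGeneric). The function name
tiered_match, the level names, and the fused lactone SMILES are hypothetical;
the SMILES is a stand-in, not the real CHEMBL1999443 structure.

```python
from rdkit import Chem

def tiered_match(target, query):
    """Try increasingly permissive queries; return (level name, atom indices)."""
    levels = [('exact', query)]

    # Level 2: keep the elements, ignore bond types.
    p = Chem.AdjustQueryParameters()
    p.adjustDegree = False
    p.makeBondsGeneric = True
    levels.append(('generic bonds', Chem.AdjustQueryProperties(query, p)))

    # Level 3: ignore elements too, i.e. match the bare connectivity graph.
    p = Chem.AdjustQueryParameters()
    p.adjustDegree = False
    p.makeBondsGeneric = True
    p.makeAtomsGeneric = True
    levels.append(('generic atoms and bonds', Chem.AdjustQueryProperties(query, p)))

    for name, q in levels:
        match = target.GetSubstructMatch(q)
        if match:
            return name, match
    return None, ()

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
fused = Chem.MolFromSmiles('CC(=O)Oc1cccc2ccoc(=O)c12')   # hypothetical target
level, atoms = tiered_match(fused, aspirin)

# Benzene against piperidine only matches at the graph level.
graph_level, graph_atoms = tiered_match(Chem.MolFromSmiles('C1CCNCC1'),
                                        Chem.MolFromSmiles('c1ccccc1'))
print(level, len(atoms), graph_level, len(graph_atoms))
```

The returned atom indices can then be used for highlighting. The more
permissive the level, the more likely it is that the highlighted atoms are not
the ones the original cartridge matched.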


Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
Your course of action depends upon just what you are really trying to do.
If it's only aspirin, then why wouldn't you just do it manually? If it goes
beyond aspirin, you have to start by defining in general terms exactly what
you want to match to what.

For example, given a query molecule (aspirin in this case), if you want all
its non-aromatic atoms to match aromatic as well as non-aromatic atoms in
the database, you could write a string-alteration routine to munge the
SMILES of a query molecule into a SMARTS that would do just that, and then
use that SMARTS to match your database molecules. Repeat for each query
molecule.

But you have to start with a precise definition of just what kind of
matching you wish to do. For instance, maybe you don't really want
non-aromatic ring atoms in your query to match aromatic rings and vice
versa (i.e., a cyclohexyl to match a phenyl); maybe you only want non-ring
atoms in the query to match aliphatic as well as aromatic substructures.
And so on.

-P.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Michał Nowotka
Is there any flag in RDkit to match both 'normal' aspirin and embedded
aromatic analogues?
The problem is that I can't modify user queries by hand in real time :)



Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Chris Earnshaw
Hi

The problem is due to RDKit perceiving the embedded pyranone in
CHEMBL1999443 as an aromatic system, which is probably correct. However, in
the structure of aspirin the carboxyl carbon and singly bonded oxygen are
non-aromatic, so if you just use the SMILES of aspirin as a query it won't
match CHEMBL1999443.

You'll need to use a slightly more generic aspirin-like query to allow the
possibility of matching both 'normal' aspirin and embedded aromatic
analogues. CC(=O)Oc1ccccc1[#6](=O)[#8] should work OK.

Regards,
Chris
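
A small sketch of how such a generic query behaves, assuming the ring in the
query is the full benzene ring (c1ccccc1). The fused lactone SMILES is a
hypothetical stand-in for an embedded aromatic analogue, not the actual
structure of CHEMBL1999443.

```python
from rdkit import Chem

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
# Hypothetical analogue: the carboxyl C and O are part of an aromatic lactone ring.
fused = Chem.MolFromSmiles('CC(=O)Oc1cccc2ccoc(=O)c12')

# Aspirin SMILES read as SMARTS: 'C' and 'O' only match aliphatic atoms.
strict = Chem.MolFromSmarts('CC(=O)Oc1ccccc1C(=O)O')
# Generic version: [#6] and [#8] match carbon and oxygen whether aromatic or not.
generic = Chem.MolFromSmarts('CC(=O)Oc1ccccc1[#6](=O)[#8]')

results = {
    'strict/aspirin': aspirin.HasSubstructMatch(strict),
    'strict/fused': fused.HasSubstructMatch(strict),
    'generic/aspirin': aspirin.HasSubstructMatch(generic),
    'generic/fused': fused.HasSubstructMatch(generic),
}
print(results)
```

The unspecified bonds in the SMARTS already mean "single or aromatic", so only
the two atom primitives had to be generalized.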

On 13 September 2017 at 13:40, Michał Nowotka  wrote:

> Hi,
>
> This problem is probably due to my lack of chemistry knowledge but
> please have a look:
>
> If I do a substructure search in ChEMBL using aspirin (CHEMBL25) as a
> query (ChEMBL API uses the Symyx cartridge):
>
> from chembl_webresource_client.new_client import new_client
> res = new_client.substructure.filter(chembl_id='CHEMBL25')
>
> One of them will be CHEMBL1999443:
>
> 'CHEMBL1999443' in (r['molecule_chembl_id'] for r in res)
> >>> True
>
> Now I take the molfile:
>
> new_client.molecule.set_format('mol')
> mol = new_client.molecule.get('CHEMBL1999443')
>
> and load it with aspirin into rdkit:
>
> from rdkit import Chem
> m = Chem.MolFromMolBlock(mol)
> pattern = Chem.MolFromMolBlock(new_client.molecule.get('CHEMBL25'))
>
> If I check if it has an aspirin as a substructure using rdkit, I'm
> getting false...
>
> m.HasSubstructMatch(pattern)
> >>> False
>
> Looking at this blog post:
> https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/002_SMARTS_SubstructureMatching.ipynb
> I tried to initialize rings and retry:
>
>  Chem.GetSymmSSSR(m)
>  m.HasSubstructMatch(pattern)
>  >>>False
>
> Chem.GetSymmSSSR(pattern)
> m.HasSubstructMatch(pattern)
> >>>False
>
> But as you can see, without any luck. Is there anything else I can do
> to get the match anyway?
> Without a match I can't align and highlight the aspirin substructure
> in the CHEMBL1999443 image using the GenerateDepictionMatching2DStructure
> and DrawMolecule functions.
>
> Kind regards,
>
> Michał Nowotka
>