Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-12 Thread Paolo Tosco
Hi Jason, Jim,

You are right; I think an alternative could be the following:

>>> from rdkit import Chem
>>> suppl = Chem.ResonanceMolSupplier(Chem.MolFromSmiles('c1ccccc1'), Chem.KEKULE_ALL)
>>> q = Chem.MolFromSmarts('C-C=C')
>>> for mol in suppl:
...   molCopy = Chem.Mol(mol)
...   # clear the aromatic flags so that aliphatic SMARTS atoms (C) can match
...   for a in molCopy.GetAtoms():
...     a.SetIsAromatic(False)
...   for b in molCopy.GetBonds():
...     b.SetIsAromatic(False)
...   print(molCopy.GetSubstructMatches(q))
...
((0, 5, 4), (1, 2, 3), (2, 1, 0), (3, 4, 5), (4, 3, 2), (5, 0, 1))
((0, 1, 2), (1, 0, 5), (2, 3, 4), (3, 2, 1), (4, 5, 0), (5, 4, 3))
>>>
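
Building on the loop above, here is a minimal sketch of a helper that reports whether a SMARTS pattern matches in at least one enumerated Kekule form. The function name matches_any_kekule_form is made up for illustration (it is not an RDKit API), and the 2-vinylpyridine SMILES is only a test input:

```python
from rdkit import Chem


def matches_any_kekule_form(mol, smarts):
    """Return True if `smarts` matches at least one Kekule form of `mol`.

    Hypothetical helper (not an RDKit API); it reuses the flag-clearing
    trick from the loop above.
    """
    query = Chem.MolFromSmarts(smarts)
    for res_mol in Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL):
        mol_copy = Chem.Mol(res_mol)
        # Clear aromatic flags so aliphatic SMARTS atoms like [C] can match.
        for atom in mol_copy.GetAtoms():
            atom.SetIsAromatic(False)
        for bond in mol_copy.GetBonds():
            bond.SetIsAromatic(False)
        if mol_copy.HasSubstructMatch(query):
            return True
    return False


vinylpyridine = Chem.MolFromSmiles('C=Cc1ccccn1')
has_diene = matches_any_kekule_form(vinylpyridine, '[C]=[C]-[C]=[C]')
has_azadiene = matches_any_kekule_form(vinylpyridine, '[C]=[C]-[C]=[N]')
print(has_diene, has_azadiene)
```

Returning on the first hit keeps the check cheap when only presence matters; collecting the matches per structure would need the loop to run to the end instead.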

Cheers,
p.


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Jason Biggs
But keep in mind that the kekulized mols you create with the resonance
supplier will not match the SMARTS patterns given.

Chem.MolToSmiles(mol2, kekuleSmiles = True)

>'C1C=CC=CC=1'


mol2.HasSubstructMatch(Chem.MolFromSmarts('[C]=[C]-[C]'))

> False

mol2.HasSubstructMatch(Chem.MolFromSmarts('[c]=[c]-[c]'))

> True

So at the very least, you need to change the SMARTS strings to use [#6]
instead of [C].
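
A minimal sketch of that suggestion, assuming benzene as the input (the variable names are illustrative). [#6] selects carbon by atomic number, so it does not depend on whether the atom is still flagged aromatic:

```python
from rdkit import Chem

benzene = Chem.MolFromSmiles('c1ccccc1')
# [#6] matches carbon by atomic number, regardless of the aromatic flag.
query = Chem.MolFromSmarts('[#6]=[#6]-[#6]')

# One boolean per Kekule structure returned by the resonance supplier.
results = [res_mol.HasSubstructMatch(query)
           for res_mol in Chem.ResonanceMolSupplier(benzene, Chem.KEKULE_ALL)]
print(results)
```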



Jason Biggs



Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Paolo,


Exactly what I was looking for.  Very helpful.  Thank you.

Regards,
Jim Metz






Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Paolo Tosco

Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():


from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')

suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

len(suppl)

2

for i in range(len(suppl)):
    print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))

C1C=CC=CC=1
C1=CC=CC=C1
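
If the SMILES strings are hard to tell apart, the bond orders can be compared directly. This is only a sketch under the assumption that the supplier returns structures with explicit single and double bonds, as the output above suggests; the variable names are illustrative:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')
suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

# One tuple of bond orders (1.0 = single, 2.0 = double) per Kekule structure,
# listed in bond-index order.
bond_orders = [tuple(b.GetBondTypeAsDouble() for b in res_mol.GetBonds())
               for res_mol in suppl]
for orders in bond_orders:
    print(orders)
```

Because the atom and bond indices are shared between the structures, the two tuples can be compared position by position to see which bonds swap between single and double.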

 


Best,
Paolo


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


Thanks!  Yes, very helpful.  I will need to digest the detailed information
you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.


Regards,
Jim Metz






Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Greg Landrum
On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jamestm...@aol.com> wrote:

> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules
> that can be aromatic, and I need to be able to handle cases where there
> can be
> differences in the way that the molecule was entered or drawn by a user.
>

That particular problem is a big part of the reason that we tend to use the
aromatic representation of things.


> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'
>

Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

In [14]: q2 = Chem.MolFromSmarts('cccn')

In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]

In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]

In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]

In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]
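
The identical results for m1 and m2 are what one would expect if both Kekule inputs are perceived as the same aromatic molecule. A minimal sketch to check this (the variable names are illustrative and not part of the session above):

```python
from rdkit import Chem

# Two Kekule drawings of the same alkenyl-substituted pyridine.
kekule_a = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
kekule_b = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

# Canonical SMILES are written from the aromatic form by default.
canonical_a = Chem.MolToSmiles(kekule_a)
canonical_b = Chem.MolToSmiles(kekule_b)
print(canonical_a == canonical_b)

# After sanitization the ring atoms carry the aromatic flag in both cases.
ring_flags_a = [atom.GetIsAromatic() for atom in kekule_a.GetAtoms()]
ring_flags_b = [atom.GetIsAromatic() for atom in kekule_b.GetAtoms()]
print(ring_flags_a == ring_flags_b)
```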


Those particular queries were going for the aromatic species and will only
match inside the ring, but if you want to be more generic you could tune
your queries like this:

In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')

In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')

In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]

In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]

In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]

In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]

If you aren't familiar with recursive SMARTS, this construct:
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
aromatic bond to another atom".  So you can interpret q3 as "four carbons
that each have either a double or aromatic bond and that are connected to
each other by single, double, or aromatic bonds".
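
To see the recursive construct in isolation, here is a small sketch. The single-atom query is the construct quoted above; the four-atom query is a simplified variant written for this example (an assumption, not the exact q4 from the session), and ethane is added only as a negative control:

```python
from rdkit import Chem

m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
ethane = Chem.MolFromSmiles('CC')

# A carbon with a double or aromatic bond to any neighbour.
unsat_carbon = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]')
n_m1 = len(m1.GetSubstructMatches(unsat_carbon))
n_ethane = len(ethane.GetSubstructMatches(unsat_carbon))
print(n_m1, n_ethane)

# Three such carbons followed by a nitrogen that also has a double or
# aromatic bond (simplified variant of q4, written for this sketch).
chain = Chem.MolFromSmarts(
    '[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]'
    '-,=,:[#7;$([#7]=,:[*])]')
match_m1 = m1.GetSubstructMatch(chain)
match_m2 = m2.GetSubstructMatch(chain)
print(match_m1 == match_m2)
```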

Is this starting to approximate what you're looking for?
-greg




--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


I need to be able to use SMARTS patterns to identify substructures in
molecules that can be aromatic, and I need to be able to handle cases where
there can be differences in the way that the molecule was entered or drawn
by a user.


For example, consider the following alkenyl-substituted pyridine; there
are two possible Kekule structures:


m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'


Now consider two SMARTS

pattern1 = '[C]=[C]-[C]=[C]'
pattern2 = '[C]=[C]-[C]=[N]'

I need to be able to detect the existence of each pattern in the molecule.

If m1 is the only available generated Kekule structure, then pattern2 will
be recognized.
If m2 is the only available generated Kekule structure, then pattern1 will
be recognized.

Hence, I am getting different answers for the same input molecule just
because it was drawn in different Kekule structures.


Regards,

Jim Metz











Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Greg Landrum
Hi Jim,

The code currently has no way to enumerate Kekule structures. I don't
recall this coming up in the past and, to be honest, it doesn't seem all
that generally useful.

Perhaps there's an alternate way to solve the problem; what are you trying
to do?

-greg


On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in an aromatic SMILES e.g., for benzene
>
> c1ccccc1
>
> I would like to generate the major canonical resonance forms
> and save the results as two separate molecules.  Essentially
> I am trying to generate
>
> m1 = 'C1=CC=CC-C1'
> m2 = 'C1C=CC=CC1'
>
> Can this be done in RDKit?  I have found a KEKULE_ALL
> option in the detailed documentation which seems to be what I
> am trying to do, but I don't understand how this option is to be used,
> or the proper syntax.
>
> If it is necessary to somehow renumber the atoms and re-generate
> Kekule structures, that is OK.  Thank you.
>
> Regards,
> Jim Metz