Re: [Rdkit-discuss] how to output multiple Kekule structures
Hi Jason, Jim,

You are right; I think an alternative could be the following:

>>> from rdkit import Chem
>>> suppl = Chem.ResonanceMolSupplier(Chem.MolFromSmiles('c1ccccc1'), Chem.KEKULE_ALL)
>>> q = Chem.MolFromSmarts('C-C=C')
>>> for mol in suppl:
...     molCopy = Chem.Mol(mol)
...     for a in molCopy.GetAtoms():
...         a.SetIsAromatic(False)
...     for b in molCopy.GetBonds():
...         b.SetIsAromatic(False)
...     print(molCopy.GetSubstructMatches(q))
...
((0, 5, 4), (1, 2, 3), (2, 1, 0), (3, 4, 5), (4, 3, 2), (5, 0, 1))
((0, 1, 2), (1, 0, 5), (2, 3, 4), (3, 2, 1), (4, 5, 0), (5, 4, 3))
>>>

Cheers,
p.
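Paolo's loop can be wrapped in a small helper so that any aliphatic SMARTS can be tried against every Kekulé form. This is only a sketch, assuming a Python 3 RDKit installation; the name kekule_matches is made up for illustration and is not part of RDKit.

```python
from rdkit import Chem


def kekule_matches(smiles, smarts):
    """Return one tuple of substructure matches per enumerated Kekulé form.

    kekule_matches is a hypothetical helper name, not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    results = []
    for res_mol in Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL):
        mol_copy = Chem.Mol(res_mol)
        # Clear the aromatic flags so aliphatic SMARTS atoms such as 'C' can match.
        for atom in mol_copy.GetAtoms():
            atom.SetIsAromatic(False)
        for bond in mol_copy.GetBonds():
            bond.SetIsAromatic(False)
        results.append(mol_copy.GetSubstructMatches(query))
    return results


# Benzene with the 'C-C=C' query used in the message above.
benzene_hits = kekule_matches('c1ccccc1', 'C-C=C')
for hits in benzene_hits:
    print(hits)
```

Working on a copy (Chem.Mol(res_mol)) keeps the molecules held by the supplier untouched, which matters if they are reused later with their aromatic flags intact.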
Re: [Rdkit-discuss] how to output multiple Kekule structures
But keep in mind that the kekulized mols you create with the resonance
supplier will not match the SMARTS patterns given.

Chem.MolToSmiles(mol2, kekuleSmiles=True)
> 'C1C=CC=CC=1'

mol2.HasSubstructMatch(Chem.MolFromSmarts('[C]=[C]-[C]'))
> False

mol2.HasSubstructMatch(Chem.MolFromSmarts('[c]=[c]-[c]'))
> True

So at the very least, you need to change the SMARTS strings to use [#6]
instead of [C].

Jason Biggs
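The effect Jason describes can also be reproduced without the resonance supplier. The sketch below assumes that Chem.Kekulize with clearAromaticFlags=False leaves benzene in the same state as the supplier's output (explicit single and double bonds, aromatic flags still set); that equivalence is an assumption made for illustration, not something stated in the thread. The variable names are likewise illustrative.

```python
from rdkit import Chem

# Benzene with explicit single/double bonds but the aromatic flags left in place
# (assumed to mimic the molecules returned by the resonance supplier).
flagged = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(flagged, clearAromaticFlags=False)

# The same molecule with the aromatic flags cleared during kekulization.
cleared = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(cleared, clearAromaticFlags=True)

# Aliphatic carbon, aromatic carbon, and "any carbon" versions of the same query.
smarts_list = ['[C]=[C]-[C]', '[c]=[c]-[c]', '[#6]=[#6]-[#6]']

flagged_hits = {s: flagged.HasSubstructMatch(Chem.MolFromSmarts(s)) for s in smarts_list}
cleared_hits = {s: cleared.HasSubstructMatch(Chem.MolFromSmarts(s)) for s in smarts_list}

for s in smarts_list:
    print(s, flagged_hits[s], cleared_hits[s])
```

Only the [#6] form is indifferent to the aromatic flags, which is why it is the safer choice when the input may or may not have had its flags cleared.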
Re: [Rdkit-discuss] how to output multiple Kekule structures
Paolo,

Exactly what I was looking for. Very helpful. Thank you.

Regards,
Jim Metz
Re: [Rdkit-discuss] how to output multiple Kekule structures
Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():

from rdkit import Chem
mol = Chem.MolFromSmiles('c1ccccc1')
suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
len(suppl)
2
for i in range(len(suppl)):
    print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
C1C=CC=CC=1
C1=CC=CC=C1

Best,
Paolo
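For reuse, the enumeration Paolo shows can be collected into a list. This is a minimal sketch, assuming a Python 3 RDKit installation; kekule_smiles is an illustrative name and not an RDKit function.

```python
from rdkit import Chem


def kekule_smiles(smiles):
    """List a Kekulé SMILES for every form returned by ResonanceMolSupplier.

    kekule_smiles is a hypothetical helper name, not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
    # Index-based access, as in the message above.
    return [Chem.MolToSmiles(suppl[i], kekuleSmiles=True) for i in range(len(suppl))]


benzene_forms = kekule_smiles('c1ccccc1')
print(len(benzene_forms))
for smi in benzene_forms:
    print(smi)
```

Whether the two strings differ textually may depend on the RDKit version, so it is safer to rely on the molecules themselves than on the SMILES text when the individual bond assignments matter.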
Re: [Rdkit-discuss] how to output multiple Kekule structures
Greg,

Thanks! Yes, very helpful. I will need to digest the detailed information
you have provided. I am somewhat familiar with recursive SMARTS. Thanks
again.

Regards,
Jim Metz
Re: [Rdkit-discuss] how to output multiple Kekule structures
On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jamestm...@aol.com> wrote:

> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules that can be aromatic, and I need to be able to handle cases
> where there can be differences in the way that the molecule was entered
> or drawn by a user.

That particular problem is a big part of the reason that we tend to use
the aromatic representation of things.

> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'

Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

In [14]: q2 = Chem.MolFromSmarts('cccn')

In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]

In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]

In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]

In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]

Those particular queries were going for the aromatic species and will only
match inside the ring, but if you want to be more generic you could tune
your queries like this:

In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')

In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')

In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]

In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]

In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]

In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]

If you aren't familiar with recursive SMARTS, this construct:
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
aromatic bond to another atom". So you can interpret q3 as "four carbons
that each have either a double or aromatic bond and that are connected to
each other by single, double, or aromatic bonds".

Is this starting to approximate what you're looking for?

-greg

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
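Two points from Greg's reply can be checked in isolation: both Kekulé drawings are parsed to the same aromatic molecule, and the recursive SMARTS atom on its own picks out every carbon with a double or aromatic bond. The following is a sketch rather than Greg's code; the names unsat_carbon and unsaturated_carbons are introduced here purely for illustration.

```python
from rdkit import Chem

# The two Kekulé drawings of the vinylpyridine from the thread (m2 with the typo fixed).
m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

# Both inputs are aromatized on parsing, so their canonical SMILES agree.
same_molecule = Chem.MolToSmiles(m1) == Chem.MolToSmiles(m2)

# Recursive SMARTS: a carbon that has a double or aromatic bond to any atom.
unsat_carbon = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]')


def unsaturated_carbons(mol):
    """Sorted indices of carbons matched by the recursive SMARTS (hypothetical helper)."""
    return sorted(idx for (idx,) in mol.GetSubstructMatches(unsat_carbon))


print(same_molecule)
print(unsaturated_carbons(m1))
print(unsaturated_carbons(m2))
print(unsaturated_carbons(Chem.MolFromSmiles('CCC')))  # propane: no unsaturated carbon
```

Because the atom order is the same in both input SMILES, the matched indices can be compared directly between m1 and m2.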
Re: [Rdkit-discuss] how to output multiple Kekule structures
Greg,

I need to be able to use SMARTS patterns to identify substructures in
molecules that can be aromatic, and I need to be able to handle cases
where there can be differences in the way that the molecule was entered
or drawn by a user.

For example, consider the following alkenyl-substituted pyridine; there
are two possible Kekule structures:

m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'

Now consider two SMARTS patterns:

pattern1 = '[C]=[C]-[C]=[C]'
pattern2 = '[C]=[C]-[C]=[N]'

I need to be able to detect the existence of each pattern in the molecule.

If m1 is the only available generated Kekule structure, then pattern2
will be recognized.
If m2 is the only available generated Kekule structure, then pattern1
will be recognized.

Hence, I am getting different answers for the same input molecule just
because it was drawn in different Kekule structures.

Regards,
Jim Metz
Re: [Rdkit-discuss] how to output multiple Kekule structures
Hi Jim,

The code currently has no way to enumerate Kekule structures. I don't
recall this coming up in the past and, to be honest, it doesn't seem all
that generally useful.

Perhaps there's an alternate way to solve the problem; what are you trying
to do?

-greg

On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in an aromatic SMILES e.g., for benzene
>
> c1ccccc1
>
> I would like to generate the major canonical resonance forms
> and save the results as two separate molecules. Essentially
> I am trying to generate
>
> m1 = 'C1=CC=CC-C1'
> m2 = 'C1C=CC=CC1'
>
> Can this be done in RDKit? I have found a KEKULE_ALL
> option in the detailed documentation which seems to be what I
> am trying to do, but I don't understand how this option is to be used,
> or the proper syntax.
>
> If it is necessary to somehow renumber the atoms and re-generate
> Kekule structures, that is OK. Thank you.
>
> Regards,
> Jim Metz