Re: [Rdkit-discuss] Another Can't kekulize mol observation
Hi Greg:

Your suggestions pointed me in the right direction. Thank you very much.

I tried your example, but it seems (?) that the method GetAtomWithIdx() is not available for edit_mol. So I stored a neighbor atom that is an N atom, is aromatic, and does not belong to the same ring. I then applied the SetNumExplicitHs() method on the mol.

Cheers,
Markus

On Thu, Apr 27, 2017 at 8:07 AM, Greg Landrum wrote:
> [...]

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
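The neighbor-based approach described in this message might look roughly like the sketch below. It is an illustration under assumptions, not the code from the notebook: the helper name `remove_exocyclic_atom` and the 4-methyl-1,2,4-triazole input are hypothetical, and it uses `Chem.RWMol`, which (unlike `Chem.EditableMol`) exposes `GetAtomWithIdx()`. The explicit H is set before the atom is removed, so no atom-index bookkeeping is needed after the removal.

```python
from rdkit import Chem


def remove_exocyclic_atom(mol, atom_idx):
    """Hypothetical helper: delete one atom and patch aromatic N neighbors."""
    rwmol = Chem.RWMol(mol)
    atom = rwmol.GetAtomWithIdx(atom_idx)
    for nbr in atom.GetNeighbors():
        bond = rwmol.GetBondBetweenAtoms(atom_idx, nbr.GetIdx())
        # Only aromatic N atoms that lose an exocyclic (non-ring) bond need
        # an explicit H; other neighbors get implicit Hs on sanitization.
        if nbr.GetAtomicNum() == 7 and nbr.GetIsAromatic() and not bond.IsInRing():
            nbr.SetNumExplicitHs(nbr.GetNumExplicitHs() + 1)
    rwmol.RemoveAtom(atom_idx)
    Chem.SanitizeMol(rwmol)  # raises if the result still cannot be kekulized
    return rwmol.GetMol()


# Assumed test input: 4-methyl-4H-1,2,4-triazole; atom 0 is the methyl carbon.
mol = Chem.MolFromSmiles('Cn1cnnc1')
scaffold = remove_exocyclic_atom(mol, 0)
print(Chem.MolToSmiles(scaffold))
```

The helper removes a single atom only; for larger substituents it would have to be called once per atom (highest index first) or extended, which is left out here.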
Re: [Rdkit-discuss] Another Can't kekulize mol observation
Hi Markus,

The general rule of thumb is that if you remove an exocyclic neighbor from an aromatic heteroatom you need to add an "explicit H" to the heteroatom. Here's a modification of one of your pieces of code that adds that H as an atom that's actually in the graph:

    from rdkit import Chem

    # mol is the molecule from your notebook
    # use ReplaceAtom:
    Hatom = Chem.MolFromSmiles('[H]').GetAtomWithIdx(0)

    atidx = 8  # index of the atom to replace

    edit_mol = Chem.EditableMol(mol)

    edit_mol.ReplaceAtom(atidx, Hatom)
    scaffold = edit_mol.GetMol()

    scaffold_smiles = Chem.MolToSmiles(scaffold)
    print(scaffold_smiles)

The change relative to what you were doing is the use of MolFromSmiles() to get the H.

A more efficient approach, which has the advantage of not leaving extra H atoms in the molecule that then need to be removed, is to add the "explicit H" to the atom:

    atidx = 8   # index of the atom to remove
    nbrIdx = 7  # index of its aromatic heteroatom neighbor

    edit_mol = Chem.RWMol(mol)

    edit_mol.RemoveAtom(atidx)
    edit_mol.GetAtomWithIdx(nbrIdx).SetNumExplicitHs(1)
    scaffold = edit_mol.GetMol()

    scaffold_smiles = Chem.MolToSmiles(scaffold)
    print(scaffold_smiles)

This produces:

    c1ccc(cc1)-c1n[nH]c(n1)-c1c1

I hope that helps
-greg

On Thu, Apr 27, 2017 at 4:53 PM, Markus Metz wrote:
> [...]
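For readers without the original notebook, the rule of thumb above can be tried on a small stand-in molecule. This is a minimal sketch under assumptions: the input SMILES (4-methyl-4H-1,2,4-triazole) and its atom indices are chosen for illustration and are not the molecule from the thread, and `catchErrors=True` is used so that `Chem.SanitizeMol` returns the failed operation instead of raising.

```python
from rdkit import Chem

# Assumed stand-in molecule: 4-methyl-4H-1,2,4-triazole, written so that the
# methyl carbon is the last atom (index 5) and its aromatic N neighbor is 4.
mol = Chem.MolFromSmiles('c1nncn1C')
atidx, nbrIdx = 5, 4

# Without the explicit H, sanitization reports a failed operation
# (kekulization is the expected culprit).
bare = Chem.RWMol(mol)
bare.RemoveAtom(atidx)
failed_op = Chem.SanitizeMol(bare, catchErrors=True)
print(failed_op)

# With the explicit H on the aromatic N, sanitization succeeds.
fixed = Chem.RWMol(mol)
fixed.RemoveAtom(atidx)
fixed.GetAtomWithIdx(nbrIdx).SetNumExplicitHs(1)
status = Chem.SanitizeMol(fixed, catchErrors=True)
scaffold = fixed.GetMol()
print(status, Chem.MolToSmiles(scaffold))
```

Removing the highest-indexed atom keeps the neighbor's index unchanged, which mirrors the `atidx = 8`, `nbrIdx = 7` choice in the message.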
Re: [Rdkit-discuss] Another Can't kekulize mol observation
Hello all:

Thank you very much for your messages.

As I would like to process many molecules, manually editing SMILES is unfortunately not an option.

Therefore I tried to automate this step using the ReplaceAtom method of the EditableMol class. I defined an Hatom and tried to use it. Upon executing the attached notebook, the input molecule is unchanged.

Do you have any other suggestions that might help answer my question?

Best,
Markus

On Wed, Apr 26, 2017 at 11:46 PM, Peter S. Shenkin wrote:
> [...]

Cannot_Kekulize.ipynb
Description: Binary data
Re: [Rdkit-discuss] Another Can't kekulize mol observation
I would just replace 'n' with '[nH]' in your existing SMILES, for the N you want the H on.

-P.

On Thu, Apr 27, 2017 at 12:32 AM, Hongbin Yang wrote:
> [...]
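The effect of that replacement can be checked on a bare triazole ring. This is a small sketch, assuming an unsubstituted 1,2,4-triazole rather than the molecule from the thread:

```python
from rdkit import Chem

# Assumed minimal example: an unsubstituted 1,2,4-triazole ring.
ambiguous = Chem.MolFromSmiles('c1nncn1')    # no H position given
explicit = Chem.MolFromSmiles('c1nnc[nH]1')  # H placed on one N

# Parsing the ambiguous form fails: RDKit logs a kekulization error
# and returns None instead of a molecule.
print(ambiguous is None)
print(Chem.MolToSmiles(explicit))
```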
Re: [Rdkit-discuss] Another Can't kekulize mol observation
Hi Markus,

"c1ccc(cc1)-c1nnc(n1)-c1c1" is different from "c1ccc(cc1)-c1nncn1-c1c1", so you cannot remove the parentheses.

The error "Can't kekulize mol." is caused by the triazole in your molecule.

"c1nncn1" says that the ring is aromatic, but it does not say where the H is.

For example, "C1=NN=CN1" is 4H-1,2,4-triazole and "C1=NC=NN1" is 1H-1,2,4-triazole. They have different Kekulé structures, but both can be represented by "c1nncn1".

There are two solutions I suggest:

1. Use `Chem.MolFromSmiles('c1ccc(cc1)-c1nnc(n1)-c1c1',False)` (reference: http://www.rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromSmiles)

2. Manually kekulize it: `Chem.MolFromSmiles('c1ccc(cc1)-C1=NN=C(N1)-c1c1')`. This indicates that the H is on the 4'N.

Hongbin Yang

From: Markus Metz
Date: 2017-04-27 09:30
To: RDKit Discuss
Subject: [Rdkit-discuss] Another Can't kekulize mol observation

Hello all:

I obtained this smiles string:
c1ccc(cc1)-c1nnc(n1)-c1c1
by removing atoms from the n1 in parentheses.

Using:
mol = Chem.MolFromSmiles("c1ccc(cc1)-c1nnc(n1)-c1c1")
throws an error: Can't kekulize mol.

Using
mol = Chem.MolFromSmiles("c1ccc(cc1)-c1nncn1-c1c1")
works fine.

Is there any workaround?
Any input is highly appreciated.

Cheers,
Markus
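Both suggestions can be tried on the bare ring. The sketch below assumes an unsubstituted 1,2,4-triazole as a stand-in and passes the second argument of `MolFromSmiles` by its keyword name, `sanitize`:

```python
from rdkit import Chem

# Option 1: skip sanitization. The molecule is built, but it is not kekulized
# or otherwise checked, so many downstream calls may not behave as usual.
raw = Chem.MolFromSmiles('c1nncn1', sanitize=False)
print(raw.GetNumAtoms())

# Option 2: Kekulé input fixes the H position; the two tautomers stay distinct.
taut_4h = Chem.MolFromSmiles('C1=NN=CN1')  # 4H-1,2,4-triazole
taut_1h = Chem.MolFromSmiles('C1=NC=NN1')  # 1H-1,2,4-triazole
print(Chem.MolToSmiles(taut_4h), Chem.MolToSmiles(taut_1h))
```

Option 1 only postpones the problem: the H position still has to be decided before the molecule can be sanitized, whereas option 2 settles it in the input.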