Re: [Rdkit-discuss] Preserving hydrogens necessary for imine cis/trans stereochemistry?
There isn't currently any easy way to do this. One workaround that could make the manual approach a bit less painful would be to find those Hs, convert them into Ds, call RemoveHs(), and then convert the Ds back into Hs.

It seems like it does make sense to modify the RemoveHs() function:
https://github.com/rdkit/rdkit/issues/1419

As mentioned in the issue report, this will lead to this oddity:

    >>> Chem.CanonSmiles('C/C(F)=N/[H]')
    '[H]/N=C(/C)F'
    >>> Chem.CanonSmiles('C/C(F)=N[H]')
    'CC(=N)F'

I can't think of a precedent for that in the RDKit, but I think it's better to add that oddity than to modify all of the code managing cis/trans double bonds to be able to handle using the implicit H.

Does anyone have objections to this approach to fixing the problem?

As an aside, while looking into this I noticed that RemoveHs() also removes H atoms that have atom map information. This probably also shouldn't happen. That's now here:
https://github.com/rdkit/rdkit/issues/1420

-greg

On Thu, May 18, 2017 at 2:54 AM, Brian Cole wrote:
> Is there a recommended way in RDKit to preserve hydrogens necessary for
> representing cis/trans stereochemistry of imines?

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
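The deuterium workaround described in this reply might look roughly like the following. This is only a sketch, not code from the thread: the helper name `remove_hs_keep_stereo_hs` and the `_stereoH` property tag are made up for illustration, and it assumes that double-bond stereo has already been perceived (as it normally is after SMILES or SDF parsing) and that RemoveHs() leaves isotope-labelled hydrogens alone.

```python
from rdkit import Chem

TAG = "_stereoH"  # hypothetical property name, used only by this sketch


def remove_hs_keep_stereo_hs(mol):
    """Remove Hs except those attached to the ends of stereo double bonds."""
    mol = Chem.Mol(mol)  # work on a copy, leave the input untouched
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        if bond.GetStereo() == Chem.BondStereo.STEREONONE:
            continue
        for end in (bond.GetBeginAtom(), bond.GetEndAtom()):
            for nbr in end.GetNeighbors():
                if nbr.GetAtomicNum() == 1 and nbr.GetIsotope() == 0:
                    nbr.SetIsotope(2)           # H -> D so RemoveHs() skips it
                    nbr.SetBoolProp(TAG, True)  # remember which Ds were Hs
    mol = Chem.RemoveHs(mol)
    for atom in mol.GetAtoms():
        if atom.HasProp(TAG):
            atom.SetIsotope(0)                  # D -> H again
            atom.ClearProp(TAG)
    return mol


# Small imine from the thread, written with all hydrogens explicit and
# parsed with removeHs disabled so that they survive parsing.
params = Chem.SmilesParserParams()
params.removeHs = False
imine = Chem.MolFromSmiles("[H]C([H])([H])/C(F)=N/[H]", params)
print(Chem.MolToSmiles(remove_hs_keep_stereo_hs(imine)))
```

Tagging the converted atoms with a property, rather than converting every D back afterwards, is a choice made here so that genuine deuterium labels in the input are not turned into plain hydrogens.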
[Rdkit-discuss] Preserving hydrogens necessary for imine cis/trans stereochemistry?
Is there a recommended way in RDKit to preserve hydrogens necessary for representing cis/trans stereochemistry of imines?

For example, given the attached SDF I need to keep explicit hydrogens in the output SMILES string to preserve the imine cis/trans stereochemistry.

    mol = next(Chem.ForwardSDMolSupplier(open('ZINC23714507.sdf', 'rb'), removeHs=False))
    print(Chem.MolToSmiles(mol, True))

Yields the correct but ugly SMILES:

    [H]/N=C(/[H])C([H])([H])N(C([H])([H])/C([H])=N\[H])S(=O)(=O)c1c([H])c([H])c(C(=O)N([H])c2sc3c(c2C(=O)N([H])C([H])([H])[H])C([H])([H])C([H])([H])[N+]([H])(C([H])([H])C([H])([H])C([H])([H])[H])C3([H])[H])c([H])c1[H]

RemoveHs is too heavy a hammer as it removes my cis/trans stereo:

    print(Chem.MolToSmiles(Chem.RemoveHs(mol), True))

    CCC[NH+]1CCc2c(sc(NC(=O)c3ccc(S(=O)(=O)N(CC=N)CC=N)cc3)c2C(=O)NC)C1

I can write the explicit loop myself to remove only the hydrogens that are not part of the stereochemistry, but it seems like this might be functionality buried somewhere in the RDKit.

Thanks,
Brian

Attachment: ZINC23714507.sdf
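For readers on a more recent RDKit than the one discussed in this thread, the behaviour asked about here may be controllable directly. The sketch below assumes your installation exposes `Chem.RemoveHsParameters` with a `removeDefiningBondStereo` field; that API postdates this message, so treat it as an assumption to check against your version. The helper name `smiles_after_remove_hs` is made up for illustration.

```python
from rdkit import Chem


def smiles_after_remove_hs(smiles, strip_stereo_hs=False):
    """Parse SMILES with Hs kept, run RemoveHs(), return the new SMILES."""
    parser = Chem.SmilesParserParams()
    parser.removeHs = False  # keep the explicit [H] atoms while parsing
    mol = Chem.MolFromSmiles(smiles, parser)

    params = Chem.RemoveHsParameters()
    # Assumed field: when False, Hs that define double-bond stereo are kept.
    params.removeDefiningBondStereo = strip_stereo_hs
    return Chem.MolToSmiles(Chem.RemoveHs(mol, params))


print(smiles_after_remove_hs("C/C(F)=N/[H]"))
print(smiles_after_remove_hs("C/C(F)=N/[H]", strip_stereo_hs=True))
```

Comparing the two printed strings shows whether the imine hydrogen, and with it the cis/trans information, survives hydrogen removal in your installation.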
Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?
Dear All,

In case it helps, there is a wealth of functional groups already available in RDKit here:

https://github.com/rdkit/rdkit/blob/master/Data/Functional_Group_Hierarchy.txt

For instance, the functional group halogen pattern we use is a bit more complicated:

    [$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]

The hierarchy can (1) help you write your own patterns and (2) be used (from Python) as follows:

    from __future__ import print_function
    from rdkit import Chem
    from rdkit.Chem import FilterCatalog

    queryDefs = FilterCatalog.GetFlattenedFunctionalGroupHierarchy()
    smiles = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
    mol = Chem.MolFromSmiles(smiles)
    items = sorted(queryDefs.items())
    for name, pat in items:
        print("%s\t%s" % (name, mol.HasSubstructMatch(pat)))

Output:

    AcidChloride  False
    AcidChloride.Aliphatic  False
    AcidChloride.Aromatic  False
    Alcohol  False
    Alcohol.Aliphatic  False
    Alcohol.Aromatic  False
    Aldehyde  False
    Aldehyde.Aliphatic  False
    Aldehyde.Aromatic  False
    Amine  True
    Amine.Aliphatic  True
    Amine.Aromatic  False
    Amine.Cyclic  True
    Amine.Primary  False
    Amine.Primary.Aliphatic  False
    Amine.Primary.Aromatic  False
    Amine.Secondary  True
    Amine.Secondary.Aliphatic  True
    Amine.Secondary.Aromatic  False
    Amine.Tertiary  False
    Amine.Tertiary.Aliphatic  False
    Amine.Tertiary.Aromatic  False
    Azide  False
    Azide.Aliphatic  False
    Azide.Aromatic  False
    BoronicAcid  False
    BoronicAcid.Aliphatic  False
    BoronicAcid.Aromatic  False
    CarboxylicAcid  False
    CarboxylicAcid.Aliphatic  False
    CarboxylicAcid.AlphaAmino  False
    CarboxylicAcid.Aromatic  False
    Halogen  True
    Halogen.Aliphatic  False
    Halogen.Aromatic  True
    Halogen.Bromine  False
    Halogen.Bromine.Aliphatic  False
    Halogen.Bromine.Aromatic  False
    Halogen.Bromine.BromoKetone  False
    Halogen.NotFluorine  True
    Halogen.NotFluorine.Aliphatic  False
    Halogen.NotFluorine.Aromatic  True
    Isocyanate  False
    Isocyanate.Aliphatic  False
    Isocyanate.Aromatic  False
    Nitro  False
    Nitro.Aliphatic  False
    Nitro.Aromatic  False
    SulfonylChloride  False
    SulfonylChloride.Aliphatic  False
    SulfonylChloride.Aromatic  False
    TerminalAlkyne  False

Cheers,
Brian

On Wed, May 17, 2017 at 9:20 AM, Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:
> Hi Michal, thanks for your response.
> I think I made a typo somewhere in my previous code since it now works
> fine, even without the Kekule notation... Sorry about the confusion...
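As a small illustration of point (1), the halogen pattern quoted in this message can also be applied on its own. This is a minimal sketch; the helper name `halogen_atoms` and the extra test molecules are not from the thread and were chosen only to exercise the three clauses of the pattern.

```python
from rdkit import Chem

# Halogen pattern quoted in the message above, split across lines for
# readability. Clause 1: halogen on a carbon via a non-ring single bond.
# Clause 2: exclude carbons carrying a second halogen. Clause 3: exclude
# acyl and sulfonyl type halides.
HALOGEN_SMARTS = (
    "[$([F,Cl,Br,I]-!@[#6]);"
    "!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);"
    "!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]"
)
halogen = Chem.MolFromSmarts(HALOGEN_SMARTS)


def halogen_atoms(smiles):
    """Return the indices of atoms matched by the halogen pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return [match[0] for match in mol.GetSubstructMatches(halogen)]


print(halogen_atoms("ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"))  # aryl chloride
print(halogen_atoms("CC(=O)Cl"))                            # acid chloride
print(halogen_atoms("ClCCl"))                               # geminal dihalide
```

The recursive `$(...)` terms only constrain the environment of the halogen, so each match contains a single atom index: that of the halogen itself.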
Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?
Hi Michal, thanks for your response.

I think I made a typo somewhere in my previous code, since it now works fine, even without the Kekule notation... Sorry about the confusion...

Best,
Alexis

On 17 May 2017 at 13:59, Michal Krompiec wrote:
> Hi Alexis,
> Try aromatic form instead of Kekule notation.
> Best,
> Michal
Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?
Hi Alexis,

Try aromatic form instead of Kekule notation.

Best,
Michal

On 17 May 2017 at 12:55, Alexis Parenty wrote:
> I am looking for substructure match between a smarts and a smiles, but I
> want any heteroatom from the smarts to match any heteroatom from a smiles
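To make the aromatic-versus-Kekule suggestion concrete, here is a minimal sketch using the SMARTS and SMILES from the original question. The aromatic rewrite of the query is an illustration added here, not a pattern posted in the thread. In SMARTS an uppercase `C` matches only aliphatic carbon, whereas RDKit perceives the naphthalene ring of the target as aromatic when it parses the SMILES.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1")

# Kekule-style query from the original post: "C" only matches aliphatic carbon.
kekule_query = Chem.MolFromSmarts(
    "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1")

# Same query with the six-membered carbon ring written in aromatic form.
aromatic_query = Chem.MolFromSmarts(
    "[F,Cl,Br,I]c1cc(C2[N,O,S]CC[N,O,S]C2)ccc1")

print("Kekule query matches:  ", mol.HasSubstructMatch(kekule_query))
print("aromatic query matches:", mol.HasSubstructMatch(aromatic_query))
```

Only the ring that is aromatic in the target needs lowercase atoms; the saturated heterocycle stays in uppercase because its carbons are aliphatic.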
[Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?
Hi everyone,

I am looking for a substructure match between a SMARTS and a SMILES, but I want any heteroatom from the SMARTS to match any heteroatom from the SMILES:

[image: Inline images 1]

The following does not return what I would expect:

    from rdkit import Chem

    smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
    smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"

    mol1 = Chem.MolFromSmarts(smarts1)
    mol2 = Chem.MolFromSmiles(smiles2)
    print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
    print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))

    => mol1 is a substructure of mol2: False
    => mol2 is a substructure of mol1: False

How could I do that?

Thanks,
Alexis