Hi Adelene,

I have updated the gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

to account for your questions.

Cheers,
p.

On Tue, Oct 20, 2020 at 2:08 PM Adelene LAI <adelene....@uni.lu> wrote:

> Hi Dave and Pablo,
>
> Thanks for your helpful replies.
>
> @Dave, issue created: https://github.com/rdkit/rdkit/issues/3514
>
> @Pablo, your gist shows that the internal representation of the mol does
> indeed factor in undefined stereo, contrary to the way it is depicted.
>
> But why then does this happen when I check if the 2 molecules are the same?
>
> smi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> isosmi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
> print(smi == isosmi) #True, expect False
> print(smi.HasSubstructMatch(isosmi)) #True, expect False
> print(isosmi.HasSubstructMatch(smi)) #True, expect False
> print(smi.HasSubstructMatch(isosmi) and isosmi.HasSubstructMatch(smi)) #True, expect False
>
> However, converting smi and isosmi to canonical smiles and comparing them
> gives False, as expected:
>
> a = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> b = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
> a == b #False
>
> (If there are better ways to check if 2 molecules are equal, I'd be
> interested to know.)
>
> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/9DF05ED7-A30E-4742-A568-9B3995689382%40dalkescientific.com/#msg29882815
> ?
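
On the comparison question quoted above, here is a minimal sketch of one possible approach. It is not taken from the gist. It compares canonical SMILES, with a switch for whether stereo information takes part in the comparison. The helper name same_structure and its use_stereo argument are made up for illustration; only Chem.MolFromSmiles and Chem.MolToSmiles (with its isomericSmiles flag) are RDKit calls.

```python
from rdkit import Chem


def same_structure(smiles_a, smiles_b, use_stereo=True):
    """Hypothetical helper: compare two SMILES via their canonical SMILES.

    With use_stereo=True, stereo marks are kept in the canonical SMILES,
    so an unspecified double bond and a specified one compare as different.
    With use_stereo=False, stereo marks are dropped before comparing.
    """
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the SMILES strings")
    can_a = Chem.MolToSmiles(mol_a, isomericSmiles=use_stereo)
    can_b = Chem.MolToSmiles(mol_b, isomericSmiles=use_stereo)
    return can_a == can_b


# The two SMILES from the thread: stereo unspecified vs. stereo specified.
plain = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O'
iso = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O'

print(same_structure(plain, iso))                    # stereo-aware comparison
print(same_structure(plain, iso, use_stereo=False))  # connectivity only
```

This is essentially the Chem.CanonSmiles comparison from the quoted message wrapped in a function, so it inherits the same limits: it says nothing about tautomers, charge states or salts.
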
>
> Adelene
>
> Doctoral Researcher
> Environmental Cheminformatics
> UNIVERSITÉ DU LUXEMBOURG
>
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
> 6, avenue du Swing, L-4367 Belvaux
> T +356 46 66 44 67 18
> [image: github.png] adelenelai
>
> ------------------------------
> *From:* Paolo Tosco <paolo.tosco.m...@gmail.com>
> *Sent:* Tuesday, October 20, 2020 1:52:12 PM
> *To:* Adelene LAI
> *Cc:* rdkit-discuss
> *Subject:* Re: [Rdkit-discuss] How to preserve undefined stereochemistry?
>
> Hi Adelene,
>
> this gist
>
> https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
>
> shows how to add stereo annotations to RDKit 2D depictions, and also how
> to access the double bond stereochemistry programmatically.
>
> Cheers,
> p.
>
> On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:
>
>> Hi RDKit Community,
>>
>> Is there a way to preserve undefined stereochemistry aka unspecified
>> stereochemistry when doing MolFromSmiles?
>>
>> I'm working with a bunch of molecules, some with stereochemistry defined,
>> some without.
>>
>> If stereochemistry is undefined in the SMILES, I would like it to stay
>> that way when converted to a Mol, but this doesn't seem to be the case:
>>
>> > mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
>> > mol
>>
>> One would expect that C=C to either be crossed, as in PubChem's depiction:
>>
>> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>>
>> or that single bond to be squiggly, as in CDK's depiction:
>>
>> But it's not just a matter of depiction, as it seems internally, mol is
>> equivalent to its stereochem-specific sibling (Entgegen form)
>>
>> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>>
>> I've tried sanitize=False, but it doesn't seem to have any effect.
>> I would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
>> for every molecule with undefined stereochem (not sure how I would even go
>> about that...).
>>
>> Possibly related to:
>>
>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>>
>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
>>
>> o = Chem.MolFromSmiles('C/C=C/C')
>>
>> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>>
>> https://github.com/openforcefield/openforcefield/issues/146
>>
>> Any help would be much appreciated.
>>
>> Thanks,
>>
>> Adelene
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
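
On the original question of not wanting to call SetStereo(Chem.BondStereo.STEREOANY) by hand for every molecule: a minimal sketch, assuming that RDKit's Chem.FindPotentialStereoBonds does what is wanted here, namely mark double bonds that could be cis/trans but carry no stereo as STEREOANY while leaving already assigned bonds alone. The wrapper name flag_unspecified_double_bonds is made up for illustration, and this is not necessarily what the gist does.

```python
from rdkit import Chem


def flag_unspecified_double_bonds(mol):
    """Hypothetical helper: return (copy, bond_indices).

    On the copy, every double bond that could be E/Z but has no stereo
    assigned is marked STEREOANY; bond_indices lists those bonds.
    """
    flagged = Chem.Mol(mol)  # work on a copy, leave the input untouched
    # RDKit call: marks potential stereo double bonds without assigned
    # stereo as STEREOANY (assumed here to keep existing E/Z labels).
    Chem.FindPotentialStereoBonds(flagged)
    idxs = [bond.GetIdx() for bond in flagged.GetBonds()
            if bond.GetStereo() == Chem.BondStereo.STEREOANY]
    return flagged, idxs


# The stereo-unspecified SMILES from the thread.
mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
flagged, idxs = flag_unspecified_double_bonds(mol)
print(idxs)  # indices of double bonds now marked STEREOANY
```

Whether a given drawing routine then renders the flagged bond as crossed or wavy depends on the RDKit version and drawing options; that part is not checked here.
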