Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing
On Tue, Dec 3, 2019 at 9:58 PM Rasmus "Termo" Lundsgaard <termope...@gmail.com> wrote:

> Hi Greg.
>
> Thanks for your gist. I guess the line:
>
>     nbrs = [(x.GetOtherAtomIdx(1),x.GetIdx()) for x in atom.GetBonds()]
>
> should be:
>
>     nbrs = [(x.GetOtherAtomIdx(atom.GetIdx()),x.GetIdx()) for x in atom.GetBonds()]

You're absolutely right, thanks for catching that. I updated the gist.

-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing
Hi Greg.

Thanks for your gist. I guess the line:

    nbrs = [(x.GetOtherAtomIdx(1),x.GetIdx()) for x in atom.GetBonds()]

should be:

    nbrs = [(x.GetOtherAtomIdx(atom.GetIdx()),x.GetIdx()) for x in atom.GetBonds()]
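The correction to the gist can be illustrated without RDKit. The sketch below is a toy model, not RDKit code: bonds are plain (begin, end) tuples, and `other_atom_idx` and `neighbors_in_bond_order` are hypothetical helpers standing in for `Bond.GetOtherAtomIdx()` and the list comprehension from the gist. It shows why the atom's own index has to be passed rather than a hardcoded 1.

```python
# Toy stand-in for a molecule: each bond is a (begin_atom, end_atom) tuple and a
# bond's index is its position in the list. Hypothetical model, not RDKit objects.
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]  # alanine heavy atoms


def other_atom_idx(bond, this_idx):
    """Return the atom at the far end of `bond` as seen from `this_idx`."""
    begin, end = bond
    if this_idx == begin:
        return end
    if this_idx == end:
        return begin
    raise ValueError(f"atom {this_idx} is not part of bond {bond}")


def neighbors_in_bond_order(atom_idx, bonds=BONDS):
    """List (neighbor atom index, bond index) pairs in bond-list order."""
    return [
        (other_atom_idx(bond, atom_idx), bond_idx)
        for bond_idx, bond in enumerate(bonds)
        if atom_idx in bond
    ]


print(neighbors_in_bond_order(1))  # neighbors of the stereocenter (atom 1)
print(neighbors_in_bond_order(3))  # neighbors of the carboxyl carbon (atom 3)
```

In this toy model a hardcoded 1 only gives the right answer for atom 1 itself; for atom 3, bonds (3, 4) and (3, 5) do not contain atom 1 at all, so the lookup has no valid answer.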
Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing
What's going on here is that the RDKit defines stereochemistry based on the ordering of bonds, not atom indices.
This has come up on the list multiple times; a relatively recent instance is here:

https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg08955.html

Here's a gist that I have lying around that may help here:[1]
https://gist.github.com/greglandrum/9f0e068e53171174b6797348eca64b3e

-greg
[1] Now if only I could find *why* I have that gist lying around...
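To make the bond-ordering point concrete, here is a small pure-Python sketch (no RDKit calls). It assumes a tetrahedral label is expressed as 'CW' or 'CCW' relative to some ordering of the neighbors, and that exchanging two neighbors in that ordering inverts the label. `count_swaps` and `relabel` are hypothetical helper names, and the neighbor lists are made up for illustration.

```python
def count_swaps(order, reference):
    """Count pairwise swaps (selection-sort style) that turn `order` into `reference`."""
    order = list(order)
    swaps = 0
    for i, wanted in enumerate(reference):
        j = order.index(wanted)
        if j != i:
            order[i], order[j] = order[j], order[i]
            swaps += 1
    return swaps


def relabel(label, old_order, new_order):
    """Re-express a 'CW'/'CCW' label given for old_order in terms of new_order."""
    if count_swaps(new_order, old_order) % 2 == 0:
        return label  # even permutation: same handedness
    return "CCW" if label == "CW" else "CW"  # odd permutation: inverted


# Neighbors of one stereocenter (by atom index) as they appear in the bond list.
file_a = [0, 2, 3, 6]
file_b = [2, 0, 3, 6]  # first two bond lines exchanged: one swap
file_c = [2, 3, 0, 6]  # three neighbors cycled: two swaps

print(relabel("CW", file_a, file_b))
print(relabel("CW", file_a, file_c))
```

Under these assumptions, the same label attached to `file_a` and `file_b` describes opposite configurations, which is consistent with the differing SMILES reported in this thread when only the bond lines were reordered.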
Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing
Hi Paolo,

thank you for the heads-up that removeHs is not honored when not sanitizing (and that RemoveHs has to be called explicitly to solve that issue here).

Now I tried the same molecule but with the order of the atoms changed as well (attached as Ran1_neworder.sdf), and here I still get a different isomeric SMILES, even though the chiral tag is the same:

    from rdkit import Chem

    # set_correct_Chiral_flags() is the helper from my attached script
    for f in ['Ran1.sdf', 'Ran2.sdf', 'Ran1_neworder.sdf']:
        m = Chem.MolFromMolFile(f, sanitize=False)
        m = Chem.RemoveHs(m, sanitize=False)
        print(Chem.MolToSmiles(set_correct_Chiral_flags(m), isomericSmiles=True))

Output:

    C[C@@H](N)C(=O)O
    C[C@@H](N)C(=O)O
    C[C@H](N)C(=O)O

Attachment: Ran1_neworder.sdf
Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing
Hi Rasmus,

the problem is that, as stated in the rdmolfiles.MolFromMolFile() docs, the removeHs option is only honored when sanitize is True.

So to obtain sensible results without sanitizing you should rather do something like:

    from rdkit import Chem

    # set_correct_Chiral_flags() is the helper from your attached script
    m1 = Chem.MolFromMolFile('Ran1.sdf', sanitize=False)
    m1 = Chem.RemoveHs(m1, sanitize=False)  # strip Hs explicitly
    print(Chem.MolToSmiles(set_correct_Chiral_flags(m1), isomericSmiles=True))
    m2 = Chem.MolFromMolFile('Ran2.sdf', sanitize=False)
    m2 = Chem.RemoveHs(m2, sanitize=False)
    print(Chem.MolToSmiles(set_correct_Chiral_flags(m2), isomericSmiles=True))

You may check the individual sanitization operations here:

https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?highlight=rdmolops%20sanitizeflags#rdkit.Chem.rdmolops.SanitizeFlags

Cheers,
p.

On 03/12/2019 12:46, Rasmus "Termo" Lundsgaard wrote:

> Hi all
>
> I would like to avoid sanitizing the SDF files, as the information in these
> files should be seen as the ground truth.
>
> However, I have some problems figuring out how to read and set chiral
> information from the file and also have RDKit behave consistently.
> Attached are two SDF files with no 3D information and only stereo
> information in the atoms section for R-alanine. The only difference, as far
> as I can see, is the order of the lines of the bond information.
> Even so, I get two different isomeric SMILES back when not sanitizing.
>
> Attached is also the minimal Python code, which for me at least outputs:
>
> not setting chiral flags
>     CC(N)C(=O)O
>     CC(N)C(=O)O
>
> setting chiral flags
>     [H]OC(=O)[C@]([H])(N([H])[H])C([H])([H])[H]
>     [H]OC(=O)[C@@]([H])(N([H])[H])C([H])([H])[H]
>
> setting chiral flags and sanitize
>     C[C@@H](N)C(=O)O
>     C[C@@H](N)C(=O)O
>
> Any ideas as to why this happens and how I can handle it strictly? Also,
> what exactly does sanitizing do?
>
> Regards
> Rasmus