Re: [Rdkit-discuss] MolToSmiles gives explicit H after ReplaceSubstructs
Dear Paolo,

Thank you for all the tips. I was not aware of these.

In fact I did suspect that the [H] has something to do with the stereochemistry, since the original F made the double bond stereo. However, the "[H]" did not go away even after I called

    AllChem.SanitizeMol(m6)
    AllChem.AssignStereochemistry(m6)

or

    AllChem.SanitizeMol(m6)
    AllChem.FindPotentialStereo(m6)

I thought these would trigger the re-perception of the double bond, which is no longer stereo.

By the way, I wrote down in my notes that rdkit.Chem.rdmolops.AssignStereochemistry is old, while rdkit.Chem.rdmolops.FindPotentialStereo is new. So it may be better to use the latter.

As for DeleteSubstructs, I in fact started out using it but ran into a problem, so I switched to ReplaceSubstructs. I am still analyzing that problem: I ran it on a big data set, so I need to make sense of the issue first. If I can boil it down, I may make a separate forum post.

Ling

Paolo Tosco wrote on Fri, Nov 5, 2021 at 5:54 AM:
> By default hydrogens defining double bond stereochemistry are not removed.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] MolToSmiles gives explicit H after ReplaceSubstructs
Hi Ling,

By default hydrogens defining double bond stereochemistry are not removed. You may remove that residual hydrogen by either

    params = Chem.RemoveHsParameters()
    params.removeDefiningBondStereo = True
    m6 = Chem.RemoveHs(m6, params)  # RemoveHs returns a new molecule

or simply

    m6 = Chem.RemoveAllHs(m6)

I think you may obtain the same result by just

    m6 = AllChem.DeleteSubstructs(m5, mf)

Cheers,
p.

On Wed, Nov 3, 2021 at 9:29 PM Ling Chan wrote:
> I tried to change all F's into H's. It worked. But when I converted the
> result into a smiles string, there is the occasional lingering explicit
> hydrogen. It is there even after I do a RemoveHs().
[Rdkit-discuss] MolToSmiles gives explicit H after ReplaceSubstructs
Hello colleagues,

I tried to change all F's into H's. It worked. But when I converted the result into a SMILES string, there was an occasional lingering explicit hydrogen. It is there even after I do a RemoveHs().

I just wonder what this explicit H is about, since it may have implications for any further processing.

Thank you!

Ling

    from rdkit import Chem
    from rdkit.Chem import AllChem

    mh = Chem.MolFromSmiles("[#1]")    # replacement: a single hydrogen atom
    mf = Chem.MolFromSmarts('F')       # pattern: any fluorine
    m5 = Chem.MolFromSmiles("F/C=C1/[C@H](F)[C@@H](F)O[C@@H]1F")
    m6s = AllChem.ReplaceSubstructs(m5,mf,mh,replaceAll=True)
    m6 = m6s[0]
    print(Chem.MolToSmiles(Chem.RemoveHs(m6)))

    [H]C=C1CCOC1
Re: [Rdkit-discuss] MolToSmiles atom ordering
Cool, good to know this special property. Thank you Andrew!

Ling

Andrew Dalke wrote on Tue, Nov 2, 2021 at 10:36 PM:
> What you're looking for is the special property _smilesAtomOutputOrder
Re: [Rdkit-discuss] MolToSmiles atom ordering
Hi Ling,

If there are symmetries then a substructure search like that will only give you one mapping, and that might not be the canonical mapping.

What you're looking for is the special property _smilesAtomOutputOrder:

    >>> from rdkit import Chem
    >>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
    >>> Chem.MolToSmiles(mol)
    'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
    >>> mol.GetProp("_smilesAtomOutputOrder")
    '[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'

Here are the atom indices of the original SMILES:

          ┌                 1 11  1111 1 1 1 2 2
    atoms │ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
          └ | | ||| || || | | ||  |||| | | | | |
    SMILES  O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C

You can see the first atom of the output is a "C", which _smilesAtomOutputOrder maps to atom index 8, which is the "...C)..." in the original SMILES, etc.

Cheers,

Andrew
da...@dalkescientific.com

> On Nov 3, 2021, at 00:18, Ling Chan wrote:
>
> O.K. Problem solved. Sorry about the spam, folks.
>
> I can use GetSubstructMatch, as follows.
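The property comes back as a string, so it needs a little parsing before it can be used as a mapping. The following is a minimal pure-Python sketch (no RDKit required) that parses the value quoted above and also builds the inverse lookup. The helper names `parse_output_order` and `invert_order` are made up for this illustration and are not part of RDKit.

```python
def parse_output_order(prop_value):
    """Parse a string such as '[8,7,6,]' into a list of ints.

    The value has a trailing comma before the closing bracket,
    so empty tokens are skipped instead of relying on a JSON parser.
    """
    inner = prop_value.strip().lstrip("[").rstrip("]")
    return [int(tok) for tok in inner.split(",") if tok.strip()]


def invert_order(order):
    """Return a list where inverse[atom_index] is the position in the output SMILES."""
    inverse = [None] * len(order)
    for out_pos, atom_idx in enumerate(order):
        inverse[atom_idx] = out_pos
    return inverse


# Value copied from the example in this thread.
order = parse_output_order(
    "[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]"
)
position_of = invert_order(order)

print(order[0])         # original index of the first atom written: 8
print(position_of[8])   # output position of original atom 8: 0
print(position_of[10])  # output position of original atom 10: 21
```

Here `order[k]` answers "which original atom was written k-th?", while `position_of[i]` answers "where did original atom i end up?".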
Re: [Rdkit-discuss] MolToSmiles atom ordering
O.K. Problem solved. Sorry about the spam, folks.

I can use GetSubstructMatch, as follows.

    from rdkit import Chem

    # sinput is the input smiles
    # scanon is the output smiles

    minput = Chem.MolFromSmiles(sinput)
    scanon = Chem.MolToSmiles(minput)
    mcanon = Chem.MolFromSmiles(scanon)
    map_forward = minput.GetSubstructMatch(mcanon)
    map_backward = mcanon.GetSubstructMatch(minput)

Ling Chan wrote on Tue, Nov 2, 2021 at 3:55 PM:
> Just wonder if I can obtain a mapping of the atom indices upon
> canonicalization by MolToSmiles ?
[Rdkit-discuss] MolToSmiles atom ordering
Dear colleagues,

I wonder whether I can obtain a mapping of the atom indices upon canonicalization by MolToSmiles? I am aware that canonicalization (and hence atom reordering) can be suppressed in MolToSmiles, but I do want to canonicalize the output SMILES.

If you are interested, here are a few more details of my problem. For each molecule, I want to delete one or two side chains and obtain a SMILES of what is left, but I also want to know which atoms were bonded to the deleted side chains. I know that suppressing canonicalization would make this work, but I would like to canonicalize the SMILES so that I can detect duplicates.

I tried marking the atoms. But I believe that properties that get carried over to the output SMILES, e.g. Isotope, affect the canonicalization, while properties that do not affect canonicalization, e.g. IntProp, are lost upon the conversion to SMILES.

Thank you for your insight.

Ling
Re: [Rdkit-discuss] MolToSmiles
> On Oct 21, 2021, at 04:50, Ling Chan wrote:
>
> I got the attached sdf. When I did a MolToSmiles, it gives me the following.
>
> >>> for m in Chem.SDMolSupplier("pdb_structures/1q6k_ligand.sdf"):
> ...     print (Chem.MolToSmiles(m))
> ...
> [CH3:0][C:0]([CH3:0])([CH3:0])[O:0][C:0](=[O:0])[NH:0][CH:0]([CH:0]=[O:0])[CH:0]1[CH2:0][CH2:0][CH2:0][CH2:0][CH2:0]1
>
> Just wonder why does it not give something like
> O=C(OC(C)(C)C)NC(C=O)C1CCCCC1

The terms after the atom symbol in your atom block lines are center-justified (or left-justified, in the 2-digit mass difference term 'dd') instead of right-justified. Here's a comparison of your first atom line with the ctfile spec, and with the round-trip through RDKit:

       74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0       <-- yours
    xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee      <-- spec
       74.0060   -9.5770  134.8660 N   0  0  0  0  0  0  0  0  0  0  0  0      <-- RDKit

Add a space after the atom symbol field ("aaa") and everything works.

What happened? The ":0" in the SMILES string derives from the atom-atom mapping number, "mmm", in the SDF. The relevant code from Code/GraphMol/FileParsers/MolFileParser.cpp::ParseMolFileAtomLine() is:

      if (text.size() >= 63 && text.substr(60, 3) != "  0") {
        int atomMapNumber = 0;
        try {
          atomMapNumber = FileParserUtils::toInt(text.substr(60, 3), true);
        } catch (boost::bad_lexical_cast &) {
          std::ostringstream errout;
          errout << "Cannot convert '" << text.substr(60, 3)
                 << "' to int on line " << line;
          delete res;
          throw FileParseException(errout.str());
        }
        res->setProp(common_properties::molAtomMapNumber, atomMapNumber);
      }

This says that if the field isn't exactly "  0" then parse it as an integer and store it in the atom's molAtomMapNumber. Since your " 0 " field isn't exactly "  0", it gets converted into the atom map value of 0.

I don't see an explicit statement in the spec about alignment in fields. It's clear the spec comes from a Fortran background, so these should be interpreted as "I2" and "I3", and right-justified.

By the way, if you pass your file through CDK you get:

    org.openscience.cdk.io.MDLV2000Reader ERROR: Error while parsing line 5:    74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0 -> invalid line length, 68:   74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0
    org.openscience.cdk.io.iterator.IteratingSDFReader ERROR: Error while reading next molecule: invalid line length, 68:   74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0

CDK's storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java::readAtomFast() requires that either all characters of a field be present, or the end of line. Your line is 68 characters long because your last field is " 0" instead of the " 0 " needed to fill the exact change flag "eee".

Best regards,

Andrew
da...@dalkescientific.com
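To make the column arithmetic concrete, here is a small pure-Python sketch (it does not call RDKit). It builds a right-justified atom line, then a copy with one space missing after the symbol field, and applies the same test as the quoted C++ condition to columns 60 to 62. The function names and the way the shifted line is produced are assumptions made for this illustration; the real file may differ in detail.

```python
def format_atom_line(x, y, z, symbol, fields=(0,) * 12):
    """Build a V2000-style atom line: three 10.4 floats, a space, a 3-char
    left-justified symbol, a 2-char 'dd' field, then eleven 3-char integer
    fields, all right-justified (69 characters in total)."""
    dd, rest = fields[0], fields[1:]
    return (f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{dd:>2}"
            + "".join(f"{value:>3}" for value in rest))


def atom_map_field(line):
    """Return the raw 'mmm' field, i.e. the three characters at offsets 60-62."""
    return line[60:63]


def would_set_atom_map(line):
    """Mirror of the quoted C++ test: long enough and not exactly '  0'."""
    return len(line) >= 63 and atom_map_field(line) != "  0"


good = format_atom_line(74.0060, -9.5770, 134.8660, "N")
# Hypothetical reconstruction of the problem line: drop the space that
# should follow the symbol field, shifting every later field left by one.
shifted = good[:34] + good[35:]

print(len(good), repr(atom_map_field(good)), would_set_atom_map(good))
# 69 '  0' False
print(len(shifted), repr(atom_map_field(shifted)), would_set_atom_map(shifted))
# 68 ' 0 ' True
print(int(atom_map_field(shifted)))  # 0, which would be written as ':0'
```

The one-character shift is enough to turn every "  0" into " 0 ", so each atom ends up with an explicit map number of 0.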
[Rdkit-discuss] MolToSmiles
Dear colleagues,

I got the attached sdf. When I did a MolToSmiles, it gave me the following.

    >>> for m in Chem.SDMolSupplier("pdb_structures/1q6k_ligand.sdf"):
    ...     print (Chem.MolToSmiles(m))
    ...
    [CH3:0][C:0]([CH3:0])([CH3:0])[O:0][C:0](=[O:0])[NH:0][CH:0]([CH:0]=[O:0])[CH:0]1[CH2:0][CH2:0][CH2:0][CH2:0][CH2:0]1

I just wonder why it does not give something like

    O=C(OC(C)(C)C)NC(C=O)C1CCCCC1

Thank you for your insight.

Ling Chan

1q6k_ligand.sdf
Description: application/vnd.openmolecules.sdf
Re: [Rdkit-discuss] MolToSmiles preserve atom order
On Nov 18, 2019, at 17:40, David Cosgrove wrote:
> Point taken. I don’t think you’d be able to get RDKit to spit such SMILES
> strings out unless you tortured it pretty hard, however.

Did someone mention one of my favorite things to do? :)

See: http://dalkescientific.com/writings/diary/archive/2010/12/28/reordering_smiles.html

Note that that code does not preserve stereochemistry. It's for Python 2, so change the:

    available_closures = range(100)

to

    available_closures = list(range(100))

to make it work under Python 3. Here's what it looks like:

    >>> from x import reordered_smiles
    >>> from rdkit import Chem
    >>> mol = Chem.MolFromSmiles("OCCl")
    >>> atoms = list(mol.GetAtoms())
    >>> reordered_smiles(mol, [atoms[1], atoms[0], atoms[2]])
    '[CH2]12.[OH1]1.[Cl]2'

Andrew
da...@dalkescientific.com
Re: [Rdkit-discuss] MolToSmiles preserve atom order
On Mon, 18 Nov 2019 16:40:28 +0000 David Cosgrove wrote:
> Point taken. I don’t think you’d be able to get RDKit to spit such SMILES
> strings out unless you tortured it pretty hard, however.

Exporting SMILES with an arbitrary given atom order is a different problem. Normally, when working with a mol object, you don't remove any bonds but rather change atom properties (such as isotope, AtomMapNum, explicit Hs and so on). I wanted to show a simple example, but in simple cases MolToSmiles with rootedAtAtom=0, canonical=False preserves the atom order. I found one example where it didn't work as I expected (the atom order was altered), but it seems I have lost that SMILES.

Anyway, with code like this:

    mol = Chem.MolFromSmiles(someSmilesString)
    change_properties_of_some_atoms_in_mol(mol)  # this function changes isotopes of selected atoms
    smiles2 = Chem.MolToSmiles(mol, rootedAtAtom=0, canonical=False)
    mol_from_smiles2 = Chem.MolFromSmiles(smiles2)

should the atom order (i.e. the atom indices returned by the GetIdx() function) be the same, or can it be different?

best,
Rafal
Re: [Rdkit-discuss] MolToSmiles preserve atom order
Hi Rocco,

Point taken. I don’t think you’d be able to get RDKit to spit such SMILES strings out unless you tortured it pretty hard, however.

Dave

On Mon, 18 Nov 2019 at 16:36, Rocco Moretti wrote:
> Actually, it is possible to get arbitrary orders, if you (ab)use the '.'
> component ("zero order bond") directive and the numeric bonding ("ring
> closure") directives:
>
> >>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
> 'OCCl'

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
Re: [Rdkit-discuss] MolToSmiles preserve atom order
Actually, it is possible to get arbitrary orders, if you (ab)use the '.' component ("zero order bond") directive and the numeric bonding ("ring closure") directives:

    >>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
    'OCCl'

Whether you want to do things that way is another question.

On Mon, Nov 18, 2019 at 10:24 AM David Cosgrove wrote:
> It is not always possible to preserve the atom ordering in the SMILES
> string because there is an implied bond between contiguous symbols in the
> SMILES.
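The trick can be written down as a short pure-Python sketch: every atom becomes its own dot-separated component, and every bond is expressed as a pair of matching closure labels. The function `smiles_in_atom_order` is a hypothetical helper for this illustration, not an RDKit function. It assumes single bonds only, ignores stereochemistry, never reuses a closure number (so it stops at 99 bonds), and expects each symbol to already be a valid SMILES atom token.

```python
def smiles_in_atom_order(symbols, bonds):
    """Write atoms in exactly the given order, one component per atom.

    symbols: list of SMILES atom tokens, e.g. ["O", "Cl", "C"]
    bonds:   list of (i, j) index pairs, each treated as a single bond
    """
    labels = [[] for _ in symbols]
    for number, (i, j) in enumerate(bonds, start=1):
        if number > 99:
            raise ValueError("this sketch supports at most 99 bonds")
        # One-digit closures are written bare, two-digit ones as %nn.
        label = str(number) if number < 10 else f"%{number}"
        labels[i].append(label)
        labels[j].append(label)
    return ".".join(sym + "".join(lab) for sym, lab in zip(symbols, labels))


# O first, Cl second, C third, with bonds O-C and Cl-C.
print(smiles_in_atom_order(["O", "Cl", "C"], [(0, 2), (1, 2)]))
# O1.Cl2.C12
```

The output for this input is the same string that appears in the example above. The result is valid but far from idiomatic SMILES, which is the trade-off being discussed in this thread.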
Re: [Rdkit-discuss] MolToSmiles preserve atom order
Hi Rafal,

It is not always possible to preserve the atom ordering in the SMILES string because there is an implied bond between contiguous symbols in the SMILES. I think, for example, that the molecule with the SMILES OCCl couldn’t have the order in the molecule object O first, Cl second, C third, with bonds between 1 and 3 and 2 and 3, and get the SMILES in that order.

I hope that made sense. Please ask again if not.

Best regards,
Dave

On Mon, 18 Nov 2019 at 12:33, Rafal Roszak wrote:
> Is there any way to preserve atom order from Mol object during
> exporting to smiles?
[Rdkit-discuss] MolToSmiles preserve atom order
Hi all,

Is there any way to preserve the atom order of a Mol object when exporting to SMILES? I tried MolToSmiles with the rootedAtAtom=0 and canonical=False options, but they do not always preserve the original order. I know I can use _smilesAtomOutputOrder to map old indices to the new ones in the canonical SMILES, but maybe we have something more handy?

Best,

Rafał
Re: [Rdkit-discuss] MolToSmiles(), atom indexes
Dear Jose Manuel,

Many thanks for your quick answer and for your script.

All the best,

Jean-Marc

On 01/02/2019 at 13:20, Jose Manuel Gally wrote:
> I believe this can be achieved by using the Mol property
> "_smilesAtomOutputOrder", which is set only after using the function
> Chem.MolToSmiles.

--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html
http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
Re: [Rdkit-discuss] MolToSmiles(), atom indexes
Dear Jean-Marc,

I believe this can be achieved by using the Mol property "_smilesAtomOutputOrder", which is set only after using the function Chem.MolToSmiles.

Please find attached a very simple example of how it can be extracted.

Cheers,
Jose Manuel

On 01.02.19 13:03, Jean-Marc Nuzillard wrote:
> I am looking for a way to relate atom indexes of a Mol object
> and the order of appearance of the atoms along the corresponding
> SMILES chain, as produced by Chem.MolToSmiles().

rdkit_example_smiles_atom_order.ipynb
Description: application/ipynb
[Rdkit-discuss] MolToSmiles(), atom indexes
Dear all,

I am looking for a way to relate atom indexes of a Mol object and the order of appearance of the atoms along the corresponding SMILES chain, as produced by Chem.MolToSmiles().

Thanks in advance,

Jean-Marc

--
Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry, CNRS UMR 7312
Faculté des Sciences Exactes et Naturelles, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 33 3 26 91 82 10
Fax : 33 3 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html
http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
Re: [Rdkit-discuss] MolToSmiles
I agree with Andrew's suggestion. The optional list argument defaulting to None is exactly how I would solve it, and it fits (I think) with at least most of the RDKit.

-greg

On Mon, Dec 19, 2016 at 9:14 PM +0100, "Brian Kelley" wrote:
> I'm happy to do that as long as there is a consensus.
Re: [Rdkit-discuss] MolToSmiles
I'm happy to do that as long as there is a consensus.

We could also expose the properties in non-string form, but that is a bit harder to do. GetPropsAsDict does this, but has the overhead that it does a conversion for everything, not just the thing you want. It does handle the underlying type correctly though, which is convenient.

Brian Kelley

> On Dec 19, 2016, at 2:59 PM, Andrew Dalke wrote:
>
> I would like a variant more along those lines, like:
>
>   MolToSmiles(mol, isomericSmiles=None, allHsExplicit=False,
>       atomOrder=None)
Re: [Rdkit-discuss] MolToSmiles
On Dec 19, 2016, at 6:22 PM, Brian Kelley wrote:
> I had thought about making a CanonicalAtomOrder function that does this as
> well, or perhaps making a MolToSmiles variant.

I learned about this function from Noel's blog post at https://nextmovesoftware.com/blog/2013/07/01/accessing-smiles-atom-order/ , which uses the C++ API.

I would like a variant more along those lines, like:

    MolToSmiles(mol, isomericSmiles=None, allHsExplicit=False, atomOrder=None)

where if I pass in:

    atomOrder = []
    MolToSmiles(mol, atomOrder=atomOrder)

then I get the list of indices in atomOrder, rather than a per-molecule property.

atomOrder=None can do the existing behavior.

Cheers,

Andrew
da...@dalkescientific.com
Re: [Rdkit-discuss] MolToSmiles
I would vote for making a more obvious way to get at these values. I have needed this when working with external depictors (mol -> smiles -> depict with atom highlighting is one use case).

I just couldn't think of a valid API way of doing this. Attaching these values to the molecule doesn't really seem like the right solution, considering that there are two forms of canonical ordering depending on whether isomerism is considered.

I had thought about making a CanonicalAtomOrder function that does this as well, or perhaps making a MolToSmiles variant.

Any other ideas?
Re: [Rdkit-discuss] MolToSmiles
On Mon, Dec 19, 2016 at 9:43 AM, Maciek Wójcikowski wrote:
> There is also CanonicalRankAtoms
> [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms]
> which seems to be forgotten.

One thing to be aware of here is that this provides the canonical ranking of atoms that is used for the SMILES generation, but the values are not equal to the actual output order of the atoms.

Here's an example of that:

    In [3]: m = Chem.MolFromSmiles('CC(O)CCN')

    In [4]: list(Chem.CanonicalRankAtoms(m))
    Out[4]: [0, 5, 2, 4, 3, 1]

    In [5]: Chem.MolToSmiles(m)
    Out[5]: 'CC(O)CCN'

    In [7]: m.GetProp('_smilesAtomOutputOrder')
    Out[7]: '[0,1,2,3,4,5,]'

So though atom 1 is ranked in position 5, it ends up being the second atom output, since it is connected to atom 0, which happens to have rank 0.

-greg
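To make the distinction concrete, here is a small sketch that uses only the numbers from the session above, so it needs no RDKit. It lists the atoms by increasing canonical rank and compares that with the order in which they were written. The variable names are mine.

```python
# Values copied from the IPython session above for 'CC(O)CCN'.
ranks = [0, 5, 2, 4, 3, 1]         # ranks[i] is the canonical rank of atom i
output_order = [0, 1, 2, 3, 4, 5]  # parsed from '_smilesAtomOutputOrder'

# Atom indices listed by increasing canonical rank.
atoms_by_rank = sorted(range(len(ranks)), key=ranks.__getitem__)

print(atoms_by_rank)                  # atoms in rank order
print(atoms_by_rank == output_order)  # whether rank order equals output order
```

The rank only decides which neighbor is visited first at each branch; the SMILES writer still walks the bonds, so the written order follows connectivity rather than a plain sort by rank.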
Re: [Rdkit-discuss] MolToSmiles
Hi Jean-Marc and others,

There is also CanonicalRankAtoms [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms], which seems to have been forgotten.

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
Re: [Rdkit-discuss] MolToSmiles
Thank you Andrew, Brian and David for your answers.

mol.GetProp("_smilesAtomOutputOrder") does the job.
I also expected a.GetProp("molAtomMapNumber") could do it for each atom a.

All the best,

Jean-Marc

--
Jean-Marc Nuzillard
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
Re: [Rdkit-discuss] MolToSmiles
On Dec 18, 2016, at 6:32 PM, Brian Kelley wrote:
> >>> m.GetProp("_smilesAtomOutputOrder")
> '[3,2,1,0,]'
>
> Note that this returns the list as a string which is sub-optimal.
> GetPropsAsDict will convert these to proper python objects, however, this is
> considered a private member so you need to return these as well:
>
> >>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
> [3, 2, 1, 0]

For fun, here are a few timing numbers:

    # Common setup
    from rdkit import Chem
    mol = Chem.MolFromSmiles("c1c1Oc1c1")
    Chem.MolToSmiles(mol)
    import json
    import ujson # third-party JSON decoder
    import re
    integer_pat = re.compile("[0-9]+")

    # Get the string (give a lower bound)
    mol.GetProp("_smilesAtomOutputOrder")
    1 loops, best of 3: 31.3 usec per loop

Here are variations for how to get that information as a list of integers:

    # Using Python's "eval()" to decode the list (this is generally UNSAFE!)
    eval(mol.GetProp("_smilesAtomOutputOrder"))
    1 loops, best of 3: 157 usec per loop

    # Use the built-in json module (need to remove the terminal ",")
    json.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
    1 loops, best of 3: 66.5 usec per loop

    # Use the third-party "ujson" package, which is faster than json.
    ujson.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
    1 loops, best of 3: 41.2 usec per loop

    ("cjson" takes 49.7 usec per loop)

    # Use the properties dictionary
    mol.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"]
    1000 loops, best of 3: 462 usec per loop

    # Parse it more directly
    map(int, integer_pat.findall(mol.GetProp("_smilesAtomOutputOrder")))
    1 loops, best of 3: 89 usec per loop

Andrew
da...@dalkescientific.com
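As a side note to the variations above, here is a stdlib-only sketch of three ways to decode the property string without eval(). It works on the literal value quoted above, so it runs without RDKit. The ast.literal_eval variant is my addition and was not part of the timings in this message; it accepts the trailing comma and only evaluates literals. Under Python 3, map() returns an iterator, so the regex variant uses a list comprehension instead.

```python
import ast
import json
import re

raw = "[3,2,1,0,]"  # the property value quoted above for NCCC

# 1. literal_eval: parses Python literals only, trailing comma is allowed
via_literal_eval = ast.literal_eval(raw)

# 2. json: drop the terminal ",]" and close the list again
via_json = json.loads(raw[:-2] + "]")

# 3. regex: pull out every run of digits and convert each to int
via_regex = [int(tok) for tok in re.findall(r"[0-9]+", raw)]

print(via_literal_eval, via_json, via_regex)
```

No timing claims are made for these three; which one is fastest would have to be measured on the machine and Python version in use.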
Re: [Rdkit-discuss] MolToSmiles
Hi Jean-Marc,

There is a property of the molecule created when it is read that contains this information. I forget what it is called, but if you call the molecule's GetPropNames function you should see something obvious in the values returned. You can then call GetProp with that property name to get a string containing the canonical atom order. Note that the string is a string representation of a Python list, with '[' at the start, ']' at the end, and commas in between. You'll need to manipulate it a bit to extract the list of integers you need.

Cheers,
Dave
Re: [Rdkit-discuss] MolToSmiles
Jean-Marc,

This is very non-obvious, but here is how you can do it from Python:

    >>> from rdkit import Chem
    >>> m = Chem.MolFromSmiles("NCCC")
    >>> Chem.MolToSmiles(m)
    'CCCN'
    >>> m.GetProp("_smilesAtomOutputOrder")
    '[3,2,1,0,]'

Note that this returns the list as a string, which is sub-optimal. GetPropsAsDict will convert these to proper Python objects; however, this is considered a private property, so you need to ask for private properties to be returned as well:

    >>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
    [3, 2, 1, 0]

I'm converting to a list here to show the output; this is really a wrapped vector, but it can be used as a sequence.

Hope this helps. Note that you can just dump out the dictionary for any object that has SetProp:

    >>> m.GetPropsAsDict(True,True)
    {'_smilesAtomOutputOrder': , 'numArom': 0, '_StereochemDone': 1, '__computedProps': }

And see some of how the sausage is made inside.

Cheers,
Brian
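The output order shown above maps a position in the SMILES string to an atom index in the Mol. For the reverse lookup (where did Mol atom i end up in the SMILES?), the list can be inverted. A minimal sketch follows; the function name invert_order is mine, and the second input is a made-up order used only to show that the inverse generally differs from the input.

```python
def invert_order(output_order):
    """Given output_order[pos] = atom index, return position[atom index] = pos."""
    position = [0] * len(output_order)
    for pos, atom_idx in enumerate(output_order):
        position[atom_idx] = pos
    return position


# Order from this thread: NCCC is written as CCCN.
thread_positions = invert_order([3, 2, 1, 0])

# Hypothetical order, not taken from any real molecule.
example_positions = invert_order([2, 0, 1])

print(thread_positions)   # SMILES position of each Mol atom for NCCC
print(example_positions)  # inverse of the hypothetical order
```

For NCCC, atom 0 (the nitrogen) lands at position 3, the last atom of 'CCCN'.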
[Rdkit-discuss] MolToSmiles
Hi all,

maybe my question has already been answered:
when converting from a Mol to a canonical SMILES string, is there a way to obtain the mapping between the atom indexes in the Mol object and the atom indexes in the SMILES string?

All the best,

Jean-Marc

--
Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 33 3 26 91 82 10
Fax : 33 3 26 91 31 66
http://www.univ-reims.fr/ICMR

http://eos.univ-reims.fr/LSD/
http://eos.univ-reims.fr/LSD/JmnSoft/