[Rdkit-discuss] sampling of ring conformation for docking
Hello, I use RDKit to embed initial conformations for docking. The issue is with saturated rings. I can use a single random conformer but its geometry may be unsuitable and the whole molecule will fail to dock. I can use several starting conformers for docking and to avoid docking of very similar conformers I can select a few diverse conformers based on RMSD between rings only. However, the issue occurs if a molecule has several such saturated rings. The current workaround is to compute RMSD between corresponding rings individually, then average RMSD values and select a diverse set of conformers. It may work to some extend. However I'm curious whether a better solution possible? Can we sample rings individually and embed a molecule using pre-generated conformers of some parts (rings)? I know about the restricted conformer enumeration function, but it will work if we supply only a single connected part as fixed. It should not work if we have two disconnected parts (rings) with 3D coordinates, because we do not know their relative position to generate 3D coordinates for the rest of atoms in a molecule. Maybe someone will have some ideas/suggestions? Kind regards, Pavel ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] 6th Advanced in silico Drug Design workshop in Olomouc
Dear colleagues, after covid "holidays" we are cordially invite you on the 6th Drug Design workshop which will be held 30 January - 3 February 2023 in Olomouc (Czech Republic). It is focused on practical applications of different chemoinformatic tools and approaches for drug development. This might be interesting for bachelor, master and PhD students to broaden their experience and sharpen skills. Basic Python skills would be desirable for machine learning and de novo tutorials. During the workshop, participants will be introduced with pharmacophore modeling (LigandScout), molecular docking (AutoDock Vina), QSAR modeling (sklearn and RDKit), high-throughput molecular dynamics (GROMACS) and de novo methods (CReM). A competition will be organized at the last day of the workshop where participants will be able to apply acquired knowledge and skills to solve real chemoinformatic tasks and win prizes. https://www.kfc.upol.cz/6add Please feel free to share this information with those who can be interested in participation in such event. Thank you! Kind regards, Pavel ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Comparison of the docking conformations with the X-ray pose
Hi Enrico, you may look at this script https://github.com/DrrDom/rdkit-scripts/blob/master/rmsd_rdkit.py It takes PDBQT as input and reference files and calc rmsd for the largest common fragment if structures do not match each other. If there are multiple models in the input file it will calc rmsd for all of them. Do not forget supply SMILES files complementary to PDBQT, because PDB does not contain bond orders. There is a built-in help message in the script, but if you will have question regarding it you may ask me. At least you may use it as a source to create your own script suitable for your needs. Kind regards, Pavel. On 30/03/2022 18:16, Enrico Martinez wrote: Dear RDKIT users! I am dealing with the analysis of the results of the docking poses calculated via VINA and saved into the multi-model pdb. I need to find a possibility to compare each docking pose with the pose in the X-ray structure (which has similar but not the identical ligand!) in order to find automatically the model (= docking solution) with the positions of the ligand similar to the X-ray structure? Assuming that the both pdbs (docking poses, and X-ray structure) have been superimposed (based on the protein atoms) how could I automatically find the model (in the ensemble of 100 docking solutions) where the ligand may resemble the X-ray pose (e.g. taking a part of the ligand shared between the both structures as the reference for comparison) ? I would be grateful for any suggestions With kind regards, Enrico ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] 2dpharmacophores
Hi Chris, Error reported a bin (0,2,0) which does not satisfy the triangle rule. I cannot check it, but I suppose that the error may disappear if you set trianglePruneBins=False in SigFactory, but this would be a workaround not a proper solution I think. Pavel. On 12/08/2021 10:25, Chris Swain via Rdkit-discuss wrote: Hi, I’m trying to generate 2Dpharmacophore fingerprints as described here https://www.rdkit.org/docs/GettingStartedInPython.html#d-pharmacophore-fingerprints <https://www.rdkit.org/docs/GettingStartedInPython.html#d-pharmacophore-fingerprints> For the majority of molecules this works fine but for a few I get this error, any idea what the issue is? Chris qmol = Chem.MolFromSmiles('COc1ccc(C(C)C)cc1CN[C@H]1C2CCN(CC2)[C@H]1C(c1c1)c1c1') fpquery = Generate.Gen2DFingerprint(qmol,sigFactory) fpquery --- ValueError Traceback (most recent call last) ~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py inGetBitIdx(self, featIndices, dists, sortIndices) 248 print('\tbins:', repr(self._bins), type(self._bins)) --> 249bin_= self._findBinIdx(dists, self._bins, self._scaffolds[len(dists)]) 250 except ValueError: ~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py in_findBinIdx(self, dists, bins, scaffolds) 167 whichBins[i] = where --> 168res= scaffolds.index(tuple(whichBins)) 169 if _verbose: ValueError: (0, 2, 0) is not in list During handling of the above exception, another exception occurred: IndexError Traceback (most recent call last) in 1 qmol= Chem.MolFromSmiles('COc1ccc(C(C)C)cc1CN[C@H]1C2CCN(CC2)[C@H]1C(c1c1)c1c1') > 2fpquery= Generate.Gen2DFingerprint(qmol,sigFactory) 3 fpquery ~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/Generate.py inGen2DFingerprint(mol, sigFactory, perms, dMat, bitInfo) 160 for matchin matchesToMap: 161if sigFactory.shortestPathsOnly: --> 162idx= _ShortestPathsMatch(match, perm, sig, dMat, sigFactory) 163 if idxis not None and bitInfois not None: 164l= bitInfo.get(idx, []) ~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/Generate.py in_ShortestPathsMatch(match, featureSet, sig, dMat, sigFactory) 71 dist[i] = d 72 ---> 73idx= sigFactory.GetBitIdx(featureSet, dist, sortIndices=False) 74if _verbose: 75 print('\t', dist, minD, maxD, idx) ~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py inGetBitIdx(self, featIndices, dists, sortIndices) 252 fams= [fams[x] for xin featIndices] 253 raise IndexError('distance bin not found: feats: %s; dists=%s; bins=%s; scaffolds: %s' % --> 254(fams, dists, self._bins, self._scaffolds)) 255 256 return startIdx+ offset+ bin_ IndexError: distance bin not found: feats: ['Acceptor', 'Aromatic', 'Hydrophobe']; dists=[1, 5, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: [0, [(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)], 0] ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule
Hi Ling, this can be a workaround if RDKit does not have a built-in function to extract a submolecule by atom ids. You may assign atom property labels to these atoms and then looping over atoms in EditableMol remove those ones which do not have this property assigned. Kind regards, Pavel. On 02/04/2021 02:30, Ling Chan wrote: Yes, Kangway, that was what I first tried, as mentioned in the first post. I did not have any problem with obtaining the primary fragments (applying all cuts) . Just that I have not yet figured out how to obtain the secondary fragments, either from recombining the primary fragments, or from fragmenting from the initial molecule (by not applying all cuts). Chuang, Kangway <mailto:kangway.chu...@ucsf.edu>> 於 2021年4月1日週四 下午5:20寫道: Hi Ling, I think I've run into something similar before, have you tried using FragmentOnBonds followed by Chem.GetMolFrags? GetMolFrags lets you toggle a few things (e.g. (bool)asMols=False [, (bool)sanitizeFrags=True) to provide some workarounds with sanitization. Best, Kangway *From:* Ling Chan mailto:lingtrek...@gmail.com>> *Sent:* Thursday, April 1, 2021 5:08 PM *To:* Mark Mackey mailto:m...@cresset-group.com>> *Cc:* RDKit mailto:rdkit-discuss@lists.sourceforge.net>> *Subject:* Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule >> /I did manage to achieve what I want by going through Chem.MolFragmentToSmiles and then convert the Smiles back to a Mol. But is there a neater way?/ Oops, I wrote too soon. Actually I did not achieve what I want. The conversion from the smiles from Chem.MolFragmentToSmiles sometimes crashes, because of Sanitization problem. Ling Chan mailto:lingtrek...@gmail.com>> 於 2021年4月1日週四 下午4:11寫道: Thank you Mark for your suggestion. It sounds good and I gave it a try. However, this leads to another question that may sound dumb. I have the atom indices of a fragment. For example, the fragment comes from atoms [3,4,5,9,10,11,14] of the original molecule. How can I extract this fragment from the molecule? I tried (1) using EditableMol and deleting atoms one by one using RemoveAtom. But this does not work since the atom numbering changes after each deletion. (2) going through FragmentOnBonds. But the output of FragmentOnBonds have the atom indices reshuffled so I cannot directly use my index list to fish out my fragment. I did manage to achieve what I want by going through Chem.MolFragmentToSmiles and then convert the Smiles back to a Mol. But is there a neater way? Basically, is there a Chem.MolFragmentToMol function? Thank you again. Ling Mark Mackey mailto:m...@cresset-group.com>> 於 2021年4月1日週四 上午1:58寫道: Hi Ling, Having done something similar (but not in RDKit), I would suggest a different algorithm. I think that fragmenting the molecule first and then stitching the bits together is always going to be very complicated. Instead, just fragment the molecule in the ways that you want: - Find the set B of all breakable bonds according to your rules. I’m assuming here that B contains only acyclic bonds. - To get all of the pairwise pieces, for each element b of B break all bonds in B _except_ b. Keep the fragment containing b, and clean up. - To get all of the triplets, for each tuple (b1, b2) in B, break all bonds in B except b1 and b2. Keep the fragment containing b1 only if it also contains b2. Regards, Mark ** *-- * *Mark Mackey* *Chief Scientific Officer* *Cresset* New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK tel: +44 (0)1223 858890mobile: +44 (0)7595 099165fax: +44 (0)1223 853667 email:_m...@cresset-group.com <mailto:m...@cresset-group.com>_web:www.cresset-group.com <https://urldefense.com/v3/__http://www.cresset-group.com/__;!!LQC6Cpwp!_TcJCf-sigb7LDzkbKrsrFcW9tvIZeaipJYFoxrlJcJu_i7ISOrtBw_r-FiTOxLSLc3HOA$> skype: mark_cresset *From:*Ling Chan mailto:lingtrek...@gmail.com>> *Sent:* 31 March 2021 20:56 *To:* RDKit mailto:rdkit-discuss@lists.sourceforge.net>> *Subject:* [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule Dear Colleagues, I am trying to do something that I think
Re: [Rdkit-discuss] ConstrainedEmbed issue
Thank you, Sereina. I understand importance of addition of hydrogens to get a reasonable 3D coordinates. But the situation may be not that simple. 1. Addition of hydrogen is only required for custom coordinates supplied from an external file. If coordinates of a template is generated with rdkit embedding it works without addition of explicit hydrogens. 2. I found an opposite example where addition of hydrogens breaks constrained embedding if custom coordinates of a template is used. And again if I generate coordinates of a template by rdkit everything is OK without addition of Hs. These suggest that there is some issue with custom coordinates usage for constrained embedding. I provided the code and output below. Code: data = [('1.mol', 'C[C@@H]1C1=O', 'C[C@@H]1CC[C@H](O)CC1=O'), ('2.mol', '[C@@H](CCC)NC(=O)c1ccc(F)cc1', '[C@H](CCC[C@@H](CCC)NC(=O)c1ccc(F)cc1)NC(=O)c1ccco1')] for i, (mol_fname, smi_template, smi_child) in enumerate(data): print('iteration', i) mode = 'read template mol file, no AddHs' print(mode) mol_template = Chem.MolFromMolFile(mol_fname) mol_child = Chem.MolFromSmiles(smi_child) try: mol = AllChem.ConstrainedEmbed(mol_child, mol_template) print(mol.GetProp('EmbedRMS')) except ValueError as e: print(e) mode = 'read template mol file, AddHs' print(mode) mol_template = Chem.MolFromMolFile(mol_fname) mol_child = Chem.MolFromSmiles(smi_child) try: mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_template) print(mol.GetProp('EmbedRMS')) except ValueError as e: print(e) mode = 'embed template mol in rdkit, no AddHs' print(mode) mol_template = Chem.MolFromSmiles(smi_template) AllChem.EmbedMolecule(mol_template) mol_child = Chem.MolFromSmiles(smi_child) try: mol = AllChem.ConstrainedEmbed(mol_child, mol_template) print(mol.GetProp('EmbedRMS')) except ValueError as e: print(e) Output: iteration 0 read template mol file, no AddHs Could not embed molecule. read template mol file, AddHs 0.05014807519735495 embed template mol in rdkit, no AddHs 0.12358989886023371 iteration 1 read template mol file, no AddHs 0.057937898735270194 read template mol file, AddHs Could not embed molecule. # <-- here rdkit spends a lot of time but fails embed template mol in rdkit, no AddHs 0.1012757033705761 Pavel. On 07/07/2020 21:41, Sunhwan Jo wrote: Makes sense :) On Jul 7, 2020, at 12:35 PM, Sereina Riniker mailto:sereina.rini...@gmail.com>> wrote: Dear Pavel and Sunhwan, Please note that hydrogens should always be added for the embedding algorithm to work properly (i.e. it’s not a walk around but what should be done). See also Section “Working with 3D Molecules” in https://www.rdkit.org/docs/GettingStartedInPython.html Best regards, Sereina On 7 Jul 2020, at 21:26, Sunhwan Jo <mailto:sunhw...@gmail.com>> wrote: The reason constraint embed didn’t work is the molecule simply can’t be embedded using the rdkit’s algorithm. In [25]: mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') In [26]: AllChem.EmbedMolecule(mol_child) Out[26]: -1 See more discussion here: https://github.com/rdkit/rdkit/issues/2996 The SMILES you posted looks valid to me and doesn’t look that complicated, but the anyway I think somehow the RDKit’s algorithm tripped up and couldn’t finish embedding without some help. Hope someone with more in-depth insight can help here. Anyway, for a walk around, adding H seems to do the trick: In [39]: mol = AllChem.AddHs(mol_child) In [40]: AllChem.EmbedMolecule(mol) Out[40]: 0 # worked In [41]: AllChem.ConstrainedEmbed(mol, mol_parent) Out[41]: # also worked Sunhwan On Jul 7, 2020, at 12:36 AM, Pavel Polishchuk mailto:pavel_polishc...@ukr.net>> wrote: Hi all, I have an issue with ConstrainedEmbed and I cannot figure out what exactly causes this. I have a molecule C[C@@H]1C1=O with 3D coordinates in 1.mol file (attached). And I want to generate coordinates for another structure with this core - C[C@@H]1CC[C@H](O)CC1=O. This is usual way which causes issue with embedding and the corresponding error. mol_parent = Chem.MolFromMolFile('1.mol') mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') try: mol = AllChem.ConstrainedEmbed(mol_child, mol_parent) except ValueError as e: print(e) If I add explicit hydrogens the issue disappears. mol_parent = Chem.MolFromMolFile('1.mol') mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_parent) If I do not use pre-defined coordinates - everything works well. mol_parent = Chem.MolFromSmiles('C[C@@H]1C1=O') AllChem.EmbedMolecule(mol_parent) mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') mol = AllChem.ConstrainedEmbed(mol_child, mol_parent) Does ugly coordin
[Rdkit-discuss] ConstrainedEmbed issue
Hi all, I have an issue with ConstrainedEmbed and I cannot figure out what exactly causes this. I have a molecule C[C@@H]1C1=O with 3D coordinates in 1.mol file (attached). And I want to generate coordinates for another structure with this core - C[C@@H]1CC[C@H](O)CC1=O. This is usual way which causes issue with embedding and the corresponding error. mol_parent = Chem.MolFromMolFile('1.mol') mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') try: mol = AllChem.ConstrainedEmbed(mol_child, mol_parent) except ValueError as e: print(e) If I add explicit hydrogens the issue disappears. mol_parent = Chem.MolFromMolFile('1.mol') mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_parent) If I do not use pre-defined coordinates - everything works well. mol_parent = Chem.MolFromSmiles('C[C@@H]1C1=O') AllChem.EmbedMolecule(mol_parent) mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O') mol = AllChem.ConstrainedEmbed(mol_child, mol_parent) Does ugly coordinates in 1.mol file cause the embedding issue? Or the issue is caused by some implicit properties of a molecule? How to solve this properly? Kind regards, Pavel. 1.mol Description: MOL mdl chemical test ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] conformations for cycles with multiple stereocenters
Hello, I want to generate conformers for a stereoisomeric sugar moiety. The code below works (loads proccesor) but returns none of them. But if I remove all stereoconfiguration info in input SMILES the code generates conformers. Playing with this issue I noticed that conformers are generated if only one stereocenter is defined. If I define configuration of two centers the code stops working. I'm puzzled. Is there a solution or workaround? I use rdkit 2019.09.1 mol = Chem.MolFromSmiles('OC[C@H]1O[C@H](O)C[C@@H](O)[C@H]1O') AllChem.EmbedMultipleConfs(mol, 50, Chem.rdDistGeom.ETKDGv2()) mol.GetConformers() Kind regards, Pavel. ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] 5th in silico drug design workshop in Olomouc
Sorry for the mistake, the dates are 3-7 of *February* 2020. Pavel. On 20/11/2019 09:39, Pavel Polishchuk wrote: Dear colleagues, we would like to invite you on the 5th Drug Design workshop which will be held 3-7 January 2020 in Olomouc (Czech Republic). It is focused on practical applications of different chemoinformatic tools and approaches for drug development. This might be interesting for bachelor, master and PhD students to broaden their experience and sharpen skills. During the workshop, students will learn pharmacophore and QSAR modeling, molecular docking, PDBe services. A competition will be organized at the last day of the workshop where participants will be able to apply acquired knowledge to solve a real chemoinformatic task and win prizes. This year we will organize a poster session for participants. https://fch.upol.cz/en/5add/ Please feel free to share this information to those who can be interested in participation in such event. Thank you! Kind regards, Pavel. -- Dr. Pavel Polishchuk senior researcher Institute of Molecular and Translational Medicine Faculty of Medicine and Dentistry Palacky University Hněvotínská 1333/5 779 00 Olomouc Czech Republic +420 585632298 ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] 5th in silico drug design workshop in Olomouc
Dear colleagues, we would like to invite you on the 5th Drug Design workshop which will be held 3-7 January 2020 in Olomouc (Czech Republic). It is focused on practical applications of different chemoinformatic tools and approaches for drug development. This might be interesting for bachelor, master and PhD students to broaden their experience and sharpen skills. During the workshop, students will learn pharmacophore and QSAR modeling, molecular docking, PDBe services. A competition will be organized at the last day of the workshop where participants will be able to apply acquired knowledge to solve a real chemoinformatic task and win prizes. This year we will organize a poster session for participants. https://fch.upol.cz/en/5add/ Please feel free to share this information to those who can be interested in participation in such event. Thank you! Kind regards, Pavel. -- Dr. Pavel Polishchuk senior researcher Institute of Molecular and Translational Medicine Faculty of Medicine and Dentistry Palacky University Hněvotínská 1333/5 779 00 Olomouc Czech Republic +420 585632298 ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] add hydrogen after its removal
Thank you, Paolo! That was not obvious for me how RDKit manage hydrogens in this case. Pavel. On 22/03/2019 12:15, Paolo Tosco wrote: Hi Pavel, After you have first called AddHs(), all hydrogens in your molecule are now in the molecule graph as real atoms, and there are no more implicit/explicit Hs . Therefore, when you call RemoveAtom(), you are removing a real atom from the molecule graph, and the implicit/explicit H count stays 0 for the parent heavy atom. Hence calling again AddHs() does not add any hydrogens. If you wish the hydrogen to come back when you call AddHs(), you need to increase the explicit H count on the parent heavy atom: print(' load mol ') m = Chem.MolFromSmiles('c1c1O') print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' add Hs ') m = Chem.AddHs(m) print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' remove H ') nbrs = m.GetAtomWithIdx(7).GetNeighbors() if (len(nbrs) == 1 and nbrs[0].GetAtomicNum() > 1): nbrs[0].SetNumExplicitHs(nbrs[0].GetNumExplicitHs() + 1) em = Chem.EditableMol(m) em.RemoveAtom(7) print(Chem.MolToSmiles(em.GetMol(), allHsExplicit=True)) print(' add Hs ') mm = Chem.AddHs(em.GetMol()) print(Chem.MolToSmiles(mm, allHsExplicit=True)) load mol [OH][c]1[cH][cH][cH][cH][cH]1 add Hs [H][O][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H] remove H [H][O][c]1[cH][c]([H])[c]([H])[c]([H])[c]1[H] add Hs [H][O][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H] HTH, cheers, p. On 03/22/19 10:20, Pavel wrote: Hello, I encountered with an issue which I cannot understand and solve. This might be a bug or a feature. After removal of some specific hydrogens I could not add them back. Is it expected behavior or should I create an issue on github? print(' load mol ') m = Chem.MolFromSmiles('c1c1O') print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' add Hs ') m = Chem.AddHs(m) print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' remove H ') em = Chem.EditableMol(m) em.RemoveAtom(7) print(Chem.MolToSmiles(em.GetMol(), allHsExplicit=True)) print(' add Hs ') mm = Chem.AddHs(em.GetMol()) print(Chem.MolToSmiles(mm, allHsExplicit=True)) print(' update property cache ') mm.UpdatePropertyCache() print(Chem.MolToSmiles(mm, allHsExplicit=True)) output: load mol [OH][c]1[cH][cH][cH][cH][cH]1 add Hs [H][O][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H] remove H [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] add Hs [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] update property cache [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] Pavel. ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] add hydrogen after its removal
Hello, I encountered with an issue which I cannot understand and solve. This might be a bug or a feature. After removal of some specific hydrogens I could not add them back. Is it expected behavior or should I create an issue on github? print(' load mol ') m = Chem.MolFromSmiles('c1c1O') print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' add Hs ') m = Chem.AddHs(m) print(Chem.MolToSmiles(m, allHsExplicit=True)) print(' remove H ') em = Chem.EditableMol(m) em.RemoveAtom(7) print(Chem.MolToSmiles(em.GetMol(), allHsExplicit=True)) print(' add Hs ') mm = Chem.AddHs(em.GetMol()) print(Chem.MolToSmiles(mm, allHsExplicit=True)) print(' update property cache ') mm.UpdatePropertyCache() print(Chem.MolToSmiles(mm, allHsExplicit=True)) output: load mol [OH][c]1[cH][cH][cH][cH][cH]1 add Hs [H][O][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H] remove H [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] add Hs [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] update property cache [H][O][c]1[c][c]([H])[c]([H])[c]([H])[c]1[H] Pavel. ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] 3D pharmacophore-based screening
Hi Pawan, I can suggest to use our developed module for 3D pharmacophore representation: https://github.com/DrrDom/pmapper I added an example to answer your particular question: https://github.com/DrrDom/pmapper/blob/master/examples/screen_example.ipynb Hope this will help, Pavel. On 20/02/2019 07:56, Greg Landrum wrote: Hi Pawan, There are a lot of different variables here that could change how you approach the problem, so it's difficult to give a general answer. Nik Stiefl did a tutorial a couple of years ago that might be helpful to you: https://github.com/rdkit/UGM_2016/blob/master/Notebooks/Stiefl_RDKitPh4FullPublication.ipynb -greg On Tue, Feb 19, 2019 at 2:50 PM PAWAN KUMAR <mailto:pawan12_...@jnu.ac.in>> wrote: I am new to rdkit package and want to use the rdkit functionality to screen the chemical database using 3D pharmacophore features. I have pharmacophore features with inter-feature distances and coordinate information. Will you please provide me some demo example regarding that so that I can use this for my set of compounds. -- ~thanks and regard PAWAN KUMAR ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net <mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] 4th advanced in silico drug design workshop in Olomouc
Dear colleagues, we would like to invite you on the 4th Drug Design workshop which will be held 21-25 January 2019 in Olomouc (Czech Republic). It is focused on practical applications of different chemoinformatic tools and approaches for drug development. This might be interesting for bachelor, master and PhD students to broaden their experience and sharpen skills. Basic Python skills would be desirable for QSAR and deep learning tutorials. However, a brief introduction to Python will be given at the workshop. During the workshop, students will learn pharmacophore modeling with LigandScout, molecular docking with AutoDock Vina, QSAR modeling with RDKit, scikit-learn and SPCI, molecular dynamics and deep learning. A competition will be organized at the last day of the workshop where participants will be able to apply acquired knowledge to solve real chemoinformatic tasks and win prizes. http://fch.upol.cz/en/research/conferences-workshops/4add/ Please feel free to share this information to those who can be interested in participation in such event. Thank you. Kind regards, Pavel. -- Dr. Pavel Polishchuk senior researcher Institute of Molecular and Translational Medicine Faculty of Medicine and Dentistry Palacky University Hněvotínská 1333/5 779 00 Olomouc Czech Republic +420 585632298 ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Fingerprint collision and machine learning
Hi Michal, I think if you can provide several examples of structures having identical bitstrings this will help a lot to better understand the issue. Pavel. On 10/10/18 14:15, Michal Krompiec wrote: Hi Thomas, Radius 2, 2048 bits, 5200 data points. On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis <mailto:teva...@gmail.com>> wrote: What's your bitvector length and radius? How many training samples do you have? On Wed, 10 Oct 2018 at 13:51, Michal Krompiec mailto:michal.kromp...@gmail.com>> wrote: Hi all, I have a slightly off-topic question. I'm trying to train a neural network on a dataset of small molecules and their melting points. I did get a not-so-bad accuracy with Morgan fingerprints, but I've realised that regardless of FP radius and bitvector length, several dozen molecules have the same fingerprints but wildly different melting points. I am pretty sure this is a "solved problem" so I don't want to reinvent the wheel. What is the recommended/usual way of dealing with this? Thanks, Michal ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net <mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- == Dr Thomas Evangelidis Research Scientist IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en> Prague, Czech Republic & CEITEC - Central European Institute of Technology <https://www.ceitec.eu/> Brno, Czech Republic email: teva...@gmail.com <mailto:teva...@gmail.com> website: https://sites.google.com/site/thomasevangelidishomepage/ ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- Dr. Pavel Polishchuk senior researcher Institute of Molecular and Translational Medicine Faculty of Medicine and Dentistry Palacky University Hněvotínská 1333/5 779 00 Olomouc Czech Republic +420 585632298 ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] similarity maps for regression models
Hello, I'm curious, would it be possible to use regression models for similarity maps drawing or only classification ones? If someone has a piece of code, I would appreciate this. Because I tried to modify the example given in the cookbook, but without success. Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] access C++ code from Python
Hi, there is a possibility to retrieveprincipal moments of inertia which are eigenvalues of an inertia tensor. I would be interested to retrieve also eigenvectors of a tensor. However it seems that it is not accessible from Python. I found computePrincipalAxesAndMoments in MolTransforms.cpp (https://github.com/rdkit/rdkit/blob/6655a694f516d9dc30da835c26c70450e1e06fb2/Code/GraphMol/MolTransforms/MolTransforms.cpp#L149) which returns what I want but how to access it? Is it possible to do in an easy way or in a hard way at least? # get principal moment of inertia m = Chem.MolFromSmiles('F[C@@H](Cl)Br') m = Chem.AddHs(m) AllChem.EmbedMolecule(m) AllChem.MMFFOptimizeMolecule(m, confId=0) Descriptors3D.PMI1(m, 0) Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] canonical SMILES of a fragment
Thanks Greg! I found an alternative solution which is also no so straightforward. I set an isotope label to aromatic atoms, generate isomeric SMILES and make regex replacement. But your suggestion to set remove hydrogens is important, since this can cause other ambiguity. import re m = RWMol() for i in range(3): a = Atom(6) a.SetNoImplicit(True) # remove implicit Hs m.AddAtom(a) a = Atom(0) m.AddAtom(a) m.GetAtomWithIdx(0).SetIsAromatic(True) # set aromatic m.GetAtomWithIdx(0).SetIsotope(42) # set isotope m.GetAtomWithIdx(3).SetAtomMapNum(1) m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE) s = Chem.MolToSmiles(m, isomericSmiles=True) re.sub('\[[0-9]+([a-z]+)H?[0-9]?\]', '\\1', s) # remove isotope in output SMILES OUTPUT: 'CC(c)[*:1]' Pavel. On 08/02/2017 06:24 AM, Greg Landrum wrote: Hi Pavel, It is, unfortunately, not that easy. The canonicalization algorithm does not use atomic aromaticity when determining atom ordering, so as far as it is concerned there is no difference between atoms 0 and 2 in either of your examples. What does get used is the number of hydrogens, so you need to use that in order to get the results you are looking for.[1] For technical reasons, you also need to tell the RDKit that the atoms should not have implicit Hs attached. Here's a gist that works for me: https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843 Two notes: 1) I don't set the number of Hs on atom 1 in that gist, but I would suggest doing that too. 2) If atoms 0 and 2 have the same number of Hs attached, this still is not going to work if you're building things from fragments. The canonicalization code was not really designed to be used in situations like this. -greg [1] The details of the canonicalization algorithm, including the contents of the atom invariants, are described here: http://dx.doi.org/10.1021/acs.jcim.5b00543 On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk <pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote: Hi all, canonicalization of fragment SMILES does not work properly. Below there are two examples of identical fragments. The only difference is the order of atoms (indices). However, it seems that RDKit canonicalization does not take into account atom types. Does someone have an idea how to solve this issue with small losses? #1 === m = RWMol() for i in range(3): a = Atom(6) m.AddAtom(a) a = Atom(0) m.AddAtom(a) m.GetAtomWithIdx(0).SetIsAromatic(True) # set atom 0 as aromatic m.GetAtomWithIdx(3).SetAtomMapNum(1) m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE) Chem.MolToSmiles(m) OUTPUT: 'cC(C)[*:1]' #2 === m2 = RWMol() for i in range(3): a = Atom(6) m2.AddAtom(a) a = Atom(0) m2.AddAtom(a) m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic m2.GetAtomWithIdx(3).SetAtomMapNum(1) m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE) m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE) m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE) Chem.MolToSmiles(m2) OUTPUT: 'CC(c)[*:1]' Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net <mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss> -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] canonical SMILES of a fragment
Hi all, canonicalization of fragment SMILES does not work properly. Below there are two examples of identical fragments. The only difference is the order of atoms (indices). However, it seems that RDKit canonicalization does not take into account atom types. Does someone have an idea how to solve this issue with small losses? #1 === m = RWMol() for i in range(3): a = Atom(6) m.AddAtom(a) a = Atom(0) m.AddAtom(a) m.GetAtomWithIdx(0).SetIsAromatic(True) # set atom 0 as aromatic m.GetAtomWithIdx(3).SetAtomMapNum(1) m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE) m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE) Chem.MolToSmiles(m) OUTPUT: 'cC(C)[*:1]' #2 === m2 = RWMol() for i in range(3): a = Atom(6) m2.AddAtom(a) a = Atom(0) m2.AddAtom(a) m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic m2.GetAtomWithIdx(3).SetAtomMapNum(1) m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE) m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE) m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE) Chem.MolToSmiles(m2) OUTPUT: 'CC(c)[*:1]' Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Using RDKit in PyCharm and Anaconda on Windows
I had some issues to run rdkit from Python console in PyCharm (4.5.5) on Linux. After recent installation of PyCharm 2017.1.3 it started to work. Maybe updating PyCharm will help on Win as well. Pavel. On 05/30/2017 10:10 PM, West, Richard wrote: We're having trouble getting RDKit to work in a PyCharm project using an Anaconda interpreter (Python 2.7), on Windows 8.1. Has anyone had success with this and can guide us? The trouble is we get an ImportError: DLL load failed: The specified module could not be found. when trying to import rdkit (or rdBase). We have tried many variations of the following, but here is a basic recipe of what does/doesn't work: 1. Make a new conda environment (called 'eg1'), install rdkit ('conda install -c rdkit rdkit') 2. From a cmd.exe prompt, use this environment ('activate eg1') load python ('python') and import rdkit ('import rdkit') it works fine. 3. From PyCharm, create a Project Interpreter (pointing to 'C:\Anaconda2\envs\eg1\python.exe'), and use this to run a script or create a new Python Console in which you 'import rdkit', leading to the "DLL load failed" message. 4. We have tried manually adding a bunch of things to the "Interpreter Paths" in PyCharm, but without success (perhaps we just didn't add the right thing). Update: just before I hit "send" on this request for help, we stumbled across this posting of the same problem, and solution, from Christian Ribeaud: https://intellij-support.jetbrains.com/hc/en-us/community/posts/115000244450-DLL-load-failed It seems that if we open cmd.exe, activate the environment, and then launch PyCharm exe from there, it works. I'm sharing this here because it took us a while to find the other post, but also to ask: is there a "better" way? Cheers, Richard -- Richard H. West, Ph.D. Assistant Professor, Department of Chemical Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115 http://northeastern.edu/comochengPhone: 617-373-5163 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] RemoveHs and [H][*:1] mol
Hello, probably this is a message mainly to developers. I discovered some strange behavior of removeHs function applied for '[H][*:1]' molecule. If I create that mol from smiles, RemoveHs does not remove a single H from the mol: mmm = Chem.MolFromSmiles('[H][*:1]') Chem.MolToSmiles(Chem.RemoveHs(mmm)) output: '[H][*:1]' If I apply RemoveHs to fragments obtained after MMPA cuts it removes H and keeps only '[*:1]' mmm = Chem.MolFromSmiles('c1c1C') mmm = Chem.AddHs(mmm) fr = rdMMPA.FragmentMol(mmm, pattern="[*]!@!=!#[!#1]", maxCuts=1, resultsAsMols=True, maxCutBonds=30) for f in fr: ff = Chem.GetMolFrags(f[1], asMols=True) print(Chem.MolToSmiles(ff[0]), Chem.MolToSmiles(Chem.RemoveHs(ff[0]))) print(Chem.MolToSmiles(ff[1]), Chem.MolToSmiles(Chem.RemoveHs(ff[1]))) output: [H]c1c([H])c([H])c([*:1])c([H])c1[H] c1ccc([*:1])cc1 [H]C([H])([H])[*:1] C[*:1] [H]c1c([H])c([H])c([*:1])c(C([H])([H])[H])c1[H] Cc1c1[*:1] [H][*:1] [*:1] [H]c1c([H])c(C([H])([H])[H])c([H])c([*:1])c1[H] Cc1([*:1])c1 [H][*:1] [*:1] [H]c1c([H])c([*:1])c([H])c([H])c1C([H])([H])[H] Cc1ccc([*:1])cc1 [H][*:1] [*:1] [H]c1c([H])c([H])c(C([H])([H])[*:1])c([H])c1[H] c1ccc(C[*:1])cc1 [H][*:1] [*:1] If this is a bug I can create an issue on github Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Redundant hydrogen atoms
I do not know the valence model of RDkit. Therefore below there is an ugly but working solution: set to the S+ atom NoImplicit. mol = Chem.MolFromSmiles('CC(C)(C)SC') rxn = AllChem.ReactionFromSmarts('[CH0:1][S:2][CH3:3]>>[C:1][SH0+:2].[CH3-:3]') ps = rxn.RunReactants((mol,)) print(Chem.MolToSmiles(ps[0][0])) print(Chem.MolToSmiles(ps[0][1])) for a in ps[0][0].GetAtoms(): if a.GetAtomicNum() == 16 and a.GetFormalCharge() == 1: a.SetNoImplicit(True) print(Chem.MolToSmiles(ps[0][0])) print(Chem.MolToSmiles(ps[0][1])) Pavel. On 05/28/2017 08:26 AM, Nitzan Tzanani wrote: mol = Chem.MolFromSmiles('CC(C)(C)SC') rxn = AllChem.ReactionFromSmarts('[CH0:1][S:2][CH3:3]>>[C:1][SH0+:2].[CH3-:3]') ps = rxn.RunReactants((mol,)) print Chem.MolToSmiles(ps[0][0]) print Chem.MolToSmiles(ps[0][1]) -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] canonical smiles for fragments with map numbers
Thank you, Brian! Actually what I expected as output: S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3] S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3] and so on You gave me the right direction. I can store old-new maps in a dict and after relabeling and producing of canonical smiles it would be easy to relabel attachment points back. Thank you again! Pavel. On 05/27/2017 03:03 PM, Brian Kelley wrote: Pavel, this isn't exactly trivial so I went ahead and made an example. The basics are that atomMaps are canonicalized, i.e. their value is used in the generation of smiles. To solve this problem: 1) backup the atom maps and remove them 2) canonicalize *without* atom maps but figure out the order in which the atoms in the molecule are output 3) using the atom output order, relabel the atom maps based on output order. That's a mouthful, but here's some code that should do the trick: from rdkit import Chem smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1", "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1", "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1", "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1", "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1", "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"] def CanonicalizeMaps(m, *a, **kw): # atom maps are canonicalized, so rename them # figure out where they would have gone # and relabel from 1...N based on output order atomMap = "molAtomMapNumber" backupAtomMap = "oldMolAtomMapNumber" for atom in m.GetAtoms(): if atom.HasProp(atomMap): atomNum = atom.GetProp(atomMap) atom.SetProp(backupAtomMap, atomNum) atom.ClearProp(atomMap) # canonicalize smi = Chem.MolToSmiles(m, *a, **kw) # where did the atoms end up in the output string? atoms = [(pos, atom_idx) for atom_idx, pos in enumerate( eval(m.GetProp("_smilesAtomOutputOrder")))] atommap = 1 atoms.sort() # set the new atommap based on output position for pos, atom_idx in atoms: atom = m.GetAtomWithIdx(atom_idx) if atom.HasProp(backupAtomMap): atom.SetProp(atomMap, str(atommap)) atommap +=1 return Chem.MolToSmiles(m) for s in smi: m = Chem.MolFromSmiles(s) print CanonicalizeMaps(m,True) Output: S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] Now, if you want the atomMaps in 1...2...3 output order, we could do that as well, but it is even trickier. Enjoy, Brian On Sat, May 27, 2017 at 8:36 AM, Pavel Polishchuk <pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote: Hi, I cannot solve an issue and would like to ask for an advice. If there are different map numbers for attachment points for the same fragment different canonical smiles are generated. I observed such behavior only for fragments with 3 attachment points. Below is an example. I'm looking for a solution/workaround how to produce the "same" smiles strings irrespectively of mapping that after removal of map numbers smiles will become identical. Any advice would be appreciated. smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1", "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1", "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1", "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1", "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1", "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"] for s in smi: print(Chem.MolToSmiles(Chem.MolFromSmiles(s))) output: S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3] S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2] S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3] S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3] S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3] Kind regards, Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net <mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss> -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] custom fingerprints in PostgreSQL
Hello, is it possible to store custom fingerprints in psql DB and use them for similarity search? And how to do this? I foundtwo commands bfp_to_binary_text(bfp) and bfp_from_binary_text(bytea)in RDKit cartridge but cannot understand how to use them. I want to store pharmacophore fingerprints. There is no a built-in command in RDKit cartridge to calculate them so I have to calculate them in a Pythonscript. Then I need to store them in psql DB and create similarity search index but I could not find a solution yet. It might be of general interest how to store and use arbitrary fingerprints in DB. An example of pharmacophore FP generation: from rdkit import Chem from rdkit import RDConfig from rdkit.Chem.Pharm2D.SigFactory import SigFactory from rdkit.Chem.Pharm2D import Generate import os fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef') factory = ChemicalFeatures.BuildFeatureFactory(fdefName) sigFactory = SigFactory(factory, minPointCount=2, maxPointCount=3, trianglePruneBins=False) sigFactory.SetBins([(0,2),(2,5),(5,8)]) sigFactory.Init() mol = Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1') fp = Generate.Gen2DFingerprint(mol, sigFactory) Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] Pharm2D.SigFactory and psql issues
Hello, I found two issues which are reproducible in 2016.09 and 2017.3 conda builds. 0. The first issue is related to PostgreSQL. If I try to install it from conda it asks me to downgrade rdkit to 2016.3. After downgrading of rdkit psql works well. 1. Generation of 2D pharmacophore fingerprints using default parameters returns errors for some molecules. However if I add trianglePruneBins=False option to initialization procedure of a signature factory it starts working. It seems that Gen2DFingerprint ignores trianglePruneBins parameter. Below is a code example: from rdkit import Chem from rdkit import RDConfig from rdkit.Chem.Pharm2D.SigFactory import SigFactory from rdkit.Chem.Pharm2D import Generate import os fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef') factory = ChemicalFeatures.BuildFeatureFactory(fdefName) sigFactory = SigFactory(factory, minPointCount=2, maxPointCount=3, trianglePruneBins=True) # replace to False and the error will disappear(True is a default value) sigFactory.SetBins([(0,2),(2,5),(5,8)]) sigFactory.Init() mol = Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1') fp = Generate.Gen2DFingerprint(mol, sigFactory) --- ValueErrorTraceback (most recent call last) /home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py in GetBitIdx(self, featIndices, dists, sortIndices) 249 print('\tbins:', repr(self._bins), type(self._bins)) --> 250 bin_ = self._findBinIdx(dists, self._bins, self._scaffolds[len(dists)]) 251 except ValueError: /home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py in _findBinIdx(self, dists, bins, scaffolds) 169 whichBins[i] = where --> 170 res = scaffolds.index(tuple(whichBins)) 171 if _verbose: ValueError: (2, 0, 0) is not in list During handling of the above exception, another exception occurred: IndexErrorTraceback (most recent call last) in () 1 # from rdkit.Chem.Pharm2D import Generate 2 mol = Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1') > 3 fp = Generate.Gen2DFingerprint(mol, sigFactory) /home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/Generate.py in Gen2DFingerprint(mol, sigFactory, perms, dMat, bitInfo) 160 for match in matchesToMap: 161 if sigFactory.shortestPathsOnly: --> 162 idx = _ShortestPathsMatch(match, perm, sig, dMat, sigFactory) 163 if idx is not None and bitInfo is not None: 164 l = bitInfo.get(idx, []) /home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/Generate.py in _ShortestPathsMatch(match, featureSet, sig, dMat, sigFactory) 71 dist[i] = d 72 ---> 73 idx = sigFactory.GetBitIdx(featureSet, dist, sortIndices=False) 74 if _verbose: 75 print('\t', dist, minD, maxD, idx) /home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py in GetBitIdx(self, featIndices, dists, sortIndices) 253 fams = [fams[x] for x in featIndices] 254 raise IndexError('distance bin not found: feats: %s; dists=%s; bins=%s; scaffolds: %s' % --> 255(fams, dists, self._bins, self._scaffolds)) 256 257 return startIdx + offset + bin_ IndexError: distance bin not found: feats: ['Acceptor', 'Aromatic', 'Aromatic']; dists=[5, 1, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: [0, [(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)], 0] Kind regards, Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] DeleteSubstructs vs ReplaceSubstructs
Hi Maxim, if you change your query to SMARTS it would be possible to delete what you want m=Chem.MolFromSmiles('C1(C2=NC=CC=C2)=CC=CC(C)=C1') ss = Chem.MolFromSmarts('c1c1C') frag = AllChem.DeleteSubstructs(m, ss) print(Chem.MolToSmiles(frag)) Pavel. On 03/31/2017 07:41 AM, Popov, Maxim (Ext) wrote: Dear RDKit users, I am trying to remove a common substructure from a number of molecules (with AllChem.DeleteSubstructs). My problem is best illustrated by this short code: fromrdkit importChem fromrdkit.Chem importAllChem m=Chem.MolFromSmiles(/'C1(C2=NC=CC=C2)=CC=CC(C)=C1'/) ss = Chem.MolFromSmiles(/'C1=CC=CC(C)=C1'/) hyd=Chem.MolFromSmiles(/'[H]'/) print(/"Substituting substructure with hydrogen"/) frags = AllChem.ReplaceSubstructs(m, ss,hyd) forfrag infrags: print(Chem.MolToSmiles(frag)) print(/"\nDeleting substructure"/) frag = AllChem.DeleteSubstructs(m, ss) print(Chem.MolToSmiles(frag)) I create a toluene connected to pyridine and try to remove toluene. When replacing toluene substructure with hydrogen (AllChem.ReplaceSubstructs), I receive two sets of results: pyridine (with explicit hydrogen) and single carbon plus single hydrogen plus aromatic open chain (what is left from pyridine after removing one ring atom). When deleting the toluene substructure (AllChem.DeleteSubstructs), I receive just the open chain of ex-pyridine (corresponding to second set of the ReplaceSubstructs results). Is there a way of directing DeleteSubstructs method to a specific variant (in this case, leaving pyridine as a ring seems to be logical). Best regards, Maxim -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] CanonicalRankAtoms
Hi, I experimented a little bit with CanonicalRankAtoms and observed some unexpected results. I have two mols (actually sets of fragments): C[*].n[*].C[*].N[*] CC[*].CC[*].cn([*])c.CN([*])C In the first case, pairs of carbons and nitrogens are recognized as symmetrical [0, 2, 0, 2] However, nitrogens are aliphatic and aromatic and I expected they will be different. In the second case, nitrogen already different, while carbons are still identical. That is expected. [0, 0, 3, 2] Is it a bug or a feature? Is there a way to solve this issue by RDKit machinery and how? Below is a reproducible example: b1 = b'\xef\xbe\xad\xde\x00\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x04\x00\x00\x00\x80\x01\x06\x00`\x00\x00\x00\x01\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x07@h\x00\x00\x00\x03\x01\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x07\x00`\x00\x00\x00\x01\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x0b\x00\x01\x00\x02\x03\x00\x04\x05\x00\x06\x07\x00\x14\x00\x17\x00\x00\x00\x00\x16' q1 = Chem.Mol(b1) b2 = b'\xef\xbe\xad\xde\x00\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\n\x00\x00\x00\x80\x01\x06\x00`\x00\x00\x00\x01\x03\x06\x00`\x00\x00\x00\x02\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x06\x00`\x00\x00\x00\x02\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06@h\x00\x00\x00\x03\x02\x02\x07@(\x00\x00\x00\x03\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x07\x00 \x00\x00\x00\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06@h\x00\x00\x00\x03\x02\x02\x06\x00`\x00\x00\x00\x01\x03\x0b\x00\x01\x00\x01\x02\x00\x03\x04\x00\x04\x05\x00\x06\x07h\x0c\x07\x08\x00\t\n\x00\n\x0b\x00\x07\x0ch\x0c\n\r\x00\x14\x00\x17\x00\x00\x00\x00\x16' q2 = Chem.Mol(b2) r1 = [r for a, r in zip(q1.GetAtoms(), Chem.CanonicalRankAtoms(q1, False, False, False)) if a.GetSymbol() == "*"] r2 = [r for a, r in zip(q2.GetAtoms(), Chem.CanonicalRankAtoms(q2, False, False, False)) if a.GetSymbol() == "*"] print(Chem.MolToSmiles(q1, canonical=False)) # to preserve the order of atoms and ranks print(r1) print(Chem.MolToSmiles(q2, canonical=False)) print(r2) the output: C[*].n[*].C[*].N[*] [0, 2, 0, 2] CC[*].CC[*].cn([*])c.CN([*])C [0, 0, 3, 2] Kind regards, Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] FindAtomEnvironmentOfRadiusN
I guessed why it occurred. I'm interested is it expected behavior? So, if a radius is greater than the number of available bonds in all directions from the rooted atom the function will return empty list as it considers that such environment does not exist. Is this a correct expectation? Pavel. On 03/27/2017 03:53 PM, Peter Gedeck wrote: Hello, The atom numbers start with 0. From the middle atom, there are no environments with radius 2. You will get a result if you use the first (=0) or the last (=2) atom. Try this: m = Chem.MolFromSmiles("NCO") i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 0) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 0) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) i = Chem.FindAtomEnvironmentOfRadiusN(m, 3, 0) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) and you will get: 'CN' 'NCO' '' Is this more intuitive to you? Best, Peter On Mon, Mar 27, 2017 at 9:35 AM Pavel Polishchuk <pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote: Dear RDKitters, I found the issue with FindAtomEnvironmentOfRadiusN but this can be a feature. However, I did not findthis information in help and did not expect such behavior. If I apply FindAtomEnvironmentOfRadiusN function to a small molecule and specify the radius greater than the size of the molecule the function returns empty list of bond indices (and empty mol). m = Chem.MolFromSmiles("NCO") i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 1) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) returns "NCO" i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 1) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) returns "" In the latter case I expected the same output "NCO". Were my expectations mistaken? Kind regards, Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net <mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] FindAtomEnvironmentOfRadiusN
Dear RDKitters, I found the issue with FindAtomEnvironmentOfRadiusN but this can be a feature. However, I did not findthis information in help and did not expect such behavior. If I apply FindAtomEnvironmentOfRadiusN function to a small molecule and specify the radius greater than the size of the molecule the function returns empty list of bond indices (and empty mol). m = Chem.MolFromSmiles("NCO") i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 1) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) returns "NCO" i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 1) Chem.MolToSmiles(Chem.PathToSubmol(m, i)) returns "" In the latter case I expected the same output "NCO". Were my expectations mistaken? Kind regards, Pavel. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] delete a substructure
You might find this link useful - http://www.rdkit.org/docs/GettingStartedInPython.html#chemical-transformations However, the issue in your case is SMARTS definitions. If one SMARTS completely covers another one it would be difficult to understand is it artifact or not.I think it might be reasonable to revise SMARTS to avoid such overlapping or create a list of rules (maybe hierarchical) which will define valid and not valid overlappings. Pavel. On 03/08/2017 06:32 PM, Chenyang Shi wrote: Dear Hongbin, I tried your method on a molecule, 4-Methylsalicylic acid (CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in Joback method (using SMARTS), and used m.GetSubstructMatches to print out all atom positions. The result is summarized in the table. We can see there are duplicated counts--coming from COOH group. As suggested by Hongbin, we can remove duplicated atoms by looking at their positions--in this case, ((9),), ((7,8,),), ((7,),), and ((8,),) are subsets of ((7,8,9)) from -COOH. Indeed we can get rid of these duplicates. However, I also noticed that Atom (3,) from =C< (ring) group is also a part of -OH (phenol) ((10,3),). If we apply the same algorithm to remove duplicates, the =C<(ring) group will be only counted twice instead of three times. Greg, you mentioned as an alternative I can delete substructure using chemical reaction method. It would be greatly appreciated if you could show me (point me to) a simple example code, perhaps on a simple molecule? I find myself at a loss when browsing the manual. I would like to try also in that direction. Thanks, Chenyang Inline image 1 On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum <greg.land...@gmail.com <mailto:greg.land...@gmail.com>> wrote: The solution that Hongbin proposes to the double-counting problem is a good one. Just be sure to sort your substructure queries in the right order so that the more complex ones come first. Another thing you might think about is making your queries more specific. For example, as you pointed out "[OH]" is very general and matches parts of carboxylic acids and a number of other functional groups. The RDKit has a set of fairly well tested (though certainly not perfect) functional group definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol definition from there looks like this: [O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])] -greg On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 <yanyangh...@163.com <mailto:yanyangh...@163.com>> wrote: Hi, Chenyang, You don't need to delete the substructure from the molecule. Just check whehter the mapped atoms have been matched. For example: m = Chem.MolFromSmiles('CC(=O)O') OH = Chem.MolFromSmarts('[OH]') COOH = Chem.MolFromSmarts('C(O)=O') m.GetSubstructMatches(OH) >>((3,),) m.GetSubstructMatchs(COOH) >>((1, 3, 2),) Since atom "3" has been already matched, it should be ignored. So you can create a "set" to record the matched atoms to avoid repetitive count. Hongbin Yang 杨弘宾 *From:* Chenyang Shi <mailto:cs3...@columbia.edu> *Date:* 2017-03-06 14:04 *To:* Greg Landrum <mailto:greg.land...@gmail.com> *CC:* RDKit Discuss <mailto:rdkit-discuss@lists.sourceforge.net> *Subject:* Re: [Rdkit-discuss] delete a substructure Hi Greg, Thanks for a prompt reply. I did try "GetSubstructMatches()" and it returns correct numbers of substructures for CH3COOH. The potential problem with this approach is that if the molecule is getting complicated, it will possibly generate duplicate numbers for certain functional groups. For example, --OH (alcohol) group will be likely also counted in --COOH. A safer way, in my mind, is to remove the substructure that has been counted. Greg, you mentioned "chemical reaction functionality", can you show me a demo script with that using CH3COOH as an example. I will definitely delve into the manual to learn more. But reading your code will be a good start. Thanks, Chenyang On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum <greg.land...@gmail.com <mailto:greg.land...@gmail.com>> wrote: Hi Chenyang, If you're really interested in counting the number of times the substructure appears, you can do that much quicker with `GetSubstructMatches()`: In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
Re: [Rdkit-discuss] Question about generating configurational isomerism
Hi Jacob, you need to call AssignStereochemistry with force=True parameter Chem.AssignStereochemistry(mol, force=True) Pavel. On 01/28/2017 05:43 AM, Jacob Durrant wrote: I'm trying to set the configuration of a molecule with a double bond, but it doesn't seem to be working. Here's my code: === from rdkit.Chem import AllChem from rdkit import Chem from rdkit.Chem.rdchem import BondStereo # Make a molecule with a double bond, no stereo specified (cis or trans) smi = "NC(O)=C(C)S" mol = Chem.MolFromSmiles(smi) # Get that double bond double_bonds = [b.GetIdx() for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 2 and b.GetStereo() is BondStereo.STEREONONE] double_bond_index = double_bonds[0] double_bond = mol.GetBondWithIdx(double_bond_index) # It's stereo is STEREONONE at this point print double_bond.GetStereo() # Get the bonds connected to the first and second atoms first_atom = double_bond.GetBeginAtom() bonds_connected_to_first_atom = first_atom.GetBonds() second_atom = double_bond.GetEndAtom() bonds_connected_to_second_atom = second_atom.GetBonds() # Get the indecies of those bonds, removing the central double bond. bonds_connected_to_first_atom_indecies = [b.GetIdx() for b in first_atom.GetBonds()] bonds_connected_to_first_atom_indecies.remove(double_bond_index) bonds_connected_to_second_atom_indecies = [b.GetIdx() for b in second_atom.GetBonds()] bonds_connected_to_second_atom_indecies.remove(double_bond_index) # Set the directions of the for bonds connected to this double bond. mol.GetBondWithIdx(bonds_connected_to_first_atom_indecies[0]).SetBondDir(Chem.BondDir.ENDUPRIGHT) mol.GetBondWithIdx(bonds_connected_to_first_atom_indecies[1]).SetBondDir(Chem.BondDir.ENDDOWNRIGHT) mol.GetBondWithIdx(bonds_connected_to_second_atom_indecies[0]).SetBondDir(Chem.BondDir.ENDUPRIGHT) mol.GetBondWithIdx(bonds_connected_to_second_atom_indecies[1]).SetBondDir(Chem.BondDir.ENDDOWNRIGHT) # I assume this should set the stereochemistry. But it doesn't appear to. # It's stereo is STEREONONE at this point print double_bond.GetStereo() Chem.AssignStereochemistry(mol) # Still STEREONONE print double_bond.GetStereo() # And when I print out the smiles, no stereochemistry is given. print Chem.MolToSmiles(mol, isomericSmiles=True, canonical=True) # CC(S)=C(N)O === Any suggestions? Thanks! ~Jacob -- Sent from my mobile. -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] multiline legend in MolsToGridImage
Thank you, that helped! I noticed that I need to substantially increase the height of the image to make the second line visible. If I'll want to draw more lined I should increase the height further. Pavel. On 12/20/2016 03:14 PM, Peter Gedeck wrote: Hello, I thought we had removed all of these by now. I'll open an issue and fix the code. Best, Peter On Tue, Dec 20, 2016 at 6:01 AM David Hall <li...@cowsandmilk.net <mailto:li...@cowsandmilk.net>> wrote: The replace and split methods were removed from the string module in python3. You can replace the code as follows: s = s.replace('\r\n', '\n') s = s.replace('\n\r', '\n') s = s.replace('\r', '\n') lines = s.split('\n') On Tue, Dec 20, 2016 at 2:45 AM, Pavel Polishchuk <pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote: Hi, I try to print multiline legends for molecules in a grid and use the following code from notebook from rdkit import Chem from rdkit.Chem import Draw # from rdkit.Chem.Draw import IPythonConsole mols = [Chem.MolFromSmiles(s) for s in ["CCC", "C"]] legends = ["1\nCCC", "2\nC"] Draw.MolsToGridImage(mols, molsPerRow=5, subImgSize=(200,200), legends=legends) it returns error: AttributeError: module 'string' has no attribute 'replace'(full stack is below) If I uncomment IPythonConsole import then it returns the picture with labels, but with spaces instead of new line symbols. What can I do in this situation? RDKit 2016.03.2 Kind regards, Pavel. --- AttributeError Traceback (most recent call last) in() 1 mols = [Chem.MolFromSmiles(s) for s in ["CCC", "C"]]2 legends = ["1\nCCC", "2\nC"]> 3p = Draw.MolsToGridImage(mols, molsPerRow=5, subImgSize=(200,200), legends=legends)/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py in MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, useSVG, **kwargs) 387 else:388 return _MolsToGridImage(mols,molsPerRow=molsPerRow,subImgSize=subImgSize, --> 389legends=legends, highlightAtomLists=highlightAtomLists, **kwargs) 390 391 def ReactionToImage(rxn, subImgSize=(200,200),**kwargs):/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py in _MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, **kwargs) 334 highlights=highlightAtomLists[i]335 if mol is not None:--> 336img = _moltoimg(mol,subImgSize,highlights,legends[i],**kwargs)337 res.paste(img,(col*subImgSize[0],row*subImgSize[1]))338 return res/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py in _moltoimg(mol, sz, highlights, legend, **kwargs) 302 if not hasattr(rdMolDraw2D,'MolDraw2DCairo'):303 img = MolToImage(mol,sz,legend=legend,highlightAtoms=highlights, --> 304**kwargs) 305 else:306 nmol = rdMolDraw2D.PrepareMolForDrawing(mol,kekulize=kwargs.get('kekulize',True))/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py in MolToImage(mol, size, kekulize, wedgeBonds, fitImage, options, canvas, **kwargs) 132 # color=(0,0,1),fill=False,stroke=True)133 font=Font(face='sans',size=12)--> 134canvas.addCanvasText(legend,pos,font)135 136 if kwargs.get('returnCanvas',False):/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/spingCanvas.py in addCanvasText(self, text, pos, font, color, **kwargs) 72 labelP = pos[0]-txtWidth/2+offset,pos[1]+txtHeight/273 color = convertColor(color)---> 74self.canvas.drawString(text,labelP[0],labelP[1],font,color=color)75 return (bw,bh,offset)76 /home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/sping/PIL/pidPIL.py in drawString(self, s, x, y, font, color, angle, **kwargs) 313 if '\n' in s or '\r' in s:314 self.drawMultiLineString(s, x,y, font, color, angle, --> 315**kwargs) 316 return317 if not font: font = self.defaultFont/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/sping/pid.py in drawMultiLineString(self, s, x, y, font, color, angle, **kwargs) 393 dy = h * math.cos(angle*math.pi/180.0)394 dx = h * math.sin(angle*math.pi/180.0)--> 395s = string.replace(s, '\r\n', '\n')396 s = string.replac
Re: [Rdkit-discuss] Generating all stereochem possibilities from smile
I just want to share my script, which I use for enumeration of stereoisomers. Enumeration of double bonds was added quite recently and thus I didn't test it extensively. I put it on github: https://github.com/DrrDom/rdk It seems to work well on quite complex queries like CC1CCC(CC1)C1CC=C(C\C(=C/I)C(=CF)C1C1C[C@@H](C)CC=C1)C1CC[C@H](O)C(Br)C1 to generate all possible stereoisomers: gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d to discard compounds with more than 4 undefined centers/double bonds: gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d -u 4 Try it on your own risk :) If you will find mistakes let me know. Pavel. On 12/10/2016 03:44 AM, Brian Cole wrote: So I may need a little guidance on this one from someone with a little more historical knowledge of RDKit. I found the findPotentialStereoBonds function inside Chirality.cpp that appears to do what we want, detect double bonds with unspecified E/Z stereo chemistry. That's at least what the comment in the file leads me to believe: // Find bonds than can be cis/trans in a molecule and mark them as "any" ... void findPotentialStereoBonds(ROMol , bool cleanIt) { But when you look at the code, it's not setting bonds to Bond::STEREOANY, it's setting them to Bond::STEREONONE here: ... // proceed only if we either want to clean the stereocode on this bond // or if none is set on it yet if (cleanIt || dblBond->getStereo() == Bond::STEREONONE) { dblBond->setStereo(Bond::STEREONONE); ... Seems odd to me check that the stereo is Bond::STEREONONE, and then set it to that value. While findPotentialStereoBonds isn't exposed in Python, there is some Java docs written for it in the Java wrappers: %javamethodmodifiers RDKit::MolOps::findPotentialStereoBonds( ROMol & mol,boolcleanIt = false ) " /** finds bonds that could be cis/trans in a molecule and mark them as Bond::STEREONONE So that documentation is actually correct. It does find bonds that could be cis/trans and mark them as Bond::STEREONONE. That just isn't very useful since all bonds are marked Bond::STEREONONE by default. The only direct test case of findPotentialStereoBonds I can find is the following: MolOps::findPotentialStereoBonds(*m, false); TEST_ASSERT(m->getNumAtoms() == 15); TEST_ASSERT(m->getBondWithIdx(1)->getBondType() == Bond::DOUBLE); TEST_ASSERT(m->getBondWithIdx(1)->getStereoAtoms().size() == 2); TEST_ASSERT(m->getBondWithIdx(3)->getBondType() == Bond::DOUBLE); TEST_ASSERT(m->getBondWithIdx(3)->getStereoAtoms().size() == 2); That doesn't even test the getStereo() return value. Though it does test getStereoAtoms() return value. So it looks like the proper thing to do to detect unspecified bond stereo is to check for a non-empty getStereoAtoms() vector? So ideally, I would like to write an bond stereo enumerator that tests if a double bond has unspecified stereo (testing the size of getStereoAtoms() would work) and then sets the stereo, that is: bond.SetStereo(Bond::STEREOE); or bond.SetStereo(Bond::STEREOZ); And then hopefully the Chem.MolToSmiles will "do the right thing". However, this commented out code in the wrapper has given me pause: // this is no longer exposed because it requires that stereo atoms // be set. This is a task that is tricky and "dangerous". //.def("SetStereo",::setStereo, // "Set the CIP-classification of the bond as a BondStereo\n") So apparently I shouldn't expose an "easy" way to do this. What is the trickiness and dangerousness of this API? And could we make an easy way to enumerate bond stereo? Thanks! On Fri, Dec 9, 2016 at 5:44 PM, Brian Cole <col...@gmail.com <mailto:col...@gmail.com>> wrote: This has me quite curious now, how do we detect unspecified bond stereo chemistry in RDKit? m = Chem.MolFromSmiles("FC=CF") assert m.HasProp("_StereochemDone") for bond in m.GetBonds(): print(bond.GetBondDir(), bond.GetStereo()) Yields: (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) Trying to force the issue: Chem.AssignStereochemistry(m, cleanIt=True, force=True, flagPossibleStereoCenters=True) assert m.HasProp("_StereochemDone") for bond in m.GetBonds(): print(bond.GetBondDir(), bond.GetStereo()) Still yields: (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)
Re: [Rdkit-discuss] Extracting SMILES from text
Hi, Alexis, if you should not track from which document SMILES come, you may just combine all words from all document in a list, take only unique words and try to test them. Thus, you should not store and check for valid/non-valid strings. That would reduce problem complexity as well. Pavel. On 12/02/2016 11:11 AM, Greg Landrum wrote: An initial start on some regexps that match SMILES is here: https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb that may also be useful On Fri, Dec 2, 2016 at 11:07 AM, Alexis Parenty <alexis.parenty.h...@gmail.com <mailto:alexis.parenty.h...@gmail.com>> wrote: Hi Markus, Yes! I might discover novel compounds that way!! Would be interesting to see how they look like… Good suggestion to also store the words that were correctly identified as SMILES. I’ll add that to the script. I also like your “distribution of word” idea. I could safely skip any words that occur more than 1% of the time and could try to play around with the threshold to find an optimum. I will try every suggestions and will time it to see what is best. I’ll keep everyone in the loop and will share the script and results. Thanks, Alexis On 2 December 2016 at 10:47, Markus Sitzmann <markus.sitzm...@gmail.com <mailto:markus.sitzm...@gmail.com>> wrote: Hi Alexis, you may find also so some "novel" compounds by this approach :-). Whether your tuple solution improves performance strongly depends on the content of your text documents and how often they repeat the same words again - but my guess would be it will help. Probably the best way is even to look at the distribution of words before you feed them to RDKit. You should also "memorize" those ones that successfully generated a structure, doesn't make sense to do it again, then. Markus On Fri, Dec 2, 2016 at 10:21 AM, Maciek Wójcikowski <mac...@wojcikowski.pl <mailto:mac...@wojcikowski.pl>> wrote: Hi Alexis, You may want to filter with some regex strings containing not valid characters (i.e. there is small subset of atoms that may be without brackets). See "Atoms" section: http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html <http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html> The set might grow pretty quick and may be inefficient, so I'd parse all strings passing above filter. Although there will be some false positives like "CC" which may occur in text (emails especially). Pozdrawiam, | Best regards, Maciek Wójcikowski mac...@wojcikowski.pl <mailto:mac...@wojcikowski.pl> 2016-12-02 10:11 GMT+01:00 Alexis Parenty <alexis.parenty.h...@gmail.com <mailto:alexis.parenty.h...@gmail.com>>: Dear all, I am looking for a way to extract SMILES scattered in many text documents (thousands documents of several pages each). At the moment, I am thinking to scan each words from the text and try to make a mol object from them using Chem.MolFromSmiles() then store the words if they return a mol object that is not None. Can anyone think of a better/quicker way? Would it be worth storing in a tuple any word that do not return a mol object from Chem.MolFromSmiles() and exclude them from subsequent search? Something along those lines excluded_set = set() smiles_list = [] For each_word in text: If each_word not in excluded_set: each_word_mol = Chem.MolFromSmiles(each_word) if each_word_mol is not None: smiles_list.append(each_word) else: excluded_set.add(each_word_mol) Would not searching into that growing tuple take actually more time than trying to blindly make a mol object for every word? Any suggestion? Many thanks and regards, Alexis -- Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot ___ Rdkit-discuss mailing list