Hi Greg,

Thanks a lot! This is very helpful. Further questions:

1. If I need an RMSD matrix for clustering and choose to use GetBestRMS(), I guess I will first have to loop over all conformer pairs myself to build the matrix?
2. Does AlignMolConformers() handle symmetry and try all permutations to get the best "RMSlist"?
3. So I guess EmbedMultipleConfs() also uses the "standard" method (no symmetry consideration) to compute the RMS for pruning?

I appreciate your help!

Best,
Leon

On Thu, Nov 14, 2019 at 2:45 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Leon,
>
> There's not really a "right" answer to your question - it depends on what
> you want to calculate.
> I personally think it makes more sense to use the heavy atom RMSD (which
> is what you get if you remove Hs before calculating the RMSD), particularly
> if you are comparing to experiment.
>
> Note that AllChem.GetConformerRMSMatrix() does not take symmetry into
> account, so you may not get the correct results.
> I just opened a ticket to fix this, but in the meantime if you have
> molecules with symmetry-equivalent atoms you are probably better off
> generating the conformer RMS matrix manually using GetBestRMS().
>
> Best,
> -greg
>
> On Wed, Nov 13, 2019 at 5:17 PM topgunhaides . <sunzhi....@gmail.com>
> wrote:
>
>> Hi guys,
>>
>> I am new to RDKit, and have a question about adding and removing Hs.
>>
>> As recommended in the documentation, hydrogen atoms should be added for
>> generating conformers, optimization, etc.
>>
>> However, for clustering, should the Hs be removed first, before
>> generating the conformer RMS matrix? For instance:
>>
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem, TorsionFingerprints
>>
>> suppl = Chem.SDMolSupplier('molecule.sdf')
>>
>> for mol in suppl:
>>     mh = Chem.AddHs(mol)
>>     cids = AllChem.EmbedMultipleConfs(mh, numConfs=5, maxAttempts=1000,
>>                                       pruneRmsThresh=0.5, numThreads=0,
>>                                       randomSeed=1)
>>     m = Chem.RemoveHs(mh)
>>     # RMS matrix
>>     rmsmat = AllChem.GetConformerRMSMatrix(m, prealigned=False)
>>     # TFD matrix
>>     tfdmat = TorsionFingerprints.GetTFDMatrix(m)
>>     print(rmsmat)
>>     print(tfdmat)
>>
>>
>> Note that I remove the Hs before getting the RMS and TFD matrices. Both
>> resulting matrices are different if I do not remove the Hs. The RMS without
>> Hs, in general, tends to be smaller than the RMS with Hs. This will in turn
>> affect the subsequent clustering result.
>>
>> Could you guys give me some suggestions? Thank you!
>>
>> Best,
>> Leon
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
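
Regarding question 1 and the "generate the matrix manually" suggestion above, a minimal sketch of such a loop might look like the following. The helper names lower_triangle_matrix and best_rms_matrix are made up for illustration and are not part of RDKit. The sketch assumes that GetBestRMS() is reachable as AllChem.GetBestRMS and accepts the two conformer ids as its third and fourth positional arguments, and that the flattened lower-triangle order (1,0), (2,0), (2,1), ... is the layout wanted downstream; please check both against your RDKit version.

```python
from typing import Callable, List, Sequence


def lower_triangle_matrix(ids: Sequence[int],
                          dist: Callable[[int, int], float]) -> List[float]:
    """Collect pairwise distances as a flat lower-triangle list.

    The order is d(ids[1], ids[0]), d(ids[2], ids[0]), d(ids[2], ids[1]), ...
    (hypothetical helper, not an RDKit function).
    """
    values = []
    for i in range(1, len(ids)):
        for j in range(i):
            values.append(dist(ids[i], ids[j]))
    return values


def best_rms_matrix(mol) -> List[float]:
    """Conformer RMS matrix built pair by pair with GetBestRMS().

    Hypothetical helper. RDKit is imported here so that the generic
    helper above can be used and tested without it.
    """
    from rdkit.Chem import AllChem

    cids = [conf.GetId() for conf in mol.GetConformers()]
    # Assumption: the conformer ids are passed positionally. Both arguments
    # are the same molecule, so it does not matter which one RDKit treats as
    # probe and which as reference. Note that GetBestRMS() aligns the probe
    # conformer onto the reference, so the coordinates of mol are modified.
    return lower_triangle_matrix(
        cids, lambda a, b: AllChem.GetBestRMS(mol, mol, a, b))
```

For a heavy-atom matrix along the lines Greg describes, one would presumably call best_rms_matrix(Chem.RemoveHs(mh)) after embedding the conformers on mh. The number of GetBestRMS() calls grows as n*(n-1)/2 with the number of conformers, so this can be slow for large ensembles or highly symmetric molecules.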