[Rdkit-discuss] Alignment using LigParGen file
Hello everyone,

I am trying to align two chemicals using their PDB files with the following script:

    import sys
    from rdkit import Chem
    from rdkit.Chem import AllChem, rdDistGeom, rdMolAlign

    refMolwithH = Chem.MolFromPDBFile(sys.argv[1])
    s = sys.argv[2]
    prbMolwithH = Chem.MolFromPDBFile(s)
    idx = s.find('_')
    chemB = s[:idx]

    rdDistGeom.EmbedMolecule(prbMolwithH)
    AllChem.UFFOptimizeMolecule(prbMolwithH)

    ## Alignment
    pyO3A = rdMolAlign.GetO3A(prbMolwithH, refMolwithH)
    score = pyO3A.Align()

    ## 3D coords of Chem B after alignment
    Chem.MolToPDBFile(prbMolwithH, '{}.pdb'.format(chemB))

The probe chemical's PDB file is converted from a GRO file produced by the LigParGen web server, using this command:

    gmx editconf -f input.gro -o output.pdb

The problem is that the newly aligned chemical is not aligned with refMol but with prbMol, and the new chemical does not have H atoms on it. Would you please help me with this problem? Are there other ways to align the PDB files generated from LigParGen?

Thank you so much for your help.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Alignment using LigParGen file
Hi Phuong,

could you please send me the PDB files you are trying to align? You may reply to me directly.

Cheers,
p.

On 08/20/18 19:02, Phuong Chau wrote:
> Hello everyone,
> I am trying to align two chemicals using their pdb files with the following script:
> [...]
> The problem is the new aligned chemical is not aligned with the refMol but with probMol and the new chemical does not have H atoms on it.
Re: [Rdkit-discuss] back tracking descriptor names from RandomForest feature_importance
Dear Ali,

Please first run the following code, which may help you:

```python
import numpy as np
np.argsort(rfregress.feature_importances_)[::-1]
```

`argsort` returns the indices that would sort the importances in ascending order, and `[::-1]` reverses them, so the most important feature comes first. These indices follow the column order of the training data, which is the order of the names in `allDescp` in your code, so indexing `allDescp` with them gives you the descriptor names you want.

Sincerely yours,
Shojiro

On Tue, 21 Aug 2018 at 10:34, Ali Eftekhari wrote:
> Hello rdkit,
> [...]
> My problem is that I don't know how to get the meaningful feature_importance. The following will return the values of descriptors but there is no labels and so I don't know how to figure out which features are important.
>
> print (sorted (rfregress.feature_importances_))

--
The University of Tokyo
2nd year Ph.D. candidate
Shojiro Shibayama
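The index-to-name lookup described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `rank_features` is a hypothetical helper, and the names and importance values are made up to stand in for `allDescp` and `rfregress.feature_importances_`. Sorting the indices by descending importance plays the same role as `np.argsort(...)[::-1]`.

```python
def rank_features(names, importances):
    """Pair each descriptor name with its importance, most important first.

    Hypothetical helper: `names` stands in for allDescp and `importances`
    for rfregress.feature_importances_ (same length, same order).
    """
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    # Indices ordered by descending importance, the same idea as
    # np.argsort(importances)[::-1].
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i],
                   reverse=True)
    return [(names[i], float(importances[i])) for i in order]


# Made-up values for illustration only, not results from a fitted model.
demo_names = ["MolWt", "MolLogP", "TPSA", "NumHDonors"]
demo_importances = [0.10, 0.55, 0.30, 0.05]

ranked = rank_features(demo_names, demo_importances)
for name, score in ranked:
    print(f"{name:12s} {score:.2f}")
```

With a fitted model, the call would presumably look like `rank_features(allDescp, rfregress.feature_importances_)`, provided the columns of the training matrix are in the same order as `allDescp`.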
[Rdkit-discuss] back tracking descriptor names from RandomForest feature_importance
Hello rdkit,

This might be trivial, but I am a beginner and don't know how to do it.

I am building a simple model to predict a target property. I have a pandas dataframe (df) whose columns are 'SMILES' and 'Target'.

    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import Descriptors
    from rdkit.ML.Descriptors import MoleculeDescriptors
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # calculating the descriptors as below:
    allDescp = [name[0] for name in Descriptors._descList]
    calc = MoleculeDescriptors.MolecularDescriptorCalculator(allDescp)
    df['fp'] = df['SMILES'].apply(lambda x: calc.CalcDescriptors(Chem.MolFromSmiles(x)))

    # converting the fingerprint to numpy array
    y = df['Target'].values
    X = np.array(list(df['fp']))

    # preprocessing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
    st = StandardScaler()
    X = st.fit_transform(X)

    # random forest model
    model = RandomForestRegressor(n_estimators=10)
    model.fit(X_train, y_train)

My problem is that I don't know how to get meaningful feature importances. The following returns the sorted values, but there are no labels, so I don't know how to figure out which features are important.

    print(sorted(rfregress.feature_importances_))

Thanks for your help!