Re: [Rdkit-discuss] back tracking descriptor names from RandomForest feature_importance

2018-08-20 Thread Shojiro Shibayama
Dear Ali,

Please first run the following code, which may help:

```python
import numpy as np
np.argsort(rfregress.feature_importances_)[::-1]
```

`argsort` returns the indices that sort the feature importances in
ascending order, and `[::-1]` reverses them so that the most important
feature comes first.
These indices correspond to the column order of your descriptor matrix
(that is, the order of the names in 'allDescp' in your code), so indexing
'allDescp' with them gives you the descriptor names you are looking for.
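
As a minimal sketch of that last step, the snippet below pairs each importance with its name and sorts the pairs, most important first. The helper `rank_features` and the toy values are hypothetical; they only stand in for `rfregress.feature_importances_` and `allDescp`, and the sketch assumes both sequences are in the same column order.

```python
def rank_features(importances, names, top_n=None):
    """Pair each importance with its feature name, most important first."""
    if len(importances) != len(names):
        raise ValueError("importances and names must have the same length")
    # Sort (name, importance) pairs by importance, largest first.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Toy values standing in for rfregress.feature_importances_ and allDescp.
toy_importances = [0.10, 0.55, 0.05, 0.30]
toy_names = ["MolWt", "MolLogP", "NumHDonors", "TPSA"]

ranking = rank_features(toy_importances, toy_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With a fitted model, the same call would presumably look like `rank_features(rfregress.feature_importances_, allDescp, top_n=10)`.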

Sincerely yours,
Shojiro


On Tue, 21 Aug 2018 at 10:34, Ali Eftekhari wrote:

> Hello rdkit,
>
> This might be trivial, but I am a beginner and don't know how to do it.
>
> I am building a simple model to predict a target property. I have a pandas
> dataframe (df) whose columns are 'SMILES' and 'Target'.
>
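> import numpy as np
> from rdkit import Chem
> from rdkit.Chem import Descriptors
> from rdkit.ML.Descriptors import MoleculeDescriptors
> from sklearn.ensemble import RandomForestRegressor
> from sklearn.model_selection import train_test_split
> from sklearn.preprocessing import StandardScaler
>
> #calculating the descriptors as below:
> allDescp=[name[0] for name in Descriptors._descList]
> calc=MoleculeDescriptors.MolecularDescriptorCalculator(allDescp)
> df['fp']=df['SMILES'].apply(lambda x: calc.CalcDescriptors(Chem.MolFromSmiles(x)))
>
> #converting the fingerprint to numpy array
> y=df['Target'].values
> X=np.array(list(df['fp']))
>
> #preprocessing (the scaler is fitted on the training split only)
> X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.25, random_state=42)
> st=StandardScaler()
> X_train=st.fit_transform(X_train)
> X_test=st.transform(X_test)
>
> #random forest model
> rfregress=RandomForestRegressor(n_estimators=10)
> rfregress.fit(X_train, y_train)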
>
> My problem is that I don't know how to get meaningful feature
> importances. The following returns the importance values, but there are
> no labels, so I can't tell which features are important.
>
> print (sorted (rfregress.feature_importances_))
>
> Thanks for your help!
>
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


-- 

The University of Tokyo
2nd year Ph.D. candidate
  Shojiro Shibayama



[Rdkit-discuss] back tracking descriptor names from RandomForest feature_importance

2018-08-20 Thread Ali Eftekhari
Hello rdkit,

This might be trivial, but I am a beginner and don't know how to do it.

I am building a simple model to predict a target property. I have a pandas
dataframe (df) whose columns are 'SMILES' and 'Target'.

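import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

#calculating the descriptors as below:
allDescp=[name[0] for name in Descriptors._descList]
calc=MoleculeDescriptors.MolecularDescriptorCalculator(allDescp)
df['fp']=df['SMILES'].apply(lambda x: calc.CalcDescriptors(Chem.MolFromSmiles(x)))

#converting the fingerprint to numpy array
y=df['Target'].values
X=np.array(list(df['fp']))

#preprocessing (the scaler is fitted on the training split only)
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.25, random_state=42)
st=StandardScaler()
X_train=st.fit_transform(X_train)
X_test=st.transform(X_test)

#random forest model
rfregress=RandomForestRegressor(n_estimators=10)
rfregress.fit(X_train, y_train)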

My problem is that I don't know how to get meaningful feature
importances. The following returns the importance values, but there are
no labels, so I can't tell which features are important.

print (sorted (rfregress.feature_importances_))

Thanks for your help!


Re: [Rdkit-discuss] Alignment using LIgpargen file

2018-08-20 Thread Paolo Tosco

Hi Phuong,

Could you please send me the PDB files you are trying to align? You may
reply to me directly.


Cheers,
p.


On 08/20/18 19:02, Phuong Chau wrote:

Hello everyone,

I am trying to align two chemicals using their pdb files with the 
following script:


refMolwithH = Chem.MolFromPDBFile(sys.argv[1])
s = sys.argv[2]
prbMolwithH = Chem.MolFromPDBFile(s)
idx=s.find('_')
chemB= s[:idx]

rdDistGeom.EmbedMolecule(prbMolwithH)
AllChem.UFFOptimizeMolecule(prbMolwithH)

##Alignment
pyO3A = rdMolAlign.GetO3A(prbMolwithH, refMolwithH)
score = pyO3A.Align()

##3D coords of Chem B after alignment
Chem.MolToPDBFile(prbMolwithH,'{}.pdb'.format(chemB))

The probe PDB file was generated from a GRO file produced by the
LigParGen web server, using this command:

gmx editconf -f input.gro -o output.pdb

The problem is that the newly aligned molecule is aligned not with the
reference (refMolwithH) but with the probe (prbMolwithH), and it has no
H atoms.


Would you please help me with this problem? Are there other ways to
align PDB files generated from LigParGen?


Thank you so much for your help







[Rdkit-discuss] Alignment using LIgpargen file

2018-08-20 Thread Phuong Chau
Hello everyone,

I am trying to align two chemicals using their pdb files with the following
script:

refMolwithH = Chem.MolFromPDBFile(sys.argv[1])
s = sys.argv[2]
prbMolwithH = Chem.MolFromPDBFile(s)
idx=s.find('_')
chemB= s[:idx]

rdDistGeom.EmbedMolecule(prbMolwithH)
AllChem.UFFOptimizeMolecule(prbMolwithH)

##Alignment
pyO3A = rdMolAlign.GetO3A(prbMolwithH, refMolwithH)
score = pyO3A.Align()

##3D coords of Chem B after alignment
Chem.MolToPDBFile(prbMolwithH,'{}.pdb'.format(chemB))

The probe PDB file was generated from a GRO file produced by the
LigParGen web server, using this command:

gmx editconf -f input.gro -o output.pdb

The problem is that the newly aligned molecule is aligned not with the
reference (refMolwithH) but with the probe (prbMolwithH), and it has no
H atoms.

Would you please help me with this problem? Are there other ways to align
PDB files generated from LigParGen?

Thank you so much for your help