[Rdkit-discuss] back tracking descriptor names from RandomForest feature_importance

Ali Eftekhari Mon, 20 Aug 2018 18:35:16 -0700

Hello rdkit,

This might be trivial, but I am a beginner and don't know how to do it.


I am building a simple model to predict a target property.  I have a pandas
dataframe (df) whose columns are 'SMILES' and 'Target'.

#calculating the descriptors as below:
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
allDescp=[name[0] for name in Descriptors._descList]
calc=MoleculeDescriptors.MolecularDescriptorCalculator(allDescp)
df['fp']=df['SMILES'].apply(lambda x:
    calc.CalcDescriptors(Chem.MolFromSmiles(x)))

#converting the descriptor tuples to a numpy array
import numpy as np
y=df['Target'].values
X=np.array(list(df['fp']))

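#preprocessing: fit the scaler on the training set only, then apply it to both
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.25,
    random_state=42)
st=StandardScaler()
X_train=st.fit_transform(X_train)
X_test=st.transform(X_test)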

#random forest model
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor(n_estimators=10)
model.fit(X_train, y_train)

My problem is that I don't know how to get meaningful feature importances.
The following returns the importance values, but there are no labels, so I
don't know how to figure out which descriptors are important.

print(sorted(model.feature_importances_))

Thanks for your help!
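
One possible way to recover the labels, offered as a minimal sketch rather
than a definitive answer: scikit-learn reports feature_importances_ in the
same order as the columns of the training matrix, and CalcDescriptors returns
values in the order the names were given to the calculator, so the name list
and the importance array can be paired element by element and sorted together.
The helper name rank_features and the numbers below are hypothetical and used
only for illustration.

```python
def rank_features(names, importances):
    """Pair each descriptor name with its importance, most important first.

    rank_features is a hypothetical helper, not part of RDKit or scikit-learn.
    """
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    # Sorting the importances alone loses the column order, so pair each
    # value with its name first and sort the pairs instead.
    pairs = zip(names, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Illustrative values only, not results from a real model.
example_names = ["MolWt", "MolLogP", "TPSA"]
example_importances = [0.2, 0.7, 0.1]

ranked = rank_features(example_names, example_importances)
for name, score in ranked:
    print(f"{name:10s} {score:.3f}")
```

With the variables from the post, the call would be
rank_features(allDescp, model.feature_importances_), assuming no descriptor
columns were dropped or reordered between the calculator and the model.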

