Hello rdkit,
This might be trivial, but I am a beginner and don't know how to do it.
I am building a simple model to predict a target property. I have a pandas
dataframe (df) whose columns are 'SMILES' and 'Target'.
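#imports
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
#calculating the descriptors as below:
allDescp=[name[0] for name in Descriptors._descList]
calc=MoleculeDescriptors.MolecularDescriptorCalculator(allDescp)
df['fp']=df['SMILES'].apply(lambda x: calc.CalcDescriptors(Chem.MolFromSmiles(x)))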
#converting the descriptors (stored in the 'fp' column) to a numpy array
y=df['Target'].values
X=np.array(list(df['fp']))
#preprocessing: split first, then fit the scaler on the training set only
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.25, random_state=42)
st=StandardScaler()
X_train=st.fit_transform(X_train)
X_test=st.transform(X_test)
#random forest model
model=RandomForestRegressor(n_estimators=10)
model.fit(X_train, y_train)
My problem is that I don't know how to get meaningful feature importances.
The following returns the importance values, but there are no labels, so I
can't tell which descriptors the values belong to or which ones are
important.
print(sorted(model.feature_importances_))
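To make the goal concrete, here is a minimal sketch of the kind of labelled output I am after. It assumes that the values in feature_importances_ follow the column order of X, which in turn follows the order of the names in allDescp. rank_features is a hypothetical helper of my own, not an RDKit or scikit-learn function, and the numbers in the example are placeholders, not real model output.

```python
def rank_features(names, importances, top=None):
    """Pair each feature name with its importance, sorted from high to low."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked if top is None else ranked[:top]


# Placeholder values for illustration only, not real model output.
example_names = ["MolWt", "MolLogP", "TPSA"]
example_importances = [0.2, 0.5, 0.3]

for name, score in rank_features(example_names, example_importances):
    print(f"{name}: {score:.3f}")

# With the objects from this post, the call would presumably look like:
# rank_features(allDescp, model.feature_importances_, top=20)
```

Sorting the (name, value) pairs together keeps each label attached to its value, which sorting the bare importance array does not.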
Thanks for your help!
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss