Hey there!

Lately I have been using linear models and the feature extraction
pipeline a lot. During feature engineering I ran into the same
situation again and again: I wanted to inspect the classifications and
understand which features had the highest influence. This is especially
important if one combines several feature sources. In my case I found
it very helpful to have one inspection for the model itself (independent
of the instance) and one for the prediction of a single example that
gives some insight into the weights. E.g., an inspection function for a
binary linear classifier could return a list of (feature name, weight)
pairs sorted by weight:

mdl = LogisticRegression()

# training, tuning, ...

inspect(mdl)

    [(<sourcename_featname>, <weight>), ...]

predict_and_inspect(mdl, example)

    {
        'probability': <prob_of_positive_class>,
        'inspection': [(<sourcename_featname>, <model_weight * feature_weight>), ...],
        'label': <ground_truth_label>,
        'prediction': <predicted_label>,
        'example_id': <id>
    }
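
For the un-pipelined case, here is a minimal sketch of what I have in
mind (assuming a DictVectorizer as the encoder; the function names and
the label/example_id plumbing are just my own convention, not an
existing API):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def inspect(mdl, vect):
    # (feature name, weight) pairs sorted by weight, highest first.
    names = vect.get_feature_names()
    weights = mdl.coef_.ravel()  # binary case: coef_ has shape (1, n_features)
    return sorted(zip(names, weights), key=lambda nw: nw[1], reverse=True)

def predict_and_inspect(mdl, vect, example, label=None, example_id=None):
    # Per-example contributions: model weight times the feature's value.
    x = vect.transform([example])
    contrib = x.toarray().ravel() * mdl.coef_.ravel()
    return {
        'probability': mdl.predict_proba(x)[0, 1],
        'inspection': sorted(zip(vect.get_feature_names(), contrib),
                             key=lambda nc: nc[1], reverse=True),
        'label': label,
        'prediction': mdl.predict(x)[0],
        'example_id': example_id,
    }

# toy usage
vect = DictVectorizer()
mdl = LogisticRegression()
X = vect.fit_transform([{'word_foo': 2, 'length': 5},
                        {'word_bar': 1, 'length': 3}])
mdl.fit(X, [1, 0])
print(inspect(mdl, vect))
print(predict_and_inspect(mdl, vect, {'word_foo': 1, 'length': 4},
                          label=1, example_id=42))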


The pipeline framework encapsulates the whole feature encoding, which
is typically very convenient. However, it is very hard to map the
learned weights back to the actual feature names.
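
What I do by hand at the moment is roughly the sketch below. The step
names 'features' and 'clf' are made up, and it only works if every
transformer inside the FeatureUnion implements get_feature_names():

# Assuming pipeline = Pipeline([('features', FeatureUnion([...])),
#                               ('clf', LogisticRegression())])
union = pipeline.named_steps['features']
clf = pipeline.named_steps['clf']
names = union.get_feature_names()   # prefixed, e.g. 'counts__word'
weights = clf.coef_.ravel()         # binary case
ranked = sorted(zip(names, weights), key=lambda nw: nw[1], reverse=True)

FeatureUnion conveniently prefixes each name with the transformer's
name, which is exactly the <sourcename_featname> form above, but the
whole thing falls apart for transformers without feature names.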

Is there an easy way to do that, or are there already initiatives to
build something like this? If not, do you think it makes sense in
general? I know it is hard to generalize over all possible model classes
(linear vs. kernel machines, number of classes, ...), but I think it is
worth trying, since iterating on a predictive model is a necessity in
practice.

Cheers, Christoph