I think these are really easy to write for a single use case, but hard to make generally useful. Why do you think pipelines make it hard? You know you can extract the estimators from the steps, right?

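def feature_importances_pipeline(pipe):
    # Assumes a two-step pipeline: a feature extractor followed by a linear model.
    extractor = pipe.steps[0][1]
    linear_model = pipe.steps[1][1]
    # Recent scikit-learn releases use get_feature_names_out(); older ones had get_feature_names().
    names = extractor.get_feature_names_out()
    # coef_ has shape (1, n_features) for a binary classifier, so flatten it first.
    return dict(zip(names, linear_model.coef_.ravel()))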
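To make the two inspections from the original message concrete, here is a rough sketch. This is not an existing scikit-learn API: `inspect_weights` and `inspect_prediction` are names made up for illustration, and the feature names and numbers are invented. It assumes a binary linear classifier with a logistic link and takes plain sequences, so the names and weights would have to be pulled out of the pipeline steps first.

```python
import math


def inspect_weights(feature_names, weights):
    """Pair each feature name with its weight, largest weight first."""
    if len(feature_names) != len(weights):
        raise ValueError("feature_names and weights must have the same length")
    pairs = [(name, float(w)) for name, w in zip(feature_names, weights)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def inspect_prediction(feature_names, weights, values, intercept=0.0):
    """Explain one prediction of a binary linear classifier (hypothetical helper)."""
    contributions = [
        (name, float(w) * float(v))
        for name, w, v in zip(feature_names, weights, values)
        if v != 0  # skip features that are absent from this example
    ]
    score = intercept + sum(c for _, c in contributions)
    return {
        "probability": 1.0 / (1.0 + math.exp(-score)),  # assumes a logistic link
        "inspection": sorted(contributions, key=lambda p: abs(p[1]), reverse=True),
        "prediction": int(score > 0),
    }


# Invented data; in practice these would come from the pipeline steps.
names = ["title_cheap", "body_meeting", "body_offer"]
weights = [1.5, -2.0, 0.5]   # e.g. the flattened coef_ of the linear model
example = [1.0, 0.0, 2.0]    # e.g. one dense row produced by the extractor

ranking = inspect_weights(names, weights)
report = inspect_prediction(names, weights, example, intercept=-1.0)
print(ranking)
print(report)
```

The per-example list is sorted by absolute contribution so that strong negative evidence shows up next to strong positive evidence. That is a design choice, not something the original proposal specifies.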


On 08/20/2015 02:47 AM, Christoph Sawade wrote:
Hey there!

Lately I have been using the linear models and the feature extraction pipeline a lot. During feature engineering, I repeatedly ran into situations where I wanted to inspect the classifications and understand which features had the highest influence; this is especially important when features come from different sources. In my case, I found it very helpful to have one inspection for the model (independent of any instance) and one for the prediction on a single example, which gives some insight into the weights. For example, an inspection function for a binary linear classifier could return a list of (feature name, weight) pairs sorted by weight:

mdl = LogisticRegression()
# training, tuning,...
inspect(mdl)
[(<sourcename_featname>, <weight>), ...]

predict_and_inspect(mdl, example)
{
    'probability': <prob_of_positive_class>,
    'inspection': [(<sourcename_featname>, <model_weight*feature_weight>), ...],
    'label': <ground_truth_label>,
    'prediction': <predicted_label>,
    'example_id': <id>
}

The pipeline framework encapsulates the whole feature encoding, which is 
typically very convenient. However, it is very hard to map the learned weights 
back to the actual feature names.
Is there an easy way to do that, or are there already initiatives to build something like this? If not, do you think this makes sense in general? I know it is tough to generalize over all possible model classes (linear vs. kernel machines, number of classes, ...), but I think it is worth trying, since this kind of inspection is necessary when iterating on a predictive model in practice.

Cheers, Christoph


------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
