Hey there! Lately I have been using linear models and the feature extraction pipeline a lot. During feature engineering I kept running into the situation where I wanted to inspect the classifications and understand which features had the highest influence; this is especially important when one combines several feature sources. In my case, I found it very helpful to have one inspection for the model (independent of any instance) and one for the prediction of a single example that gives some insight into the weights. For example, an inspection function for a binary linear classifier could return a list of (feature name, weight) pairs sorted by weight:
    mdl = LogisticRegression()
    # training, tuning, ...

    inspect(mdl)
    [(<sourcename_featname>, <weight>), ...]

    predict_and_inspect(mdl, example)
    {
        'probability': <prob_of_positive_class>,
        'inspection': [(<sourcename_featname>, <model_weight * feature_weight>), ...],
        'label': <ground_truth_label>,
        'prediction': <predicted_label>,
        'example_id': <id>
    }

The pipeline framework encapsulates the whole feature encoding, which is typically very convenient; however, it makes it very hard to map the learned weights back to the actual feature names. Is there an easy way to do that, or are there already initiatives to build something like this? If not, do you think it makes sense in general? I know this is tough to generalize over all possible model classes (linear vs. kernel machines, number of classes, ...), but I think it is worth trying, since iterating on a predictive model in practice requires it. (I have appended a rough sketch of what I mean below.)

Cheers, Christoph
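P.S. To make the idea concrete, here is a minimal sketch of how I imagine this could work for a two-step Pipeline of a vectorizer and a linear classifier. The step names 'vec' and 'clf' are just placeholders I chose, and I am assuming a scikit-learn version where the vectorizer exposes get_feature_names_out() (older releases call it get_feature_names()):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    def inspect(pipeline):
        """Return (feature_name, weight) pairs sorted by weight, descending."""
        vec = pipeline.named_steps['vec']   # assumed step names
        clf = pipeline.named_steps['clf']
        names = vec.get_feature_names_out() # get_feature_names() on older versions
        weights = clf.coef_.ravel()         # binary case: one weight per feature
        return sorted(zip(names, weights), key=lambda t: t[1], reverse=True)

    def predict_and_inspect(pipeline, example, label=None, example_id=None):
        """Per-example breakdown: weight * feature value for each active feature."""
        vec = pipeline.named_steps['vec']
        clf = pipeline.named_steps['clf']
        names = vec.get_feature_names_out()
        x = vec.transform([example])        # 1 x n_features sparse row
        weights = clf.coef_.ravel()
        # only the non-zero entries of the encoded row contribute
        contributions = [(names[j], weights[j] * x[0, j]) for j in x.nonzero()[1]]
        contributions.sort(key=lambda t: abs(t[1]), reverse=True)
        return {
            'probability': clf.predict_proba(x)[0, 1],
            'inspection': contributions,
            'label': label,
            'prediction': clf.predict(x)[0],
            'example_id': example_id,
        }

    # toy usage
    docs = ["good great fun", "bad awful boring", "great movie", "awful plot"]
    y = [1, 0, 1, 0]
    pipe = Pipeline([('vec', CountVectorizer()), ('clf', LogisticRegression())])
    pipe.fit(docs, y)
    print(inspect(pipe)[:3])
    print(predict_and_inspect(pipe, "great fun", label=1, example_id=42))

Restricting the per-example inspection to the non-zero entries of the encoded row keeps the output readable for sparse text features; of course, this sketch would break for multi-class models or FeatureUnions, which is exactly the generalization problem I mean.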