[Scikit-learn-general] interpretation of feature selection effect

Satrajit Ghosh Wed, 14 Mar 2012 13:27:45 -0700

i have a situation i cannot quite comprehend. any suggestions would be much
appreciated.


---
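# assumed imports (not in the original post); X1 and y1 are defined elsewhere
import numpy as np
import sklearn as sk
from sklearn import svm, cross_validation as cv  # also makes sk.svm available
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline

clf = sk.svm.SVC(kernel='linear', shrinking=True)
anova_filter = SelectKBest(f_regression, k=50)  # keep the 50 top-scoring features
clf = Pipeline([('anova', anova_filter), ('svc', clf)])
# number of folds = size of the smallest class (the count for label 0 is skipped)
maxkfold = np.min(np.bincount(y1)[1:])
result = []  # one (true labels, predicted labels) pair per fold
for train, test in cv.StratifiedKFold(y1, maxkfold):
    result.append((y1[test],
                   clf.fit(X1[train], y1[train]).predict(X1[test])))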
---

with k=50 in the above anova filter i get the following confusion matrix:

[[ 0  7]
 [ 0 10]]
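
a matrix like the one above can be built by pooling the per-fold (true, predicted) pairs collected in `result`. the following is only a minimal sketch of that step, not necessarily how the numbers in this post were produced. the helper name `pooled_confusion_matrix`, the row = true / column = predicted layout, and the example folds are assumptions made for illustration.

```python
import numpy as np

def pooled_confusion_matrix(result, labels):
    """Sum per-fold (y_true, y_pred) pairs into one confusion matrix.

    Rows are true classes and columns are predicted classes, both in
    the order given by `labels` (an assumed layout).
    """
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for y_true, y_pred in result:
        for t, p in zip(y_true, y_pred):
            cm[index[int(t)], index[int(p)]] += 1
    return cm

# Hypothetical folds with class labels 1 and 2 (made-up values,
# not the data from this post).
example_result = [
    (np.array([1, 2, 2]), np.array([2, 2, 2])),
    (np.array([1, 2]), np.array([2, 2])),
]
example_cm = pooled_confusion_matrix(example_result, labels=[1, 2])
print(example_cm)
```

a second column that holds every count, as in this toy example, means the classifier predicted the same class for every test sample.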


with k=51 or greater, i get the following (i love this, but i don't have
much confidence in it):

[[7 0]
 [1 9]]
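
since k=50 and k=51 differ by a single selected feature, it may help to look at which feature enters the selection at that step. the sketch below assumes a plain array of per-feature scores, such as the `scores_` attribute of a fitted SelectKBest. the helper names and the example scores are hypothetical, and ties may be ordered differently than scikit-learn orders them.

```python
import numpy as np

def top_k_indices(scores, k):
    """Indices of the k highest-scoring features, best first."""
    order = np.argsort(scores)[::-1]
    return order[:k]

def newly_added_features(scores, k_small, k_large):
    """Features selected with k_large but not with k_small."""
    small = set(top_k_indices(scores, k_small).tolist())
    return [i for i in top_k_indices(scores, k_large).tolist()
            if i not in small]

# Hypothetical scores for five features (made-up values).
example_scores = np.array([0.3, 9.1, 4.2, 0.7, 6.5])
added = newly_added_features(example_scores, 2, 3)
print(added)
```

inside the pipeline the filter is refit on every training fold, so the feature that enters at k=51 can differ from fold to fold; checking it per fold would be the safer comparison.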

if i throw a Normalizer in the pipeline:

clf = Pipeline([('xfm', Normalizer()), ('anova', anova_filter), ('svc', clf)])

then i get the same confusion matrix as with k=50 above, regardless of what i set k to.
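
for reference, Normalizer rescales each sample (row) to unit norm (l2 by default), which is different from scaling each feature (column). the numpy sketch below only illustrates that difference; the function names and the example matrix are made up, and whether row-wise scaling explains the behaviour described here is not established.

```python
import numpy as np

def normalize_rows(X):
    """Scale each sample (row) to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    norms[norms == 0] = 1.0
    return X / norms

def standardize_columns(X):
    """Scale each feature (column) to zero mean and unit variance.

    Constant columns are only centred.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

# Hypothetical data: three samples, two features (made-up values).
example_X = np.array([[3.0, 4.0],
                      [6.0, 8.0],
                      [1.0, 0.0]])
rows = normalize_rows(example_X)       # every row now has length 1
cols = standardize_columns(example_X)  # every column has mean 0, std 1
print(rows)
print(cols)
```

note that the first two samples become identical after row normalization, because they differ only in overall magnitude; any information carried by sample magnitude is removed before the anova filter sees the data.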

switching the cross-validation to ShuffleSplit gives the same behaviour.

cheers,

satra

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
