We have a study dataset with a subject ID, a label (stroke / no stroke) and 60 features per sample. I'd like to build an SVM classifier, test its significance, find the most important features, etc. I get results, but also a few errors that are cryptic (to me) and some warnings.

Also, if I use NFoldPartitioner() instead of HalfPartitioner(...), I get a traceback about missing chunks. Does that mean I need to set chunks explicitly?
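Partitioners build their folds from a sample attribute, and for NFoldPartitioner that attribute defaults to `chunks`, which this dataset does not have. The HalfPartitioner call in the script is pointed at `subject` via `attr='subject'`; NFoldPartitioner presumably accepts the same keyword (worth confirming in the PyMVPA documentation). To make the fold structure concrete, here is a library-free sketch of leave-one-group-out partitioning by subject. It is not PyMVPA's implementation, and the function name and subject IDs are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (group, train_indices, test_indices), one fold per distinct group."""
    seen = []
    for g in groups:
        if g not in seen:
            seen.append(g)  # keep groups in order of first appearance
    for g in seen:
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test

# hypothetical subject IDs, one sample each
subjects = ['S001', 'S002', 'S003']
for subj, train, test in leave_one_group_out(subjects):
    print(subj, train, test)
```

If every subject contributes a single row, leave-one-subject-out is simply leave-one-sample-out, with as many folds as samples.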

I also can't get searchlight set up correctly.


My test code, based on the tutorial:
from mvpa2.tutorial_suite import *

## read the CSV (no header row) into a list of field lists
d = open(r"C:\temp\test3_redo.csv").readlines()
lol = [x.rstrip('\r\n').split(",") for x in d]
print lol[0]
## the list of subject names
subjects = [r[0] for r in lol]
## the feature data: everything after the six leading metadata fields
dat = [[float(c) for c in row[6:]] for row in lol]

## the truth values: Normal -> 0, Stroke -> 1
label_map = {'Normal': 0, 'Stroke': 1}
labels = [label_map[r[1]] for r in lol]

ds = Dataset(samples=dat)
ds.sa['subject'] = subjects
ds.sa['targets'] = labels
print ds, '\n'

clf = LinearCSVMC()
## errorfx returns accuracy (fraction of correct predictions)
## rather than an error rate
cvte = CrossValidation(clf, HalfPartitioner(count=2,
                           selection_strategy='random', attr='subject'),
                       errorfx=lambda p, t: np.mean(p == t),
                       enable_ca=['stats'])
cv_results = cvte(ds)
print cvte.ca.stats.as_string(description=True)
print cvte.ca.stats.matrix

aov = OneWayAnova()
f = aov(ds)
print 'aov:', f

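## NB: features are selected here on the full dataset, test samples included,
## so the cross-validated accuracy on ds_p can be optimistically biased; the
## FeatureSelectionClassifier version further down repeats the selection
## inside each training split instead
fsel = SensitivityBasedFeatureSelection(
    OneWayAnova(),
    FixedNElementTailSelector(5, mode='select', tail='upper'))
fsel.train(ds)
ds_p = fsel(ds)
print '\nfixed:', ds_p.shape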

results = cvte(ds_p)
print np.round(cvte.ca.stats.stats['ACC%'], 1)
print cvte.ca.stats.matrix
print

## keep the top 5% of features by ANOVA F-score; FeatureSelectionClassifier
## trains the selection on each training split only
fsel = SensitivityBasedFeatureSelection(
    OneWayAnova(),
    FractionTailSelector(0.05, mode='select', tail='upper'))
fclf = FeatureSelectionClassifier(clf, fsel)
cvte = CrossValidation(fclf, HalfPartitioner(count=2,
                           selection_strategy='random', attr='subject'),
                       enable_ca=['stats'])
results = cvte(ds)
print 'fractional', np.round(cvte.ca.stats.stats['ACC%'], 1)
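The hand-rolled `split(",")` parsing works for plain numeric fields; the standard `csv` module does the same job and also copes with quoted fields. Below is a sketch that assumes the column layout suggested by the printed first row (six metadata fields, then the features). `load_rows` and `N_META` are hypothetical names, and the code is written for Python 3. For the real file, open it with `open(path, newline='')` on Python 3 (or `open(path, 'rb')` on Python 2.7).

```python
import csv
import io

N_META = 6                              # leading non-feature fields per row (assumed)
LABEL_MAP = {'Normal': 0, 'Stroke': 1}  # same mapping as in the script above

def load_rows(fileobj):
    """Parse CSV rows into (subjects, targets, features) lists."""
    subjects, targets, features = [], [], []
    for row in csv.reader(fileobj):
        if not row:
            continue                        # skip blank lines
        subjects.append(row[0])
        targets.append(LABEL_MAP[row[1]])   # KeyError flags an unexpected label
        features.append([float(c) for c in row[N_META:]])
    return subjects, targets, features

# two made-up rows standing in for the real file
sample = io.StringIO(u"S001,Stroke,Structural,DL,A3+4,L,1.5,0\n"
                     u"S002,Normal,Structural,DL,A3+4,L,2.0,0\n")
subjects, targets, features = load_rows(sample)
```

The three lists can then go into `Dataset(samples=features)`, with `ds.sa['subject']` and `ds.sa['targets']` set as in the script.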


Errors and warnings:
'gcc' is not recognized as an internal or external command,
operable program or batch file.
C:\Python27\lib\site-packages\scipy\integrate\quadpack.py:288: UserWarning: Extremely bad integrand behavior occurs at some points of the
  integration interval.
  warnings.warn(msg)
C:\Python27\lib\site-packages\mvpa2\misc\errorfx.py:102: RuntimeWarning: invalid value encountered in divide
  ([0], np.cumsum(t)/t.sum(dtype=np.float), [1]))
C:\Python27\lib\site-packages\scipy\stats\stats.py:274: RuntimeWarning: invalid value encountered in double_scalars
  return np.mean(x,axis)/factor
C:\Python27\lib\site-packages\mvpa2\misc\errorfx.py:106: RuntimeWarning: invalid value encountered in divide
  ([0], np.cumsum(~t)/(~t).sum(dtype=np.float), [1]))
C:\Python27\lib\site-packages\mvpa2\clfs\transerror.py:678: RuntimeWarning: invalid value encountered in divide
  stats['PPV'] = stats['TP'] / (1.0*stats["P'"])
C:\Python27\lib\site-packages\mvpa2\clfs\transerror.py:679: RuntimeWarning: invalid value encountered in divide
  stats['NPV'] = stats['TN'] / (1.0*stats["N'"])
C:\Python27\lib\site-packages\mvpa2\clfs\transerror.py:680: RuntimeWarning: invalid value encountered in divide
  stats['FDR'] = stats['FP'] / (1.0*stats["P'"])
C:\Python27\lib\site-packages\mvpa2\measures\anova.py:111: RuntimeWarning: invalid value encountered in divide
  msb = ssbn / float(dfbn)
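Most of the RuntimeWarnings above are "invalid value encountered in divide". In NumPy, a floating-point 0/0 yields NaN and emits exactly this warning. One possible source in this data: the printed first row contains a run of ten '0' features. If those columns are zero for every subject (only the first row is shown, so this is an assumption), they have no variance. Their between- and within-group sums of squares are then both 0, which makes a one-way ANOVA F ratio 0/0. A quick library-free check for constant columns, with hypothetical helper names:

```python
def constant_columns(rows):
    """Return indices of columns whose value is identical in every row."""
    ncols = len(rows[0])
    return [j for j in range(ncols)
            if all(r[j] == rows[0][j] for r in rows)]

def drop_columns(rows, cols):
    """Return a copy of rows without the given column indices."""
    drop = set(cols)
    return [[v for j, v in enumerate(r) if j not in drop] for r in rows]

# hypothetical 3-sample, 4-feature matrix; column 2 is all zeros
X = [[1.0, 5.0, 0.0, 2.0],
     [2.0, 5.0, 0.0, 3.0],
     [3.0, 6.0, 0.0, 1.0]]
const = constant_columns(X)        # [2]
X_clean = drop_columns(X, const)
```

A constant feature carries no information for any classifier, so dropping such columns before building the Dataset loses nothing.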




Output:
['S001', 'Stroke', 'Structural', 'DL', 'A3+4', 'L', '33175.5142', '14408.18074', '10849.84165', '8059.24706', '8010.452299', '14', '45', '40', '55', '50', '56060.79132', '24908.80989', '16687.6343', '10154.6501', '7901.745475', '14', '45', '40', '50', '30', '64268.60726', '12620.57744', '992.4884881', '825.5158143', '751.0024413', '19', '27', '33', '67', '40', '2170.966193', '1879.560843', '1741.498856', '1340.718439', '959.5283252', '32', '15', '19', '42', '23', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '13424.62045', '9142.678538', '8140.41212', '6403.125282', '5807.041425', '15', '19', '66', '32', '41']
<Dataset: 62x60@float64, <sa: subject,targets>>

WARNING: Only 1 sets have estimates assigned from 2 sets. ROC estimates might be incorrect. * Please note: warnings are printed only once, but underlying problem might occur many times *
----------.
predictions\targets    0.0     1
             `------  ----  ----    P'    N'    FP    FN   PPV   NPV   TPR   SPC   FDR   MCC    F1   AUC
         0.0             0    23    23    39    23    24     0  0.38     0  0.39     1 -0.61     0   nan
          1             24    15    39    23    24    23  0.38     0  0.39     0  0.62 -0.61  0.39   nan
Per target:           ----  ----
         P              24    38
         N              38    24
         TP              0    15
         TN             15     0
Summary \ Means:      ----  ----    31    31  23.5  23.5  0.19  0.19   0.2   0.2  0.81 -0.61  0.19   nan
       CHI^2        25.68 p=1.1e-05
        ACC          0.24
        ACC%        24.19
     # of sets        2

Statistics computed in 1-vs-rest fashion per each target.
Abbreviations (for details see http://en.wikipedia.org/wiki/ROC_curve):
 TP : true positive (AKA hit)
 TN : true negative (AKA correct rejection)
 FP : false positive (AKA false alarm, Type I error)
 FN : false negative (AKA miss, Type II error)
 TPR: true positive rate (AKA hit rate, recall, sensitivity)
      TPR = TP / P = TP / (TP + FN)
 FPR: false positive rate (AKA false alarm rate, fall-out)
      FPR = FP / N = FP / (FP + TN)
 ACC: accuracy
      ACC = (TP + TN) / (P + N)
 SPC: specificity
      SPC = TN / (FP + TN) = 1 - FPR
 PPV: positive predictive value (AKA precision)
      PPV = TP / (TP + FP)
 NPV: negative predictive value
      NPV = TN / (TN + FN)
 FDR: false discovery rate
      FDR = FP / (FP + TP)
 MCC: Matthews Correlation Coefficient
      MCC = (TP*TN - FP*FN)/sqrt(P N P' N')
 F1 : F1 score
      F1 = 2TP / (P + P') = 2TP / (2TP + FP + FN)
 AUC: Area under (AUC) curve
 CHI^2: Chi-square of confusion matrix
 LOE(ACC): Linear Order Effect in ACC across sets
 # of sets: number of target/prediction sets which were provided

[[ 0 23]
 [24 15]]
aov: <Dataset: 1x60@float64, <fa: fprob>>

fixed: (62, 5)
WARNING: Obtained degenerate data with zero norm for training of <LinearCSVMC>. Scaling of C cannot be done.
61.3
[[ 0  0]
 [24 38]]

fractional 61.3
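The summary numbers can be reproduced from the two printed matrices. Per the `predictions\targets` header, rows are predictions and columns are targets; 0 = Normal and 1 = Stroke, following the label mapping in the script. In the second matrix nothing is predicted as class 0. The 61.3% accuracy therefore equals the share of the larger class (38 of 62). For class 0 the predicted count P' is zero, so PPV = TP / P' is 0/0, which is consistent with the divide warnings from transerror.py. A plain-Python check, using a hypothetical `summarize` helper:

```python
def summarize(matrix):
    """Accuracy, majority-class rate and per-class PPV for a confusion
    matrix laid out as in the PyMVPA printout (rows = predictions,
    columns = targets)."""
    n = len(matrix)
    total = float(sum(sum(row) for row in matrix))
    per_target = [sum(col) for col in zip(*matrix)]   # P for each target
    per_prediction = [sum(row) for row in matrix]     # P' for each class
    acc = sum(matrix[i][i] for i in range(n)) / total
    majority = max(per_target) / total                # always guess the larger class
    # PPV = TP / P' is undefined (NaN here) for a class that is never predicted
    ppv = [matrix[i][i] / float(per_prediction[i])
           if per_prediction[i] else float('nan')
           for i in range(n)]
    return acc, majority, ppv

first = [[0, 23], [24, 15]]    # first cross-validation run
second = [[0, 0], [24, 38]]    # run on the 5 pre-selected features

acc1, base, _ = summarize(first)
acc2, _, ppv2 = summarize(second)
print(round(100 * acc1, 2), round(100 * acc2, 1), round(100 * base, 1))
# prints: 24.19 61.3 61.3
```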

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
