Sorting out below-chance accuracy is really vexing. If you haven't seen
it before, this topic has been discussed on this (and other) mailing
lists; see the thread at
Googling "below-chance accuracy" also brings up some useful links.
I have seen this phenomenon (permutation distribution looks reasonably
normal and centered near chance but true-labeled accuracy in the left
tail) occasionally in my own data.
I don't have a good explanation for this, but I tend to think it has to do
with data that doesn't form a linear-SVM-friendly shape in hyperspace.
As is typical in MVPA, you don't have a huge number of examples
(particularly if you have more than a hundred or so voxels in the ROI),
which can also make the classification results unstable.
If you are reasonably sure that the dataset is good (the examples are
properly labeled, the ROI masks fit well, etc.), then I would try altering
the cross-validation scheme to see if you can get the individual
accuracies at (or above!) chance. For example, I'd try leaving two or
three runs out instead of just one for the cross-validation. Having a
small testing set (as you do with leave-one-run-out) can produce a lot of
variance across the cross-validation folds (i.e. the accuracy for each of
the 6 classifiers going into each person's accuracy). Things often seem
to go better when all the cross-validation folds have fairly similar
accuracies (0.55, 0.6, 0.59, ...) rather than widely variable ones (0.5,
0.75, 0.6, ...).
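To make the leave-two-or-three-runs-out suggestion concrete, here is a
minimal Python sketch of how the folds could be enumerated. The function
name leave_k_runs_out and the constant TRIALS_PER_RUN are made up for
illustration (they are not PyMVPA API), and the trial count assumes the
design described later in this thread: six runs with 8 trials per
condition, so 16 trials per run in a two-way classification.

```python
from itertools import combinations


def leave_k_runs_out(run_ids, k):
    """Yield (train_runs, test_runs) pairs, holding out every combination of k runs.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    runs = sorted(set(run_ids))
    for test_runs in combinations(runs, k):
        train_runs = tuple(r for r in runs if r not in test_runs)
        yield train_runs, test_runs


# Assumed design from the thread: six runs, 8 trials per condition per run,
# two conditions per two-way classification.
runs = range(1, 7)
TRIALS_PER_RUN = 16

for k in (1, 2, 3):
    folds = list(leave_k_runs_out(runs, k))
    print(f"leave-{k}-out: {len(folds)} folds, "
          f"{k * TRIALS_PER_RUN} test trials per fold")
```

With six runs this gives 6, 15, and 20 folds for leaving one, two, and
three runs out, with 16, 32, and 48 test trials per fold. The training
sets overlap heavily between folds, so the fold accuracies are not
independent of each other.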
Good luck, and I'd love to hear if you find a solution.
On 11/26/2012 7:21 AM, Meng Liang wrote:
I'm still puzzled by the results of classification accuracy lower than
chance level. I provided some details in response to your questions in my
previous email, and I hope you can help me understand this puzzle. Many thanks.
Date: Sat, 10 Nov 2012 19:19:19 +0000
Subject: Re: [pymvpa] FW: What does a classification accuracy that is
significantly lower than chancel level mean?
Thanks very much for your reply! Please see below for details.
> > I'm running MVPA on some fMRI data (four different stimuli, say A, B, C
> > and D; six runs in each subject) to see whether the BOLD signals from a
> > given ROI can successfully predict the type of the stimulus. The MVPA
> > (leave-one-run-out cross-validation) was performed on each subject for
> > each two-way classification task. In a particular classification task (e.g.
> > classification A vs. B), in some subjects, the classification accuracy was
> > (almost) significantly LOWER than the chance level (somewhere
> > and 0.4).
> depending on number of trials/cross-validation scheme even values of 0
> could come up by chance ;-) but indeed should not be 'significant'
> > What could be the reason for a significantly-lower-than-chance-level
> > accuracy?
> and how significant is this 'significantly LOWER'?
The significance level was assessed by the P value obtained from 10,000
permutations. Permutation was done within each subject, by randomly
assigning stimulus labels to each trial (the number of trials under each
label was still balanced; there were 8 trials per condition in each run,
and there were six runs in total). The P value was calculated as the
percentage of random permutations in which the resultant classification
accuracy was higher than the actual classification accuracy obtained
from the correct labels (for example, if none of 10,000 random
permutations led to a classification accuracy that was higher than the
actual classification accuracy, the P value would be 0). In this way, in
5 out of 14 subjects, the P values were greater than 0.95. In other
words, the actual classification accuracy was located around the end of
the left tail of the null distribution in these 5 subjects (the shape of
the null distribution is like a bell, centered around 50%). In the other 9
subjects, the actual classification accuracies were near or higher than
the chance level.
> details of # trials/cross-validation?
There were 8 trials per condition in each run, and there were six runs
in total. Leave-one-run-out cross-validation was performed, that is, the
classifier (linear SVM) was trained on the data obtained from five runs
and tested on the remaining run (repeat the same procedure six times and
each time using a different run as a testing dataset).
> > The P value was obtained from 10,000 permutations.
> is that permutations within the subject which at the end showed
> significant below 0? how were the permutations done?
I hope the reply above provides enough detail about how the permutation
was done. Please let me know if anything is unclear.
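As a sketch of the P value calculation described above, the Python
functions below compute the proportion of permutations that beat the
true-label accuracy. The function names and the coin-flip null
distribution are illustrative assumptions, not the code actually used for
these analyses. The last two functions are a commonly used variant that
counts the true labeling as one of the permutations, so the P value can
never be exactly 0; that correction is not part of the procedure
described in this thread.

```python
import random


def upper_tail_p(observed, null_accuracies):
    """Proportion of permutations with accuracy strictly higher than observed.

    This mirrors the calculation described in the thread.
    """
    higher = sum(1 for acc in null_accuracies if acc > observed)
    return higher / len(null_accuracies)


def upper_tail_p_corrected(observed, null_accuracies):
    """Variant (not from the thread): counts ties and adds the true labeling
    to the permutation set, so the result is never exactly 0."""
    at_least = sum(1 for acc in null_accuracies if acc >= observed)
    return (at_least + 1) / (len(null_accuracies) + 1)


def lower_tail_p(observed, null_accuracies):
    """Variant (not from the thread): tests whether accuracy is BELOW chance."""
    at_most = sum(1 for acc in null_accuracies if acc <= observed)
    return (at_most + 1) / (len(null_accuracies) + 1)


# Toy null distribution: coin flips stand in for permuted-label accuracies.
# A real null would come from retraining the classifier on shuffled labels.
rng = random.Random(0)
N_TEST_TRIALS = 96  # assumed: 6 runs x 16 test trials per run
null = [
    sum(rng.random() < 0.5 for _ in range(N_TEST_TRIALS)) / N_TEST_TRIALS
    for _ in range(1000)
]

observed = 0.40  # example below-chance accuracy
print("upper-tail P:", upper_tail_p(observed, null))
print("lower-tail P:", lower_tail_p(observed, null))
```

An upper-tail P value above 0.95, as reported for 5 of the 14 subjects,
corresponds to a small lower-tail P value, which is the quantity to look
at when asking whether an accuracy is reliably below chance.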
> > But the
> > accuracies of all other classifications look fine in all subjects.
> fine means all above chance or still distributed around chance?
By 'fine' I mean the classification accuracy was either around chance
level (i.e. not far from it, whether slightly lower or higher) or above
chance level. To me, around or above chance level makes more sense
than significantly lower than chance level.
Joset A. Etzel, Ph.D.
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
Pkg-ExpPsy-PyMVPA mailing list