For 1) the two methods should give the same result, except that
currently there is no stratification in train_test_split. So the
StratifiedShuffleSplit should be better.
For 2) 51.66 for 100 permutations seems more reasonable than 60%.
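To illustrate point 1), here is a minimal sketch of why stratification matters with a 4-sample test set. It assumes a newer scikit-learn where the splitters live in sklearn.model_selection (not the cross_validation module used in this thread), and X, y are hypothetical stand-ins for the 8-vs-8 data, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Hypothetical stand-in for the 16-sample, 8-vs-8 dataset in this thread.
rng = np.random.RandomState(0)
X = rng.randn(16, 5)
y = np.array([0] * 8 + [1] * 8)

# Plain train_test_split: the 4-sample test set may be unbalanced.
plain_counts = set()
for seed in range(200):
    _, _, _, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)
    plain_counts.add(int(y_test.sum()))

# StratifiedShuffleSplit: every test set keeps the 50/50 class ratio.
sss = StratifiedShuffleSplit(n_splits=200, test_size=0.25, random_state=0)
strat_counts = {int(y[test_idx].sum()) for _, test_idx in sss.split(X, y)}

print("class-1 counts in test set, plain:     ", sorted(plain_counts))
print("class-1 counts in test set, stratified:", sorted(strat_counts))
```

In later scikit-learn releases train_test_split also accepts a stratify=y argument, which should make the two approaches behave alike in this respect.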
On 04/28/2015 05:04 AM, Fabrizio Fasano wrote:
Thanks a lot.
Based on your suggestion I performed the following 2 tests (code below):
1) on the true labels, instead of defining the train and test sets with
StratifiedShuffleSplit, I drew 10000 random train/test splits with
cross_validation.train_test_split, and the accuracy was
Accuracy: 98.00 (+/- 16.49)
2) on false labels obtained by permuting the true ones 100 times, I
drew 100 random train/test splits for every false-label
permutation, and the accuracy was Accuracy: 51.66 (+/- 50.32)
In your opinion, do my tests make sense?
If so I would be very happy, because I could be confident that my good
(98%) result is a true one and not a biased one...
Thank you again,
Fabrizio
CODE:
# 1) true labels
import numpy as np
from sklearn import svm, cross_validation
niter = 10000
scores = np.zeros(niter)
clf = svm.LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
for rs in range(niter):
    # draw a new random 75/25 train/test split on every iteration
    X_train, X_test, y_train, y_test = cross_validation.train_test_split(
        X_scaled, y, test_size=0.25, random_state=rs + 1)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # percentage of correct predictions (float, no integer division)
    scores[rs] = 100.0 * np.mean(y_test == y_pred)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
# 2) false labels obtained by permutation of the true ones
niter = 100
scores = np.zeros([100, niter])
# repeat the test on permuted (false) labels to check that accuracy drops
for i in range(100):
    yfalse = np.random.permutation(y)
    print("\nWhen a manual permutation procedure is applied")
    clf = svm.LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
    for rs in range(niter):
        # new random 75/25 train/test split of the permuted labels
        X_train, X_test, y_train, y_test = cross_validation.train_test_split(
            X_scaled, yfalse, test_size=0.25, random_state=rs + 1)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        scores[i, rs] = 100.0 * np.mean(y_test == y_pred)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
On 27 Apr 2015, at 22:27, Andreas Mueller <t3k...@gmail.com> wrote:
You changed the labels only once, and have a test-set size of 4? I would
imagine that is where that comes from.
If you repeat over different assignments, you will get 50/50.
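The arithmetic behind this can be sketched as follows. It is a simplification that assumes each of the 4 test predictions is independently correct with probability 0.5 under random labels, which is not exactly true for the real splits; the variable names are illustrative only.

```python
from math import comb

n_test = 4   # test-set size mentioned above
p = 0.5      # assumed chance of a correct guess with balanced, random labels

# Probability of getting exactly k of the 4 test samples right by chance.
pmf = {k: comb(n_test, k) * p**k * (1 - p)**(n_test - k)
       for k in range(n_test + 1)}

for k, prob in pmf.items():
    print("accuracy %5.1f%%  probability %.4f" % (100.0 * k / n_test, prob))

# Chance of scoring 75% or better on one split with one random relabeling.
p_at_least_75 = pmf[3] + pmf[4]
# Expected accuracy when averaging over many relabelings.
expected = sum(k / n_test * prob for k, prob in pmf.items())

print("P(accuracy >= 75%%) = %.4f" % p_at_least_75)
print("expected accuracy  = %.2f" % expected)
```

Under this assumption a single relabeling scores 75% or more roughly one time in three, so a handful of manual relabelings landing above 60% is not surprising; only the average over many relabelings settles at 50%.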
On 04/27/2015 11:33 AM, Fabrizio Fasano wrote:
Dear Andy,
Yes, the classes have the same size, 8 and 8.
This is one example of the code I used to cross-validate the classification
(here I used StratifiedShuffleSplit, but I also tried other methods
such as leave-one-out or simple 4-fold cross-validation, and the result
didn't change much):
import numpy as np
from sklearn import svm
from sklearn.cross_validation import StratifiedShuffleSplit
# 100 stratified random splits, 25% of the samples held out each time
sss = StratifiedShuffleSplit(y, 100, test_size=0.25, random_state=0)
clf = svm.LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
cv_scores = []
for train_index, test_index in sss:
    X_train, X_test = X_scaled[train_index], X_scaled[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # fraction of correct predictions on this split
    cv_scores.append(np.sum(y_pred == y_test) / float(np.size(y_test)))
print("Accuracy %.0f +/- %.0f" % (np.ceil(100 * np.mean(cv_scores)),
                                  np.ceil(200 * np.std(cv_scores))))
On Apr 26, 2015, at 7:50 PM, Andy wrote:
Your expectation is right, if you randomly assign labels, you shouldn't
get more than 50% correct with a large enough dataset.
I imagine there is some issue in how you shuffled the labels. Without
the code, it is hard to tell.
Are you sure the classes have the same size?
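One way to run this check systematically is sketched below. It assumes a newer scikit-learn that provides permutation_test_score in sklearn.model_selection, and it uses hypothetical pure-noise data in place of the real X_scaled and y, so the printed numbers say nothing about the actual dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, permutation_test_score
from sklearn.svm import LinearSVC

# Hypothetical pure-noise data with the same class sizes (8 vs 8).
rng = np.random.RandomState(0)
X = rng.randn(16, 20)
y = np.array([0] * 8 + [1] * 8)

clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.25, random_state=0)

# Score on the given labels, then on 100 shuffled copies of the labels.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, cv=cv, n_permutations=100, scoring="accuracy", random_state=0
)

print("score on given labels:         %.2f" % score)
print("mean score on shuffled labels: %.2f" % perm_scores.mean())
print("p-value:                       %.3f" % pvalue)
```

If the shuffled-label scores average near 0.5 while the score on the true labels sits far above them, that would suggest the result is not an artifact of the procedure.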
On 04/26/2015 11:22 AM, Fabrizio Fasano wrote:
Dear Andreas,
Thanks a lot for your help.
About the random assignment of values to my labels y: what I mean
is that, being suspicious of the too-good performance, I
changed the labels manually, keeping the 50/50 split of 1s and 0s but in
different orders, and the labels were always predicted very well,
with accuracy no lower than 60%. By chance, I expected
values lower than 50% as well as values higher than 50%. I didn't
perform an exhaustive test (I only did it manually for a few
combinations)...
Fabrizio
------------------------------------------------------------------------------
One dashboard for servers and applications across
Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable
Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general