Thanks
Regarding the sklearn version, it is 0.12.1.
Using the 0.13-git version ends up with the same kind of graph
(http://s9.postimage.org/uiabf8aq7/pr_curve_1354020171.png).
However, auc no longer works as it did previously and raises the following
AssertionError: Reordering is not turned on, and The x
array is not increasing: [ 0.24195 0.24145 0.24161 ..., 1.
1. 1. ]
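The assertion message suggests that auc expects the x array to be sorted in increasing order. As a rough workaround sketch, assuming only NumPy, the points can be sorted by x before applying the trapezoidal rule. The helper name trapezoid_area and the sample points are hypothetical and not part of scikit-learn.

```python
import numpy as np


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through (x, y), after sorting by x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Stable sort: points sharing the same x keep their input order.
    order = np.argsort(x, kind="mergesort")
    x, y = x[order], y[order]
    # Trapezoidal rule: width of each interval times the mean of its two heights.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# Made-up points listed with decreasing x, which is the kind of input
# the assertion rejects.
x = [1.0, 0.5, 0.0]
y = [0.5, 0.75, 1.0]
area = trapezoid_area(x, y)
print("area = %.4f" % area)
```

Sorting changes the order in which points with equal x are traversed, so on a curve with many ties the result may differ from the area along the curve as originally ordered.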
Sometimes the roc curve may look very strange as the denominator
for precision is "relevant document" retrieved which varies with
thresholds.
You can find a dump of y_true here: http://pastebin.com/LmFTFdD4 and the proba
vectors there: http://pastebin.com/Qnd9Bkj9. However, I don't get the idea
behind it. How could there be multiple precision levels for a single recall
value?
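A toy illustration of the situation in question, using made-up labels and scores (not the pastebin dumps) and a hypothetical helper precision_recall_at: when the threshold is lowered past a negative example, no new positive is retrieved, so recall is unchanged while precision drops. One recall value can therefore pair with several precision values.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every item with score >= threshold is retrieved."""
    retrieved = [y for y, s in zip(y_true, scores) if s >= threshold]
    true_pos = sum(retrieved)
    # By convention, precision is taken as 1.0 when nothing is retrieved.
    precision = float(true_pos) / len(retrieved) if retrieved else 1.0
    recall = float(true_pos) / sum(y_true)
    return precision, recall


# Made-up toy data: two positives and two negatives, interleaved by score.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]

thresholds = sorted(scores, reverse=True)
points = [precision_recall_at(y_true, scores, t) for t in thresholds]
for t, (p, r) in zip(thresholds, points):
    print("threshold=%.1f precision=%.2f recall=%.2f" % (t, p, r))
```

In this sketch, thresholds 0.9 and 0.8 give the same recall (0.5) with two different precision values, because the item scored 0.8 is a negative.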
Thanks again for your help.
All the best,
François.
On 26/11/2012 15:59, François Kawala wrote:
Hello everybody,
I'm interacting with the scikit-learn community for the first time, and I have
to say that you've done amazing work here. I am very grateful
for the time you've spent so that beginners like me can play with such
great tools.
Having seen this example
http://scikit-learn.org/dev/auto_examples/plot_precision_recall.html I've
tried to plot the precision / recall curve on my own data.
The result is surprising: most of the curves are consistent,
but sometimes the output looks like this:
http://s10.postimage.org/d8uazvjt5/pr_curve_1353936568.png
Does it make sense to you?
A couple of weeks ago I saw a bug fix for the precision_recall_curve
function in the bug tracker. Could that be an explanation?
Thanks for reading.
All the best,
François.
PS: here is the code responsible for the aforementioned picture:

import time

import pylab as pl
from sklearn.metrics import auc, roc_curve


def plot_roc(model, X_test, y_test):
    # Probability of the positive class (column 1) for each test sample.
    probas_ = model.predict_proba(X_test)
    fpr, tpr, _thresholds = roc_curve(y_test, probas_[:, 1])
    roc_auc = auc(fpr, tpr)

    pl.clf()
    pl.plot(fpr, tpr, label='ROC curve')
    pl.plot([0, 1], [0, 1], 'k--')  # diagonal reference line
    pl.xlim([0.0, 1.0])
    pl.ylim([0.0, 1.0])
    pl.grid()
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.title('Receiver operating characteristic (area = %0.2f)' % roc_auc)
    pl.legend(loc="best")
    # Pass the file name directly: a file opened in text append mode ('a')
    # is not suitable for binary PNG output.
    pl.savefig('./roc_curve_%d.png' % int(time.time()), format="png")
--
François Kawala