Thanks

The sklearn version is 0.12.1.

Using the 0.13-git version ends up with the same kind of graph:
http://s9.postimage.org/uiabf8aq7/pr_curve_1354020171.png

However, the AUC no longer works as it did previously and raises the
following AssertionError:

AssertionError: Reordering is not turned on, and The x array is not
increasing: [ 0.24195  0.24145  0.24161 ...,  1.  1.  1. ]
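
To illustrate what the error message is complaining about, here is a minimal pure-Python sketch of a trapezoidal area computation that refuses an x array that is not increasing unless reordering is requested. It is not scikit-learn's implementation: the function name trapezoid_area and its reorder flag are hypothetical and only mirror the wording of the message.

```python
def trapezoid_area(x, y, reorder=False):
    """Area under the points (x, y) using the trapezoidal rule.

    Hypothetical helper for illustration only; it is not the
    scikit-learn auc function.
    """
    points = list(zip(x, y))
    if reorder:
        # Sort the points by x (ties are broken by y)
        points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 < x0:
            raise ValueError("Reordering is off and x is not increasing")
        # Area of the trapezoid between two consecutive points
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# x is deliberately out of order here
x = [0.0, 1.0, 0.5]
y = [0.0, 1.0, 0.5]
area_sorted = trapezoid_area(x, y, reorder=True)
```

Calling trapezoid_area(x, y) without reorder=True raises ValueError on this input. Note that sorting can change the shape of a curve that has several y values for the same x, so reordering is a workaround rather than a fix.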

Sometimes the curve may look very strange, as the denominator for
precision is the number of documents retrieved, which varies with the
threshold.


You can find a dump of y_true here: http://pastebin.com/LmFTFdD4 and the
proba vectors there: http://pastebin.com/Qnd9Bkj9. However, I don't get the
idea behind it. How could there be multiple precision levels for a single
recall value?
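
For what it's worth, a tiny hand-made example (toy labels and scores, not the pastebin data) suggests how this can happen: when the threshold is lowered past a negative sample, the number of true positives stays the same, so recall is unchanged, while precision drops. The helper precision_recall_points below is hypothetical and written only for this illustration.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score.

    A sample is predicted positive when its score is >= the threshold.
    """
    total_pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        predicted = [s >= thr for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        precision = tp / (tp + fp)  # denominator: everything retrieved
        recall = tp / total_pos     # denominator: all true positives
        points.append((thr, precision, recall))
    return points


# Toy data: two negatives are ranked between the two positives
y_true = [1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6]
points = precision_recall_points(y_true, scores)

# Distinct precision values observed at recall == 0.5
precisions_at_half_recall = {p for _, p, r in points if r == 0.5}
```

With this toy data, thresholds 0.9, 0.8 and 0.7 all give a recall of 0.5, but with three different precision values, so the curve has a vertical segment at that recall.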

Thanks again for your help.
All the best,
François.



On 26/11/2012 15:59, François Kawala wrote:

Hello everybody,

I'm interacting with the scikit-learn people for the first time, and I have
to say that you've done amazing work here. I am very grateful for the time
you've spent so that beginners like me can play with such great tools.

Having seen this example
http://scikit-learn.org/dev/auto_examples/plot_precision_recall.html I've
tried to plot the precision / recall curve on my own data.

The result is surprising: most of the curves are consistent, but sometimes
the output looks like this:
http://s10.postimage.org/d8uazvjt5/pr_curve_1353936568.png

Does it make sense to you ?

A couple of weeks ago I saw in the bug tracker a bug fix on the
precision_recall_curve function. Could it be an explanation?

Thanks for reading.

All the best,
François.



Ps. here is the code responsible for the aforementioned picture :

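# Imports assumed from context; they were not part of the original snippet
import time

import pylab as pl
from sklearn.metrics import auc, roc_curve


def plot_roc(model, X_test, y_test):
    # Probability of the positive class for each test sample
    probas_ = model.predict_proba(X_test)
    fpr, tpr, _thresholds = roc_curve(y_test, probas_[:, 1])
    roc_auc = auc(fpr, tpr)
    pl.clf()
    pl.plot(fpr, tpr, label='ROC curve')
    # Diagonal: expected performance of a random classifier
    pl.plot([0, 1], [0, 1], 'k--')
    pl.xlim([0.0, 1.0])
    pl.ylim([0.0, 1.0])
    pl.grid()
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.title('Receiver operating characteristic (area = %0.2f)' % roc_auc)
    pl.legend(loc="best")
    # Pass the file name directly rather than a text-mode file handle
    pl.savefig('./roc_curve_%d.png' % int(time.time()), format="png")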







-- 
François Kawala
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
