On a related note, I implemented NDCG with a slightly different interface
than Olivier's implementation:
https://gist.github.com/mblondel/7337391

My implementation takes y_true and y_pred as arguments, so it is more
consistent with the other metrics in scikit-learn. However, y_pred might not
be available for listwise methods, so Olivier's implementation is useful too.
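
To make the y_true / y_pred interface concrete, here is a minimal
pure-Python sketch of NDCG. It is not the code from the gist: the names
dcg_sketch and ndcg_sketch, the optional cutoff k, and the choice of the
exponential gain 2**rel - 1 with a log2 discount are assumptions made for
illustration only.

```python
import math


def dcg_sketch(y_true, y_pred, k=None):
    """DCG of the ranking induced by y_pred (hypothetical helper)."""
    # Rank item indices by decreasing predicted score.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    if k is not None:
        order = order[:k]
    # Assumed gain: 2**relevance - 1; assumed discount: log2(position + 1),
    # with positions starting at 1 (hence rank + 2 for 0-based rank).
    return sum((2 ** y_true[i] - 1) / math.log2(rank + 2)
               for rank, i in enumerate(order))


def ndcg_sketch(y_true, y_pred, k=None):
    """DCG normalized by the DCG of the ideal ordering."""
    ideal = dcg_sketch(y_true, y_true, k)
    if ideal == 0:
        # No relevant item at all: define the score as 0.
        return 0.0
    return dcg_sketch(y_true, y_pred, k) / ideal


# Relevance labels for four documents of one query and predicted scores.
print(ndcg_sketch([3, 2, 0, 1], [0.9, 0.8, 0.1, 0.3]))
```

Note that the sketch scores a single query; averaging over queries would
have to be done by the caller.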

For learning to rank with only two relevance levels (0 and 1) we already
have two metrics in scikit-learn: ROC-AUC and average precision.

I just pushed an alternative implementation of average precision to the
unit tests, to check the correctness of scikit-learn's implementation
(which is based on computing the area under the precision-recall curve):
https://github.com/scikit-learn/scikit-learn/commit/d0cdcde9c500f5c9d73f61b97e0e69410fc694ef
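
For readers who have not seen it, one common way to compute average
precision directly from a ranking is to average the precision at the
position of each relevant item. The sketch below follows that definition;
it is not necessarily what the commit above contains, and the name
average_precision_sketch is hypothetical. Its value may differ from an
area-based estimate, depending on how the precision-recall curve is
interpolated.

```python
def average_precision_sketch(y_true, y_score):
    """Mean of precision@rank over the ranks of relevant items (labels 0/1)."""
    # Rank item indices by decreasing predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            # Precision among the top `rank` items.
            total += hits / rank
    # Assumed convention: return 0 when there is no relevant item.
    return total / hits if hits else 0.0


# Relevant items end up at ranks 1 and 3.
print(average_precision_sketch([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```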

Cheers,
Mathieu


On Wed, Nov 6, 2013 at 8:30 PM, Olivier Grisel <[email protected]> wrote:

> This is very interesting. I have been playing with learning to rank
> recently. So far I have only used point-wise regressors and
> implemented NDCG as a ranking metric to compare the models. I tried to
> experiment with parallelizing extra trees here:
>
>
> http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Learning%20to%20Rank.ipynb
>
> I think a GradientBoostingRegressor model can reach better accuracy
> but is not parallelizable on its own. Of course, if you use a list-wise
> approach that directly optimizes the target cost (e.g. NDCG, as
> LambdaMART does), you should be able to reach the state of the art.
>
> The data was parsed once and saved in a compressed format here:
>
>
> http://nbviewer.ipython.org/url/raw.github.com/ogrisel/notebooks/master/Data%20Preprocessing%20for%20the%20%20Learning%20to%20Rank%20example.ipynb
>
> Here are the slides I am gonna present this afternoon at Budapest BI Forum:
>
>   https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1
>
> About the API: properly supporting learning to rank will have an impact
> on the scorer API and on cross-validation / grid search. I am not yet
> sure how best to address all of this.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> November Webinars for C, C++, Fortran Developers
> Accelerate application performance with scalable programming models.
> Explore
> techniques for threading, error checking, porting, and tuning. Get the most
> from the latest Intel processors and coprocessors. See abstracts and
> register
> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>