This is very interesting. I have been playing with learning to rank recently. So far I have only used point-wise regressors and implemented NDCG as a ranking metric to compare the models. I experimented with parallelizing extra trees here:

http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Learning%20to%20Rank.ipynb

I think a GradientBoostingRegressor model can reach better accuracy, but a single model is not parallelizable. Of course, if you use a list-wise approach that directly optimizes the target cost (e.g. NDCG, as LambdaMART does), you should be able to reach the state of the art.

The data was parsed once and saved in a compressed format here:

http://nbviewer.ipython.org/url/raw.github.com/ogrisel/notebooks/master/Data%20Preprocessing%20for%20the%20%20Learning%20to%20Rank%20example.ipynb

Here are the slides I am going to present this afternoon at the Budapest BI Forum:

https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1

About the API: properly supporting learning to rank will have an impact on the scorer API and on cross-validation / grid search. I am not yet sure how best to address all of this.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
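For readers following along: a minimal sketch of the NDCG@k metric mentioned above, written from the standard definition (DCG with exponential gain and log2 discount, normalized by the ideal ordering). This is an illustrative implementation, not the code from the notebook; the function names `dcg_at_k` / `ndcg_at_k` are my own.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k relevance labels, in rank order."""
    relevances = np.asarray(relevances, dtype=float)[:k]
    if relevances.size == 0:
        return 0.0
    # Gain (2^rel - 1) discounted by log2(rank + 1), ranks starting at 1.
    discounts = np.log2(np.arange(2, relevances.size + 2))
    return float(np.sum((2.0 ** relevances - 1.0) / discounts))

def ndcg_at_k(y_true, y_pred, k=10):
    """NDCG@k: sort documents by predicted score, evaluate with true labels,
    and normalize by the DCG of the ideal (label-sorted) ranking."""
    order = np.argsort(y_pred)[::-1]
    dcg = dcg_at_k(np.asarray(y_true)[order], k)
    ideal = dcg_at_k(np.sort(y_true)[::-1], k)
    return dcg / ideal if ideal > 0 else 0.0
```

A model that ranks perfectly scores 1.0; any inversion of relevant documents lowers the score, which makes it a convenient single number for comparing point-wise regressors.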
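To illustrate the parallelism point: extra trees grow each tree independently, so training spreads across cores via `n_jobs`, whereas gradient boosting fits trees sequentially (each one corrects the current ensemble's residuals) and a single model cannot be parallelized the same way. A quick sketch on synthetic regression data (a stand-in, not the actual ranking dataset):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor

# Synthetic stand-in for the preprocessed learning-to-rank features.
X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

# Independent randomized trees: training parallelizes across all cores.
et = ExtraTreesRegressor(n_estimators=100, n_jobs=-1, random_state=0)
et.fit(X, y)

# Boosting is inherently sequential: tree i depends on trees 0..i-1,
# so there is no n_jobs-style parallelism for a single model.
gbr = GradientBoostingRegressor(n_estimators=100, random_state=0)
gbr.fit(X, y)
```

The practical trade-off is throughput versus accuracy: the parallel forest trains fast on many cores, while the sequential booster often reaches lower error per tree.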
