Thanks for the interest, everyone; I'll try to address some of your comments.
I haven't pushed the code anywhere yet. Putting aside potential API issues,
there are currently no tests, there may be some numerical issues that still
need to be ironed out, some data types in the Cython code were specialized
for the problem I was working on, and I haven't benchmarked the code against
other implementations yet. I'm fine with putting up a WIP PR despite these
issues, as Peter mentioned, so we can consolidate our thoughts there if you
prefer. For now, I tried to follow this guide
<http://scikit-learn.org/stable/developers/index.html#fitting>,
so fit takes an additional keyword argument "query".
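
Concretely, the interface I have in mind looks roughly like this (the
estimator name and parameters below are placeholders, not a settled API):

    # Rough sketch of the interface, not the final API.
    class LambdaMART(object):
        def __init__(self, n_estimators=100, learning_rate=0.1):
            self.n_estimators = n_estimators
            self.learning_rate = learning_rate

        def fit(self, X, y, query=None):
            # "query" is an array of query ids, one per sample, so that
            # documents from the same query can be grouped together when
            # computing the lambda gradients.
            ...

        def predict(self, X):
            # Returns one relevance score per document; documents are
            # ranked within a query by sorting on this score.
            ...
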
I looked at the code by discobot but didn't use it; when I skimmed over it,
it seemed he wasn't computing the second derivatives or updating the tree
leaves.
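
For reference, by "updating the tree leaves" I mean the Newton step
LambdaMART applies after fitting each regression tree to the lambda
gradients: every leaf value is replaced by the sum of the first derivatives
over the sum of the second derivatives of the samples falling in that leaf.
A minimal sketch against a fitted scikit-learn DecisionTreeRegressor (the
lambdas and hessians arrays are assumed to be computed elsewhere):

    import numpy as np

    def update_leaves(tree, X, lambdas, hessians):
        # Newton step: leaf value = sum(lambda_i) / sum(hessian_i) over the
        # samples that land in that leaf, instead of the mean target the
        # tree was originally fit with.
        leaf_ids = tree.apply(X)              # leaf index of each sample
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            denom = hessians[mask].sum()
            if denom > 0:
                tree.tree_.value[leaf, 0, 0] = lambdas[mask].sum() / denom
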
Jacques
P.S. Thanks for the heads-up on the svmlight file format; I should have
known there would be readers for it.
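
(For anyone else following along, I believe the reader in question is
sklearn.datasets.load_svmlight_file, which can also return the query ids
stored in LETOR-style files:)

    from sklearn.datasets import load_svmlight_file

    # query_id=True additionally returns the qid column, so documents can
    # be grouped per query.
    X, y, qid = load_svmlight_file("train.txt", query_id=True)
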
On Wed, Nov 6, 2013 at 6:48 PM, Mathieu Blondel <[email protected]> wrote:
> Jacques, is your LambdaMART implementation available somewhere?
>
> Mathieu
>
>
> On Thu, Nov 7, 2013 at 12:09 AM, Mathieu Blondel <[email protected]> wrote:
>
>> On a related note, I implemented NDCG with a slightly different interface
>> than Olivier's implementation:
>> https://gist.github.com/mblondel/7337391
>>
>> My implementation takes y_true and y_pred as arguments and so is more
>> consistent with other metrics in scikit-learn. However, y_pred might not be
>> available for listwise methods, so Olivier's implementation is useful too.
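>>
>> To illustrate the y_true / y_pred interface, here is a rough sketch of the
>> idea (not the exact code in the gist):
>>
>>     import numpy as np
>>
>>     def dcg_score(y_true, y_pred, k=10):
>>         # Sort the true relevances by predicted score and take the
>>         # discounted cumulative gain of the top-k documents.
>>         y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
>>         order = np.argsort(y_pred)[::-1]
>>         gains = 2.0 ** y_true[order][:k] - 1
>>         discounts = np.log2(np.arange(2, gains.size + 2))
>>         return np.sum(gains / discounts)
>>
>>     def ndcg_score(y_true, y_pred, k=10):
>>         # Normalize by the DCG of the ideal ordering.
>>         best = dcg_score(y_true, y_true, k)
>>         return dcg_score(y_true, y_pred, k) / best if best > 0 else 0.0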
>>
>> For learning to rank with only two relevance levels (0 and 1), we already
>> have two metrics in scikit-learn: ROC-AUC and average precision.
>>
>> I just pushed an alternative implementation of average precision to the
>> unit tests so as to check the correctness of scikit-learn's implementation
>> (based on computing the area under the precision-recall curve):
>>
>> https://github.com/scikit-learn/scikit-learn/commit/d0cdcde9c500f5c9d73f61b97e0e69410fc694ef
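>>
>> (For reference, one quick way to cross-check average precision is to
>> integrate the precision-recall curve directly, along these lines:)
>>
>>     import numpy as np
>>     from sklearn.metrics import precision_recall_curve
>>
>>     def average_precision_via_pr_area(y_true, y_score):
>>         # Build the precision-recall curve and integrate it with the
>>         # trapezoidal rule; recall is returned in decreasing order, so
>>         # flip both arrays before integrating.
>>         precision, recall, _ = precision_recall_curve(y_true, y_score)
>>         return np.trapz(precision[::-1], recall[::-1])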
>>
>> Cheers,
>> Mathieu
>>
>>
>> On Wed, Nov 6, 2013 at 8:30 PM, Olivier Grisel <[email protected]> wrote:
>>
>>> This is very interesting. I have been playing recently with learning
>>> to rank. So far I have only used point-wise regressors and implemented
>>> NDCG as a ranking metric to compare the models. I experimented with
>>> parallelizing extra trees here:
>>>
>>>
>>> http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Learning%20to%20Rank.ipynb
>>>
>>> I think a GradientBoostingRegressor model can reach better accuracy
>>> but is not parallelizable on its own. Of course, if you use a list-wise
>>> approach that directly optimizes the target cost (e.g. NDCG, as
>>> LambdaMART does), you should be able to reach the state of the art.
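>>>
>>> (To be concrete, the point-wise baseline treats each query-document pair
>>> as an independent regression sample with the relevance grade as the
>>> target, then ranks the documents of a query by the predicted score,
>>> roughly like this:)
>>>
>>>     from sklearn.ensemble import GradientBoostingRegressor
>>>
>>>     # X_train / y_train: query-document feature vectors and relevance
>>>     # grades; X_query: the documents of a single query to be ranked.
>>>     reg = GradientBoostingRegressor(n_estimators=100, max_depth=3)
>>>     reg.fit(X_train, y_train)
>>>     scores = reg.predict(X_query)
>>>     ranking = scores.argsort()[::-1]   # document indices, best first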
>>>
>>> The data was parsed once and saved in a compressed format here:
>>>
>>>
>>> http://nbviewer.ipython.org/url/raw.github.com/ogrisel/notebooks/master/Data%20Preprocessing%20for%20the%20%20Learning%20to%20Rank%20example.ipynb
>>>
>>> Here are the slides I am going to present this afternoon at the Budapest
>>> BI Forum:
>>>
>>>
>>> https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1
>>>
>>> About the API: properly supporting learning to rank will have an impact
>>> on the scorer API and on cross-validation / grid search. I am not yet
>>> sure how best to address all of this.
>>>
>>> --
>>> Olivier
>>> http://twitter.com/ogrisel - http://github.com/ogrisel