On Mon, Mar 26, 2012 at 12:09 AM, Peter Prettenhofer <[email protected]> wrote:
> 1. We need to support the query id (=``qid``) field in
> ``svmlight_loader``; pair-wise approaches such as RankingSVMs need
> this information to form example pairs. My personal experience is that
> RankingSVMs do surprisingly poorly on learning-to-rank problems - but
> they are great for binary classification problems with highly skewed
> class distributions [1] (in this case we also don't need qids) - so
> they would be a great addition to sklearn. Point-wise approaches don't
> take query affiliation into account so we don't need to expose the
> qids.

I think the qids are not a strict requirement, but they are useful for restricting pairs to examples that share the same qid. Considering all possible pairs would be too expensive. In his papers, D. Sculley uses the term "query sharding" to refer to this kind of heuristic.

> 2. We need ranking evaluation metrics such as (N)DCG, average
> precision and precision at k - but these metrics need to take the
> query ids into account. Thus, we need to modify some of our
> existing metrics.

Since "query" is more of an information retrieval term, I was thinking we could use the word "group" instead.

> BTW: We might implement a SGD-based ranking svm simply by creating a
> new ``sklearn.utils.seq_dataset.Dataset`` subclass that implements
> ``next`` such that it samples a positive and a negative example from
> the training set and creates a new feature vector on the fly - so we
> don't need to create the cross-product of examples explicitly -
> something along the lines of D. Sculley's work [2].

+1! Along the same lines, sampling a positive and a negative example every two SGD steps optimizes AUC.
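
To make the pair-sampling idea concrete, here is a minimal sketch in plain NumPy. It is only an illustration: ``iter_pairs`` and its parameters are hypothetical names, not an existing sklearn API and not the ``seq_dataset.Dataset`` interface, and it assumes dense arrays with labels in {-1, +1}. Each step picks one group, draws a positive and a negative example from it, and yields their difference vector, so the cross-product of examples is never materialized.

```python
import numpy as np


def iter_pairs(X, y, groups, n_pairs, seed=0):
    """Yield (difference vector, target) pairs sampled within groups.

    Hypothetical helper for illustration only. Assumes X is a dense 2-D
    array, y holds labels in {-1, +1} and groups holds one id per row.
    """
    rng = np.random.RandomState(seed)

    # Index the positive and negative rows of each group once up front.
    index = {}
    for g in np.unique(groups):
        in_group = groups == g
        pos = np.flatnonzero(in_group & (y > 0))
        neg = np.flatnonzero(in_group & (y <= 0))
        if len(pos) and len(neg):  # a single-class group yields no pair
            index[g] = (pos, neg)
    if not index:
        raise ValueError("no group contains both classes")

    keys = list(index)
    for _ in range(n_pairs):
        pos, neg = index[keys[rng.randint(len(keys))]]
        i = pos[rng.randint(len(pos))]
        j = neg[rng.randint(len(neg))]
        # Flip half of the pairs so that both target signs occur.
        sign = 1 if rng.rand() < 0.5 else -1
        yield sign * (X[i] - X[j]), sign


# Toy data: two groups, each with one positive and one negative example.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([1, -1, 1, -1])
groups = np.array([0, 0, 1, 1])

pairs = list(iter_pairs(X, y, groups, n_pairs=5))
```

Each yielded pair can be fed to any linear binary classifier trained with SGD. Dropping the ``groups`` restriction (one group for everything) gives the skewed binary classification setting, where no qids are needed.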

Mathieu
