2012/7/5 Emanuele Olivetti <[email protected]>:
> Dear All,
>
> As some of you may have already noticed, Peter (Prettenhofer) has
> just won the "Online Product Sales" competition on kaggle.com,
> beating 365 teams:
> http://www.kaggle.com/c/online-sales/leaderboard
> The competition was about predicting the monthly online sales of
> a product. In my opinion it was a remarkably difficult competition,
> so... congratulations!
>
> As far as I understand, Peter used scikit-learn and specifically (his)
> GradientBoostingRegressor(), in a clever way.
>
> This is an excellent result for him and - surely - a nice one for
> scikit-learn.
Indeed, great work Peter! This is an amazing feat.

For those interested in the details, see the "Congrats to the Winners" thread on the forum, where Peter and other top competitors give info on the winning strategies:

http://www.kaggle.com/c/online-sales/forums/t/2135/congrats-to-the-winners

In short: all of the top performers used gradient boosted machines with feature expansion for dates, various refinements, and a fine-grained model selection procedure. Apparently it is very important to do rigorous cross-validation to avoid selecting over-fitting models on Kaggle, as the top competitors often have very close scores and the smallest bit of score variance matters.

Along with Random Forests, these methods are pretty hard to beat in Kaggle competitions these days.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
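For readers curious what the recipe described above might look like in practice, here is a minimal sketch using the modern scikit-learn API: a GradientBoostingRegressor, numeric calendar features expanded from a raw date column, and cross-validated grid search for model selection. The data, column names, and parameter grid are made up for illustration; this is not Peter's actual solution.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a sales dataset (hypothetical columns).
rng = np.random.RandomState(0)
dates = pd.date_range("2010-01-01", periods=24, freq="MS")
X = pd.DataFrame({
    "date": rng.choice(np.asarray(dates), size=300),
    "price": rng.uniform(5, 50, size=300),
})
y = 100 + 2 * X["price"] + rng.normal(scale=5, size=300)

# Feature expansion for dates: replace the raw timestamp with
# numeric calendar features that the trees can split on.
X["year"] = X["date"].dt.year
X["month"] = X["date"].dt.month
X["quarter"] = X["date"].dt.quarter
features = X[["price", "year", "month", "quarter"]]

# Fine-grained model selection with rigorous cross-validation,
# as recommended in the thread, to avoid picking an over-fitting model.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(features, y)
print(search.best_params_)
```

On a real competition dataset one would of course use the actual date and target columns, a finer parameter grid, and a CV scheme matched to the leaderboard metric.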
