2012/7/5 Emanuele Olivetti <[email protected]>:
> On 07/05/2012 08:49 AM, Olivier Grisel wrote:
>> 2012/7/5 Peter Prettenhofer <[email protected]>:
>>> ...
>>>
>>> I have to check with the competition organizers whether it's OK to put
>>> the source code on github - I'll keep you posted.
>> If so that would be a great blog post topic. Looking forward to it.
>>
>
> Hi,
>
> For what it's worth, I've put the code of my best submission on
> github:
> https://github.com/emanuele/kaggle_ops
> http://www.kaggle.com/c/online-sales/forums/t/2136/the-code-of-my-best-submission
>
> You can download and run it to get an actual file to submit to the
> competition.
>
> Of course I just ranked 21st on that competition so it is *far* less
> interesting than Peter's code :-D, and I've spent only a few hours in
> recent weekends. It was more a proof of concept about using blending,
> gradient boosting and joblib than a serious attempt.
>
> The resulting code is pretty short: 150 lines to process the dataset
> and 80 lines to compute predictions. No real model selection :P
> Anyway the code is general and you can plug in RF or anything else.

Thank you very much Emanuele, the blending code is very useful.

IMHO you should blog about it, explaining the various code snippets:

- feature extraction / expansions (e.g. how to handle dates & times as features)
- your visual exploration of which features to convert to the log scale
- dealing with missing values
- blending the outcome of randomized models
- cross validation and performance evaluation in general (did you do
any error analysis, e.g. bias and variance using learning curves?)
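
Two of the steps above (missing values and log scaling) can be sketched
with nothing but the standard library. This is only an illustration, not
Emanuele's code: the helper names impute_median and log_scale are made up
here, and filling gaps with the median is just one common choice.

```python
import math
from statistics import median


def impute_median(column):
    """Replace missing entries (None or NaN) with the median of the observed values.

    Hypothetical helper: median imputation is an assumption, not the
    strategy used in the original submission.
    """
    observed = [v for v in column if v is not None and not math.isnan(v)]
    fill = median(observed)
    return [fill if (v is None or math.isnan(v)) else v for v in column]


def log_scale(column):
    """Apply log(1 + x) to a non-negative, heavily skewed feature.

    log1p keeps zeros at zero and compresses large values.
    """
    return [math.log1p(v) for v in column]


raw = [1.0, None, 3.0, float("nan"), 100.0]
filled = impute_median(raw)   # gaps replaced by the median of 1, 3 and 100
scaled = log_scale(filled)    # skewed values compressed by log1p
```

Imputing before the log transform keeps the fill value on the same scale
as the observed data.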

It would be great to turn the blending procedure into either an
example for scikit-learn (using one of the default toy datasets) or a
new meta-estimator in a new package (more work required but would
improve re-usability).
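
As a rough idea of what such an example could look like, here is a minimal
blending sketch on the diabetes toy dataset. It is not taken from
Emanuele's repository: the choice of base models, the 5-fold split and the
linear blender are all assumptions, and it uses the
sklearn.model_selection module names from recent scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Toy regression data bundled with scikit-learn (no download needed).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Randomized base models (illustrative choices and settings).
base_models = [
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    RandomForestRegressor(n_estimators=50, random_state=0),
]

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Level-1 training features: out-of-fold predictions, so the blender never
# sees a prediction made by a model that was trained on the same sample.
blend_train = np.column_stack(
    [cross_val_predict(model, X_train, y_train, cv=cv) for model in base_models]
)

# Level-1 test features: each base model refit on the full training set.
blend_test = np.column_stack(
    [model.fit(X_train, y_train).predict(X_test) for model in base_models]
)

# The blender learns how to weight the base models' predictions.
blender = LinearRegression().fit(blend_train, y_train)
y_pred = blender.predict(blend_test)

rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print("blended RMSE on held-out data:", rmse)
```

Any regressor with fit and predict could be dropped into base_models, which
is what would make a generic meta-estimator feasible.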

The feature extraction module would also deserve some utility helpers
to deal with dates.
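
A date helper of that kind might look like the sketch below. The function
name expand_datetime, the default format string and the set of output
fields are all hypothetical; nothing like this is claimed to exist in
scikit-learn.

```python
from datetime import datetime


def expand_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Expand a timestamp string into simple numeric features.

    Hypothetical helper: the field selection is an assumption about what
    a model might find useful.
    """
    dt = datetime.strptime(value, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "weekday": dt.weekday(),                 # Monday == 0, Sunday == 6
        "hour": dt.hour,
        "day_of_year": dt.timetuple().tm_yday,   # 1-based day within the year
    }


features = expand_datetime("2012-07-05 08:49:00")
```

Returning a dict keeps the output compatible with DictVectorizer-style
feature extraction.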

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
