Hi fellow sklearners,

I'm casually participating in the ICML'13 challenge on blackbox learning
[1] using sklearn and wanted to report progress and seek new ideas on how
sklearn tools can be used for this task.

Some basic info about the dataset:
- we don't know the meaning of the features nor the prediction task
- X_train.shape = (1000, 1875) => used to learn the estimator, seems
already standardized
- y_train.shape = (1000, 1) => class labels: 0.1 through 0.9 (sounds like a
hidden regression problem?)
- X_eval.shape = (10000, 1875) => used to rank participants
- X_unsupervised.shape = (1.3e6, 1875) => extra unsupervised data

So far I have tried a number of the classic multi-class classifiers
implemented in sklearn, and RandomForestClassifier(n_estimators=100) proved
to be the best, with a CV accuracy of 0.32 (confirmed by a Kaggle
submission).
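
For reference, a minimal sketch of the CV setup I mean (it assumes X_train
and y_train are already loaded as numpy arrays; nothing here is tuned):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in newer versions

    # X_train: (1000, 1875), y_train: (1000, 1) -> ravel to a 1-d label vector
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X_train, y_train.ravel(), cv=5)
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))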

I tried some feature selection approaches (f_classif) but with no luck so
far.
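
For concreteness, this is the kind of selection pipeline I mean (a sketch
only; k=500 is an arbitrary example value, and using a Pipeline means
SelectKBest is refit inside each CV fold so the score isn't optimistically
biased):

    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cross_validation import cross_val_score

    # Univariate ANOVA F-test selection followed by the forest.
    pipe = Pipeline([
        ('select', SelectKBest(f_classif, k=500)),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ])
    scores = cross_val_score(pipe, X_train, y_train.ravel(), cv=5)
    print("CV accuracy with feature selection: %.3f" % scores.mean())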

Leaders on the board tend to use deep learning methods (e.g. tweaked RBMs)
and report up to 0.67 accuracy. They can probably take advantage of the
huge amount of unsupervised data with their neural nets.
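
One sklearn-only idea in that direction would be to learn RBM features on
the unsupervised data and feed them to the forest. Below is only a rough
sketch, assuming a sklearn version that ships
sklearn.neural_network.BernoulliRBM; all hyperparameters are guesses:

    from sklearn.neural_network import BernoulliRBM
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.ensemble import RandomForestClassifier

    # Fit the RBM on (a subsample of) the unsupervised pool, then use its
    # hidden unit activations as features for the supervised classifier.
    scaler = MinMaxScaler()  # BernoulliRBM expects inputs in [0, 1]
    rbm = BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20,
                       random_state=0)

    X_pool = scaler.fit_transform(X_unsupervised[:100000])  # subsample for speed
    rbm.fit(X_pool)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(rbm.transform(scaler.transform(X_train)), y_train.ravel())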

I tried a very naive bootstrapping (self-training) approach: learn an RF on
the supervised data, predict labels for the unsupervised data, then learn an
RF on everything. No luck with that either.
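
Roughly, that bootstrap looks like the sketch below (the unsupervised
subsample size is arbitrary; a possible refinement would be to keep only the
pseudo-labels where predict_proba is confident):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Step 1: fit on the labeled data.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train.ravel())

    # Step 2: pseudo-label a subsample of the unsupervised pool.
    X_unsup = X_unsupervised[:50000]  # subsample for speed/memory
    y_pseudo = clf.predict(X_unsup)

    # Step 3: refit on real labels + pseudo-labels.
    X_all = np.vstack([X_train, X_unsup])
    y_all = np.concatenate([y_train.ravel(), y_pseudo])
    clf.fit(X_all, y_all)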

Any ideas and comments welcome! :)

Eustache

[1]
http://www.kaggle.com/c/challenges-in-representation-learning-the-black-box-learning-challenge