I am considering two separate approaches for real-world testing.
For the Kaggle cats vs. dogs competition, using a deep neural network trained
on ImageNet (DeCAF, http://arxiv.org/abs/1310.1531) for preprocessing, coupled
with any kind of classifier, has worked very well for me so far (even logistic
regression was ~95% accurate). The best result I have so far is DeCAF
preprocessing followed by a 4-layer deep neural net. I am not really doing
anything special with the classifier - the discriminative power appears to
be primarily in the features output by the DeCAF network. It could be
interesting to try to reimplement/wrap the pretrained network in sklearn
somehow... though the authors now have a newer framework called Caffe:
http://daggerfs.com/caffe/
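
To make the "features + any classifier" setup concrete, here is a minimal
sketch of the logistic regression step. It assumes the deep features have
already been extracted and saved as a 2D array (one row per image); the
function name fit_linear_on_features and the 25% hold-out split are my own
illustrative choices, and nothing here calls DeCAF itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_linear_on_features(features, labels, seed=0):
    """Fit logistic regression on precomputed feature vectors.

    `features` is (n_samples, n_features), e.g. activations saved from a
    pretrained network; `labels` is (n_samples,). Returns the fitted
    classifier and its accuracy on a held-out 25% split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels)
    clf = LogisticRegression(C=1.0, max_iter=1000)
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)
```

Calling fit_linear_on_features(X, y) with X holding the saved activations and
y the cat/dog labels returns the classifier and a held-out accuracy; C would
normally be tuned by cross-validation rather than left at 1.0.
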
I am thinking of using KMeansCoder features as a comparison - my guess is
that they will not be as good (or at least shouldn't be!), but given the
large reduction in complexity, the tradeoff may be worth it in other
applications where a dataset like ImageNet is not available. My primary
dataset is speech/communications signals, and I am trying to use these
techniques for cognitive radio/spectral sensing.
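
For anyone unfamiliar with the idea, a rough sketch of K-means patch
encoding follows. This is not the KMeansCoder code from this thread - the
function names, patch size, eps value, and whole-image average pooling are
assumptions of mine, and whitening is left out for brevity. It learns
centroids from normalized grayscale patches and encodes an image with the
soft "triangle" activation max(0, mean distance - distance to centroid k).

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d


def normalize_patches(patches, eps=1e-3):
    """Flatten patches, then remove per-patch mean and scale by std."""
    X = patches.reshape(len(patches), -1).astype(np.float64)
    X -= X.mean(axis=1, keepdims=True)
    X /= np.sqrt(X.var(axis=1, keepdims=True) + eps)
    return X


def learn_dictionary(images, patch_size=(6, 6), n_atoms=32,
                     patches_per_image=50, seed=0):
    """Sample random patches from 2D images and cluster them with k-means."""
    rng = np.random.RandomState(seed)
    patches = np.concatenate([
        extract_patches_2d(img, patch_size, max_patches=patches_per_image,
                           random_state=rng)
        for img in images])
    km = MiniBatchKMeans(n_clusters=n_atoms, n_init=3, random_state=seed)
    km.fit(normalize_patches(patches))
    return km


def encode_image(img, km, patch_size=(6, 6)):
    """Encode one image as an average-pooled triangle-activation vector."""
    X = normalize_patches(extract_patches_2d(img, patch_size))
    dist = km.transform(X)  # distance from every patch to every centroid
    act = np.maximum(0.0, dist.mean(axis=1, keepdims=True) - dist)
    return act.mean(axis=0)  # pool over all patch positions
```

The resulting fixed-length vectors (one per image) can then be fed to a
linear classifier. Pooling over quadrants instead of the whole image would
keep some spatial information at the cost of a 4x longer code.
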
Eventually, a stacked KMeans approach will be evaluated - basically
multiple layers of KMeans coders, as in 'Learning Feature Representations
with K-means' by A. Coates and A. Ng. My primary dataset is unlabeled,
so the "learn a huge neural net and use it as pre-processing" technique
will probably not work, unless there is a big labeled dataset somewhere
else that I haven't seen.
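
As a very simplified illustration of what "multiple layers of KMeans coders"
could look like on unlabeled vectors, the sketch below greedily fits one
k-means layer at a time and feeds each layer's codes to the next. The class
and function names are hypothetical, and the paper's recipe for higher
layers is more involved than this (it is not just k-means applied twice),
so treat this as a toy version only.

```python
import numpy as np
from sklearn.cluster import KMeans


class TriangleKMeansLayer:
    """Hypothetical helper: k-means centroids plus a soft triangle encoding."""

    def __init__(self, n_atoms, seed=0):
        self.km = KMeans(n_clusters=n_atoms, n_init=3, random_state=seed)

    def fit(self, X):
        self.km.fit(X)
        return self

    def transform(self, X):
        # Activation k is max(0, mean distance - distance to centroid k).
        dist = self.km.transform(X)
        return np.maximum(0.0, dist.mean(axis=1, keepdims=True) - dist)


def fit_stack(X, layer_sizes, seed=0):
    """Greedily fit layers; each layer is trained on the previous codes."""
    layers, H = [], X
    for i, n_atoms in enumerate(layer_sizes):
        layer = TriangleKMeansLayer(n_atoms, seed=seed + i).fit(H)
        H = layer.transform(H)
        layers.append(layer)
    return layers, H
```

For example, fit_stack(X, [256, 64]) would learn 256 first-layer centroids on
the raw (normalized) inputs and 64 second-layer centroids on the first-layer
codes, with no labels needed at any point.
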
I will report back when there are some "real world" results - either for
speech/comms or dogs/cats. Thanks for writing this code originally! It is
a testament to the project that code from two years ago can be brought to a
working state with ~5 lines of minor modifications. I am also planning to
evaluate a K-SVD dictionary learning approach - does anyone know if that is
currently implemented/in development for sklearn? I haven't looked for it
in sklearn yet, but it seems like a cool approach.
On Fri, Dec 13, 2013 at 12:20 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> Great, thanks a lot!
>
> I'm also curious about what you're running it on and about how the
> performance is.
>
> Vlad
>
> On Fri, Dec 13, 2013 at 7:11 PM, Olivier Grisel
> <olivier.gri...@ensta.org> wrote:
> > Nice.
> >
> > Have you used it with success for real image classification tasks?
> >
> > I see you have been involved in the cats vs dogs kaggle competition.
> > Is learning a linear model, if so we might consider including such a
> > KMeansCoder as part of the sklearn.feature_extraction.image module and
> > write an example for that dataset.
> >
> > Many people ask us how to use scikit-learn for image classification
> > and we have no getting started example to point them at. If the KMeans
> > patch encoder proves to be a reasonable baseline I would be +1 for
> > having it as part of scikit-learn.
> >
> > Do you do some max pooling + normalization on the output?
> >
> > --
> > Olivier
> >
> >
> ------------------------------------------------------------------------------
> > Rapidly troubleshoot problems before they affect your business. Most IT
> > organizations don't have a clear picture of how application performance
> > affects their revenue. With AppDynamics, you get 100% visibility into
> your
> > Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
> AppDynamics Pro!
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>