Hey Guillaume.
If it takes a couple of hours, I'm not sure it is worth adding.
You could probably subsample aggressively, or just do fewer iterations (like, one pass over the data).
How do you run MiniBatchKMeans?
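Both suggestions can be sketched with scikit-learn's `MiniBatchKMeans`. This is a minimal sketch on random data: the array sizes, the codebook size `n_words` and `batch_size` are hypothetical placeholders, not the settings used in the pull request.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# Hypothetical stand-in for the extracted patch descriptors.
X = rng.rand(20000, 32)
n_words = 100      # hypothetical codebook size
batch_size = 1000  # hypothetical mini-batch size

# Option 1: exactly one pass over the data, one partial_fit call per mini-batch.
one_pass = MiniBatchKMeans(n_clusters=n_words, batch_size=batch_size,
                           n_init=3, random_state=0)
for start in range(0, X.shape[0], batch_size):
    one_pass.partial_fit(X[start:start + batch_size])

# Option 2: aggressively subsample, then fit on the subset only.
subset = X[rng.choice(X.shape[0], size=5000, replace=False)]
subsampled = MiniBatchKMeans(n_clusters=n_words, batch_size=batch_size,
                             n_init=3, random_state=0).fit(subset)
```

With `partial_fit`, the number of center updates is fixed by the number of chunks, so the run time is predictable; with `fit` on a subset, the usual stopping criteria still apply but on far fewer samples.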

Cheers,
Andy

On 03/08/2016 03:21 PM, Guillaume Lemaître wrote:
Hi,

I opened a pull request with the draft: https://github.com/scikit-learn/scikit-learn/pull/6509
Extracting the features takes a reasonable amount of time (around 30 s).
Generating the codebook with MiniBatchKMeans is more problematic: it is still running, and it could take a couple of hours.

Let me know what you think about it,

Cheers,

On 24 February 2016 at 00:41, Andy <t3k...@gmail.com> wrote:

    On 02/23/2016 04:32 PM, Guillaume Lemaitre wrote:
    Since I was working on a cluster, I did not realize it, but
    loading all the images in memory will be problematic on a
    laptop or desktop configuration.

    Alternatively, we could learn the PCA projection on a subset and
    apply the dimensionality reduction right after patch extraction.
    However, I am not sure that all the data will fit in memory.
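A minimal sketch of this alternative, assuming grayscale images held as NumPy arrays and using scikit-learn's `extract_patches_2d` to sample patches. The image count and size, patch size, subset size and number of components are hypothetical. Note that `reduced` still holds every image's descriptors, which is exactly the memory concern raised above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
# Hypothetical stand-in for the image collection: 50 grayscale 64x64 images.
images = [rng.rand(64, 64) for _ in range(50)]
patch_size = (8, 8)  # hypothetical patch size
n_pixels = patch_size[0] * patch_size[1]

# Learn the PCA projection on a small random subset of patches.
subset = np.vstack([
    extract_patches_2d(img, patch_size, max_patches=20, random_state=rng)
    .reshape(-1, n_pixels)
    for img in images[:10]
])
pca = PCA(n_components=16, random_state=0).fit(subset)

# Reduce each image's patches right after extraction, so only the
# low-dimensional descriptors are kept, never the raw patches.
reduced = [
    pca.transform(extract_patches_2d(img, patch_size, max_patches=100,
                                     random_state=rng).reshape(-1, n_pixels))
    for img in images
]
```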

    We have out-of-core versions of PCA and KMeans.
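In scikit-learn these are `IncrementalPCA` and `MiniBatchKMeans`, both of which expose `partial_fit`. A minimal sketch, where the generator `iter_chunks` is a hypothetical stand-in for reading descriptor chunks from disk and all sizes are placeholders:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)


def iter_chunks(n_chunks=10, chunk_size=500, n_features=64):
    """Hypothetical generator standing in for loading descriptor chunks from disk."""
    for _ in range(n_chunks):
        yield rng.rand(chunk_size, n_features)


# Fit PCA one chunk at a time (each chunk needs >= n_components rows).
ipca = IncrementalPCA(n_components=16)
for chunk in iter_chunks():
    ipca.partial_fit(chunk)

# Cluster the projected chunks, again without holding everything in memory
# (the first chunk needs >= n_clusters rows to initialize the centers).
kmeans = MiniBatchKMeans(n_clusters=50, n_init=3, random_state=0)
for chunk in iter_chunks():
    kmeans.partial_fit(ipca.transform(chunk))
```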

    I think the way I'd do it is to go over all images, extract only a
    couple of patches from each image, and store them.
    After we have some patches from all images, I'd learn the PCA model.
    Then we can go over the data again, transforming the patches. If
    they don't fit into memory after dimensionality reduction, we can
    use mini-batch k-means to do the clustering without loading all the
    data.
    Then we need to go over the data one more time to get the nearest
    cluster center of each patch and compute the BoW (which will fit in
    memory).
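The three passes described above might look roughly like this. It is a sketch under stated assumptions, not the code in the pull request: `load_image` is a hypothetical loader returning synthetic 64x64 grayscale images, and the patch size, number of patches per pass, PCA dimension and codebook size are all placeholders.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d

PATCH = (8, 8)  # hypothetical patch size
N_WORDS = 32    # hypothetical codebook size
n_images = 30


def load_image(i):
    """Hypothetical loader: returns a synthetic 64x64 grayscale image for index i."""
    return np.random.RandomState(i).rand(64, 64)


def patches(img, n, seed):
    """Sample n random patches from img and flatten each to a vector."""
    p = extract_patches_2d(img, PATCH, max_patches=n, random_state=seed)
    return p.reshape(len(p), -1)


# Pass 1: a couple of patches per image, then learn the PCA model.
sample = np.vstack([patches(load_image(i), 5, i) for i in range(n_images)])
pca = PCA(n_components=10, random_state=0).fit(sample)

# Pass 2: transform each image's patches and update the codebook
# incrementally, so the full patch set is never held in memory.
kmeans = MiniBatchKMeans(n_clusters=N_WORDS, n_init=3, random_state=0)
for i in range(n_images):
    kmeans.partial_fit(pca.transform(patches(load_image(i), 100, i)))

# Pass 3: assign each patch to its nearest center and build one
# bag-of-words histogram per image (n_images x N_WORDS fits in memory).
bow = np.zeros((n_images, N_WORDS))
for i in range(n_images):
    words = kmeans.predict(pca.transform(patches(load_image(i), 100, i)))
    bow[i] = np.bincount(words, minlength=N_WORDS)
```

Reusing the same seed per image in passes 2 and 3 means the histogram is built from the same patches that trained the codebook; with real data one would typically extract all (or more) patches in the last pass.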

    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




--
LEMAÎTRE Guillaume
PhD Candidate
MSc Erasmus Mundus ViBOT (Vision-roBOTic)
MSc Business Innovation and Technology Management
g.lemaitr...@gmail.com

ViCOROB - Computer Vision and Robotic Team
Universitat de Girona, Campus Montilivi, Edifici P-IV 17071 Girona
Tel. +34 972 41 98 12 - Fax. +34 972 41 82 59
http://vicorob.udg.es/

LE2I - Le Creusot
IUT Le Creusot, Laboratoire LE2I, 12 rue de la Fonderie, 71200 Le Creusot
Tel. +33 3 85 73 10 90 - Fax. +33 3 85 73 10 97
http://le2i.cnrs.fr

https://sites.google.com/site/glemaitre58/
Vice-Chairman of A.S.C. Fours UFOLEP
Chairman of A.S.C. Fours FFC
Webmaster of http://ascfours.free.fr

