Hi Andreas,
Sorry for jumping into the conversation and going a bit off topic: what
does a "flat" dataset mean in sklearn?
Best,
Nadim Farhat
PhD candidate in Bioengineering
Center for Ultrasound and therapeutics
University of Pittsburgh
On Mon, Feb 22, 2016 at 12:12 PM Andreas Mueller <t3k...@gmail.com> wrote:
> Hi Guillaume.
>
> I was a big user of BoW myself, but I don't think it should go into
> scikit-learn.
> BoW doesn't really operate on a "flat" dataset, as scikit-learn usually
> does. It works on groups of data points.
> Each sample is usually a concatenation of feature vectors, which you
> summarize as a histogram.
> That doesn't really fit into the scikit-learn API.
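
To make the "flat" versus grouped distinction concrete, here is a minimal sketch using only NumPy. The array sizes and the names `X_flat` and `X_bags` are made up for illustration and do not come from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Flat" dataset: one fixed-length feature vector per sample, so the whole
# dataset is a single 2-D array of shape (n_samples, n_features).
X_flat = rng.normal(size=(5, 16))

# Grouped dataset: each sample (e.g. an image) is a variable-sized set of
# local descriptors, so the samples cannot be stacked into one 2-D array
# with one row per sample. The counts below are hypothetical.
n_descriptors = [30, 12, 45, 8, 20]
X_bags = [rng.normal(size=(n, 16)) for n in n_descriptors]

print(X_flat.shape)
print([bag.shape for bag in X_bags])
```

An estimator's `fit(X, y)` expects the first form; the second needs a summarising step (such as a histogram) before it can be used that way.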
>
> For any particular application (I did bag of visual words), creating an
> implementation using the kmeans or sparse coding in scikit-learn
> is only a couple of lines (you can find my visual bow for per-superpixel
> descriptors here
> https://github.com/amueller/segmentation/blob/master/bow.py#L184)
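
As a rough illustration of the "couple of lines" above, the following is a minimal sketch of a bag-of-words encoding built on `sklearn.cluster.KMeans`. It is not the code from the linked `bow.py`; the toy random descriptors, the vocabulary size of 8, and the helper name `bow_histogram` are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy stand-in for real descriptors: 5 "images", each a bag of 16-D vectors.
bags = [rng.normal(size=(n, 16)) for n in (30, 12, 45, 8, 20)]

# Learn the vocabulary by clustering all descriptors pooled together.
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(bags))


def bow_histogram(bag, codebook):
    """Encode one non-empty bag as a normalised histogram of word counts."""
    # Assign each descriptor to its nearest cluster centre (its "word").
    words = codebook.predict(bag)
    # Count how often each word occurs in the bag.
    counts = np.bincount(words, minlength=codebook.n_clusters)
    return counts / counts.sum()


# One fixed-length row per image, which is the shape estimators expect.
X = np.array([bow_histogram(bag, codebook) for bag in bags])
print(X.shape)
```

The grouping has to be handled outside the estimator here: `KMeans` only ever sees the stacked descriptors, and the per-image loop does the summarising.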
>
> Cheers,
> Andy
>
>
>
> On 02/14/2016 09:03 PM, Guillaume Lemaître wrote:
>
> Dear all,
>
> My group and I are currently working on image classification applied to
> medical images. We are using Bag-of-Features (also called Bag-of-Visual-Words
> or Bag-of-Words), which was originally inspired by text classification.
> In fact, we have a somewhat dirty implementation [here](https://github.com/glemaitre/protoclass/blob/master/protoclass/extraction/codebook.py)
> which I would like to integrate into scikit-learn somehow, even if only
> in a personal branch.
>
> However, I have some philosophical questions before messing around, which
> are in fact feeding some discussions in our lab. Looking at the API, the BoF
> approach could be part of the `feature_extraction` module, since BoF is very
> similar to the BoW implementation for text mentioned above.
>
> Nevertheless, I wonder whether BoF should instead be integrated into the
> `decomposition` module. The method consists of: (i) dictionary learning
> (e.g. K-Means or Mean-Shift), (ii) encoding (or voting, in that case using
> k-NN), and (iii) pooling (histogram).
>
> Thus, in some sense BoF can be seen as one of the decomposition methods (it
> is even closer to sparse coding). For instance, sparse coding follows
> exactly the same scheme: dictionary learning with K-SVD, encoding, and
> pooling (min/max/etc.). The same holds for PCA, if you treat finding the
> eigenvectors/eigenvalues as the dictionary-learning step.
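
The sparse-coding variant of the same three steps could be sketched as follows. scikit-learn does not ship K-SVD, so this example swaps in `MiniBatchDictionaryLearning` for the dictionary step and uses OMP for encoding; the toy data, the 8 atoms, the 2 non-zero coefficients, and the helper name `pooled_code` are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Toy stand-in for real descriptors: 5 "images", each a bag of 16-D vectors.
bags = [rng.normal(size=(n, 16)) for n in (30, 12, 45, 8, 20)]

# (i) Dictionary learning on all descriptors pooled together
# (online dictionary learning here, not K-SVD).
dico = MiniBatchDictionaryLearning(
    n_components=8,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=2,
    random_state=0,
).fit(np.vstack(bags))


def pooled_code(bag, dico):
    """Encode every descriptor in a bag, then max-pool over the bag."""
    # (ii) Encoding: sparse code per descriptor, shape (n_descriptors, n_atoms).
    codes = dico.transform(bag)
    # (iii) Pooling: largest absolute activation of each atom within the bag.
    return np.abs(codes).max(axis=0)


# One fixed-length row per image.
X = np.array([pooled_code(bag, dico) for bag in bags])
print(X.shape)
```

Compared with the K-Means histogram, only the encoder and the pooling function change; the per-bag loop stays outside the estimator in both cases.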
>
> My questions are thus the following:
> - what do you think about this;
> - where would a BoF implementation fit best;
> - would it make sense to think of the different decomposition methods
> in terms of the three steps mentioned above, or would that not be intuitive at all?
>
> Hope that the topic is not too weird.
>
> Cheers,
> --
>
>
>
>
> *LEMAÎTRE Guillaume PhD Candidate MSc Erasmus Mundus ViBOT
> (Vision-roBOTic) MSc Business Innovation and Technology Management *
> g.lemaitr...@gmail.com
>
> *ViCOROB - Computer Vision and Robotic Team*
> Universitat de Girona, Campus Montilivi, Edifici P-IV 17071 Girona
> Tel. +34 972 41 98 12 - Fax. +34 972 41 82 59
> http://vicorob.udg.es/
>
> *LE2I - Le Creusot *IUT Le Creusot, Laboratoire LE2I, 12 rue de la
> Fonderie, 71200 Le Creusot
> Tel. +33 3 85 73 10 90 - Fax. +33 3 85 73 10 97
> http://le2i.cnrs.fr
>
> https://sites.google.com/site/glemaitre58/
> Vice - Chairman of A.S.C. Fours UFOLEP
> Chairman of A.S.C. Fours FFC
> Webmaster of http://ascfours.free.fr
>
>
> ------------------------------------------------------------------------------
> Site24x7 APM Insight: Get Deep Visibility into Application Performance
> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
> Monitor end-to-end web transactions and take corrective actions now
> Troubleshoot faster and improve end-user experience. Signup Now!
> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general