Hi Guillaume.
I was a big user of BoW myself, but I don't think it should go into
scikit-learn.
BoW doesn't really operate on a "flat" dataset, as scikit-learn usually
does. It works on groups of data points.
Each sample is usually a collection of feature vectors, which you
summarize as a histogram.
That doesn't really fit into the scikit-learn API, where each sample is a
single fixed-length feature vector.
For any particular application (I did bag of visual words), creating an
implementation using the k-means or sparse coding in scikit-learn
is only a couple of lines. You can find my visual BoW for per-superpixel
descriptors here:
https://github.com/amueller/segmentation/blob/master/bow.py#L184
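
As a rough illustration of how short such an implementation can be, here is a minimal sketch using `KMeans`. It is not the code from the linked file; the random descriptors, the vocabulary size, and the helper name `bow_histogram` are assumptions made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical data: 5 images, each with a different number of
# 8-dimensional local descriptors (the "groups of data points").
descriptor_groups = [rng.rand(n, 8) for n in (30, 45, 20, 60, 35)]

n_words = 4  # assumed vocabulary size, chosen only for illustration

# Learn the codebook on all descriptors stacked into one flat array.
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptor_groups))


def bow_histogram(descriptors):
    """Assign each descriptor to its nearest word and count the votes."""
    words = kmeans.predict(descriptors)
    counts = np.bincount(words, minlength=n_words)
    return counts / counts.sum()  # normalize so group size does not matter


# One fixed-length histogram per image: this is the "flat" matrix that
# a scikit-learn classifier can consume.
X_bow = np.array([bow_histogram(g) for g in descriptor_groups])
```

The grouping lives in the Python list, outside of any estimator, which is where the mismatch with the usual `(n_samples, n_features)` input shows up.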
Cheers,
Andy
On 02/14/2016 09:03 PM, Guillaume Lemaître wrote:
Dear all,
My group and I are currently working on image classification applied
to medical images. We are using the Bag-of-Features approach (BoF, also
known as Bag-of-Visual-Words or Bag-of-Words), which was originally
inspired by text classification. We have a somewhat rough
implementation
[here](https://github.com/glemaitre/protoclass/blob/master/protoclass/extraction/codebook.py)
which I would like to integrate into scikit-learn, even if only in a
personal branch.
However, I have some philosophical questions before I start, which
have been feeding some discussions in our lab. Looking at the
API, the BoF approach could be part of the `feature_extraction`
module, since BoF is very similar to the BoW implementation for
text mentioned above.
Nevertheless, I wonder whether BoF should instead be
integrated into the `decomposition` module. The method
consists of: (i) dictionary learning (based on K-Means, Mean-Shift, etc.),
(ii) encoding (in this case, voting using k-NN), and (iii) pooling
(histogram).
Thus, in some sense, BoF can be seen as one of the decomposition
methods (it is closest to sparse coding). For instance, sparse coding
follows exactly the same scheme: dictionary learning with K-SVD,
encoding, and pooling (min/max/etc.). The same holds for PCA, if you
treat the dictionary problem as finding the eigenvectors/eigenvalues.
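
To make the analogy concrete, here is a minimal sketch of the sparse-coding variant of the same three steps. It rests on several assumptions: scikit-learn does not, as far as I know, provide K-SVD, so `MiniBatchDictionaryLearning` stands in for the dictionary-learning step; the random descriptors, the number of atoms, and the helper name `pooled_code` are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)

# Hypothetical data: 3 images, each a group of 8-dimensional descriptors.
descriptor_groups = [rng.randn(n, 8) for n in (30, 45, 20)]

n_atoms = 6  # assumed dictionary size, chosen only for illustration

# (i) Dictionary learning on all descriptors stacked together.
#     This is NOT K-SVD; it is scikit-learn's online dictionary learner.
dico = MiniBatchDictionaryLearning(
    n_components=n_atoms,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=2,
    random_state=0,
)
dico.fit(np.vstack(descriptor_groups))


def pooled_code(descriptors):
    """(ii) Encode each descriptor sparsely, then (iii) max-pool per atom."""
    codes = dico.transform(descriptors)  # shape (n_descriptors, n_atoms)
    return np.abs(codes).max(axis=0)     # one value per dictionary atom


# One fixed-length vector per image, analogous to the BoF histogram.
X_sc = np.array([pooled_code(g) for g in descriptor_groups])
```

Compared with the k-means case, only the encoding (sparse codes instead of hard votes) and the pooling (maximum instead of a count) change.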
My questions are thus the following:
- what do you think about this idea;
- where would a BoF implementation fit best;
- would it make sense to think of the different decomposition
methods in terms of the three steps mentioned above, or would that not
be intuitive at all?
I hope the topic is not too weird.
Cheers,
--
LEMAÎTRE Guillaume
PhD Candidate
MSc Erasmus Mundus ViBOT (Vision-roBOTic)
MSc Business Innovation and Technology Management
g.lemaitr...@gmail.com
ViCOROB - Computer Vision and Robotic Team
Universitat de Girona, Campus Montilivi, Edifici P-IV 17071 Girona
Tel. +34 972 41 98 12 - Fax. +34 972 41 82 59
http://vicorob.udg.es/
LE2I - Le Creusot
IUT Le Creusot, Laboratoire LE2I, 12 rue de la Fonderie, 71200 Le Creusot
Tel. +33 3 85 73 10 90 - Fax. +33 3 85 73 10 97
http://le2i.cnrs.fr
https://sites.google.com/site/glemaitre58/
Vice - Chairman of A.S.C. Fours UFOLEP
Chairman of A.S.C. Fours FFC
Webmaster of http://ascfours.free.fr
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general