On 11 July 2013 18:33, Lars Buitinck <l.j.buiti...@uva.nl> wrote:
> 2013/7/11 Gad Abraham <gad.abra...@gmail.com>:
> > I'm very much a sklearn beginner, and I'd like to use FeatureHasher to
> > reduce the dimensionality of a numeric matrix. Any hints on how to do
> > this?
> > I've seen the examples showing how to use it with text.
>
> You mean the input is a NumPy array? There's no special support for
> that, but the following should work (though it may be slow). Let X be
> your array and d the desired dimensionality, then:
>
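> from sklearn.feature_extraction import FeatureHasher
> hasher = FeatureHasher(n_features=d, input_type="pair")
> # Use the column indices as feature names; a list can be reused for every row.
> features = [str(j) for j in range(X.shape[1])]
> Xh = hasher.transform(zip(features, row) for row in X).toarray()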
>
> hashes X into Xh of shape (X.shape[0], d).
>
>
Thanks for that. Looking at the code in _hashing.pyx, don't the feature
values need to be accumulated into each new position? i.e., shouldn't line
58 be
values[size] += value
instead of
values[size] = value ?
In the Weinberger et al. (2009) paper, each new feature is the sum, over all
original features x_j that hash to that position, of x_j times +1 or -1.
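
To make the question concrete, here is a minimal sketch of the accumulation I
have in mind, written in plain NumPy rather than taken from _hashing.pyx. The
helper name hash_columns is hypothetical, zlib.crc32 is only a stand-in for the
MurmurHash3 that scikit-learn uses, and taking the sign from the top bit of the
hash is my own assumption.

```python
import zlib

import numpy as np


def hash_columns(X, d):
    """Signed feature hashing of a dense matrix X into d columns (sketch only)."""
    X = np.asarray(X, dtype=float)
    Xh = np.zeros((X.shape[0], d))
    for j in range(X.shape[1]):
        # Stand-in hash of the column index; NOT the hash scikit-learn uses.
        h = zlib.crc32(str(j).encode("utf-8"))
        i = h % d                               # target position h(j)
        sign = 1.0 if (h >> 31) & 1 else -1.0   # assumed sign function xi(j)
        # "+=" rather than "=": columns that collide are summed, not overwritten.
        Xh[:, i] += sign * X[:, j]
    return Xh


if __name__ == "__main__":
    X = np.arange(12, dtype=float).reshape(3, 4)
    print(hash_columns(X, 2))
```

With plain assignment instead of +=, only the last column hashed to a given
position would survive, which is the behaviour I am asking about.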
Thanks,
Gad
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general