The plain assignment there is fine. FeatureHasher.transform(), which calls
that code, then runs csr_matrix.sum_duplicates() on the output, and that
step produces the accumulation you are describing.
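
To make the two steps concrete, here is a rough pure-Python sketch. It is
not the scikit-learn code: signed_hash and hash_row are made-up names, md5
stands in for the MurmurHash3 that scikit-learn uses, and I am assuming that
`size` in _hashing.pyx counts stored entries rather than being the hashed
column. The last loop imitates what sum_duplicates does for a single row.

```python
import hashlib
from collections import defaultdict


def signed_hash(name, n_features):
    """Map a feature name to (column, sign). md5 is only a stand-in hash."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big", signed=True)
    return abs(h) % n_features, (1 if h >= 0 else -1)


def hash_row(pairs, n_features):
    """Hash one row of (name, value) pairs into n_features columns."""
    # Step 1: store one entry per input feature in the next free slot.
    # Plain assignment is enough here because nothing is overwritten.
    indices, values = [], []
    for name, value in pairs:
        col, sign = signed_hash(name, n_features)
        indices.append(col)
        values.append(sign * value)

    # Step 2: add up entries that landed in the same column
    # (the role played by sum_duplicates on the sparse matrix).
    summed = defaultdict(float)
    for col, v in zip(indices, values):
        summed[col] += v
    return indices, values, dict(summed)


row = [("0", 1.0), ("1", 2.0), ("2", 3.0), ("3", 4.0), ("4", 5.0)]
indices, values, summed = hash_row(row, n_features=2)
```

With five features and only two columns, at least two entries must collide,
so `indices` contains repeats while `summed` holds one total per column.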
- Joel
On Mon, Jul 29, 2013 at 1:22 PM, Gad Abraham <gad.abra...@gmail.com> wrote:
>
>
> On 11 July 2013 18:33, Lars Buitinck <l.j.buiti...@uva.nl> wrote:
>
>> 2013/7/11 Gad Abraham <gad.abra...@gmail.com>:
>> > I'm very much a sklearn beginner, and I'd like to use FeatureHasher to
>> > reduce the dimensionality of a numeric matrix. Any hints on how to do
>> > this? I've seen the examples showing how to use it with text.
>>
>> You mean the input is a NumPy array? There's no special support for
>> that, but the following should work (though it may be slow). Let X be
>> your array and d the desired dimensionality, then:
>>
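>> from sklearn.feature_extraction import FeatureHasher
>>
>> hasher = FeatureHasher(n_features=d, input_type="pair")
>> # Use a list, not a one-shot iterator, so the names can be reused per row.
>> features = [str(j) for j in range(X.shape[1])]
>> Xh = hasher.transform(zip(features, row) for row in X).toarray()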
>>
>> hashes X into Xh of shape (X.shape[0], d).
>>
>>
> Thanks for that. Looking at the code in _hashing.pyx, don't the feature
> values need to be accumulated into each new position? i.e., shouldn't line
> 58 be
>
> values[size] += value
>
> instead of
>
> values[size] = value ?
>
> In the Weinberger 2009 paper, each new feature is the sum of the original
> features x_j that hash to it, each multiplied by +1 or -1.
>
> Thanks,
> Gad
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general