Re: [Scikit-learn-general] inverse transform in HashingVectorizer

2013-04-09 Thread Olivier Grisel
I would like to have a new attribute hashing_trace_ such as:

>>> vec = HashingVectorizer(tracing=True)
>>> vec.fit_transform([list_of_docs])
>>> vec.hashing_trace_
{4534: [('the', 344), ('rarercollidingtoken', 2)], 134: [('someothertoken', 1)], }

`hashing_trace_` would be a dict of lists of t
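A rough sketch of what such a trace could look like, written in plain Python. This is not scikit-learn code: `tracing=True` and `hashing_trace_` are only proposed in this thread, and the helper names `bucket` and `hashing_trace` are hypothetical. The sketch also uses md5 as a stand-in hash and a plain whitespace tokenizer, both of which are assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict

N_FEATURES = 2 ** 20  # illustrative table size, chosen for this sketch


def bucket(token, n_features=N_FEATURES):
    """Map a token to a column index (md5 is a stand-in hash, an assumption)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


def hashing_trace(docs, n_features=N_FEATURES):
    """Return {column index: [(token, count), ...]}, most frequent token first."""
    counts = defaultdict(Counter)
    for doc in docs:
        # Whitespace tokenization is an assumption; a real analyzer differs.
        for token in doc.lower().split():
            counts[bucket(token, n_features)][token] += 1
    return {idx: c.most_common() for idx, c in counts.items()}


docs = ["the cat sat on the mat", "the dog"]
# A tiny table (8 columns) makes collisions between tokens likely.
trace = hashing_trace(docs, n_features=8)
```

Each entry lists every token that landed in a column together with its count, so colliding tokens show up side by side under the same index.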

Re: [Scikit-learn-general] inverse transform in HashingVectorizer

2013-04-09 Thread Terry Peng
Could you please elaborate on how to add this tracing mode? I also realized that my idea of keeping the mapping from document tokens to non-zero elements wouldn't work, since the non-zero elements are not in the same order as the original tokens. Using FeatureHasher to work on the individual word might w

Re: [Scikit-learn-general] inverse transform in HashingVectorizer

2013-04-09 Thread Olivier Grisel
2013/4/9 Terry Peng:
> Hi all,
>
> From HashingVectorizer's document, it said:
>
> - there is no way to compute the inverse transform (from feature indices to
> string feature names) which can be a problem when trying to introspect
> which features are most important to a model.

Re: [Scikit-learn-general] inverse transform in HashingVectorizer

2013-04-09 Thread Andreas Mueller
On 04/09/2013 09:48 AM, Terry Peng wrote: Hi all, what do you think? If you want to store the dict, why not use the DictVectorizer?
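For comparison, a minimal sketch of the DictVectorizer route suggested above, assuming scikit-learn is installed. The sample token counts are made up for illustration. Because DictVectorizer stores a vocabulary, feature names can be recovered from column indices.

```python
from sklearn.feature_extraction import DictVectorizer

# Token counts per document; these sample values are illustrative only.
docs = [{"the": 2, "cat": 1}, {"the": 1, "dog": 1}]

vec = DictVectorizer()
X = vec.fit_transform(docs)  # sparse matrix, one column per feature name

# The stored vocabulary maps names to columns, so the transform is reversible.
recovered = vec.inverse_transform(X)
feature_names = sorted(vec.vocabulary_)
```

The trade-off is memory: the vocabulary grows with the number of distinct tokens, which is the cost the hashing approach avoids.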

[Scikit-learn-general] inverse transform in HashingVectorizer

2013-04-09 Thread Terry Peng
Hi all, the HashingVectorizer documentation says: - there is no way to compute the inverse transform (from feature indices to string feature names) which can be a problem when trying to introspect which features are most important to a model. But I'm wondering if I can keep the