I would like to have a new attribute hashing_trace_ such as:
>>> vec = HashingVectorizer(tracing=True)
>>> vec.fit_transform([list_of_docs])
>>> vec.hashing_trace_
{4534: [('the', 344), ('rarercollidingtoken', 2)], 134:
[('someothertoken', 1)], ....}
`hashing_trace_` would be a dict of lists of tuples (pairs)
(defaultdic(list)) where the keys are the feature indices and the
tuples would be all the pairs ('feature_name', document_frequency)
where document_frequency is an integer count of the number of
documents where the feature occur.
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general