Feature vector generation from Bag-of-Words

Stuti Awasthi Tue, 21 May 2013 04:18:25 -0700

Hi all,

I have a question about feature vector generation for text documents.
I have read Mahout in Action and understand how to convert text documents into
feature vectors weighted by the TF or TF-IDF schemes. My use case is slightly
different from that.

I have a fixed set of keywords, say 100, and I want to build the feature
vectors of the text documents using only these 100 keywords. That is, I would
like to count the frequency of each keyword in each document and generate one
feature vector per document, with the keyword frequencies as the weights.
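
To make the requirement concrete, here is a minimal sketch of the computation
in plain Python. It is only an illustration and does not use any Mahout API;
the function name keyword_vector, the sample keywords, the sample documents,
and the naive tokenizer are all assumptions made for this example.

```python
import re
from collections import Counter

def keyword_vector(text, keywords):
    """Return the raw term frequency of each keyword, in keyword order."""
    # Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Counter yields 0 for keywords that never occur in the document.
    return [counts[k] for k in keywords]

# Hypothetical keyword list (lowercase); the real one would have about 100.
keywords = ["mahout", "vector", "hadoop"]

# Hypothetical documents keyed by document id.
docs = {
    "doc1": "Mahout builds a vector; the vector is sparse.",
    "doc2": "Hadoop runs the job.",
}

# One fixed-length vector per document, one dimension per keyword.
vectors = {name: keyword_vector(text, keywords) for name, text in docs.items()}
```

Every document ends up with a vector of the same length as the keyword list,
and any word outside that list is simply ignored.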

Is there an existing way to do this in Mahout, or will I need to write custom
code?

Thanks
Stuti Awasthi

