Hi, 

Almost all of the data in my CSV file is categorical. Can you elaborate on what
you mean by fancier footwork? Should I convert the categories into numbers and
store them in the vector? Thanks!!
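To make the question concrete: putting each category into a single slot as an arbitrary integer (say steel=1, brass=2) implies an ordering and a distance between categories, and logistic regression would treat those as meaningful. A common alternative is one-hot encoding, where each category gets its own slot, set to 1 when that category is present. The Java sketch below is a minimal, hypothetical illustration that picks the slot by hashing the field name and value into a fixed-size array. As far as I understand, this is loosely the idea behind Mahout's feature encoders in `org.apache.mahout.vectorizer.encoders` (for example `StaticWordValueEncoder`), but the code uses plain arrays, not Mahout's API, and the field names are made up.

```java
import java.util.Arrays;

// Hypothetical sketch of hashed one-hot encoding for categorical CSV fields.
// Not Mahout's implementation; it only illustrates the idea with plain arrays.
final class HashedCategoricalEncoder {
    private final int dimension;

    HashedCategoricalEncoder(int dimension) {
        this.dimension = dimension;
    }

    // Chooses a vector slot from the field name plus its value;
    // floorMod keeps the index non-negative even for negative hash codes.
    int slot(String field, String value) {
        return Math.floorMod((field + "=" + value).hashCode(), dimension);
    }

    // Encodes one CSV row of categorical values, paired with the header names.
    double[] encode(String[] header, String[] row) {
        double[] vector = new double[dimension];
        for (int i = 0; i < header.length; i++) {
            // Add rather than set, so two features that collide in one slot both count.
            vector[slot(header[i], row[i])] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        HashedCategoricalEncoder encoder = new HashedCategoricalEncoder(16);
        String[] header = {"material", "supplier", "test_rig"}; // hypothetical columns
        String[] row = {"steel", "acme", "rig-3"};              // hypothetical values
        System.out.println(Arrays.toString(encoder.encode(header, row)));
    }
}
```

Hash collisions can put two different categories into the same slot; a larger dimension makes that less likely.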

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, August 16, 2012 8:08 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Encoding and vectorizing

If your data is dense and numerical, then you don't need anything but
trivial encoding.  Just copy the values from your CSV file into the vector,
converting to numbers as you go.  If some of your data are categorical or
textual, you will need fancier footwork.
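As a sketch of the "trivial encoding" for dense numeric data, assuming a comma-separated line with no quoted fields, each column can be parsed straight into the matching element of an array. With Mahout on the classpath, the resulting `double[]` could be wrapped as `new DenseVector(values)` from `org.apache.mahout.math`; the sketch sticks to plain Java so it runs on its own, and the column meanings are hypothetical.

```java
import java.util.Arrays;

// Sketch of trivial encoding for a dense, all-numeric CSV row: every column
// becomes one vector element, in column order. Assumes a plain comma separator
// with no quoting; quoted fields would need a real CSV parser.
final class DenseNumericEncoding {
    static double[] encodeLine(String csvLine) {
        String[] fields = csvLine.split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // trim() tolerates spaces after commas; parseDouble throws on non-numeric text.
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical columns: load in kN, temperature in degrees C, cycles run so far.
        System.out.println(Arrays.toString(encodeLine("12.5, 80, 150000")));
        // prints [12.5, 80.0, 150000.0]
    }
}
```

Passing a categorical field such as `steel` to `encodeLine` throws `NumberFormatException`, which is where the fancier footwork comes in.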

On Thu, Aug 16, 2012 at 3:28 AM, Chandra Mohan, Ananda Vel Murugan <
[email protected]> wrote:

> I am a beginner in Mahout without much background in math. I want to know
> what an encoder and a vectorizer are in Mahout.
>
> As far as I know, a vector can be thought of as an array or tuple holding one
> value for each attribute of the object that the vector represents.
>
> I have test cell data for mechanical component testing. I created a CSV file
> with various details gathered from the test cell database. I want to run
> logistic regression on this data and predict the components' life based on
> the test cell data. I want to understand what vectorization and encoding mean
> in this context.
>
> Any help would be greatly appreciated.
>
> Regards,
> Anand.C
>
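For context on what the encoded vector is for: a logistic regression model keeps one weight per vector element plus an intercept, and scores a row as the sigmoid of the intercept plus the dot product of weights and vector, which gives a probability between 0 and 1. The sketch below uses made-up weights; in Mahout they would come from training a model such as `OnlineLogisticRegression` in `org.apache.mahout.classifier.sgd`. Because logistic regression predicts a category rather than a number, component life would have to be framed as a yes/no target, for example (hypothetically) whether a component survives a given number of cycles.

```java
// Sketch of how logistic regression scores an encoded vector:
// probability = sigmoid(intercept + weights . x). All numbers are hypothetical.
final class LogisticScoring {
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector and weights differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Logistic (sigmoid) function applied to the linear score.
    static double probability(double intercept, double[] weights, double[] x) {
        return 1.0 / (1.0 + Math.exp(-(intercept + dot(weights, x))));
    }

    public static void main(String[] args) {
        double[] x = {12.5, 80.0, 1.0};  // hypothetical encoded test-cell row
        double[] w = {0.1, -0.02, 0.5};  // hypothetical trained weights
        System.out.println(probability(-1.0, w, x));
    }
}
```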
