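For categorical data, the thing to look at is the encoder framework
in org.apache.mahout.vectorizer.encoders.  Rather than assigning each
category an arbitrary number, these encoders hash each value to a
position in the vector.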

See, for instance:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/vectorizer/encoders/StaticWordValueEncoder.java

Chapter 14 of Mahout in Action describes the process in more detail.  There
are examples in the Mahout distribution as well.
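
To make the idea concrete, here is a minimal sketch of hashed encoding in
plain Java.  It does not call Mahout's API; the class, the method names, and
the CSV columns (material, supplier, load) are all hypothetical and only
illustrate the general technique that the encoders package is built around.
Mahout's own classes there (for example StaticWordValueEncoder and
ContinuousValueEncoder) are what you would use in practice.

```java
import java.util.Arrays;

// Plain-Java illustration of hashed feature encoding; this is NOT Mahout's API.
class HashedEncoderSketch {

    // Map a string key to a slot in [0, size). floorMod keeps the result non-negative.
    static int slot(String key, int size) {
        return Math.floorMod(key.hashCode(), size);
    }

    // Categorical field: add 1.0 at the slot chosen by hashing "name:value".
    static void addCategorical(String name, String value, double[] vector) {
        vector[slot(name + ":" + value, vector.length)] += 1.0;
    }

    // Numeric field: add the parsed number at the slot chosen by hashing the field name.
    static void addNumeric(String name, String value, double[] vector) {
        vector[slot(name, vector.length)] += Double.parseDouble(value);
    }

    // Encode one CSV row. The column names and their types are hypothetical.
    static double[] encodeRow(String csvLine, int size) {
        String[] cols = csvLine.split(",");
        double[] vector = new double[size];
        addCategorical("material", cols[0].trim(), vector);
        addCategorical("supplier", cols[1].trim(), vector);
        addNumeric("load", cols[2].trim(), vector);
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encodeRow("steel, acme, 12.5", 32);
        System.out.println(Arrays.toString(v));
    }
}
```

Two different keys can land in the same slot, in which case their values
simply add up.  A larger vector makes such collisions rarer, so the vector
size is a trade-off between memory and accuracy.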

On Thu, Aug 16, 2012 at 5:41 PM, Chandra Mohan, Ananda Vel Murugan <
[email protected]> wrote:

> Hi,
>
> Almost all the data in my CSV file is categorical. Can you elaborate on
> what you mean by fancier footwork? Should I convert the categories into
> numbers and store them in a vector? Thanks!!
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Thursday, August 16, 2012 8:08 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Encoding and vectorizing
>
> If your data is dense and numerical, then you don't need anything but
> trivial encoding.  Just copy the values from your CSV file into the vector,
> converting to numbers as you go.  If some of your data are categorical or
> textual, you will need fancier footwork.
>
> On Thu, Aug 16, 2012 at 3:28 AM, Chandra Mohan, Ananda Vel Murugan <
> [email protected]> wrote:
>
> > I am a beginner in Mahout with not much background in math. I want to
> > know what an encoder and a vectorizer are in Mahout.
> >
> > As far as I know, a vector can be thought of as an array or tuple
> > containing values for a specific attribute of the object which the
> > vector represents.
> >
> > I have test cell data for mechanical component testing. I create a CSV
> > file with various details gathered from the test cell database. I want to
> > run logistic regression on this data and predict the component's life
> > based on the test cell data.  I want to understand what vectorization and
> > encoding are in this context.
> >
> > Any help would be greatly appreciated.
> >
> > Regards,
> > Anand.C
> >
>
