In a way, yes.

Generally you want to convert nominal attributes to a "bitmap" (the usual
name for this is one-hot or 1-of-n encoding), where each value of the
nominal feature gets its own position in the vector that is either on or
off. In most cases the position for the observed value is set to 1 and all
the others to 0. I am not aware of anything like that in Mahout for regular
vector encoding, but you could write your own fairly easily.

For instance, if you have A, B, and C as the three possible values of your
nominal feature, you would encode them as

     A B C
A:   1 0 0
B:   0 1 0
C:   0 0 1
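The encoding in the table above can be sketched in a few lines. This is a minimal Python illustration, not Mahout code; `one_hot` and `euclidean` are hypothetical helper names. It also speaks to the distance question quoted below: integer codes (1.0, 2.0, 3.0) make A closer to B than to C, while one-hot vectors put every pair of distinct values at the same Euclidean distance, sqrt(2).

```python
import math

def one_hot(value, categories):
    """Return a 0/1 list with a single 1.0 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

categories = ["A", "B", "C"]
encoded = {c: one_hot(c, categories) for c in categories}

# Integer codes impose an order: A-C ends up twice as far apart as A-B.
int_code = {"A": 1.0, "B": 2.0, "C": 3.0}
print(abs(int_code["A"] - int_code["B"]), abs(int_code["A"] - int_code["C"]))

# One-hot vectors: every pair of distinct values is sqrt(2) apart.
print(euclidean(encoded["A"], encoded["B"]), euclidean(encoded["A"], encoded["C"]))
```

Note that the common distance is sqrt(2) (or 2 under Manhattan distance) rather than exactly 1; what matters is that it is the same for every pair.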

However, if you are planning to use the SGD classifiers, you can use the
hash-based encoding for categorical/nominal features provided by the
WordValueEncoder.
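To give a rough idea of what hash-based encoding means, here is a small Python sketch. It is not Mahout's implementation or the WordValueEncoder API: `hash_index` and `encode_nominal` are hypothetical names, MD5 is an arbitrary choice of stable hash, and the vector size of 100 is an assumption. The point is that each (feature, value) pair is hashed straight to a slot in a fixed-size vector, so no dictionary of all possible values is needed, at the cost of occasional collisions.

```python
import hashlib

def hash_index(feature_name, value, num_features):
    """Map a (feature, value) pair to a slot in a fixed-size vector.

    Uses MD5 rather than Python's built-in hash(), which is randomized
    per process for strings and therefore not reproducible across runs.
    """
    key = f"{feature_name}:{value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % num_features

def encode_nominal(feature_name, value, vector, weight=1.0):
    """Add `weight` at the hashed slot for this nominal value."""
    idx = hash_index(feature_name, value, len(vector))
    vector[idx] += weight
    return vector

# Encode two nominal features into one 100-slot vector.
vec = [0.0] * 100
encode_nominal("color", "red", vec)
encode_nominal("shape", "square", vec)
```

Unrelated values can occasionally share a slot; a larger vector makes that less likely.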

Hope this helps.

Zach

On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith wrote:

> I believe that vectorized attributes are stored as doubles in Mahout. Are
> some attributes "nominal"? That is, for some attributes is the distance
> function such that any two unequal values are at distance 1?
>
> Looking at MapBackedARFFModel.java, I see that Weka nominal attributes get
> converted to integer-valued doubles (1.0, 2.0, 3.0, ...). Will the
> nominal with value 1.0 be closer to the nominal with value 2.0 than to
> the nominal with value 3.0? Or is the distance between 1.0 and 3.0 also 1?
>
> Thanks, Don




-- 
Zach Richardson
Ravel, Co-founder
Austin, TX
512.825.6031
