In a way, yes. Generally you want to convert nominal attributes to a "bitmap" (the fancier name, I believe, is one-hot or 1-of-n encoding), where each "name" in the nominal feature gets its own spot in the vector that is either on or off. In most cases the "on" spot should be set to one and every other spot left at zero. I am not aware of anything like that in Mahout for regular vector encoding, but you could write your own reasonably easily.
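As a rough illustration of the bitmap idea, here is a minimal sketch in plain Java. It does not use any Mahout classes; the NominalEncoder class, its methods, and the sample values A, B, and C are all made up for this example. To use it with Mahout you would presumably copy the resulting array into a Mahout vector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: gives each nominal value its own slot.
class NominalEncoder {
    // Maps each known nominal value to its position in the encoded vector.
    private final Map<String, Integer> slots = new LinkedHashMap<>();

    NominalEncoder(List<String> values) {
        for (String v : values) {
            // Duplicates are ignored; a new value takes the next free slot.
            slots.putIfAbsent(v, slots.size());
        }
    }

    // Returns a vector with 1.0 in the value's slot and 0.0 everywhere else.
    double[] encode(String value) {
        Integer slot = slots.get(value);
        if (slot == null) {
            throw new IllegalArgumentException("Unknown nominal value: " + value);
        }
        double[] encoded = new double[slots.size()];
        encoded[slot] = 1.0;
        return encoded;
    }

    // Plain Euclidean distance, used here only to compare encoded values.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        NominalEncoder encoder = new NominalEncoder(Arrays.asList("A", "B", "C"));
        for (String value : Arrays.asList("A", "B", "C")) {
            System.out.println(value + " -> " + Arrays.toString(encoder.encode(value)));
        }
        System.out.println(euclidean(encoder.encode("A"), encoder.encode("B")));
        System.out.println(euclidean(encoder.encode("A"), encoder.encode("C")));
    }
}
```

With this encoding, any two unequal values are the same Euclidean distance apart (the square root of 2, rather than exactly 1), so no value ends up "closer" to another just because of the order in which the values were numbered.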
For instance, if you have A, B, and C as the three possible values in your nominal feature, you would encode:

    A B C
    1 0 0   for A
    0 1 0   for B
    0 0 1   for C

However, if you are planning on using the SGD classifiers, you can use the hash-based encoding for categorical / nominal features through the WordValueEncoder.

Hope this helps.

Zach

On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith wrote:

> I believe that vectorized attributes are stored as doubles in mahout. Are
> some attributes "nominal"? That is, for some attributes is the distance
> function such that any two unequal values are at distance 1?
>
> Looking at MapBackedARFFModel.java, I see that weka nominal attributes get
> converted to integer-valued doubles (1.0, 2.0, 3.0, ...). Will the
> nominal with value 1.0 be closer to the nominal with value 2.0 than to
> the nominal with value 3.0? Or is the distance between 1.0 and 3.0 also 1?
>
> Thanks,
> Don

--
Zach Richardson
Ravel, Co-founder
Austin, TX
512.825.6031
