1-of-n encoding. That's it.

On Mon, Dec 26, 2011 at 4:36 PM, Ted Dunning <[email protected]> wrote:
> Mahout uses 1-of-n encoding (aka Zach's bitmap) but stores these encodings
> all together in double vectors for consistency.
>
> In the hashed encoding, we do this, but all of the encoded variables live
> on top of each other in randomized and multiple locations in the encoded
> vector. This sounds crazy, but works quite well.
>
> On Sun, Dec 25, 2011 at 9:18 PM, Zach Richardson <[email protected]>
> wrote:
>
> > In a way yes.
> >
> > Generally you want to convert nominal attributes to a "bitmap" (this has
> > a fancier name that is slipping my mind at the moment), where each "name"
> > in the nominal feature has a spot in the vector for being on or off. In
> > most cases this should be set to one. I am not aware of anything like
> > that in Mahout for regular vector encoding. You could reasonably easily
> > write your own.
> >
> > For instance, if you have A, B, and C as the three possible values in
> > your nominal feature, you would encode
> >
> > A B C
> > 1 0 0 for A
> > 0 1 0 for B etc.
> >
> > However, if you are planning on using the SGD classifiers, you can use
> > the hash-based encoding for categorical / nominal features through the
> > WordValueEncoder.
> >
> > Hope this helps.
> >
> > Zach
> >
> > On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith
> > <[email protected]> wrote:
> >
> > > I believe that vectorized attributes are stored as doubles in Mahout.
> > > Are some attributes "nominal"? That is, for some attributes is the
> > > distance function such that any two unequal values are at distance 1?
> > >
> > > Looking at MapBackedARFFModel.java, I see that Weka nominal attributes
> > > get converted to integer-valued doubles (1.0, 2.0, 3.0, ...). Will the
> > > nominal with value 1.0 be closer to the nominal with value 2.0 than to
> > > the nominal with value 3.0? Or is the distance between 1.0 and 3.0
> > > also 1?
> > >
> > > Thanks, Don
> >
> > --
> > Zach Richardson
> > Ravel, Co-founder
> > Austin, TX
> > [email protected]
> > 512.825.6031

--
Zach Richardson
Ravel, Co-founder
Austin, TX
[email protected]
512.825.6031
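
For anyone reading this thread later, here is a minimal sketch of the two encodings discussed above. It is plain Python, not Mahout code: the function names (one_of_n, hashed_encode), the MD5-based slot choice, the vector size of 16, and the two probes per feature are all assumptions made for illustration. It shows that integer codes put 1.0 closer to 2.0 than to 3.0, while 1-of-n vectors put every pair of distinct values at the same distance (sqrt(2) under Euclidean distance, not 1).

```python
import hashlib
import math


def one_of_n(value, categories):
    """Return a 1-of-n vector of doubles: 1.0 in the slot for `value`, 0.0 elsewhere."""
    vec = [0.0] * len(categories)
    vec[categories.index(value)] = 1.0
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hashed_encode(features, size=16, probes=2):
    """Hypothetical hashed encoder (not Mahout's implementation).

    Every (name, value) pair adds 1.0 at `probes` hash-chosen slots of one
    shared vector, so different variables may land on top of each other.
    """
    vec = [0.0] * size
    for name, value in features.items():
        for probe in range(probes):
            key = f"{name}:{value}:{probe}".encode("utf-8")
            digest = hashlib.md5(key).digest()
            slot = int.from_bytes(digest[:4], "big") % size
            vec[slot] += 1.0
    return vec


categories = ["A", "B", "C"]
a, b, c = (one_of_n(v, categories) for v in categories)

# Integer codes (1.0, 2.0, 3.0): the distance depends on an arbitrary ordering.
d_int_12 = abs(1.0 - 2.0)
d_int_13 = abs(1.0 - 3.0)

# 1-of-n vectors: every pair of distinct values is equally far apart.
d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# Two nominal variables share one fixed-size hashed vector.
hashed = hashed_encode({"color": "red", "shape": "square"})

print(a, b, c)
print(d_int_12, d_int_13, d_ab, d_ac)
print(hashed)
```

The hashed version trades exactness for a fixed vector size: collisions can occur, and using more than one probe per feature is one way to soften their effect.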
