Yes. In the pig-vector project I am working on, I have a nice way to specify types and conversions.
See https://github.com/tdunning/pig-vector

On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:

> > When strings (or nominals) are converted to doubles, it seems to me that
> > the conversion adds additional irrelevant structure that I don't want.
> > Depending on the order in which the strings are added, the assigned doubles
> > will vary. Adjacent strings in the ordering will be close together in
> > the metric space/distance measure. For example, if "john" is 1, "bob" is
> > 2, and "nancy" is 3, then john is closer to bob than to nancy. For
> > nominals, that seems wrong. Most users will probably really want three
> > binary attributes: one for john, one for bob, and one for nancy.
>
> We could perhaps use the SGD vector encoding stuff here?
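
To make the quoted john/bob/nancy example concrete, here is a minimal sketch in plain Python. It is not Mahout or pig-vector code; the function names (ordinal_encode, one_hot_encode, euclidean) are made up for illustration. It only shows why order-of-arrival doubles impose a distance structure that one binary attribute per value does not.

```python
def ordinal_encode(values):
    """Give each distinct string the next double, in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, float(len(codes) + 1))
    return codes


def one_hot_encode(values):
    """Give each distinct string its own binary attribute (one column per value)."""
    codes = ordinal_encode(values)  # reused only to pick a column index
    n = len(codes)
    return {
        v: [1.0 if i == int(c) - 1 else 0.0 for i in range(n)]
        for v, c in codes.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


names = ["john", "bob", "nancy"]
ordinal = ordinal_encode(names)
one_hot = one_hot_encode(names)

# Ordinal doubles: john ends up closer to bob than to nancy, purely
# because of the order in which the strings arrived.
ordinal_john_bob = abs(ordinal["john"] - ordinal["bob"])
ordinal_john_nancy = abs(ordinal["john"] - ordinal["nancy"])

# Binary attributes: every pair of distinct names is the same distance apart.
hot_john_bob = euclidean(one_hot["john"], one_hot["bob"])
hot_john_nancy = euclidean(one_hot["john"], one_hot["nancy"])

print(ordinal_john_bob, ordinal_john_nancy)
print(hot_john_bob, hot_john_nancy)
```

With the ordinal codes the two distances differ; with the binary attributes they are identical. A hashed encoding, as in the SGD suggestion above, is a different technique and is not shown here.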
