In particular, see the
https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig
directory.

On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <[email protected]> wrote:

> Yes.
>
> In the pig-vector thing I am working on, I have a nice way to specify
> types and conversions.
>
> See https://github.com/tdunning/pig-vector
>
>
> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:
>
>> > When strings (or nominals) are converted to doubles, it seems to me
>> > that the conversion adds additional irrelevant structure that I don't
>> > want. Depending on the order in which the strings are added, the
>> > assigned doubles will vary. Adjacent strings in the ordering will be
>> > close together in the metric space/distance measure. For example, if
>> > "john" is 1, "bob" is 2, and "nancy" is 3, then john is closer to bob
>> > than to nancy. For nominals, that seems wrong. Most users will
>> > probably really want three binary attributes: one for john, one for
>> > bob, and one for nancy.
>> >
>>
>> We could perhaps use the SGD vector encoding stuff here?
>>
>
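
To make the john/bob/nancy point from the quoted message concrete, here is a rough sketch in plain Python. It is not Mahout or pig-vector code, and it is not how the SGD vector encoders work; the function names `ordinal_encode`, `one_hot_encode`, and `euclidean` are made up for illustration. It only compares the distances you get from order-dependent doubles with the distances you get from one binary attribute per name.

```python
def ordinal_encode(values):
    """Assign each distinct string the next double, in order of first appearance."""
    codes = {}
    for v in values:
        # Hypothetical scheme: first string seen gets 1.0, the next 2.0, and so on.
        codes.setdefault(v, float(len(codes) + 1))
    return codes


def one_hot_encode(values):
    """Give each distinct string its own binary attribute."""
    categories = list(dict.fromkeys(values))  # distinct values, first-seen order
    return {v: [1.0 if v == c else 0.0 for c in categories] for v in categories}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


names = ["john", "bob", "nancy"]
ordinal = ordinal_encode(names)
one_hot = one_hot_encode(names)

# With doubles, john ends up closer to bob than to nancy purely because of
# the order in which the strings were added.
d_ord_jb = abs(ordinal["john"] - ordinal["bob"])
d_ord_jn = abs(ordinal["john"] - ordinal["nancy"])

# With three binary attributes, every pair of distinct names is equally far apart.
d_hot_jb = euclidean(one_hot["john"], one_hot["bob"])
d_hot_jn = euclidean(one_hot["john"], one_hot["nancy"])

print("ordinal:", d_ord_jb, d_ord_jn)
print("one-hot:", d_hot_jb, d_hot_jn)
```

Changing the order of `names` changes the ordinal distances but leaves the one-hot distances alone, which is the property the nominal case seems to need.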
