Cool!

On Wed, Dec 28, 2011 at 2:39 PM, Ted Dunning <[email protected]> wrote:
> In particular, see the
>
> https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig
>
> directory.
>
> On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <[email protected]> wrote:
>
>> Yes.
>>
>> In the pig-vector thing I am working on, I have a nice way to specify
>> types and conversions.
>>
>> See https://github.com/tdunning/pig-vector
>>
>> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:
>>
>>> > When strings (or nominals) are converted to doubles, it seems to me
>>> > that the conversion adds additional irrelevant structure that I don't
>>> > want. Depending on the order in which the strings are added, the
>>> > assigned doubles will vary. Adjacent strings in the ordering will be
>>> > close together in the metric space/distance measure. For example, if
>>> > "john" is 1, "bob" is 2, and "nancy" is 3, then john is closer to bob
>>> > than to nancy. For nominals, that seems wrong. Most users will
>>> > probably really want three binary attributes: one for john, one for
>>> > bob, and one for nancy.
>>> >
>>>
>>> We could perhaps use the SGD vector encoding stuff here?
>>>
>>
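To make the john/bob/nancy point from the quoted message concrete, here is a
minimal sketch in plain Python. It is not Mahout or pig-vector code; the
function names (ordinal_encode, one_hot_encode, euclidean) are hypothetical
and exist only for illustration. It compares the "one double per string"
scheme with the "one binary attribute per value" scheme and shows how the
distances differ.

```python
import math


def ordinal_encode(values):
    """Assign 1.0, 2.0, ... to strings in order of first appearance.

    This is the scheme the quoted message objects to: the codes depend on
    insertion order and impose an ordering the data does not have.
    """
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = float(len(codes) + 1)
    return codes


def one_hot_encode(values):
    """Give each distinct string its own binary attribute."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    return {v: [1.0 if v == d else 0.0 for d in distinct] for v in distinct}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


names = ["john", "bob", "nancy"]
ordinal = ordinal_encode(names)
one_hot = one_hot_encode(names)

# Ordinal codes: john is closer to bob than to nancy.
ordinal_john_bob = abs(ordinal["john"] - ordinal["bob"])
ordinal_john_nancy = abs(ordinal["john"] - ordinal["nancy"])

# Binary attributes: every pair of distinct names is equally far apart.
one_hot_john_bob = euclidean(one_hot["john"], one_hot["bob"])
one_hot_john_nancy = euclidean(one_hot["john"], one_hot["nancy"])

print(ordinal_john_bob, ordinal_john_nancy)
print(one_hot_john_bob, one_hot_john_nancy)
```

With the ordinal codes the two distances differ (and would change if the
names arrived in another order); with binary attributes they are equal, so
no artificial ordering leaks into the distance measure.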
-- 
Lance Norskog
[email protected]
