Yes.

In the pig-vector thing I am working on, I have a nice way to specify types
and conversions.

See https://github.com/tdunning/pig-vector

On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:

> > When strings (or nominals) are converted to doubles, it seems to me that
> > the conversion adds additional irrelevant structure that I don't want.
> > Depending on the order in which the strings are added, the assigned doubles
> > will vary. Adjacent strings in the ordering will be close together in
> > the metric space/distance measure. For example, if "john" is 1, "bob" is
> > 2, and "nancy" is 3, then john is closer to bob than to nancy. For
> > nominals, that seems wrong. Most users will probably really want three
> > binary attributes: one for john, one for bob, and one for nancy.
> >
>
> We could perhaps use the SGD vector encoding stuff here?
>
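
To make the quoted concern concrete, here is a minimal sketch in plain
Python. It is not pig-vector or Mahout code, and the function names
(ordinal_encode, one_hot_encode, euclidean) are made up for illustration.
It compares the order-dependent numbering described above with one binary
attribute per name.

```python
from itertools import combinations


def ordinal_encode(values):
    """Assign 1.0, 2.0, 3.0, ... in order of first appearance.

    This is the problematic scheme: the codes depend on insertion order.
    """
    codes = {}
    for v in values:
        codes.setdefault(v, float(len(codes) + 1))
    return codes


def one_hot_encode(values):
    """Give each distinct value its own binary attribute."""
    levels = list(dict.fromkeys(values))  # distinct values, first-seen order
    return {v: [1.0 if v == level else 0.0 for level in levels] for v in levels}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


names = ["john", "bob", "nancy"]
ordinal = ordinal_encode(names)
one_hot = one_hot_encode(names)

for a, b in combinations(names, 2):
    d_ord = abs(ordinal[a] - ordinal[b])
    d_hot = euclidean(one_hot[a], one_hot[b])
    print(f"{a}-{b}: ordinal distance {d_ord}, one-hot distance {d_hot:.3f}")
```

With the ordinal codes, john ends up twice as far from nancy as from bob,
purely because of insertion order. With the binary attributes, every pair
of distinct names is the same distance apart (the square root of 2), which
is the behavior one would expect for nominals.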
