In particular, see the

https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig

directory.

On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <[email protected]> wrote:

> Yes.
>
> In the pig-vector thing I am working on, I have a nice way to specify
> types and conversions.
>
> See https://github.com/tdunning/pig-vector
>
>
> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:
>
>> > When strings (or nominals) are converted to doubles, it seems to me
>> > that the conversion adds irrelevant structure that I don't want.
>> > Depending on the order in which the strings are added, the assigned
>> > doubles will vary. Adjacent strings in the ordering will be close
>> > together in the metric space/distance measure. For example, if "john" is
>> > 1, "bob" is 2, and "nancy" is 3, then john is closer to bob than to
>> > nancy. For nominals, that seems wrong. Most users will probably really
>> > want three binary attributes: one for john, one for bob, and one for
>> > nancy.
>> >
>>
>> We could perhaps use the SGD vector encoding stuff here?
>>
>
>
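To make the point in the quoted message concrete, here is a small Python sketch comparing ordinal codes with one binary attribute per value for the john/bob/nancy example. It is not taken from pig-vector or Mahout; the helpers `ordinal_encode`, `one_hot_encode` and `euclidean` are hypothetical names used only for illustration.

```python
import math

# Ordinal encoding: each distinct string gets the next number (1.0, 2.0, ...)
# in order of first appearance, so the codes depend on input order.
def ordinal_encode(values):
    codes = {}
    for v in values:
        codes.setdefault(v, float(len(codes) + 1))
    return codes

# One-hot encoding: one binary attribute per distinct string.
def one_hot_encode(values):
    levels = list(dict.fromkeys(values))  # distinct values, first-seen order
    return {v: [1.0 if v == level else 0.0 for level in levels] for v in levels}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

names = ["john", "bob", "nancy"]
ordinal = ordinal_encode(names)
onehot = one_hot_encode(names)

print(abs(ordinal["john"] - ordinal["bob"]), abs(ordinal["john"] - ordinal["nancy"]))
# 1.0 2.0
print(euclidean(onehot["john"], onehot["bob"]), euclidean(onehot["john"], onehot["nancy"]))
# 1.4142135623730951 1.4142135623730951
```

Under the ordinal codes john is twice as far from nancy as from bob purely because of the order the strings arrived in; shuffling `names` changes those distances. With one-hot vectors every pair of distinct names is the same distance apart.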

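On the SGD encoding suggestion: as I understand it, Mahout's SGD feature encoders hash each feature into a fixed-size vector instead of assigning sequential codes. The sketch below shows only that general feature-hashing idea in Python; `hash_encode`, the `field=value` key format and the vector size of 16 are illustrative assumptions, not Mahout's API.

```python
import zlib

# Hypothetical helper (not Mahout's API): hash each "field=value" pair into
# one of `size` slots and add 1.0 there. No ordering between values is implied.
def hash_encode(features, size=16):
    vec = [0.0] * size
    for field, value in features.items():
        key = f"{field}={value}".encode("utf-8")
        # crc32 is stable across runs, unlike Python's salted hash() for str
        slot = zlib.crc32(key) % size
        vec[slot] += 1.0
    return vec

v = hash_encode({"name": "john"})
print(sum(v))  # 1.0
```

Each name sets a single slot regardless of the order in which strings are seen, so no spurious adjacency is introduced; the trade-off is that two names can occasionally collide in the same slot, which a larger vector makes less likely.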