Cool!

On Wed, Dec 28, 2011 at 2:39 PM, Ted Dunning <[email protected]> wrote:
> In particular, see the
>
>
> https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig
>
> directory.
>
> On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <[email protected]> wrote:
>
>> Yes.
>>
>> In the pig-vector thing I am working on, I have a nice way to specify
>> types and conversions.
>>
>> See https://github.com/tdunning/pig-vector
>>
>>
>> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <[email protected]> wrote:
>>
>>> > When strings (or nominals) are converted to doubles, it seems to me
>>> > that the conversion adds additional irrelevant structure that I don't want.
>>> > Depending on the order in which the strings are added, the assigned
>>> > doubles will vary. Adjacent strings in the ordering will be close
>>> > together in the metric space/distance measure. For example, if "john" is
>>> > 1, "bob" is 2, and "nancy" is 3, then john is
>>> > closer to bob than to nancy. For nominals, that seems wrong. Most
>>> > users will probably really want three binary attributes: one for john, one
>>> > for bob, and one for nancy.
>>> >
>>>
>>> We could perhaps use the SGD vector encoding stuff here?
>>>
>>
>>
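
To make the john/bob/nancy point from the quoted message concrete, here is a
minimal Python sketch. It is not Mahout or pig-vector code; the names
`ordinal`, `one_hot`, and `euclidean` are made up for illustration, and the
vocabulary is assumed to be just the three strings from the example. It
compares distances under the insertion-order numbering with distances under
one binary attribute per value.

```python
import math

# Vocabulary in insertion order, taken from the example in the thread.
names = ["john", "bob", "nancy"]

# Insertion-order encoding: "john" -> 1.0, "bob" -> 2.0, "nancy" -> 3.0.
ordinal = {name: float(i + 1) for i, name in enumerate(names)}


def one_hot(name, vocabulary):
    """Return one binary attribute per vocabulary entry (hypothetical helper)."""
    return [1.0 if name == v else 0.0 for v in vocabulary]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distances under the insertion-order encoding depend on the ordering.
ordinal_john_bob = abs(ordinal["john"] - ordinal["bob"])
ordinal_john_nancy = abs(ordinal["john"] - ordinal["nancy"])

# Distances under the binary-attribute encoding are the same for every pair.
onehot_john_bob = euclidean(one_hot("john", names), one_hot("bob", names))
onehot_john_nancy = euclidean(one_hot("john", names), one_hot("nancy", names))

print("ordinal:", ordinal_john_bob, ordinal_john_nancy)
print("one-hot:", onehot_john_bob, onehot_john_nancy)
```

With the insertion-order numbering, john ends up twice as far from nancy as
from bob. With the binary attributes, every pair of distinct names is the
same distance apart, so the order in which the strings were added no longer
matters.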



-- 
Lance Norskog
[email protected]
