The javadocs on ARFFVectorIterable say:
* Attribute type handling:
* <ul>
* <li>Numeric -> As is</li>
* <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
* will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
* <li>Dates -> Convert to time as a long</li>
* <li>Strings -> Create a map of String -> long</li>
* </ul>
The code for this is in MapBackedARFFModel, which implements ARFFModel, so I
suspect that if it doesn't do exactly what you want, you can override it.
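
To make the four rules in that javadoc concrete, here is a minimal standalone sketch of the same kind of mapping. It is not Mahout's code: the class ArffEncodingSketch and its method names are hypothetical, the string ids starting at 0 in first-seen order are an assumption, and so is parsing dates in UTC. Check MapBackedARFFModel itself for the actual behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical illustration only; this is NOT Mahout's MapBackedARFFModel.
final class ArffEncodingSketch {

  // Dictionary for string attributes: each distinct string gets one id.
  private final Map<String, Long> stringIds = new HashMap<>();

  // Numeric -> as is.
  static double numeric(String raw) {
    return Double.parseDouble(raw.trim());
  }

  // Nominal -> position of the value in the declared value list.
  static double nominal(List<String> declaredValues, String raw) {
    int index = declaredValues.indexOf(raw);
    if (index < 0) {
      throw new IllegalArgumentException("Undeclared nominal value: " + raw);
    }
    return index;
  }

  // Date -> time as a long (milliseconds since the epoch).
  // Assumption: dates are interpreted in UTC.
  static long date(String raw, String pattern) {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    try {
      return format.parse(raw).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unparseable date: " + raw, e);
    }
  }

  // String -> id from a String -> long map.
  // Assumption: ids start at 0 and are handed out in first-seen order.
  long string(String raw) {
    Long id = stringIds.get(raw);
    if (id == null) {
      id = (long) stringIds.size();
      stringIds.put(raw, id);
    }
    return id;
  }

  public static void main(String[] args) {
    List<String> lumber = Arrays.asList("(-inf-0.5]", "(0.5-inf)");
    System.out.println(nominal(lumber, "(0.5-inf)"));

    ArffEncodingSketch encoder = new ArffEncodingSketch();
    System.out.println(encoder.string("bob"));
    System.out.println(encoder.string("fred"));
    System.out.println(encoder.string("bob"));

    System.out.println(date("2011-12-21T12:37:00", "yyyy-MM-dd'T'HH:mm:ss"));
  }
}
```

Under a scheme like this, a string attribute occupies a single slot holding a numeric id, not one slot per distinct string, and a date becomes a single long with no derived fields such as hour of day or day of week.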
On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:
> Weka's ARFF format allows string attributes.
>
> @ATTRIBUTE userName string
>
> Will "mahout arff.vector" correctly handle conversion from such strings to
> vectors in such a way that the attribute will, effectively, be treated the
> same as a nominal attribute? That is, will the set of strings be converted
> into a set of nominal attributes (one for each possible string value)?
>
> @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
>
> In general, will I lose any information by using arff.vector?
>
> For date attributes, will mahout insert derived attributes (hour of day, day
> of week)? I presume not and I presume I have to add them myself.
>
> Thanks, Don
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com