The javadocs on ARFFVectorIterable say:
* Attribute type handling:
 * <ul>
 * <li>Numeric -> As is</li>
 * <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
 * will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
 * <li>Dates -> Convert to time as a long</li>
 * <li>Strings -> Create a map of String -> long</li>
 * </ul>

The code for this is in MapBackedARFFModel, which implements ARFFModel, so if 
it doesn't do exactly what you want, I suspect you can override it.
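
As a rough illustration of the four conversions the javadoc lists, here is a 
minimal standalone sketch. It is not the Mahout code: the class 
ArffValueSketch and its method names are made up for this example, and the 
details (string ids starting at 0, dates parsed as UTC) are assumptions, not 
something taken from MapBackedARFFModel.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical sketch of the javadoc's conversions; not Mahout's implementation.
final class ArffValueSketch {

    // String attribute: remembers the id assigned to each distinct string.
    private final Map<String, Long> stringIds = new HashMap<>();

    // Numeric -> as is.
    static double numeric(String raw) {
        return Double.parseDouble(raw.trim());
    }

    // Nominal -> position of the value in the declared list of labels.
    static double nominal(List<String> declaredLabels, String raw) {
        int ordinal = declaredLabels.indexOf(raw.trim());
        if (ordinal < 0) {
            throw new IllegalArgumentException("Undeclared nominal value: " + raw);
        }
        return ordinal;
    }

    // Date -> epoch milliseconds as a long (assumption: interpreted as UTC).
    static double date(String raw, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(raw.trim()).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + raw, e);
        }
    }

    // String -> id from a String -> long map (assumption: ids start at 0
    // and are handed out in order of first appearance).
    double string(String raw) {
        Long id = stringIds.get(raw);
        if (id == null) {
            id = (long) stringIds.size();
            stringIds.put(raw, id);
        }
        return id;
    }
}
```

If the real code behaves like this sketch, a string attribute ends up as a 
single numeric column holding an id, much like the ordinal of a nominal 
value, rather than as one attribute per distinct string.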

On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:

> Weka's ARFF format allows string attributes.
> 
>   @ATTRIBUTE userName string
> 
> Will "mahout arff.vector" correctly handle conversion from such strings to 
> vectors in such a way that the attribute will, effectively, be treated the 
> same as a nominal attribute? That is, will the set of strings be converted 
> into a set of nominal attributes (one for each possible string value)?
> 
>   @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
> 
> In general, will I lose any information by using arff.vector?
> 
> For date attributes, will mahout insert derived attributes (hour of day, day 
> of week)? I presume not and I presume I have to add them myself.
> 
>  Thanks, Don

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


