Hi,
Most Mahout algorithms take Vectors as their training input, but Mahout
does not standardize how those instance vectors are generated. Each
algorithm does it in its own way, so the required input file format is
tied to the specific algorithm. That creates a lot of work for users,
especially command-line users. For example, to build both a Logistic
Regression model and a Naive Bayes model from the same data, we must
prepare the data in two formats.
Hence this request: could you provide a universal mechanism for
handling input data, such as CSV input plus a CSV-to-Vector encoder,
that all algorithms would use? Users would then only have to prepare
their data as CSV.
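
To illustrate the kind of shared encoder I mean, here is a minimal
sketch in plain Java. It is not Mahout's API: the class
CsvVectorEncoder, its encode and slot methods, and their parameters are
all hypothetical, and a plain double[] stands in for a Mahout Vector.
It assumes a simple CSV without quoted fields. Numeric fields are added
at a slot hashed from the column name, and any other field is counted
at a slot hashed from "column=value".

```java
import java.util.Arrays;

public class CsvVectorEncoder {

    // Hypothetical helper, not a Mahout class: turns one CSV row into a
    // fixed-size dense vector using feature hashing.
    static double[] encode(String[] header, String csvLine,
                           int targetIndex, int numFeatures) {
        // Naive split: assumes no quoted fields or embedded commas.
        String[] fields = csvLine.split(",", -1);
        if (fields.length != header.length) {
            throw new IllegalArgumentException(
                "expected " + header.length + " fields, got " + fields.length);
        }
        double[] vector = new double[numFeatures];
        for (int i = 0; i < fields.length; i++) {
            if (i == targetIndex) {
                continue; // the label column is not part of the features
            }
            String value = fields[i].trim();
            try {
                // Numeric field: add its value at the slot for the column name.
                double x = Double.parseDouble(value);
                vector[slot(header[i], numFeatures)] += x;
            } catch (NumberFormatException e) {
                // Anything else (including empty): count "column=value" once.
                vector[slot(header[i] + "=" + value, numFeatures)] += 1.0;
            }
        }
        return vector;
    }

    // Maps a feature name to an index in [0, numFeatures).
    static int slot(String key, int numFeatures) {
        return Math.floorMod(key.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        String[] header = {"age", "color", "label"};
        double[] v = encode(header, "42,red,yes", 2, 16);
        System.out.println(Arrays.toString(v));
    }
}
```

With something like this, the same CSV file could feed every algorithm,
and only the encoder would need to know the column layout. Hash
collisions are possible in such a scheme, so the vector size would have
to be chosen with the number of distinct features in mind.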

Regards,

Xiaobo Gu
