Hi, Most Mahout algorithms take Vectors as the model training input but do not deal with how the instance vectors are generated, so every algorithm does this in its own way, which ties the required input file format to the specific algorithm. This creates a lot of work for users, especially command-line users. For example, if we want to build a Logistic Regression model and a Naïve Bayes model on the same data, we must prepare the data in two different formats. Hence this request: could you provide a universal mechanism for handling input data, such as CSV plus a CSV-to-Vector encoder, that all algorithms would use, so that users only have to prepare their data as CSV?
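
To make the request more concrete, here is a minimal sketch of the kind of encoder I have in mind. It is plain Java and not an existing Mahout API: the class name `CsvVectorEncoder`, its methods, and the choice of feature hashing to pick vector slots are all my own assumptions for illustration. It also assumes a simple CSV with a header row and no quoted or escaped commas.

```java
import java.util.Arrays;

/** Hypothetical sketch (not a Mahout class): encodes one CSV row as a fixed-length vector. */
final class CsvVectorEncoder {
    private final String[] columns;  // column names taken from the CSV header
    private final int targetIndex;   // index of the label column, left out of the vector
    private final int numFeatures;   // length of every encoded vector

    CsvVectorEncoder(String headerLine, String targetColumn, int numFeatures) {
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].trim();
        }
        this.columns = names;
        this.targetIndex = Arrays.asList(names).indexOf(targetColumn);
        if (targetIndex < 0) {
            throw new IllegalArgumentException("No such column: " + targetColumn);
        }
        if (numFeatures <= 0) {
            throw new IllegalArgumentException("numFeatures must be positive");
        }
        this.numFeatures = numFeatures;
    }

    /** Returns the label (target column value) of a row. */
    String label(String row) {
        return split(row)[targetIndex].trim();
    }

    /** Encodes every non-target field of a row into a vector of length numFeatures. */
    double[] encode(String row) {
        String[] fields = split(row);
        double[] vector = new double[numFeatures];
        for (int i = 0; i < fields.length; i++) {
            if (i == targetIndex) {
                continue;
            }
            String value = fields[i].trim();
            try {
                // Numeric field: the slot depends on the column name, the weight is the value.
                vector[slot(columns[i])] += Double.parseDouble(value);
            } catch (NumberFormatException e) {
                // Non-numeric field: the slot depends on "name=value", the weight is 1.
                vector[slot(columns[i] + "=" + value)] += 1.0;
            }
        }
        return vector;
    }

    private String[] split(String row) {
        String[] fields = row.split(",", -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException("Expected " + columns.length + " fields: " + row);
        }
        return fields;
    }

    // Hashing trick: map a feature key to a slot; different keys may collide.
    private int slot(String key) {
        return Math.floorMod(key.hashCode(), numFeatures);
    }
}
```

With something like this, a user would create one encoder from the header (for example `new CsvVectorEncoder("age,color,label", "label", 16)`) and pass the result of `encode(row)` together with `label(row)` to whichever algorithm is being trained, so the same CSV file could serve both Logistic Regression and Naïve Bayes.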
Regards, Xiaobo Gu
