Good idea. Somebody should file a JIRA. My guess is that the best first step would be to have the logistic regression handle the naive Bayes input format.
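
To make the idea concrete, here is a rough sketch of what a shared CSV-to-vector step could look like. It is plain Python, not Mahout code, and it swaps in simple feature hashing as the encoding. The names bucket, encode_row and encode_csv, the target-column argument and the default vector size are all made up for illustration. It assumes the CSV has a header row, that any value that parses as a number is a continuous feature, and that everything else is categorical.

```python
import csv
import hashlib
import io


def bucket(token: str, num_features: int) -> int:
    """Map a token to a stable vector index (hypothetical helper)."""
    # md5 is used only because it is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def encode_row(row: dict, target: str, num_features: int):
    """Turn one CSV record into a (label, dense vector) pair."""
    vector = [0.0] * num_features
    for name, raw in row.items():
        if name == target:
            continue  # the target column is the label, not a feature
        try:
            # Assumption: anything that parses as a float is continuous,
            # so its value is added at the slot for the column name.
            vector[bucket(name, num_features)] += float(raw)
        except ValueError:
            # Otherwise treat it as categorical: one slot per name=value.
            vector[bucket(f"{name}={raw}", num_features)] += 1.0
    return row[target], vector


def encode_csv(text: str, target: str, num_features: int = 16):
    """Encode every record of a CSV document that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [encode_row(row, target, num_features) for row in reader]
```

With something like this in front of them, both classifiers could be handed the same (label, vector) pairs, for example encode_csv(open("data.csv").read(), target="label"), and users would only have to prepare one CSV file. Hash collisions are possible with a small vector size, so a real implementation would need a larger size or a dictionary-based encoding.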

2011/7/25 Fernando Fernández <[email protected]>

> That would be very nice. Actually, I haven't tested most of the Mahout
> algorithms for that reason...
>
> 2011/7/25 Xiaobo Gu <[email protected]>
>
> > Hi,
> >
> > Most of the time Mahout algorithms use Vector as the model training
> > input, but don’t take care of how the instance vectors are generated.
> > Every algorithm then has its own way of doing it, which ties the
> > original input file format to a specific algorithm. That causes a lot
> > of work for the actual users, especially for command line users. For
> > example, if we want to build a Logistic Regression model and a Naïve
> > Bayes model for the same data, we must prepare the data in two formats.
> > Hence this request: can you provide a universal mechanism for handling
> > input data, such as CSV and a CSV to Vector encoder? Then all
> > algorithms would use it, and users would just have to prepare data as
> > CSV.
> >
> > Regards,
> >
> > Xiaobo Gu
