Can you point me to any material describing the input file format required by Naïve Bayes, please?
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Monday, July 25, 2011 11:16 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: What about a universal input data handling mechanism for Mahout?
>
> Good idea.
>
> Somebody should file a JIRA. My guess is that the best first step would be
> to have the logistic regression handle the naive Bayes input format.
>
> 2011/7/25 Fernando Fernández <[email protected]>
>
> > That would be very nice; actually, I haven't tested most of Mahout's
> > algorithms for that reason...
> >
> > 2011/7/25 Xiaobo Gu <[email protected]>
> >
> > > Hi,
> > > Most of the time, Mahout algorithms use Vector as the model training
> > > input but don't take care of how the instance vectors are generated,
> > > so every algorithm has its own way of doing it. As a result, the
> > > input file format requirement is bound to a specific algorithm. That
> > > causes a lot of work for actual users, especially command-line users.
> > > For example, if we want to build a Logistic Regression model and a
> > > Naïve Bayes model for the same data, we must prepare the data in two
> > > formats. Hence my request: could you provide a universal mechanism
> > > for handling input data, such as CSV, along with a CSV-to-Vector
> > > encoder? Then all algorithms would use it, and users would only have
> > > to prepare their data as CSV.
> > >
> > > Regards,
> > >
> > > Xiaobo Gu
