I think you need to preprocess your strings. Usually this means 
tokenizing them into terms and giving each term a separate ID, so that each 
term becomes a numeric feature. Remember that all IDs must be usable by Mahout. 
This typically means that you have to replace all of your IDs with sequential 
Ints running from 0 up to the number of features or rows. So your “id” column 
must be mapped onto 0 up to the number of “ids”. I do this with a bi-directional 
dictionary so you can convert them back into your application IDs once they 
are processed.
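
To make the dictionary idea concrete, here is a minimal sketch in plain Python (not Mahout code). The class name BiDictionary, the sample rows, and the whitespace tokenizer are all made up for illustration; a real pipeline would use a proper analyzer and write the result out in whatever format your Mahout job expects.

```python
class BiDictionary:
    """Maps application keys to sequential ints starting at 0, and back."""

    def __init__(self):
        self._key_to_index = {}   # application key -> sequential int
        self._index_to_key = []   # sequential int -> application key

    def add(self, key):
        """Return the index for key, assigning the next free one if new."""
        index = self._key_to_index.get(key)
        if index is None:
            index = len(self._index_to_key)
            self._key_to_index[key] = index
            self._index_to_key.append(key)
        return index

    def index_of(self, key):
        return self._key_to_index[key]

    def key_of(self, index):
        return self._index_to_key[index]

    def __len__(self):
        return len(self._index_to_key)


# Hypothetical sample rows: (application id, review text).
rows = [
    ("A1007", "great phone great battery"),
    ("B2210", "battery died fast"),
]

row_ids = BiDictionary()    # application ids <-> row indexes
term_ids = BiDictionary()   # terms <-> feature indexes

# Encode each row as {feature index: term count}.
encoded = {}
for app_id, text in rows:
    row_index = row_ids.add(app_id)
    counts = {}
    for term in text.lower().split():   # naive whitespace tokenizer (assumption)
        term_index = term_ids.add(term)
        counts[term_index] = counts.get(term_index, 0) + 1
    encoded[row_index] = counts
```

After processing, row_ids.key_of(row_index) gives you back the original application id, and term_ids.key_of(feature_index) gives you back the term.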

How many classifiers are you creating, and for what purpose? There may be other 
ways to do what you need. This sounds like a job for a search engine, since it 
can digest strings and CSVs, but not if you really need a classifier rather than 
similarity.
   
On Aug 8, 2014, at 8:22 PM, Suneel Marthi <[email protected]> wrote:

See
http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program




On Fri, Aug 8, 2014 at 11:05 PM, Aniket <[email protected]> wrote:

> Hi,
> 
> I am working on project & want to run a dataset on mahout for naive bayes
> classifier.
> dataset has csv format with columns ( id , rating ,summary, review, label).
> 
> id : numeric
> rating : numeric ( 1 to 5)
> summary : 4-5 texts strings
> review : more texts and strings
> label : positive or negative.
> 
> I am not able to figure out how to do csv to seq. files because csv has
> texts
> as well as numeric attributes. Can you please help with this ?
> 
> Thanks.
> Aniket
> 
> 
