Dear All, I need to classify a bunch of text files, i.e. determine which class each of these texts falls into.
I have gone through the 20Newsgroups example, and I see that the input text files need to have a particular format:

    <class-label> <tab> <unique features (words) associated with the class-label>

But the real question is: how do I get such a pre-processed input file? Do I need to process the raw text files myself to get them into the required format? That would require extracting the unique words/features from the raw text, in addition to assigning the class labels.

Or is there some classifier class that can take raw input files directly? My input would be something like:

    <class-label1> <file1-text>
    <class-label2> <file3-text>
    <class-label1> <file2-text>
    etc.

Thanks,
Bhaskar Ghosh
Hyderabad, India
http://www.google.com/profiles/bjgindia
"Ignorance is Bliss... Knowledge never brings Peace!!!"
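
P.S. To make the question concrete, here is a minimal sketch of the pre-processing I imagine I would otherwise have to write myself. It is only an illustration under my own assumptions: that the expected line really is the class label, a tab, and then the words separated by spaces, and that "unique features" simply means each lowercased word kept once. The names to_training_line and convert are hypothetical helpers of mine, not part of any library, and the tokenization is deliberately naive.

```python
import re


def to_training_line(label, text):
    """Build one '<class-label><TAB><words>' line from raw text.

    Assumption: words are lowercased runs of letters and digits,
    and each distinct word is kept once, in first-seen order.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    unique_words = list(dict.fromkeys(tokens))  # de-duplicate, keep order
    return label + "\t" + " ".join(unique_words)


def convert(samples):
    """Turn (label, raw_text) pairs into a list of formatted lines."""
    return [to_training_line(label, text) for label, text in samples]


if __name__ == "__main__":
    # Hypothetical input: labels already assigned by hand, text still raw.
    samples = [
        ("class-label1", "Raw text of the first file."),
        ("class-label2", "Raw text of another file, with other words."),
    ]
    for line in convert(samples):
        print(line)
```

If a classifier can accept the raw text directly, none of this would be needed, which is really what I am asking.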
