Hi Robin/Neil,

I was also trying the 20Newsgroups example and was following your conversation. I am now confused by the use of the word 'instance'. I could not work out the meaning of these lines:
>extra file or extra line, duplicated instances(to decrease the weights) or
>duplicate feature in the same instance to increase the weights(classic
>tf-idf)

Let me list what I understood. Please confirm whether I got it right.

If I want to increase the weight of word1 and word2, so that text containing those words has a higher chance of being classified as <class-name1>, I should add duplicate lines, many times over, to an extra file (conforming to the format required by the Bayes classifier) in the format:

<class-name1><tab><word1> <word2>

Thanks
Bhaskar Ghosh
Hyderabad, India
http://www.google.com/profiles/bjgindia

"Ignorance is Bliss... Knowledge never brings Peace!!!"

________________________________
From: Robin Anil <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Thu, 30 September, 2010 9:59:47 PM
Subject: Re: unknown test data twenty-newsgroups example

On Thu, Sep 30, 2010 at 9:45 PM, Neil Ghosh <[email protected]> wrote:

>
> Do you mean , I should 1st create the model with correct data in correct
> folder (Label).
>
> Now you throw an instance at it and you will get the correct label, well most of the time.
