Thanks Ted, Robin, and Neil. My doubts are cleared now, and I will try the approach.

Regards
Bhaskar Ghosh
Hyderabad, India
http://www.google.com/profiles/bjgindia

"Ignorance is Bliss... Knowledge never brings Peace!!!"

________________________________
From: Ted Dunning <[email protected]>
To: [email protected]
Cc: Bhaskar Ghosh <[email protected]>; [email protected]
Sent: Sat, 2 October, 2010 12:11:53 AM
Subject: Re: unknown test data twenty-newsgroups example

Yes. Instance = training example. Your method of duplicating lines is just what Robin meant.

On Fri, Oct 1, 2010 at 3:55 AM, Robin Anil <[email protected]> wrote:

> Let me list what I understood. Pl confirm if I got it correct?
>>
>> Add duplicate extra lines many times in an extra file (conforming to the
>> format required by the Bayes Classifier) in the format
>> <class-name1><tab><word1> <word2>
>> If I want to increase the weight of word1 and word2, so that text with
>> those words have higher chance of getting classified as <class-name1>
>>
>> *
>> *
>>
>No. Duplicating lines increases DF and therefore decreases (IDF == inverse
>document frequency) So weight goes down. To increase weight of the word
>repeat the word in the same line
>
>
>Regards
>Robin
>
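Robin's point about DF and IDF can be checked with a small sketch. This is only an illustration under stated assumptions: it uses the textbook weight tf * log(N / df), which is not necessarily the exact weighting or normalization that the Mahout Bayes classifier applies, and the toy corpus and the helper names `idf` and `tfidf` are made up for the example. It compares the weight of word1 within a single training line; how weights add up across a whole class depends on the classifier and is not modelled here.

```python
import math

def idf(term, docs):
    # Textbook inverse document frequency: log(N / df).
    # Assumes the term occurs in at least one line.
    df = sum(1 for _, tokens in docs if term in tokens)
    return math.log(len(docs) / df)

def tfidf(term, tokens, docs):
    # Term frequency within one line times the corpus-wide IDF.
    return tokens.count(term) * idf(term, docs)

# Hypothetical training data: one (class name, tokens) pair per line.
baseline = [
    ("class1", ["word1", "word2"]),
    ("class2", ["word3", "word4"]),
    ("class2", ["word5", "word6"]),
]

# Variant A: duplicate the class1 line twice more (raises DF of word1).
duplicated = baseline + [("class1", ["word1", "word2"])] * 2

# Variant B: repeat word1 inside the same line (raises TF of word1).
repeated = [("class1", ["word1", "word1", "word1", "word2"])] + baseline[1:]

base_w = tfidf("word1", baseline[0][1], baseline)
dup_w = tfidf("word1", duplicated[0][1], duplicated)
rep_w = tfidf("word1", repeated[0][1], repeated)

print("baseline:", base_w)
print("duplicated lines:", dup_w)
print("repeated word:", rep_w)
```

Under these assumptions the per-line weight of word1 drops when the line is duplicated and rises when the word is repeated within the line, which matches what Robin describes.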
