Re: unknown test data twenty-newsgroups example

Bhaskar Ghosh Fri, 01 Oct 2010 12:57:27 -0700

Thanks Ted, Robin, and Neil. My doubts are cleared up, and I will try the
approach now.
 Regards
Bhaskar Ghosh
Hyderabad, India


http://www.google.com/profiles/bjgindia

"Ignorance is Bliss... Knowledge never brings Peace!!!"




________________________________
From: Ted Dunning <[email protected]>
To: [email protected]
Cc: Bhaskar Ghosh <[email protected]>; [email protected]
Sent: Sat, 2 October, 2010 12:11:53 AM
Subject: Re: unknown test data twenty-newsgroups example


Yes.  Instance = training example.

Your method of duplicating lines is just what Robin meant.


On Fri, Oct 1, 2010 at 3:55 AM, Robin Anil <[email protected]> wrote:

>> Let me list what I understood. Please confirm whether I got it correct.
>>
>> Add duplicate extra lines many times in an extra file (conforming to the
>> format required by the Bayes Classifier) in the format
>> <class-name1><tab><word1> <word2>
>> if I want to increase the weight of word1 and word2, so that text with
>> those words has a higher chance of getting classified as <class-name1>.
>>
> No. Duplicating lines increases DF (document frequency) and therefore
> decreases IDF (inverse document frequency), so the weight goes down. To
> increase the weight of a word, repeat the word in the same line.
>
>
>Regards
>Robin
>
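
Robin's point about DF and IDF can be illustrated with a small sketch. It
assumes the textbook weighting tf * log(N / df), which is not necessarily the
exact formula used by the Bayes Classifier discussed in this thread. The
function name tfidf and the toy corpus are hypothetical, chosen only for
illustration.

```python
import math

def tfidf(word, doc, corpus):
    """Textbook TF-IDF: term count in doc times log(N / document frequency)."""
    tf = doc.count(word)                        # occurrences within this line
    df = sum(1 for d in corpus if word in d)    # lines containing the word
    return tf * math.log(len(corpus) / df)

# Toy corpus: each inner list is one training line (one instance).
base = [
    ["word1", "word2"],
    ["other", "text"],
    ["more", "text"],
    ["unrelated", "tokens"],
]
baseline = tfidf("word1", base[0], base)

# Option A: duplicate the whole line three more times (DF goes up).
duplicated = base + [["word1", "word2"] for _ in range(3)]
dup_weight = tfidf("word1", duplicated[0], duplicated)

# Option B: repeat the word inside the same line (TF goes up, DF unchanged).
repeated_doc = ["word1", "word1", "word1", "word2"]
repeated = [repeated_doc] + base[1:]
rep_weight = tfidf("word1", repeated_doc, repeated)

print(f"baseline={baseline:.3f} duplicated={dup_weight:.3f} repeated={rep_weight:.3f}")
```

Under this assumed weighting, duplicating the line lowers the weight of word1
relative to the baseline, while repeating word1 within one line raises it.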
