Hi Patrick,
Yeah, you might be correct. But when I input the test query 'what is where',
which consists entirely of stopwords, with a stopword filter enabled, it is
still classified as:
'what is what' => films => 0.333333333333
'what is what' => laptops => 0.333333333333
'what is what' => medicine => 0.166666666667
'what is what' => mobile_phones => 0.166666666667
This behavior surprises me.
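
One possible explanation, sketched here under assumptions rather than taken
from the actual setup: once every word is filtered out or missing from the
vocabulary, the query becomes an all-zero feature vector, and MultinomialNB
then has only the class priors to go on. The toy data below is hypothetical,
with class sizes picked so the priors come out as 2/6, 2/6, 1/6, 1/6, and
`classify_or_reject` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training set: 2 films, 2 laptops, 1 medicine, 1 mobile_phones.
train_texts = [
    "great movie with a famous actor",
    "the film director won an award",
    "laptop with fast processor and ssd",
    "lightweight laptop long battery life",
    "tablet dosage for fever and pain",
    "smartphone with dual sim and camera",
]
train_labels = ["films", "films", "laptops", "laptops", "medicine", "mobile_phones"]

vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)

# None of the query words are in the vocabulary, so the vector is all zeros.
X_query = vectorizer.transform(["what is what"])
probs = clf.predict_proba(X_query)[0]

# With no features, the posterior equals the class priors learned from
# the training label counts.
for label, p in zip(clf.classes_, probs):
    print("'what is what' =>", label, "=>", p)
print(np.allclose(probs, np.exp(clf.class_log_prior_)))


def classify_or_reject(text):
    """Return a predicted label, or None when no known word is in the text."""
    X = vectorizer.transform([text])
    if X.nnz == 0:  # nothing left after stopword/vocabulary filtering
        return None
    return clf.predict(X)[0]
```

If the real training set has similar class proportions, that would account
for the 0.33 / 0.33 / 0.17 / 0.17 split. Checking for an empty vector before
calling the classifier is one simple way to refuse such queries.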
On Thu, Sep 4, 2014 at 3:54 AM, Patrick Short <[email protected]> wrote:
> Hi Karimkhan,
>
> If I am understanding your question correctly, you are asking to classify
> test data in a class that is not specified in your training set.
>
> For instance if you have three classes of news article specified in your
> training data (e.g. politics, sports, and food) and you try to classify an
> article that 'truly' best belongs in a 'business' category you are out of
> luck. Your classification can only be as good as the training data and your
> classifier will put the article in the closest match it can find (if the
> article was about McDonald's stock price, it might be classified as food,
> for instance).
>
> Hope that helps!
>
>
> On Wed, Sep 3, 2014 at 10:31 AM, Sebastian Raschka <[email protected]>
> wrote:
>
>> This is due to Laplace smoothing. If I understand correctly, you
>> want the classification to fail if there is a new feature value (e.g., a
>> word that is not in the vocabulary when you are doing text classification)?
>>
>> You can set the alpha parameter to 0 (see
>> http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB)
>> which would disable Laplace smoothing.
>>
>> Best,
>> Sebastian Raschka
>>
>> > On Sep 3, 2014, at 6:20 AM, Karimkhan Pathan <[email protected]>
>> wrote:
>> >
>> > I have trained my classifier on datasets from 20 domains using
>> > MultinomialNB, and it works fine for these 20 domains.
>> >
>> > The issue is that if I make a query containing text that does not belong
>> > to any of these 20 domains, it still gives a classification result.
>> >
>> > Is it possible that, if a query does not belong to any of the 20 domains,
>> > it gets a probability value of 0?
>> >
>> ------------------------------------------------------------------------------
>> > Slashdot TV.
>> > Video for Nerds. Stuff that matters.
>> > http://tv.slashdot.org/
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>>
>
>
>
> --
> Patrick Short
> ------------------------------
>
> University of North Carolina at Chapel Hill, 2014
>
> Applied Mathematics and Quantitative Biology
>
> [email protected] | 919-455-7045 C
>
>
>
>
>
>