Hi Patrick,

Just for information, there are existing techniques to detect test
instances whose class was not provided for training. Instead of letting the
classifier put those instances in the closest match it can (the most
probable known class), we detect that they belong to a novel class which
was unknown during training.

An example of this is the paper that can be found here:
http://www.loria.fr/~mbouguel/papers/BougueliaICPR.pdf
Mohamed-Rafik Bouguelia, Yoland Belaid and Abdel Belaid. Efficient active
novel class detection for data stream classification. In the IEEE
International Conference on Pattern Recognition - ICPR, Stockholm (Sweden),
August 2014.

It would be nice if some of these methods could be implemented in
scikit-learn.
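
To make the idea of rejecting unknown-class queries concrete, here is a
minimal sketch. It is not the method from the paper above, only a crude
vocabulary-overlap heuristic; the toy data, the helper name
predict_or_reject and the min_known_tokens threshold are assumptions made
up for illustration. It also shows why the original question cannot be
answered by the classifier alone: predict_proba is normalised over the
known classes, so each row sums to 1 even for a completely unrelated query.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training set with two known classes (illustrative data only).
train_texts = [
    "election vote parliament government",
    "government minister election policy",
    "football match goal team",
    "team coach football season",
]
train_labels = ["politics", "politics", "sports", "sports"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)


def predict_or_reject(texts, min_known_tokens=1):
    """Hypothetical helper: return a label per text, or None when the text
    contains fewer than `min_known_tokens` tokens from the training
    vocabulary (a crude stand-in for real novel class detection)."""
    X = vectorizer.transform(texts)
    # Number of tokens in each query that the vectorizer recognises.
    known_counts = np.asarray(X.sum(axis=1)).ravel()
    labels = clf.predict(X)
    return [
        str(label) if n >= min_known_tokens else None
        for label, n in zip(labels, known_counts)
    ]


queries = ["football team goal", "quantum chromodynamics lattice"]

# Probabilities are normalised over the known classes, so every row sums
# to 1, including the row for the unrelated second query.
proba = clf.predict_proba(vectorizer.transform(queries))

# The second query shares no token with the training vocabulary and is
# rejected instead of being forced into a known class.
results = predict_or_reject(queries)
print(proba.sum(axis=1), results)
```

A query that uses known words in an unfamiliar context would still slip
through this check, which is where dedicated novel class detection methods
come in.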


2014-09-04 0:24 GMT+02:00 Patrick Short <[email protected]>:

> Hi Karimkhan,
>
> If I am understanding your question correctly, you are asking to classify
> test data in a class that is not specified in your training set.
>
> For instance if you have three classes of news article specified in your
> training data (e.g. politics, sports, and food) and you try to classify an
> article that 'truly' best belongs in a 'business' category you are out of
> luck. Your classification can only be as good as the training data and your
> classifier will put the article in the closest match it can find (if the
> article was about McDonald's stock price, it might be classified as food,
> for instance).
>
> Hope that helps!
>
>
> On Wed, Sep 3, 2014 at 10:31 AM, Sebastian Raschka <[email protected]>
> wrote:
>
>> This is due to the Laplace smoothing. If I understand correctly, you
>> want the classification to fail if there is a new feature value (e.g., a
>> word that is not in the vocabulary when you are doing text classification)?
>>
>> You can set the alpha parameter to 0 (see
>> http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB)
>> which would disable the Laplace smoothing.
>>
>> Best,
>> Sebastian Raschka
>>
>> > On Sep 3, 2014, at 6:20 AM, Karimkhan Pathan <[email protected]>
>> wrote:
>> >
>> > I have trained my classifier using 20 domain datasets using
>> MultinomialNB. And it is working fine for these 20 domains.
>> >
>> > The issue is that if I make a query containing text that does not belong to
>> any of these 20 domains, it still gives a classification result.
>> >
>> > Is it possible that, if a query does not belong to any of the 20 domains, it
>> should get a probability value of 0?
>> >
>> ------------------------------------------------------------------------------
>> > Slashdot TV.
>> > Video for Nerds.  Stuff that matters.
>> > http://tv.slashdot.org/
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
> --
> Patrick Short
> ------------------------------
>
> University of North Carolina at Chapel Hill, 2014
>
> Applied Mathematics and Quantitative Biology
>
> [email protected] | 919-455-7045 C
>


-- 
Mohamed-Rafik BOUGUELIA