On 13 November 2011 20:14, SK Sn <[email protected]> wrote:

> The problem mentioned in my previous mail about classifying nominals was
> solved using the answer from Lars in a Stack Overflow post
> <http://stackoverflow.com/questions/7698713/fuzzy-c-means-categorical-data/7698936#7698936>,
> that is, to use a bag-of-nominals or one-hot representation.
>
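For readers of the archive, here is a minimal sketch of what a one-hot representation of the level 0 predictions could look like. It uses plain Python only; the helper name `one_hot_encode` and the tiny `Z` are made up for illustration and are not taken from the linked answer or from scikit-learn.

```python
def one_hot_encode(Z, labels):
    """Expand every nominal prediction into len(labels) binary indicator columns."""
    index = {label: i for i, label in enumerate(labels)}
    encoded = []
    for row in Z:
        out = [0] * (len(row) * len(labels))
        for j, pred in enumerate(row):
            # Column block j belongs to classifier j; set the slot of its predicted label.
            out[j * len(labels) + index[pred]] = 1
        encoded.append(out)
    return encoded


# Hypothetical level 0 output: 2 samples, 3 classifiers.
Z = [[1, 1, 2],
     [3, 3, 6]]
labels = sorted({pred for row in Z for pred in row})  # all labels seen in Z
X = one_hot_encode(Z, labels)
```

Each classifier contributes one block of indicator columns, so the level 1 classifier never sees the label codes as magnitudes.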
> On 12 November 2011 13:49, SK Sn <[email protected]> wrote:
>
>> Hi all,
>>
>> I am looking into how to combine classifiers using scikit-learn.
>> I think it would be generally useful to have functions like stacking
>> and voting in scikit-learn. Is there any plan to develop ensemble
>> methods?
>>
>> For now, I am writing my own snippet for stacking. The first phase would
>> be stacking simply on predictions from different models, and the next
>> would be stacking on probabilities.
>>
>> However, while dealing with the predictions, I ran into a problem with
>> classifying nominals.
>> In detail: at level 0, several (say m) classifiers are used, and the m
>> predictions for each sample are gathered to form a matrix Z.
>> With m=7, Z could look like:
>> [ [1 1 2 1 1 1 1]
>>   [3 3 3 6 3 3 3]
>>  ....
>>   [3 9 3 2 3 3 3]
>> ]
>> y in this case, could be:
>> [1  3  ...  3 ]
>>
>> So, at level 1 (the stacking level), a new classifier's task is to predict
>> based on the results from level 0. E.g., for a test case, level 0 generates:
>> [1 6 6 6 6 6]
>> and we expect the level 1 classifier to predict 6.
>> Because in stacking level 1 is a machine learning classifier rather than
>> a simple selection of the mode, one expects stacking to outperform voting
>> in general.
>>
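As a point of comparison, the voting baseline described above (selecting the mode of the level 0 predictions) can be sketched in a few lines. This is only an illustration in plain Python; `majority_vote` is a hypothetical helper name, and the rows are the ones quoted in this mail.

```python
from collections import Counter


def majority_vote(row):
    """Return the most frequent label among the level 0 predictions for one sample."""
    return Counter(row).most_common(1)[0][0]


# Rows of Z as given in the mail, plus the test case.
Z = [[1, 1, 2, 1, 1, 1, 1],
     [3, 3, 3, 6, 3, 3, 3],
     [3, 9, 3, 2, 3, 3, 3]]
votes = [majority_vote(row) for row in Z]
test_vote = majority_vote([1, 6, 6, 6, 6, 6])
```

A stacking classifier replaces `majority_vote` with a model trained on (Z, y), which is where the encoding of the nominal predictions starts to matter.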
>> The problem is that all the numbers in Z are predicted categories;
>> these numbers are nominal, without any real quantitative meaning.
>> I directly applied classification methods to (Z, y), and the results are
>> terrible, except for the tree classifier.
>> I also tried regression with rounding; the results are somewhat better
>> than classification, but not as good as level 0. Still, regression on
>> nominal numbers does not seem to make much sense to me.
>> I thought about normalization and scaling in preprocessing, but I am not
>> sure if they are relevant here.
>>
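One way to see why raw label codes mislead distance-based or linear models: the codes impose an ordering and a spacing that the categories do not have. The sketch below is an illustration with made-up labels (1, 2, 9), not the poster's data; it compares Euclidean distances on raw codes with distances on one-hot vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_hot(label, labels):
    """Binary indicator vector with a single 1 at the position of `label`."""
    return [1 if label == candidate else 0 for candidate in labels]


labels = [1, 2, 9]  # hypothetical category codes

# Raw codes: category 9 looks eight times further from 1 than category 2 does.
raw_near = euclidean([1], [2])
raw_far = euclidean([1], [9])

# One-hot: every pair of distinct categories is equally far apart.
hot_near = euclidean(one_hot(1, labels), one_hot(2, labels))
hot_far = euclidean(one_hot(1, labels), one_hot(9, labels))
```

With raw codes, "1 vs 9" looks like a much bigger disagreement than "1 vs 2", although both are simply different categories. That is probably also why the tree classifier copes better: it only splits on thresholds and does not depend on the spacing between codes.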
>> I wonder what the right way to classify based on nominals is?
>>
>> Thanks a lot!
>>
>
>
>
> ------------------------------------------------------------------------------
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsa-sfdev2dev1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
There is a new PR#439 now
https://github.com/scikit-learn/scikit-learn/pull/439
It's a work in progress. Is this what you are after?

- Robert


-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)