If each set can contain any of the n_categories possible values, then yes.
I read it as the two sets also drawing on disjoint categories, which is why I said 8.
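
For reference, a minimal sketch of the encoding discussed in this thread: one presence/absence vector of length n_categories per set, with the two vectors concatenated. The helper names (encode_set, encode_pair) and the six-letter category list are assumptions made for illustration. The example in the original question only shows the values a to f, so the row below has length 2*6; with n_categories = 8 it would have length 2*8.

```python
def encode_set(members, categories):
    """Return a 0/1 list with a 1 wherever a category is present in `members`."""
    members = set(members)
    return [1 if c in members else 0 for c in categories]


def encode_pair(x1, x2, categories):
    """Concatenate the indicator vectors of two sets over one shared vocabulary."""
    # The first len(categories) entries describe x1, the rest describe x2,
    # so the same value in x1 and in x2 maps to two different columns.
    return encode_set(x1, categories) + encode_set(x2, categories)


# Shared vocabulary (assumed: only the values that appear in the example).
categories = ['a', 'b', 'c', 'd', 'e', 'f']

x1 = {'a', 'b', 'c', 'd'}
x2 = {'e', 'd', 'b', 'f'}

row = encode_pair(x1, x2, categories)
print(row)       # [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(len(row))  # 12, i.e. 2 * len(categories)
```

Because each set is reduced to presence/absence flags, the order of elements within a set cannot affect the result. If I remember the API correctly, sklearn.preprocessing.MultiLabelBinarizer builds the same kind of indicator row for a single set, so applying it to each set and stacking the two outputs side by side should give the same layout.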

On 08/14/2015 01:04 PM, federico vaggi wrote:
Originally, I thought I'd need a vector of length n_categories per feature, so in this case 8*8. I just realized, however, that since the order does not matter and I only want to indicate the presence or absence of each category in a set, I can simply use two vectors of length n_categories, stacked together (so 2*8).



On Fri, 14 Aug 2015 at 16:04 Andreas Mueller <t3k...@gmail.com> wrote:

    Why do you think one-hot will be an "explosion"?
    In your example, the vector would be length 8 (if there are values
    from a to f, that is, you gave the largest possible sets).



    On 08/14/2015 09:01 AM, federico vaggi wrote:
    Hi,

    Simple example:

    Let's say that I have a binary classification task, and my input
    vector consists of two disjunct sets of categorical variables -
    something like:

    X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}

    The order within the sets does not matter (obviously), but it
    matters that the elements of X1 are conceptually separate from
    those of X2.

    All the categorical variables come from the same set.

    Is there a clever encoding that:

    - Emphasizes that order within each set does not matter
    - Avoids explosion with one-hot encoding everything?

    Federico


    
------------------------------------------------------------------------------


    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net  
<mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
    



