Originally, I thought I'd need a vector of length n_categories per
feature - so in this case, 8*8.  I just realized, however, that since the
order does not matter and I only want to indicate the presence or absence
of each category in a set, I can simply use two indicator vectors of
length n_categories, stacked together (2*8 entries in total).
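
To make that concrete, here is a minimal sketch of the stacked encoding in
plain Python. The names CATEGORIES, multi_hot and encode_pair are made up for
illustration, and I am assuming the vocabulary is just the six values 'a' to
'f' from the example, so the result has 2*6 entries rather than 2*8.

```python
# Assumed vocabulary: only the six values that appear in the example sets.
CATEGORIES = ['a', 'b', 'c', 'd', 'e', 'f']


def multi_hot(items, categories=CATEGORIES):
    """Return a 0/1 list marking which categories are present in `items`."""
    present = set(items)  # a set discards order and duplicates
    return [1 if c in present else 0 for c in categories]


def encode_pair(x1, x2, categories=CATEGORIES):
    """Stack the indicator vectors of the two sets into one feature row."""
    # The first half describes x1 and the second half describes x2,
    # so the two sets stay conceptually separate.
    return multi_hot(x1, categories) + multi_hot(x2, categories)


X1 = {'a', 'b', 'c', 'd'}
X2 = {'e', 'd', 'b', 'f'}
row = encode_pair(X1, X2)
print(row)
```

Passing the same elements in a different order (a list such as
['d', 'c', 'b', 'a'] instead of X1) should give the same row, which is the
order-invariance I was after.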



On Fri, 14 Aug 2015 at 16:04 Andreas Mueller <t3k...@gmail.com> wrote:

> Why do you think one-hot will be an "explosion"?
> In your example, the vector would be length 8 (if there are values from a
> to f, that is, you gave the largest possible sets).
>
>
>
> On 08/14/2015 09:01 AM, federico vaggi wrote:
>
> Hi,
>
> Simple example:
>
> Let's say that I have a binary classification task, and my input vector
> consists of two separate sets of categorical variables - something like:
>
> X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}
>
> The order within the sets does not matter (obviously), but it matters that
> the elements of X1 are conceptually separate from those of X2.
>
> All the categorical variables come from the same set.
>
> Is there a clever encoding that:
>
> - Emphasizes that order within each set does not matter
> - Avoids explosion with one-hot encoding everything?
>
> Federico
>
>
> ------------------------------------------------------------------------------
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>