Originally, I thought I would need a vector of length n_categories per feature, so 8*8 in this case. I just realized, however, that since the order does not matter and I only want to indicate the presence or absence of each categorical value in a set, I can simply use two vectors of length n_categories, stacked together (so 2*8).
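For concreteness, here is a minimal sketch of that encoding in plain NumPy. It assumes a fixed vocabulary of 8 categories ('a' through 'h'), which is my guess at what the 8 refers to; the names CATEGORIES, multi_hot and encode_pair are made up for illustration and are not scikit-learn API.

```python
import numpy as np

# Assumption: a fixed vocabulary of 8 categories shared by both sets.
CATEGORIES = list("abcdefgh")
INDEX = {c: i for i, c in enumerate(CATEGORIES)}


def multi_hot(items, index=INDEX):
    """Return a 0/1 vector marking which categories are present in `items`."""
    vec = np.zeros(len(index), dtype=np.int8)
    for item in items:
        vec[index[item]] = 1
    return vec


def encode_pair(x1, x2):
    """Stack the indicator vectors of the two sets into one feature row."""
    return np.concatenate([multi_hot(x1), multi_hot(x2)])


X1 = {"a", "b", "c", "d"}
X2 = {"e", "d", "b", "f"}
row = encode_pair(X1, X2)  # length 2 * n_categories = 16
```

The first 8 entries describe X1 and the last 8 describe X2, so the two sets stay conceptually separate while the order within each set has no effect on the result.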
On Fri, 14 Aug 2015 at 16:04 Andreas Mueller <t3k...@gmail.com> wrote:

> Why do you think one-hot will be an "explosion"?
> In your example, the vector would be length 8 (if there are values from
> a to f, that is, you gave the largest possible sets).
>
> On 08/14/2015 09:01 AM, federico vaggi wrote:
>
> Hi,
>
> Simple example:
>
> Let's say that I have a binary classification task, and my input vector
> consists of two disjunct sets of categorical variables - something like:
>
> X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}
>
> The order within the sets does not matter (obviously), but it matters that
> the elements of X1 are conceptually separate from those of X2.
>
> All the categorical variables come from the same set.
>
> Is there a clever encoding that:
>
> - Emphasizes that order within each set does not matter
> - Avoids explosion with one-hot encoding everything?
>
> Federico
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general