Why do you think one-hot encoding will cause an "explosion"?
In your example, the vector would have length 12 (assuming the values
range only from a to f: 6 indicator columns for each of the two sets).
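A minimal sketch of that "multi-hot" encoding in plain Python, assuming the shared vocabulary is exactly 'a' to 'f' (VOCAB, multi_hot and encode_pair are hypothetical names for illustration). Each set becomes a 0/1 block over the vocabulary, which throws away order, and the two blocks are concatenated so X1 and X2 stay distinguishable. scikit-learn's sklearn.preprocessing.MultiLabelBinarizer builds the same kind of per-set indicator block from a list of sets.

```python
# Hypothetical shared vocabulary; assumed to be exactly 'a'..'f' here.
VOCAB = ["a", "b", "c", "d", "e", "f"]


def multi_hot(items, vocab=VOCAB):
    """Return a 0/1 list with a 1 for every vocab entry present in `items`.

    Only membership is recorded, so the order of `items` is irrelevant.
    """
    present = set(items)
    return [1 if v in present else 0 for v in vocab]


def encode_pair(x1, x2, vocab=VOCAB):
    """Concatenate one block per set so X1 and X2 remain separate features."""
    return multi_hot(x1, vocab) + multi_hot(x2, vocab)


row = encode_pair({"a", "b", "c", "d"}, {"e", "d", "b", "f"})
print(len(row))  # → 12
print(row)
```

The width is 2 × (vocabulary size) no matter how many elements each set holds, so it only grows with the number of distinct values.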
On 08/14/2015 09:01 AM, federico vaggi wrote:
Hi,
Simple example:
Let's say that I have a binary classification task, and my input
vector consists of two separate sets of categorical variables,
something like:
X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}
The order within the sets does not matter (obviously), but it matters
that the elements of X1 are conceptually separate from those of X2.
All the categorical variables come from the same set.
Is there a clever encoding that:
- Emphasizes that order within each set does not matter
- Avoids the dimensionality explosion of one-hot encoding everything?
Federico
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general