Hi,

Simple example:

Let's say that I have a binary classification task, and my input vector
consists of two separate sets of categorical variables, something like:

X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}

The order within the sets does not matter (obviously), but it matters that
the elements of X1 are conceptually separate from those of X2.

All the categorical values are drawn from the same vocabulary, so the same
value (e.g. 'b' or 'd') can appear in both sets.
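For concreteness, here is a minimal sketch of one common baseline for this setup. It is not necessarily the "clever" encoding being asked for. Each set is multi-hot encoded over the shared vocabulary, and the per-set blocks are then concatenated. Shuffling the elements inside a set leaves its block unchanged, and X1 and X2 occupy separate columns. The row width is (number of sets) x (vocabulary size) rather than growing with the number of possible combinations. The helper names `build_vocabulary` and `encode_sample` are hypothetical. Within scikit-learn, `MultiLabelBinarizer` with a fixed `classes` list can produce each per-set block in the same way.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(samples: Iterable[Sequence[Iterable[str]]]) -> List[str]:
    """Collect every category seen in any set of any sample (hypothetical helper).

    Sorting gives a stable column order across runs.
    """
    vocab = set()
    for sets in samples:
        for s in sets:
            vocab.update(s)
    return sorted(vocab)


def encode_sample(sets: Sequence[Iterable[str]], vocab: List[str]) -> List[int]:
    """Multi-hot encode each set over the shared vocabulary and concatenate the blocks.

    Block k is the indicator vector of set k, so X1 and X2 stay in separate
    columns while element order inside each set is ignored. Raises KeyError
    for a category that is not in `vocab` (hypothetical helper).
    """
    index = {cat: i for i, cat in enumerate(vocab)}
    row: List[int] = []
    for s in sets:
        block = [0] * len(vocab)
        for cat in s:
            block[index[cat]] = 1
        row.extend(block)
    return row


X1 = {'a', 'b', 'c', 'd'}
X2 = {'e', 'd', 'b', 'f'}
vocab = build_vocabulary([(X1, X2)])
row = encode_sample((X1, X2), vocab)
print(vocab)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(row)    # [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
```

Passing the same elements in a different order, e.g. `(['d', 'c', 'b', 'a'], ['f', 'e', 'd', 'b'])`, yields the same row.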

Is there a clever encoding that:

- Emphasizes that order within each set does not matter
- Avoids the dimensionality explosion of one-hot encoding everything?

Federico
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to