Hi,

A simple example: let's say that I have a binary classification task, and my input vector consists of two separate sets of categorical variables, something like:

X1 = {'a', 'b', 'c', 'd'} and X2 = {'e', 'd', 'b', 'f'}

The order within each set does not matter (obviously), but it does matter that the elements of X1 are kept conceptually separate from those of X2. All the categorical values come from the same underlying set.

Is there a clever encoding that:
- Emphasizes that order within each set does not matter
- Avoids the explosion from one-hot encoding everything?

Federico
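
To make the setup concrete, here is a minimal sketch of one possible baseline, not necessarily the clever encoding being asked for. It assumes the shared vocabulary is known up front and uses scikit-learn's MultiLabelBinarizer to build one multi-hot block per set, then concatenates the two blocks so that X1 and X2 stay separate. The toy samples and the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each sample carries two unordered sets
# whose elements come from the same vocabulary.
samples_x1 = [{'a', 'b', 'c', 'd'}, {'e', 'f'}]
samples_x2 = [{'e', 'd', 'b', 'f'}, {'a'}]

# Assumption: the shared vocabulary is known in advance.
vocabulary = ['a', 'b', 'c', 'd', 'e', 'f']

# One binarizer per set; fixing `classes` pins the column order.
mlb_x1 = MultiLabelBinarizer(classes=vocabulary)
mlb_x2 = MultiLabelBinarizer(classes=vocabulary)

# Each block has one column per vocabulary entry (1 = element present),
# so the ordering of elements inside a set cannot affect the result.
block_x1 = mlb_x1.fit_transform(samples_x1)
block_x2 = mlb_x2.fit_transform(samples_x2)

# Concatenating the blocks keeps X1 membership and X2 membership
# in different columns.
features = np.hstack([block_x1, block_x2])
print(features.shape)
```

The resulting width is twice the vocabulary size, independent of how many elements each set holds. For a large vocabulary, passing sparse_output=True to MultiLabelBinarizer (and stacking with scipy.sparse.hstack instead of np.hstack) should keep memory use down.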
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general