I have collected access logs that contain many entries from different IP addresses, but I don't know the best way to encode the IPs so that the training data stays small and the different IPs remain independent of each other.

1. One-hot encoding: with many distinct IPs, the training data will take up a huge amount of disk space.
2. Categorical encoding: each IP is mapped to an integer from 0 to N, but these integer codes cannot express the relationship between different IPs.

Does anyone have any advice?
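
To make the size concern concrete, here is a minimal sketch of two ways the IP column could be encoded without giving the IPs an artificial order. It is only an illustration under assumptions: the `ips` list is made-up sample data, and the hashed width of 1024 columns is an arbitrary choice. It relies on scikit-learn's `OneHotEncoder`, which returns a SciPy sparse matrix by default, and on `FeatureHasher`, which maps strings into a fixed number of columns.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample of IPs taken from access logs (not real data).
ips = ["10.0.0.1", "10.0.0.2", "192.168.1.7", "10.0.0.1"]

# One-hot encoding with the default sparse output: one column per
# distinct IP, but only one non-zero value is stored per row.
encoder = OneHotEncoder(handle_unknown="ignore")
X_onehot = encoder.fit_transform(np.array(ips).reshape(-1, 1))

# Feature hashing: the number of columns is fixed in advance (1024 is
# an assumed value), no matter how many distinct IPs appear. Different
# IPs can collide in the same column when the width is too small.
hasher = FeatureHasher(n_features=2**10, input_type="string")
X_hashed = hasher.transform([[ip] for ip in ips])

print(X_onehot.shape, X_onehot.nnz)
print(X_hashed.shape, X_hashed.nnz)
```

In both cases the stored size grows with the number of log rows rather than with rows times distinct IPs, and no IP is treated as larger or smaller than another. Whether the estimator being trained accepts sparse input still has to be checked separately.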
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
