I have collected data consisting of many access logs from different IP
addresses.

But I don't know the best way to encode the IPs so that the training data
stays small while the different IPs remain independent of each other.

1. One-hot encoding: with too many distinct IPs, the training data will
take up a huge amount of disk space.
2. Category encoding: each IP is encoded as an integer from 0 to N, but the
integers can't express the relationship between different IPs.
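
To make the two options concrete, here is a minimal sketch of roughly what
I mean. It assumes scikit-learn's OneHotEncoder and OrdinalEncoder; the
sample IPs and variable names are made up for illustration and are not my
real data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical sample: one IP address per access-log entry (not real data).
ips = np.array([["10.0.0.1"], ["10.0.0.2"], ["192.168.1.7"], ["10.0.0.1"]])

# Option 1: one-hot encoding, one column per distinct IP.
# By default the encoder returns a SciPy sparse matrix.
onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(ips)

# Option 2: category (ordinal) encoding, one integer code per distinct IP.
# The codes follow the sorted order of the IP strings, which is arbitrary.
ordinal = OrdinalEncoder().fit_transform(ips)

print(onehot.shape)     # (rows, number of distinct IPs)
print(ordinal.ravel())  # one code per row
```

With option 1 the number of columns grows with the number of distinct IPs;
with option 2 there is a single column, but the integer codes carry an
ordering that means nothing for IPs.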

Does anyone have any advice?
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
