Hey,

Apart from encoding, you could use feature engineering, for example IP geolocation lookups like this: https://ipgeolocation.io/documentation/ip-geolocation-api.html. Two IPs might share the same country but be in different cities, so you could mix and match whichever features you want.
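A related idea that needs no external service is to derive hierarchical features from the address itself, similar to country and then city. The sketch below is my own illustration (the function name `ip_prefix_features` is hypothetical). It uses only Python's standard `ipaddress` module and assumes IPv4 strings. IPs in the same /16 or /24 network get identical values for those features, so a model can see that they are related, which a plain 0..N label does not show.

```python
import ipaddress

def ip_prefix_features(ip: str) -> dict:
    """Hypothetical helper: hierarchical network-prefix features for an IPv4 address.

    Addresses in the same /8, /16 or /24 network share those feature values,
    so related IPs are no longer completely independent categories.
    """
    addr = ipaddress.IPv4Address(ip)
    return {
        # The enclosing network at three levels of granularity (coarse to fine)
        "net_8": str(ipaddress.IPv4Network(f"{addr}/8", strict=False)),
        "net_16": str(ipaddress.IPv4Network(f"{addr}/16", strict=False)),
        "net_24": str(ipaddress.IPv4Network(f"{addr}/24", strict=False)),
        # True for private/reserved ranges such as 10.0.0.0/8 or 192.168.0.0/16
        "is_private": addr.is_private,
    }

if __name__ == "__main__":
    for ip in ["192.168.1.17", "192.168.1.42", "10.0.3.5", "8.8.8.8"]:
        print(ip, ip_prefix_features(ip))
```

The resulting dicts could then be passed to scikit-learn's `DictVectorizer`, which one-hot encodes string values into a sparse matrix. Geolocation fields (country, city) from a lookup service could be added to the same dicts in the same way.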
Best,

On Fri, Aug 16, 2019 at 10:46 AM lampahome <pahome.c...@mirlab.org> wrote:
> I collect data which has many access logs from different IPs.
>
> But I don't know the best way to encode them so that the training data
> stays small while keeping different IPs independent.
>
> 1. One-hot encoding: with too many IPs, the training data will take up
> a huge amount of disk space.
> 2. Category encoding: IPs will be encoded as 0~N, but this can't show
> the relation between different IPs.
>
> Does anyone have advice?
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
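On point 1 of the quoted question (disk space with one-hot encoding): a one-hot row has a single 1, so stored sparsely it costs one column index per row no matter how many distinct IPs there are. Combined with the hashing trick, the number of columns is also fixed in advance. The sketch below is a stdlib-only illustration under my own assumptions (the names `hash_ip` and `encode_sparse` and the bucket count are hypothetical). In scikit-learn, `FeatureHasher` implements the hashing trick and `OneHotEncoder` returns a SciPy sparse matrix by default.

```python
import hashlib

def hash_ip(ip: str, n_buckets: int = 2**20) -> int:
    """Hypothetical helper: map an IP string to a column index in [0, n_buckets).

    MD5 is used only as a stable bucketing hash (not for security), so the
    mapping is identical across runs, unlike Python's salted built-in hash().
    Distinct IPs can collide in one bucket; a larger n_buckets makes it rarer.
    """
    digest = hashlib.md5(ip.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets

def encode_sparse(ips, n_buckets: int = 2**20) -> list:
    """Sparse hashed one-hot: store only the column of the single 1 in each row."""
    return [hash_ip(ip, n_buckets) for ip in ips]

if __name__ == "__main__":
    print(encode_sparse(["192.168.1.17", "10.0.3.5", "192.168.1.17"], n_buckets=1024))
```

Note that hashing addresses the size problem only. Like category encoding, it does not capture relations between IPs, since similar addresses land in unrelated buckets.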