Hey,

Apart from encoding, you could use feature engineering, for example deriving
location features from each IP with a geolocation service like this one:
https://ipgeolocation.io/documentation/ip-geolocation-api.html
Two IPs might share the same country but be in different cities, so you
could mix and match whichever of these features you want.
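Here is a minimal sketch of that idea with scikit-learn. It assumes you already have a country and a city for each IP. The lookup table below is hypothetical: the addresses come from the reserved documentation ranges and the locations are made up, standing in for whatever a geolocation service returns. Country and city each become their own one-hot feature. Two IPs in the same country therefore share the country column even when their city columns differ.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical geolocation results (made-up values), standing in for what a
# lookup service would return for each address.
geo = {
    "203.0.113.5": ("US", "Boston"),
    "203.0.113.9": ("US", "Chicago"),
    "198.51.100.7": ("DE", "Berlin"),
}

# IPs as they might appear in an access log (repeats allowed).
ips = ["203.0.113.5", "203.0.113.9", "198.51.100.7", "203.0.113.5"]

# Derive two categorical features per IP: column 0 = country, column 1 = city.
X = np.array([geo[ip] for ip in ips], dtype=object)

# One-hot encode each derived feature separately (sparse output by default).
enc = OneHotEncoder(handle_unknown="ignore")
X_enc = enc.fit_transform(X)
```

If city turns out to be too fine-grained, you can drop that column and keep only country without changing anything else.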

Best,

On Fri, Aug 16, 2019 at 10:46 AM lampahome <pahome.c...@mirlab.org> wrote:

> I collect data that contains many access logs from different IPs.
>
> But I don't know the best way to encode the IPs so that the training data
> stays small while the different IPs are kept independent of each other.
>
> 1. One-hot encoding: if there are too many IPs, the training data will
> take up a huge amount of disk space.
> 2. Category encoding: each IP is encoded as an integer from 0 to N, but
> this can't represent the relationship between different IPs.
>
> Does anyone have any advice?
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
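On the two options in the quoted question: scikit-learn's OneHotEncoder returns a SciPy sparse matrix by default, which stores only the non-zero entries (one per row here) rather than a dense table with one column per IP. OrdinalEncoder produces the 0..N-1 integer coding. Below is a minimal sketch of both on a few made-up addresses from the documentation ranges.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Example IP column from an access log (made-up documentation-range addresses).
ips = np.array(
    ["203.0.113.5", "203.0.113.9", "198.51.100.7", "203.0.113.5"]
).reshape(-1, 1)

# Option 1: one-hot. The default output is a SciPy sparse matrix, so only
# the non-zero entries are stored, not every zero in the N columns.
onehot = OneHotEncoder(handle_unknown="ignore")
X_onehot = onehot.fit_transform(ips)

# Option 2: ordinal ("category") encoding, a single column with values 0..N-1.
# The codes follow the sorted order of the strings and say nothing about
# how two IPs relate to each other.
ordinal = OrdinalEncoder()
X_ordinal = ordinal.fit_transform(ips)
```

If the encoded matrix has to be written to disk, `scipy.sparse.save_npz` keeps it in sparse form.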