Re: [scikit-learn] Loading file in libsvm format

klo uo Thu, 08 Sep 2016 11:47:19 -0700

Oh, I just figured it out: the number of features is the max value of term_id.
Sorry to disturb you ;)
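
For the archive, here is a minimal sketch of what was going on, using only
the three sample lines from my original message quoted below. The loader
sizes the matrix by the largest index it sees, not by the number of distinct
indices, so sparse term_ids give a very wide X. The second half shows one
possible way to squeeze the columns down to the terms that actually occur;
the names `sample`, `used`, `compact` and `X_small` are just mine for
illustration, and this is not necessarily the best way to do it.

```python
from io import BytesIO

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import load_svmlight_file

# The three sample rows from the original message (document_id, then
# term_id:count pairs). The real file is much larger.
sample = (
    b"6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1\n"
    b"2259 1669:1 5711528:6\n"
    b"2822 5765159:1\n"
)

# load_svmlight_file accepts a binary file-like object as well as a path.
X, y = load_svmlight_file(BytesIO(sample))

# The column count follows the largest term_id in the data,
# not the number of distinct term_ids.
print(X.shape)

# Assumption: we only care about terms that actually occur, so remap the
# column indices to a dense 0..n_terms-1 range. `used` keeps the mapping
# back to the original (shifted) column numbers.
used, compact = np.unique(X.indices, return_inverse=True)
X_small = csr_matrix(
    (X.data, compact, X.indptr), shape=(X.shape[0], used.size)
)
print(X_small.shape)
```

If the original term_ids are needed later, `used[j]` gives the column of X
that column j of `X_small` came from.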



Cheers


On Thu, Sep 8, 2016 at 8:40 PM, klo uo <[email protected]> wrote:

>
> ---------- Forwarded message ----------
> From: klo uo <[email protected]>
> Date: Thu, Sep 8, 2016 at 8:25 PM
> Subject: Loading file in libsvm format
> To: [email protected]
>
>
> Hi,
>
> I produced a file in libsvm format:
>
>     <label> <index1>:<value1> <index2>:<value2> ...
>
> with this content:
>
>     6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1
>     2259 1669:1 5711528:6
>     2822 5765159:1
>     ...
>
> The label is document_id, and index:value are term_id and term count.
>
> This file has 83K labels with 40K unique terms (and overall 1.2M
> index:value pairs).
>
> When I load this file in sklearn:
>
>     from sklearn.datasets import load_svmlight_file
>     X, y = load_svmlight_file('libsim.txt')
>
> I get X with shape (82448, 6092168).
>
> I don't see why I am getting 6M features.
> Can someone explain?
>
>
> Thanks
>
>
>
>

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
