Great, I will sort them.
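A minimal sketch of how each line could be sorted, assuming plain Python and whitespace-separated index:value tokens after the label. The function name sort_libsvm_line is hypothetical and not part of MLlib.

```python
def sort_libsvm_line(line):
    """Return the line with its index:value pairs sorted by ascending index."""
    parts = line.split()
    if not parts:
        return line  # leave blank lines untouched
    label, pairs = parts[0], parts[1:]
    # Sort numerically on the index before the colon, not lexicographically,
    # so that 82 comes before 200.
    pairs.sort(key=lambda p: int(p.split(":", 1)[0]))
    return " ".join([label] + pairs)


for raw in ["1 110:1.0 80:0.5 310:0.0",
            "0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0"]:
    print(sort_libsvm_line(raw))
```

Running this over every line of the file before loading it should give rows whose indices are in ascending order.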
Original message
From: Xiangrui Meng
Date: 10/21/2014 3:29 PM (GMT-08:00)
To: Sameer Tilak
Cc: user@spark.apache.org
Subject: Re: MLLib libsvm format
Yes. "where the indices are one-based and **in ascending order**". -Xiangrui
On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak wrote:
> Hi All,
>
> I have a question regarding the ordering of indices. The document says that
> the indices are one-based and in ascending order. However, do the
> indices within a row need to be sorted in ascending order?
>
>
>
>
> Sparse data
>
> It is very common in practice to have sparse training data. MLlib supports
> reading training examples stored in LIBSVM format, which is the default
> format used by LIBSVM and LIBLINEAR. It is a text format in which each line
> represents a labeled sparse feature vector using the following format:
>
> label index1:value1 index2:value2 ...
>
> where the indices are one-based and in ascending order. After loading, the
> feature indices are converted to zero-based.
>
>
>
> For example, I have indices ranging from 1 to 1000. Is this OK as a libsvm
> data file?
>
>
> 1 110:1.0 80:0.5 310:0.0
>
> 0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0
>
> and so on:
>
>
> OR do I need to sort them as:
>
>
> 1 80:0.5 110:1.0 310:0.0
>
> 0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5