Great, I will sort them.
-------- Original message --------
From: Xiangrui Meng <men...@gmail.com>
Date: 10/21/2014 3:29 PM (GMT-08:00)
To: Sameer Tilak <ssti...@live.com>
Cc: user@spark.apache.org
Subject: Re: MLLib libsvm format

Yes. "where the indices are one-based and **in ascending order**".

-Xiangrui

On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak <ssti...@live.com> wrote:
> Hi All,
>
> I have a question regarding the ordering of indices. The document says that
> the indices are one-based and in ascending order. However, do the
> indices within a row need to be sorted in ascending order?
>
> Sparse data
>
> It is very common in practice to have sparse training data. MLlib supports
> reading training examples stored in LIBSVM format, which is the default
> format used by LIBSVM and LIBLINEAR. It is a text format in which each line
> represents a labeled sparse feature vector using the following format:
>
> label index1:value1 index2:value2 ...
>
> where the indices are one-based and in ascending order. After loading, the
> feature indices are converted to zero-based.
>
> For example, I have indices ranging from 1 to 1000. Is this OK as a libsvm
> data file?
>
> 1 110:1.0 80:0.5 310:0.0
> 0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0
>
> and so on.
>
> Or do I need to sort them as:
>
> 1 80:0.5 110:1.0 310:0.0
> 0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5
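For anyone who finds this thread with unsorted files, here is a minimal Python sketch of the sorting step. It assumes the plain `label index:value ...` layout quoted above (no comments or extra fields such as `qid:`), and `sort_libsvm_line` is a hypothetical helper name, not part of MLlib or LIBSVM. Values are kept as their original strings so nothing is reformatted.

```python
def sort_libsvm_line(line: str) -> str:
    """Return a LIBSVM-format line with its index:value pairs sorted by index.

    Assumption: the line looks like "label index1:value1 index2:value2 ...",
    with one-based integer indices and no trailing comment or qid field.
    """
    tokens = line.split()
    if not tokens:
        return line.strip()  # blank line: nothing to sort
    label, pairs = tokens[0], tokens[1:]

    parsed = []
    for pair in pairs:
        index_str, value_str = pair.split(":", 1)
        index = int(index_str)
        if index < 1:
            raise ValueError(f"indices must be one-based, got {index}")
        parsed.append((index, value_str))

    # Sort by index only; the value string is carried along unchanged.
    parsed.sort(key=lambda p: p[0])

    # "Ascending order" also rules out repeated indices within one row.
    for (a, _), (b, _) in zip(parsed, parsed[1:]):
        if a == b:
            raise ValueError(f"duplicate index {a}")

    return " ".join([label] + [f"{i}:{v}" for i, v in parsed])


if __name__ == "__main__":
    for row in ["1 110:1.0 80:0.5 310:0.0",
                "0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0"]:
        print(sort_libsvm_line(row))
```

Run on the two example rows from the question, it prints exactly the sorted forms shown there (`1 80:0.5 110:1.0 310:0.0` and `0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5`). Rejecting duplicates is a design choice made here because a row with a repeated index cannot be in strictly ascending order. Apply the function line by line to rewrite a whole file before loading it.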