RE: MLLib libsvm format

2014-10-21 Thread Sameer Tilak
Great, I will sort them.




Re: MLLib libsvm format

2014-10-21 Thread Xiangrui Meng
Yes. "where the indices are one-based and **in ascending order**". -Xiangrui

On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak  wrote:
> Hi All,
>
> I have a question regarding the ordering of indices. The document says that
> the indices are one-based and in ascending order. However, do the
> indices within a row need to be sorted in ascending order?
>
>
>
>
> Sparse data
>
> It is very common in practice to have sparse training data. MLlib supports
> reading training examples stored in LIBSVM format, which is the default
> format used by LIBSVM and LIBLINEAR. It is a text format in which each line
> represents a labeled sparse feature vector using the following format:
>
> label index1:value1 index2:value2 ...
>
> where the indices are one-based and in ascending order. After loading, the
> feature indices are converted to zero-based.
>
>
>
> For example, I have indices ranging from 1 to 1000. Is this OK as a libsvm
> data file?
>
>
> 1 110:1.0   80:0.5   310:0.0
>
> 0 890:0.5  20:0.0   200:0.5   400:1.0  82:0.0
>
> and so on:
>
>
> OR do I need to sort them as:
>
>
> 1  80:0.5   110:1.0   310:0.0
>
> 0  20:0.0   82:0.0   200:0.5   400:1.0  890:0.5
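
For anyone who needs to fix up an existing file, here is a minimal sketch of sorting the index:value pairs on each row before loading. The helper name sort_libsvm_line and the sample rows (modelled on the ones in the question) are only illustrative, and the sketch assumes every row is a label followed by whitespace-separated index:value pairs with integer indices.

```python
def sort_libsvm_line(line):
    """Return a LIBSVM-style line with its index:value pairs sorted by index."""
    tokens = line.split()
    if not tokens:
        return line  # leave blank lines untouched
    label, pairs = tokens[0], tokens[1:]
    # Sort numerically by index; a plain string sort would put "110" before "80".
    pairs.sort(key=lambda p: int(p.split(":", 1)[0]))
    return " ".join([label] + pairs)


# Hypothetical unsorted rows, modelled on the example in the question.
rows = [
    "1 110:1.0 80:0.5 310:0.0",
    "0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0",
]
sorted_rows = [sort_libsvm_line(r) for r in rows]
for r in sorted_rows:
    print(r)
```

Once the rows are written back out in ascending index order, the file should be loadable in the usual way, e.g. with MLUtils.loadLibSVMFile(sc, path).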
