Thanks for asking! We should improve the documentation. The sample
dataset mimics the MNIST digits dataset, where the values are gray
levels (0-255). Dividing by 16 is meant to map them to 16 coarse bins
for the gray levels. There is a bug in the doc, though: we should
convert the values to integers before dividing by 16, so that the
division truncates to a bin index (0-15). As written, x / 16 on a
double stays continuous and nothing is discretized. I created
https://issues.apache.org/jira/browse/SPARK-7739 for this issue.
Patches are welcome :) Thanks!
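
To make the intended binning concrete, here is a minimal sketch in plain
Scala (no Spark dependency). The names GrayLevelBins, toBin and discretize
are made up for illustration and are not part of MLlib; it also assumes
every input is a gray level in the range 0-255.

```scala
// Illustrative sketch only: GrayLevelBins, toBin and discretize are
// hypothetical names, not MLlib APIs.
object GrayLevelBins {
  // Assumes x is a gray level in [0, 255].
  // Converting to Int first makes "/ 16" an integer division,
  // so the result is a bin index in 0..15 (returned as a Double).
  def toBin(x: Double): Double = (x.toInt / 16).toDouble

  // Apply the binning to every feature value.
  def discretize(features: Array[Double]): Array[Double] =
    features.map(toBin)

  def main(args: Array[String]): Unit = {
    val sample = Array(0.0, 15.0, 16.0, 100.0, 255.0)
    // prints: 0.0, 0.0, 1.0, 6.0, 15.0
    println(discretize(sample).mkString(", "))
  }
}
```

In the doc example, a function like toBin would take the place of
"x => x / 16" inside the map over lp.features.toArray. The exact form of
the fix that lands for SPARK-7739 may differ.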

Best,
Xiangrui

On Thu, May 7, 2015 at 9:20 PM, spark_user_2015 <[email protected]> wrote:
> The Spark documentation shows the following example code:
>
> // Discretize data in 16 equal bins since ChiSqSelector requires categorical features
> val discretizedData = data.map { lp =>
>   LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => x / 16 } ) )
> }
>
> I'm sort of missing why "x / 16" is considered a discretization approach
> here.
>
> [https://spark.apache.org/docs/latest/mllib-feature-extraction.html#feature-selection]
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Discretization-tp22811.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
