[ https://issues.apache.org/jira/browse/SPARK-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1212.
------------------------------------

    Resolution: Fixed

> Support sparse data in MLlib
> ----------------------------
>
>                 Key: SPARK-1212
>                 URL: https://issues.apache.org/jira/browse/SPARK-1212
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 0.9.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> MLlib's NaiveBayes, SGD, and KMeans accept RDD[LabeledPoint] for training and
> RDD[Array[Double]] for prediction, where LabeledPoint is a wrapper of
> (Double, Array[Double]). Dense Array[Double] storage performs well, but
> sparse data is common in practice. I created this JIRA to discuss the plan
> for adding sparse data support to MLlib and to track its progress.
> The goal is to support sparse data for training and prediction in all 
> existing algorithms in MLlib:
> * Gradient Descent
> * K-Means
> * Naive Bayes
> Previous discussions and pull requests:
> * https://github.com/mesos/spark/pull/736
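
To illustrate the dense-versus-sparse trade-off described in the issue, here is a minimal sketch in plain Scala. SparseVec, its fields, and its methods are hypothetical names chosen for this example; they are not taken from the issue and should not be read as the API that was merged for MLlib. The sketch assumes a sparse vector is stored as its dimension plus parallel arrays of nonzero indices and values, so a dot product with dense weights costs time proportional to the number of nonzeros rather than to the full dimension.

```scala
// Hypothetical sketch, not MLlib's API: only the nonzero entries are stored.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")
  require(indices.forall(i => i >= 0 && i < size), "index out of range")

  /** Dot product with a dense weight vector, visiting only the stored entries. */
  def dot(dense: Array[Double]): Double = {
    require(dense.length == size, "dimension mismatch")
    var sum = 0.0
    var k = 0
    while (k < indices.length) {
      sum += values(k) * dense(indices(k))
      k += 1
    }
    sum
  }

  /** Expand to the dense Array[Double] form that the issue describes. */
  def toDense: Array[Double] = {
    val out = new Array[Double](size) // initialised to 0.0
    var k = 0
    while (k < indices.length) {
      out(indices(k)) = values(k)
      k += 1
    }
    out
  }
}

object SparseVecExample {
  def main(args: Array[String]): Unit = {
    // A 6-dimensional feature vector with two nonzero entries.
    val x = SparseVec(6, Array(1, 4), Array(2.0, 3.0))
    val w = Array(0.5, 1.0, 0.0, 0.0, 2.0, 0.0)
    println(x.dot(w))                 // 2.0 * 1.0 + 3.0 * 2.0 = 8.0
    println(x.toDense.mkString(","))  // 0.0,2.0,0.0,0.0,3.0,0.0
  }
}
```

With this layout, a gradient or distance computation that only needs dot products never has to materialise the zeros, which is presumably where the savings for sparse training data would come from.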



--
This message was sent by Atlassian JIRA
(v6.2#6252)
