Hi,

I created a JIRA for discussion and progress tracking:
https://spark-project.atlassian.net/browse/MLLIB-18

Let us move our discussion there.

Best,
Xiangrui

On Wed, Feb 5, 2014 at 3:35 PM, Debasish Das <[email protected]> wrote:
> Hi Xiangrui,
>
> We are also adding support for sparse format in mllib...if you have a pull
> request or jira link could you please point to it? Jblas did not implement
> sparse formats the last time I looked at it, but colt had sparse formats
> which could be reused...
>
> Thanks.
> Deb
>
> On Jan 31, 2014 11:15 AM, "Xiangrui Meng" <[email protected]> wrote:
>>
>> Hi Jason,
>>
>> Sorry, I didn't see this message before I replied in another thread.
>> So the following is copy-and-paste:
>>
>> We are currently working on the sparse data support, one of the
>> highest priority features for MLlib. All existing algorithms will
>> support sparse input. We will open a JIRA ticket for progress tracking
>> and discussions.
>>
>> Best,
>> Xiangrui
>>
>> On Fri, Jan 31, 2014 at 10:49 AM, jshao <[email protected]> wrote:
>> > Hi,
>> >
>> > Spark is absolutely amazing for machine learning as its iterative
>> > process is super fast. However, one big issue that I realized was that
>> > the MLlib API isn't suitable for sparse inputs at all because it
>> > requires the feature vector to be a dense array.
>> >
>> > For example, I currently want to run a logistic regression on data
>> > that is wide and sparse (each data point might have 3 million fields
>> > with most of them being 0). It is impossible to represent each data
>> > point as an array of length 3 million.
>> >
>> > Can I expect/contribute to any changes that might handle sparse inputs?
>> >
>> > Thanks,
>> > Jason
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
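
For readers following the thread, here is a minimal sketch of the sparse representation being asked for, written in plain Python rather than against the MLlib API. The `SparseVector` class and `predict_proba` function are hypothetical names made up for this illustration, not anything from Spark. The idea is to store only the nonzero entries as parallel index/value lists, so a logistic regression margin costs time proportional to the number of nonzeros instead of the full 3 million fields.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SparseVector:
    """Hypothetical sparse feature vector: only nonzero entries are stored."""
    size: int             # logical dimension, e.g. 3 million
    indices: List[int]    # positions of the nonzero entries
    values: List[float]   # the nonzero values, aligned with `indices`

    def dot(self, weights: Sequence[float]) -> float:
        # Only the stored entries contribute; zeros are skipped entirely.
        return sum(v * weights[i] for i, v in zip(self.indices, self.values))


def predict_proba(x: SparseVector, weights: Sequence[float]) -> float:
    """Logistic regression probability for one sparse data point."""
    margin = x.dot(weights)
    return 1.0 / (1.0 + math.exp(-margin))


# One data point with 3 million fields, of which only three are nonzero.
x = SparseVector(
    size=3_000_000,
    indices=[7, 1_000_000, 2_999_999],
    values=[1.0, 2.0, 0.5],
)

# A single dense weight vector is affordable; per-point dense arrays are not.
weights = [0.0] * x.size
weights[7] = 0.5
weights[1_000_000] = -0.25

p = predict_proba(x, weights)
```

Each data point here holds three indices and three values, while the model keeps one dense weight vector shared across all points. Whether MLlib ends up with this layout is up to the JIRA discussion above.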
