Hi Xiangrui,

We are also adding support for sparse formats in MLlib. If you have a pull request or JIRA link, could you please point me to it? jblas did not implement sparse formats the last time I looked at it, but Colt had sparse formats which could be reused.
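To make the storage idea concrete, here is a rough sketch in plain Python. It is not MLlib, Colt, or jblas code: the `SparseVector` class and the `predict` function are hypothetical names chosen for illustration. It assumes a data point is stored as parallel arrays of non-zero indices and values, and that the logistic regression weights stay dense.

```python
import math


class SparseVector:
    """Hypothetical sparse vector that stores only the non-zero entries."""

    def __init__(self, size, indices, values):
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        self.size = size              # logical dimension, e.g. 3 million
        self.indices = list(indices)  # positions of the non-zero entries
        self.values = list(values)    # the non-zero entries themselves

    def dot(self, dense_weights):
        """Dot product with a dense weight vector, touching only non-zeros."""
        return sum(v * dense_weights[i]
                   for i, v in zip(self.indices, self.values))


def predict(weights, x):
    """Logistic regression probability for one sparse data point."""
    return 1.0 / (1.0 + math.exp(-x.dot(weights)))


# A data point with 3 million fields, only three of them non-zero.
x = SparseVector(3_000_000, [7, 1024, 2_999_999], [1.0, 0.5, 2.0])
```

With this layout the cost of a prediction depends on the number of non-zero fields rather than on the full dimension, which is the property the wide and sparse use case needs.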
Thanks.
Deb

On Jan 31, 2014 11:15 AM, "Xiangrui Meng" <[email protected]> wrote:

> Hi Jason,
>
> Sorry, I didn't see this message before I replied in another thread.
> So the following is copy-and-paste:
>
> We are currently working on the sparse data support, one of the
> highest priority features for MLlib. All existing algorithms will
> support sparse input. We will open a JIRA ticket for progress tracking
> and discussions.
>
> Best,
> Xiangrui
>
> On Fri, Jan 31, 2014 at 10:49 AM, jshao <[email protected]> wrote:
> > Hi,
> >
> > Spark is absolutely amazing for machine learning as its iterative
> > process is super fast. However one big issue that I realized was that
> > the MLLib API isn't suitable for sparse inputs at all because it
> > requires the feature vector to be a dense array.
> >
> > For example, I currently want to run a logistic regression on data
> > that is wide and sparse (each data point might have 3 million fields
> > with most of them being 0). It is impossible to represent each data
> > point as an array of length 3 million.
> >
> > Can I expect/contribute to any changes that might handle sparse inputs?
> >
> > Thanks,
> > Jason
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
