Hi Xiangrui,

We are also adding support for sparse formats in MLlib. If you have a pull
request or JIRA link, could you please point me to it? jblas did not
implement sparse formats the last time I looked at it, but Colt had sparse
formats that could be reused.
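
As a rough illustration of the kind of sparse format under discussion, here
is a minimal sketch in plain Python. It is not MLlib's, jblas's, or Colt's
API; the names SparseVector and predict_logistic are hypothetical. It assumes
a feature vector is stored as parallel lists of nonzero indices and values,
so a point with 3 million fields and a handful of nonzeros costs only a
handful of entries.

```python
from math import exp


class SparseVector:
    """Hypothetical sparse vector: keeps only the nonzero entries."""

    def __init__(self, size, indices, values):
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        self.size = size              # logical length, e.g. 3 million
        self.indices = list(indices)  # positions of the nonzero entries
        self.values = list(values)    # the nonzero entries themselves

    def dot(self, weights):
        """Dot product with an indexable weight vector of length `size`."""
        # Only the stored nonzeros contribute, so the cost is
        # O(number of nonzeros) rather than O(size).
        return sum(v * weights[i] for i, v in zip(self.indices, self.values))


def predict_logistic(x, weights, intercept=0.0):
    """Logistic regression probability for one sparse data point."""
    margin = x.dot(weights) + intercept
    return 1.0 / (1.0 + exp(-margin))
```

The weight vector stays dense in this sketch because there is one weight per
feature; only the per-point feature vectors are sparse.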

Thanks.
Deb
 On Jan 31, 2014 11:15 AM, "Xiangrui Meng" <[email protected]> wrote:

> Hi Jason,
>
> Sorry, I didn't see this message before I replied in another thread.
> So the following is copy-and-paste:
>
> We are currently working on sparse data support, one of the
> highest-priority features for MLlib. All existing algorithms will
> support sparse input. We will open a JIRA ticket for progress tracking
> and discussion.
>
> Best,
> Xiangrui
>
> On Fri, Jan 31, 2014 at 10:49 AM, jshao <[email protected]> wrote:
> > Hi,
> >
> > Spark is absolutely amazing for machine learning, as its iterative
> > processing is super fast. However, one big issue I noticed is that the
> > MLlib API isn't suitable for sparse inputs at all, because it requires
> > the feature vector to be a dense array.
> >
> > For example, I currently want to run a logistic regression on data that
> > is wide and sparse (each data point might have 3 million fields, with
> > most of them being 0). It is impractical to represent each data point as
> > an array of length 3 million.
> >
> > Can I expect/contribute to any changes that might handle sparse inputs?
> >
> > Thanks,
> > Jason
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
