Hi,

I created a JIRA ticket for discussion and progress tracking:

https://spark-project.atlassian.net/browse/MLLIB-18

Let us move our discussion there.

Best,
Xiangrui

On Wed, Feb 5, 2014 at 3:35 PM, Debasish Das <[email protected]> wrote:
> Hi Xiangrui,
>
> We are also adding support for sparse formats in MLlib. If you have a pull
> request or JIRA link, could you please point me to it? The last time I
> looked, jblas did not implement sparse formats, but Colt had sparse formats
> that could be reused.
>
> Thanks.
> Deb
>
> On Jan 31, 2014 11:15 AM, "Xiangrui Meng" <[email protected]> wrote:
>>
>> Hi Jason,
>>
>> Sorry, I didn't see this message before I replied in another thread.
>> So the following is copy-and-paste:
>>
>> We are currently working on sparse data support, one of the
>> highest-priority features for MLlib. All existing algorithms will
>> support sparse input. We will open a JIRA ticket for progress tracking
>> and discussion.
>>
>> Best,
>> Xiangrui
>>
>> On Fri, Jan 31, 2014 at 10:49 AM, jshao <[email protected]> wrote:
>> > Hi,
>> >
>> > Spark is absolutely amazing for machine learning as its iterative
>> > process is super fast. However, one big issue I noticed is that the
>> > MLlib API isn't suitable for sparse inputs at all, because it requires
>> > the feature vector to be a dense array.
>> >
>> > For example, I currently want to run a logistic regression on data that
>> > is wide and sparse (each data point might have 3 million fields with
>> > most of them being 0). It is impossible to represent each data point as
>> > an array of length 3 million.
>> >
>> > Can I expect/contribute to any changes that might handle sparse inputs?
>> >
>> > Thanks,
>> > Jason
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
