Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
ping @yanboliang, please have a look at this improvement.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
Hi, @MLnick
Firstly, `sliceAggregate` is a common aggregate for array-like data.
Besides the `MultivariateOnlineSummarizer` case, it can be used in many large
machine learning cases. I chose
Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
Hi, @hhbyyh
In our experiment, the class **_MultivariateOnlineSummarizer_** contains 8
arrays; if the dimension reaches 20 million, the memory of
MultivariateOnlineSummarizer is 1280M
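
The 1280M figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes 8 bytes per array element (as for JVM `double` or `long` arrays), ignores object headers, and reads "M" as 10^6 bytes. The function name is hypothetical and only for illustration.

```python
# Rough memory estimate for one summarizer instance.
# Assumption: every array element takes 8 bytes; object overhead is ignored.
NUM_ARRAYS = 8
DIMENSION = 20_000_000
BYTES_PER_ELEMENT = 8


def summarizer_megabytes(num_arrays, dimension, bytes_per_element=8):
    """Approximate size in MB (10**6 bytes) of the arrays held by one summarizer."""
    return num_arrays * dimension * bytes_per_element / 10**6


print(summarizer_megabytes(NUM_ARRAYS, DIMENSION, BYTES_PER_ELEMENT))  # 1280.0
```

Under these assumptions, 8 arrays x 20,000,000 elements x 8 bytes comes to 1,280,000,000 bytes, which matches the number quoted above.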
Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/17000
Hi, @MLnick
You are right, sliceAggregate splits an array into smaller chunks before
shuffle.
It has three advantages
Firstly, it shuffles less data than treeAggregate during the
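
To illustrate the idea of splitting an array into smaller chunks before the shuffle, here is a minimal single-process sketch in plain Python. It is not the code from the pull request: `slice_array` and `slice_aggregate` are hypothetical names, the partitions are simulated as plain lists of equal length, and element-wise sum stands in for the combine function.

```python
def slice_array(values, num_slices):
    """Split a list into num_slices contiguous chunks of near-equal length."""
    size, extra = divmod(len(values), num_slices)
    chunks, start = [], 0
    for i in range(num_slices):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def slice_aggregate(partition_arrays, num_slices):
    """Element-wise sum of per-partition arrays, merged slice by slice."""
    # Each simulated partition slices its local array before "shuffling".
    sliced = [slice_array(arr, num_slices) for arr in partition_arrays]
    merged = []
    for i in range(num_slices):
        # Stands in for the reducer that receives slice i from every partition.
        pieces = [s[i] for s in sliced]
        merged.append([sum(column) for column in zip(*pieces)])
    # Concatenate the merged slices back into one full-length array.
    return [x for chunk in merged for x in chunk]


print(slice_aggregate([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]], 2))
# [11, 22, 33, 44, 55]
```

In this sketch each reducer only ever handles one slice of the array, so no single step has to hold and merge all full-length arrays at once.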
GitHub user ZunwenYou opened a pull request:
https://github.com/apache/spark/pull/17000
[SPARK-18946][ML] sliceAggregate which is a new aggregate operator for
high-dimensional data
In many machine learning cases, the driver has to aggregate high-dimensional
vectors/arrays from
Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/14473
@MLnick You are right. We have applied ADMM to Sparse Logistic Regression
with L1 norm in some CTR applications. The data sets of these applications
consist of almost 10 million dimensions and 100
Github user ZunwenYou commented on the issue:
https://github.com/apache/spark/pull/14473
@MLnick please have a look at this.
GitHub user ZunwenYou opened a pull request:
https://github.com/apache/spark/pull/14473
[SPARK-16495] Add ADMM optimizer in mllib package
Alternating Direction Method of Multipliers (ADMM) is well suited to
distributed convex optimization, and in particular to large-scale problems