[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-03-12 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/17000 ping @yanboliang, please have a look at this improvement. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-21 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/17000 Hi, @MLnick Firstly, `sliceAggregate` is a common aggregate for array-like data. Besides the `MultivariateOnlineSummarizer` case, it can be used in many large machine learning cases. I chose

[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-20 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/17000 Hi, @hhbyyh In our experiment, the class **_MultivariateOnlineSummarizer_** contains 8 arrays; if the dimension reaches 20 million, the memory of MultivariateOnlineSummarizer is 1280M
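
The 1280M figure quoted above can be checked with a short calculation. This is only a sketch: it assumes each of the 8 arrays stores one 8-byte value per dimension and that "M" means 10^6 bytes; neither assumption is stated in the snippet, and the function name is hypothetical.

```python
def summarizer_bytes(num_arrays, dimension, bytes_per_element=8):
    """Approximate payload size of num_arrays dense arrays of the given dimension.

    bytes_per_element=8 is an assumption (one 64-bit value per entry);
    object headers and other JVM overhead are ignored.
    """
    return num_arrays * dimension * bytes_per_element


total_bytes = summarizer_bytes(num_arrays=8, dimension=20_000_000)
total_megabytes = total_bytes / 1_000_000  # decimal megabytes
print(total_megabytes)
```

Under these assumptions the result matches the 1280M reported in the comment.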

[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-20 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/17000 Hi, @MLnick You are right, sliceAggregate splits an array into smaller chunks before shuffle. It has three advantages. Firstly, the shuffle data is less than treeAggregate during the
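
The idea described in this comment, splitting each array into smaller chunks before the shuffle, can be illustrated without Spark. The following is a minimal single-process sketch, not the code from the pull request: the function names `slice_array` and `slice_aggregate` are hypothetical, element-wise summation stands in for the real combine step, and the "shuffle" is simulated with a dictionary keyed by slice index.

```python
from collections import defaultdict


def slice_array(values, num_slices):
    """Split values into num_slices contiguous chunks, each tagged with its slice index."""
    size = -(-len(values) // num_slices)  # ceiling division
    return [(i, values[i * size:(i + 1) * size]) for i in range(num_slices)]


def slice_aggregate(partition_arrays, num_slices):
    """Sum equally sized arrays element-wise by reducing each slice independently."""
    # Simulated shuffle: chunks with the same slice index land in the same bucket.
    buckets = defaultdict(list)
    for array in partition_arrays:
        for index, chunk in slice_array(array, num_slices):
            buckets[index].append(chunk)

    # Each bucket is reduced on its own, so no single reducer sees a full array.
    reduced = {
        index: [sum(column) for column in zip(*chunks)]
        for index, chunks in buckets.items()
    }

    # The final step concatenates the reduced slices in index order.
    result = []
    for index in range(num_slices):
        result.extend(reduced.get(index, []))
    return result


partitions = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
combined = slice_aggregate(partitions, num_slices=2)
print(combined)
```

In this sketch every reducer handles only one slice of the vector, which is the property the comment contrasts with treeAggregate, where whole arrays are passed between stages.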

[GitHub] spark pull request #17000: [SPARK-18946][ML] sliceAggregate which is a new a...

2017-02-20 Thread ZunwenYou
GitHub user ZunwenYou opened a pull request: https://github.com/apache/spark/pull/17000 [SPARK-18946][ML] sliceAggregate which is a new aggregate operator for high-dimensional data In many machine learning cases, the driver has to aggregate high-dimensional vectors/arrays from

[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package

2016-08-05 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/14473 @MLnick You are right. We have applied ADMM to sparse logistic regression with L1 norm in some CTR applications; the data sets of these applications mostly consist of 10 million dimensions and 100

[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package

2016-08-03 Thread ZunwenYou
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/14473 @MLnick please have a look at this.

[GitHub] spark pull request #14473: [SPARK-16495] Add ADMM optimizer in mllib package

2016-08-02 Thread ZunwenYou
GitHub user ZunwenYou opened a pull request: https://github.com/apache/spark/pull/14473 [SPARK-16495] Add ADMM optimizer in mllib package Alternating Direction Method of Multipliers (ADMM) is well suited to distributed convex optimization, and in particular to large-scale problems