[GitHub] spark issue #19536: [SPARK-6685][ML]Use DSYRK to compute AtA in ALS

2017-10-19 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/19536 Wow, thank you for reopening. LOL @mpjlu

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-26 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @srowen I am not really familiar with MiMa, so what should I do now? Or should I just go back to [the previous commit](a6b5a16cd78e4efe99fda40f92592c9712b04146) and create a JIRA for the issue

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-25 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @srowen It seems the MiMa test still fails when putting the new Param at the end of the train method. :(

[GitHub] spark pull request #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in AL...

2016-10-25 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/13891#discussion_r84948911 --- Diff: project/MimaExcludes.scala --- @@ -864,6 +864,9 @@ object MimaExcludes { // [SPARK-12221] Add CPU time to metrics

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-24 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @mengxr @srowen @yanboliang A threshold param is added for unit tests. Does it look okay now?

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-21 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @mengxr I see. I will add a param for it. :)

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-20 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @yanboliang So sorry for my late response. Some regression performance test results: Datasets: using [genExplicitTestData](https://github.com/apache/spark/pull/13891/files#diff

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-09-21 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @yanboliang sorry, I'm on a business trip and will upload the test results ASAP.

[GitHub] spark issue #14640: [SPARK-17055] [MLLIB] add labelKFold to CrossValidator

2016-08-23 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14640 This work may be similar to [SPARK-8971](https://github.com/apache/spark/pull/14321), which is another variation of KFold and is quite valuable in some cases. I suppose it is okay to add
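For context, the labelKFold idea (scikit-learn calls it GroupKFold) forms folds so that every row sharing a label/group lands in one fold, and a group therefore never appears in both the training and validation splits. A minimal, hypothetical Scala sketch of the fold-assignment step -- not the CrossValidator API proposed in either PR:

```scala
import scala.util.Random

object GroupKFoldSketch {
  /** Assign each distinct group to one of numFolds folds (illustrative only). */
  def assignFolds(groups: Seq[String], numFolds: Int, seed: Long = 42L): Map[String, Int] = {
    val rng = new Random(seed)
    // Shuffle the distinct groups, then deal them out round-robin across folds,
    // so every row with the same group ends up in the same fold.
    rng.shuffle(groups.distinct.toList)
      .zipWithIndex
      .map { case (group, i) => group -> (i % numFolds) }
      .toMap
  }
}
```

Fold k's validation set is then every row whose group maps to k; the remaining rows form the training set.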

[GitHub] spark issue #14738: [SPARK-17090][FOLLOW-UP][ML]Add expert param support to ...

2016-08-22 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14738 Fixed. Thanks for the reviews.

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75689943 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75689230 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14738: [SPARK-17090][MINOR][ML]Add expert param support ...

2016-08-22 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14738#discussion_r75634235 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -167,11 +173,11 @@ private[shared] object

[GitHub] spark issue #14738: [SPARK-17090][MINOR][ML]Add expert param support to Shar...

2016-08-22 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14738 It is linked to SPARK-17090 now. @srowen

[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...

2016-08-21 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14738 Thanks for sethan's comments :)

[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...

2016-08-21 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14738 @srowen this is about a @group name called expertParam, just a part of [SPARK-17090](https://github.com/apache/spark/pull/14717). SPARK-17175 is for an expert formula which was discussed

[GitHub] spark pull request #14738: [MINOR][ML]Add expert param support to SharedPara...

2016-08-21 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14738#discussion_r75589699 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -153,6 +154,11 @@ private[shared] object

[GitHub] spark pull request #14738: [MINOR][ML]Add expert param support to SharedPara...

2016-08-20 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/14738 [MINOR][ML]Add expert param support to SharedParamsCodeGen ## What changes were proposed in this pull request? Add expert param support to SharedParamsCodeGen where aggregationDepth
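For readers outside the thread: SharedParamsCodeGen emits the shared `Has*` traits under ml.param.shared, and this change lets a param be emitted under the `expertParam` doc group. A hand-written approximation of what such a generated trait looks like for aggregationDepth (names assumed; the real code-generated trait in sharedParams.scala is `private[ml]` and may differ):

```scala
import org.apache.spark.ml.param.{IntParam, ParamValidators, Params}

// Illustration only: the real trait is produced by SharedParamsCodeGen.
trait HasAggregationDepthSketch extends Params {

  /**
   * Param for suggested depth for treeAggregate (>= 2).
   * @group expertParam
   */
  final val aggregationDepth: IntParam = new IntParam(this, "aggregationDepth",
    "suggested depth for treeAggregate (>= 2)", ParamValidators.gtEq(2))

  setDefault(aggregationDepth -> 2)

  /** @group expertGetParam */
  final def getAggregationDepth: Int = $(aggregationDepth)
}
```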

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588359 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params

[GitHub] spark issue #14717: [SPARK-17090][ML]Make tree aggregation level in linear/l...

2016-08-20 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14717 Thanks for the reviews :)

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588057 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75577571 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -172,6 +173,17 @@ class LinearRegression @Since("
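Once merged, the expert param touched in this diff would be set from user code roughly as follows (spark-shell style; assuming the setter is named setAggregationDepth, per the JIRA title):

```scala
import org.apache.spark.ml.regression.LinearRegression

val lr = new LinearRegression()
  .setMaxIter(50)
  .setAggregationDepth(3)  // deeper aggregation tree for heavily partitioned data
```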

[GitHub] spark pull request #14717: [WIP][SPARK-17090][ML]Make tree aggregation level...

2016-08-19 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75566293 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1331,8 +1343,8 @@ private class LogisticCostFun

[GitHub] spark pull request #14717: [WIP][SPARK-17090][ML]Make tree aggregation level...

2016-08-19 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75512159 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params

[GitHub] spark pull request #14717: [WIP][SPARK-17090][ML]Make tree aggregation level...

2016-08-19 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75510309 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -48,7 +48,7 @@ import

[GitHub] spark pull request #14717: [WIP][SPARK-17090][ML]Make tree aggregation level...

2016-08-19 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75509839 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -256,6 +256,15 @@ class LogisticRegression @Since

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-19 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/14717 [SPARK-17090][ML]Make tree aggregation level in linear/logistic regression configurable ## What changes were proposed in this pull request? Linear/logistic regression use treeAggregate
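The mechanism behind the PR is the existing `depth` argument of RDD.treeAggregate (default 2); the change threads a user-settable aggregationDepth param down to that call. A small self-contained sketch of the call whose depth is being made configurable (toy sum instead of a gradient aggregate):

```scala
import org.apache.spark.sql.SparkSession

object TreeAggregateDepthSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("depth-sketch").getOrCreate()
    // Many small partitions make the final driver-side reduce expensive at depth 2.
    val data = spark.sparkContext.parallelize(1L to 1000000L, numSlices = 200)

    val sum = data.treeAggregate(0L)(
      seqOp = (acc, x) => acc + x,   // fold values within each partition
      combOp = (a, b) => a + b,      // merge partial results
      depth = 3)                     // extra executor-side aggregation round

    println(sum)
    spark.stop()
  }
}
```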

[GitHub] spark issue #14449: [SPARK-16843][MLLIB] add the percentage ChiSquareSelecto...

2016-08-04 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14449 Percentage is a useful addition to ChiSquareSelector; it is a common and intuitive param for data scientists and statisticians, as scikit-learn shows, but it may not be worth a whole separate API
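To make the percentage idea concrete, here is a hypothetical sketch built on the existing `Statistics.chiSqTest` API: rank features by their chi-squared statistic and keep the top fraction, analogous to scikit-learn's SelectPercentile. It is not the API this PR proposes for ChiSqSelector.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

object ChiSqPercentileSketch {
  /** Indices of the top `percentile` fraction of features by chi-squared statistic. */
  def selectTopPercent(data: RDD[LabeledPoint], percentile: Double): Array[Int] = {
    require(percentile > 0.0 && percentile <= 1.0, "percentile must be in (0, 1]")
    val tests = Statistics.chiSqTest(data)                // one ChiSqTestResult per feature
    val numToKeep = math.max(1, (tests.length * percentile).toInt)
    tests.zipWithIndex
      .sortBy { case (result, _) => -result.statistic }   // strongest association first
      .take(numToKeep)
      .map { case (_, featureIndex) => featureIndex }
      .sorted
  }
}
```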

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-08-03 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 cc @mengxr @yanboliang Was this patch okay?

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-28 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 I set the stack size to 128 according to some more test results, where 128 may be a conservative size. However, this change will bypass existing unit tests, as `doStack` is always `false`

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-28 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @srowen Oops~ The `copyToTri()` is indeed a little different in the test code. I changed it to: ``` private def copyToTri(): Unit = { var i = 0 var j = 0

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-27 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @mengxr this is a simple imitation of the loop that ALS uses in `computeFactors[ID]()`. It runs on a bare-metal node with 4 cores. All tests use all cores via multiple RDD partitions.

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-27 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 code for testing ``` def run(rank: Int, a:Int) = { println(s"blas.getclass() = ${blas.getClass.toString} on process $rank") val m = 1 << a
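A self-contained version of the kind of micro-benchmark being described -- m rank-1 `dspr` updates against a single rank-m `dsyrk` call -- might look like this (hypothetical names and sizes; it assumes netlib-java's BLAS on the classpath, as Spark used at the time):

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}
import scala.util.Random

object AtABenchSketch {
  def main(args: Array[String]): Unit = {
    val rank = 100                 // factor dimension
    val m = 1 << 14                // rows, i.e. "number of users for each product"
    val rows = Array.fill(m * rank)(Random.nextDouble())   // rank x m, column-major
    val packed = new Array[Double](rank * (rank + 1) / 2)
    val full = new Array[Double](rank * rank)
    val row = new Array[Double](rank)

    var t0 = System.nanoTime()
    var i = 0
    while (i < m) {                // m rank-1 updates of the packed upper triangle
      System.arraycopy(rows, i * rank, row, 0, rank)
      blas.dspr("U", rank, 1.0, row, 1, packed)
      i += 1
    }
    println(s"dspr loop:  ${(System.nanoTime() - t0) / 1e6} ms")

    t0 = System.nanoTime()         // one rank-m update of the full upper triangle
    blas.dsyrk("U", "N", rank, m, 1.0, rows, rank, 0.0, full, rank)
    println(s"dsyrk call: ${(System.nanoTime() - t0) / 1e6} ms")
  }
}
```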

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-24 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 @mengxr Do you mean to test only `add()` and `addStack()`, without ALS?

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-06-24 Thread hqzizania
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891 This is a prototype. Actually, the critical question is whether it will be faster = =! I have done a simple test; the effect depends on the "number of users for each product". The "number of users f

[GitHub] spark pull request #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in AL...

2016-06-24 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/13891 [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-6685 This is to switch DSPR
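For readers skimming the thread: ALS accumulates each least-squares normal equation AtA with one `dspr` (Level-2, rank-1 update) per rating; the proposal is to buffer rows and flush them with a single `dsyrk` (Level-3, rank-k update), then copy the result back into packed triangular storage. A minimal sketch under those assumptions -- the `add`/`addStack`/`copyToTri` names and the stack size of 128 come from the discussion, while the class layout here is illustrative rather than the PR's actual code:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Illustrative accumulator for AtA (upper triangle, packed column-major),
// contrasting the rank-1 dspr path with a buffered rank-k dsyrk path.
class NormalEquationSketch(val k: Int) {
  private val stackSize = 128                          // the "conservative" size from the thread
  val ata = new Array[Double](k * (k + 1) / 2)         // packed upper-triangular AtA
  private val buf = new Array[Double](k * stackSize)   // stacked rows, column-major k x stackSize
  private val full = new Array[Double](k * k)          // scratch full matrix for dsyrk output
  private var stacked = 0

  /** Level-2 path: one rank-1 update per row (the original ALS behaviour). */
  def add(a: Array[Double]): Unit = blas.dspr("U", k, 1.0, a, 1, ata)

  /** Level-3 path: buffer rows and flush them with a single dsyrk call. */
  def addStack(a: Array[Double]): Unit = {
    System.arraycopy(a, 0, buf, stacked * k, k)
    stacked += 1
    if (stacked == stackSize) flush()
  }

  /** full := buf * buf^T (upper triangle only), then fold it into the packed ata. */
  def flush(): Unit = if (stacked > 0) {
    blas.dsyrk("U", "N", k, stacked, 1.0, buf, k, 0.0, full, k)
    copyToTri()
    stacked = 0
  }

  /** Add the upper triangle of the full k x k scratch matrix into packed storage. */
  private def copyToTri(): Unit = {
    var idx = 0
    var j = 0
    while (j < k) {
      var i = 0
      while (i <= j) {
        ata(idx) += full(i + j * k)
        idx += 1
        i += 1
      }
      j += 1
    }
  }
}
```

A caller would invoke flush() once more after the final row so that a partially filled buffer is still folded into ata.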

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-09 Thread hqzizania
Github user hqzizania commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110270315 @davies :)

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-08 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31965760 --- Diff: R/pkg/R/serialize.R --- @@ -37,24 +37,38 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-08 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31965494 --- Diff: R/pkg/R/serialize.R --- @@ -37,24 +37,38 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-08 Thread hqzizania
Github user hqzizania commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-110228092 @shivaram oops, I haven't fixed the nit davies pointed out.

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-08 Thread hqzizania
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/6190#discussion_r31980891 --- Diff: R/pkg/R/serialize.R --- @@ -37,6 +37,14 @@ writeObject <- function(con, object, writeType = TRUE) { # passing in vectors as arrays

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-08 Thread hqzizania
GitHub user hqzizania reopened a pull request: https://github.com/apache/spark/pull/6190 [SPARK-6820][SPARKR]Convert NAs to null type in SparkR DataFrames

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-06 Thread hqzizania
Github user hqzizania closed the pull request at: https://github.com/apache/spark/pull/6190

[GitHub] spark pull request: Convert NAs to null type in SparkR DataFrames

2015-05-15 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/6190 Convert NAs to null type in SparkR DataFrames

[GitHub] spark pull request: [SPARK-7226][SparkR]Support math functions in ...

2015-05-14 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/6170 [SPARK-7226][SparkR]Support math functions in R DataFrame

[GitHub] spark pull request: [SPARK-6824] Fill the docs for DataFrame API i...

2015-05-07 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/5969 [SPARK-6824] Fill the docs for DataFrame API in SparkR This patch also removes the RDD docs from being built as part of roxygen, simply by deleting the `'` of `#'`.

[GitHub] spark pull request: [SPARK-6824] Fill the docs for DataFrame API i...

2015-05-07 Thread hqzizania
Github user hqzizania closed the pull request at: https://github.com/apache/spark/pull/5969

[GitHub] spark pull request: [SPARK-6824] Fill the docs for DataFrame API i...

2015-05-07 Thread hqzizania
Github user hqzizania commented on the pull request: https://github.com/apache/spark/pull/5969#issuecomment-100058948 @shivaram I've removed the docs about broadcast and context, and checked the Rd files against the ones in NAMESPACE. But I wonder why length(rdd) exists in the DataFrame export

[GitHub] spark pull request: [SPARK-6841] [SPARKR] add support for mean, me...

2015-05-05 Thread hqzizania
Github user hqzizania closed the pull request at: https://github.com/apache/spark/pull/5446

[GitHub] spark pull request: [SPARK-6841] [SPARKR] add support for mean, me...

2015-05-05 Thread hqzizania
GitHub user hqzizania reopened a pull request: https://github.com/apache/spark/pull/5446 [SPARK-6841] [SPARKR] add support for mean, median, stdev etc. Moving here from https://github.com/amplab-extras/SparkR-pkg/pull/241 sum() has been implemented. (https://github.com/amplab

[GitHub] spark pull request: [SPARK-6841] [SPARKR] add support for mean, me...

2015-05-05 Thread hqzizania
Github user hqzizania commented on the pull request: https://github.com/apache/spark/pull/5446#issuecomment-99115759 Implemented describe() as a DataFrame API. We could add this to the RDD API in the future if we find a need. Thus, some functions also don't need to be implemented

[GitHub] spark pull request: [SPARK-6841] [SPARKR] add support for mean, me...

2015-05-05 Thread hqzizania
Github user hqzizania closed the pull request at: https://github.com/apache/spark/pull/5446

[GitHub] spark pull request: [SPARKR-92] Phase 2: implement sum(rdd)

2015-04-04 Thread hqzizania
Github user hqzizania closed the pull request at: https://github.com/apache/spark/pull/5360

[GitHub] spark pull request: [SPARKR-92] Phase 2: implement sum(rdd)

2015-04-04 Thread hqzizania
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/5360 [SPARKR-92] Phase 2: implement sum(rdd)