Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/19536
Wow, thank you for reopening. LOL @mpjlu
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@srowen I'm really not familiar with MiMa, so what should I do now? Or should I just go back to [the previous commit](a6b5a16cd78e4efe99fda40f92592c9712b04146) and create a JIRA for the issue?
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@srowen It seems the MiMa test still fails even when the new Param is put at the end of the train method. :(
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/13891#discussion_r84948911
--- Diff: project/MimaExcludes.scala ---
@@ -864,6 +864,9 @@ object MimaExcludes {
// [SPARK-12221] Add CPU time to metrics
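For context (the diff is truncated above), entries in MimaExcludes.scala acknowledge known binary-incompatible changes via ProblemFilters rules. A representative entry follows; the problem type and target method are illustrative, not this PR's actual rule:
```scala
import com.typesafe.tools.mima.core._

// Appended under the relevant JIRA comment in MimaExcludes.scala;
// the filter target below is illustrative only.
Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.ml.recommendation.ALS.train")
)
```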
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@mengxr @srowen @yanboliang A threshold param is added for unit tests.
Does it look okay now?
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@mengxr I see. I will add a param for it. :)
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@yanboliang So sorry for my late response.
Here are some regression performance test results.
Datasets: generated using
[genExplicitTestData](https://github.com/apache/spark/pull/13891/files#diff
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@yanboliang Sorry, I'm on a business trip and will upload the test results
ASAP.
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14640
This work may be similar to
[SPARK-8971](https://github.com/apache/spark/pull/14321), which is another
variation of KFold and very significant in some cases. I suppose it is okay
to add
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14738
Fixed. Thanks for the reviews.
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75689943
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75689230
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14738#discussion_r75634235
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala ---
@@ -167,11 +173,11 @@ private[shared] object
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14738
It is linked to SPARK-17090 now. @srowen
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14738
Thanks for @sethah's comments :)
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14738
@srowen This is about a @group name called expertParam, just a part of
[SPARK-17090](https://github.com/apache/spark/pull/14717). SPARK-17175 is for an
expert formula, which was discussed
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14738#discussion_r75589699
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala ---
@@ -153,6 +154,11 @@ private[shared] object
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/14738
[MINOR][ML]Add expert param support to SharedParamsCodeGen
## What changes were proposed in this pull request?
Add expert param support to SharedParamsCodeGen where aggregationDepth
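The description is truncated above; the gist is that the descriptor used by the code generator gains an expert flag. A minimal sketch of the idea, with illustrative names rather than the PR's exact code:
```scala
// Hedged sketch: a boolean flag on the param descriptor selects which
// Scaladoc @group tags the generator emits for the param and its getter.
case class ParamDesc(name: String, doc: String, isExpertParam: Boolean = false) {
  def groupTag: String       = if (isExpertParam) "expertParam" else "param"
  def getterGroupTag: String = if (isExpertParam) "expertGetParam" else "getParam"
}

// aggregationDepth would then be declared as an expert shared param:
val desc = ParamDesc("aggregationDepth",
  "suggested depth for treeAggregate (>= 2)", isExpertParam = true)
```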
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75588359
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params
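Only the hunk header survives above; the hunk adds a new shared trait after HasSolver. A sketch of what the generated expert param trait plausibly looks like, with details approximated from the aggregationDepth param this PR series introduces:
```scala
import org.apache.spark.ml.param.{IntParam, ParamValidators, Params}

// Approximation of the generated trait; the real file is produced by
// SharedParamsCodeGen rather than written by hand.
private[ml] trait HasAggregationDepth extends Params {

  /** Param for suggested depth for treeAggregate (>= 2). @group expertParam */
  final val aggregationDepth: IntParam = new IntParam(this, "aggregationDepth",
    "suggested depth for treeAggregate (>= 2)", ParamValidators.gtEq(2))

  setDefault(aggregationDepth -> 2)

  /** @group expertGetParam */
  final def getAggregationDepth: Int = $(aggregationDepth)
}
```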
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14717
Thanks for the reviews :)
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75588057
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75577571
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -172,6 +173,17 @@ class LinearRegression @Since("
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75566293
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1331,8 +1343,8 @@ private class LogisticCostFun
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75512159
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75510309
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -48,7 +48,7 @@ import
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/14717#discussion_r75509839
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -256,6 +256,15 @@ class LogisticRegression @Since
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/14717
[SPARK-17090][ML]Make tree aggregation level in linear/logistic regression
configurable
## What changes were proposed in this pull request?
Linear/logistic regression use treeAggregate
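The description cuts off above; the mechanism is RDD.treeAggregate, whose depth had been left at its default of 2. A minimal sketch of threading a configurable depth through an aggregation pass (the summation stands in for the real gradient/loss computation):
```scala
import org.apache.spark.rdd.RDD

// Stand-in for the gradient/loss pass; the PR-relevant part is that `depth`
// comes from the new aggregationDepth param instead of the hard-coded default.
def aggregateLoss(data: RDD[Double], aggregationDepth: Int): Double =
  data.treeAggregate(0.0)(
    seqOp = (acc, x) => acc + x, // combine within a partition
    combOp = _ + _,              // merge partial results across partitions
    depth = aggregationDepth     // deeper trees reduce pressure on the driver
  )
```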
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/14449
Percentage is a useful addition to ChiSqSelector; it is a common and
intuitive param for data scientists and statisticians, as in scikit-learn, but it
may not be worth a whole separate API
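A hedged sketch of the percentile-to-count mapping the comment has in mind, mirroring scikit-learn's SelectPercentile (the helper name is hypothetical):
```scala
// Map a percentile of features onto a concrete numTopFeatures value.
def numTopFeaturesFromPercentile(numFeatures: Int, percentile: Double): Int = {
  require(percentile > 0 && percentile <= 100, "percentile must be in (0, 100]")
  math.max(1, math.ceil(numFeatures * percentile / 100.0).toInt)
}
```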
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
cc @mengxr @yanboliang Does this patch look okay?
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
I set the stack size to 128 based on some more test results; 128 may be a
conservative size. However, this change bypasses the existing unit tests,
as `doStack` is always `false`
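A hedged reconstruction of the buffering scheme this comment describes; only the names `addStack`/`doStack` and the size 128 come from the comments, the rest is guessed (the flush would issue the DSYRK call shown in the sketch after the PR description below):
```scala
// Buffer up to stackSize rows, then flush them with one BLAS-3 call
// instead of one rank-1 DSPR update per row.
class StackedAtA(k: Int, stackSize: Int = 128) {
  private val buf = new Array[Double](k * stackSize) // column-major row buffer
  private var n = 0

  def addStack(row: Array[Double]): Unit = {
    System.arraycopy(row, 0, buf, n * k, k)
    n += 1
    if (n == stackSize) flush() // one blocked update per full buffer
  }

  def flush(): Unit = {
    // issue one DSYRK over the n buffered rows here, then reset
    n = 0
  }
}
```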
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@srowen Oops~ The `copyToTri()` is indeed a little different in the test
code. I changed it to:
```
private def copyToTri(): Unit = {
  var i = 0
  var j = 0
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@mengxr This is a simple imitation of the loop that ALS uses in
`computeFactors[ID]()`. It runs on a bare-metal node with 4 cores. All tests
use all cores via multiple RDD partitions.
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
Code for testing:
```
def run(rank: Int, a: Int) = {
  println(s"blas.getClass = ${blas.getClass.toString} on process $rank")
  val m = 1 << a
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
@mengxr Do you mean to test only `add()` and `addStack()`, without ALS?
Github user hqzizania commented on the issue:
https://github.com/apache/spark/pull/13891
This is a prototype. Actually, the critical question is whether it will be faster = =!
I have done a simple test; the effect depends on the "number of users for each
product". The "number of user f
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/13891
[SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-6685
This is to switch DSPR
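The description is truncated at "switch DSPR"; per the title, the change replaces per-row DSPR rank-1 updates with blocked DSYRK calls when accumulating AtA for the normal equations. A hedged sketch of the two paths, assuming netlib-java's BLAS wrapper (function names are illustrative, not the PR's code):
```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// DSPR path: one rank-1 update per row into a packed upper triangle.
def ataViaDspr(rows: Iterator[Array[Double]], k: Int): Array[Double] = {
  val ata = new Array[Double](k * (k + 1) / 2)
  rows.foreach(row => blas.dspr("U", k, 1.0, row, 1, ata))
  ata
}

// DSYRK path: one BLAS-3 call over a block of numRows stacked rows
// (column-major, k x numRows), computing C := A * A^T in one shot.
def ataViaDsyrk(block: Array[Double], numRows: Int, k: Int): Array[Double] = {
  val ata = new Array[Double](k * k)
  blas.dsyrk("U", "N", k, numRows, 1.0, block, k, 0.0, ata, k)
  ata
}
```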
Github user hqzizania commented on the pull request:
https://github.com/apache/spark/pull/6190#issuecomment-110270315
@davies :)
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/6190#discussion_r31965760
--- Diff: R/pkg/R/serialize.R ---
@@ -37,24 +37,38 @@ writeObject <- function(con, object, writeType = TRUE) {
# passing in vectors as arrays
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/6190#discussion_r31965494
--- Diff: R/pkg/R/serialize.R ---
@@ -37,24 +37,38 @@ writeObject <- function(con, object, writeType = TRUE) {
# passing in vectors as arrays
Github user hqzizania commented on the pull request:
https://github.com/apache/spark/pull/6190#issuecomment-110228092
@shivaram Oops, I haven't fixed the nit davies pointed out.
Github user hqzizania commented on a diff in the pull request:
https://github.com/apache/spark/pull/6190#discussion_r31980891
--- Diff: R/pkg/R/serialize.R ---
@@ -37,6 +37,14 @@ writeObject <- function(con, object, writeType = TRUE) {
# passing in vectors as arrays
GitHub user hqzizania reopened a pull request:
https://github.com/apache/spark/pull/6190
[SPARK-6820][SPARKR]Convert NAs to null type in SparkR DataFrames
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hqzizania/spark R
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/6190
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/6190
Convert NAs to null type in SparkR DataFrames
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hqzizania/spark R
Alternatively you can review
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/6170
[SPARK-7226][SparkR]Support math functions in R DataFrame
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hqzizania/spark master
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/5969
[SPARK-6824] Fill the docs for DataFrame API in SparkR
This patch also removes the RDD docs from being built as part of roxygen,
simply by deleting the "'" from "#'".
You can merge
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/5969
Github user hqzizania commented on the pull request:
https://github.com/apache/spark/pull/5969#issuecomment-100058948
@shivaram I've removed the docs about broadcast and context, and checked the Rd files
against the ones in NAMESPACE. But I wonder why length(rdd) exists in the DataFrame export
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/5446
GitHub user hqzizania reopened a pull request:
https://github.com/apache/spark/pull/5446
[SPARK-6841] [SPARKR] add support for mean, median, stdev etc.
Moving here from https://github.com/amplab-extras/SparkR-pkg/pull/241
sum() has been implemented.
(https://github.com/amplab
Github user hqzizania commented on the pull request:
https://github.com/apache/spark/pull/5446#issuecomment-99115759
describe() is implemented as a DataFrame API. We could add it to the RDD API
in the future if we find a need. Thus, some functions also don't need to be
implemented
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/5446
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/5360
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/5360
[SPARKR-92] Phase 2: implement sum(rdd)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hqzizania/spark R3
Alternatively you can review