Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17704446
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709067
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709059
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709070
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709063
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709076
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -36,9 +37,42 @@ trait Matrix extends Serializable {
/** Converts to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709072
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -36,9 +37,42 @@ trait Matrix extends Serializable {
/** Converts to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709058
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709060
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709065
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709081
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709077
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -36,9 +37,42 @@ trait Matrix extends Serializable {
/** Converts to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709069
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709101
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709089
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -83,6 +219,24 @@ object Matrices
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709102
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709086
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709099
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709094
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala
---
@@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709082
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709106
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709079
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709096
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala
---
@@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709085
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709088
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -59,11 +93,113 @@ trait Matrix extends Serializable {
*/
class
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709092
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -93,9 +247,84 @@ object Matrices {
require(dm.majorStride
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709103
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709097
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709104
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2294#discussion_r17709093
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala ---
@@ -126,4 +126,116 @@ class BLASSuite extends FunSuite
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2313#issuecomment-56135525
@JoshRosen PySpark/MLlib requires NumPy to run, and I don't think we
claimed that we support different versions of NumPy.
`sample()` in core is different.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2455#discussion_r17769391
--- Diff:
core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala ---
@@ -43,66 +46,218 @@ trait RandomSampler[T, U] extends Pseudorandom with
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2294#issuecomment-56136224
LGTM. I'm merging this into master. (We might need to make slight changes
to some methods before the 1.2 release, but let's not block the multi-model
training
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56136476
test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2423#issuecomment-56136584
@OdinLin Thanks for catching the bug! As @davies mentioned, #2378 will
completely replace the current SerDe. Could you close this PR?
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2419#issuecomment-56136714
@derrickburns I cannot see the Jenkins log. Let's call Jenkins again.
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2455#issuecomment-56144570
add to whitelist
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2455#issuecomment-56144582
this is ok to test
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56147622
@davies Does `PickleSerializer` compress data? If not, maybe we should
cache the deserialized RDD instead of the one from `_.reserialize`. They have
the same storage. I
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2419#issuecomment-56235934
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2451#issuecomment-56239321
@brkyvz Let's try to split this PR into small ones. For example, functions
like factory methods for sparse matrices should not be included in this PR. We
want to kee
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56241679
@davies LGTM except for a few linear algebra operators and caching. But those are
orthogonal to this PR. I'm merging this and we will update the linear algebra
ops
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56242298
Merged. Thanks a lot!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2459#issuecomment-56397445
LGTM. Merged into master and branch-1.1. Thanks!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920651
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -123,7 +138,18 @@ private object IDF {
val inv = new Array[Double](n
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920664
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920659
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920640
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -30,9 +30,20 @@ import org.apache.spark.rdd.RDD
* Inverse document
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920639
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -30,9 +30,20 @@ import org.apache.spark.rdd.RDD
* Inverse document
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920668
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2494#discussion_r17920646
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -60,13 +72,16 @@ class IDF {
private object IDF {
/** Document
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2494#issuecomment-56590081
test this please
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943398
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -17,20 +17,21 @@
package
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943404
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -228,4 +253,23 @@ object DecisionTreeRunner
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943413
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943415
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943420
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -649,71 +542,65 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943417
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -649,71 +542,65 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943410
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943429
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943437
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943424
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943435
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943442
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943446
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943444
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943438
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943451
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala
---
@@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943440
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943453
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala
---
@@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943471
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
---
@@ -128,13 +139,34 @@ private[tree] object
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943458
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala
---
@@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943476
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/RandomForestModel.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943493
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943501
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala ---
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943480
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/RandomForestModel.scala
---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2435#discussion_r17943465
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala
---
@@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2494#issuecomment-56602599
@rnowling let's retry :)
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37776246
@pwendell @mridulm , RDD.sliding is a public method in this PR. If we don't
want users to treat it as a cheap operation, how about moving it to a separate
RDDFunc
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/88#issuecomment-37790090
@rezazadeh U, Sigma, and V are all stored in DenseMatrix format in the
DenseMatrixSVD class. For tall-and-skinny PCA/SVD, U should use RDD for
storage. However, Sigma and V
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/88#discussion_r10647310
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/88#discussion_r10647325
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/88#discussion_r10647435
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/79#issuecomment-37839557
Jenkins, retest this please.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/79#issuecomment-37840191
@manishamde Please let me know if this is ready for another pass. Thanks!
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/161
[SPARK-1260]: faster construction of features with intercept
The current implementation uses `Array(1.0, features: _*)` to construct a
new array with intercept. This is not efficient for big arrays
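
As a rough illustration of the point above (not the code from the pull request), one way to avoid the varargs call is to allocate the target array once and bulk-copy the features into it. The names `InterceptSketch` and `prependOne` are hypothetical and chosen only for this sketch.

```scala
// Hypothetical sketch, not taken from the PR: prepend an intercept term of 1.0
// to a feature array without going through the varargs constructor.
object InterceptSketch {
  def prependOne(features: Array[Double]): Array[Double] = {
    // Allocate the result once, one slot larger than the input.
    val out = new Array[Double](features.length + 1)
    out(0) = 1.0
    // Bulk-copy the features into positions 1..n.
    System.arraycopy(features, 0, out, 1, features.length)
    out
  }

  def main(args: Array[String]): Unit = {
    val withIntercept = prependOne(Array(2.0, 3.0))
    println(withIntercept.mkString(", "))
  }
}
```

`System.arraycopy` works directly on the primitive `Array[Double]`, so the copy is a single bulk operation rather than an element-by-element pass over a wrapped sequence.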
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37887863
@mateiz I don't see the bugs you mentioned. compute() checks parent
partitions to assemble the tail to append. I think the approach you suggested
is the same as in th
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/165
[SPARK-1266] persist factors in implicit ALS
In implicit ALS computation, the user or product factor is used twice in
each iteration. Caching can certainly help accelerate the computation. I saw
the
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/165#issuecomment-37899861
@MLnick I saw you implemented the first version of implicit ALS. Do you
have time to review this PR? Thanks!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690775
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690798
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690809
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690831
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690859
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10690953
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37900586
I see the quadratic storage and this is why I didn't use it in the PR. I
will use the implementation in this PR, but move it to MLlib and mark it
private for interna
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/117#issuecomment-37900634
@dlwh Thanks! Did you have a chance to cut a minor release?
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/165#issuecomment-37902246
Thanks a lot!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10691823
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10692101
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software