[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17704446 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709067 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709059 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709070 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709063 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709076 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -36,9 +37,42 @@ trait Matrix extends Serializable { /** Converts to

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709072 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -36,9 +37,42 @@ trait Matrix extends Serializable { /** Converts to

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709058 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709060 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709065 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709081 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -36,9 +37,42 @@ trait Matrix extends Serializable { /** Converts to

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709069 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,368 @@ private[mllib] object BLAS extends Serializable

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709101 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709089 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -83,6 +219,24 @@ object Matrices

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709102 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709086 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709099 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709094 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala --- @@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709082 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709106 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709079 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709096 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala --- @@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709085 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709088 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -59,11 +93,113 @@ trait Matrix extends Serializable { */ class

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709092 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -93,9 +247,84 @@ object Matrices { require(dm.majorStride

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709103 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709097 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709104 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala --- @@ -36,4 +36,79 @@ class MatricesSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2294#discussion_r17709093 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala --- @@ -126,4 +126,116 @@ class BLASSuite extends FunSuite

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56135525 @JoshRosen PySpark/MLlib requires NumPy to run, and I don't think we claimed that we support different versions of NumPy. `sample()` in core is different.

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r17769391 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -43,66 +46,218 @@ trait RandomSampler[T, U] extends Pseudorandom with

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-56136224 LGTM. I'm merging this into master. (We might need to make slight changes to some methods before the 1.2 release, but let's not block the multi-model training

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56136476 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request: [MLLIB] fix a unresolved reference variable 'n...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2423#issuecomment-56136584 @OdinLin Thanks for catching the bug! As @davies mentioned, #2378 will completely replace the current SerDe. Could you close this PR?

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56136714 @derrickburns I cannot see the Jenkins log. Let's call Jenkins again. test this please

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-56144570 add to whitelist

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-56144582 this is ok to test

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56147622 @davies Does `PickleSerializer` compress data? If not, maybe we should cache the deserialized RDD instead of the one from `_.reserialize`. They have the same storage. I

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56235934 test this please

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56239321 @brkyvz Let's try to split this PR into small ones. For example, functions like factory methods for sparse matrices should not be included in this PR. We want to kee

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56241679 @davies LGTM except for a few linear algebra operators and caching. But those are orthogonal to this PR. I'm merging this and we will update the linear algebra ops

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56242298 Merged. Thanks a lot!

[GitHub] spark pull request: [MLLib] Fix example code variable name misspel...

2014-09-22 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2459#issuecomment-56397445 LGTM. Merged into master and branch-1.1. Thanks!

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920651 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -123,7 +138,18 @@ private object IDF { val inv = new Array[Double](n

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920664 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala --- @@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920659 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala --- @@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920640 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -30,9 +30,20 @@ import org.apache.spark.rdd.RDD * Inverse document

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920639 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -30,9 +30,20 @@ import org.apache.spark.rdd.RDD * Inverse document

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920668 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala --- @@ -54,4 +54,38 @@ class IDFSuite extends FunSuite with LocalSparkContext

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2494#discussion_r17920646 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -60,13 +72,16 @@ class IDF { private object IDF { /** Document

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2494#issuecomment-56590081 test this please

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943398 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -17,20 +17,21 @@ package

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943404 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -228,4 +253,23 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943413 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943415 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943420 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -649,71 +542,65 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943417 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -649,71 +542,65 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943410 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -582,42 +472,36 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943429 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943424 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943435 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943442 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943446 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943444 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943438 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943451 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943440 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943453 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943471 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala --- @@ -128,13 +139,34 @@ private[tree] object

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943458 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943476 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/RandomForestModel.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943493 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala --- @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943501 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala --- @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943480 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/RandomForestModel.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-23 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17943465 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2494#issuecomment-56602599 @rnowling let's retry :) test this please

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37776246 @pwendell @mridulm, RDD.sliding is a public method in this PR. If we don't want users to treat it as a cheap operation, how about moving it to a separate RDDFunc

[GitHub] spark pull request: Principal Component Analysis

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-37790090 @rezazadeh U, Sigma, and V are all stored in DenseMatrix format in the DenseMatrixSVD class. For tall-and-skinny PCA/SVD, U should use RDD for storage. However, Sigma and V

[GitHub] spark pull request: Principal Component Analysis

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10647310 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10647325 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10647435 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-37839557 Jenkins, retest this please.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-37840191 @manishamde Please let me know if this is ready for another pass. Thanks!

[GitHub] spark pull request: [SPARK-1260]: faster construction of features ...

2014-03-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/161 [SPARK-1260]: faster construction of features with intercept The current implementation uses `Array(1.0, features: _*)` to construct a new array with intercept. This is not efficient for big arrays
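The PR description above is cut off, so the actual change is not visible here. As a rough illustration only, one common way to avoid the varargs expansion in `Array(1.0, features: _*)` is to allocate the target array once and copy the features into it. The object and method names below (`InterceptSketch`, `prependOne`) are hypothetical and are not taken from the PR.

```scala
object InterceptSketch {
  // Hypothetical helper (not from the PR): returns a new array holding
  // 1.0 followed by all elements of `features`.
  def prependOne(features: Array[Double]): Array[Double] = {
    // Allocate the result once, one slot larger than the input.
    val out = new Array[Double](features.length + 1)
    out(0) = 1.0
    // Bulk-copy the features into positions 1..n without boxing
    // or going through a varargs sequence.
    System.arraycopy(features, 0, out, 1, features.length)
    out
  }
}
```

A call such as `InterceptSketch.prependOne(Array(2.0, 3.0))` yields an array of length 3 whose first element is the intercept term.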

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37887863 @mateiz I don't see the bugs you mentioned. compute() checks parent partitions to assemble the tail to append. I think the approach you suggested is the same as in th

[GitHub] spark pull request: [SPARK-1266] persist factors in implicit ALS

2014-03-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/165 [SPARK-1266] persist factors in implicit ALS In implicit ALS computation, the user or product factor is used twice in each iteration. Caching can certainly help accelerate the computation. I saw the

[GitHub] spark pull request: [SPARK-1266] persist factors in implicit ALS

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/165#issuecomment-37899861 @MLnick I saw you implemented the first version of implicit ALS. Do you have time to review this PR? Thanks!

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690775 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690798 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690809 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690831 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690859 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10690953 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37900586 I see the quadratic storage and this is why I didn't use it in the PR. I will use the implementation in this PR, but move it to MLlib and mark it private for interna

[GitHub] spark pull request: [MLLIB-18] [WIP] Adding sparse data support an...

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/117#issuecomment-37900634 @dlwh Thanks! Did you have a chance to cut a minor release?

[GitHub] spark pull request: [SPARK-1266] persist factors in implicit ALS

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/165#issuecomment-37902246 Thanks a lot!

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10691823 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10692101 --- Diff: mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software
