[GitHub] spark pull request: [SPARK-4494][mllib] IDFModel.transform() add s...

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3603#issuecomment-66520117 @yu-iskw Thanks for the updates! Found 1 typo, but other than that, LGTM @mengxr Perhaps you can weigh in on the Python API change before this is committed

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66529072 @akopich Thanks for the updates. It looks like rebasing did not work correctly (looking at the 10K+ lines in this PR!). It should be possible to fix with rebase

[GitHub] spark pull request: [SPARK-4821] [mllib] [python] [docs] Fix for p...

2014-12-10 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/3669 [SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc + small doc edit + include edit to make IntelliJ happy CC: @davies @mengxr Note to @davies -- this does

[GitHub] spark pull request: [SPARK-4821] [mllib] [python] [docs] Fix for p...

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3669#issuecomment-66537698 CC: @pwendell This fixes a problem with missing documentation in the current build (and in branch-1.2). --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-4821] [mllib] [python] [docs] Fix for p...

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3669#issuecomment-66543782 @davies Thanks very much! Updated. I made a separate JIRA for fixing Python doc annotations: [https://issues.apache.org/jira/browse/SPARK-4822]

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655815 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655804 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655807 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655817 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655820 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655821 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655814 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655813 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655818 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655828 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655811 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655837 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655833 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655827 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655832 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655824 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655802 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655806 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655839 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximizationSuite.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655816 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655840 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximizationSuite.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655831 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655822 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655836 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655819 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21655830 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-66563244 @tgaloppo Thanks very much for the PR, and sincere apologies for the slow response about it! @manishamde was right about people being too preoccupied with the 1.2

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-66563419 @tgaloppo Let me know if you have questions, and also when I should make another pass over this PR---thanks again!

[GitHub] spark pull request: [SPARK-4736][mllib] [random forest] functions ...

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3583#issuecomment-66563683 @dikejiang Great, thanks!

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21695326 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21724909 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala --- @@ -138,6 +138,45 @@ class GradientDescentSuite extends

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21724910 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -182,34 +203,46 @@ object GradientDescent extends Logging

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21724912 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -77,6 +80,14 @@ class GradientDescent private[mllib

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21724908 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala --- @@ -138,6 +138,45 @@ class GradientDescentSuite extends

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66721864 @Lewuathe Thanks for the updates! I just added a few last comments (which should be the last).

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66723145 @Lewuathe Sorry---one more request. Could you actually use convergenceTol instead of convergenceTolerance in order to fit with the public API in LBFGS? Thanks

[GitHub] spark pull request: [SPARK-4736][mllib] [random forest] functions ...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3583#issuecomment-67049092 @dikejiang Apologies--I think I was not clear. I was recommending that you change this PR to implement predictRaw(), rather than predictWithWeight(). Does

[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-67070947 @Lewuathe Once the scala style is fixed (dev/scalastyle), this should be ready.

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21859900 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21859898 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67072947 @tgaloppo Thanks for the updates! You did exactly what I had in mind for MultivariateGaussian; thanks. My main comments now are still about style. I realize

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67073269 Oh, also, IntelliJ 13 does a pretty good job with the indentation, if you're using it. You can run sbt/sbt gen-idea to create project files before opening the Spark

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860601 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860758 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860755 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860757 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21860764 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21863319 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21863325 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r2186 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21863318 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21863326 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21863337 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67079749 @viirya Thanks for the updates! I made some inline comments, one of them major. Please let me know when to check again.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67083325 @akopich Thanks for the updates! (Much easier to see the diff now) The decision about setters vs. constructor arguments was from [this JIRA (design doc

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-67083539 The test logs have expired...rerunning

[GitHub] spark pull request: [SPARK-4822] Use sphinx tags for Python doc an...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3685#issuecomment-67083774 @Lewuathe Thanks for the PR! As long as you're fixing those, would you mind fixing the 2 DeveloperApi tags in pyspark/mllib/feature.py and the 2 WARN tags in pyspark

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67087866 I just found out (hearsay) that Accumulator may incur a big performance penalty relative to methods like RDD.aggregate(). There have also been some bugs found

[GitHub] spark pull request: SPARK-4547 [MLLIB] [WIP] OOM when making bins ...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67089687 @srowen +1 for this functionality. It sounds handy for experts and necessary for beginner users. What do you think of using ```numBins``` instead

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-67089972 It looks like the error is: ``` [error] /home/jenkins/workspace/NewSparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/ann/ANNSuite.scala:21

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-67091270 Sorry for the slow response; testing now, but it will take a bit longer to finish

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-67092852 Yes, Jenkins will test against the master branch, so I'd recommend merging with master (or rebasing if the merge is messy).

[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3679#issuecomment-67093985 @jbencook I think that JIRA was supposed to be closed; I'll fix that. But adding some Python tests will be good---I'll take a look! Btw, the link

[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3679#issuecomment-67094622 I made a new JIRA for the unit tests. Could you please swap the JIRA tag for this one? [https://issues.apache.org/jira/browse/SPARK-4855] Thanks!

[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3679#issuecomment-67095129 I would recommend not testing for invalid input in stat.py as long as it is tested on the Scala side in ChiSqTest.scala. It will be faster to only test once

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21872495 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67098327 Thanks for the style updates!

[GitHub] spark pull request: [MLLIB] [WIP] [SPARK-3702] Standardizing abstr...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3427#issuecomment-67099848 I'm closing this since I've begun breaking it into smaller PRs. I copied the PR description to the JIRA and will leave my WIP branch intact.

[GitHub] spark pull request: [MLLIB] [WIP] [SPARK-3702] Standardizing abstr...

2014-12-15 Thread jkbradley
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/3427

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878594 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala --- @@ -19,6 +19,7 @@ package org.apache.spark.scheduler import

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878614 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -184,6 +184,7 @@ public void sortByKey() { Assert.assertEquals(new

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878883 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -491,6 +492,7 @@ public Integer call(Integer a, Integer b

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878897 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -1556,7 +1558,7 @@ public void testGuavaOptional() { @Test public void

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878899 --- Diff: core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala --- @@ -24,14 +24,14 @@ import org.apache.spark.deploy.SparkHadoopUtil

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878938 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -739,7 +739,7 @@ class DAGSchedulerSuite extends TestKit

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878948 --- Diff: sql/core/src/test/java/org/apache/spark/sql/api/java/JavaRowSuite.java --- @@ -141,6 +141,7 @@ public void constructComplexRow

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878964 --- Diff: streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java --- @@ -57,7 +57,7 @@ public void equalIterable(Iterable<?> a, Iterable<?> b

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878942 --- Diff: mllib/src/test/java/org/apache/spark/mllib/feature/JavaTfIdfSuite.java --- @@ -49,6 +49,7 @@ public void tearDown() { public void tfIdf

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878950 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DslQuerySuite.scala --- @@ -24,6 +24,8 @@ import org.apache.spark.sql.catalyst.expressions

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878955 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala --- @@ -450,7 +452,9 @@ class ParquetQuerySuite extends QueryTest

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878961 --- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFListListInt.java --- @@ -23,25 +23,21 @@ public class UDFListListInt

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878944 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UserDefinedType.java --- @@ -35,6 +35,7 @@ protected UserDefinedType() { } public

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878958 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/parquet/FakeParquetSerDe.scala --- @@ -32,7 +32,7 @@ import org.apache.hadoop.io.Writable

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878946 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala --- @@ -454,7 +454,7 @@ private[parquet] object ParquetTypesConverter

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21878976 --- Diff: core/pom.xml --- @@ -352,9 +352,9 @@ </execution> </executions> <configuration> - <tasks>

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-67115316 @srowen I checked through more carefully this time, and there were a couple of changes for which I could not find associated warnings.

[GitHub] spark pull request: SPARK-4547 [MLLIB] [WIP] OOM when making bins ...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67117735 @srowen Trying to guarantee exactly the requested number of points does seem like more trouble than it is worth. It might require collecting the # of points in each

[GitHub] spark pull request: [SPARK-4855][mllib] testing the Chi-squared hy...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3679#issuecomment-67119469 @jbencook Thanks for the updates! (Your comment about checking for exceptions makes me wonder if you were right before to throw a more meaningful exception than

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21924054 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -1556,7 +1558,7 @@ public void testGuavaOptional() { @Test public void

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21925414 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala --- @@ -450,7 +452,9 @@ class ParquetQuerySuite extends QueryTest

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3157#discussion_r21925735 --- Diff: core/pom.xml --- @@ -352,9 +352,9 @@ </execution> </executions> <configuration> - <tasks>

[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus

2014-12-16 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-67219288 @srowen LGTM The few items I couldn't verify myself look reasonable to me, so I'd say it's ready to go. CC: @pwendell

[GitHub] spark pull request: SPARK-4547 [MLLIB] [WIP] OOM when making bins ...

2014-12-16 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-67223829 Yep, that's what I meant. I think it would be extra code, but I don't think it would affect the runtime that much. (One pass to collect the number of elements

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21929711 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,86 @@ object MLUtils { } Vectors.fromBreeze
