Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3603#issuecomment-66520117
@yu-iskw Thanks for the updates! Found 1 typo, but other than that, LGTM
@mengxr Perhaps you can weigh in on the Python API change before this is
committed
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66529072
@akopich Thanks for the updates. It looks like rebasing did not work
correctly (looking at the 10K+ lines in this PR!). It should be possible to
fix with rebase
GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/3669
[SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc
+ small doc edit
+ include edit to make IntelliJ happy
CC: @davies @mengxr
Note to @davies -- this does
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3669#issuecomment-66537698
CC: @pwendell This fixes a problem with missing documentation in the
current build (and in branch-1.2).
---
If your project is set up for it, you can reply
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3669#issuecomment-66543782
@davies Thanks very much! Updated. I made a separate JIRA for fixing
Python doc annotations:
[https://issues.apache.org/jira/browse/SPARK-4822]
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655815
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655804
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655807
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655817
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655820
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655821
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655814
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655813
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655818
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655828
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655811
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655837
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655833
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655827
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655832
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655824
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655802
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655806
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655839
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximizationSuite.scala
---
@@ -0,0 +1,44 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655816
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655840
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximizationSuite.scala
---
@@ -0,0 +1,44 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655831
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655822
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655836
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655819
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21655830
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-66563244
@tgaloppo Thanks very much for the PR, and sincere apologies for the slow
response about it! @manishamde was right about people being too preoccupied
with the 1.2
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-66563419
@tgaloppo Let me know if you have questions, and also when I should make
another pass over this PR---thanks again!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3583#issuecomment-66563683
@dikejiang Great, thanks!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21695326
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
---
@@ -0,0 +1,283 @@
+/*
+ * Licensed
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3636#discussion_r21724909
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala
---
@@ -138,6 +138,45 @@ class GradientDescentSuite extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3636#discussion_r21724910
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
---
@@ -182,34 +203,46 @@ object GradientDescent extends Logging
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3636#discussion_r21724912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
---
@@ -77,6 +80,14 @@ class GradientDescent private[mllib
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3636#discussion_r21724908
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala
---
@@ -138,6 +138,45 @@ class GradientDescentSuite extends
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3636#issuecomment-66721864
@Lewuathe Thanks for the updates! I just added a few last comments (which
should be the last).
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3636#issuecomment-66723145
@Lewuathe Sorry---one more request. Could you actually use
convergenceTol instead of convergenceTolerance in order to fit with the
public API in LBFGS? Thanks
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3583#issuecomment-67049092
@dikejiang Apologies--I think I was not clear. I was recommending that
you change this PR to implement predictRaw(), rather than predictWithWeight().
Does
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3636#issuecomment-67070947
@Lewuathe Once the scala style is fixed (dev/scalastyle), this should be
ready.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21859900
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21859898
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67072947
@tgaloppo Thanks for the updates! You did exactly what I had in mind for
MultivariateGaussian; thanks.
My main comments now are still about style. I realize
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67073269
Oh, also, IntelliJ 13 does a pretty good job with the indentation, if
you're using it. You can run sbt/sbt gen-idea to create project files before
opening the Spark
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860601
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860754
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860758
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860755
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860757
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala
---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21860764
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21863319
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21863325
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r2186
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21863318
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21863326
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21863337
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,92 @@ object MLUtils {
}
Vectors.fromBreeze
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3643#issuecomment-67079749
@viirya Thanks for the updates! I made some inline comments, one of them
major. Please let me know when to check again.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67083325
@akopich Thanks for the updates! (Much easier to see the diff now)
The decision about setters vs. constructor arguments was from [this JIRA
(design doc
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-67083539
The test logs have expired...rerunning
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3685#issuecomment-67083774
@Lewuathe Thanks for the PR! As long as you're fixing those, would you
mind fixing the 2 DeveloperApi tags in pyspark/mllib/feature.py and the 2
WARN tags in pyspark
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67087866
I just found out (hearsay) that Accumulator may incur a big performance
penalty relative to methods like RDD.aggregate(). There have also been some
bugs found
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67089687
@srowen +1 for this functionality. It sounds handy for experts and
necessary for beginner users.
What do you think of using ```numBins``` instead
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-67089972
It looks like the error is:
```
[error]
/home/jenkins/workspace/NewSparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/ann/ANNSuite.scala:21
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3157#issuecomment-67091270
Sorry for the slow response; testing now, but it will take a bit longer to
finish
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-67092852
Yes, Jenkins will test against the master branch, so I'd recommend merging
with master (or rebasing if the merge is messy).
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3679#issuecomment-67093985
@jbencook I think that JIRA was supposed to be closed; I'll fix that. But
adding some Python tests will be good---I'll take a look!
Btw, the link
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3679#issuecomment-67094622
I made a new JIRA for the unit tests. Could you please swap the JIRA tag
for this one?
[https://issues.apache.org/jira/browse/SPARK-4855]
Thanks!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3679#issuecomment-67095129
I would recommend not testing for invalid input in stat.py as long as it is
tested on the Scala side in ChiSqTest.scala. It will be faster to only test
once
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21872495
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67098327
Thanks for the style updates!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3427#issuecomment-67099848
I'm closing this since I've begun breaking it into smaller PRs. I copied
the PR description to the JIRA and will leave my WIP branch intact.
Github user jkbradley closed the pull request at:
https://github.com/apache/spark/pull/3427
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878594
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -19,6 +19,7 @@ package org.apache.spark.scheduler
import
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878614
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -184,6 +184,7 @@ public void sortByKey() {
Assert.assertEquals(new
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878883
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -491,6 +492,7 @@ public Integer call(Integer a, Integer b
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878897
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -1556,7 +1558,7 @@ public void testGuavaOptional() {
@Test
public void
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878899
--- Diff:
core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala ---
@@ -24,14 +24,14 @@ import org.apache.spark.deploy.SparkHadoopUtil
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878938
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -739,7 +739,7 @@ class DAGSchedulerSuite extends
TestKit
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878948
--- Diff:
sql/core/src/test/java/org/apache/spark/sql/api/java/JavaRowSuite.java ---
@@ -141,6 +141,7 @@ public void constructComplexRow
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878964
--- Diff:
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
@@ -57,7 +57,7 @@ public void equalIterable(Iterable<?> a, Iterable<?> b
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878942
--- Diff:
mllib/src/test/java/org/apache/spark/mllib/feature/JavaTfIdfSuite.java ---
@@ -49,6 +49,7 @@ public void tearDown() {
public void tfIdf
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878950
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DslQuerySuite.scala
---
@@ -24,6 +24,8 @@ import org.apache.spark.sql.catalyst.expressions
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878955
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala ---
@@ -450,7 +452,9 @@ class ParquetQuerySuite extends QueryTest
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878961
--- Diff:
sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFListListInt.java
---
@@ -23,25 +23,21 @@
public class UDFListListInt
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878944
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/api/java/UserDefinedType.java ---
@@ -35,6 +35,7 @@ protected UserDefinedType() { }
public
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878958
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/parquet/FakeParquetSerDe.scala
---
@@ -32,7 +32,7 @@ import org.apache.hadoop.io.Writable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878946
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -454,7 +454,7 @@ private[parquet] object ParquetTypesConverter
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21878976
--- Diff: core/pom.xml ---
@@ -352,9 +352,9 @@
</execution>
</executions>
<configuration>
- <tasks>
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3157#issuecomment-67115316
@srowen I checked through more carefully this time, and there were a
couple of changes for which I could not find associated warnings.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67117735
@srowen Trying to guarantee exactly the requested number of points does
seem like more trouble than it is worth. It might require collecting the # of
points in each
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3679#issuecomment-67119469
@jbencook Thanks for the updates! (Your comment about checking for
exceptions makes me wonder if you were right before to throw a more meaningful
exception than
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21924054
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -1556,7 +1558,7 @@ public void testGuavaOptional() {
@Test
public void
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21925414
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala ---
@@ -450,7 +452,9 @@ class ParquetQuerySuite extends QueryTest
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3157#discussion_r21925735
--- Diff: core/pom.xml ---
@@ -352,9 +352,9 @@
</execution>
</executions>
<configuration>
- <tasks>
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3157#issuecomment-67219288
@srowen LGTM
The few items I couldn't verify myself look reasonable to me, so I'd say
it's ready to go.
CC: @pwendell
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67223829
Yep, that's what I meant. I think it would be extra code, but I don't
think it would affect the runtime that much. (One pass to collect the number
of elements
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3643#discussion_r21929711
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -264,6 +263,86 @@ object MLUtils {
}
Vectors.fromBreeze