[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...

2018-11-06 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r231122195 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -149,7 +150,7 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...

2018-11-06 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r231121793 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -189,13 +188,12 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark issue #22950: [MINOR] Fix typos and misspellings

2018-11-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22950 Merged to master/2.4 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-11-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230949468 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -67,6 +68,10 @@ class

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-11-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230949719 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -27,10 +27,17 @@ import org.apache.spark.sql.DataFrame

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-11-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230949430 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -18,10 +18,14 @@ package

[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...

2018-11-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22087#discussion_r230948830 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala --- @@ -117,7 +117,8 @@ class BisectingKMeansModel private[ml

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22914 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22940: [MINOR][R] Rename SQLUtils name to RSQLUtils in R

2018-11-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22940 I'm neutral on it... seems like a logical change but is there any issue (like ambiguous imports that are annoying) other than not matching the pattern? I am not super concerned about back-porting

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230784165 --- Diff: R/pkg/R/generics.R --- @@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

[GitHub] spark pull request #22950: [MINOR] Fix typos and misspellings

2018-11-05 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/22950 [MINOR] Fix typos and misspellings ## What changes were proposed in this pull request? Fix typos and misspellings, per https://github.com/apache/spark-website/pull/158#issuecomment

[GitHub] spark issue #22931: [SPARK-25930][K8s] Fix scala string detection in k8s tes...

2018-11-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22931 Merged to master/2.4 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22921 Yeah it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. given that it's a reasonably niche method, I thought it would be best

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-03 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568378 --- Diff: R/pkg/R/functions.R --- @@ -319,23 +319,23 @@ setMethod("acos", }) #' @details -#' \code{approxCou

[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22933 Merged to master/2.4/2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually

[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...

2018-11-03 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r230556818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -189,13 +188,12 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark issue #22934: [BUILD] Close stale PRs

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22934 Maybe add https://github.com/apache/spark/pull/22849 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22855: [SPARK-25839] [Core] Implement use of KryoPool in KryoSe...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22855 Oops, something to do with the benchmark class: ``` [error] /home/jenkins/workspace/NewSparkPullRequestBuilder/core/src/test/scala/org/apache/spark/serializer

[GitHub] spark issue #22892: [SPARK-25884][SQL] Add TBLPROPERTIES and COMMENT, and us...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22892 @cloud-fan @ueshin I'm not sure, but I'm seeing this failure regularly in master builds and I wonder if this could be the cause? in two builds it started failing with this (and another) change

[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22864 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22931: [SPARK-25930][K8s] Fix scala string detection in k8s tes...

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22931 OK. Does this need to go in branch 2.4 too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-02 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230449544 --- Diff: python/pyspark/sql/functions.py --- @@ -275,15 +273,6 @@ def _(): del _name, _doc -@since(1.3) -def approxCountDistinct

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-02 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230449173 --- Diff: R/pkg/R/functions.R --- @@ -1641,30 +1641,30 @@ setMethod("tanh", }) #' @details -#' \code{toDegrees}

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-02 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230436963 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -279,27 +254,6 @@ class DataFrameSuite extends QueryTest

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 So the pull request right now doesn't reflect what you tested, but you tested the version pasted above. You're saying that the optimization just never helps the dense-dense case, and sqdist

[GitHub] spark issue #22922: [SPARK-25909] fix documentation on cluster managers

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22922 Merged to master/2.4 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-11-02 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r230421073 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -84,6 +85,7 @@ class KryoSerializer(conf: SparkConf) private

[GitHub] spark issue #22906: [SPARK-25895][Core]Adding testcase to compare Lz4 and Zs...

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22906 Benchmarks aren't run with tests right? That's fine. I am still not sure Spark is the best place for this. --- - To unsubscribe

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Restores '-i' option's behaviour in...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22919 Tough call. At least it's worth documenting that this behavior changed because so did the Scala shell's behavior, and I'd support that. I'd also support supporting both, if there's no real downside

[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22924 Yeah I get that. I haven't heard the bugs that are fixed here impact users, but a few sound kind of bad. Maybe this should depend more on: how much do people need Python 3.7 in 2.4.x? that seems

[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22864 Jenkins add to whitelist --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22864 That's a spurious failure, something's wrong with the build machine --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22922: [SPARK-25909] fix documentation on cluster manage...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22922#discussion_r230159527 --- Diff: docs/cluster-overview.md --- @@ -45,7 +45,7 @@ There are several useful things to note about this architecture: # Cluster Manager Types

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22758 I don't know this code well enough to review. I think there is skepticism from people who know this code whether this is change is correct and beneficial. If there's doubt, I think it should

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230156120 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -62,17 +62,6 @@ class SQLContext private[sql](val sparkSession: SparkSession

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230155788 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -639,20 +639,6 @@ private[spark] object SparkConf extends Logging

[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22924 I think it's OK to backport at 3.7 support is actually fairly important, and this also fixes bugs: https://www.py4j.org/changelog.html It drops 2.6 support but we did so as well a long time

[GitHub] spark issue #22909: [SPARK-25897][k8s] Hook up k8s integration tests to sbt ...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22909 I don't know SBT much myself. I would believe this works. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22914 Yes if the behavior is consistent across pages that sounds like fine behavior. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22908: [MONOR][SQL] Replace all TreeNode's node name in ...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22908#discussion_r230060094 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -56,7 +56,7 @@ case class

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230057306 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala --- @@ -246,7 +246,7 @@ class

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230057481 --- Diff: R/pkg/R/generics.R --- @@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/22921 [SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 ## What changes were proposed in this pull request? - Remove some AccumulableInfo .apply() methods - Remove non-label

[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22813 I just think that if you have engineers randomly writing and reading stuff in this dir, a bunch of other stuff goes wrong. This is not a problem that Spark can reasonably solve. Certainly, you have

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 Hm, actually that's the best case. You're exercising the case where the code path you prefer is fast. And the case where the precision bound applies is exactly the case where the branch you deleted

[GitHub] spark pull request #22876: [SPARK-25869] [YARN] the original diagnostics is ...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22876#discussion_r230053125 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -293,6 +293,9 @@ private[spark] class

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-11-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r230052150 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -84,6 +85,7 @@ class KryoSerializer(conf: SparkConf) private

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22914 Returning to the current or last page seems fine too. No error message really needed IMHO. However it sounds like other pages already go to the 1st page on invalid input? I'd also like to stay

[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r229812240 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -189,13 +190,12 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r229776086 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -189,13 +190,12 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229738657 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -456,9 +458,63 @@ class KryoSerializerSuite extends SparkFunSuite

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229740163 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -84,6 +85,7 @@ class KryoSerializer(conf: SparkConf) private

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229739685 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -456,9 +458,63 @@ class KryoSerializerSuite extends SparkFunSuite

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229738553 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -456,9 +458,63 @@ class KryoSerializerSuite extends SparkFunSuite

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229737692 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -92,6 +94,16 @@ class KryoSerializer(conf: SparkConf) new

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229738794 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -30,6 +30,7 @@ import scala.util.control.NonFatal import

[GitHub] spark pull request #22855: [SPARK-25839] [Core] Implement use of KryoPool in...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22855#discussion_r229737307 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -92,6 +94,16 @@ class KryoSerializer(conf: SparkConf) new

[GitHub] spark pull request #22723: [SPARK-25729][CORE]It is better to replace `minPa...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22723#discussion_r229720799 --- Diff: core/src/main/scala/org/apache/spark/rdd/WholeTextFileRDD.scala --- @@ -51,7 +51,7 @@ private[spark] class WholeTextFileRDD( case

[GitHub] spark pull request #22723: [SPARK-25729][CORE]It is better to replace `minPa...

2018-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22723#discussion_r229721018 --- Diff: core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala --- @@ -48,11 +50,11 @@ private[spark] class WholeTextFileInputFormat

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 I don't think BLAS matters here as these are all vector-vector operations and f2jblas is used directly (i.e. stays in the JVM). Are all the vectors dense? I suppose I'm still surprised

[GitHub] spark issue #22901: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-10-31 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22901 That sounds like a good idea. I wonder if it's safe to back-port to 2.4/2.3? should be if it's just a maintenance release

[GitHub] spark issue #22906: [SPARK-25895][Core]Adding testcase to compare Lz4 and Zs...

2018-10-31 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22906 I don't think this tests something about the correctness of Spark though. I am not sure this is worth it. --- - To unsubscribe

[GitHub] spark issue #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapStatus des...

2018-10-31 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22894 I also would not expect updating an immutable data structure to be faster. Building a map once from tuples at the end seems better than rebuilding a map each time. Under the hood the immutable map

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 I think these are good changes. In a separate PR for the versions-specific docs, we could add a similar note to https://spark.apache.org/docs/latest/spark-standalone.html as much of the security

[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22784 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22849: [SPARK-25852][Core] we should filter the workOffers with...

2018-10-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22849 Yes, that is the comment I have been referring to. So it seems you can't filter, right? it's not scheduling work here. @jiangxb1987 do you know this part of the code

[GitHub] spark issue #22885: [BUILD][MINOR] release script should not interrupt by sv...

2018-10-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22885 Looks fine. Maybe some Maven cache somewhere has to be deleted as it's corrupt? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22856: [SPARK-25856][SQL][MINOR] Remove AverageLike and CountLi...

2018-10-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22856 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...

2018-10-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22813 I don't know what else goes in the work dir. It isn't valid to reuse it for anything else. Can you simply avoid using a work dir that is or has been used by something else? The argument

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 A quick pointer to security issues in other key places sounds good. As long as it is increasing the chance users understand the specific issue and isn't more general text to skip past, it is helping

[GitHub] spark pull request #22849: [SPARK-25852][Core] we should filter the workOffe...

2018-10-29 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22849#discussion_r228936286 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -240,7 +240,7 @@ class

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-28 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228769714 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark issue #22802: [SPARK-25806][SQL]The instance of FileSplit is redundant

2018-10-28 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22802 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228724594 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228724541 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -384,18 +384,28 @@ class RowMatrix @Since("

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228724515 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228724667 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228724555 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228714200 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -384,18 +384,28 @@ class RowMatrix @Since("

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228713925 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite

[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22784#discussion_r228713878 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite

[GitHub] spark pull request #22849: [SPARK-25852][Core] we should filter the workOffe...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22849#discussion_r228713414 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -240,7 +240,7 @@ class

[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...

2018-10-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22813 I don't think that's a reasonable usage scenario. That said is there any harm to this change? would it ever miss cleaning something up that it should

[GitHub] spark issue #22860: Branch 2.4

2018-10-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22860 @sarojchand close this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22859: Branch 2.2

2018-10-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22859 @sarojchand close this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22425: [SPARK-23367][Build] Include python document style check...

2018-10-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22425 Merged to master. Anyone who feels like then addressing the style warnings, some that may be easy to address, go ahead

[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22425#discussion_r228712794 --- Diff: dev/lint-python --- @@ -99,6 +104,29 @@ else echo "flake8 checks passed." fi +# Check python document style,

[GitHub] spark issue #22850: [MINOR][DOC] Fix comment error of HiveUtils

2018-10-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22850 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22815 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22848: [SPARK-25851][SQL][MINOR] Fix deprecated API warning in ...

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22848 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 I don't feel strongly about it; go ahead. If someone lands on this page, do they pretty easily come away with the impression they need to set spark.authenticate and network security

[GitHub] spark pull request #22854: [SPARK-25854] fix mvn to not always exit 1

2018-10-26 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22854#discussion_r228583684 --- Diff: build/mvn --- @@ -163,8 +163,19 @@ export MAVEN_OPTS=${MAVEN_OPTS:-"$_COMPILE_JVM_OPTS"} echo "Using \`mvn\` from path:

[GitHub] spark issue #22802: [SPARK-25806][SQL]The instance of FileSplit is redundant...

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22802 OK, that's pretty trivial, but I agree. Any other instances of this type of thing? --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22843 I personally am OK with removing anything deprecated before 2.4. Even things deprecated in 2.4 are technically fair game. If any of these were deprecated pretty recently, might be debatable, but I

[GitHub] spark issue #22826: [SPARK-25760][DOCS][FOLLOWUP] Add note about AddJar retu...

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22826 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-26 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22815#discussion_r228554831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -54,6 +54,7 @@ import org.apache.spark.sql.util.ExecutionListenerManager

[GitHub] spark pull request #22848: [SPARK-25851][SQL][MINOR] Fix deprecated API warn...

2018-10-26 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22848#discussion_r228553964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -89,12 +89,12 @@ private class LongLongTupleConverter extends

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 I get it, the "it's your responsibility" stance, and it is. For any risk there's a sentence in this doc we could point to and say, "see, told you". If we're going to make a

[GitHub] spark issue #22723: [SPARK-25729][CORE]It is better to replace `minPartition...

2018-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22723 Kind of. As you say it lets a particular `InputFormat` decide this. But it still uses the minPartitions as input. See again https://issues.apache.org/jira/browse/SPARK-22357 for some subtlety

<    1   2   3   4   5   6   7   8   9   10   >