[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21416 LGTM (I didn't look that carefully though) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3656/ Test PASSed.
[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21416 Merged build finished. Test PASSed.
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user xdcjie commented on the issue: https://github.com/apache/spark/pull/21447 @maropu I updated the comment. In summary, this PR can reduce the time spent scanning and assembling data. In our scenario, the relation (table) has 700 columns.
[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21416 @rxin I simplified the test cases as you suggested. Thanks.
[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21416 **[Test build #91242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91242/testReport)** for PR 21416 at commit [`fed2846`](https://github.com/apache/spark/commit/fed2846fe7c9ca2cb4534b23803cd29d5a18d4f9).
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191317978

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
-    intercept[AnalysisException] {
+    val e = intercept[AnalysisException] {
       df2.filter($"a".isin($"b"))
     }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Scala Collection") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+    checkAnswer(df.filter($"a".isInCollection(Seq(1, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 1))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    // Auto casting should work with mixture of different types in collections
+    checkAnswer(df.filter($"a".isInCollection(Seq(1.toShort, "2"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq("3", 2.toLong))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, "1"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    checkAnswer(df.filter($"b".isInCollection(Seq("y", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isInCollection(Seq("z", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isInCollection(Seq("z", "y"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+    // Test with different types of collections
+    checkAnswer(df.filter($"a".isInCollection(Seq(1, 2).toSet)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 2).toArray)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 1).toList)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+    val e = intercept[AnalysisException] {
+      df2.filter($"a".isInCollection(Seq($"b")))
+    }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Java Collection") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
```

--- End diff --

Done.
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191317980

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
-    intercept[AnalysisException] {
+    val e = intercept[AnalysisException] {
       df2.filter($"a".isin($"b"))
     }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Scala Collection") {
```

--- End diff --

Done.
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191314972

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,7 +219,14 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.isEmpty =>
+        // When v is not nullable, the following expression will be optimized
+        // to FalseLiteral which is tested in OptimizeInSuite.scala
+        If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+      case In(v, Seq(elem @ Literal(_, _))) =>
```

--- End diff --

This has a bug when the Literal is a struct. See the test failure: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91218/testReport/org.apache.spark.sql/SQLQueryTestSuite/sql/
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191314675

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,7 +219,14 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.isEmpty =>
+        // When v is not nullable, the following expression will be optimized
+        // to FalseLiteral which is tested in OptimizeInSuite.scala
+        If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+      case In(v, Seq(elem @ Literal(_, _))) =>
```

--- End diff --

This can be moved inside `case expr @ In(v, list) if expr.inSetConvertible`.
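The rewrite being discussed follows SQL three-valued logic: `v IN (<empty list>)` is false for a non-null `v` but stays NULL for a null `v`, which is exactly what `If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))` encodes. A minimal standalone model of that semantics (plain Scala, not Catalyst code; `Option` stands in for a nullable column):

```scala
// Model of the empty-IN rewrite: Option[Int] is a nullable column value
// (None is SQL NULL); Option[Boolean] is a nullable boolean result.
object EmptyInModel {
  def emptyIn(v: Option[Int]): Option[Boolean] = v match {
    case Some(_) => Some(false) // non-null value: IN over an empty list is false
    case None    => None        // null value: the result stays NULL
  }
}
```

This also shows why the original `!v.nullable => FalseLiteral` guard was too narrow: for nullable `v` the expression could not previously be optimized at all.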
[GitHub] spark pull request #21436: [SPARK-24250][SQL][Follow-up] support accessing S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21436
[GitHub] spark pull request #21436: [SPARK-24250][SQL][Follow-up] support accessing S...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21436#discussion_r191313872

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1021,21 +1021,33 @@ object SparkSession extends Logging {
   /**
    * Returns the active SparkSession for the current thread, returned by the builder.
    *
+   * @note Return None, when calling this function on executors
+   *
    * @since 2.2.0
    */
   def getActiveSession: Option[SparkSession] = {
-    assertOnDriver()
```

--- End diff --

`assertOnDriver` is a helpful method. It might be useful in other scenarios in the future. Let us keep it.
[GitHub] spark issue #21436: [SPARK-24250][SQL][Follow-up] support accessing SQLConf ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21436 Thanks! Merged to master.
[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3655/ Test PASSed.
[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21443 Merged build finished. Test PASSed.
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21447 Could you add the `explain` output differences with/without this PR to the description?
[GitHub] spark pull request #21447: [SPARK-24339][SQL]Add project for transform/map/r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21447#discussion_r191312085

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -338,6 +338,17 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     // Add where.
     val withFilter = relation.optionalMap(where)(filter)
+    // Add project.
+    val namedExpressions = expressions.map {
+      case e: NamedExpression => e
+      case e: Expression => UnresolvedAlias(e)
```

--- End diff --

nit: `case e => UnresolvedAlias(e)`
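The mapping in the diff above wraps any expression that is not already named so the analyzer can assign an alias later; since everything in the list is already an `Expression`, the second type test is redundant (hence the nit). A toy standalone model of that dispatch, with hypothetical case-class names standing in for Catalyst's:

```scala
// Toy stand-ins for Catalyst's expression hierarchy (illustrative only).
sealed trait Expr
case class Named(name: String) extends Expr          // models NamedExpression
case class Raw(sql: String) extends Expr             // models an unnamed Expression
case class UnresolvedAliasExpr(child: Expr) extends Expr

// Expressions that already carry a name pass through; everything else is
// wrapped, mirroring the AstBuilder snippet under discussion.
def toNamed(expressions: Seq[Expr]): Seq[Expr] = expressions.map {
  case e: Named => e
  case e        => UnresolvedAliasExpr(e)
}
```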
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21447 @gatorsmile Can you trigger this?
[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21443 **[Test build #91241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91241/testReport)** for PR 21443 at commit [`29e6485`](https://github.com/apache/spark/commit/29e64851f51aad5d79b2722e7ee2f8aeb7d8bf8a).
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21447 Can one of the admins verify this patch?
[GitHub] spark pull request #21447: [SPARK-24339][SQL]Add project for transform/map/r...
GitHub user xdcjie opened a pull request: https://github.com/apache/spark/pull/21447 [SPARK-24339][SQL] Add project for transform/map/reduce sql to prune column

## What changes were proposed in this pull request?

A transform query does not have a Project node, so it scans all of the table's data. For a query like:

`select transform(a, b) using 'func' from e`

this PR adds a Project node for transform queries, so that less data is scanned thanks to column pruning.

## How was this patch tested?

Modified an existing test ("transform query spec").

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xdcjie/spark branch-2.2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21447.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21447

commit 11c5c5797e0fe6879e3434d7b1fae2687bcacd1e
Author: xdcjie
Date: 2018-05-29T04:57:19Z

    Add project for tranform/map/reduce sql to prune column
[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21378#discussion_r191310691

```diff
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -418,17 +417,33 @@ private[spark] class MesosClusterScheduler(
     envBuilder.build()
   }

+  private def isContainerLocalAppJar(desc: MesosDriverDescription): Boolean = {
+    val isLocalJar = desc.jarUrl.startsWith("local://")
+    val isContainerLocal = desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists {
+      case "container" => true
+      case "host" => false
+      case other =>
+        logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.")
+        false
+    }
```

--- End diff --

Can we do:

```scala
desc.conf.getOption("spark.mesos.appJar.local.resolution.mode") match {
  case Some("container") => true
  case Some("host") | None => false
  case Some(other) => ...
}
```

?
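The suggestion above replaces `getOption(...).exists { ... }` with a single `Option` match, which makes the "missing key defaults to host" case explicit. A runnable sketch of that pattern, with the config carried in a plain `Map` and the warning simplified to `println` so the example is self-contained:

```scala
// Resolve an optional mode setting in one match, as suggested in the review.
// Unknown values fall back to "host" behavior with a warning.
def isContainerLocal(conf: Map[String, String]): Boolean =
  conf.get("spark.mesos.appJar.local.resolution.mode") match {
    case Some("container")   => true
    case Some("host") | None => false
    case Some(other) =>
      println(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.")
      false
  }
```

The match form also avoids the subtle issue in the original: `exists` on a missing key silently returns `false` without distinguishing "absent" from "explicitly host".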
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Merged build finished. Test PASSed.
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91237/ Test PASSed.
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21409 **[Test build #91237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91237/testReport)** for PR 21409 at commit [`8ffba61`](https://github.com/apache/spark/commit/8ffba61a3ebd6e06eec2fdf03e19a65cb5b40787).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306678

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
-    intercept[AnalysisException] {
+    val e = intercept[AnalysisException] {
       df2.filter($"a".isin($"b"))
     }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Scala Collection") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+    checkAnswer(df.filter($"a".isInCollection(Seq(1, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 1))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    // Auto casting should work with mixture of different types in collections
+    checkAnswer(df.filter($"a".isInCollection(Seq(1.toShort, "2"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq("3", 2.toLong))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, "1"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    checkAnswer(df.filter($"b".isInCollection(Seq("y", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isInCollection(Seq("z", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isInCollection(Seq("z", "y"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+    // Test with different types of collections
+    checkAnswer(df.filter($"a".isInCollection(Seq(1, 2).toSet)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 2).toArray)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isInCollection(Seq(3, 1).toList)),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+    val e = intercept[AnalysisException] {
+      df2.filter($"a".isInCollection(Seq($"b")))
+    }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Java Collection") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
```

--- End diff --

same thing here. just run a single test case.
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306654

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
-    intercept[AnalysisException] {
+    val e = intercept[AnalysisException] {
       df2.filter($"a".isin($"b"))
     }
+    Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+      .foreach { s =>
+        assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+      }
+  }
+
+  test("isInCollection: Scala Collection") {
```

--- End diff --

can we simplify the test cases? you are just testing this api as a wrapper. you don't need to run so many queries for type coercion.
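The point of rxin's comment is that `isInCollection` is assumed to be a thin wrapper over the existing varargs `isin`, so type coercion does not need re-testing. A standalone toy model of that wrapper relationship (these are illustrative types, not the Spark `Column` API):

```scala
// Toy column over Int values: `isin` is the existing varargs membership
// test, and `isInCollection` just adapts any Scala collection to it.
case class Col(values: Seq[Int]) {
  def isin(list: Int*): Seq[Boolean] = values.map(v => list.contains(v))
  def isInCollection(coll: Iterable[Int]): Seq[Boolean] = isin(coll.toSeq: _*)
}
```

Because the wrapper only forwards, one test exercising a `Set`, `List`, or `Array` argument suffices; everything else is already covered by the `isin` tests.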
[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21443#discussion_r191306683

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     }
     // Setup unique distinct aggregate children.
-    val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq.distinct
+    val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
```

--- End diff --

Though this is not related to this PR, I dropped `.distinct` because it does nothing: `keySet.flatten` already returns a set.
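The claim above can be checked with plain Scala collections: flattening a `Set` of `Set`s yields a `Set`, so deduplication has already happened before `.toSeq`, and a trailing `.distinct` cannot remove anything further.

```scala
// Set#flatten on nested sets returns a Set, so duplicates are gone
// before the conversion to Seq.
val groups: Set[Set[Int]] = Set(Set(1, 2), Set(2, 3))
val flattened: Set[Int] = groups.flatten // the shared element 2 appears once
val children: Seq[Int] = flattened.toSeq // already distinct
```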
[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21443#discussion_r191306423

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-24369 multiple distinct aggregations having the same argument set") {
+    val df = sql(
+      s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
```

--- End diff --

ok
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21437 cc @ueshin @HyukjinKwon @BryanCutler
[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18717 Is the target of this ticket 2.4?
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21438 Merged build finished. Test PASSed.
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91234/ Test PASSed.
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21438 **[Test build #91234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91234/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21443 cc @hvanhovell
[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21443#discussion_r191302814

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     }
     // Setup unique distinct aggregate children.
-    val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq.distinct
+    val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
```

--- End diff --

Is this needed?
[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...
Github user LiangchangZ commented on the issue: https://github.com/apache/spark/pull/21445

> Looks like the patch is needed only with #21353 #21332 #21293 as of now, right? If then please state the condition in JIRA issue description as well as PR's description so that we don't get confused

@HeartSaVioR yes, I have updated the JIRA issue description as well as the PR description, sorry for the confusion.

> Please note that I'm commenting on top of current implementation, not considering #21353 #21332 #21293

Got it, thanks for the reply.
[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21409#discussion_r191302688

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure data source write performance.
+ * By default it measures 4 data source format: Parquet, ORC, JSON, CSV:
+ *   spark-submit --class
+ * To measure specified formats, run it with arguments:
+ *   spark-submit --class format1 [format2] [...]
+ */
+object DataSourceWriteBenchmark {
+  val conf = new SparkConf()
+    .setAppName("DataSourceWriteBenchmark")
+    .setIfMissing("spark.master", "local[1]")
+    .set("spark.sql.parquet.compression.codec", "snappy")
+    .set("spark.sql.orc.compression.codec", "snappy")
+
+  val spark = SparkSession.builder.config(conf).getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  val tempTable = "temp"
+  val numRows = 1024 * 1024 * 15
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def writeInt(table: String, format: String, benchmark: Benchmark): Unit = {
+    spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
```

--- End diff --

Here I am not sure if we need to compare all numeric types: ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType.
[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21446 Thank you for review and merging, @HyukjinKwon . Thank you all!
[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21443#discussion_r191302155

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-24369 multiple distinct aggregations having the same argument set") {
+    val df = sql(
+      s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
```

--- End diff --

Move it to SQLQueryTestSuite?
[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21409#discussion_r191302141 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala --- [quotes the same DataSourceWriteBenchmark.scala diff as above; message truncated]
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r191299100

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
+
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

If we add this new option here, I feel we'd better document it somewhere (e.g., `sql/functions.scala`).
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r191298921

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
+
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

Can you make the option `unpackArray` case-insensitive?
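The case-insensitivity request above amounts to normalizing option keys on lookup. Spark's Scala code typically does this with a case-insensitive map wrapper; as a hedged sketch of the idea only (the class name below is hypothetical, not Spark's API), in Python:

```python
class CaseInsensitiveOptions:
    """Wraps a dict so that key lookups ignore case (illustrative sketch)."""
    def __init__(self, options):
        # Index every key by its lowercase form; later duplicate spellings
        # of the same key overwrite earlier ones.
        self._lower = {k.lower(): v for k, v in options.items()}

    def get(self, key, default=None):
        return self._lower.get(key.lower(), default)

# "unpackArray", "unpackarray", and "UNPACKARRAY" all resolve identically.
opts = CaseInsensitiveOptions({"unpackArray": "true"})
unpack = opts.get("unpackarray", "false").lower() == "true"
```

Normalizing at construction time (rather than scanning keys on every `get`) keeps lookups O(1), which matters when options are consulted per-row in an expression like `JsonToStructs`.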
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r191298844

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
+
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

Make it private? (Not related to this PR, but `nullableSchema` could also be private.)
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21439 Can we also accept primitive arrays in `to_json`?
[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21409#discussion_r191297180 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala --- [quotes the same DataSourceWriteBenchmark.scala diff as above; message truncated]
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user ssuchter commented on the issue: https://github.com/apache/spark/pull/20697 I'll work on Matt's comments from Friday next. Here's the output (after the bugfix) from running against mainline:

```
MBP:~/src/ssuchter-spark% git remote get-url origin
https://github.com/ssuchter/spark.git
MBP:~/src/ssuchter-spark% git remote get-url upstream
git://github.com/apache/spark.git
1d8a265d13 (HEAD -> ssuchter-k8s-integration-tests, origin/ssuchter-k8s-integration-tests) Fix a bug in KubernetesTestComponents - don't an an empty string for zero arguments
65347b319a Merge branch 'ssuchter-k8s-integration-tests' of https://github.com/ssuchter/spark into ssuchter-k8s-integration-tests
1a531abcf6 Remove unused code relating to Kerberos, which doesn't belong in this PR
3ba6ffb5f2 Remove e2e-prow.sh, which isn't appropriate for this PR
9e64f43b62 Remove unnecessary cloning and building code for the Spark repo
e6bd56325d Update README.md excluding cloning and building logic
e70f3bea3d Remove K8s cloud-based backend testing support from this PR
a0023b2f33 Remove config options that were only used during repo clone process
e55b8a723e Remove repository cloning behavior and allow script to be called from other directories
9b0eede244 Fixes for scala style
f29679ef56 Ignore dist/ for style checks
3615953bea Fix scala style issues
bef586f740 Remove LICENSE and copy of mvn wrapper script. Rewrite path for calling mvn wrapper script.
81c7a66ad6 Make k8s integration tests build when top-level kubernetes profile selected
365d6bc65d Initial checkin of k8s integration tests. These tests were developed in the https://github.com/apache-spark-on-k8s/spark-integration repo by several contributors. This is a copy of the current state into the main apache spark repo. The only changes from the current spark-integration repo state are:
* Move the files from the repo root into resource-managers/kubernetes/integration-tests
* Add a reference to these tests in the root README.md
* Fix a path reference in dev/dev-run-integration-tests.sh
* Add a TODO in include/util.sh
dbce275784 Remove unused code relating to Kerberos, which doesn't belong in this PR
5ffa464c65 Remove e2e-prow.sh, which isn't appropriate for this PR
13721f69a2 Remove unnecessary cloning and building code for the Spark repo
ba720733fa Update README.md excluding cloning and building logic
1b1528a504 (upstream/master, origin/master, origin/HEAD, master) [SPARK-24366][SQL] Improving of error messages for type converting
MBP:~/src/ssuchter-spark% echo $REVISION
1d8a265d13
MBP:~/src/ssuchter-spark% echo $DATE
20180528
MBP:~/src/ssuchter-spark% ./dev/make-distribution.sh --name ${DATE}-${REVISION} --tgz -DzincPort=${ZINC_PORT} -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
+++ dirname ./dev/make-distribution.sh
++ cd ./dev/..
++ pwd
+ SPARK_HOME=/Users/ssuchter/src/ssuchter-spark
+ DISTDIR=/Users/ssuchter/src/ssuchter-spark/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
…
+ TARDIR_NAME=spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ TARDIR=/Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ rm -rf /Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ cp -r /Users/ssuchter/src/ssuchter-spark/dist /Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ tar czf spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13.tgz -C /Users/ssuchter/src/ssuchter-spark spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ rm -rf /Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
MBP:~/src/ssuchter-spark% ./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz ~/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13.tgz
Using `mvn` from path: /usr/local/bin/mvn
[INFO] Scanning for projects...
[INFO]
[INFO] Building Spark Project Kubernetes Integration Tests 2.4.0-SNAPSHOT
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-versions) @ spark-kubernetes-integration-tests_2.11 ---
…
Successfully tagged kubespark/spark:68388D3B-6FAC-4E59-8AED-8604AA437C2D
/Users/ssuchter/src/ssuchter-spark/resource-managers/kubernetes/integration-tests
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 118 milliseconds.
Run starting. Expected test count is: 1
KubernetesSuite:
```
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21370 @viirya @gatorsmile @ueshin @felixcheung @HyukjinKwon The refactoring that moves HTML generation out of `Dataset.scala` was done in 94f3414. Please check whether it is appropriate when you have time. Thanks! @rdblue @rxin The latest commit also includes the logic of using `spark.sql.repl.eagerEval.enabled` to control both \_\_repr\_\_ and \_repr\_html\_. Please take a look when you have time. Thanks!
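The `_repr_html_` hook discussed above is the standard rich-display protocol: Jupyter calls an object's `_repr_html_()` if it exists and falls back to `__repr__` when it returns `None`. A hedged sketch of gating the HTML repr behind a flag, analogous in spirit to `spark.sql.repl.eagerEval.enabled` (the class and flag names here are illustrative, not PySpark's actual API):

```python
class ToyFrame:
    """Tiny stand-in for a DataFrame with an opt-in HTML repr."""
    eager_eval_enabled = False  # plays the role of spark.sql.repl.eagerEval.enabled

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns

    def __repr__(self):
        return f"ToyFrame({self.columns})"

    def _repr_html_(self):
        # Returning None tells the notebook frontend to fall back to __repr__.
        if not ToyFrame.eager_eval_enabled:
            return None
        header = "".join(f"<th>{c}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{header}</tr>{body}</table>"

df = ToyFrame([(1, "a")], ["id", "name"])
```

Using `None` as the "disabled" signal means a plain REPL and a notebook with the flag off see exactly the same text repr, which is the behavior the PR discussion is trying to preserve.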
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3654/ Test PASSed.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3653/ Test PASSed.
[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21445 ``` Looks like the patch is needed only with #21353 #21332 #21293 as of now, right? ``` @HeartSaVioR Yes, sorry for the late explanation. The background is that we are running a POC based on #21353, #21332, #21293, and the latest master, including the work on the queue RDD reader/writer by @jose-torres. Many thanks for the work in #21239; we can complete all state operations after fixing this bug, so we thought we should report this to let you know. ``` Please note that I'm commenting on top of current implementation, not considering #21353 #21332 #21293. ``` Got it. Owing to pressure from an internal requirement for CP, we are running with these 3 patches, but we'll follow all your work closely and hope to contribute to CP.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21426 **[Test build #91240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91240/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91239/testReport)** for PR 13599 at commit [`44500fc`](https://github.com/apache/spark/commit/44500fc0d66bd930cc12ba6b66985e08f61d9ecc).
[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21420
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21420 Thanks @HyukjinKwon !
[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21420 Merged to master.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13599 (Oops, the test failure was legitimate.)
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Merged build finished. Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/13599 Actually, let's also loop in @ifilonenko, who's been thinking about similar things but with more of a K8s bent.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91233/ Test FAILed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21437 **[Test build #91233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91233/testReport)** for PR 21437 at commit [`b9d8dd3`](https://github.com/apache/spark/commit/b9d8dd304ed3d172a2e44919103e9500893fc829).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/13599 It's certainly closer; I haven't had a chance to take a look super recently (I've been focused on the PySpark K8s integration). From a skim through, I'm still hesitant about this being merged as-is, but maybe at Spark Summit SF (or a hangout call we can all schedule for the day after) it would make sense to try to get a better grasp on it. Sorry I haven't had the time this month to take much of a look.
[GitHub] spark pull request #21446: [SPARK-19613][SS][TEST] Random.nextString is not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21446
[GitHub] spark issue #21444: Branch 2.3
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21444 @mozammal mind closing this please?
[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21446 Merged to master and branch-2.3.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91238/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
* This patch **fails Java style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
  * `class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91238/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3652/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91238/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21446 Yea, I was facing this problem too. Thanks for fixing this.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13599 retest this please
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3651/ Test PASSed.
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3522/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3522/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3650/ Test PASSed.
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Parquet write benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21409 **[Test build #91237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91237/testReport)** for PR 21409 at commit [`8ffba61`](https://github.com/apache/spark/commit/8ffba61a3ebd6e06eec2fdf03e19a65cb5b40787).
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91236/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91236/testReport)** for PR 20697 at commit [`1d8a265`](https://github.com/apache/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91235/
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3649/
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91235/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
* This patch **fails Java style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
  * `class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91236/testReport)** for PR 20697 at commit [`1d8a265`](https://github.com/apache/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c).
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user ssuchter commented on the issue: https://github.com/apache/spark/pull/20697 Fixed the bug. @mccheah I'd appreciate your eyes on commit 1d8a265, for both correctness and style. (I hadn't used Scala before this project, so I'm not very confident about the best way to do things.)
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91235/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21438 **[Test build #91234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91234/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).
[GitHub] spark pull request #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener....
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21438#discussion_r191285757
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
   }

   private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
-    val metricIds = exec.metrics.map(_.accumulatorId).sorted
+    val metricIds = exec.metrics.map(_.accumulatorId).toSet
--- End diff --
I think we can get rid of `metricIds`.
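As context for the change under review, here is a minimal, self-contained sketch of why `.toSet` is preferable to `.sorted` when the ids are only used for membership tests. `MetricInfo` and the sample data below are illustrative stand-ins, not Spark's actual classes: on a `Seq`, each `contains` is a linear scan, while a `Set` gives a hash lookup.

```scala
object MetricIdSketch {
  // Illustrative stand-in for a metric carrying an accumulator id.
  case class MetricInfo(accumulatorId: Long)

  def main(args: Array[String]): Unit = {
    val metrics = Seq(MetricInfo(3L), MetricInfo(1L), MetricInfo(2L))
    // Accumulator updates keyed by id; ids 5 and 7 are not tracked metrics.
    val updates = Map(1L -> "10", 5L -> "x", 2L -> "20", 7L -> "y")

    // Before: a sorted Seq, where each `contains` is an O(n) scan.
    val sortedIds: Seq[Long] = metrics.map(_.accumulatorId).sorted
    // After: a Set, where `contains` is an effectively O(1) hash lookup.
    val idSet: Set[Long] = metrics.map(_.accumulatorId).toSet

    // Both filters keep exactly the tracked ids; only the lookup cost differs.
    val viaSeq = updates.filter { case (id, _) => sortedIds.contains(id) }
    val viaSet = updates.filter { case (id, _) => idSet.contains(id) }
    assert(viaSeq == viaSet)
    println(viaSet.toSeq.sortBy(_._1))  // keeps only ids 1 and 2
  }
}
```

For the small metric lists typical of a single query this is a micro-optimization, but it avoids a sort whose ordering was never used.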
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21438 ok to test
[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21446 Thank you for reviewing, @felixcheung and @HeartSaVioR.
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21288#discussion_r191283013
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
 }
 /*
+OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Select 0 string row (value IS NULL):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                          8452 / 8504        1.9        537.3      1.0X
-Parquet Vectorized (Pushdown)                274 /  281       57.3         17.4     30.8X
-Native ORC Vectorized                       8167 / 8185        1.9        519.3      1.0X
-Native ORC Vectorized (Pushdown)             365 /  379       43.1         23.2     23.1X
+Parquet Vectorized                          2961 / 3123        5.3        188.3      1.0X
+Parquet Vectorized (Pushdown)               3057 / 3121        5.1        194.4     
--- End diff --
Yeah, I think so. But I'm not sure. Even though I ran it multiple times, I didn't get the old performance values...
[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21378#discussion_r191271894
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -418,17 +417,34 @@ private[spark] class MesosClusterScheduler(
     envBuilder.build()
   }

+  private def isContainerLocalAppJar(desc: MesosDriverDescription): Boolean = {
+    val isLocalJar = desc.jarUrl.startsWith("local://")
+    val isContainerLocal = desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists {
--- End diff --
interesting!
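The hunk above cuts off inside the `exists` block, so for readers following the thread, here is a hedged sketch of the shape of that check. The `MesosDriverDescription` fields are modeled as plain parameters, and the value compared against (`"container"`) is an assumption, since the diff ends before the body of `exists`:

```scala
object LocalJarCheckSketch {
  // Models only the two inputs the check reads: the driver's jar URL and the
  // optional value of spark.mesos.appJar.local.resolution.mode. The
  // "container" comparison value is assumed; the real body of `exists` is
  // truncated in the diff above.
  def isContainerLocalAppJar(jarUrl: String, resolutionMode: Option[String]): Boolean = {
    val isLocalJar = jarUrl.startsWith("local://")
    val isContainerLocal = resolutionMode.exists(_.equalsIgnoreCase("container"))
    isLocalJar && isContainerLocal
  }

  def main(args: Array[String]): Unit = {
    println(isContainerLocalAppJar("local:///opt/app.jar", Some("container")))  // true
    println(isContainerLocalAppJar("local:///opt/app.jar", None))               // false
    println(isContainerLocalAppJar("https://host/app.jar", Some("container")))  // false
  }
}
```

The point of the pattern is that both conditions must hold: the jar must use the `local://` scheme, and the conf key must opt in to resolving it inside the container rather than on the dispatcher.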