[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-28 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21416
  
LGTM (I didn't look that carefully though)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21416
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3656/
Test PASSed.


---




[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21416
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-28 Thread xdcjie
Github user xdcjie commented on the issue:

https://github.com/apache/spark/pull/21447
  
@maropu I updated the comment. In summary, this PR can reduce the time spent scanning and assembling data. In our scenario, the relation (table) has 700 columns.


---




[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-28 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21416
  
@rxin I simplified the test cases as you suggested. Thanks.


---




[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21416
  
**[Test build #91242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91242/testReport)** for PR 21416 at commit [`fed2846`](https://github.com/apache/spark/commit/fed2846fe7c9ca2cb4534b23803cd29d5a18d4f9).


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r191317978
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
 val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
 
-intercept[AnalysisException] {
+val e = intercept[AnalysisException] {
   df2.filter($"a".isin($"b"))
 }
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Scala Collection") {
+val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+checkAnswer(df.filter($"a".isInCollection(Seq(1, 2))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 2))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 1))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+// Auto casting should work with mixture of different types in collections
+checkAnswer(df.filter($"a".isInCollection(Seq(1.toShort, "2"))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq("3", 2.toLong))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, "1"))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+checkAnswer(df.filter($"b".isInCollection(Seq("y", "x"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+checkAnswer(df.filter($"b".isInCollection(Seq("z", "x"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+checkAnswer(df.filter($"b".isInCollection(Seq("z", "y"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+// Test with different types of collections
+checkAnswer(df.filter($"a".isInCollection(Seq(1, 2).toSet)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 2).toArray)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 1).toList)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+val e = intercept[AnalysisException] {
+  df2.filter($"a".isInCollection(Seq($"b")))
+}
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Java Collection") {
+val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
--- End diff --

Done.


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r191317980
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
 val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
 
-intercept[AnalysisException] {
+val e = intercept[AnalysisException] {
   df2.filter($"a".isin($"b"))
 }
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Scala Collection") {
--- End diff --

Done. 


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191314972
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -219,7 +219,14 @@ object ReorderAssociativeOperator extends 
Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+  case In(v, Seq(elem @ Literal(_, _))) =>
--- End diff --

This has a bug when the Literal is a struct. See the test failure: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91218/testReport/org.apache.spark.sql/SQLQueryTestSuite/sql/


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191314675
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -219,7 +219,14 @@ object ReorderAssociativeOperator extends 
Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+  case In(v, Seq(elem @ Literal(_, _))) =>
--- End diff --

This can be moved inside `case expr @ In(v, list) if expr.inSetConvertible`.
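
The rewrite quoted above, `If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))`, encodes the three-valued semantics of `IN` over an empty list. A minimal sketch of those semantics (illustrative Python, not Spark code; the helper name is made up):

```python
def in_empty_list(v):
    # SQL three-valued logic for `v IN ()`: a non-NULL value can never match
    # an empty list, so the result is FALSE; a NULL value compares as unknown,
    # so the result is NULL (modeled here as None).
    return None if v is None else False
```

This is why the old rule could only fire for `!v.nullable`: folding straight to FALSE would be wrong for a NULL input.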


---




[GitHub] spark pull request #21436: [SPARK-24250][SQL][Follow-up] support accessing S...

2018-05-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21436


---




[GitHub] spark pull request #21436: [SPARK-24250][SQL][Follow-up] support accessing S...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21436#discussion_r191313872
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -1021,21 +1021,33 @@ object SparkSession extends Logging {
   /**
* Returns the active SparkSession for the current thread, returned by 
the builder.
*
+   * @note Return None, when calling this function on executors
+   *
* @since 2.2.0
*/
   def getActiveSession: Option[SparkSession] = {
-assertOnDriver()
--- End diff --

`assertOnDriver` is a helpful method. It might be useful in other scenarios in the future. Let us keep it.


---




[GitHub] spark issue #21436: [SPARK-24250][SQL][Follow-up] support accessing SQLConf ...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21436
  
Thanks! Merged to master.


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3655/
Test PASSed.


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21443
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21447
  
Could you add the `explain` output differences with/without this PR to the description?


---




[GitHub] spark pull request #21447: [SPARK-24339][SQL]Add project for transform/map/r...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21447#discussion_r191312085
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -338,6 +338,17 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 // Add where.
 val withFilter = relation.optionalMap(where)(filter)
 
+// Add project.
+val namedExpressions = expressions.map {
+  case e: NamedExpression => e
+  case e: Expression => UnresolvedAlias(e)
--- End diff --

nit: `case e => UnresolvedAlias(e)`


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21447
  
@gatorsmile Can you trigger this?


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21443
  
**[Test build #91241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91241/testReport)** for PR 21443 at commit [`29e6485`](https://github.com/apache/spark/commit/29e64851f51aad5d79b2722e7ee2f8aeb7d8bf8a).


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21447
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21447
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21447: [SPARK-24339][SQL]Add project for transform/map/r...

2018-05-28 Thread xdcjie
GitHub user xdcjie opened a pull request:

https://github.com/apache/spark/pull/21447

[SPARK-24339][SQL]Add project for transform/map/reduce sql to prune column

## What changes were proposed in this pull request?

Transform queries do not have a Project node, so they scan all of the table's data. For a query like:

`select transform(a, b) using 'func' from e`

In this PR, I propose to add a Project node for transform queries, so that they can scan less data by pruning columns.

## How was this patch tested?

Modify existing test ("transform query spec")
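
The effect of the added Project node can be sketched with a toy example (plain Python, purely illustrative; the data and column names are made up):

```python
# Hypothetical illustration of column pruning: without a projection the scan
# materializes every column; adding one keeps only what the transform script
# actually consumes.
table = [{"a": 1, "b": 2, "c": 3, "d": 4}]   # imagine a relation with ~700 columns
referenced = ["a", "b"]                      # columns used by 'func'
projected = [{k: row[k] for k in referenced} for row in table]
assert projected == [{"a": 1, "b": 2}]       # only 2 of the columns are read
```

With a wide table, reading 2 columns instead of 700 is where the reported scan/assembly savings come from.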


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xdcjie/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21447


commit 11c5c5797e0fe6879e3434d7b1fae2687bcacd1e
Author: xdcjie 
Date:   2018-05-29T04:57:19Z

Add project for transform/map/reduce sql to prune column




---




[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21378#discussion_r191310691
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -418,17 +417,33 @@ private[spark] class MesosClusterScheduler(
 envBuilder.build()
   }
 
+  private def isContainerLocalAppJar(desc: MesosDriverDescription): Boolean = {
+val isLocalJar = desc.jarUrl.startsWith("local://")
+val isContainerLocal = desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists {
+  case "container" => true
+  case "host" => false
+  case other =>
+logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.")
+false
+  }
--- End diff --

Can we do:

```scala
desc.conf.getOption("spark.mesos.appJar.local.resolution.mode") match {
case Some("container") => true
case Some("host") | None => false
case Some(other) =>
...
}
```
?
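
The suggested match can be mirrored in a small, testable sketch (hypothetical Python port for illustration only; `None` stands for the unset option, i.e. Scala's `None`):

```python
def is_container_local(mode):
    # Mirrors the suggested Scala match over Option[String] for the
    # "spark.mesos.appJar.local.resolution.mode" setting:
    #   Some("container")      -> True
    #   Some("host") | None    -> False
    #   Some(other)            -> warn and fall back to host behavior
    if mode == "container":
        return True
    if mode in ("host", None):
        return False
    print(f"Unknown spark.mesos.appJar.local.resolution.mode {mode}, using host.")
    return False
```

The explicit `match` makes the unset case visible, rather than hiding it inside `exists`.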


---




[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91237/
Test PASSed.


---




[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21409
  
**[Test build #91237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91237/testReport)** for PR 21409 at commit [`8ffba61`](https://github.com/apache/spark/commit/8ffba61a3ebd6e06eec2fdf03e19a65cb5b40787).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r191306678
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
 val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
 
-intercept[AnalysisException] {
+val e = intercept[AnalysisException] {
   df2.filter($"a".isin($"b"))
 }
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Scala Collection") {
+val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+checkAnswer(df.filter($"a".isInCollection(Seq(1, 2))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 2))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 1))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+// Auto casting should work with mixture of different types in collections
+checkAnswer(df.filter($"a".isInCollection(Seq(1.toShort, "2"))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq("3", 2.toLong))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, "1"))),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+checkAnswer(df.filter($"b".isInCollection(Seq("y", "x"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+checkAnswer(df.filter($"b".isInCollection(Seq("z", "x"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+checkAnswer(df.filter($"b".isInCollection(Seq("z", "y"))),
+  df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+// Test with different types of collections
+checkAnswer(df.filter($"a".isInCollection(Seq(1, 2).toSet)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 2).toArray)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+checkAnswer(df.filter($"a".isInCollection(Seq(3, 1).toList)),
+  df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+val e = intercept[AnalysisException] {
+  df2.filter($"a".isInCollection(Seq($"b")))
+}
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Java Collection") {
+val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
--- End diff --

same thing here. just run a single test case.


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r191306654
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
 val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
 
-intercept[AnalysisException] {
+val e = intercept[AnalysisException] {
   df2.filter($"a".isin($"b"))
 }
+Seq("cannot resolve", "due to data type mismatch: Arguments must be same type but were")
+  .foreach { s =>
+assert(e.getMessage.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT)))
+  }
+  }
+
+  test("isInCollection: Scala Collection") {
--- End diff --

can we simplify the test cases? you are just testing this api as a wrapper. 
you don't need to run so many queries for type coercion.
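
The point that `isInCollection` is just a thin wrapper over the varargs `isin` can be sketched as follows (a hypothetical Python stand-in, not the actual Spark API):

```python
class Column:
    # Hypothetical stand-in for Spark's Column, only to illustrate that
    # isInCollection can delegate to the existing varargs isin, so the
    # wrapper itself needs only minimal testing.
    def __init__(self, name):
        self.name = name

    def isin(self, *values):
        # toy representation of the resulting IN predicate
        return ("IN", self.name, list(values))

    def isInCollection(self, values):
        # delegate: expand any iterable into isin's varargs
        return self.isin(*values)
```

Since the wrapper only forwards its argument, type coercion is exercised by the existing `isin` tests, which is the reason a single test case suffices here.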


---




[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191306683
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
 
   // Setup unique distinct aggregate children.
-  val distinctAggChildren = 
distinctAggGroups.keySet.flatten.toSeq.distinct
+  val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
--- End diff --

Though this is not related to this PR, I dropped `.distinct` because it does nothing (`keySet.flatten` is already a set).
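
The claim that `.distinct` is a no-op here can be checked with a small sketch (Python analogue using frozensets; in Scala, `Set[Set[T]].flatten` likewise yields a `Set[T]`):

```python
# Flattening a set of sets already deduplicates, so a trailing
# `.distinct` would have nothing left to remove.
groups = {frozenset({"x", "y"}), frozenset({"z", "y"})}
flattened = set().union(*groups)   # analogue of keySet.flatten
assert flattened == {"x", "y", "z"}  # the duplicate "y" collapsed in the union
```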


---




[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191306423
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-24369 multiple distinct aggregations having the same argument set") {
+val df = sql(
+  s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
--- End diff --

ok
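
For context on the test quoted above: Pearson correlation is symmetric in its arguments, so `corr(DISTINCT x, y)` and `corr(DISTINCT y, x)` share the same distinct argument set and must evaluate to the same value. A plain-Python sketch of that symmetry (not Spark's implementation):

```python
def corr(xs, ys):
    # Pearson correlation; every term is symmetric under swapping xs and ys,
    # which is why the two distinct aggregates in the test must agree.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```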


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21437
  
cc @ueshin @HyukjinKwon @BryanCutler 


---




[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...

2018-05-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18717
  
The target of this ticket is 2.4?


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21438
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91234/
Test PASSed.


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21438
  
**[Test build #91234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91234/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21443
  
cc @hvanhovell 


---




[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191302814
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
 
   // Setup unique distinct aggregate children.
-  val distinctAggChildren = 
distinctAggGroups.keySet.flatten.toSeq.distinct
+  val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
--- End diff --

Is this needed?


---




[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-28 Thread LiangchangZ
Github user LiangchangZ commented on the issue:

https://github.com/apache/spark/pull/21445
  
> Looks like the patch is needed only with #21353 #21332 #21293 as of now, 
right? If then please 
> state the condition in JIRA issue description as well as PR's description 
so that we don't get confused

@HeartSaVioR Yes, I have updated the JIRA issue description as well as the PR's description; sorry for the confusion.
> Please note that I'm commenting on top of current implementation, not 
considering #21353 #21332 #21293 

Got it, thanks for the reply.


---




[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...

2018-05-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21409#discussion_r191302688
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala
 ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure data source write performance.
+ * By default it measures 4 data source formats: Parquet, ORC, JSON, CSV:
+ *  spark-submit --class  
+ * To measure specified formats, run it with arguments:
+ *  spark-submit --class   format1 [format2] [...]
+ */
+object DataSourceWriteBenchmark {
+  val conf = new SparkConf()
+.setAppName("DataSourceWriteBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.set("spark.sql.parquet.compression.codec", "snappy")
+.set("spark.sql.orc.compression.codec", "snappy")
+
+  val spark = SparkSession.builder.config(conf).getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  val tempTable = "temp"
+  val numRows = 1024 * 1024 * 15
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally {
+  tableNames.foreach { name =>
+spark.sql(s"DROP TABLE IF EXISTS $name")
+  }
+}
+  }
+
+  def writeInt(table: String, format: String, benchmark: Benchmark): Unit = {
+spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
--- End diff --

Here I am not sure if we need to compare all numeric types: ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
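
The `withTempTable`/`withTable` helpers in the diff above rely on try/finally so cleanup runs even when a benchmark case throws. A minimal Python sketch of the same loan pattern, with a hypothetical `FakeCatalog` standing in for `spark.catalog` (illustrative only, not Spark's API):

```python
from contextlib import contextmanager

class FakeCatalog:
    """Hypothetical in-memory catalog standing in for spark.catalog."""
    def __init__(self):
        self.tables = set()

    def create(self, name):
        self.tables.add(name)

    def drop(self, name):
        self.tables.discard(name)

@contextmanager
def with_tables(catalog, *names):
    # Mirrors withTable: run the body, then drop the tables even on failure.
    try:
        yield
    finally:
        for name in names:
            catalog.drop(name)

catalog = FakeCatalog()
try:
    with with_tables(catalog, "tableInt"):
        catalog.create("tableInt")
        raise RuntimeError("benchmark case failed")
except RuntimeError:
    pass

assert "tableInt" not in catalog.tables  # cleanup ran despite the error
```

The Scala version achieves the same effect with a by-name parameter (`f: => Unit`) instead of a context manager.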



[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...

2018-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21446
  
Thank you for review and merging, @HyukjinKwon . Thank you all!


---




[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191302155
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-24369 multiple distinct aggregations having the same argument set") {
+val df = sql(
+  s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
--- End diff --

Move it to SQLQueryTestSuite?


---

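
For context on SPARK-24369: `corr(DISTINCT x, y)` must deduplicate the (x, y) argument pairs before aggregating, and because the argument set {x, y} is the same, `corr(DISTINCT x, y)` and `corr(DISTINCT y, x)` should share one distinct aggregation. A small pure-Python sketch of the semantics (illustrative only, not Spark's implementation):

```python
from math import sqrt

def pearson(pairs):
    # Pearson correlation coefficient over a list of (x, y) pairs.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / sqrt(vx * vy)

rows = [(1, 2), (1, 2), (2, 4), (3, 5)]        # one duplicate pair
r_all = pearson(rows)
r_distinct = pearson(sorted(set(rows)))        # DISTINCT drops the duplicate
r_swapped = pearson(sorted((y, x) for x, y in set(rows)))

assert abs(r_distinct - r_swapped) < 1e-12     # corr(DISTINCT x,y) == corr(DISTINCT y,x)
assert r_all != r_distinct                     # duplicates change the result
```

The final assertion is why the DISTINCT handling matters: aggregating over raw rows and over the distinct pair set gives different answers when duplicates exist.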



[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...

2018-05-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21409#discussion_r191302141
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure data source write performance.
+ * By default it measures 4 data source formats: Parquet, ORC, JSON, CSV:
+ *  spark-submit --class  
+ * To measure specified formats, run it with arguments:
+ *  spark-submit --class   format1 [format2] [...]
+ */
+object DataSourceWriteBenchmark {
+  val conf = new SparkConf()
+.setAppName("DataSourceWriteBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.set("spark.sql.parquet.compression.codec", "snappy")
+.set("spark.sql.orc.compression.codec", "snappy")
+
+  val spark = SparkSession.builder.config(conf).getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  val tempTable = "temp"
+  val numRows = 1024 * 1024 * 15
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally {
+  tableNames.foreach { name =>
+spark.sql(s"DROP TABLE IF EXISTS $name")
+  }
+}
+  }
+
+  def writeInt(table: String, format: String, benchmark: Benchmark): Unit = {
+spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
+benchmark.addCase("Output Single Int Column") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as STRING) as c2 from $tempTable")
+}
+  }
+
+  def writeIntString(table: String, format: String, benchmark: Benchmark): Unit = {
+spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
+benchmark.addCase("Output Int and String Column") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as STRING) as c2 from $tempTable")
+}
+  }
+
+  def writePartition(table: String, format: String, benchmark: Benchmark): Unit = {
+    spark.sql(s"create table $table(p INT, id INT) using $format PARTITIONED BY (p)")
+benchmark.addCase("Output Partitions") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
id," +
+s" cast(id % 2 as INT) as p from $tempTable")
+}
+  }
+
+  def writeBucket(table: String, format: String, benchmark: Benchmark): Unit = {
+    spark.sql(s"create table $table(c1 INT, c2 INT) using $format CLUSTERED BY (c2) INTO 2 BUCKETS")
+benchmark.addCase("Output Buckets") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as INT) as c2 from $tempTable")
+}
+  }
+
+  def main(args: Array[String]): Unit = {
+val tableInt = "tableInt"
+val tableIntString = "tableIntString"
+val tablePartition = "tablePartition"
+val tableBucket = "tableBucket"
+// If the
+val formats: Seq[String] = if (args.isEmpty) {
+  Seq("Parquet", "ORC", "JSON", "CSV")
+} else {
+  args
+}
+/*
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+Parquet writer benchmark:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+

+Output Single Int 

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191299100
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
  // can generate incorrect files if values are missing in columns declared as non-nullable.
  val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
 
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

If we add this new option here, I feel it'd be better to document it somewhere (e.g., `sql/functions.scala`)


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191298921
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
  // can generate incorrect files if values are missing in columns declared as non-nullable.
  val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
 
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

Can you make the option `unpackArray` case-insensitive?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191298844
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
  // can generate incorrect files if values are missing in columns declared as non-nullable.
  val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
 
+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

private? (Not related to this PR, but `nullableSchema` can also be private?)


---

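
Regarding the case-insensitivity request above: Spark typically normalizes reader options through a case-insensitive map, so `unpackArray`, `unpackarray`, and `UNPACKARRAY` would all resolve to the same key. A minimal Python sketch of such a lookup (the helper name `get_bool_option` is made up for illustration):

```python
def get_bool_option(options, key, default=False):
    # Case-insensitive lookup: lower-case all keys once, then probe with a
    # lower-cased key, mirroring the behavior of a case-insensitive map.
    lowered = {k.lower(): v for k, v in options.items()}
    value = lowered.get(key.lower())
    if value is None:
        return default
    return str(value).lower() == "true"

assert get_bool_option({"UnPackArray": "true"}, "unpackArray") is True
assert get_bool_option({}, "unpackArray") is False  # absent -> default
```

In the PR under review, wrapping the options in Spark's existing case-insensitive map utility would achieve the same effect without a bespoke helper.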



[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-05-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21439
  
Can we also accept primitive arrays in `to_json`?


---




[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21409#discussion_r191297180
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure data source write performance.
+ * By default it measures 4 data source formats: Parquet, ORC, JSON, CSV:
+ *  spark-submit --class  
+ * To measure specified formats, run it with arguments:
+ *  spark-submit --class   format1 [format2] [...]
+ */
+object DataSourceWriteBenchmark {
+  val conf = new SparkConf()
+.setAppName("DataSourceWriteBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.set("spark.sql.parquet.compression.codec", "snappy")
+.set("spark.sql.orc.compression.codec", "snappy")
+
+  val spark = SparkSession.builder.config(conf).getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  val tempTable = "temp"
+  val numRows = 1024 * 1024 * 15
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally {
+  tableNames.foreach { name =>
+spark.sql(s"DROP TABLE IF EXISTS $name")
+  }
+}
+  }
+
+  def writeInt(table: String, format: String, benchmark: Benchmark): Unit = {
+spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
+benchmark.addCase("Output Single Int Column") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as STRING) as c2 from $tempTable")
+}
+  }
+
+  def writeIntString(table: String, format: String, benchmark: Benchmark): Unit = {
+spark.sql(s"create table $table(c1 INT, c2 STRING) using $format")
+benchmark.addCase("Output Int and String Column") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as STRING) as c2 from $tempTable")
+}
+  }
+
+  def writePartition(table: String, format: String, benchmark: Benchmark): Unit = {
+    spark.sql(s"create table $table(p INT, id INT) using $format PARTITIONED BY (p)")
+benchmark.addCase("Output Partitions") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
id," +
+s" cast(id % 2 as INT) as p from $tempTable")
+}
+  }
+
+  def writeBucket(table: String, format: String, benchmark: Benchmark): Unit = {
+    spark.sql(s"create table $table(c1 INT, c2 INT) using $format CLUSTERED BY (c2) INTO 2 BUCKETS")
+benchmark.addCase("Output Buckets") { _ =>
+  spark.sql(s"INSERT overwrite table $table select cast(id as INT) as 
" +
+s"c1, cast(id as INT) as c2 from $tempTable")
+}
+  }
+
+  def main(args: Array[String]): Unit = {
+val tableInt = "tableInt"
+val tableIntString = "tableIntString"
+val tablePartition = "tablePartition"
+val tableBucket = "tableBucket"
+// If the
+val formats: Seq[String] = if (args.isEmpty) {
+  Seq("Parquet", "ORC", "JSON", "CSV")
+} else {
+  args
+}
+/*
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+Parquet writer benchmark:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+

+Output Single Int Column 

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread ssuchter
Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20697
  
I'll work on Matt's comments from Friday next.

Here's the output (after the bugfix) from running against mainline:

```
MBP:~/src/ssuchter-spark% git remote get-url origin
https://github.com/ssuchter/spark.git

MBP:~/src/ssuchter-spark% git remote get-url upstream
git://github.com/apache/spark.git

1d8a265d13 (HEAD -> ssuchter-k8s-integration-tests, 
origin/ssuchter-k8s-integration-tests) Fix a bug in KubernetesTestComponents - 
don't an an empty string for zero arguments
65347b319a Merge branch 'ssuchter-k8s-integration-tests' of 
https://github.com/ssuchter/spark into ssuchter-k8s-integration-tests
1a531abcf6 Remove unused code relating to Kerberos, which doesn't belong in 
this PR
3ba6ffb5f2 Remove e2e-prow.sh, which isn't appropriate for this PR
9e64f43b62 Remove unnecessary cloning and building code for the Spark repo
e6bd56325d Update README.md excluding cloning and building logic
e70f3bea3d Remove K8s cloud-based backend testing support from this PR
a0023b2f33 Remove config options that were only used during repo clone 
process
e55b8a723e Remove repository cloning behavior and allow script to be called 
from other directories
9b0eede244 Fixes for scala style
f29679ef56 Ignore dist/ for style checks
3615953bea Fix scala style issues
bef586f740 Remove LICENSE and copy of mvn wrapper script. Rewrite path for 
calling mvn wrapper script.
81c7a66ad6 Make k8s integration tests build when top-level kubernetes 
profile selected
365d6bc65d Initial checkin of k8s integration tests. These tests were 
developed in the https://github.com/apache-spark-on-k8s/spark-integration repo 
by several contributors. This is a copy of the current state into the main 
apache spark repo. The only changes from the current spark-integration repo 
state are: * Move the files from the repo root into 
resource-managers/kubernetes/integration-tests * Add a reference to these tests 
in the root README.md * Fix a path reference in 
dev/dev-run-integration-tests.sh * Add a TODO in include/util.sh
dbce275784 Remove unused code relating to Kerberos, which doesn't belong in 
this PR
5ffa464c65 Remove e2e-prow.sh, which isn't appropriate for this PR
13721f69a2 Remove unnecessary cloning and building code for the Spark repo
ba720733fa Update README.md excluding cloning and building logic
1b1528a504 (upstream/master, origin/master, origin/HEAD, master) 
[SPARK-24366][SQL] Improving of error messages for type converting

MBP:~/src/ssuchter-spark% echo $REVISION
1d8a265d13
MBP:~/src/ssuchter-spark% echo $DATE
20180528

MBP:~/src/ssuchter-spark% ./dev/make-distribution.sh --name 
${DATE}-${REVISION} --tgz -DzincPort=${ZINC_PORT} -Phadoop-2.7 -Pkubernetes 
-Pkinesis-asl -Phive -Phive-thriftserver
+++ dirname ./dev/make-distribution.sh
++ cd ./dev/..
++ pwd
+ SPARK_HOME=/Users/ssuchter/src/ssuchter-spark
+ DISTDIR=/Users/ssuchter/src/ssuchter-spark/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
…
+ TARDIR_NAME=spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ 
TARDIR=/Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ rm -rf 
/Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ cp -r /Users/ssuchter/src/ssuchter-spark/dist 
/Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ tar czf spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13.tgz -C 
/Users/ssuchter/src/ssuchter-spark spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13
+ rm -rf 
/Users/ssuchter/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13

MBP:~/src/ssuchter-spark% 
./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
 --spark-tgz 
~/src/ssuchter-spark/spark-2.4.0-SNAPSHOT-bin-20180528-1d8a265d13.tgz
Using `mvn` from path: /usr/local/bin/mvn
[INFO] Scanning for projects...
[INFO]
[INFO] 

[INFO] Building Spark Project Kubernetes Integration Tests 2.4.0-SNAPSHOT
[INFO] 

[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-versions) @ 
spark-kubernetes-integration-tests_2.11 ---
…
Successfully tagged kubespark/spark:68388D3B-6FAC-4E59-8AED-8604AA437C2D

/Users/ssuchter/src/ssuchter-spark/resource-managers/kubernetes/integration-tests
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 118 milliseconds.
Run starting. Expected test count is: 1
KubernetesSuite:
- 

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-28 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21370
  
@viirya @gatorsmile @ueshin @felixcheung @HyukjinKwon 
The refactor about generating html code out of `Dataset.scala` was done in 
94f3414. Please help to check whether it is appropriate when you have time. 
Thanks!

@rdblue @rxin 
The latest commit also includes the logic of using `spark.sql.repl.eagerEval.enabled` to control both \_\_repr\_\_ and \_repr\_html\_. Please have a look when you have time. Thanks!


---

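
On the `_repr_html_` discussion above: Jupyter looks for a `_repr_html_` method and falls back to `__repr__` when it returns `None`, which is how a single flag (the `spark.sql.repl.eagerEval.enabled` config mentioned in the comment) can gate both representations. A minimal illustrative sketch, with the hypothetical `MiniFrame` standing in for a DataFrame:

```python
import html

class MiniFrame:
    """Hypothetical stand-in for a DataFrame that renders itself in notebooks."""
    def __init__(self, columns, rows, eager_eval=False):
        self.columns, self.rows, self.eager_eval = columns, rows, eager_eval

    def _repr_html_(self):
        # Jupyter calls this hook; returning None falls back to __repr__.
        if not self.eager_eval:
            return None
        header = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows)
        return f"<table><tr>{header}</tr>{body}</table>"

df = MiniFrame(["id", "name"], [(1, "a")], eager_eval=True)
assert "<table>" in df._repr_html_()
assert MiniFrame(["id"], [], eager_eval=False)._repr_html_() is None
```

Gating the hook on one flag keeps the plain-REPL and notebook behaviors consistent, which is the design point being reviewed in the PR.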



[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3654/
Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3653/
Test PASSed.


---




[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-28 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21445
  
```
Looks like the patch is needed only with #21353 #21332 #21293 as of now, 
right?
```
@HeartSaVioR Yes, sorry for the late explanation. The background is that we are running a POC based on #21353 #21332 #21293 and the latest master, including the work on the queue RDD reader/writer by @jose-torres. Many thanks for the work in #21239; we can complete all state operations after fixing this bug, so we thought we should report it to let you know.

```
Please note that I'm commenting on top of current implementation, not 
considering #21353 #21332 #21293.
```
Got it. Owing to some pressure from internal requirements for CP, we are running with these 3 patches, but we'll follow all your work closely and hope to contribute to CP.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21426
  
**[Test build #91240 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91240/testReport)**
 for PR 21426 at commit 
[`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91239 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91239/testReport)**
 for PR 13599 at commit 
[`44500fc`](https://github.com/apache/spark/commit/44500fc0d66bd930cc12ba6b66985e08f61d9ecc).


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21420


---




[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21420
  
Thanks @HyukjinKwon !


---




[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21420
  
Merged to master.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13599
  
(Oops, the test failure was legitimate.)


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13599
  
Actually lets also loop in @ifilonenko who's been thinking about similar 
things but with more of a K8s bent.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91233/
Test FAILed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21437
  
**[Test build #91233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91233/testReport)**
 for PR 21437 at commit 
[`b9d8dd3`](https://github.com/apache/spark/commit/b9d8dd304ed3d172a2e44919103e9500893fc829).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13599
  
It's certainly closer. I haven't had a chance to take a look super recently (I've been focused on the PySpark K8s integration). From a skim through I'm still hesitant about this being merged as-is, but maybe at Spark Summit SF (or a hangout call the day after, which we can all schedule) it would make sense to try and get a better grasp on it.

Sorry I haven't had the time this month to take much of a look.


---




[GitHub] spark pull request #21446: [SPARK-19613][SS][TEST] Random.nextString is not ...

2018-05-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21446


---




[GitHub] spark issue #21444: Branch 2.3

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21444
  
@mozammal mind closing this please?


---




[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21446
  
Merged to master and branch-2.3.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91238 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91238/testReport)**
 for PR 13599 at commit 
[`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: 
Boolean)`
  * `  class DriverEndpoint(override val rpcEnv: RpcEnv)`


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91238/
Test FAILed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3652/
Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91238 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91238/testReport)**
 for PR 13599 at commit 
[`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).


---




[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21446
  
Yea, I was facing this problem too. Thanks for fixing this.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13599
  
retest this please





[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3651/
Test PASSed.





[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3522/






[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3522/






[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3650/
Test PASSed.





[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Parquet write benchmark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21409
  
**[Test build #91237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91237/testReport)** for PR 21409 at commit [`8ffba61`](https://github.com/apache/spark/commit/8ffba61a3ebd6e06eec2fdf03e19a65cb5b40787).





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91236/
Test FAILed.





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91236/testReport)** for PR 20697 at commit [`1d8a265`](https://github.com/apache/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91235/
Test FAILed.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3649/
Test PASSed.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91235/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
  * `  class DriverEndpoint(override val rpcEnv: RpcEnv)`





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91236/testReport)** for PR 20697 at commit [`1d8a265`](https://github.com/apache/spark/commit/1d8a265d13b65dcec8db11a5be09d4a029037d2c).





[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-28 Thread ssuchter
Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20697
  
Fixed the bug. @mccheah I'd appreciate your eyes on commit 1d8a265, for both correctness and style. (I haven't used Scala before this project, so I'm not very confident about the best way to do things.)





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91235/testReport)** for PR 13599 at commit [`d9a5f00`](https://github.com/apache/spark/commit/d9a5f005bd6e411326963f8b87fe162603830b5c).





[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-05-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21438
  
**[Test build #91234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91234/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).





[GitHub] spark pull request #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener....

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21438#discussion_r191285757
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
   }
 
   private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
-    val metricIds = exec.metrics.map(_.accumulatorId).sorted
+    val metricIds = exec.metrics.map(_.accumulatorId).toSet
--- End diff --

I think we can get rid of `metricIds `
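The `.sorted` → `.toSet` change under discussion trades the O(n) `Seq.contains` the old code paid per lookup for an O(1) `Set` membership test. A minimal standalone sketch of that tradeoff, with the real `LiveExecutionData`/`SQLAppStatusListener` types replaced by plain collections for illustration:

```scala
object MetricIdsSketch {
  // Keep only the entries of `updates` whose key is a known accumulator id.
  // Building a Set makes each membership test O(1); the pre-change code
  // kept a sorted Seq, where every `contains` call is a linear scan.
  def filterByIds(updates: Map[Long, String], ids: Seq[Long]): Map[Long, String] = {
    val idSet: Set[Long] = ids.toSet
    updates.filter { case (id, _) => idSet.contains(id) }
  }

  def main(args: Array[String]): Unit = {
    val result = filterByIds(Map(1L -> "10", 2L -> "20", 3L -> "30"), Seq(5L, 1L, 3L))
    println(result.keys.toList.sorted) // List(1, 3)
  }
}
```

Whether `metricIds` can be dropped entirely, as suggested, depends on whether the surrounding code uses it anywhere besides this filter.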





[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21438
  
ok to test





[GitHub] spark issue #21446: [SPARK-19613][SS][TEST] Random.nextString is not safe fo...

2018-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21446
  
Thank you for reviewing, @felixcheung and @HeartSaVioR.





[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r191283013
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
 }
 
 /*
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Select 0 string row (value IS NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------
-Parquet Vectorized                        8452 / 8504           1.9         537.3       1.0X
-Parquet Vectorized (Pushdown)              274 /  281          57.3          17.4      30.8X
-Native ORC Vectorized                     8167 / 8185           1.9         519.3       1.0X
-Native ORC Vectorized (Pushdown)           365 /  379          43.1          23.2      23.1X
+Parquet Vectorized                        2961 / 3123           5.3         188.3       1.0X
+Parquet Vectorized (Pushdown)             3057 / 3121           5.1         194.4       1.0X
--- End diff --

Yea, I think so, but I'm not sure. I tried running it multiple times, but I didn't get the old performance values...





[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...

2018-05-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21378#discussion_r191271894
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -418,17 +417,34 @@ private[spark] class MesosClusterScheduler(
 envBuilder.build()
   }
 
+  private def isContainerLocalAppJar(desc: MesosDriverDescription): Boolean = {
+    val isLocalJar = desc.jarUrl.startsWith("local://")
+    val isContainerLocal = desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists {

interesting! 
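For readers skimming the diff, the check being reviewed can be sketched standalone roughly like this. `DriverDesc` stands in for the real `MesosDriverDescription`, and `"container"` as the accepted resolution-mode value is an assumption, since the diff above is cut off before the match arms:

```scala
// Hypothetical simplification for illustration only.
case class DriverDesc(jarUrl: String, conf: Map[String, String])

object LocalJarSketch {
  // An app jar is "container local" when its URL uses the local:// scheme
  // AND the resolution mode asks for it to be resolved inside the container.
  // The "container" value below is assumed, not taken from the truncated diff.
  def isContainerLocalAppJar(desc: DriverDesc): Boolean = {
    val isLocalJar = desc.jarUrl.startsWith("local://")
    val isContainerLocal = desc.conf
      .get("spark.mesos.appJar.local.resolution.mode")
      .exists(_ == "container")
    isLocalJar && isContainerLocal
  }

  def main(args: Array[String]): Unit = {
    val local = DriverDesc(
      "local:///opt/spark/app.jar",
      Map("spark.mesos.appJar.local.resolution.mode" -> "container"))
    println(isContainerLocalAppJar(local)) // true
    println(isContainerLocalAppJar(DriverDesc("http://example.com/app.jar", Map.empty))) // false
  }
}
```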




