[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20330
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20330
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86417/
Test PASSed.


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #86417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86417/testReport)** for PR 20330 at commit [`c733ac9`](https://github.com/apache/spark/commit/c733ac90c29b54c52142f787fbeb91648d8dc698).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20120: [SPARK-22926] [SQL] Respect table-level conf comp...

2018-01-20 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/20120


---




[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20087


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20087
  
LGTM

Thanks! Merged to master/2.3


---




[GitHub] spark pull request #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-...

2018-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20336


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20336
  
LGTM

Thanks! Merged to master/2.3


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20336
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86419/
Test PASSed.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20336
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20336
  
**[Test build #86419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86419/testReport)** for PR 20336 at commit [`a7471e4`](https://github.com/apache/spark/commit/a7471e4acb7d8967fef37a8055e9b329dfbbee04).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20249
  
**[Test build #86420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86420/testReport)** for PR 20249 at commit [`90c4980`](https://github.com/apache/spark/commit/90c49809886e2f487dc4c4dc6ba45aa16bae8933).


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86415/
Test PASSed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #86415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86415/testReport)** for PR 20087 at commit [`118f788`](https://github.com/apache/spark/commit/118f7880bdcf26ba7394a2cc7fac2e0eae707d6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20249
  
add to whitelist


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20249
  
ok to test


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20336
  
**[Test build #86419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86419/testReport)** for PR 20336 at commit [`a7471e4`](https://github.com/apache/spark/commit/a7471e4acb7d8967fef37a8055e9b329dfbbee04).


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20336
  
retest this please


---




[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20333
  
LGTM except one minor comment.


---




[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20333#discussion_r162794983
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     checkAnswer(innerJoin, Row(1) :: Nil)
   }
 
+  test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
+    "is false or null") {
+    val df = spark.range(10)
--- End diff --

> `withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {`
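For readers following the thread, a minimal sketch of what the suggested wrapping could look like inside `DataFrameJoinSuite` (the join and assertion are illustrative placeholders, not the PR's actual test body; it assumes the suite's usual imports such as `org.apache.spark.sql.functions.lit` and `org.apache.spark.sql.internal.SQLConf`):

```scala
test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct " +
    "when join condition is false or null") {
  // withSQLConf (from SQLTestUtils) sets the config only for this block
  // and restores the previous value afterwards.
  withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
    val df = spark.range(10)
    // Illustrative assertion: a literally-false join condition matches no rows.
    checkAnswer(df.join(df, lit(false)), Nil)
  }
}
```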


---




[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20333#discussion_r162794939
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
    */
   def isCartesianProduct(join: Join): Boolean = {
     val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
-    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
-      && refs.exists(join.right.outputSet.contains))
+
+    conditions match {
+      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
+      case _ => !conditions.map(_.references).exists(refs =>
+        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
+    }
   }
 
   def apply(plan: LogicalPlan): LogicalPlan =
     if (SQLConf.get.crossJoinEnabled) {
       plan
     } else plan transform {
-      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
--- End diff --

Yeah. For outer join, it makes sense to remove this check.


---




[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20333
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/59/
Test PASSed.


---




[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20333
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20333
  
**[Test build #86418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86418/testReport)** for PR 20333 at commit [`a4a6ac8`](https://github.com/apache/spark/commit/a4a6ac89e44c743a0471b01a0c499accec71cf73).


---




[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...

2018-01-20 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20333#discussion_r162793942
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
    */
   def isCartesianProduct(join: Join): Boolean = {
     val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
-    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
-      && refs.exists(join.right.outputSet.contains))
+
+    conditions match {
+      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
+      case _ => !conditions.map(_.references).exists(refs =>
+        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
+    }
   }
 
   def apply(plan: LogicalPlan): LogicalPlan =
     if (SQLConf.get.crossJoinEnabled) {
       plan
     } else plan transform {
-      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
--- End diff --

Why are you saying that the size of the result set is the same?
If you have a relation A (of size n, let's say 1M rows) in an outer join with a relation B (of size m, let's say 1M rows): if the condition is true, the output relation has 1M * 1M rows (i.e. n * m); if the condition is false, the result is 1M rows (n) for a left join, 1M rows (m) for a right join, and 1M + 1M rows (n + m) for a full outer join. Therefore the size is not the same at all. But maybe you meant something different; am I missing something?
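A quick `spark-shell` sketch that makes these counts concrete (illustrative only; shown with `spark.sql.crossJoin.enabled=true` so the pre-patch check does not reject the literal conditions):

```scala
import org.apache.spark.sql.functions.lit

val a = spark.range(3).toDF("x")  // n = 3
val b = spark.range(4).toDF("y")  // m = 4

a.join(b, lit(true), "fullouter").count()   // n * m = 12: every pair matches
a.join(b, lit(false), "leftouter").count()  // n = 3: left rows, null-padded
a.join(b, lit(false), "rightouter").count() // m = 4
a.join(b, lit(false), "fullouter").count()  // n + m = 7
```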


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #86417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86417/testReport)** for PR 20330 at commit [`c733ac9`](https://github.com/apache/spark/commit/c733ac90c29b54c52142f787fbeb91648d8dc698).


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-20 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162792383
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -1002,4 +1000,12 @@ private object ApiHelper {
     }
   }
 
+  def lastStageNameAndDescription(store: AppStatusStore, job: JobData): (String, String) = {
+    store.asOption(store.lastStageAttempt(job.stageIds.max)) match {
+      case Some(lastStageAttempt) =>
+        (lastStageAttempt.name, lastStageAttempt.description.getOrElse(job.name))
+      case None => ("", "")
--- End diff --

Fixed, thanks for catching.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86414/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #86414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86414/testReport)** for PR 20208 at commit [`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20331
  
cc @cloud-fan 


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86412/
Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86412/testReport)** for PR 20331 at commit [`9c85b18`](https://github.com/apache/spark/commit/9c85b18c059e4ab3b4b25a5b2e414b4f0c67072f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86413/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #86413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86413/testReport)** for PR 20208 at commit [`e1d6f2a`](https://github.com/apache/spark/commit/e1d6f2a5ba0cae28b0ce4ed3612429a593828c0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20325#discussion_r162790779
  
--- Diff: docs/sql-programming-guide.md ---
@@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
 Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
 long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
 be created by calling the `table` method on a `SparkSession` with the name of the table.
+Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
--- End diff --

This limitation is lifted in Spark 2.2. See https://issues.apache.org/jira/browse/SPARK-19152
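For context, a brief sketch of the two write paths being discussed (the table name and data are placeholders; per SPARK-19152, since Spark 2.2 `saveAsTable` in `Append` mode also works against an existing Hive table):

```scala
// Assumes spark.implicits._ is in scope for toDF.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// insertInto appends to an existing table, resolving columns by position.
df.write.insertInto("my_hive_table")

// saveAsTable creates the table from the DataFrame schema if it does not
// exist; in Append mode it resolves columns by name.
df.write.mode("append").saveAsTable("my_hive_table")
```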


---




[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...

2018-01-20 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/19992
  
@squito I think it's fine to just close the PR/JIRA issue.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread ashashwat
Github user ashashwat commented on the issue:

https://github.com/apache/spark/pull/20336
  
Retest this please.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86416/
Test FAILed.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19993
  
**[Test build #86416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86416/testReport)** for PR 19993 at commit [`d9d25b0`](https://github.com/apache/spark/commit/d9d25b0f0bcf365366c0c13daf882cbea86d3835).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #86415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86415/testReport)** for PR 20087 at commit [`118f788`](https://github.com/apache/spark/commit/118f7880bdcf26ba7394a2cc7fac2e0eae707d6f).


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19993
  
**[Test build #86416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86416/testReport)** for PR 19993 at commit [`d9d25b0`](https://github.com/apache/spark/commit/d9d25b0f0bcf365366c0c13daf882cbea86d3835).


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/58/
Test PASSed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20087
  
retest this please


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #86414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86414/testReport)** for PR 20208 at commit [`29c281d`](https://github.com/apache/spark/commit/29c281dbe3c6f63614d9abc286c68e283786649b).


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/57/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/56/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #86413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86413/testReport)** for PR 20208 at commit [`e1d6f2a`](https://github.com/apache/spark/commit/e1d6f2a5ba0cae28b0ce4ed3612429a593828c0f).


---




[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162786270
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaEvolutionTest.scala ---
@@ -0,0 +1,436 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
+
+/**
+ * Schema can evolve in several ways and the followings are supported in file-based data sources.
+ *
+ *   1. Add a column
+ *   2. Remove a column
+ *   3. Change a column position
+ *   4. Change a column type
+ *
+ * Here, we consider safe evolution without data loss. For example, data type evolution should be
+ * from small types to larger types like `int`-to-`long`, not vice versa.
+ *
+ * So far, file-based data sources have schema evolution coverages like the followings.
+ *
+ *   | File Format | Coverage   | Note                                                  |
+ *   | ----------- | ---------- | ----------------------------------------------------- |
+ *   | TEXT        | N/A        | Schema consists of a single string column.            |
+ *   | CSV         | 1, 2, 4    |                                                       |
+ *   | JSON        | 1, 2, 3, 4 |                                                       |
+ *   | ORC         | 1, 2, 3, 4 | Native vectorized ORC reader has the widest coverage. |
+ *   | PARQUET     | 1, 2, 3    |                                                       |
+ *
+ * This aims to provide an explicit test coverage for schema evolution on file-based data sources.
+ * Since a file format has its own coverage of schema evolution, we need a test suite
+ * for each file-based data source with corresponding supported test case traits.
+ *
+ * The following is a hierarchy of test traits.
+ *
+ *   SchemaEvolutionTest
+ *     -> AddColumnEvolutionTest
+ *     -> RemoveColumnEvolutionTest
+ *     -> ChangePositionEvolutionTest
+ *     -> BooleanTypeEvolutionTest
+ *     -> IntegralTypeEvolutionTest
+ *     -> ToDoubleTypeEvolutionTest
+ *     -> ToDecimalTypeEvolutionTest
+ */
+
+trait SchemaEvolutionTest extends QueryTest with SQLTestUtils with SharedSQLContext {
+  val format: String
+  val options: Map[String, String] = Map.empty[String, String]
+}
+
+/**
+ * Add column.
+ * This test suite assumes that the missing column should be `null`.
+ */
+trait AddColumnEvolutionTest extends SchemaEvolutionTest {
+  import testImplicits._
+
+  test("append column at the end") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+
+      val df1 = Seq("a", "b").toDF("col1")
+      val df2 = df1.withColumn("col2", lit("x"))
+      val df3 = df2.withColumn("col3", lit("y"))
+
+      val dir1 = s"$path${File.separator}part=one"
+      val dir2 = s"$path${File.separator}part=two"
+      val dir3 = s"$path${File.separator}part=three"
+
+      df1.write.mode("overwrite").format(format).options(options).save(dir1)
+      df2.write.mode("overwrite").format(format).options(options).save(dir2)
+      df3.write.mode("overwrite").format(format).options(options).save(dir3)
+
+      val df = spark.read
+        .schema(df3.schema)
+        .format(format)
+        .options(options)
+        .load(path)
+
+      checkAnswer(df, Seq(
+        Row("a", null, null, "one"),
+        Row("b", null, null, "one"),
+        Row("a", "x", null, "two"),
+        Row("b", "x", null, "two"),
+        Row("a", "x", "y", "three"),
+        Row("b", "x", "y", "three")))
+    }

[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86412/testReport)** for PR 20331 at commit [`9c85b18`](https://github.com/apache/spark/commit/9c85b18c059e4ab3b4b25a5b2e414b4f0c67072f).


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/55/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Thank you for the review, @HyukjinKwon. I'll update it like that.


---




[GitHub] spark issue #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage genera...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20204
  
Will merge this one if there are no more comments in a few days.


---




[GitHub] spark pull request #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20204#discussion_r162785336
  
--- Diff: python/run-tests-with-coverage ---
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set -o pipefail
+set -e
+
+# This variable indicates which coverage executable to run to combine coverages
+# and generate HTMLs, for example, 'coverage3' in Python 3.
+COV_EXEC="${COV_EXEC:-coverage}"
+FWDIR="$(cd "`dirname $0`"; pwd)"
+pushd "$FWDIR" > /dev/null
+
+# Ensure that coverage executable is installed.
+if ! hash $COV_EXEC 2>/dev/null; then
+  echo "Missing coverage executable in your path, skipping PySpark 
coverage"
+  exit 1
+fi
+
+# Set up the directories for coverage results.
+export COVERAGE_DIR="$FWDIR/test_coverage"
+rm -fr "$COVERAGE_DIR/coverage_data"
+rm -fr "$COVERAGE_DIR/htmlcov"
+mkdir -p "$COVERAGE_DIR/coverage_data"
+
+# Current directory are added in the python path so that it doesn't refer our built
+# pyspark zip library first.
+export PYTHONPATH="$FWDIR:$PYTHONPATH"
+# Also, our sitecustomize.py and coverage_daemon.py are included in the path.
+export PYTHONPATH="$COVERAGE_DIR:$PYTHONPATH"
+
+# We use 'spark.python.daemon.module' configuration to insert the coverage supported workers.
+export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
+
+# This environment variable enables the coverage.
+export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
+
+# If you'd like to run a specific unittest class, you could do such as
+# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
+./run-tests "$@"
--- End diff --

Another tip: if we use `../bin/pyspark` here, do some simple tests and then exit, it still seems to produce the coverage correctly.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20336
  
**[Test build #4069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4069/testReport)** for PR 20336 at commit [`785fccf`](https://github.com/apache/spark/commit/785fccff1c35f93fc479d460b527bbb6fcfc00a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread ashashwat
Github user ashashwat commented on the issue:

https://github.com/apache/spark/pull/20336
  
@srowen Let me go ahead and do that.


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20336
  
**[Test build #4069 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4069/testReport)** for PR 20336 at commit [`785fccf`](https://github.com/apache/spark/commit/785fccff1c35f93fc479d460b527bbb6fcfc00a7).


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20336
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-start d...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20336
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20336: [SPARK-23165][DOC] Spelling mistake fix in quick-...

2018-01-20 Thread ashashwat
GitHub user ashashwat opened a pull request:

https://github.com/apache/spark/pull/20336

[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.

## What changes were proposed in this pull request?

Fix spelling in quick-start doc.

## How was this patch tested?

Doc only.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashashwat/spark SPARK-23165

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20336


commit 785fccff1c35f93fc479d460b527bbb6fcfc00a7
Author: Shashwat Anand 
Date:   2018-01-20T14:50:44Z

[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.




---




[GitHub] spark issue #20335: [SPARK-23088][CORE] History server not showing incomplet...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20335
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20335: [SPARK-23088][CORE] History server not showing incomplet...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20335
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20335: [SPARK-23088][CORE] History server not showing in...

2018-01-20 Thread pmackles
GitHub user pmackles opened a pull request:

https://github.com/apache/spark/pull/20335

[SPARK-23088][CORE] History server not showing incomplete/running applications

## What changes were proposed in this pull request?

History server not showing incomplete/running applications when the `spark.history.ui.maxApplications` property is set to a value that is smaller than the total number of applications.

## How was this patch tested?

Verified manually against master and 2.2.2 branch. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pmackles/spark SPARK-23088

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20335


commit d94042d81d9f982ee58aab1b3296d33b10d50a75
Author: Paul Mackles 
Date:   2018-01-20T13:53:46Z

[SPARK-23088][CORE] History server not showing incomplete/running applications




---




[GitHub] spark issue #20295: [WIP][SPARK-23011] Support alternative function form wit...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20295
  
To me, this seems roughly fine.

> Alternatively, we can implement a new serialization protocol for GROUP_MAP eval type, i.e., instead of sending an arrow batch, we could send a group row and then an arrow batch.

I don't have a strong preference on this.



---




[GitHub] spark pull request #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18277#discussion_r162782791
  
--- Diff: python/pyspark/rdd.py ---
@@ -751,7 +751,7 @@ def func(iterator):
 
         def pipe_objs(out):
             for obj in iterator:
-                s = str(obj).rstrip('\n') + '\n'
+                s = unicode(obj).rstrip('\n') + '\n'
--- End diff --

@chaoslawful, if you are active, we could change `'\n'` to `u'\n'` to reduce the conversion and avoid relying on the implicit conversion between `str` and `unicode`.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
Let me merge this one in a few days if there are no more comments.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86411/
Test FAILed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #86411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86411/testReport)** for PR 20087 at commit [`118f788`](https://github.com/apache/spark/commit/118f7880bdcf26ba7394a2cc7fac2e0eae707d6f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20334: How to check registered table name.

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20334
  
Hey @AtulKumVerma, questions should usually go to the mailing list. See http://spark.apache.org/community.html. I believe you can get a better answer there.

A pull request from one branch to another branch actually causes a slight visual problem.

Mind closing this pull request, please?


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20208
  
cc @sameeragarwal for reviewing too. I vaguely remember we had a talk about this before.


---




[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162781551
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaEvolutionTest.scala ---
@@ -0,0 +1,436 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
+
+/**
+ * Schema can evolve in several ways and the followings are supported in file-based data sources.
+ *
+ *   1. Add a column
+ *   2. Remove a column
+ *   3. Change a column position
+ *   4. Change a column type
+ *
+ * Here, we consider safe evolution without data loss. For example, data type evolution should be
+ * from small types to larger types like `int`-to-`long`, not vice versa.
+ *
+ * So far, file-based data sources have schema evolution coverages like the followings.
+ *
+ *   | File Format | Coverage   | Note                                                  |
+ *   | ----------- | ---------- | ----------------------------------------------------- |
+ *   | TEXT        | N/A        | Schema consists of a single string column.            |
+ *   | CSV         | 1, 2, 4    |                                                       |
+ *   | JSON        | 1, 2, 3, 4 |                                                       |
+ *   | ORC         | 1, 2, 3, 4 | Native vectorized ORC reader has the widest coverage. |
+ *   | PARQUET     | 1, 2, 3    |                                                       |
+ *
+ * This aims to provide an explicit test coverage for schema evolution on file-based data sources.
+ * Since a file format has its own coverage of schema evolution, we need a test suite
+ * for each file-based data source with corresponding supported test case traits.
+ *
+ * The following is a hierarchy of test traits.
+ *
+ *   SchemaEvolutionTest
+ *     -> AddColumnEvolutionTest
+ *     -> RemoveColumnEvolutionTest
+ *     -> ChangePositionEvolutionTest
+ *     -> BooleanTypeEvolutionTest
+ *     -> IntegralTypeEvolutionTest
+ *     -> ToDoubleTypeEvolutionTest
+ *     -> ToDecimalTypeEvolutionTest
+ */
+
+trait SchemaEvolutionTest extends QueryTest with SQLTestUtils with SharedSQLContext {
+  val format: String
+  val options: Map[String, String] = Map.empty[String, String]
+}
+
+/**
+ * Add column.
--- End diff --

Shall we leave the number given above in this comment, like `(case 1.)`?


---




[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162781286
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaEvolutionTest.scala ---
@@ -0,0 +1,436 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
+
+/**
+ * Schema can evolve in several ways and the followings are supported in 
file-based data sources.
+ *
+ *   1. Add a column
+ *   2. Remove a column
+ *   3. Change a column position
+ *   4. Change a column type
+ *
+ * Here, we consider safe evolution without data loss. For example, data 
type evolution should be
+ * from small types to larger types like `int`-to-`long`, not vice versa.
+ *
+ * So far, file-based data sources have schema evolution coverages like 
the followings.
+ *
+ *   | File Format  | Coverage | Note  
 |
+ *   |  |  | 
-- |
+ *   | TEXT | N/A  | Schema consists of a single string 
column. |
+ *   | CSV  | 1, 2, 4  |   
 |
+ *   | JSON | 1, 2, 3, 4   |   
 |
+ *   | ORC  | 1, 2, 3, 4   | Native vectorized ORC reader has the 
widest coverage.  |
+ *   | PARQUET  | 1, 2, 3  |   
 |
+ *
+ * This aims to provide an explicit test coverage for schema evolution on 
file-based data sources.
+ * Since a file format has its own coverage of schema evolution, we need a 
test suite
+ * for each file-based data source with corresponding supported test case 
traits.
+ *
+ * The following is a hierarchy of test traits.
+ *
+ *   SchemaEvolutionTest
+ * -> AddColumnEvolutionTest
+ * -> RemoveColumnEvolutionTest
+ * -> ChangePositionEvolutionTest
+ * -> BooleanTypeEvolutionTest
+ * -> IntegralTypeEvolutionTest
+ * -> ToDoubleTypeEvolutionTest
+ * -> ToDecimalTypeEvolutionTest
+ */
+
+trait SchemaEvolutionTest extends QueryTest with SQLTestUtils with 
SharedSQLContext {
+  val format: String
+  val options: Map[String, String] = Map.empty[String, String]
+}
+
+/**
+ * Add column.
+ * This test suite assumes that the missing column should be `null`.
+ */
+trait AddColumnEvolutionTest extends SchemaEvolutionTest {
+  import testImplicits._
+
+  test("append column at the end") {
+withTempDir { dir =>
+  val path = dir.getCanonicalPath
+
+  val df1 = Seq("a", "b").toDF("col1")
+  val df2 = df1.withColumn("col2", lit("x"))
+  val df3 = df2.withColumn("col3", lit("y"))
+
+  val dir1 = s"$path${File.separator}part=one"
+  val dir2 = s"$path${File.separator}part=two"
+  val dir3 = s"$path${File.separator}part=three"
+
+  
df1.write.mode("overwrite").format(format).options(options).save(dir1)
+  
df2.write.mode("overwrite").format(format).options(options).save(dir2)
+  
df3.write.mode("overwrite").format(format).options(options).save(dir3)
+
+  val df = spark.read
+.schema(df3.schema)
+.format(format)
+.options(options)
+.load(path)
+
+  checkAnswer(df, Seq(
+Row("a", null, null, "one"),
+Row("b", null, null, "one"),
+Row("a", "x", null, "two"),
+Row("b", "x", null, "two"),
+Row("a", "x", "y", "three"),
+Row("b", "x", "y", "three")))
+}
+  
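
As a standalone illustration of the safe type widening the comment above describes (`int`-to-`long`), here is a minimal sketch. It assumes a running `SparkSession` named `spark`, uses JSON because the coverage table above lists type changes as supported there, and the scratch path is hypothetical:

```
import org.apache.spark.sql.types.{LongType, StructField, StructType}

import spark.implicits._

val path = "/tmp/widen-demo"  // hypothetical scratch directory

// Write the data with an int column...
Seq(1, 2, 3).toDF("v").write.mode("overwrite").json(path)

// ...then read it back with a wider user-specified schema (int -> long).
val widened = spark.read
  .schema(StructType(Seq(StructField("v", LongType))))
  .json(path)

widened.printSchema()  // v: long
widened.show()
```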

[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162781325
  

[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162781308
  

[GitHub] spark issue #20334: How to check registered table name.

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20334
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] spark pull request #20334: How to check registered table name.

2018-01-20 Thread AtulKumVerma
GitHub user AtulKumVerma opened a pull request:

https://github.com/apache/spark/pull/20334

How to check registered table name.

Dear fellows,

I want to know how I can see all of the datasets or DataFrames that are registered as temporary tables or views in the SQL context.
I have read that Catalyst is responsible for maintaining a one-to-one mapping between a DataFrame and its temporary table name.
I would like to list them all.

Your response is highly appreciated.
Thanks to all in advance.
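
For reference, a minimal Scala sketch of one way to do this through the public catalog API (assuming a `SparkSession` named `spark`; this avoids reaching into Catalyst internals):

```
// Register a couple of temporary views.
spark.range(10).createOrReplaceTempView("numbers")
spark.range(5).createOrReplaceTempView("small_numbers")

// Catalog API: lists tables and views in the current database;
// temporary views come back with isTemporary == true.
spark.catalog.listTables().show()

// The SQL equivalent:
spark.sql("SHOW TABLES").show()
```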


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20334


commit 5244aafc2d7945c11c96398b8d5b752b45fd148c
Author: Xianjin YE 
Date:   2018-01-02T15:30:38Z

[SPARK-22897][CORE] Expose stageAttemptId in TaskContext

## What changes were proposed in this pull request?
stageAttemptId added in TaskContext and corresponding construction 
modification

## How was this patch tested?
Added a new test in TaskContextSuite, two cases are tested:
1. Normal case without failure
2. Exception case with resubmitted stages

Link to [SPARK-22897](https://issues.apache.org/jira/browse/SPARK-22897)

Author: Xianjin YE 

Closes #20082 from advancedxy/SPARK-22897.

(cherry picked from commit a6fc300e91273230e7134ac6db95ccb4436c6f8f)
Signed-off-by: Wenchen Fan 
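
A hedged sketch of how the accessor added by the commit above can be read from task code (assuming a `SparkContext` named `sc`; the accessor is `stageAttemptNumber()` in later Spark releases, though the exact name in this commit may differ):

```
import org.apache.spark.TaskContext

// Each task reports the attempt number of its enclosing stage.
val attempts = sc.parallelize(1 to 4, 2).map { _ =>
  TaskContext.get().stageAttemptNumber()
}.collect()
// On a clean first run every value is 0; it increases when a stage is resubmitted.
```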

commit b96a2132413937c013e1099be3ec4bc420c947fd
Author: Juliusz Sompolski 
Date:   2018-01-03T13:40:51Z

[SPARK-22938] Assert that SQLConf.get is accessed only on the driver.

## What changes were proposed in this pull request?

Assert if code tries to access SQLConf.get on executor.
This can lead to hard to detect bugs, where the executor will read 
fallbackConf, falling back to default config values, ignoring potentially 
changed non-default configs.
If a config is to be passed to executor code, it needs to be read on the 
driver, and passed explicitly.

## How was this patch tested?

Check in existing tests.

Author: Juliusz Sompolski 

Closes #20136 from juliuszsompolski/SPARK-22938.

(cherry picked from commit 247a08939d58405aef39b2a4e7773aa45474ad12)
Signed-off-by: Wenchen Fan 
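
A minimal sketch of the pattern the commit message above describes: read the conf value on the driver and capture it in the task closure, instead of calling `SQLConf.get` on an executor (assuming a `SparkSession` named `spark`):

```
// Read the configuration on the driver...
val caseSensitive = spark.conf.get("spark.sql.caseSensitive", "false").toBoolean

// ...and pass it to executor code explicitly via closure capture.
val matches = spark.sparkContext.parallelize(Seq("a", "A")).filter { s =>
  if (caseSensitive) s == "a" else s.equalsIgnoreCase("a")
}.collect()
```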

commit a05e85ecb76091567a26a3a14ad0879b4728addc
Author: gatorsmile 
Date:   2018-01-03T14:09:30Z

[SPARK-22934][SQL] Make optional clauses order insensitive for CREATE TABLE 
SQL statement

## What changes were proposed in this pull request?
Currently, our CREATE TABLE syntax require the EXACT order of clauses. It 
is pretty hard to remember the exact order. Thus, this PR is to make optional 
clauses order insensitive for `CREATE TABLE` SQL statement.

```
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name1 col_type1 [COMMENT col_comment1], ...)]
USING datasource
[OPTIONS (key1=val1, key2=val2, ...)]
[PARTITIONED BY (col_name1, col_name2, ...)]
[CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
[LOCATION path]
[COMMENT table_comment]
[TBLPROPERTIES (key1=val1, key2=val2, ...)]
[AS select_statement]
```

The proposal is to make the following clauses order insensitive.
```
[OPTIONS (key1=val1, key2=val2, ...)]
[PARTITIONED BY (col_name1, col_name2, ...)]
[CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
[LOCATION path]
[COMMENT table_comment]
[TBLPROPERTIES (key1=val1, key2=val2, ...)]
```

The same idea is also applicable to Create Hive Table.
```
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION path]
[TBLPROPERTIES (key1=val1, key2=val2, ...)]
[AS select_statement]
```

The proposal is to make the following clauses order insensitive.
```
[COMMENT table_comment]
[PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION path]
[TBLPROPERTIES (key1=val1, key2=val2, ...)]
```
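
As an illustration of the proposed flexibility (a hypothetical example, not taken from the patch), both of the following would be accepted once the clause ordering is relaxed:

```
// Optional clauses in one order...
spark.sql(
  """CREATE TABLE t1 (a INT)
    |COMMENT 'demo table'
    |STORED AS PARQUET
    |TBLPROPERTIES ('p' = '1')
  """.stripMargin)

// ...and the same clauses in a different order.
spark.sql(
  """CREATE TABLE t2 (a INT)
    |TBLPROPERTIES ('p' = '1')
    |STORED AS PARQUET
    |COMMENT 'demo table'
  """.stripMargin)
```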

## How was this patch tested?
Added test cases

Author: gatorsmile 

Closes #20133 from 

[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

2018-01-20 Thread brandonJY
Github user brandonJY commented on a diff in the pull request:

https://github.com/apache/spark/pull/20325#discussion_r162781167
  
--- Diff: docs/sql-programming-guide.md ---
@@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. 
Unlike the `createOrReplaceT
 Hive metastore. Persistent tables will still exist even after your Spark 
program has restarted, as
 long as you maintain your connection to the same metastore. A DataFrame 
for a persistent table can
 be created by calling the `table` method on a `SparkSession` with the name 
of the table.
+Note that for `DataFrame`s built on a Hive table, `insertInto` should be used instead of `saveAsTable`.
--- End diff --

@gatorsmile Could you elaborate on your comment? The purpose of this sentence was to warn users to use `insertInto` when they are dealing with DataFrames created from a Hive table, since, due to https://issues.apache.org/jira/browse/SPARK-16803, `saveAsTable` does not work in that special case. Or do you have any suggestions to make it clearer?
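
For context, a hedged sketch of the case in question (hypothetical table names; `insertInto` resolves columns by position and requires the target table to already exist):

```
// df is built on an existing Hive table.
val df = spark.table("db.src_hive_table")

// Writing back through insertInto works for this case...
df.write.mode("overwrite").insertInto("db.dst_hive_table")

// ...whereas saveAsTable can fail here; see SPARK-16803.
// df.write.mode("overwrite").saveAsTable("db.src_hive_table")
```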


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability suppo...

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18906#discussion_r162779803
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -597,10 +597,29 @@ def test_non_existed_udf(self):
         self.assertRaisesRegexp(AnalysisException, "Can not load class non_existed_udf",
                                 lambda: spark.udf.registerJavaFunction("udf1", "non_existed_udf"))
 
-        # This is to check if a deprecated 'SQLContext.registerJavaFunction' can call its alias.
--- End diff --

Seems this test is gone ...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #86411 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86411/testReport)**
 for PR 20087 at commit 
[`118f788`](https://github.com/apache/spark/commit/118f7880bdcf26ba7394a2cc7fac2e0eae707d6f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...

2018-01-20 Thread fjh100456
Github user fjh100456 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20087#discussion_r162779218
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala ---
@@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.Path
+import org.apache.orc.OrcConf.COMPRESS
+import org.apache.parquet.hadoop.ParquetOutputFormat
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.execution.datasources.orc.OrcOptions
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest}
+import org.apache.spark.sql.hive.orc.OrcFileOperator
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+
+class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll {
+  import spark.implicits._
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    (0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      spark.catalog.dropTempView("table_source")
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  private val maxRecordNum = 50
+
+  private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match {
+    case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
+    case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
+  }
+
+  private def getSparkCompressionConfName(format: String): String = format.toLowerCase match {
+    case "parquet" => SQLConf.PARQUET_COMPRESSION.key
+    case "orc" => SQLConf.ORC_COMPRESSION.key
+  }
+
+  private def getHiveCompressPropName(format: String): String = format.toLowerCase match {
+    case "parquet" => ParquetOutputFormat.COMPRESSION
+    case "orc" => COMPRESS.getAttribute
+  }
+
+  private def normalizeCodecName(format: String, name: String): String = {
+    format.toLowerCase match {
+      case "parquet" => ParquetOptions.getParquetCompressionCodecName(name)
+      case "orc" => OrcOptions.getORCCompressionCodecName(name)
+    }
+  }
+
+  private def getTableCompressionCodec(path: String, format: String): Seq[String] = {
+    val hadoopConf = spark.sessionState.newHadoopConf()
+    val codecs = format.toLowerCase match {
+      case "parquet" => for {
+        footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf)
+        block <- footer.getParquetMetadata.getBlocks.asScala
+        column <- block.getColumns.asScala
+      } yield column.getCodec.name()
+      case "orc" => new File(path).listFiles().filter { file =>
+        file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS"
+      }.map { orcFile =>
+        OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
+      }.toSeq
+    }
+    codecs.distinct
+  }
+
+  private def createTable(
+      rootDir: File,
+      tableName: String,
+      isPartitioned: Boolean,
+      format: String,
+      compressionCodec: Option[String]): Unit = {
+    val tblProperties = compressionCodec match {
+      case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
+      case _ => ""
+    }
+    val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else ""
+    sql(
+      s"""
+        |CREATE TABLE $tableName(a int)
+        |$partitionCreate
+        |STORED AS $format
+        |LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
+        |$tblProperties
+      """.stripMargin)
+  }
+
+  private def writeDataToTable(
+      tableName: String,
+  
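
A hedged sketch of the precedence this suite exercises: a table-level compression property is expected to take priority over the session-level conf (illustrative table name; assumes a Hive-enabled `spark` session):

```
// Session-level default: gzip.
spark.sql("SET spark.sql.parquet.compression.codec=gzip")

// The table carries its own compression property...
spark.sql(
  """CREATE TABLE t_snappy (a INT)
    |STORED AS PARQUET
    |TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
  """.stripMargin)

// ...so files written here should be SNAPPY-compressed, not gzip.
spark.sql("INSERT INTO t_snappy SELECT 1")
```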

[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-20 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20203
  
One more reason to run the tests in sbt / Maven: in IntelliJ the complete suite somehow passed. The current failure seems unrelated to me, though, as org.apache.spark.deploy.history has 0 failures.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-20 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162778292
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -332,6 +331,48 @@ class PairRDDFunctionsSuite extends SparkFunSuite with SharedSparkContext {
     assert(joined.getNumPartitions == rdd2.getNumPartitions)
   }
 
+  test("cogroup between multiple RDD when defaultParallelism is set without proper partitioner") {
+    assert(!sc.conf.contains("spark.default.parallelism"))
+    try {
+      sc.conf.set("spark.default.parallelism", "4")
+      val rdd1 = sc.parallelize((1 to 1000).map(x => (x, x)), 20)
+      val rdd2 = sc.parallelize(Array((1, 1), (1, 2), (2, 1), (3, 1)), 10)
+      val joined = rdd1.cogroup(rdd2)
+      assert(joined.getNumPartitions == sc.defaultParallelism)
+    } finally {
+      sc.conf.remove("spark.default.parallelism")
+    }
+  }
+
+  test("cogroup between multiple RDD when defaultParallelism is set with proper partitioner") {
+    assert(!sc.conf.contains("spark.default.parallelism"))
+    try {
+      sc.conf.set("spark.default.parallelism", "4")
+      val rdd1 = sc.parallelize((1 to 1000).map(x => (x, x)), 20)
+      val rdd2 = sc.parallelize(Array((1, 1), (1, 2), (2, 1), (3, 1)))
+        .partitionBy(new HashPartitioner(10))
+      val joined = rdd1.cogroup(rdd2)
+      assert(joined.getNumPartitions == rdd2.getNumPartitions)
+    } finally {
+      sc.conf.remove("spark.default.parallelism")
+    }
+  }
+
+  test("cogroup between multiple RDD when defaultParallelism is set with huge number of " +
--- End diff --

nit: "set; with huge number of partitions in upstream RDDs"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-20 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162778240
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -284,7 +284,38 @@ class PartitioningSuite extends SparkFunSuite with SharedSparkContext with PrivateMethodTester {
     assert(partitioner3.numPartitions == rdd3.getNumPartitions)
     assert(partitioner4.numPartitions == rdd3.getNumPartitions)
     assert(partitioner5.numPartitions == rdd4.getNumPartitions)
+  }
 
+  test("defaultPartitioner when defaultParallelism is set") {
+    assert(!sc.conf.contains("spark.default.parallelism"))
+    try {
+      sc.conf.set("spark.default.parallelism", "4")
+
+      val rdd1 = sc.parallelize((1 to 1000).map(x => (x, x)), 150)
+      val rdd2 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4)))
+        .partitionBy(new HashPartitioner(10))
+      val rdd3 = sc.parallelize(Array((1, 6), (7, 8), (3, 10), (5, 12), (13, 14)))
+        .partitionBy(new HashPartitioner(100))
+      val rdd4 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4)))
+        .partitionBy(new HashPartitioner(9))
+      val rdd5 = sc.parallelize((1 to 10).map(x => (x, x)), 11)
--- End diff --

Can we add a case where the partitioner is not used and the default (from spark.default.parallelism) gets used?
For example, something like the following pseudo-code:
```
val rdd6 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4))).partitionBy(new HashPartitioner(3))

...
Partitioner.defaultPartitioner(rdd1, rdd6).numPartitions == sc.conf.get("spark.default.parallelism").toInt
```
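
A runnable version of that pseudo-code, as a sketch of the behavior this PR proposes (assuming a `SparkContext` named `sc` with `spark.default.parallelism` set to 4, and `rdd1` as defined in the diff above):

```
import org.apache.spark.{HashPartitioner, Partitioner}

val rdd6 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4)))
  .partitionBy(new HashPartitioner(3))

// rdd6's 3 partitions are more than an order of magnitude below rdd1's 150
// and below the default parallelism, so the existing partitioner should be
// ignored and the default parallelism used instead.
val chosen = Partitioner.defaultPartitioner(rdd1, rdd6)
assert(chosen.numPartitions == sc.defaultParallelism)
```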



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-20 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20091#discussion_r162778187
  
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -43,17 +43,19 @@ object Partitioner {
   /**
    * Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
    *
-   * If any of the RDDs already has a partitioner, and the number of partitions of the
-   * partitioner is either greater than or is less than and within a single order of
-   * magnitude of the max number of upstream partitions, choose that one.
+   * If spark.default.parallelism is set, we'll use the value of SparkContext defaultParallelism
+   * as the default partitions number, otherwise we'll use the max number of upstream partitions.
    *
-   * Otherwise, we use a default HashPartitioner. For the number of partitions, if
-   * spark.default.parallelism is set, then we'll use the value from SparkContext
-   * defaultParallelism, otherwise we'll use the max number of upstream partitions.
+   * If any of the RDDs already has a partitioner, and the partitioner is an eligible one (with a
+   * partitions number that is not less than the max number of upstream partitions by an order of
+   * magnitude), or the number of partitions is larger than the default one, we'll choose the
+   * exsiting partitioner.
--- End diff --

We should rephrase this for clarity.
How about
"When available, we choose the partitioner from rdds with maximum number of 
partitions. If this partitioner is eligible (number of partitions within an 
order of maximum number of partitions in rdds), or has partition number higher 
than default partitions number - we use this partitioner"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...

2018-01-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20146#discussion_r162777500
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +249,16 @@ object ParamValidators {
   def arrayLengthGt[T](lowerBound: Double): Array[T] => Boolean = { (value: Array[T]) =>
     value.length > lowerBound
   }
+
+  /** Check if more than one param in a set of exclusive params are set. */
+  def checkExclusiveParams(model: Params, params: String*): Unit = {
+    if (params.filter(paramName => model.hasParam(paramName) &&
--- End diff --

The purpose of this method is to check whether more than one Param is set among some exclusive Params within a Model. Is it useful to put an irrelevant Param into the exclusive Params to check? Since we already know what Params the model has, it sounds like we would be checking an irrelevant Param that we already know does not exist.
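
For reference, a hedged sketch of the kind of check under discussion (a hypothetical standalone helper, not the PR's exact code):

```
import org.apache.spark.ml.param.Params

// Fail fast if more than one of a set of mutually exclusive params is set.
def checkExclusiveParams(model: Params, names: String*): Unit = {
  val set = names.filter(n => model.hasParam(n) && model.isSet(model.getParam(n)))
  require(set.size <= 1,
    s"Only one of ${names.mkString(", ")} may be set, but found: ${set.mkString(", ")}")
}
```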



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86409/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20203
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20203
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86408/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to prevent ...

2018-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19528
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


