[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...

2018-11-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23176#discussion_r237771176
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala ---
@@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("INSET: binary") {
--- End diff --

Sure, I'll do it later. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23176#discussion_r237770687
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala ---
@@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("INSET: binary") {
--- End diff --

good idea! we should test `In` and `InSet` together





[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23086
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99493/





[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23086
  
**[Test build #99493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99493/testReport)**
 for PR 23086 at commit 
[`eecb161`](https://github.com/apache/spark/commit/eecb161075720aec0c496576fe6b6ad749c3a726).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23181
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99492/





[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23181
  
**[Test build #99492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99492/testReport)**
 for PR 23181 at commit 
[`1be36f7`](https://github.com/apache/spark/commit/1be36f77f58576db9650a4584b1f882e2f284d0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23152#discussion_r237768463
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)
--- End diff --

This test can pass without this patch.





[GitHub] spark issue #23164: [SPARK-26198][SQL] Fix Metadata serialize null values th...

2018-11-29 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23164
  
cc @srowen





[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99494/





[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23183
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23183
  
**[Test build #99500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99500/testReport)**
 for PR 23183 at commit 
[`5f5a0e8`](https://github.com/apache/spark/commit/5f5a0e83245592ab5af7fb9df8292bdff4ca1385).





[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5567/





[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23184
  
**[Test build #99494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99494/testReport)**
 for PR 23184 at commit 
[`8877837`](https://github.com/apache/spark/commit/88778378db1ab3d150c104066e416f7b8f7d7a7b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...

2018-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23176#discussion_r237766198
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala ---
@@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("INSET: binary") {
--- End diff --

Regarding the semantics, InSet is equal to In. Could we combine the test 
cases? Test both?
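
A minimal, standalone Scala sketch of why `In` and `InSet` can disagree for binary values, assuming (as the PR title suggests) that the problem is a hash-based set lookup over `Array[Byte]`, whose `equals`/`hashCode` are reference-based on the JVM. Here `In` is modeled as a linear scan with content equality and `InSet` as a set lookup; `InVsInSetSketch`, `inLinear`, `inHashSet`, and `inOrderedSet` are hypothetical names, not Spark APIs, and the ordered set is only one possible content-aware alternative. Running the same inputs through both paths, as proposed in this comment, is what exposes the mismatch.

```scala
import scala.collection.immutable.TreeSet
import scala.collection.mutable

object InVsInSetSketch {
  // Lexicographic ordering over unsigned bytes, so that set membership
  // compares array contents instead of array references.
  val byteArrayOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(x: Array[Byte], y: Array[Byte]): Int = {
      val n = math.min(x.length, y.length)
      var i = 0
      while (i < n) {
        val c = (x(i) & 0xff) - (y(i) & 0xff)
        if (c != 0) return c
        i += 1
      }
      x.length - y.length
    }
  }

  // Models `In`: a linear scan using content equality.
  def inLinear(value: Array[Byte], list: Seq[Array[Byte]]): Boolean =
    list.exists(v => java.util.Arrays.equals(v, value))

  // Models a naive `InSet`: a hash set of arrays, which uses reference equality.
  def inHashSet(value: Array[Byte], list: Seq[Array[Byte]]): Boolean =
    mutable.HashSet(list: _*).contains(value)

  // Models a content-aware `InSet`: an ordered set using the ordering above.
  def inOrderedSet(value: Array[Byte], list: Seq[Array[Byte]]): Boolean =
    TreeSet(list: _*)(byteArrayOrdering).contains(value)

  def main(args: Array[String]): Unit = {
    val list = Seq(Array[Byte](1, 2), Array[Byte](3))
    val probe = Array[Byte](1, 2) // equal contents, different array instance
    println(inLinear(probe, list))     // true
    println(inHashSet(probe, list))    // false
    println(inOrderedSet(probe, list)) // true
  }
}
```

A combined test would assert that the linear-scan path and the set path return the same answer for every probe value, rather than checking `InSet` in isolation.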





[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5566/





[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23162
  
**[Test build #99499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99499/testReport)**
 for PR 23162 at commit 
[`97454b2`](https://github.com/apache/spark/commit/97454b239cda92c1cc58a67434c027a7486cc7fa).





[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23152
  
**[Test build #99498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99498/testReport)**
 for PR 23152 at commit 
[`ea7a876`](https://github.com/apache/spark/commit/ea7a8764b27c1e38a65f549b00e7acec6074d2f9).





[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23152
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5565/





[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23152
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23170: [SPARK-24423][FOLLOW-UP][SQL] Fix error example

2018-11-29 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23170
  
It's not a regression. The first check exists in 
[2.1.0](https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L99-L102)
 and the second check was added in 
[2.4.0](https://github.com/apache/spark/blob/v2.4.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L133-L143).

cc @dilipbiswal 





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23185
  
**[Test build #99497 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99497/testReport)**
 for PR 23185 at commit 
[`70fc30d`](https://github.com/apache/spark/commit/70fc30d1e0eac795c6a230f7255b7e488b1a57cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99497/





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5564/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23185
  
Thanks for skimming the whole doc. cc @srowen.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23185
  
**[Test build #99497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99497/testReport)**
 for PR 23185 at commit 
[`70fc30d`](https://github.com/apache/spark/commit/70fc30d1e0eac795c6a230f7255b7e488b1a57cf).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23185
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237756348
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  def getDataWritingCommand(
--- End diff --

I feel it's better to have 2 methods: `writingCommandForExistingTable`, 
`writingCommandForNewTable`
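
To make the suggestion concrete, here is a standalone Scala sketch of the two-method shape. Only the method names `writingCommandForExistingTable` and `writingCommandForNewTable` come from the comment; `TableDesc`, `InsertCommand`, `CtasBase`, and `HiveCtas` are simplified hypothetical stand-ins for Spark's `CatalogTable`, `InsertIntoHiveTable`, and the CTAS command classes. The branch bodies mirror the `getDataWritingCommand` logic quoted in this thread: an existing table gets an insert with no partition spec and `overwrite = false`, while a new table maps every partition column to `None` (dynamic) with `overwrite = true`.

```scala
// Simplified, hypothetical stand-ins for CatalogTable and InsertIntoHiveTable.
final case class TableDesc(name: String, partitionColumnNames: Seq[String])

final case class InsertCommand(
    table: TableDesc,
    partition: Map[String, Option[String]],
    overwrite: Boolean)

trait CtasBase {
  def tableDesc: TableDesc

  // One abstract method per case, instead of one method taking `tableExists`.
  def writingCommandForExistingTable(table: TableDesc): InsertCommand
  def writingCommandForNewTable(table: TableDesc): InsertCommand

  // The shared driver decides which case applies.
  def writingCommand(tableExists: Boolean): InsertCommand =
    if (tableExists) writingCommandForExistingTable(tableDesc)
    else writingCommandForNewTable(tableDesc)
}

final case class HiveCtas(tableDesc: TableDesc) extends CtasBase {
  // Existing table: insert without overwriting and without a partition spec.
  override def writingCommandForExistingTable(table: TableDesc): InsertCommand =
    InsertCommand(table, Map.empty, overwrite = false)

  // New table: no static partition values, so every partition column is dynamic.
  override def writingCommandForNewTable(table: TableDesc): InsertCommand =
    InsertCommand(table, table.partitionColumnNames.map(_ -> None).toMap, overwrite = true)
}

object CtasSketchDemo {
  def main(args: Array[String]): Unit = {
    val ctas = HiveCtas(TableDesc("t", Seq("p")))
    println(ctas.writingCommand(tableExists = true))
    println(ctas.writingCommand(tableExists = false))
  }
}
```

Splitting the method removes the boolean flag from every subclass's signature, so each case can be read and overridden on its own; whether that outweighs the extra abstract member is a style call.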





[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237756394
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand = {
+    if (tableExists) {
+      InsertIntoHiveTable(
+        tableDesc,
+        Map.empty,
+        query,
+        overwrite = false,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    } else {
+      // For CTAS, there is no static partition values to insert.
+      val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+      InsertIntoHiveTable(
+        tableDesc,
+        partition,
+        query,
+        overwrite = true,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    }
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
--- End diff --

ditto





[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...

2018-11-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22939
  
Error looks reasonable...







[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23151
  
**[Test build #99496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99496/testReport)**
 for PR 23151 at commit 
[`beccd74`](https://github.com/apache/spark/commit/beccd749e9087a557fe56dbb2610abae663f4199).





[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23151
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5563/





[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23151
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237753623
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
--- End diff --

`table description`





[GitHub] spark issue #23145: [MINOR][Docs][WIP] Fix Typos

2018-11-29 Thread kjmrknsn
Github user kjmrknsn commented on the issue:

https://github.com/apache/spark/pull/23145
  
Thanks for reviewing and merging.
I've just finished checking the whole documentation.
Here is the complete version of this PR:
https://github.com/apache/spark/pull/23185
Thanks.





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Can one of the admins verify this patch?





[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237753433
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand = {
+    if (tableExists) {
+      InsertIntoHiveTable(
+        tableDesc,
+        Map.empty,
+        query,
+        overwrite = false,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    } else {
+      // For CTAS, there is no static partition values to insert.
+      val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+      InsertIntoHiveTable(
+        tableDesc,
+        partition,
+        query,
+        overwrite = true,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    }
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectWithDataSourceCommand(
--- End diff --

`OptimizedCreateHiveTableAsSelectCommand`?





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Can one of the admins verify this patch?





[GitHub] spark issue #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23185
  
Can one of the admins verify this patch?





[GitHub] spark pull request #23185: [MINOR][Docs] Fix typos

2018-11-29 Thread kjmrknsn
GitHub user kjmrknsn opened a pull request:

https://github.com/apache/spark/pull/23185

[MINOR][Docs] Fix typos

## What changes were proposed in this pull request?
Fix Typos.

## How was this patch tested?
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kjmrknsn/spark docUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23185


commit 70fc30d1e0eac795c6a230f7255b7e488b1a57cf
Author: Keiji Yoshida 
Date:   2018-11-26T15:29:16Z

[MINOR][Docs] Fix typos







[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23151
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23151
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99489/





[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23151
  
**[Test build #99489 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99489/testReport)**
 for PR 23151 at commit 
[`482c4f4`](https://github.com/apache/spark/commit/482c4f4231b7f566de8b909256b74264efc5e821).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23182
  
Merged build finished. Test PASSed.





[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23182
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99488/





[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23182
  
**[Test build #99488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99488/testReport)**
 for PR 23182 at commit 
[`07ca58f`](https://github.com/apache/spark/commit/07ca58ff2e7b0df19d4d755cba0152e323dc0d99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #23146: [SPARK-26173] [MLlib] Prior regularization for Logistic ...

2018-11-29 Thread sujithjay
Github user sujithjay commented on the issue:

https://github.com/apache/spark/pull/23146
  
cc: @kiszk @viirya @yanboliang @srowen 
Could you please review this PR?





[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237749421
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala ---
@@ -92,4 +92,18 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton
   }
 }
   }
+
+  test("SPARK-25271: write empty map into hive parquet table") {
--- End diff --

Added a new test for that.



[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-29 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22957
  
Btw, I think we can update the PR title and description to reflect the new changes.



[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22514
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22514
  
**[Test build #99495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99495/testReport)**
 for PR 22514 at commit 
[`9629175`](https://github.com/apache/spark/commit/96291751c5a4992325f37bcb794ea5fd3f31593b).



[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22957#discussion_r237749287
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
 classOf[PartitioningCollection])
 }
   }
+
+  test("SPARK-25951: avoid redundant shuffle on rename") {
--- End diff --

+1 if possible.



[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22514
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5562/
Test PASSed.



[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-29 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22957
  
This looks good to me. Just a comment about wording.



[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22957#discussion_r237747550
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
   }
 
   /**
-   * Returns true when two expressions will always compute the same result, even if they differ
+   * Returns true when two expressions will always compute the same output, even if they differ
* cosmetically (i.e. capitalization of names in attributes may be different).
*
* See [[Canonicalize]] for more details.
+   *
+   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
+   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
+   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
+   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
+   * strict).
*/
   def semanticEquals(other: Expression): Boolean =
 deterministic && other.deterministic && canonicalized == other.canonicalized
 
+  /**
+   * Returns true when two expressions will always compute the same result, even if the output may
+   * be different, because of different names or similar differences.
--- End diff --

I think here `output` is a bit confusing. Do we mean the output names?



[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22957#discussion_r237747770
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
   }
 
   /**
-   * Returns true when two expressions will always compute the same result, even if they differ
+   * Returns true when two expressions will always compute the same output, even if they differ
* cosmetically (i.e. capitalization of names in attributes may be different).
*
* See [[Canonicalize]] for more details.
+   *
+   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
+   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
+   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
+   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
+   * strict).
*/
   def semanticEquals(other: Expression): Boolean =
 deterministic && other.deterministic && canonicalized == other.canonicalized
 
+  /**
+   * Returns true when two expressions will always compute the same result, even if the output may
+   * be different, because of different names or similar differences.
--- End diff --

So `sameResult` returns whether the evaluated results of two expressions are exactly the same?
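The distinction in this thread can be illustrated with a toy model. This is a hypothetical sketch and does not use Spark's `Expression` API: `Attr`, `Alias`, `Add`, `canonicalize` and `stripAliases` are invented names. It assumes, following the doc comments in the diff, that `semanticEquals` compares cosmetically normalized trees (attribute names erased, aliases kept), while `sameResult` additionally ignores aliases.

```scala
// Toy expression tree (hypothetical; not Spark's Expression classes).
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class Alias(child: Expr, name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object Equivalence {
  // Cosmetic normalization: attribute names are erased and identity is kept via exprId.
  // Aliases are kept, because they change the output name.
  def canonicalize(e: Expr): Expr = e match {
    case Attr(_, id)        => Attr("", id)
    case Alias(child, name) => Alias(canonicalize(child), name)
    case Add(l, r)          => Add(canonicalize(l), canonicalize(r))
  }

  // Drops every alias, so only the computed value is left.
  def stripAliases(e: Expr): Expr = e match {
    case Alias(child, _) => stripAliases(child)
    case Add(l, r)       => Add(stripAliases(l), stripAliases(r))
    case other           => other
  }

  // Same output, including output names: safe for one expression to replace the other.
  def semanticEquals(a: Expr, b: Expr): Boolean =
    canonicalize(a) == canonicalize(b)

  // Same evaluated values, even if output names differ.
  def sameResult(a: Expr, b: Expr): Boolean =
    canonicalize(stripAliases(a)) == canonicalize(stripAliases(b))
}
```

In this model, `semanticEquals(Attr("a", 1L), Alias(Attr("a", 1L), "b"))` is false because the alias renames the output. `sameResult` on the same pair is true because both sides evaluate to attribute #1.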



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23162
  
A few nit comments, because I thought we should avoid negative comparisons; however, let me leave it to @srowen.



[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23162#discussion_r237747341
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -430,8 +430,8 @@ package object config {
   .doc("The chunk size in bytes during writing out the bytes of ChunkedByteBuffer.")
   .bytesConf(ByteUnit.BYTE)
   .checkValue(_ <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH,
-"The chunk size during writing out the bytes of" +
-" ChunkedByteBuffer should not larger than Int.MaxValue - 15.")
+"The chunk size during writing out the bytes of ChunkedByteBuffer should" +
+  s" not be greater than ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
--- End diff --

not be greater than -> less than or equal to
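The suggested positive wording can be shown in a small, self-contained sketch. `checkValue` here is a hypothetical stand-in for a config validator, not Spark's `ConfigBuilder`. The bound `Int.MaxValue - 15` is taken from the old error message in the diff and is assumed to equal `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`.

```scala
object ConfChecks {
  // Assumption: equals ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH, which the old
  // error message in the diff describes as Int.MaxValue - 15.
  val MaxRoundedArrayLength: Int = Int.MaxValue - 15

  // Hypothetical validator: Right(value) if the check passes, Left(errorMsg) otherwise.
  def checkValue[T](value: T)(validator: T => Boolean, errorMsg: String): Either[String, T] =
    if (validator(value)) Right(value) else Left(errorMsg)

  // The message states the allowed range positively ("less than or equal to").
  def checkChunkSize(bytes: Long): Either[String, Long] =
    checkValue(bytes)(
      _ <= MaxRoundedArrayLength,
      "The chunk size during writing out the bytes of ChunkedByteBuffer should be " +
        s"less than or equal to $MaxRoundedArrayLength.")
}
```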



[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r237747152
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -181,62 +180,39 @@ case class RelationConversions(
 conf: SQLConf,
 sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] {
   private def isConvertible(relation: HiveTableRelation): Boolean = {
-val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
-  serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC)
+isConvertible(relation.tableMeta)
   }
 
-  // Return true for Apache ORC and Hive ORC-related configuration names.
-  // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`.
-  private def isOrcProperty(key: String) =
-key.startsWith("orc.") || key.contains(".orc.")
-
-  private def isParquetProperty(key: String) =
-key.startsWith("parquet.") || key.contains(".parquet.")
-
-  private def convert(relation: HiveTableRelation): LogicalRelation = {
-val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-
-// Consider table and storage properties. For properties existing in both sides, storage
-// properties will supersede table properties.
-if (serde.contains("parquet")) {
-  val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++
-relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
-conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString)
-  sessionCatalog.metastoreCatalog
-.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
-} else {
-  val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++
-relation.tableMeta.storage.properties
-  if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") {
-sessionCatalog.metastoreCatalog.convertToLogicalRelation(
-  relation,
-  options,
-  classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat],
-  "orc")
-  } else {
-sessionCatalog.metastoreCatalog.convertToLogicalRelation(
-  relation,
-  options,
-  classOf[org.apache.spark.sql.hive.orc.OrcFileFormat],
-  "orc")
-  }
-}
+  private def isConvertible(tableMeta: CatalogTable): Boolean = {
+val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
+serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
+  serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC)
   }
 
+  private val metastoreCatalog = sessionCatalog.metastoreCatalog
+
   override def apply(plan: LogicalPlan): LogicalPlan = {
 plan resolveOperators {
   // Write path
   case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists)
 // Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
   if query.resolved && DDLUtils.isHiveTable(r.tableMeta) &&
 !r.isPartitioned && isConvertible(r) =>
-InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists)
+InsertIntoTable(metastoreCatalog.convert(r), partition,
+  query, overwrite, ifPartitionNotExists)
 
   // Read path
   case relation: HiveTableRelation
   if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) =>
-convert(relation)
+metastoreCatalog.convert(relation)
+
+  // CTAS
+  case CreateTable(tableDesc, mode, Some(query))
+  if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty &&
+isConvertible(tableDesc) =>
+DDLUtils.checkDataColNames(tableDesc)
--- End diff --

In HiveAnalysis, when transforming CreateTable to CreateHiveTableAsSelectCommand, it has this check too. checkDataColNames checks whether any invalid character is used in a column name.
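A rough sketch of the kind of fail-fast validation described here. This is not Spark's `DDLUtils.checkDataColNames`. The object, the method names and the character set are all assumptions; the set is loosely modeled on the characters Spark rejects in Parquet field names.

```scala
object ColumnNameCheck {
  // Assumption: characters treated as invalid in column names (Parquet-style);
  // the real per-format rules may differ.
  private val InvalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  // Returns the names that contain at least one invalid character.
  def invalidColumns(names: Seq[String]): Seq[String] =
    names.filter(name => name.exists(InvalidChars.contains))

  // Throws IllegalArgumentException listing every offending name.
  def checkColumnNames(names: Seq[String]): Unit = {
    val bad = invalidColumns(names)
    require(bad.isEmpty, s"Column names contain invalid characters: ${bad.mkString(", ")}")
  }
}
```

Running the check before the write starts means a bad schema fails at planning time rather than partway through writing files.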



[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23162#discussion_r237746963
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -513,7 +513,7 @@ package object config {
 "is written in unsafe shuffle writer. In KiB unless otherwise specified.")
   .bytesConf(ByteUnit.KiB)
   .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024,
-s"The buffer size must be greater than 0 and less than" +
+s"The buffer size must be positive and not greater than" +
--- End diff --

not greater than -> less than or equal to



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99486/
Test PASSed.



[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23162#discussion_r237747015
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -503,7 +503,7 @@ package object config {
 "made in creating intermediate shuffle files.")
   .bytesConf(ByteUnit.KiB)
   .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024,
-s"The file buffer size must be greater than 0 and less than" +
+s"The file buffer size must be positive and not greater than" +
--- End diff --

not greater than -> less than or equal to
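The same positive wording applies to the KiB-based buffer checks. This is a minimal sketch: the object name and message text are hypothetical, and the byte limit `Int.MaxValue - 15` is assumed from the chunk-size message earlier in this digest. Because these configs are read with `ByteUnit.KiB`, the upper bound is the byte limit divided by 1024 with integer division, i.e. 2097151 KiB.

```scala
object BufferSizeCheck {
  // Assumption: byte limit Int.MaxValue - 15 (ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH).
  val MaxRoundedArrayLength: Int = Int.MaxValue - 15

  // The configs are in KiB, so the bound is converted with integer division.
  val MaxKiB: Long = MaxRoundedArrayLength / 1024

  // None if the value is valid; otherwise a positively phrased (hypothetical) error message.
  def validate(kib: Long): Option[String] =
    if (kib > 0 && kib <= MaxKiB) None
    else Some(s"The file buffer size must be positive and less than or equal to $MaxKiB KiB.")
}
```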



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Merged build finished. Test PASSed.



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23173
  
**[Test build #99486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99486/testReport)**
 for PR 23173 at commit 
[`29fc6b8`](https://github.com/apache/spark/commit/29fc6b89094841ba2a28827247305e4fa6c01520).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Merged build finished. Test PASSed.



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99484/
Test PASSed.



[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23151#discussion_r237746374
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1134,39 +1130,40 @@ class SparkSubmitSuite
 val hadoopConf = new Configuration()
 updateConfWithFakeS3Fs(hadoopConf)
 
-val tmpDir = Utils.createTempDir()
-val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
+withTempDir { tmpDir =>
+  val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
 
-val args = Seq(
-  "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
-  "--name", "testApp",
-  "--master", "yarn",
-  "--deploy-mode", "client",
-  "--py-files", s"s3a://${pyFile.getAbsolutePath}",
-  "spark-internal"
-)
+  val args = Seq(
+"--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+"--name", "testApp",
+"--master", "yarn",
+"--deploy-mode", "client",
+"--py-files", s"s3a://${pyFile.getAbsolutePath}",
+"spark-internal"
+  )
 
-val appArgs = new SparkSubmitArguments(args)
-val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
+  val appArgs = new SparkSubmitArguments(args)
+  val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
 
-conf.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
-conf.get("spark.submit.pyFiles") should (startWith("/"))
+  conf.get(PY_FILES.key) should be(s"s3a://${pyFile.getAbsolutePath}")
--- End diff --

Ditto. Technically it would be better to use `assert` and avoid infix notation, but I think we don't have to do it here.
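The `withTempDir { tmpDir => ... }` calls in this diff follow the loan pattern: the helper owns the directory's lifecycle, so a test body cannot leak it. This is a minimal sketch of such a helper using only the JDK. The object name and the recursive delete are hypothetical and not necessarily how Spark's test utilities implement it.

```scala
import java.io.File
import java.nio.file.Files

object TempDirs {
  // Creates a fresh temporary directory, lends it to `f`, and deletes it
  // afterwards, even if `f` throws.
  def withTempDir[T](f: File => T): T = {
    val dir = Files.createTempDirectory("spark-test").toFile
    try f(dir)
    finally deleteRecursively(dir)
  }

  // Depth-first delete; listFiles() may return null, hence the Option.
  private def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      Option(file.listFiles()).foreach(_.foreach(deleteRecursively))
    }
    file.delete()
  }
}
```

The `try`/`finally` is the point of the pattern: the directory is removed whether the body returns normally or throws.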



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23162
  
**[Test build #99484 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99484/testReport)**
 for PR 23162 at commit 
[`54eda1a`](https://github.com/apache/spark/commit/54eda1a6e544b1ee345580001d347262e862f719).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23151#discussion_r237746228
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -985,37 +985,38 @@ class SparkSubmitSuite
 val hadoopConf = new Configuration()
 updateConfWithFakeS3Fs(hadoopConf)
 
-val tmpDir = Utils.createTempDir()
-val file = File.createTempFile("tmpFile", "", tmpDir)
-val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
-val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
-val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
-val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
+withTempDir { tmpDir =>
+  val file = File.createTempFile("tmpFile", "", tmpDir)
+  val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
+  val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
+  val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
+  val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
 
-val args = Seq(
-  "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
-  "--name", "testApp",
-  "--master", "yarn",
-  "--deploy-mode", "client",
-  "--jars", tmpJarPath,
-  "--files", s"s3a://${file.getAbsolutePath}",
-  "--py-files", s"s3a://${pyFile.getAbsolutePath}",
-  s"s3a://$mainResource"
+  val args = Seq(
+"--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+"--name", "testApp",
+"--master", "yarn",
+"--deploy-mode", "client",
+"--jars", tmpJarPath,
+"--files", s"s3a://${file.getAbsolutePath}",
+"--py-files", s"s3a://${pyFile.getAbsolutePath}",
+s"s3a://$mainResource"
   )
 
-val appArgs = new SparkSubmitArguments(args)
-val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
+  val appArgs = new SparkSubmitArguments(args)
+  val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
 
-// All the resources should still be remote paths, so that YARN client will not upload again.
-conf.get("spark.yarn.dist.jars") should be (tmpJarPath)
--- End diff --

I wouldn't change those spaces alone, though. Let's leave them as they were.



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Merged build finished. Test PASSed.



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99485/
Test PASSed.



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23173
  
**[Test build #99485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99485/testReport)**
 for PR 23173 at commit 
[`6f498a0`](https://github.com/apache/spark/commit/6f498a043a2347f6f391257d04e6d7bf5f98470d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CSVInferSchema(options: CSVOptions) extends Serializable `



[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22957
  
LGTM, cc @viirya as well



[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22957#discussion_r237745005
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
 classOf[PartitioningCollection])
 }
   }
+
+  test("SPARK-25951: avoid redundant shuffle on rename") {
--- End diff --

can we have an end-to-end test as well?



[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23152
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99483/
Test PASSed.



[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23152
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-11-29 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22514
  
https://user-images.githubusercontent.com/68855/49268483-aaa6d000-f49a-11e8-92c3-5ee78012fe9e.png




[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23152
  
**[Test build #99483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99483/testReport)**
 for PR 23152 at commit 
[`f30f307`](https://github.com/apache/spark/commit/f30f3073b992c5862d798627a721d70716cf6be7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Merged build finished. Test FAILed.



[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22939
  
@felixcheung, I tested the case where the user passes in a column that is not a literal string; the results are below:

```
> json <- '{"name":"Bob"}'
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_json(df$id)))
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'schema_of_json(`id`)' due to data type mismatch: The input json should be a string literal and not null; however, got `id`.;;
'Project [schema_of_json(id#0L) AS schema_of_json(id)#2]
+- Project [id#0L]
   +- Range (0, 1, step=1, splits=None)

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
...
```

```
> csv <- "Amsterdam,2018"
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_csv(df$id)))
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'schema_of_csv(`id`)' due to data type mismatch: The input csv should be a string literal and not null; however, got `id`.;;
'Project [schema_of_csv(id#3L) AS schema_of_csv(id)#5]
+- Project [id#3L]
   +- Range (0, 1, step=1, splits=None)

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
...
```
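Both failures come from an analysis-time check that only accepts a non-null string literal. A column reference such as `df$id` is therefore rejected before the query runs. Below is a toy model of that check. The names (`InputExpr`, `StringLiteral`, `NullLiteral`, `ColumnRef`, `SchemaOfJsonInput.check`) are hypothetical, not Spark's classes. The column-case error text is copied from the output above; the null-case text is made up.

```scala
// Toy model of the argument check; not Spark's actual expression classes.
sealed trait InputExpr
final case class StringLiteral(value: String) extends InputExpr
case object NullLiteral extends InputExpr
final case class ColumnRef(name: String) extends InputExpr

object SchemaOfJsonInput {
  // Accepts only a non-null string literal; anything else is rejected up front.
  def check(input: InputExpr): Either[String, String] = input match {
    case StringLiteral(json) => Right(json)
    case ColumnRef(name) =>
      Left(s"The input json should be a string literal and not null; however, got `$name`.")
    case NullLiteral =>
      // Hypothetical wording for the null case.
      Left("The input json should be a string literal and not null; however, got null.")
  }
}
```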




[GitHub] spark pull request #23165: [SPARK-26188][SQL] FileIndex: don't infer data ty...

2018-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23165



[GitHub] spark issue #23165: [SPARK-26188][SQL] FileIndex: don't infer data types of ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23165
  
thanks, merging to master/2.4!



[GitHub] spark pull request #23031: [SPARK-26060][SQL] Track SparkConf entries and ma...

2018-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23031



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5561/
Test FAILed.



[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23031
  
thanks, merging to master!



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23184
  
cc @felixcheung, @viirya and @MaxGekk 



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5560/
Test FAILed.



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23184
  
**[Test build #99494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99494/testReport)**
 for PR 23184 at commit 
[`8877837`](https://github.com/apache/spark/commit/88778378db1ab3d150c104066e416f7b8f7d7a7b).



[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23184
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #23184: [SPARK-26227][R] from_[csv|json] should accept sc...

2018-11-29 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/23184

[SPARK-26227][R] from_[csv|json] should accept schema_of_[csv|json] in R API

## What changes were proposed in this pull request?

**1. Document `from_csv(..., schema_of_csv(...))` support:**

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, schema_of_csv(csv))))
```

```
from_csv(csv)
1 Amsterdam, 2018
```


**2. Allow `from_json(..., schema_of_json(...))`**

Before:


```R
df2 <- sql("SELECT named_struct('name', 'Bob') as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
head(select(df2, from_json(df2$people_json,
                           schema_of_json(head(df2)$people_json))))
```

```
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘from_json’ for signature ‘"Column", "Column"’
```


After:

```R
df2 <- sql("SELECT named_struct('name', 'Bob') as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
head(select(df2, from_json(df2$people_json,
                           schema_of_json(head(df2)$people_json))))
```

```
  from_json(people_json)
1                    Bob
```

**3. (While I'm here) Allow `structType` as the schema for `from_csv`, to match `from_json`.**

Before:

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, structType("city STRING, year INT"))))
```

```
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘from_csv’ for signature ‘"Column", "structType"’
```

After:

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, structType("city STRING, year INT"))))
```

```
from_csv(csv)
1 Amsterdam, 2018
```



## How was this patch tested?

Manually tested, and unit tests were added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-26227-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23184


commit 193d68856769f945349449469ad6e536449ec5f0
Author: Hyukjin Kwon 
Date:   2018-11-30T03:12:00Z

from_[csv|json] should accept schema_of_[csv|json] in R API




---




[GitHub] spark pull request #23166: [SPARK-26201] Fix python broadcast with encryptio...

2018-11-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23166#discussion_r237738802
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala 
---
@@ -708,16 +709,36 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
   override def handleConnection(sock: Socket): Unit = {
     val env = SparkEnv.get
     val in = sock.getInputStream()
-    val dir = new File(Utils.getLocalDir(env.conf))
-    val file = File.createTempFile("broadcast", "", dir)
-    path = file.getAbsolutePath
-    val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
+    val abspath = new File(path).getAbsolutePath
+    val out = env.serializerManager.wrapForEncryption(new FileOutputStream(abspath))
--- End diff --

Yeah, I see how it was wrong before. I'm saying that after you add
`setupDecryptionServer`, that decryption server would still be reading the
value of `path` that gets updated here, since it's the same object in the
driver's JVM.

Anyway, this isn't a big deal; I think it's better with your change.
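A minimal sketch of the point above, assuming a simplified stand-in class: `BroadcastHolder`, `handleConnection(newPath)` and `readerPath()` are hypothetical names, not Spark's `PythonBroadcast` API. Because the connection handler and a later reader (such as a decryption server) act on the same instance in the driver JVM, a reassignment of the `var path` field is visible to any reader that looks it up afterwards, while a copy taken earlier keeps the old value.

```scala
// Hypothetical stand-in for PythonBroadcast; not Spark's real class.
class BroadcastHolder(var path: String) {
  // Mimics the old handleConnection, which reassigned `path` to a new temp file.
  def handleConnection(newPath: String): Unit = {
    path = newPath
  }

  // Mimics a later reader (e.g. a decryption server) that reads `path` on demand.
  def readerPath(): String = path
}

object SharedPathDemo {
  def main(args: Array[String]): Unit = {
    val b = new BroadcastHolder("/tmp/original")
    val snapshot = b.path                  // copy taken before the update
    b.handleConnection("/tmp/broadcast-123")
    println(snapshot)                      // prints /tmp/original
    println(b.readerPath())                // prints /tmp/broadcast-123
  }
}
```

The reader only sees the new path because it dereferences the shared field at read time; this is the same-object behavior the comment relies on.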


---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5558/
Test PASSed.


---




[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23086
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23181
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23086
  
**[Test build #99493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99493/testReport)** for PR 23086 at commit [`eecb161`](https://github.com/apache/spark/commit/eecb161075720aec0c496576fe6b6ad749c3a726).


---



