[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/23176#discussion_r237771176 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala --- @@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("INSET: binary") { --- End diff -- Sure, I'll do it later. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23176#discussion_r237770687 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala --- @@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("INSET: binary") { --- End diff -- good idea! we should test `In` and `InSet` together
[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23086 Merged build finished. Test PASSed.
[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99493/ Test PASSed.
[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23086 **[Test build #99493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99493/testReport)** for PR 23086 at commit [`eecb161`](https://github.com/apache/spark/commit/eecb161075720aec0c496576fe6b6ad749c3a726). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23181 Merged build finished. Test PASSed.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99492/ Test PASSed.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23181 **[Test build #99492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99492/testReport)** for PR 23181 at commit [`1be36f7`](https://github.com/apache/spark/commit/1be36f77f58576db9650a4584b1f882e2f284d0f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237768463

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)
--- End diff --

This test can pass without this patch.
[GitHub] spark issue #23164: [SPARK-26198][SQL] Fix Metadata serialize null values th...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23164 cc @srowen
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99494/ Test PASSed.
[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23183 Merged build finished. Test PASSed.
[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23183 **[Test build #99500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99500/testReport)** for PR 23183 at commit [`5f5a0e8`](https://github.com/apache/spark/commit/5f5a0e83245592ab5af7fb9df8292bdff4ca1385).
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Merged build finished. Test PASSed.
[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5567/ Test PASSed.
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23184 **[Test build #99494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99494/testReport)** for PR 23184 at commit [`8877837`](https://github.com/apache/spark/commit/88778378db1ab3d150c104066e416f7b8f7d7a7b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23176: [SPARK-26211][SQL] Fix InSet for binary, and stru...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/23176#discussion_r237766198 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala --- @@ -293,6 +293,54 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("INSET: binary") { --- End diff -- Regarding the semantics, InSet is equal to In. Could we combine the test cases? Test both?
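[Editorial aside] The thread above discusses testing `In` and `InSet` together for binary values. As a hedged illustration of why the two can diverge on the JVM (an assumption drawn from the PR title, not the PR's actual code): `byte[]` inherits `Object`'s reference equality and identity hash code, so a plain hash set finds a key only by identity, while an element-wise linear scan matches `In`'s semantics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch, NOT the PR's code: shows how set-based membership
// (the InSet strategy) and linear element-wise scanning (the In strategy)
// can disagree for byte[] values on the JVM.
final class BinaryMembership {
    // Naive set membership: byte[] uses reference equality and an
    // identity hashCode, so equal contents are NOT found.
    static boolean naiveInSet(byte[] value, Set<byte[]> set) {
        return set.contains(value);
    }

    // Linear scan with element-wise comparison, matching In's semantics.
    static boolean linearIn(byte[] value, List<byte[]> list) {
        return list.stream().anyMatch(x -> Arrays.equals(x, value));
    }
}
```

With `List.of(new byte[]{1, 2})`, `linearIn(new byte[]{1, 2}, ...)` is true while `naiveInSet` over the same elements is false, which is exactly the kind of `In`/`InSet` divergence a combined test would catch.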
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23162 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5566/ Test PASSed.
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23162 **[Test build #99499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99499/testReport)** for PR 23162 at commit [`97454b2`](https://github.com/apache/spark/commit/97454b239cda92c1cc58a67434c027a7486cc7fa).
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23162 Merged build finished. Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23152 **[Test build #99498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99498/testReport)** for PR 23152 at commit [`ea7a876`](https://github.com/apache/spark/commit/ea7a8764b27c1e38a65f549b00e7acec6074d2f9).
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5565/ Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Merged build finished. Test PASSed.
[GitHub] spark issue #23170: [SPARK-24423][FOLLOW-UP][SQL] Fix error example
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23170 It's not a regression. The first check exists in [2.1.0](https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L99-L102) and the second check was added in [2.4.0](https://github.com/apache/spark/blob/v2.4.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L133-L143). cc @dilipbiswal
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Merged build finished. Test PASSed.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23185 **[Test build #99497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99497/testReport)** for PR 23185 at commit [`70fc30d`](https://github.com/apache/spark/commit/70fc30d1e0eac795c6a230f7255b7e488b1a57cf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99497/ Test PASSed.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Merged build finished. Test PASSed.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5564/ Test PASSed.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23185 Thanks for skimming the whole doc. cc @srowen.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23185 **[Test build #99497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99497/testReport)** for PR 23185 at commit [`70fc30d`](https://github.com/apache/spark/commit/70fc30d1e0eac795c6a230f7255b7e488b1a57cf).
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23185 ok to test
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237756348

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
+
+  def getDataWritingCommand(
--- End diff --

I feel it's better to have 2 methods: `writingCommandForExistingTable`, `writingCommandForNewTable`
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237756394

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
+
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
       s"TableName: ${tableDesc.identifier.table}, " +
       s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand = {
+    if (tableExists) {
+      InsertIntoHiveTable(
+        tableDesc,
+        Map.empty,
+        query,
+        overwrite = false,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    } else {
+      // For CTAS, there is no static partition values to insert.
+      val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+      InsertIntoHiveTable(
+        tableDesc,
+        partition,
+        query,
+        overwrite = true,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    }
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
--- End diff --

ditto
[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22939 Error looks reasonable...
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23151 **[Test build #99496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99496/testReport)** for PR 23151 at commit [`beccd74`](https://github.com/apache/spark/commit/beccd749e9087a557fe56dbb2610abae663f4199).
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5563/ Test PASSed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Merged build finished. Test PASSed.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237753623

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
+
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
       s"TableName: ${tableDesc.identifier.table}, " +
       s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
--- End diff --

`table description`
[GitHub] spark issue #23145: [MINOR][Docs][WIP] Fix Typos
Github user kjmrknsn commented on the issue: https://github.com/apache/spark/pull/23145 Thanks for reviewing and merging. I've just finished checking the whole documentation. Here is the complete version of this PR: https://github.com/apache/spark/pull/23185 Thanks.
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Can one of the admins verify this patch?
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237753433

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,98 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
+
+  def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
       s"TableName: ${tableDesc.identifier.table}, " +
       s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def getDataWritingCommand(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable,
+      tableExists: Boolean): DataWritingCommand = {
+    if (tableExists) {
+      InsertIntoHiveTable(
+        tableDesc,
+        Map.empty,
+        query,
+        overwrite = false,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    } else {
+      // For CTAS, there is no static partition values to insert.
+      val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+      InsertIntoHiveTable(
+        tableDesc,
+        partition,
+        query,
+        overwrite = true,
+        ifPartitionNotExists = false,
+        outputColumnNames = outputColumnNames)
+    }
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the Table Describe, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectWithDataSourceCommand(
--- End diff --

`OptimizedCreateHiveTableAsSelectCommand`?
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Can one of the admins verify this patch?
[GitHub] spark issue #23185: [MINOR][Docs] Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23185 Can one of the admins verify this patch?
[GitHub] spark pull request #23185: [MINOR][Docs] Fix typos
GitHub user kjmrknsn opened a pull request: https://github.com/apache/spark/pull/23185

[MINOR][Docs] Fix typos

## What changes were proposed in this pull request?

Fix Typos.

## How was this patch tested?

NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kjmrknsn/spark docUpdate

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23185.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23185

commit 70fc30d1e0eac795c6a230f7255b7e488b1a57cf
Author: Keiji Yoshida
Date: 2018-11-26T15:29:16Z

[MINOR][Docs] Fix typos
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Merged build finished. Test PASSed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99489/ Test PASSed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Reuse withTempDir function to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23151 **[Test build #99489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99489/testReport)** for PR 23151 at commit [`482c4f4`](https://github.com/apache/spark/commit/482c4f4231b7f566de8b909256b74264efc5e821). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23182 Merged build finished. Test PASSed.
[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99488/ Test PASSed.
[GitHub] spark issue #23182: Config change followup to [SPARK-26177] Automated format...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23182 **[Test build #99488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99488/testReport)** for PR 23182 at commit [`07ca58f`](https://github.com/apache/spark/commit/07ca58ff2e7b0df19d4d755cba0152e323dc0d99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23146: [SPARK-26173] [MLlib] Prior regularization for Logistic ...
Github user sujithjay commented on the issue: https://github.com/apache/spark/pull/23146 cc: @kiszk @viirya @yanboliang @srowen Could you please review this PR?
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237749421 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala --- @@ -92,4 +92,18 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton } } + + test("SPARK-25271: write empty map into hive parquet table") { --- End diff -- Added a new test for that.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22957 Btw, I think we can update the PR title and description to reflect new changes.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #99495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99495/testReport)** for PR 22514 at commit [`9629175`](https://github.com/apache/spark/commit/96291751c5a4992325f37bcb794ea5fd3f31593b).
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r237749287 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext { classOf[PartitioningCollection]) } } + + test("SPARK-25951: avoid redundant shuffle on rename") { --- End diff -- +1 if possible.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5562/ Test PASSed.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22957 This looks good to me. Just a comment about wording.
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r237747550
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
   }

   /**
-   * Returns true when two expressions will always compute the same result, even if they differ
+   * Returns true when two expressions will always compute the same output, even if they differ
    * cosmetically (i.e. capitalization of names in attributes may be different).
    *
    * See [[Canonicalize]] for more details.
+   *
+   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
+   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
+   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
+   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
+   * strict).
    */
   def semanticEquals(other: Expression): Boolean =
     deterministic && other.deterministic && canonicalized == other.canonicalized

+  /**
+   * Returns true when two expressions will always compute the same result, even if the output may
+   * be different, because of different names or similar differences.
--- End diff --
I think here `output` is a bit confusing. Do we mean the output names?
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r237747770
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
   }

   /**
-   * Returns true when two expressions will always compute the same result, even if they differ
+   * Returns true when two expressions will always compute the same output, even if they differ
    * cosmetically (i.e. capitalization of names in attributes may be different).
    *
    * See [[Canonicalize]] for more details.
+   *
+   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
+   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
+   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
+   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
+   * strict).
    */
   def semanticEquals(other: Expression): Boolean =
     deterministic && other.deterministic && canonicalized == other.canonicalized

+  /**
+   * Returns true when two expressions will always compute the same result, even if the output may
+   * be different, because of different names or similar differences.
--- End diff --
So `sameResult` returns true if the evaluated results of the two expressions are exactly the same?
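The `semanticEquals` vs `sameResult` distinction being reviewed here can be sketched with a toy expression tree outside Spark. The classes and helpers below are hypothetical stand-ins, not Spark's actual `Expression` hierarchy: canonicalization normalizes cosmetic differences (capitalization), while `same_result` additionally ignores output names (aliases).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A column reference; canonicalization lowercases its name."""
    name: str

    def canonicalized(self):
        return Attr(self.name.lower())


@dataclass(frozen=True)
class Alias:
    """A renamed expression; the alias changes only the output name."""
    child: object
    alias: str

    def canonicalized(self):
        # The alias survives canonicalization: renaming is not cosmetic
        # for strict equality, since it changes the output attribute.
        return Alias(self.child.canonicalized(), self.alias)


def semantic_equals(a, b) -> bool:
    # Strict comparison over canonicalized trees: safe for rules that
    # replace one expression with the other.
    return a.canonicalized() == b.canonicalized()


def strip_aliases(e):
    return strip_aliases(e.child) if isinstance(e, Alias) else e


def same_result(a, b) -> bool:
    # Looser comparison: only the computed values must match, so output
    # names (aliases) are ignored before comparing.
    return semantic_equals(strip_aliases(a), strip_aliases(b))
```

With these definitions, `semantic_equals(Attr("id"), Alias(Attr("id"), "x"))` is false while `same_result` of the same pair is true, which is exactly the "too strict for distributions/orderings" gap the PR is about.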
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23162 A few nit comments, since I thought we should avoid negative comparisons; however, let me leave it to @srowen.
[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23162#discussion_r237747341
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -430,8 +430,8 @@ package object config {
     .doc("The chunk size in bytes during writing out the bytes of ChunkedByteBuffer.")
     .bytesConf(ByteUnit.BYTE)
     .checkValue(_ <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH,
-      "The chunk size during writing out the bytes of" +
-        " ChunkedByteBuffer should not larger than Int.MaxValue - 15.")
+      "The chunk size during writing out the bytes of ChunkedByteBuffer should" +
+        s" not be greater than ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
--- End diff --
not be greater than -> less than or equal to
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r237747152 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf: SQLConf, sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] { private def isConvertible(relation: HiveTableRelation): Boolean = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) -serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || - serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC) +isConvertible(relation.tableMeta) } - // Return true for Apache ORC and Hive ORC-related configuration names. - // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`. - private def isOrcProperty(key: String) = -key.startsWith("orc.") || key.contains(".orc.") - - private def isParquetProperty(key: String) = -key.startsWith("parquet.") || key.contains(".parquet.") - - private def convert(relation: HiveTableRelation): LogicalRelation = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) - -// Consider table and storage properties. For properties existing in both sides, storage -// properties will supersede table properties. 
-if (serde.contains("parquet")) { - val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++ -relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA -> - conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString) - sessionCatalog.metastoreCatalog -.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet") -} else { - val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++ -relation.tableMeta.storage.properties - if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat], - "orc") - } else { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.hive.orc.OrcFileFormat], - "orc") - } -} + private def isConvertible(tableMeta: CatalogTable): Boolean = { +val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) +serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || + serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC) } + private val metastoreCatalog = sessionCatalog.metastoreCatalog + override def apply(plan: LogicalPlan): LogicalPlan = { plan resolveOperators { // Write path case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists) // Inserting into partitioned table is not supported in Parquet/Orc data source (yet). 
if query.resolved && DDLUtils.isHiveTable(r.tableMeta) && !r.isPartitioned && isConvertible(r) => -InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists) +InsertIntoTable(metastoreCatalog.convert(r), partition, + query, overwrite, ifPartitionNotExists) // Read path case relation: HiveTableRelation if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) => -convert(relation) +metastoreCatalog.convert(relation) + + // CTAS + case CreateTable(tableDesc, mode, Some(query)) + if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty && +isConvertible(tableDesc) => +DDLUtils.checkDataColNames(tableDesc) --- End diff -- In HiveAnalysis, when transforming CreateTable to CreateHiveTableAsSelectCommand, it has this too. checkDataColNames checks whether any invalid character is used in a column name.
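The `isConvertible` check in this diff reduces to a serde-name match gated by a session config flag. A minimal Python sketch of that predicate (the function name and boolean parameters are illustrative; the real code reads `HiveUtils.CONVERT_METASTORE_PARQUET` / `CONVERT_METASTORE_ORC` from `SQLConf`):

```python
def is_convertible(serde: str, convert_parquet: bool = True, convert_orc: bool = True) -> bool:
    """Decide whether a Hive relation should be converted to a native data
    source relation: its serde class name must mention parquet/orc and the
    corresponding conversion flag must be enabled (simplified sketch)."""
    s = (serde or "").lower()
    return ("parquet" in s and convert_parquet) or ("orc" in s and convert_orc)
```

Pulling the check out of the rule into a `CatalogTable`-level helper, as the diff does, is what lets the same predicate serve the write path, the read path, and the new CTAS case.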
[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23162#discussion_r237746963
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -513,7 +513,7 @@ package object config {
     "is written in unsafe shuffle writer. In KiB unless otherwise specified.")
     .bytesConf(ByteUnit.KiB)
     .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024,
-      s"The buffer size must be greater than 0 and less than" +
+      s"The buffer size must be positive and not greater than" +
--- End diff --
not greater than -> less than or equal to
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99486/ Test PASSed.
[GitHub] spark pull request #23162: [MINOR][DOC] Correct some document description er...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23162#discussion_r237747015
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -503,7 +503,7 @@ package object config {
     "made in creating intermediate shuffle files.")
     .bytesConf(ByteUnit.KiB)
     .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024,
-      s"The file buffer size must be greater than 0 and less than" +
+      s"The file buffer size must be positive and not greater than" +
--- End diff --
not greater than -> less than or equal to
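The messages under review all guard the same pattern: a KiB-denominated config value that must be positive and at most `MAX_ROUNDED_ARRAY_LENGTH / 1024`. A minimal sketch of that validation in Python (the constant mirrors `Int.MaxValue - 15` from Spark's `ByteArrayMethods`; the function name is made up for illustration):

```python
# Int.MaxValue - 15, as defined by Spark's ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
MAX_ROUNDED_ARRAY_LENGTH = 2**31 - 1 - 15


def check_buffer_size_kib(v: int) -> int:
    """Validate a buffer size given in KiB: it must be positive and
    less than or equal to MAX_ROUNDED_ARRAY_LENGTH / 1024."""
    limit = MAX_ROUNDED_ARRAY_LENGTH // 1024
    if not (0 < v <= limit):
        raise ValueError(
            f"The buffer size must be positive and less than or equal to {limit} KiB, got {v}."
        )
    return v
```

Note how interpolating the actual limit into the message (as the diff's `s"..."` string does) avoids the ambiguity the reviewers are debating: "less than or equal to 2097151 KiB" cannot be misread the way "not greater than" can.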
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Merged build finished. Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23173 **[Test build #99486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99486/testReport)** for PR 23173 at commit [`29fc6b8`](https://github.com/apache/spark/commit/29fc6b89094841ba2a28827247305e4fa6c01520).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23162 Merged build finished. Test PASSed.
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23162 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99484/ Test PASSed.
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23151#discussion_r237746374 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -1134,39 +1130,40 @@ class SparkSubmitSuite val hadoopConf = new Configuration() updateConfWithFakeS3Fs(hadoopConf) -val tmpDir = Utils.createTempDir() -val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir) +withTempDir { tmpDir => + val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir) -val args = Seq( - "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"), - "--name", "testApp", - "--master", "yarn", - "--deploy-mode", "client", - "--py-files", s"s3a://${pyFile.getAbsolutePath}", - "spark-internal" -) + val args = Seq( +"--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"), +"--name", "testApp", +"--master", "yarn", +"--deploy-mode", "client", +"--py-files", s"s3a://${pyFile.getAbsolutePath}", +"spark-internal" + ) -val appArgs = new SparkSubmitArguments(args) -val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf)) + val appArgs = new SparkSubmitArguments(args) + val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf)) -conf.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}") -conf.get("spark.submit.pyFiles") should (startWith("/")) + conf.get(PY_FILES.key) should be(s"s3a://${pyFile.getAbsolutePath}") --- End diff -- ditto. Technically it should better be assert and avoid infix notation but I think we don't have to do it here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23162 **[Test build #99484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99484/testReport)** for PR 23162 at commit [`54eda1a`](https://github.com/apache/spark/commit/54eda1a6e544b1ee345580001d347262e862f719).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23151#discussion_r237746228 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -985,37 +985,38 @@ class SparkSubmitSuite val hadoopConf = new Configuration() updateConfWithFakeS3Fs(hadoopConf) -val tmpDir = Utils.createTempDir() -val file = File.createTempFile("tmpFile", "", tmpDir) -val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir) -val mainResource = File.createTempFile("tmpPy", ".py", tmpDir) -val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir) -val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}" +withTempDir { tmpDir => + val file = File.createTempFile("tmpFile", "", tmpDir) + val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir) + val mainResource = File.createTempFile("tmpPy", ".py", tmpDir) + val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir) + val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}" -val args = Seq( - "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"), - "--name", "testApp", - "--master", "yarn", - "--deploy-mode", "client", - "--jars", tmpJarPath, - "--files", s"s3a://${file.getAbsolutePath}", - "--py-files", s"s3a://${pyFile.getAbsolutePath}", - s"s3a://$mainResource" + val args = Seq( +"--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"), +"--name", "testApp", +"--master", "yarn", +"--deploy-mode", "client", +"--jars", tmpJarPath, +"--files", s"s3a://${file.getAbsolutePath}", +"--py-files", s"s3a://${pyFile.getAbsolutePath}", +s"s3a://$mainResource" ) -val appArgs = new SparkSubmitArguments(args) -val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf)) + val appArgs = new SparkSubmitArguments(args) + val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf)) -// All the resources should still be remote paths, so 
that YARN client will not upload again. -conf.get("spark.yarn.dist.jars") should be (tmpJarPath) --- End diff -- I wouldn't change those spaces alone, though. Let's leave them as they were.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Merged build finished. Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99485/ Test PASSed.
[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23173 **[Test build #99485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99485/testReport)** for PR 23173 at commit [`6f498a0`](https://github.com/apache/spark/commit/6f498a043a2347f6f391257d04e6d7bf5f98470d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class CSVInferSchema(options: CSVOptions) extends Serializable `
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22957 LGTM, cc @viirya as well
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r237745005
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
       classOf[PartitioningCollection])
     }
   }
+
+  test("SPARK-25951: avoid redundant shuffle on rename") {
--- End diff --
can we have an end-to-end test as well?
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99483/ Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Merged build finished. Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22514 (screenshot attached: https://user-images.githubusercontent.com/68855/49268483-aaa6d000-f49a-11e8-92c3-5ee78012fe9e.png)
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23152 **[Test build #99483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99483/testReport)** for PR 23152 at commit [`f30f307`](https://github.com/apache/spark/commit/f30f3073b992c5862d798627a721d70716cf6be7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Merged build finished. Test FAILed.
[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22939 @felixcheung, I tested when the user passes in a column that is not a literal string, and it shows the results as below:
```
> json <- '{"name":"Bob"}'
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_json(df$id)))
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'schema_of_json(`id`)' due to data type mismatch: The input json should be a string literal and not null; however, got `id`.;;
'Project [schema_of_json(id#0L) AS schema_of_json(id)#2]
+- Project [id#0L]
   +- Range (0, 1, step=1, splits=None)
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  ...
```
```
> csv <- "Amsterdam,2018"
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_csv(df$id)))
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'schema_of_csv(`id`)' due to data type mismatch: The input csv should be a string literal and not null; however, got `id`.;;
'Project [schema_of_csv(id#3L) AS schema_of_csv(id)#5]
+- Project [id#3L]
   +- Range (0, 1, step=1, splits=None)
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  ...
```
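The errors above come from an analysis-time rule: `schema_of_json` only accepts a foldable string literal, then infers a schema from that one sample. A rough Python imitation of that behavior (the literal check, type mapping, and `struct<...>` syntax are all deliberate simplifications of Spark's real inference):

```python
import json


def schema_of_json(arg):
    """Toy schema_of_json: reject non-literal input at 'analysis time',
    then infer a flat struct schema from the sample JSON object."""
    # Mirrors "The input json should be a string literal and not null".
    if not isinstance(arg, str):
        raise TypeError(
            "The input json should be a string literal and not null; got a non-literal."
        )
    value = json.loads(arg)

    def infer(v):
        # Simplified type mapping; Spark handles many more cases
        # (nested structs, arrays, decimals, nulls, ...).
        if isinstance(v, bool):
            return "boolean"
        if isinstance(v, int):
            return "bigint"
        if isinstance(v, float):
            return "double"
        return "string"

    fields = ",".join(f"{k}:{infer(v)}" for k, v in value.items())
    return f"struct<{fields}>"
```

The point of the literal-only restriction is that the schema must be known before execution: a per-row column value could imply a different schema for every row, which a statically typed plan cannot represent.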
[GitHub] spark pull request #23165: [SPARK-26188][SQL] FileIndex: don't infer data ty...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23165
[GitHub] spark issue #23165: [SPARK-26188][SQL] FileIndex: don't infer data types of ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23165 thanks, merging to master/2.4!
[GitHub] spark pull request #23031: [SPARK-26060][SQL] Track SparkConf entries and ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23031
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5561/ Test FAILed.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23031 thanks, merging to master!
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23184 cc @felixcheung, @viirya and @MaxGekk
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5560/ Test FAILed.
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23184 **[Test build #99494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99494/testReport)** for PR 23184 at commit [`8877837`](https://github.com/apache/spark/commit/88778378db1ab3d150c104066e416f7b8f7d7a7b).
[GitHub] spark issue #23184: [SPARK-26227][R] from_[csv|json] should accept schema_of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23184 Merged build finished. Test FAILed.
[GitHub] spark pull request #23184: [SPARK-26227][R] from_[csv|json] should accept sc...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23184 [SPARK-26227][R] from_[csv|json] should accept schema_of_[csv|json] in R API

## What changes were proposed in this pull request?

**1. Document `from_csv(..., schema_of_csv(...))` support:**

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, schema_of_csv(csv))))
```
```
      from_csv(csv)
1 Amsterdam, 2018
```

**2. Allow `from_json(..., schema_of_json(...))`**

Before:

```R
df2 <- sql("SELECT named_struct('name', 'Bob') as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
head(select(df2, from_json(df2$people_json, schema_of_json(head(df2)$people_json))))
```
```
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'from_json' for signature '"Column", "Column"'
```

After:

```R
df2 <- sql("SELECT named_struct('name', 'Bob') as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
head(select(df2, from_json(df2$people_json, schema_of_json(head(df2)$people_json))))
```
```
  from_json(people_json)
1                    Bob
```

**3. (While I'm here) Allow `structType` as schema for `from_csv` support to match with `from_json`.**

Before:

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, structType("city STRING, year INT"))))
```
```
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'from_csv' for signature '"Column", "structType"'
```

After:

```R
csv <- "Amsterdam,2018"
df <- sql(paste0("SELECT '", csv, "' as csv"))
head(select(df, from_csv(df$csv, structType("city STRING, year INT"))))
```
```
      from_csv(csv)
1 Amsterdam, 2018
```

## How was this patch tested?

Manually tested and unittests were added.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-26227-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23184.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23184

commit 193d68856769f945349449469ad6e536449ec5f0
Author: Hyukjin Kwon
Date: 2018-11-30T03:12:00Z

    from_[csv|json] should accept schema_of_[csv|json] in R API
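For the `schema_of_csv("Amsterdam,2018")` call shown in the PR description, the inference can be imitated in a few lines of Python. The positional `_c0`, `_c1` column names follow Spark's default for headerless CSV; the int-or-string typing is a deliberate simplification of the real `CSVInferSchema` logic:

```python
def schema_of_csv(sample: str) -> str:
    """Infer a toy struct schema from a single CSV sample row:
    tokens that parse as integers become int, everything else string."""
    def infer(tok: str) -> str:
        try:
            int(tok)
            return "int"
        except ValueError:
            return "string"

    cols = ",".join(f"_c{i}:{infer(t)}" for i, t in enumerate(sample.split(",")))
    return f"struct<{cols}>"
```

This also shows why passing the inferred schema straight into `from_csv` (point 1 of the PR) is convenient: both functions agree on the sample, so the schema always matches the data being parsed.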
[GitHub] spark pull request #23166: [SPARK-26201] Fix python broadcast with encryptio...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/23166#discussion_r237738802
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -708,16 +709,36 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
   override def handleConnection(sock: Socket): Unit = {
     val env = SparkEnv.get
     val in = sock.getInputStream()
-    val dir = new File(Utils.getLocalDir(env.conf))
-    val file = File.createTempFile("broadcast", "", dir)
-    path = file.getAbsolutePath
-    val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
+    val abspath = new File(path).getAbsolutePath
+    val out = env.serializerManager.wrapForEncryption(new FileOutputStream(abspath))
--- End diff --
yeah I see how it was wrong before. I'm saying, after you add `setupDecryptionServer`, then that decryption server would still be reading from the value of `path` which gets updated here, since it's the same object in the driver's JVM. anyway, this isn't a big deal, I think it's better with your change.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5558/ Test PASSed.
[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23086 Merged build finished. Test PASSed.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23181 Merged build finished. Test PASSed.
[GitHub] spark issue #23086: [SPARK-25528][SQL] data source v2 API refactor (batch re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23086 **[Test build #99493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99493/testReport)** for PR 23086 at commit [`eecb161`](https://github.com/apache/spark/commit/eecb161075720aec0c496576fe6b6ad749c3a726).