[GitHub] spark pull request #17721: [SPARK-20013][SQL]merge renameTable to alterTable...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17721 [SPARK-20013][SQL] Merge renameTable into alterTable in ExternalCatalog

## What changes were proposed in this pull request?

There are several reasons to merge renameTable into alterTable in ExternalCatalog:

1. In Hive, a table is renamed through alterTable.
2. Currently, when we create or rename a managed table, we must compute its defaultTablePath in ExternalCatalog, so the defaultTablePath logic is duplicated in its two subclasses, HiveExternalCatalog and InMemoryCatalog. Additionally, there is a third defaultTablePath in SessionCatalog, so we currently have three copies of this logic in three classes. We should unify them in SessionCatalog.

To unify them, we need to move some logic from ExternalCatalog up to SessionCatalog, and renameTable is one such case. However, renameTable's signature only carries simple parameters:

```
def renameTable(db: String, oldName: String, newName: String): Unit
```

so even if we move the defaultTablePath logic to SessionCatalog, we cannot pass the computed path to renameTable. Therefore we merge renameTable into alterTable and perform the rename through alterTable.

## How was this patch tested?

Deleted the tests in ExternalCatalogSuite that already exist in SessionCatalogSuite, and moved the remaining ones that do not.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark mergeAlterRenameTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17721.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17721

commit 2ea645c6efef36ad2833d8eced0150d4062b1efd
Author: windpiger <song...@outlook.com>
Date: 2017-04-21T14:52:30Z

    [SPARK-20013][SQL]merge renameTable to alterTable in ExternalCatalog
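To make the idea concrete, here is a minimal sketch with stand-in types — `Catalog`, `TableId`, and `TableMeta` are invented for illustration, and the real PR works on Spark's `ExternalCatalog` and `CatalogTable`, whose signatures may differ. The rename becomes an alter keyed by the old identifier, and the default location for a managed table is computed once, at the SessionCatalog level:

```scala
import java.net.URI

// Stand-in types (hypothetical, for illustration only).
case class TableId(db: String, table: String)
case class TableMeta(id: TableId, managed: Boolean, location: Option[URI])

trait Catalog {
  def getTable(id: TableId): TableMeta
  // Rename expressed as an alter keyed by the existing identifier.
  def alterTable(oldId: TableId, newMeta: TableMeta): Unit
}

// SessionCatalog-level rename: the default path for managed tables is
// computed here, in one place, instead of inside each catalog implementation.
def renameTable(
    catalog: Catalog,
    oldId: TableId,
    newName: String,
    defaultTablePath: TableId => URI): Unit = {
  val old = catalog.getTable(oldId)
  val newId = TableId(oldId.db, newName)
  val newLocation = if (old.managed) Some(defaultTablePath(newId)) else old.location
  catalog.alterTable(oldId, old.copy(id = newId, location = newLocation))
}
```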
[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog reset ne...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17354 late LGTM~
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 @gatorsmile the sbt tests succeeded, and the Maven `create temporary view using` test passed
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 mvn, retest this please
[GitHub] spark issue #17338: [SPARK-19990][SQL]create a temp file for file in test.ja...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 test sbt, retest this please
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 add #17290 again, retest this please
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 @gatorsmile I have merged this PR with master and reverted #17290, and am waiting for Jenkins to run the tests
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 ok, let me do it~
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 @gatorsmile we can find all the failed tests here: [maven test result](https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.7/2799)
[GitHub] spark issue #17344: [SPARK-19990][TEST][test-maven][WIP] Use the database af...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17344 LGTM, but why does the Maven test have this problem?
[GitHub] spark issue #17341: [SPARK-20013][SQL]add a newTablePath parameter for renam...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17341 cc @cloud-fan @gatorsmile
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 Yes, it is. The `jar:file://` URL will be re-resolved by `new Path` later, and will throw the exception described in the JIRA.
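A small illustration of the failure mode — the jar path below is made up, and the exception details are those reported in the JIRA rather than re-verified here. Hadoop's `Path` treats `jar` as the URI scheme and the remainder, `file:/...!/test-data/cars.csv`, as a relative path, so construction fails:

```scala
import org.apache.hadoop.fs.Path

// A resource URL rendered as a string by getResource(...).toString
// (the jar location here is illustrative):
val fromJar = "jar:file:/workspace/spark-sql-tests.jar!/test-data/cars.csv"

// Expected to throw IllegalArgumentException wrapping
// java.net.URISyntaxException: Relative path in absolute URI,
// because "file:/...!/test-data/cars.csv" is parsed as a relative
// path under the "jar" scheme.
val p = new Path(fromJar)
```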
[GitHub] spark pull request #17341: [SPARK-20013][SQL]add a newTablePath parameter fo...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17341#discussion_r106782193

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -732,40 +696,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     assert(!exists(db.locationUri))
   }
 
-  test("create/drop/rename table should create/delete/rename the directory") {
--- End diff --

move to `SessionCatalogSuite`
[GitHub] spark pull request #17341: [SPARK-20013][SQL]add a newTablePath parameter fo...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17341#discussion_r106782178

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,30 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
--- End diff --

already exists in [SessionCatalogSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L393)
[GitHub] spark pull request #17341: [SPARK-20013][SQL]add a newTablePath parameter fo...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17341#discussion_r106782179

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,30 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("unknown_db", "unknown_table", "unknown_table")
-    }
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "unknown_table", "unknown_table")
-    }
-  }
-
-  test("rename table when destination table already exists") {
--- End diff --

move to `SessionCatalogSuite`
[GitHub] spark pull request #17341: [SPARK-20013][SQL]add a newTablePath parameter fo...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17341#discussion_r106782161

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,30 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
--- End diff --

already exists in [SessionCatalogSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L361)
[GitHub] spark pull request #17341: [SPARK-20013][SQL]add a newTablePath parameter fo...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17341 [SPARK-20013][SQL] Add a newTablePath parameter for renameTable in ExternalCatalog

## What changes were proposed in this pull request?

Currently, when we create or rename a managed table, we must compute its defaultTablePath in ExternalCatalog, so the defaultTablePath logic is duplicated in its two subclasses, HiveExternalCatalog and InMemoryCatalog. Additionally, there is a third defaultTablePath in SessionCatalog, so we currently have three copies of this logic in three classes. We should unify them in SessionCatalog.

To unify them, we need to move some logic from ExternalCatalog up to SessionCatalog, and renameTable is one such case. However, renameTable's signature only carries simple parameters:

```
def renameTable(db: String, oldName: String, newName: String): Unit
```

so even if we move the defaultTablePath logic to SessionCatalog, we cannot pass the computed path to renameTable. Therefore we add a newTablePath parameter to renameTable in ExternalCatalog.

## How was this patch tested?

Deleted the tests in ExternalCatalogSuite that already exist in SessionCatalogSuite, and moved the remaining ones that do not.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark modifyRenameTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17341.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17341

commit 80110e5783ab38d18e40ced06c3d926e6017b03f
Author: windpiger <song...@outlook.com>
Date: 2017-03-18T14:33:49Z

    [SPARK-20013][SQL]add a newTablePath parameter for renameTable in ExternalCatalog
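A sketch of this alternative, again with stand-in types rather than the PR's exact code — the extra parameter's type in the real change may differ (e.g. `String` vs `URI`), so treat this as illustrative only:

```scala
import java.net.URI

// Hypothetical extended signature: the caller supplies the new default path.
trait Catalog {
  def renameTable(db: String, oldName: String, newName: String, newTablePath: URI): Unit
}

// SessionCatalog side (sketch): defaultTablePath lives in one place and its
// result is simply passed down to the external catalog.
def rename(
    catalog: Catalog,
    db: String,
    oldName: String,
    newName: String,
    defaultTablePath: (String, String) => URI): Unit = {
  catalog.renameTable(db, oldName, newName, defaultTablePath(db, newName))
}
```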
[GitHub] spark pull request #17340: [SPARK-20013][SQL][WIP] merge renameTable to alte...
Github user windpiger closed the pull request at: https://github.com/apache/spark/pull/17340
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780117

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -732,40 +676,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     assert(!exists(db.locationUri))
   }
 
-  test("create/drop/rename table should create/delete/rename the directory") {
--- End diff --

move to `SessionCatalogSuite`
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780097

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,50 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
--- End diff --

already exists in [SessionCatalogSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L393)
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780090

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,50 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
--- End diff --

already exists in [SessionCatalogSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L361)
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780038

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,50 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("unknown_db", "unknown_table", "unknown_table")
-    }
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "unknown_table", "unknown_table")
-    }
-  }
-
-  test("rename table when destination table already exists") {
--- End diff --

move to `SessionCatalogSuite`
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780033

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,50 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("unknown_db", "unknown_table", "unknown_table")
-    }
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "unknown_table", "unknown_table")
-    }
-  }
-
-  test("rename table when destination table already exists") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "tbl1", "tbl2")
-    }
-  }
-
-  test("alter table") {
-    val catalog = newBasicCatalog()
-    val tbl1 = catalog.getTable("db2", "tbl1")
-    catalog.alterTable(tbl1.copy(properties = Map("toh" -> "frem")))
-    val newTbl1 = catalog.getTable("db2", "tbl1")
-    assert(!tbl1.properties.contains("toh"))
-    assert(newTbl1.properties.size == tbl1.properties.size + 1)
-    assert(newTbl1.properties.get("toh") == Some("frem"))
-  }
-
-  test("alter table when database/table does not exist") {
--- End diff --

already exists in `SessionCatalogSuite`
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17340#discussion_r106780030

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -196,50 +196,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
     catalog.dropTable("db2", "unknown_table", ignoreIfNotExists = true, purge = false)
   }
 
-  test("rename table") {
-    val catalog = newBasicCatalog()
-    assert(catalog.listTables("db2").toSet == Set("tbl1", "tbl2"))
-    catalog.renameTable("db2", "tbl1", "tblone")
-    assert(catalog.listTables("db2").toSet == Set("tblone", "tbl2"))
-  }
-
-  test("rename table when database/table does not exist") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("unknown_db", "unknown_table", "unknown_table")
-    }
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "unknown_table", "unknown_table")
-    }
-  }
-
-  test("rename table when destination table already exists") {
-    val catalog = newBasicCatalog()
-    intercept[AnalysisException] {
-      catalog.renameTable("db2", "tbl1", "tbl2")
-    }
-  }
-
-  test("alter table") {
--- End diff --

already exists in `SessionCatalogSuite`
[GitHub] spark pull request #17340: [SPARK-20013][SQL] merge renameTable to alterTabl...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17340 [SPARK-20013][SQL] Merge renameTable into alterTable in ExternalCatalog

## What changes were proposed in this pull request?

## How was this patch tested?

Deleted the tests in ExternalCatalogSuite that already exist in SessionCatalogSuite, and moved the remaining ones that do not.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark renameTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17340.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17340

commit 70f2365bec31fa3c3a53eebe73991871cdd77e56
Author: windpiger <song...@outlook.com>
Date: 2017-03-09T08:03:19Z

    use alterTable to implement renameTabe

commit e8b9ea36c586ce1af63ee7eb3dbc052455cfcdd0
Author: windpiger <song...@outlook.com>
Date: 2017-03-18T10:17:33Z

    Merge branch 'master' into renameTable

commit 26c6c7b05d1c7020cbc59784ce7ea4864c9c2b2b
Author: windpiger <song...@outlook.com>
Date: 2017-03-18T11:53:58Z

    merge with master and move some tests to SessionCatalogSuite
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 According to the Jenkins log, the test case `test("create temporary view using")` in `DDLSuite` passed. This PR resolves the test failure reported in [SPARK-19990](https://issues.apache.org/jira/browse/SPARK-19990). There are some other test failures that are not related to this PR; I will take a look at them. @kayousterhout @cloud-fan
[GitHub] spark pull request #17338: [SPARK-19990][SQL][test-maven]create a temp file ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17338#discussion_r106778719

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -700,20 +701,33 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
 
   test("create temporary view using") {
     val csvFile =
-      Thread.currentThread().getContextClassLoader.getResource("test-data/cars.csv").toString
-    withView("testview") {
-      sql(s"CREATE OR REPLACE TEMPORARY VIEW testview (c1 String, c2 String) USING " +
-        "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat " +
-        s"OPTIONS (PATH '$csvFile')")
+      Thread.currentThread().getContextClassLoader.getResourceAsStream("test-data/cars.csv")
--- End diff --

`getResourceAsStream` is used to read resource files from the jar's resources, which is fine. The point is the return value, not the relative path `test-data/cars.csv`.
[GitHub] spark issue #17338: [SPARK-19990][SQL][test-maven]create a temp file for fil...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17338 retest this please
[GitHub] spark pull request #17338: [SPARK-19990][SQL][maven]create a temp file for f...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17338#discussion_r106777091

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -700,20 +701,33 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
 
   test("create temporary view using") {
     val csvFile =
-      Thread.currentThread().getContextClassLoader.getResource("test-data/cars.csv").toString
-    withView("testview") {
-      sql(s"CREATE OR REPLACE TEMPORARY VIEW testview (c1 String, c2 String) USING " +
-        "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat " +
-        s"OPTIONS (PATH '$csvFile')")
+      Thread.currentThread().getContextClassLoader.getResourceAsStream("test-data/cars.csv")
--- End diff --

The point is that the return value of `getResource.toString` is `jar:file:/xxx.jar!/test-data/cars.csv`. If we use this path to set the table location, as the following code in the test case does, it will cause an exception when it is used to construct `new Path()` [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L344). So I copy this file from the jar's resources to a temp file, whose path is `file:/tmppath/cars.csv`, and use this normal path to set the table location; that works with the following code.
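A sketch of the copy-to-temp-file workaround described above — the helper name and details are illustrative rather than the PR's exact code; it only shows the technique of materializing a classpath resource as a plain `file:` path:

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Copy a classpath resource to a temp file so its path survives new Path().
def resourceToTempFile(resource: String): File = {
  val in = Thread.currentThread().getContextClassLoader.getResourceAsStream(resource)
  require(in != null, s"resource not found: $resource")
  val tmp = File.createTempFile("spark-test-", "-" + new File(resource).getName)
  tmp.deleteOnExit()
  try Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
  tmp
}

// The resulting file:/... URI is safe to embed in OPTIONS (PATH '...'):
// val csvFile = resourceToTempFile("test-data/cars.csv").toURI.toString
```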
[GitHub] spark pull request #17338: [SPARK-19990][SQL]create a temp file for file in ...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17338 [SPARK-19990][SQL] Create a temp file for a file in test.jar's resources when running tests across different modules

## What changes were proposed in this pull request?

After merging `HiveDDLSuite` and `DDLSuite` in [SPARK-19235](https://issues.apache.org/jira/browse/SPARK-19235), we have two subclasses of `DDLSuite`: `HiveCatalogedDDLSuite` and `InMemoryCatalogDDLSuite`. `DDLSuite` lives in the `sql/core` module while `HiveCatalogedDDLSuite` lives in the `sql/hive` module, so running `HiveCatalogedDDLSuite` also runs the tests in its parent class `DDLSuite`. This causes some test cases to fail because they obtain and use a test file path from the `sql/core` module's resources. The obtained path starts with `jar:`, for example `jar:file:/home/jenkins/workspace/spark-master-test-maven-hadoop-2.6/sql/core/target/spark-sql_2.11-2.2.0-SNAPSHOT-tests.jar!/test-data/cars.csv`, which fails when passed to `new Path()` in DataSource.scala.

This PR fixes the problem by copying the file from the resources to a temp dir.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark fixtestfailemvn

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17338.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17338

commit 64c6b223840307506468430ed5b50d24bf76218d
Author: windpiger <song...@outlook.com>
Date: 2017-03-18T04:07:25Z

    [SPARK-19990][SQL]create a temp file for file in test.jar's resource when run test accross different modules

commit 09b5e4065d23fbc8ddc64bc1dc92557648d03818
Author: windpiger <song...@outlook.com>
Date: 2017-03-18T04:17:15Z

    fix a comment
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106570185

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -27,41 +27,67 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, View}
 
+class InMemorySessionCatalogSuite extends SessionCatalogSuite {
+  protected val utils = new CatalogTestUtils {
+    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
+    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
+    override val defaultProvider: String = "parquet"
+    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
+  }
+}
+
 /**
- * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is correctly implemented.
+ * Tests for [[SessionCatalog]]
  *
  * Note: many of the methods here are very similar to the ones in [[ExternalCatalogSuite]].
  * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many similar method
 * signatures but do not extend a common parent. This is largely by design but
 * unfortunately leads to very similar test code in two places.
 */
-class SessionCatalogSuite extends PlanTest {
-  private val utils = new CatalogTestUtils {
-    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
-    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
-    override val defaultProvider: String = "parquet"
-    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
-  }
+abstract class SessionCatalogSuite extends PlanTest {
+  protected val utils: CatalogTestUtils
+
+  protected val isHiveExternalCatalog = false
 
   import utils._
 
+  private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
+    val catalog = new SessionCatalog(newBasicCatalog())
+    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
+    try {
+      f(catalog)
+    } finally {
+      catalog.reset()
+    }
+  }
+
+  private def withEmptyCatalog(f: SessionCatalog => Unit): Unit = {
+    val catalog = new SessionCatalog(newEmptyCatalog())
+    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
--- End diff --

According to all the test cases in this suite, the default database is created with `ignoreIfExists = true`, so having the default database already exist here will not affect the test cases.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106569991

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -27,41 +27,67 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, View}
 
+class InMemorySessionCatalogSuite extends SessionCatalogSuite {
+  protected val utils = new CatalogTestUtils {
+    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
+    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
+    override val defaultProvider: String = "parquet"
+    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
+  }
+}
+
 /**
- * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is correctly implemented.
+ * Tests for [[SessionCatalog]]
 *
 * Note: many of the methods here are very similar to the ones in [[ExternalCatalogSuite]].
 * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many similar method
 * signatures but do not extend a common parent. This is largely by design but
 * unfortunately leads to very similar test code in two places.
 */
-class SessionCatalogSuite extends PlanTest {
-  private val utils = new CatalogTestUtils {
-    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
-    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
-    override val defaultProvider: String = "parquet"
-    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
-  }
+abstract class SessionCatalogSuite extends PlanTest {
+  protected val utils: CatalogTestUtils
+
+  protected val isHiveExternalCatalog = false
 
   import utils._
 
+  private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
+    val catalog = new SessionCatalog(newBasicCatalog())
+    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
+    try {
+      f(catalog)
+    } finally {
+      catalog.reset()
+    }
+  }
+
+  private def withEmptyCatalog(f: SessionCatalog => Unit): Unit = {
+    val catalog = new SessionCatalog(newEmptyCatalog())
+    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
+    try {
+      f(catalog)
+    } finally {
+      catalog.reset()
+    }
+  }
 
   // --
   // Databases
   // --
 
   test("basic create and list databases") {
-    val catalog = new SessionCatalog(newEmptyCatalog())
-    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
-    assert(catalog.databaseExists("default"))
-    assert(!catalog.databaseExists("testing"))
-    assert(!catalog.databaseExists("testing2"))
-    catalog.createDatabase(newDb("testing"), ignoreIfExists = false)
-    assert(catalog.databaseExists("testing"))
-    assert(catalog.listDatabases().toSet == Set("default", "testing"))
-    catalog.createDatabase(newDb("testing2"), ignoreIfExists = false)
-    assert(catalog.listDatabases().toSet == Set("default", "testing", "testing2"))
-    assert(catalog.databaseExists("testing2"))
-    assert(!catalog.databaseExists("does_not_exist"))
+    withEmptyCatalog { catalog =>
+      catalog.createDatabase(newDb("default"), ignoreIfExists = true)
--- End diff --

Here `ignoreIfExists = true`, so I think it is OK that the default database has already been created.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106569958

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -27,41 +27,67 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, View}
 
+class InMemorySessionCatalogSuite extends SessionCatalogSuite {
+  protected val utils = new CatalogTestUtils {
+    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
+    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
+    override val defaultProvider: String = "parquet"
+    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
+  }
+}
+
 /**
- * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is correctly implemented.
+ * Tests for [[SessionCatalog]]
 *
 * Note: many of the methods here are very similar to the ones in [[ExternalCatalogSuite]].
 * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many similar method
 * signatures but do not extend a common parent. This is largely by design but
 * unfortunately leads to very similar test code in two places.
 */
-class SessionCatalogSuite extends PlanTest {
-  private val utils = new CatalogTestUtils {
-    override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat"
-    override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat"
-    override val defaultProvider: String = "parquet"
-    override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
-  }
+abstract class SessionCatalogSuite extends PlanTest {
+  protected val utils: CatalogTestUtils
+
+  protected val isHiveExternalCatalog = false
 
   import utils._
 
+  private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
+    val catalog = new SessionCatalog(newBasicCatalog())
+    catalog.createDatabase(newDb("default"), ignoreIfExists = true)
--- End diff --

With `ignoreIfExists = true` here, I think it is OK that the default database has already been created.
[GitHub] spark pull request #17317: [SPARK-19329][SQL][BRANCH-2.1]Reading from or wri...
Github user windpiger closed the pull request at: https://github.com/apache/spark/pull/17317
[GitHub] spark issue #17317: [SPARK-19329][SQL][BRANCH-2.1]Reading from or writing to...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17317 cc @gatorsmile
[GitHub] spark pull request #17317: [SPARK-19329][SQL][BRANCH-2.1]Reading from or wri...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17317 [SPARK-19329][SQL][BRANCH-2.1] Reading from or writing to a datasource table with a non pre-existing location should succeed

## What changes were proposed in this pull request?

This is a backport PR of https://github.com/apache/spark/pull/16672 into branch-2.1.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark backport-insertnotexists

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17317.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17317
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a datasourc...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16672 OK, thanks! It's my pleasure~
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17287 retest this please
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106342082

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
 
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
--- End diff --

Yes, you are right; let me add a try/catch.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106336735

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -76,468 +102,500 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   test("create databases using invalid names") {
-    val catalog = new SessionCatalog(newEmptyCatalog())
-    testInvalidName(name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    withEmptyCatalog { catalog =>
+      testInvalidName(
+        name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    }
   }
 
   test("get database when a database exists") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    val db1 = catalog.getDatabaseMetadata("db1")
-    assert(db1.name == "db1")
-    assert(db1.description.contains("db1"))
+    withBasicCatalog { catalog =>
+      val db1 = catalog.getDatabaseMetadata("db1")
+      assert(db1.name == "db1")
+      assert(db1.description.contains("db1"))
+    }
   }
 
   test("get database should throw exception when the database does not exist") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    intercept[NoSuchDatabaseException] {
-      catalog.getDatabaseMetadata("db_that_does_not_exist")
+    withBasicCatalog { catalog =>
+      intercept[NoSuchDatabaseException] {
+        catalog.getDatabaseMetadata("db_that_does_not_exist")
+      }
     }
   }
 
   test("list databases without pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    withBasicCatalog { catalog =>
+      assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    }
   }
 
   test("list databases with pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases("db").toSet == Set.empty)
-    assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
-    assert(catalog.listDatabases("*1").toSet == Set("db1"))
-    assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    withBasicCatalog { catalog =>
+      assert(catalog.listDatabases("db").toSet == Set.empty)
+      assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
+      assert(catalog.listDatabases("*1").toSet == Set("db1"))
+      assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    }
   }
 
   test("drop database") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
-    assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    withBasicCatalog { catalog =>
+      catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
+      assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    }
   }
 
   test("drop database when the database is not empty") {
     // Throw exception if there are functions left
-    val externalCatalog1 = newBasicCatalog()
-    val sessionCatalog1 = new SessionCatalog(externalCatalog1)
-    externalCatalog1.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
-    externalCatalog1.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
-    intercept[AnalysisException] {
-      sessionCatalog1.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+    withBasicCatalog { catalog =>
+      catalog.externalCatalog.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
+      catalog.externalCatalog.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
+      intercept[AnalysisException] {
+        catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+      }
     }
-
-    // Throw exception if there are tables left
-    val externalCatalog2 = newBasicCatalog()
-    val sessionCatalog2 = new SessionCatalog(externalCatalog2)
-    externalCatalog2.dropFunction("db2",
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106336045

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    val expectedPartsNormalize = expectedParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    actualPartsNormalize == expectedPartsNormalize
+    //actualParts.map(p =>
+    //  p.copy(storage = p.storage.copy(
+    //    properties = Map.empty, locationUri = None))).toSet ==
+    //  expectedParts.map(p =>
+    //    p.copy(storage = p.storage.copy(properties = Map.empty, locationUri = None))).toSet
--- End diff --

sorry, let me remove it
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106335967

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
 
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
--- End diff --

Here the `SessionCatalog` is instantiated with a different `conf` parameter; `withBasicCatalog` just leaves `conf` at its default.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106335778 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest { expectedParts: CatalogTablePartition*): Boolean = { // ExternalCatalog may set a default location for partitions, here we ignore the partition // location when comparing them. -actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet == - expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet +val actualPartsNormalize = actualParts.map(p => --- End diff -- Yes, it is~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17287 @gatorsmile @cloud-fan I have fixed some code review comments, could you help to continue reviewing this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17305: [SPARK-19961][SQL][minor]unify an error msg when drop data...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17305 yes, it is a trivial JIRA~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17305: [SPARK-19961][SQL][minor]unify an error msg when drop data...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17305 cc @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17305: [SPARK-19961][SQL][minor]unify an error msg when dr...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17305 [SPARK-19961][SQL][minor]unify an error msg when dropping a database for HiveExternalCatalog and InMemoryCatalog ## What changes were proposed in this pull request? unify the exception error msg for dropDatabase when the database still has some tables. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark unifyErromsg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17305.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17305 commit e25c3fb33fd5cf7f07fa927d075a6ae06a4c3df9 Author: windpiger <song...@outlook.com> Date: 2017-03-15T14:55:17Z [SPARK-19961][SQL]unify a erro msg when drop databse for HiveExternalCatalog and InMemoryCatalog --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106145746 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -27,41 +27,83 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.plans.PlanTest import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, View} +class InMemorySessionCatalogSuite extends SessionCatalogSuite { + protected val utils = new CatalogTestUtils { +override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat" +override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat" +override val defaultProvider: String = "parquet" +override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog + } +} + /** - * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is correctly implemented. + * Tests for [[SessionCatalog]] * * Note: many of the methods here are very similar to the ones in [[ExternalCatalogSuite]]. * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many similar method * signatures but do not extend a common parent. This is largely by design but * unfortunately leads to very similar test code in two places. */ -class SessionCatalogSuite extends PlanTest { - private val utils = new CatalogTestUtils { -override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat" -override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat" -override val defaultProvider: String = "parquet" -override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog - } +abstract class SessionCatalogSuite extends PlanTest { + protected val utils: CatalogTestUtils + + protected val isHiveExternalCatalog = false import utils._ + val BASIC = "basic" + val EMPTY = "empty" + + private def withSessionCatalog( --- End diff -- ok~ thanks~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106065119 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -76,468 +118,499 @@ class SessionCatalogSuite extends PlanTest { } test("create databases using invalid names") { -val catalog = new SessionCatalog(newEmptyCatalog()) -testInvalidName(name => catalog.createDatabase(newDb(name), ignoreIfExists = true)) +withSessionCatalog(EMPTY) { catalog => + testInvalidName( +name => catalog.createDatabase(newDb(name), ignoreIfExists = true)) +} } test("get database when a database exists") { -val catalog = new SessionCatalog(newBasicCatalog()) -val db1 = catalog.getDatabaseMetadata("db1") -assert(db1.name == "db1") -assert(db1.description.contains("db1")) +withSessionCatalog() { catalog => + val db1 = catalog.getDatabaseMetadata("db1") + assert(db1.name == "db1") + assert(db1.description.contains("db1")) +} } test("get database should throw exception when the database does not exist") { -val catalog = new SessionCatalog(newBasicCatalog()) -intercept[NoSuchDatabaseException] { - catalog.getDatabaseMetadata("db_that_does_not_exist") +withSessionCatalog() { catalog => + intercept[NoSuchDatabaseException] { +catalog.getDatabaseMetadata("db_that_does_not_exist") + } } } test("list databases without pattern") { -val catalog = new SessionCatalog(newBasicCatalog()) -assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3")) +withSessionCatalog() { catalog => + assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3")) +} } test("list databases with pattern") { -val catalog = new SessionCatalog(newBasicCatalog()) -assert(catalog.listDatabases("db").toSet == Set.empty) -assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3")) -assert(catalog.listDatabases("*1").toSet == Set("db1")) -assert(catalog.listDatabases("db2").toSet == Set("db2")) +withSessionCatalog() { catalog => + assert(catalog.listDatabases("db").toSet == Set.empty) + assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3")) + assert(catalog.listDatabases("*1").toSet == Set("db1")) + assert(catalog.listDatabases("db2").toSet == Set("db2")) +} } test("drop database") { -val catalog = new SessionCatalog(newBasicCatalog()) -catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false) -assert(catalog.listDatabases().toSet == Set("default", "db2", "db3")) +withSessionCatalog() { catalog => + catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false) + assert(catalog.listDatabases().toSet == Set("default", "db2", "db3")) +} } test("drop database when the database is not empty") { // Throw exception if there are functions left -val externalCatalog1 = newBasicCatalog() -val sessionCatalog1 = new SessionCatalog(externalCatalog1) -externalCatalog1.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false) -externalCatalog1.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false) -intercept[AnalysisException] { - sessionCatalog1.dropDatabase("db2", ignoreIfNotExists = false, cascade = false) +withSessionCatalogAndExternal() { (catalog, externalCatalog) => + externalCatalog.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false) + externalCatalog.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false) + intercept[AnalysisException] { +catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false) + } } - -// Throw exception if there are tables 
left -val externalCatalog2 = newBasicCatalog() -val sessionCatalog2 = new SessionCatalog(externalCatal
[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existing...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17272 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17272#discussion_r105889902 --- Diff: python/pyspark/sql/tests.py --- @@ -1892,7 +1892,7 @@ def test_list_tables(self): self.assertEquals(spark.catalog.listTables(), []) self.assertEquals(spark.catalog.listTables("some_db"), []) spark.createDataFrame([(1, 1)]).createOrReplaceTempView("temp_tab") -spark.sql("CREATE TABLE tab1 (name STRING, age INT) USING parquet") +spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet") --- End diff -- python test failed ``` the location('file:/home/jenkins/workspace/SparkPullRequestBuilder/spark-warehouse/tab1') of table('`default`.`tab1`') already exists.; ``` the location of tab1 was not deleted. I tried to find out which test case forgot to delete it (searched all test cases containing `tab1` and ran them), but they all delete the location of tab1 correctly, so here we change the table name to work around it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r105860834 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -999,257 +1094,279 @@ class SessionCatalogSuite extends PlanTest { expectedParts: CatalogTablePartition*): Boolean = { // ExternalCatalog may set a default location for partitions, here we ignore the partition // location when comparing them. -actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet == - expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet +val actualPartsNormalize = actualParts.map(p => --- End diff -- this is the expected part: https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala#L857. Here we add some normalization logic. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r105858890 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -27,41 +27,83 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.plans.PlanTest import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, View} +class InMemorySessionCatalogSuite extends SessionCatalogSuite { + protected val utils = new CatalogTestUtils { +override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat" +override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat" +override val defaultProvider: String = "parquet" +override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog + } +} + /** - * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is correctly implemented. + * Tests for [[SessionCatalog]] * * Note: many of the methods here are very similar to the ones in [[ExternalCatalogSuite]]. * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many similar method * signatures but do not extend a common parent. This is largely by design but * unfortunately leads to very similar test code in two places. */ -class SessionCatalogSuite extends PlanTest { - private val utils = new CatalogTestUtils { -override val tableInputFormat: String = "com.fruit.eyephone.CameraInputFormat" -override val tableOutputFormat: String = "com.fruit.eyephone.CameraOutputFormat" -override val defaultProvider: String = "parquet" -override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog - } +abstract class SessionCatalogSuite extends PlanTest { + protected val utils: CatalogTestUtils + + protected val isHiveExternalCatalog = false import utils._ + val BASIC = "basic" + val EMPTY = "empty" + + private def withSessionCatalog( + catalogTpe: String = BASIC)(f: SessionCatalog => Unit): Unit = { +val catalog = catalogTpe match { + case BASIC => +new SessionCatalog(newBasicCatalog()) + case EMPTY => +new SessionCatalog(newEmptyCatalog()) +} +catalog.createDatabase(newDb("default"), ignoreIfExists = true) +try { + f(catalog) +} finally { + catalog.reset() --- End diff -- reset the catalog --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17287 [SPARK-19945][SQL]add test suite for SessionCatalog with HiveExternalCatalog ## What changes were proposed in this pull request? Currently `SessionCatalogSuite` only covers `InMemoryCatalog`; there is no suite for `HiveExternalCatalog`. Also, some DDL functions are not suitable to test in `ExternalCatalogSuite`, because their logic is not fully implemented in `ExternalCatalog`; these DDL functions are fully implemented in `SessionCatalog`, so it is better to test them in `SessionCatalogSuite`. So we should add a test suite for `SessionCatalog` with `HiveExternalCatalog`. ## How was this patch tested? add `HiveExternalSessionCatalogSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark sessioncatalogsuit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17287 commit 44684461c86d2810ffdfae23485d74fba3801e12 Author: windpiger <song...@outlook.com> Date: 2017-03-14T09:11:57Z [SPARK-19945][SQL]add test suite for SessionCatalog with HiveExternalCatalog --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
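Only the suite name `HiveExternalSessionCatalogSuite` comes from the PR text above; the wiring below is a hedged sketch of how such a Hive-backed subclass might plug into the abstract `SessionCatalogSuite` shown earlier in the thread, reusing the catalog behind the shared TestHive session. The format and provider values are illustrative, not the PR's actual code:

```scala
import org.apache.spark.sql.catalyst.catalog.{CatalogTestUtils, ExternalCatalog}
import org.apache.spark.sql.hive.test.TestHiveSingleton

// Hypothetical sketch: run the shared SessionCatalogSuite tests against a
// Hive-backed ExternalCatalog instead of InMemoryCatalog.
class HiveExternalSessionCatalogSuite extends SessionCatalogSuite with TestHiveSingleton {
  protected override val isHiveExternalCatalog = true
  protected val utils = new CatalogTestUtils {
    override val tableInputFormat: String = "org.apache.hadoop.mapred.TextInputFormat"
    override val tableOutputFormat: String =
      "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    override val defaultProvider: String = "hive"
    // reuse the HiveExternalCatalog backing the shared TestHive session
    override def newEmptyCatalog(): ExternalCatalog = spark.sharedState.externalCatalog
  }
}
```

With this shape, every test written once in the abstract suite runs twice: once per concrete subclass, which is exactly the duplication the PR description is trying to eliminate.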
[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existing...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17272 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17272#discussion_r105834140 --- Diff: python/pyspark/sql/tests.py --- @@ -1901,7 +1901,7 @@ def test_list_tables(self): self.assertEquals(len(tables), 2) self.assertEquals(len(tablesSomeDb), 2) self.assertEquals(tables[0], Table( -name="tab1", +name="t1", --- End diff -- python test failed ``` the location('file:/home/jenkins/workspace/SparkPullRequestBuilder/spark-warehouse/tab1') of table('`default`.`tab1`') already exists.; ``` the location of tab1 was not deleted. I tried to find out which test case forgot to delete it (searched all test cases containing `tab1` and ran them), but they all delete the location of tab1 correctly, so here we change the table name to work around it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existing...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17272 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existing...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17272 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17272#discussion_r105601636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -155,6 +154,8 @@ case class CreateDataSourceTableAsSelectCommand( } else { table.storage.locationUri } + + sparkSession.sessionState.catalog.checkTableOrPathExists(table, ignoreIfExists = false) --- End diff -- because the following `saveDataIntoTable` will create the table path and store data in it, we should not check the path in the later `createTable`; we check it here, before `saveDataIntoTable`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
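The ordering argument here is the interesting part: once `saveDataIntoTable` has written files, any later path-existence check would always trip over the data it just wrote. A hedged sketch of the kind of up-front check involved (`assertLocationIsFree` is an illustrative stand-in for the PR's `checkTableOrPathExists`, whose exact signature is not shown in full here):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Fail fast on an already-occupied table location, before any data is written.
// The exception type and message are illustrative, not the PR's exact ones.
def assertLocationIsFree(tableName: String, location: URI, hadoopConf: Configuration): Unit = {
  val path = new Path(location)
  val fs = path.getFileSystem(hadoopConf)
  if (fs.exists(path)) {
    throw new IllegalStateException(
      s"the location('$path') of table('$tableName') already exists")
  }
}
```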
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17272#discussion_r105601455 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -164,15 +164,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac assert(actual.tableType === CatalogTableType.EXTERNAL) } - test("create table when the table already exists") { --- End diff -- since `InMemoryCatalog` and `HiveExternalCatalog` no longer check whether the table exists, the test case is moved to `SessionCatalogSuite` and `HiveExternalCatalogSuite` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17272#discussion_r105601271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -187,38 +187,32 @@ class InMemoryCatalog( val db = tableDefinition.identifier.database.get requireDbExists(db) val table = tableDefinition.identifier.table -if (tableExists(db, table)) { --- End diff -- we already check whether the table exists in `SessionCatalog.createTable` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
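In other words, the existence check is centralized in `SessionCatalog` so the `ExternalCatalog` implementations no longer duplicate it. A simplified sketch of that pattern (this is not the PR's exact code; the helper shape is an assumption, though `TableAlreadyExistsException` is the catalyst exception of that name):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException

// Decide on "already exists" once, in one place; the external catalog's
// createTable then only performs the actual creation.
def createTableChecked(
    name: TableIdentifier,
    tableExists: TableIdentifier => Boolean,
    ignoreIfExists: Boolean)(doCreate: => Unit): Unit = {
  if (tableExists(name)) {
    if (!ignoreIfExists) {
      throw new TableAlreadyExistsException(name.database.getOrElse("default"), name.table)
    }
  } else {
    doCreate // external catalogs no longer re-check existence here
  }
}
```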
[GitHub] spark pull request #17057: [SPARK-19724][SQL][WIP]create a managed table wit...
Github user windpiger closed the pull request at: https://github.com/apache/spark/pull/17057 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17272 [SPARK-19724][SQL]create a managed table with an existing default location should throw an exception ## What changes were proposed in this pull request? This JIRA is follow-up work after [SPARK-19583](https://issues.apache.org/jira/browse/SPARK-19583) As we discussed in that [PR](https://github.com/apache/spark/pull/16938), the following DDL for a managed table with an existing default location should throw an exception: ``` CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ... CREATE TABLE ... (PARTITIONED BY ...) ``` Currently there are some situations which are not consistent with the above logic: 1. CREATE TABLE ... (PARTITIONED BY ...) succeeds with an existing default location; situation: both hive/datasource (with HiveExternalCatalog/InMemoryCatalog) 2. CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...; situation: a hive table succeeds with an existing default location This PR makes the above two situations consistent with the logic that an exception should be thrown when the default location already exists. ## How was this patch tested? unit test added You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark managedTableLocationExists Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17272.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17272 commit a0ba4190a96e2698e9c485b0ccbb7298a360f1dd Author: windpiger <song...@outlook.com> Date: 2017-03-13T07:24:18Z [SPARK-19724][SQL]create a managed table with an existed default table path should throw an exception --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15918: [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported ...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15918 this change of `schemaFor` will affect the default behavior. For example, in [udf in functions.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3177): if the input type is not supported and falls back to Kryo with BinaryType, `inputTypes`, which previously was `Nil` (which is fine for the following expression's type check), will fail the type check after the fallback (BinaryType is not equal to the return type). So I kept the logic that throws an exception in `schemaFor` for un-nested unsupported types, while complex nested types including `Map` etc. fall back to Kryo. Even though I use `schemaForDefaultBinaryType`, not `schemaFor`, in `deserializerFor`, which I think is ok for this serde situation, I am not sure whether a change in `schemaFor` would affect other logic like the `udf` described above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
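A simplified sketch of the fallback shape being discussed (`schemaForDefaultBinaryType` is the PR's helper name, but this reconstruction is an assumption and deliberately omits the nested/un-nested distinction described above, where plainly unsupported top-level types keep throwing):

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.catalyst.ScalaReflection.{universe, Schema}
import org.apache.spark.sql.types.BinaryType

// If catalyst cannot derive a schema for the type, degrade to a BinaryType column
// whose values would hold Kryo-serialized bytes, instead of failing outright.
def schemaForDefaultBinaryType(tpe: universe.Type): Schema = {
  try {
    ScalaReflection.schemaFor(tpe)
  } catch {
    case _: UnsupportedOperationException => Schema(BinaryType, nullable = true)
  }
}
```

The concern in the comment above is visible in this shape: any caller that used the thrown exception as a signal (such as `udf` type inference) silently gets `BinaryType` instead once the fallback is applied.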
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17001 @gatorsmile thanks for your suggestion~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17211: [SPARK-19869][SQL]move table related ddl from ddl...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17211#discussion_r105576214 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -167,599 +153,6 @@ case class DescribeDatabaseCommand( } } --- End diff -- ok~ thanks. Another problem: tables.scala is now 1594 lines, which is too large. Should we split it to reduce the size of the file? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17254 cc @cloud-fan @gatorsmile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17254 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17055#discussion_r105526116 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2016,48 +2016,79 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("create datasource table with a non-existing location") { +withTable("t", "t1") { + withTempPath { dir => +spark.sql(s"CREATE TABLE t(a int, b int) USING parquet LOCATION '$dir'") + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t SELECT 1, 2") +assert(dir.exists()) + +checkAnswer(spark.table("t"), Row(1, 2)) + } + // partition table + withTempPath { dir => +spark.sql( + s""" + |CREATE TABLE t1(a int, b int) USING parquet PARTITIONED BY(a) LOCATION '$dir' + """.stripMargin) + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t1 PARTITION(a=1) SELECT 2") + +val partDir = new File(dir, "a=1") +assert(partDir.exists()) + +checkAnswer(spark.table("t1"), Row(2, 1)) + } +} + } + Seq(true, false).foreach { shouldDelete => -val tcName = if (shouldDelete) "non-existent" else "existed" +val tcName = if (shouldDelete) "non-existing" else "existed" test(s"CTAS for external data source table with a $tcName location") { withTable("t", "t1") { -withTempDir { - dir => -if (shouldDelete) { - dir.delete() -} -spark.sql( - s""" - |CREATE TABLE t - |USING parquet - |LOCATION '$dir' - |AS SELECT 3 as a, 4 as b, 1 as c, 2 as d - """.stripMargin) -val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) -assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) +withTempDir { dir => + if (shouldDelete) { +dir.delete() + } --- End diff -- ok thanks~~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526016 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2180,6 +2181,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { withTempDir { dir => assert(!dir.getAbsolutePath.startsWith("file:/")) +spark.sql(s"ALTER TABLE t SET LOCATION '$dir'") --- End diff -- ALTER TABLE SET LOCATION should also be qualified --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526010 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1357,13 +1360,11 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { "PARTITION (a='2', b='6') LOCATION 'paris' PARTITION (a='3', b='7')") assert(catalog.listPartitions(tableIdent).map(_.spec).toSet == Set(part1, part2, part3)) assert(catalog.getPartition(tableIdent, part1).storage.locationUri.isDefined) -val partitionLocation = if (isUsingHiveMetastore) { - val tableLocation = catalog.getTableMetadata(tableIdent).storage.locationUri - assert(tableLocation.isDefined) - makeQualifiedPath(new Path(tableLocation.get.toString, "paris").toString) -} else { - new URI("paris") --- End diff -- for ALTER TABLE ADD PARTITION, a relative LOCATION will be qualified against the table location as the parent path --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526000 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1201,7 +1202,9 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { verifyLocation(new URI("/swanky/steak/place")) // set table partition location without explicitly specifying database sql("ALTER TABLE tab1 PARTITION (a='1', b='2') SET LOCATION 'vienna'") -verifyLocation(new URI("vienna"), Some(partSpec)) +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("tab1")) --- End diff -- a relative location will be qualified against the table location as the parent path --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17254 [SPARK-19917][SQL]qualified partition path stored in catalog ## What changes were proposed in this pull request? The partition path should be qualified before being stored in the catalog. There are several scenarios: 1. ALTER TABLE t PARTITION(b=1) SET LOCATION '/path/x' qualified: file:/path/x 2. ALTER TABLE t PARTITION(b=1) SET LOCATION 'x' qualified: file:/tablelocation/x 3. ALTER TABLE t ADD PARTITION(b=1) LOCATION '/path/x' qualified: file:/path/x 4. ALTER TABLE t ADD PARTITION(b=1) LOCATION 'x' qualified: file:/tablelocation/x Currently only ALTER TABLE t ADD PARTITION(b=1) LOCATION for Hive serde tables has the expected qualified path; we should make the other scenarios consistent with it. There is another change for ALTER TABLE ... SET LOCATION. ## How was this patch tested? modified existing test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark qualifiedPartitionPath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17254.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17254 commit 51dcf4092345e173a63e088e779fa44c4fc963b1 Author: windpiger <song...@outlook.com> Date: 2017-03-11T04:39:02Z [SPARK-19917][SQL]qualified partition path stored in catalog commit 996a84d8d1ece4d311240e71a1a9be47828621f8 Author: windpiger <song...@outlook.com> Date: 2017-03-11T04:42:32Z remove empty line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
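The four scenarios in the PR description above imply one resolution rule. A hedged sketch of that rule (the helper name is ours, not the PR's): a relative location is first resolved against the table location, then the result is fully qualified by the filesystem.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative only: resolve a partition LOCATION the way the scenarios describe.
def qualifyPartitionPath(tableLocation: URI, partLocation: String, conf: Configuration): URI = {
  val partPath = new Path(partLocation)
  val resolved =
    if (partPath.isAbsolute) partPath
    else new Path(new Path(tableLocation), partPath) // 'x' -> <tableLocation>/x
  val fs = resolved.getFileSystem(conf)
  fs.makeQualified(resolved).toUri // '/path/x' -> 'file:/path/x'
}
```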
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17055#discussion_r105520260 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1663,95 +1663,127 @@ class HiveDDLSuite } } + test("create hive table with a non-existing location") { +withTable("t", "t1") { + withTempPath { dir => +spark.sql(s"CREATE TABLE t(a int, b int) USING hive LOCATION '$dir'") + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t SELECT 1, 2") +assert(dir.exists()) + +checkAnswer(spark.table("t"), Row(1, 2)) + } + // partition table + withTempPath { dir => +spark.sql( + s""" + |CREATE TABLE t1(a int, b int) + |USING hive + |PARTITIONED BY(a) + |LOCATION '$dir' + """.stripMargin) + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t1 PARTITION(a=1) SELECT 2") + +val partDir = new File(dir, "a=1") +assert(partDir.exists()) + +checkAnswer(spark.table("t1"), Row(2, 1)) + } +} + } + Seq(true, false).foreach { shouldDelete => -val tcName = if (shouldDelete) "non-existent" else "existed" +val tcName = if (shouldDelete) "non-existing" else "existed" test(s"CTAS for external data source table with a $tcName location") { --- End diff -- oh, it is. After we also qualify the location path for InMemoryCatalog, the code for comparing the two locations is the same as HiveExternalCatalog's. thanks~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17055#discussion_r105519542 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2016,48 +2016,79 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("create datasource table with a non-existing location") { +withTable("t", "t1") { + withTempPath { dir => +spark.sql(s"CREATE TABLE t(a int, b int) USING parquet LOCATION '$dir'") + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t SELECT 1, 2") +assert(dir.exists()) + +checkAnswer(spark.table("t"), Row(1, 2)) + } + // partition table + withTempPath { dir => +spark.sql( + s""" + |CREATE TABLE t1(a int, b int) USING parquet PARTITIONED BY(a) LOCATION '$dir' + """.stripMargin) --- End diff -- it exceeds 100 characters, so the code style check failed... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with a non-ex...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17055 OK, thanks a lot~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with a non-ex...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17055 @gatorsmile I have merged with master; if it is ok, could you help to merge it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17095 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16592#discussion_r105111257 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala --- @@ -306,6 +306,11 @@ private[sql] trait SQLTestUtils val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf()) fs.makeQualified(hadoopPath).toUri } + + def makeQualifiedPath(path: Path): URI = { --- End diff -- I have fixed the above two issues in https://github.com/apache/spark/pull/17095 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16592 great~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17211: [SPARK-19869][SQL]move table related ddl from ddl.scala ...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17211 cc @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17211: [SPARK-19869][SQL]move table related ddl from ddl...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17211 [SPARK-19869][SQL]move table related ddl from ddl.scala to tables.scala ## What changes were proposed in this pull request? move table related ddl from ddl.scala to tables.scala ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark moveToTables Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17211.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17211 commit 6f0e24a61bab73532cd50d71a065c1bbf348024c Author: windpiger <song...@outlook.com> Date: 2017-03-08T14:24:02Z [SPARK-19869][SQL]move table related ddl from ddl.scala to tables.scala commit c31ae7fa495ac6068446def66cb8d1128c1448c6 Author: windpiger <song...@outlook.com> Date: 2017-03-08T14:25:59Z remove a todo comment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with a non-ex...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17055 @gatorsmile could this PR be merged? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17204: [SPARK-19864][SQL]provide a makeQualifiedPath fun...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17204 [SPARK-19864][SQL]provide a makeQualifiedPath functions to optimize some code ## What changes were proposed in this pull request? Currently there are lots of places that make a path qualified; it is better to provide a function to do this, so the code will be simpler. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark addQualifiledPathUtil Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17204.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17204 commit 7125a1bce9e013e3b45a02a87c2850cf41812348 Author: windpiger <song...@outlook.com> Date: 2017-03-08T04:20:53Z [SPARK-19864][SQL]provide a makeQualifiedPath functions to optimize some code --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
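A minimal sketch of such a helper, with the signature assumed from the `SQLTestUtils` variant quoted in the review of #16592 earlier in this digest (that variant obtains the Hadoop conf from the session; here it is passed in explicitly to keep the sketch self-contained):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Turn a possibly scheme-less path into a fully qualified URI,
// e.g. "/tmp/x" -> "file:/tmp/x" on a local filesystem.
def makeQualifiedPath(path: String, hadoopConf: Configuration): URI = {
  val hadoopPath = new Path(path)
  val fs = hadoopPath.getFileSystem(hadoopConf)
  fs.makeQualified(hadoopPath).toUri
}
```

Centralizing this matters because qualification depends on the filesystem (scheme and authority), so ad hoc string concatenation at each call site is easy to get subtly wrong.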
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17001 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17176: [SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PART...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17176#discussion_r104660965 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -159,36 +159,11 @@ class HadoopTableReader( def verifyPartitionPath( partitionToDeserializer: Map[HivePartition, Class[_ <: Deserializer]]): Map[HivePartition, Class[_ <: Deserializer]] = { - if (!sparkSession.sessionState.conf.verifyPartitionPath) { --- End diff -- after this PR https://github.com/apache/spark/pull/17187, reading a Hive table which does not use `stored by` will no longer use `HiveTableScanExec`. This function has a bug: if the partition path is a custom path, 1. it will still filter all partition paths in the parameter `partitionToDeserializer`, and 2. it will scan paths which do not belong to the table; e.g. if the custom path is `/root/a` and the partition spec is `b=1/c=2`, this will lead to scanning `/` because of `getPathPatternByPath` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17187: [SPARK-19847][SQL] port hive read to FileFormat A...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17187#discussion_r104643185 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -170,6 +169,40 @@ object HiveAnalysis extends Rule[LogicalPlan] { } } +class ReadHiveDataSource(session: SparkSession) extends Rule[LogicalPlan] { + import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_STORAGE + + override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +case c: CatalogRelation if DDLUtils.isHiveTable(c.tableMeta) && +// Hive tables with storage handler with be handled in a different way, in `HiveTableScans`. --- End diff -- with be -> will be --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17176: [SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PARTITION_P...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17176 why did Jenkins fail... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17097: [SPARK-19765][SQL] UNCACHE TABLE should un-cache ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17097#discussion_r104570109

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -88,46 +90,81 @@ class CacheManager extends Logging {
       query: Dataset[_],
       tableName: Option[String] = None,
       storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = writeLock {
-    val planToCache = query.queryExecution.analyzed
+    val planToCache = query.logicalPlan
     if (lookupCachedData(planToCache).nonEmpty) {
       logWarning("Asked to cache already cached data.")
     } else {
       val sparkSession = query.sparkSession
-      cachedData +=
-        CachedData(
-          planToCache,
-          InMemoryRelation(
-            sparkSession.sessionState.conf.useCompression,
-            sparkSession.sessionState.conf.columnBatchSize,
-            storageLevel,
-            sparkSession.sessionState.executePlan(planToCache).executedPlan,
-            tableName))
+      cachedData.add(CachedData(
+        planToCache,
+        InMemoryRelation(
+          sparkSession.sessionState.conf.useCompression,
+          sparkSession.sessionState.conf.columnBatchSize,
+          storageLevel,
+          sparkSession.sessionState.executePlan(planToCache).executedPlan,
+          tableName)))
     }
   }
 
   /**
-   * Tries to remove the data for the given [[Dataset]] from the cache.
-   * No operation, if it's already uncached.
+   * Un-cache all the cache entries that refer to the given plan.
+   */
+  def uncacheQuery(query: Dataset[_], blocking: Boolean = true): Unit = writeLock {
--- End diff --

There is another scenario: after altering a datasource table's location, we can't uncache the table; see [JIRA](https://issues.apache.org/jira/browse/SPARK-19845).
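For context, a sketch of the scenario mentioned in the comment (hypothetical table and paths; not a test from this PR). The cache entry is keyed on the relation's plan at `CACHE` time, so after the location changes, the rebuilt relation may no longer match it:

```scala
// Illustrative repro of the SPARK-19845 scenario described above.
spark.sql("CREATE TABLE t(a INT) USING parquet LOCATION '/tmp/old_loc'")
spark.sql("CACHE TABLE t")
spark.sql("ALTER TABLE t SET LOCATION '/tmp/new_loc'")
// The relation is rebuilt with the new location, so the plan-keyed lookup
// misses the stale entry and the table cannot be un-cached.
spark.sql("UNCACHE TABLE t")
```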
[GitHub] spark issue #17176: [SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PARTITION_P...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17176 retest this please
[GitHub] spark pull request #17176: [SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PART...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17176#discussion_r104400143

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -159,36 +159,37 @@ class HadoopTableReader(
   def verifyPartitionPath(
       partitionToDeserializer: Map[HivePartition, Class[_ <: Deserializer]]):
     Map[HivePartition, Class[_ <: Deserializer]] = {
-    if (!sparkSession.sessionState.conf.verifyPartitionPath) {
-      partitionToDeserializer
-    } else {
-      var existPathSet = collection.mutable.Set[String]()
-      var pathPatternSet = collection.mutable.Set[String]()
-      partitionToDeserializer.filter {
-        case (partition, partDeserializer) =>
-          def updateExistPathSetByPathPattern(pathPatternStr: String) {
-            val pathPattern = new Path(pathPatternStr)
-            val fs = pathPattern.getFileSystem(hadoopConf)
-            val matches = fs.globStatus(pathPattern)
-            matches.foreach(fileStatus => existPathSet += fileStatus.getPath.toString)
-          }
-          // convert /demo/data/year/month/day to /demo/data/*/*/*/
-          def getPathPatternByPath(parNum: Int, tempPath: Path): String = {
+    var existPathSet = collection.mutable.Set[String]()
+    var pathPatternSet = collection.mutable.Set[String]()
+    partitionToDeserializer.filter {
+      case (partition, partDeserializer) =>
+        def updateExistPathSetByPathPattern(pathPatternStr: String) {
+          val pathPattern = new Path(pathPatternStr)
+          val fs = pathPattern.getFileSystem(hadoopConf)
+          val matches = fs.globStatus(pathPattern)
+          matches.foreach(fileStatus => existPathSet += fileStatus.getPath.toString)
+        }
+        // convert /demo/data/year/month/day to /demo/data/*/*/*/
+        def getPathPatternByPath(parNum: Int, tempPath: Path, partitionName: String): String = {
+          // if the partition path does not end with partition name, we should not
--- End diff --

If the partition location has been altered to another location, we should not apply this pattern, or we will list files that do not belong to the partition.
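A sketch of the guard the new `partitionName` parameter enables (hypothetical helper; the real check lives inside `getPathPatternByPath`): only glob when the partition path actually follows the `<table root>/<partition name>` layout, since an altered location does not.

```scala
import org.apache.hadoop.fs.Path

// Only fall back to the /demo/data/*/*/* glob when the path ends with the
// partition name (e.g. "b=1/c=2"); a custom location must be checked as-is.
def canUseGlob(partitionPath: Path, partitionName: String): Boolean =
  partitionPath.toString.endsWith(partitionName)

canUseGlob(new Path("/demo/data/b=1/c=2"), "b=1/c=2") // true  -> safe to glob
canUseGlob(new Path("/root/a"), "b=1/c=2")            // false -> no glob
```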
[GitHub] spark pull request #17176: [SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PART...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17176

[SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PARTITION_PATH, always return empty when the location does not exists

## What changes were proposed in this pull request?

In SPARK-5068, we introduced a SQLConf `spark.sql.hive.verifyPartitionPath`; if it is set to true, it avoids task failures when the partition location does not exist in the filesystem. This situation should always return an empty result rather than fail the task, so here we remove this conf.

## How was this patch tested?

Modified a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark removeHiveVerfiyPath

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17176.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17176

commit 95aa9317b228220961074c04df06e1d08d2d8556
Author: windpiger <song...@outlook.com>
Date: 2017-03-06T09:16:05Z

[SPARK-19833][SQL]remove SQLConf.HIVE_VERIFY_PARTITION_PATH, always return empty when the location does not exists
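The intended behavior, sketched with a hypothetical filter (the real change lives in `HadoopTableReader.verifyPartitionPath`): partitions whose location is missing simply contribute no input files, instead of failing the task.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Keep only partitions whose location exists, so a missing directory yields
// an empty scan for that partition rather than a task failure.
def existingPartitions[T](partitions: Map[Path, T], hadoopConf: Configuration): Map[Path, T] =
  partitions.filter { case (path, _) =>
    path.getFileSystem(hadoopConf).exists(path)
  }
```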
[GitHub] spark issue #17173: [SPARK-19832][SQL]DynamicPartitionWriteTask get partitio...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17173 retest this please
[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r104361311

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     try {
       sql(s"CREATE DATABASE $dbName")
       val db1 = catalog.getDatabaseMetadata(dbName)
-      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
+      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
--- End diff --

ok, maybe we should implement them all? `Option[String]` also exists in many places, and is more complicated.
[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r104361050

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -162,6 +164,28 @@ object CatalogUtils {
     BucketSpec(numBuckets, normalizedBucketCols, normalizedSortCols)
   }
 
+  /**
+   * Convert URI to String.
+   * Since URI.toString does not decode for the uri string, we need to use
+   * Path(uri).toString to decode it.
+   * @param uri the URI of the path
+   * @return the String of the path
+   */
+  def URIToString(uri: Option[URI]): Option[String] = {
+    uri.map(new Path(_).toString)
--- End diff --

What about `StringToURI`? It still imports `Path`.
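For reference, a sketch of the helper pair under discussion. The `URIToString` body is quoted in the diff; the `stringToURI` counterpart is assumed here, going through Hadoop's `Path` in both directions so percent-escaping is handled consistently.

```scala
import java.net.URI
import org.apache.hadoop.fs.Path

// From the diff: Path(uri).toString decodes percent-escapes that
// URI.toString would keep (e.g. "%25" would stay "%25").
def URIToString(uri: Option[URI]): Option[String] =
  uri.map(new Path(_).toString)

// Assumed counterpart: parse a raw location string into a URI via Path,
// so both directions share the same escaping rules.
def stringToURI(str: Option[String]): Option[URI] =
  str.map(new Path(_).toUri)
```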
[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r104359591

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     try {
       sql(s"CREATE DATABASE $dbName")
       val db1 = catalog.getDatabaseMetadata(dbName)
-      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
+      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
--- End diff --

I just implemented this function with an `Option[String]`. For a plain `String` without an `Option` it is already simple; do we still need to implement one?
[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r104358279

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -883,7 +885,7 @@ abstract class CatalogTestUtils {
 
   def newFunc(): CatalogFunction = newFunc("funcName")
 
-  def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/")
+  def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
--- End diff --

@cloud-fan after trying to use Path for the comparison, I think stripSuffix is the simpler way here. Two Paths built with the `String` constructor are equal even when one has a trailing `/` and the other does not:

```
scala> val x = new Path("/ab/c/")
x: org.apache.hadoop.fs.Path = /ab/c

scala> val y = new Path("/ab/c")
y: org.apache.hadoop.fs.Path = /ab/c

scala> x == y
res0: Boolean = true
```

Two Paths built with the `URI` constructor are not equal when one has a trailing `/` and the other does not:

```
scala> val x = new URI("/a/b/c/")
x: java.net.URI = /a/b/c/

scala> val y = new URI("/a/b/c")
y: java.net.URI = /a/b/c

scala> x == y
res1: Boolean = false

scala> val x1 = new Path(x)
x1: org.apache.hadoop.fs.Path = /a/b/c/

scala> val y1 = new Path(y)
y1: org.apache.hadoop.fs.Path = /a/b/c

scala> x1 == y1
res2: Boolean = false
```

So only the `String`-constructed Path ignores the trailing `/`. If we have a `URI` in hand and want to compare it with another `URI`, we would first have to convert both to `String` and construct Paths from those Strings; only after those two steps could we compare while ignoring the trailing `/`. I think that is more complicated. Here we normalize the URI with stripSuffix at the origin and then compare the two URIs directly, which is simpler. Must we really convert to Path to compare?
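A one-line sketch of the normalization being defended, matching the quoted `newUriForDatabase` change:

```scala
import java.net.URI

// Normalize at the source: drop a trailing "/" from the URI's string form,
// so later URI-to-URI comparisons need no round-trip through Path.
def normalizeUri(uri: URI): URI = new URI(uri.toString.stripSuffix("/"))

normalizeUri(new URI("/a/b/c/")) == new URI("/a/b/c") // true
```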
[GitHub] spark pull request #17173: [SPARK-19832][SQL]DynamicPartitionWriteTask get p...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17173

[SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape the partition name

## What changes were proposed in this pull request?

Currently in DynamicPartitionWriteTask, when we build the partitionPath of a partition, we only escape the partition value, not the partition name. This causes problems for special partition names, for example:

1) if the partition name contains '%' etc., two partition paths get created in the filesystem: an escaped path like '/path/a%25b=1' and an unescaped path like '/path/a%b=1'. The inserted data is stored under the unescaped path, while `SHOW PARTITIONS` returns 'a%25b=1', where the partition name is escaped. So this is inconsistent. I think the data should be stored under the escaped path in the filesystem, which is also what Hive 2.0.0 does.

2) if the partition name contains ':', `new Path("/path", "a:b")` throws an exception, because a colon in a relative path is illegal:

```
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: a:b
  at org.apache.hadoop.fs.Path.initialize(Path.java:205)
  at org.apache.hadoop.fs.Path.<init>(Path.java:171)
  at org.apache.hadoop.fs.Path.<init>(Path.java:88)
  ... 48 elided
Caused by: java.net.URISyntaxException: Relative path in absolute URI: a:b
  at java.net.URI.checkPath(URI.java:1823)
  at java.net.URI.<init>(URI.java:745)
  at org.apache.hadoop.fs.Path.initialize(Path.java:202)
  ... 50 more
```

## How was this patch tested?

A unit test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark fixDatasourceSpecialCharPartitionName

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17173.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17173

commit deb63bd9f3586b7f9c2e203f8486d6a7eb49bc72
Author: windpiger <song...@outlook.com>
Date: 2017-03-06T06:57:25Z

[SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape the partition name
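A self-contained sketch of the fix's idea (not the actual Spark helper): apply the same percent-escaping to the partition name that is already applied to the value, so that `a%b=1` is stored as `a%25b=1` and `a:b=1` becomes `a%3Ab=1` instead of crashing `Path`'s constructor. The escaped character set below is an approximation of Hive's.

```scala
// Characters Hive treats as unsafe in a path segment (approximate set).
val charsToEscape: Set[Char] = "\"#%'*/:=?\\{[]^".toSet

// Percent-encode unsafe and control characters, one %XX escape per char.
def escapePathName(s: String): String =
  s.flatMap { c =>
    if (charsToEscape.contains(c) || c < ' ') f"%%${c.toInt}%02X" else c.toString
  }

// Escape the name as well as the value when building a partition segment.
def partitionPathSegment(name: String, value: String): String =
  s"${escapePathName(name)}=${escapePathName(value)}"

partitionPathSegment("a%b", "1") // a%25b=1
partitionPathSegment("a:b", "1") // a%3Ab=1
```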
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16910#discussion_r104309883

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1588,6 +1596,147 @@ class HiveDDLSuite
     }
   }
 
+  test("insert data to a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION '$dir'
+           """.stripMargin)
+        val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(new Path(table.location) == makeQualifiedPath(dir.getAbsolutePath))
+
+        val tableLocFile = new File(new URI(table.location))
+        tableLocFile.delete()
+        assert(!tableLocFile.exists())
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        assert(tableLocFile.exists())
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        Utils.deleteRecursively(dir)
+        assert(!tableLocFile.exists())
+        spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+        assert(tableLocFile.exists())
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        val newDirFile = new File(dir, "x")
+        val newDirPath = newDirFile.getAbsolutePath
+        spark.sql(s"ALTER TABLE t SET LOCATION '$newDirPath'")
+        spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+        val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(table1.location == newDirPath)
+        assert(!newDirFile.exists())
+
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+        assert(newDirFile.exists())
+      }
+    }
+  }
+
+  test("insert into a hive serde table with non-existing partition location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a int, b int, c int, d int)
+             |USING hive
+             |PARTITIONED BY(a, b)
+             |LOCATION "$dir"
+           """.stripMargin)
+        val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(new Path(table.location) == makeQualifiedPath(dir.getAbsolutePath))
+
+        spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+        checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+        val partLoc = new File(dir, "a=1")
+        Utils.deleteRecursively(partLoc)
+        assert(!partLoc.exists())
+        // insert overwrite into a partition which location has been deleted.
+        spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 8")
+        assert(partLoc.exists())
+        checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+        val newDirFile = new File(dir, "x")
+        val newDirPath = newDirFile.getAbsolutePath
+        spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION '$newDirPath'")
+        assert(!newDirFile.exists())
+
+        // insert into a partition which location does not exists.
+        spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 9, 10")
+        assert(newDirFile.exists())
+        checkAnswer(spark.table("t"), Row(9, 10, 1, 2) :: Nil)
+      }
+    }
+  }
+
+  test("read data from a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION "$dir"
+           """.stripMargin)
+        val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(new Path(table.location) == makeQualifiedPath(dir.getAbsolutePath))
+
+        dir.delete()
+        checkAnswer(spark.table("t"), Nil)
+
+        val newDirFile = new File(dir, "x")
+        val newDirPath = newDirFile.getAbsolutePath
+        spark.sql(s"ALTER TABLE t SET LO
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16910#discussion_r104285450

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1588,6 +1590,153 @@ class HiveDDLSuite
     }
   }
 
+  test("insert data to a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION '$dir'
--- End diff --

ok, thanks~