imback82 commented on a change in pull request #35113:
URL: https://github.com/apache/spark/pull/35113#discussion_r780069534



##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/TestsV1AndV2Commands.scala
##########
@@ -28,14 +28,21 @@ import org.apache.spark.sql.internal.SQLConf
 trait TestsV1AndV2Commands extends DDLCommandTestUtils {
   private var _version: String = ""
   override def commandVersion: String = _version
+  def runningV1Command: Boolean = commandVersion == "V1"
 
   // Tests using V1 catalogs will run with `spark.sql.legacy.useV1Command` on 
and off
   // to test both V1 and V2 commands.
   override def test(testName: String, testTags: Tag*)(testFun: => Any)
     (implicit pos: Position): Unit = {
     Seq(true, false).foreach { useV1Command =>
-      _version = if (useV1Command) "V1" else "V2"
+      def setCommandVersion(): Unit = {
+        _version = if (useV1Command) "V1" else "V2"
+      }
+      setCommandVersion()
       super.test(testName, testTags: _*) {
+        // Need to set command version inside this test function so that
+        // the correct command version is available in each test.
+        setCommandVersion()

Review comment:
       This `def test` doesn't run `testFun` when it's invoked; it only 
registers the test. So by the time `testFun` is actually run, `_version` 
will always be set to "V2" (`useV1Command == false`). So we need to capture 
this inside the lambda that's passed into `super.test`.
   
   Also, note that we need to call `setCommandVersion()` before calling 
`super.test` because `super.test` being called is utilizing the 
`commandVersion` to set the right test name:
   
https://github.com/apache/spark/blob/527e842ee6ed3bb1aabba61d02337ab81a56f87b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala#L56-L57

##########
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -189,8 +189,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
   override def createDatabase(
       dbDefinition: CatalogDatabase,
-      ignoreIfExists: Boolean): Unit = withClient {
-    client.createDatabase(dbDefinition, ignoreIfExists)
+      ignoreIfExists: Boolean): Unit = {
+    try {
+      withClient {
+        client.createDatabase(dbDefinition, ignoreIfExists)
+      }
+    } catch {
+      case e: AnalysisException if e.message.contains("already exists") =>
+        throw new DatabaseAlreadyExistsException(dbDefinition.name)

Review comment:
       @cloud-fan I intercepted the exception here. looks a bit hacky?

##########
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -189,8 +189,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
   override def createDatabase(
       dbDefinition: CatalogDatabase,
-      ignoreIfExists: Boolean): Unit = withClient {
-    client.createDatabase(dbDefinition, ignoreIfExists)
+      ignoreIfExists: Boolean): Unit = {
+    try {
+      withClient {
+        client.createDatabase(dbDefinition, ignoreIfExists)
+      }
+    } catch {
+      case e: AnalysisException if e.message.contains("already exists") =>
+        throw new DatabaseAlreadyExistsException(dbDefinition.name)

Review comment:
       Two concerns if we move the logic to `withClient`:
   1. How can we guarantee 
`org.apache.hadoop.hive.metastore.api.AlreadyExistsException` is thrown only 
by `createDatabase` and not by other calls like `createTable`?
   2. Since db name is not available, we need to parse out the db name from the 
message.
   
   Are you OK with these?

##########
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -189,8 +189,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
   override def createDatabase(
       dbDefinition: CatalogDatabase,
-      ignoreIfExists: Boolean): Unit = withClient {
-    client.createDatabase(dbDefinition, ignoreIfExists)
+      ignoreIfExists: Boolean): Unit = {
+    try {
+      withClient {
+        client.createDatabase(dbDefinition, ignoreIfExists)
+      }
+    } catch {
+      case e: AnalysisException if e.message.contains("already exists") =>
+        throw new DatabaseAlreadyExistsException(dbDefinition.name)

Review comment:
       Two concerns if we move the logic to `withClient`:
   1. How can we guarantee 
`org.apache.hadoop.hive.metastore.api.AlreadyExistsException` is thrown only 
by `createDatabase` and not by other calls like `createTable`?
   2. Since db name is not available, we need to parse out the db name from the 
message.
   
   Are you OK with these?

##########
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -189,8 +189,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
   override def createDatabase(
       dbDefinition: CatalogDatabase,
-      ignoreIfExists: Boolean): Unit = withClient {
-    client.createDatabase(dbDefinition, ignoreIfExists)
+      ignoreIfExists: Boolean): Unit = {
+    try {
+      withClient {
+        client.createDatabase(dbDefinition, ignoreIfExists)
+      }
+    } catch {
+      case e: AnalysisException if e.message.contains("already exists") =>
+        throw new DatabaseAlreadyExistsException(dbDefinition.name)

Review comment:
       Good idea! Let me try this approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to