gengliangwang commented on a change in pull request #35855:
URL: https://github.com/apache/spark/pull/35855#discussion_r832161821



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2709,6 +2709,28 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val ENABLE_DEFAULT_COLUMNS =
+    buildConf("spark.sql.parser.enableDefaultColumns")
+      .internal()
+      .doc("When true, allow CREATE TABLE, REPLACE TABLE, and ALTER COLUMN 
statements to set or " +
+        "update default values for specific columns. Following INSERT, MERGE, 
and UPDATE " +
+        "statements may then omit these values and their values will be 
injected automatically " +
+        "instead.")
+      .version("3.3.0")

Review comment:
       3.4.0

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2709,6 +2709,28 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val ENABLE_DEFAULT_COLUMNS =
+    buildConf("spark.sql.parser.enableDefaultColumns")
+      .internal()
+      .doc("When true, allow CREATE TABLE, REPLACE TABLE, and ALTER COLUMN 
statements to set or " +
+        "update default values for specific columns. Following INSERT, MERGE, 
and UPDATE " +
+        "statements may then omit these values and their values will be 
injected automatically " +
+        "instead.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES =
+    buildConf("spark.sql.parser.useNullsForMissingDefaultColumnValues")
+      .internal()
+      .doc("When true, and DEFAULT columns are enabled, allow column 
definitions lacking " +
+        "explicit default values to behave as if they had specified DEFAULT 
NULL instead. " +
+        "For example, this allows most INSERT INTO statements to specify only 
a prefix of the " +
+        "columns in the target table, and the remaining columns will receive 
NULL values.")
+      .version("3.3.0")

Review comment:
       3.4.0

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2709,6 +2709,28 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val ENABLE_DEFAULT_COLUMNS =
+    buildConf("spark.sql.parser.enableDefaultColumns")
+      .internal()
+      .doc("When true, allow CREATE TABLE, REPLACE TABLE, and ALTER COLUMN 
statements to set or " +
+        "update default values for specific columns. Following INSERT, MERGE, 
and UPDATE " +
+        "statements may then omit these values and their values will be 
injected automatically " +
+        "instead.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES =
+    buildConf("spark.sql.parser.useNullsForMissingDefaultColumnValues")

Review comment:
       how about `spark.sql.defaultColumn.useNullsForMissingDefaultValues`?
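
       For illustration, a minimal sketch of how this definition might read under the suggested name, also folding in the 3.4.0 version bump noted above (the shortened doc text is illustrative; both changes are assumptions until the suggestions are accepted):
   ```scala
   val USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES =
     buildConf("spark.sql.defaultColumn.useNullsForMissingDefaultValues")
       .internal()
       // Illustrative, shortened doc text; the PR's full wording would be kept in practice.
       .doc("When true, and DEFAULT columns are enabled, allow column definitions lacking " +
         "explicit default values to behave as if they had specified DEFAULT NULL instead.")
       .version("3.4.0")
       .booleanConf
       .createWithDefault(false)
   ```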

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2709,6 +2709,28 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val ENABLE_DEFAULT_COLUMNS =
+    buildConf("spark.sql.parser.enableDefaultColumns")
+      .internal()
+      .doc("When true, allow CREATE TABLE, REPLACE TABLE, and ALTER COLUMN 
statements to set or " +
+        "update default values for specific columns. Following INSERT, MERGE, 
and UPDATE " +
+        "statements may then omit these values and their values will be 
injected automatically " +
+        "instead.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES =
+    buildConf("spark.sql.parser.useNullsForMissingDefaultColumnValues")
+      .internal()
+      .doc("When true, and DEFAULT columns are enabled, allow column 
definitions lacking " +
+        "explicit default values to behave as if they had specified DEFAULT 
NULL instead. " +
+        "For example, this allows most INSERT INTO statements to specify only 
a prefix of the " +
+        "columns in the target table, and the remaining columns will receive 
NULL values.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(false)

Review comment:
       +1 for making the default value false

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala
##########
@@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.ResolveDefaultColumns._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.optimizer.ConstantFolding
+import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.AlwaysProcess
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types._
+
+/**
+ * This is a rule to process DEFAULT columns in statements such as CREATE/REPLACE TABLE.
+ *
+ * Background: CREATE TABLE and ALTER TABLE invocations support setting column default values for
+ * later operations. Following INSERT, and INSERT MERGE commands may then reference the value
+ * using the DEFAULT keyword as needed.
+ *
+ * Example:
+ * CREATE TABLE T(a INT DEFAULT 4, b INT NOT NULL DEFAULT 5);
+ * INSERT INTO T VALUES (1, 2);
+ * INSERT INTO T VALUES (1, DEFAULT);
+ * INSERT INTO T VALUES (DEFAULT, 6);
+ * SELECT * FROM T;
+ * (1, 2)
+ * (1, 5)
+ * (4, 6)
+ *
+ * @param catalog the catalog to use for looking up the schema of INSERT INTO table objects.
+ * @param insert the enclosing INSERT statement for which this rule is processing the query, if any.
+ */
+case class ResolveDefaultColumns(
+    catalog: SessionCatalog, insert: Option[InsertIntoStatement] = None) extends Rule[LogicalPlan] {

Review comment:
       ```suggestion
       catalog: SessionCatalog,
       insert: Option[InsertIntoStatement] = None) extends Rule[LogicalPlan] {
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala
##########
@@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.ResolveDefaultColumns._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.optimizer.ConstantFolding
+import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.AlwaysProcess
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types._
+
+/**
+ * This is a rule to process DEFAULT columns in statements such as CREATE/REPLACE TABLE.
+ *
+ * Background: CREATE TABLE and ALTER TABLE invocations support setting column default values for
+ * later operations. Following INSERT, and INSERT MERGE commands may then reference the value
+ * using the DEFAULT keyword as needed.
+ *
+ * Example:
+ * CREATE TABLE T(a INT DEFAULT 4, b INT NOT NULL DEFAULT 5);
+ * INSERT INTO T VALUES (1, 2);
+ * INSERT INTO T VALUES (1, DEFAULT);
+ * INSERT INTO T VALUES (DEFAULT, 6);
+ * SELECT * FROM T;
+ * (1, 2)
+ * (1, 5)
+ * (4, 6)
+ *
+ * @param catalog the catalog to use for looking up the schema of INSERT INTO table objects.
+ * @param insert the enclosing INSERT statement for which this rule is processing the query, if any.
+ */
+case class ResolveDefaultColumns(
+    catalog: SessionCatalog, insert: Option[InsertIntoStatement] = None) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsWithPruning(

Review comment:
       nit: Let's check the conf `enableDefaultColumns` at the very beginning:
   ```
   if (!conf.enableDefaultColumns) {
     return plan
   }
   ```
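
       For context, a minimal sketch of how that guard could sit at the top of `apply`. The pruning condition and the placeholder case below are illustrative (the rule's real match arms are not in this hunk), and the sketch reads the flag through `conf.getConf`; a `conf.enableDefaultColumns` helper as in the snippet above would need a corresponding getter added to SQLConf:
   ```scala
   override def apply(plan: LogicalPlan): LogicalPlan = {
     // Skip the whole rule when DEFAULT column support is disabled.
     if (!conf.getConf(SQLConf.ENABLE_DEFAULT_COLUMNS)) {
       return plan
     }
     plan.resolveOperatorsWithPruning(AlwaysProcess.fn) {
       // Placeholder: the PR's actual handling of CREATE/REPLACE TABLE and
       // INSERT INTO statements would go here.
       case other => other
     }
   }
   ```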

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2709,6 +2709,28 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val ENABLE_DEFAULT_COLUMNS =
+    buildConf("spark.sql.parser.enableDefaultColumns")

Review comment:
       how about `spark.sql.defaultColumn.enabled`?
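
       For illustration, a sketch of the definition under the proposed name, also folding in the 3.4.0 version comment from above (the shortened doc text is illustrative; both changes are assumptions until the suggestions are accepted):
   ```scala
   val ENABLE_DEFAULT_COLUMNS =
     buildConf("spark.sql.defaultColumn.enabled")
       .internal()
       // Illustrative, shortened doc text; the PR's full wording would be kept in practice.
       .doc("When true, allow CREATE TABLE, REPLACE TABLE, and ALTER COLUMN statements to set " +
         "or update default values for specific columns.")
       .version("3.4.0")
       .booleanConf
       .createWithDefault(true)
   ```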




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


