[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r218562133 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thank you for pinging me, @maropu . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r218358670 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thanks for the check! I think we don't always need to comply with the Hive behaviour; an understandable behaviour for users is best.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r218356576 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- cc: @gatorsmile @dongjoon-hyun
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r218324420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thanks for your advice! I've looked into this over the past few days. With the current implementation, all behavior complies with Hive (type changes are supported, non-binary formats work well, and binary formats like ORC and Parquet throw exceptions). Is it OK to add a config to constrain this? Adding logic to cast input data into the changed type in the catalog would require modifying four code paths, including the vectorized and row readers for both Parquet and ORC. If we don't agree with the current behavior, I'll keep following up on these.

Item | Behavior
---- | --------
Parquet Row Reader | ClassCastException in SpecificInternalRow.set${Type}
Parquet Vectorized Reader | SchemaColumnConvertNotSupportedException in VectorizedColumnReader.read${Type}Batch
Orc Row Reader | ClassCastException in OrcDeserializer.newWriter
Orc Vectorized Reader | NullPointerException in OrcColumnVector's get-value-by-type methods
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r217226856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- In my opinion, in this PR we need additional logic to cast input data to the changed catalog type when reading.
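The cast-on-read idea could be sketched in plain Scala as follows. This is a hypothetical, simplified illustration: the type objects and `castOnRead` function are stand-ins I made up, not Spark's internal `Cast` expression or catalyst types.

```scala
// Hypothetical sketch of "cast input data to the changed catalog type when
// reading". SqlType and castOnRead are simplified stand-ins, not Spark APIs.
sealed trait SqlType
case object IntType extends SqlType
case object LongType extends SqlType
case object StrType extends SqlType

// Convert a value stored with the file-schema type to the (changed) catalog type.
def castOnRead(raw: Any, fileType: SqlType, catalogType: SqlType): Any =
  (fileType, catalogType) match {
    case (f, t) if f == t       => raw                            // no change needed
    case (IntType, LongType)    => raw.asInstanceOf[Int].toLong   // widening cast
    case (IntType, StrType)     => raw.toString                   // render as string
    case (LongType, StrType)    => raw.toString
    case (f, t) =>
      throw new IllegalArgumentException(s"Cannot cast $f to $t")
  }
```

In real Spark this would roughly correspond to wrapping the file scan's output columns in cast expressions whenever the catalog schema and the file schema disagree.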
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216600156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff --
```
Although the query above doesn't work well, why do users change column types?
```
As in the scenario described above, the user initially used INT but later found that a LONG was needed; they can rewrite the new data as LONG and load it into new partitions. And if we don't support the type change, the user has to recreate the whole table just for this change. Yep, if it's not a binary format, the query works OK.
```
Logging initialized using configuration in jar:file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> CREATE TABLE t(a INT, b STRING, c INT);
OK
Time taken: 2.576 seconds
hive> INSERT INTO t VALUES (1, 'a', 3);;
Query ID = XuanYuan_20180911164348_32238a6c-b0a4-4cfd-aa3d-00a7628031cf
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2018-09-11 16:43:51,684 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local162423_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/warehouse/t/.hive-staging_hive_2018-09-11_16-43-48_117_2262603440504094412-1/-ext-1
Loading data to table default.t
Table default.t stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.025 seconds
hive> select * from t;;
OK
1    a    3
Time taken: 0.164 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t CHANGE a a STRING;
OK
Time taken: 0.177 seconds
hive> select * from t;
OK
1    a    3
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive> quit;
```
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216175940 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- ah ok. Although the query above doesn't work well, why do users change column types? Basically, the query should work? e.g., in postgresql
```
postgres=# create table t(a int, b varchar);
CREATE TABLE
postgres=# insert into t values(1, 'a');
INSERT 0 1
postgres=# alter table t alter column a TYPE varchar;
ALTER TABLE
postgres=# select * from t;
 a | b
---+---
 1 | a
(1 row)

postgres=# \d t
        Table "public.t"
 Column |       Type        | Modifiers
--------+-------------------+-----------
 a      | character varying |
 b      | character varying |
```
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216132779 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- It also doesn't work in Hive; I tested with Hive 1.2.2 and Hadoop 2.7.3.
```
Logging initialized using configuration in jar:file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> CREATE TABLE t(a INT, b STRING, c INT) stored as parquet;
OK
Time taken: 1.604 seconds
hive> INSERT INTO t VALUES (1, 'a', 3);
Query ID = XuanYuan_20180908230549_3c8732ff-07e0-4a7a-95b4-260aed04a762
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2018-09-08 23:06:08,732 Stage-1 map = 0%, reduce = 0%
2018-09-08 23:06:09,743 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local712899233_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/warehouse/t/.hive-staging_hive_2018-09-08_23-05-49_782_100109481692677607-1/-ext-1
Loading data to table default.t
Table default.t stats: [numFiles=1, numRows=1, totalSize=343, rawDataSize=3]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 20.294 seconds
hive> select * from t;
OK
1    a    3
Time taken: 0.154 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t CHANGE a a STRING;
OK
Time taken: 0.18 seconds
hive> select * from t;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
Time taken: 0.116 seconds
hive>
```
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216128008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- I just want to know if the query above can work in Hive?
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216127904 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- So maybe we should also add a conf like `MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES` in Hive to wrap this behavior? WDYT. Thanks.
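A guard in the spirit of Hive's `DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES` could look roughly like this plain-Scala sketch. The names and the compatibility map are illustrative only, not Hive's actual rule set or Spark's config machinery:

```scala
// Sketch of a config-gated compatibility check, loosely modeled on Hive's
// DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES. The map below is illustrative.
case class ColumnChange(name: String, fromType: String, toType: String)

// Which target types each source type may be changed to (illustrative).
val compatibleTargets: Map[String, Set[String]] = Map(
  "int"    -> Set("int", "bigint", "float", "double", "string"),
  "bigint" -> Set("bigint", "float", "double", "string"),
  "string" -> Set("string")
)

// Reject incompatible changes only when the (hypothetical) flag is on.
def checkTypeChange(disallowIncompatible: Boolean, c: ColumnChange): Unit = {
  val ok = compatibleTargets.getOrElse(c.fromType, Set(c.fromType)).contains(c.toType)
  if (!ok && disallowIncompatible)
    throw new IllegalArgumentException(
      s"Incompatible type change for column '${c.name}': ${c.fromType} -> ${c.toType}")
}
```

With the flag off, the check becomes a no-op and the current permissive behavior is preserved, which is the trade-off being discussed here.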
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216127880 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thanks for checking; my mistake for not describing the intention behind this feature. We want to support type changes just to add the ability to change the column type metadata. The scenario we've met: a user wants a type change (e.g., INT is not enough and a LONG is needed) and has already changed the type in their data files, but we have to hack the metastore or recreate the whole table.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216123860 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Ah, ok. Thanks for the check. btw, have you checked if this could work correctly?
```
sql("""CREATE TABLE t(a INT, b STRING, c INT) using parquet""")
sql("""INSERT INTO t VALUES (1, 'a', 3)""")
sql("""ALTER TABLE t CHANGE a a STRING""")
spark.table("t").show

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///Users/maropu/Repositories/spark/spark-master/spark-warehouse/t/part-0-93ddfd05-690a-480c-8cc5-fd0981206fc3-c000.snappy.parquet. Column: [a], Expected: string, Found: INT32
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.s ...
```
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216122599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thanks for your question; actually that's also what I was considering while doing the compatibility check. Hive does this column type change work in [HiveAlterHandler](https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L175) and the detailed compatibility check is in [ColumnType](https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ColumnType.java#L206). You can see that the ColumnType check actually uses the `canCast` semantics to judge compatibility.
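The `canCast`-style judgment mentioned here could be sketched in plain Scala like this. It is a simplified illustration of the idea (widening numeric casts allowed, anything renders as a string), not the actual rules in Hive's `ColumnType` or Spark's `Cast.canCast`:

```scala
// Plain-Scala sketch of a canCast-style compatibility judgment. The rules
// below are illustrative, not Hive's or Spark's real cast matrix.
val numericOrder = List("tinyint", "smallint", "int", "bigint", "float", "double")

def canCast(from: String, to: String): Boolean =
  from == to ||
  to == "string" ||                               // any value renders as a string
  (numericOrder.contains(from) && numericOrder.contains(to) &&
    numericOrder.indexOf(from) <= numericOrder.indexOf(to))  // widening only
```

Narrowing changes such as `bigint -> int` fall through all three clauses and are rejected, which is the kind of asymmetry the review is probing.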
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r216116477 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Probably, we need to comply with the Hive behaviour. Is the current fix (by casting) the same as Hive's?
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215860035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- Thanks for the advice; I should also check type compatibility. Added in ef65c4d.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215859851 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1697,6 +1697,16 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is col1'") assert(getMetadata("col1").getString("key") == "value") assert(getMetadata("col1").getString("comment") == "this is col1") + +// Ensure that changing column type takes effect +sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 STRING") +val column = catalog.getTableMetadata(tableIdent).schema.fields.find(_.name == "col1") +assert(column.get.dataType == StringType) + +// Ensure that changing partition column type throw exception +intercept[AnalysisException] { + sql("ALTER TABLE dbx.tab1 CHANGE COLUMN a a STRING") +} --- End diff -- Thanks, done in ef65c4d. Also added a check for type compatibility.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215859764 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand( // Find the origin column from dataSchema by column name. val originColumn = findColumnByName(table.dataSchema, columnName, resolver) -// Throw an AnalysisException if the column name/dataType is changed. +// Throw an AnalysisException if the column name is changed. if (!columnEqual(originColumn, newColumn, resolver)) { throw new AnalysisException( "ALTER TABLE CHANGE COLUMN is not supported for changing column " + s"'${originColumn.name}' with type '${originColumn.dataType}' to " + s"'${newColumn.name}' with type '${newColumn.dataType}'") --- End diff -- After adding the type check, maybe we also need the type information in the error message.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215859681 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand( // Find the origin column from dataSchema by column name. val originColumn = findColumnByName(table.dataSchema, columnName, resolver) -// Throw an AnalysisException if the column name/dataType is changed. +// Throw an AnalysisException if the column name is changed. if (!columnEqual(originColumn, newColumn, resolver)) { --- End diff -- Thanks, not enough yet; added a type compatibility check in ef65c4d.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215283130 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand( s"${schema.fieldNames.mkString("[`", "`, `", "`]")}")) } - // Add the comment to a column, if comment is empty, return the original column. - private def addComment(column: StructField, comment: Option[String]): StructField = { -comment.map(column.withComment(_)).getOrElse(column) - } - --- End diff -- What happens if we need data conversion (e.g., from int to double) in binary formats (Parquet and ORC)? Also, what happens if we get incompatible type changes?
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215279164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand( // Find the origin column from dataSchema by column name. val originColumn = findColumnByName(table.dataSchema, columnName, resolver) -// Throw an AnalysisException if the column name/dataType is changed. +// Throw an AnalysisException if the column name is changed. if (!columnEqual(originColumn, newColumn, resolver)) { --- End diff -- Is it ok to check names only?
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215279507 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand( // Find the origin column from dataSchema by column name. val originColumn = findColumnByName(table.dataSchema, columnName, resolver) -// Throw an AnalysisException if the column name/dataType is changed. +// Throw an AnalysisException if the column name is changed. if (!columnEqual(originColumn, newColumn, resolver)) { throw new AnalysisException( "ALTER TABLE CHANGE COLUMN is not supported for changing column " + s"'${originColumn.name}' with type '${originColumn.dataType}' to " + s"'${newColumn.name}' with type '${newColumn.dataType}'") --- End diff -- Can you update this error message?
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r215280045 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1697,6 +1697,16 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is col1'") assert(getMetadata("col1").getString("key") == "value") assert(getMetadata("col1").getString("comment") == "this is col1") + +// Ensure that changing column type takes effect +sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 STRING") +val column = catalog.getTableMetadata(tableIdent).schema.fields.find(_.name == "col1") +assert(column.get.dataType == StringType) + +// Ensure that changing partition column type throw exception +intercept[AnalysisException] { + sql("ALTER TABLE dbx.tab1 CHANGE COLUMN a a STRING") +} --- End diff -- Please compare the error message.
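The suggestion is to assert on the message text, not just the exception type. A plain-Scala sketch of the pattern (the real test would use ScalaTest's `intercept` and Spark's `AnalysisException`; the helper, function, and message text below are hypothetical stand-ins):

```scala
// Sketch: capture the failure and compare its message, instead of only
// checking that some exception was thrown. Names and message are illustrative.
def interceptMessage(body: => Unit): String =
  try { body; sys.error("expected an exception but none was thrown") }
  catch { case e: IllegalArgumentException => e.getMessage }

// Hypothetical stand-in for the command under test.
def alterPartitionColumnType(): Unit =
  throw new IllegalArgumentException(
    "ALTER TABLE CHANGE COLUMN is not supported for changing column 'a'")

val msg = interceptMessage(alterPartitionColumnType())
assert(msg.contains("is not supported for changing column"))
```

Comparing the message catches regressions where the command starts failing for a different, unexpected reason while still throwing the same exception type.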
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r204805474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand( // Find the origin column from dataSchema by column name. val originColumn = findColumnByName(table.dataSchema, columnName, resolver) -// Throw an AnalysisException if the column name/dataType is changed. +// Throw an AnalysisException if the column name is changed. if (!columnEqual(originColumn, newColumn, resolver)) { throw new AnalysisException( "ALTER TABLE CHANGE COLUMN is not supported for changing column " + s"'${originColumn.name}' with type '${originColumn.dataType}' to " + s"'${newColumn.name}' with type '${newColumn.dataType}'") } +val typeChanged = originColumn.dataType != newColumn.dataType +val partitionColumnChanged = table.partitionColumnNames.contains(originColumn.name) + +// Throw an AnalysisException if the type of partition column is changed. +if (typeChanged && partitionColumnChanged) { --- End diff -- Just adding a check here for when the user changes the type of partition columns.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r152957444 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand( s"'${newColumn.name}' with type '${newColumn.dataType}'") } -val newSchema = table.schema.fields.map { field => +val typeChanged = originColumn.dataType != newColumn.dataType +val newDataSchema = table.dataSchema.fields.map { field => if (field.name == originColumn.name) { -// Create a new column from the origin column with the new comment. -addComment(field, newColumn.getComment) +// Add the comment to a column, if comment is empty, return the original column. +val newField = newColumn.getComment.map(field.withComment(_)).getOrElse(field) +if (typeChanged) { + newField.copy(dataType = newColumn.dataType) +} else { + newField +} } else { field } } -val newTable = table.copy(schema = StructType(newSchema)) -catalog.alterTable(newTable) +val newTable = table.copy(schema = StructType(newDataSchema ++ table.partitionSchema)) +if (typeChanged) { + catalog.alterTableDataSchema(tableName, StructType(newDataSchema)) --- End diff -- I added the checking logic in the next commit and fixed a bug in changing the comment of a partition column.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r152753785 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand( s"'${newColumn.name}' with type '${newColumn.dataType}'") } -val newSchema = table.schema.fields.map { field => +val typeChanged = originColumn.dataType != newColumn.dataType +val newDataSchema = table.dataSchema.fields.map { field => if (field.name == originColumn.name) { -// Create a new column from the origin column with the new comment. -addComment(field, newColumn.getComment) +// Add the comment to a column, if comment is empty, return the original column. +val newField = newColumn.getComment.map(field.withComment(_)).getOrElse(field) +if (typeChanged) { + newField.copy(dataType = newColumn.dataType) +} else { + newField +} } else { field } } -val newTable = table.copy(schema = StructType(newSchema)) -catalog.alterTable(newTable) +val newTable = table.copy(schema = StructType(newDataSchema ++ table.partitionSchema)) +if (typeChanged) { + catalog.alterTableDataSchema(tableName, StructType(newDataSchema)) --- End diff -- [HIVE-3672](https://issues.apache.org/jira/browse/HIVE-3672) Hive supports this by adding a new command, `ALTER TABLE PARTITION COLUMN ( )`. So maybe I should throw an AnalysisException when the user changes the type of a partition column?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r152480007 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand( s"'${newColumn.name}' with type '${newColumn.dataType}'") } -val newSchema = table.schema.fields.map { field => +val typeChanged = originColumn.dataType != newColumn.dataType +val newDataSchema = table.dataSchema.fields.map { field => if (field.name == originColumn.name) { -// Create a new column from the origin column with the new comment. -addComment(field, newColumn.getComment) +// Add the comment to a column, if comment is empty, return the original column. +val newField = newColumn.getComment.map(field.withComment(_)).getOrElse(field) +if (typeChanged) { + newField.copy(dataType = newColumn.dataType) +} else { + newField +} } else { field } } -val newTable = table.copy(schema = StructType(newSchema)) -catalog.alterTable(newTable) +val newTable = table.copy(schema = StructType(newDataSchema ++ table.partitionSchema)) +if (typeChanged) { + catalog.alterTableDataSchema(tableName, StructType(newDataSchema)) --- End diff -- What is Hive's behavior if users change the column type of the partition schema?
[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r151990044

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
         s"'${newColumn.name}' with type '${newColumn.dataType}'")
     }

+    val changeSchema = originColumn.dataType != newColumn.dataType
     val newSchema = table.schema.fields.map { field =>
       if (field.name == originColumn.name) {
-        // Create a new column from the origin column with the new comment.
-        addComment(field, newColumn.getComment)
+        var newField = field

--- End diff --

It is clearer to get rid of the var, please check the next patch. If we implement rename or other metadata-change features here, some code rework may still be needed.
[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r151739773

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
         s"'${newColumn.name}' with type '${newColumn.dataType}'")
     }

+    val changeSchema = originColumn.dataType != newColumn.dataType

--- End diff --

What do you think about renaming the val to `typeChanged`?
[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r151740225

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1459,6 +1459,11 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     // Ensure that change column will preserve other metadata fields.
     sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is col1'")
     assert(getMetadata("col1").getString("key") == "value")
+
+    // Ensure that change column type take effect

--- End diff --

s/change/changing + s/take/takes
[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19773#discussion_r151739604

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
         s"'${newColumn.name}' with type '${newColumn.dataType}'")
     }

+    val changeSchema = originColumn.dataType != newColumn.dataType
     val newSchema = table.schema.fields.map { field =>
       if (field.name == originColumn.name) {
-        // Create a new column from the origin column with the new comment.
-        addComment(field, newColumn.getComment)
+        var newField = field

--- End diff --

I'd recommend getting rid of this `var` and rewriting the code as follows:

```
val newField = newColumn.getComment.map(...).getOrElse(field)
```
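The var-free pattern recommended above relies on `Option.map` with a `getOrElse` fallback. A minimal runnable sketch, where `Field` is a stand-in invented for this example so it runs without Spark's `StructField`:

```scala
// Minimal sketch of the var-free rewrite suggested above. `Field` mimics
// just the withComment behavior of Spark's StructField.
object VarFreeSketch {
  case class Field(name: String, comment: Option[String] = None) {
    def withComment(c: String): Field = copy(comment = Some(c))
  }

  // If a new comment is given, attach it; otherwise return the field unchanged.
  // No mutable `var` is needed: map produces the updated field only when the
  // Option is defined, and getOrElse supplies the original as the fallback.
  def addComment(field: Field, comment: Option[String]): Field =
    comment.map(field.withComment(_)).getOrElse(field)
}
```

The same shape extends naturally to further metadata changes (e.g. a type change) by chaining another transformation onto the result, which is why the reviewers preferred it over mutating a `var`.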