[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

 Shepherd: Max Gekk
Affects Version/s: 3.0.0
  Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.
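
For illustration, here is a minimal sketch of the kind of statement Refine ends 
up issuing (the table, column, and field names are invented for this example): a 
struct column gains one new field, so the whole struct type must be restated.

{code:scala}
// Run in spark-shell (or with any SparkSession named `spark`).
// Hypothetical table: the struct column `meta` originally has fields (id, dt).
// A new `request_id` field shows up in the incoming JSON, so Refine restates
// the full struct type. Only a field is appended, but to SQL this is a type
// change, so Spark rejects the statement unless only the comment changes.
spark.sql("""
  ALTER TABLE events
  CHANGE COLUMN meta meta STRUCT<id: STRING, dt: STRING, request_id: STRING>
""")
{code}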

-We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.-
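
For reference, a rough sketch (not actual Spark code; the helper name is made 
up) of the compatibility check described in the struck-out proposal above: allow 
the type change only when both types are structs and the new struct merely 
appends fields.

{code:scala}
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Spark: true only when `newType` keeps every
// existing field of `oldType` in place (same name; same type, or a struct that
// itself only adds fields) and appends any new fields at the end.
def onlyAddsFields(oldType: StructType, newType: StructType): Boolean = {
  oldType.fields.length <= newType.fields.length &&
    oldType.fields.zip(newType.fields).forall { case (o, n) =>
      o.name == n.name && ((o.dataType, n.dataType) match {
        case (os: StructType, ns: StructType) => onlyAddsFields(os, ns)
        case (od, nd)                         => od == nd
      })
    }
}
{code}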

 

In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
Spark 3 datasource v2 would support this.

However, it is clear that it does not.  There is an [explicit 
check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
 and 
[test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
 that prevent this from happening.
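
For example (hypothetical catalog and table name; a sketch, not a definitive 
reproduction), the Spark 3 equivalent of the statement above is still rejected 
at analysis time by the check linked here:

{code:scala}
// Against a DataSource V2 catalog, the analyzer refuses to change a column's
// type to a different struct, even when the new struct only appends a field,
// so this statement fails with an AnalysisException.
spark.sql("""
  ALTER TABLE testcat.db.events
  ALTER COLUMN meta TYPE STRUCT<id: STRING, dt: STRING, request_id: STRING>
""")
{code}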

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 


> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a 

[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-23890:
---
Labels: bulk-closed pull-request-available  (was: bulk-closed)

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing, it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.
>  
>  






[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

External issue URL: https://github.com/apache/spark/pull/21012

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing, it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.
>  
>  



For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-06 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
coming from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 


> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing, it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> 

[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-06 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
coming from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This was expanded in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
coming from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it is just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 


> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> coming from JSON data by adding newly found columns to the Hive table schema, 
> via a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing, it is just a new field being 
> added to the struct, but to SQL, this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be