[jira] [Updated] (SPARK-23890) Support DDL for adding nested columns to struct types

2023-12-21 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN statement that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.
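The recursive schema merge described above can be sketched outside of Spark. The following is an illustrative Python sketch (not the actual Refine code; names are made up) that models struct schemas as nested dicts and collects the dotted paths of newly found fields:

```python
def find_new_fields(table_schema, incoming_schema, prefix=""):
    """Recursively collect fields present in incoming_schema but missing from
    table_schema.  Schemas are modeled as dicts mapping a field name to either
    a type string or a nested dict (standing in for a struct type)."""
    new_fields = []
    for name, dtype in incoming_schema.items():
        path = f"{prefix}{name}"
        if name not in table_schema:
            new_fields.append((path, dtype))
        elif isinstance(dtype, dict) and isinstance(table_schema[name], dict):
            # Both sides are structs: descend and look for newly added leaves.
            new_fields.extend(
                find_new_fields(table_schema[name], dtype, path + ".")
            )
    return new_fields

new = find_new_fields(
    {"id": "bigint", "event": {"ts": "string"}},
    {"id": "bigint", "event": {"ts": "string", "host": "string"}},
)
# new == [("event.host", "string")]
```

Each `(dotted_path, type)` pair is then what an ALTER TABLE statement would need to add; note the struct column `event` itself never changes type, only gains a leaf.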

-We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.-

 

In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
Spark 3 datasource v2 would support this.

However, it is clear that it does not.  There is an [explicit 
check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
 and 
[test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
 that prevents this from happening.

This can be done via {{ALTER TABLE ADD COLUMN nested1.new_field1}}, but this 
is not supported for any DataSource v1 sources.
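To illustrate, a hypothetical helper (the function name is made up, and it assumes a catalog that accepts dotted column paths, i.e. DataSource v2) that turns the merge result into DDL statements:

```python
def add_column_ddl(table, new_fields):
    """Render one ALTER TABLE ... ADD COLUMN statement per newly found
    (dotted_path, hive_type) pair, using DataSource v2 dotted notation."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {path} {hive_type.upper()}"
        for path, hive_type in new_fields
    ]

stmts = add_column_ddl("db.t", [("nested1.new_field1", "string")])
# stmts == ["ALTER TABLE db.t ADD COLUMN nested1.new_field1 STRING"]
```

Against a DataSource v1 table, Spark's analyzer rejects exactly these dotted-path statements, which is the gap this issue is about.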

 

 

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN statement that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

-We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.-

 

In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
Spark 3 datasource v2 would support this.

However, it is clear that it does not.  There is an [explicit 
check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
 and 
[test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
 that prevents this from happening.

 

 


> Support DDL for adding nested columns to struct types
> -
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: 

[jira] [Updated] (SPARK-23890) Support DDL for adding nested columns to struct types

2023-12-21 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Summary: Support DDL for adding nested columns to struct types  (was: Hive 
ALTER TABLE CHANGE COLUMN for struct type no longer works)

> Support DDL for adding nested columns to struct types
> -
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 3.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-21 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto reopened SPARK-23890:
-

This is fixed for DataSource v2 via {{alter table add column 
nested.new_field0}}, but apparently only a few data sources use the 
DataSource v2 code path.  Iceberg works, but Hive, Parquet, ORC, and JSON 
still use the DataSource v1 code path to check whether this is allowed.

Reopening and retitling more generically to allow nested column addition for 
Parquet, etc.

 

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 3.0.0
>
>






[jira] [Resolved] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-15 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto resolved SPARK-23890.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Ah! This is supported in DataSource v2 after all, just not via CHANGE 
COLUMN.  Instead, you can add a column to a nested field by addressing it with 
dotted notation:

 
{code:sql}
ALTER TABLE otto.test_table03 ADD COLUMN s1.s1_f2_added STRING;
{code}
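As an illustration of what the statement above does to the table schema, here is a small Python sketch (nested dicts standing in for struct types; this is not Spark's own schema representation):

```python
def add_nested_field(schema, dotted_path, hive_type):
    """Add a leaf field at a dotted path; all intermediate structs must
    already exist (mirroring ADD COLUMN s1.s1_f2_added STRING)."""
    *parents, leaf = dotted_path.split(".")
    node = schema
    for part in parents:
        node = node[part]   # KeyError here if the parent struct is missing
    node[leaf] = hive_type
    return schema

schema = {"s1": {"s1_f1": "string"}}
add_nested_field(schema, "s1.s1_f2_added", "string")
# schema is now {"s1": {"s1_f1": "string", "s1_f2_added": "string"}}
```

The struct column `s1` keeps its identity; only a new leaf appears inside it, which is why this is additive and safe for existing Parquet data.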

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 3.0.0
>
>






[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

 Shepherd: Max Gekk
Affects Version/s: 3.0.0
  Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN statement that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

-We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.-

 

In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
Spark 3 datasource v2 would support this.

However, it is clear that it does not.  There is an [explicit 
check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
 and 
[test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
 that prevents this from happening.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN statement that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 


> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>

[jira] [Reopened] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-14 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto reopened SPARK-23890:
-

This was supposed to have been fixed in Spark 3 datasource v2, but the issue 
persists.

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>






[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2018-11-19 Thread Andrew Otto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691787#comment-16691787
 ] 

Andrew Otto commented on SPARK-14492:
-

I found my issue: we were loading some Hive 1.1.0 jars manually via 
spark.driver.extraClassPath in order to open a JDBC connection directly to 
Hive (instead of using spark.sql()) as a workaround for 
https://issues.apache.org/jira/browse/SPARK-23890.  The Hive 1.1.0 classes were 
loaded before the ones bundled with Spark, and as such they failed when 
referencing a Hive configuration constant that didn't exist in 1.1.0.
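The failure mode described here (an older library earlier on the classpath shadowing the version the code was compiled against) can be illustrated with a small Python analogy: module objects stand in for jars, and AttributeError plays the role of the JVM's NoSuchFieldError.

```python
import types

# Two "versions" of the same module: whichever is loaded first wins,
# much like duplicate classes on a JVM classpath.
hive_1_1 = types.ModuleType("hiveconf")   # stands in for the Hive 1.1.0 jars
hive_1_2 = types.ModuleType("hiveconf")   # stands in for Spark's bundled Hive
hive_1_2.METASTORE_CLIENT_SOCKET_LIFETIME = "3600s"  # only in the newer version

def configure(conf_module):
    # Code written against the newer version assumes the constant exists.
    return getattr(conf_module, "METASTORE_CLIENT_SOCKET_LIFETIME")

configure(hive_1_2)    # works
# configure(hive_1_1)  # AttributeError, the NoSuchFieldError analogue
```

The fix in both worlds is the same: make sure the version the code was compiled against is the one that actually gets loaded first.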

 

> Spark SQL 1.6.0 does not work with external Hive metastore version lower than 
> 1.2.0; its not backwards compatible with earlier version
> --
>
> Key: SPARK-14492
> URL: https://issues.apache.org/jira/browse/SPARK-14492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Sunil Rangwani
>Priority: Critical
>
> Spark SQL when configured with a Hive version lower than 1.2.0 throws a 
> java.lang.NoSuchFieldError for the field METASTORE_CLIENT_SOCKET_LIFETIME 
> because this field was introduced in Hive 1.2.0 so its not possible to use 
> Hive metastore version lower than 1.2.0 with Spark. The details of the Hive 
> changes can be found here: https://issues.apache.org/jira/browse/HIVE-9508 
> {code:java}
> Exception in thread "main" java.lang.NoSuchFieldError: 
> METASTORE_CLIENT_SOCKET_LIFETIME
>   at 
> org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:500)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>   at 
> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>   at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271)
>   at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
>   at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:58)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:267)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:139)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}






[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2018-11-13 Thread Andrew Otto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685375#comment-16685375
 ] 

Andrew Otto commented on SPARK-14492:
-

FWIW, I am experiencing the same problem with Spark 2.3.1 and Hive 1.1.0 (from 
CDH 5.15.0).  I've tried setting both spark.sql.hive.metastore.version and 
spark.sql.hive.metastore.jars (although I'm not sure I've got the right 
classpath for that one), and am still experiencing this problem.
{code:java}
18/11/13 14:31:12 ERROR ApplicationMaster: User class threw exception: 
java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
 at 
org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
 at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
 at 
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
 at 
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
 at 
org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
 at 
org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
 at 
org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
 at 
org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
 at 
org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
 at 
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
 at 
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
 at 
org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
 at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
 at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
 at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
 at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
 at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
 at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
 at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
 at 
org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:488){code}
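For reference, the kind of configuration being attempted looks roughly like the following. The jar path is a placeholder, and `spark.sql.hive.metastore.jars` also accepts `builtin` or `maven`; check the Hive-metastore section of the docs for your Spark version to confirm which metastore versions are supported.

```shell
spark-submit \
  --conf spark.sql.hive.metastore.version=1.1.0 \
  --conf "spark.sql.hive.metastore.jars=/path/to/hive-1.1.0/lib/*" \
  your-application.jar
```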


[jira] [Comment Edited] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431099#comment-16431099
 ] 

Andrew Otto edited comment on SPARK-23890 at 4/9/18 7:32 PM:
-

Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala]
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 


was (Author: ottomata):
Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala]|http://example.com/]
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Otto
>Priority: Major
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires a CHANGE COLUMN that alters the column type.  In reality, 
> the 'type' of the column is not changing; it is just a new field being 
> added to the struct, but to SQL this looks like a type change.
> We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.
>  
>  
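The recursive merge-and-ALTER workflow described above can be sketched as follows, using plain Python dicts in place of Spark StructTypes. The function names and the example schema are illustrative stand-ins, not the actual Refine code:

```python
# Sketch of the recursive schema merge: fields found only in the incoming
# schema are appended, recursing into struct (dict) fields. Illustrative only.

def merge_schemas(table, incoming):
    """Return a copy of `table` with fields present only in `incoming` appended."""
    merged = dict(table)
    for name, dtype in incoming.items():
        if name not in merged:
            merged[name] = dtype
        elif isinstance(merged[name], dict) and isinstance(dtype, dict):
            # Recurse into nested structs to pick up newly detected fields.
            merged[name] = merge_schemas(merged[name], dtype)
    return merged

def to_hive_type(dtype):
    """Render a dict-based type as Hive DDL (struct<...> for nested dicts)."""
    if isinstance(dtype, dict):
        inner = ",".join(f"{k}:{to_hive_type(v)}" for k, v in dtype.items())
        return f"struct<{inner}>"
    return dtype

def change_column_ddl(table_name, column, dtype):
    # Conceptually only a nested field was appended, but because the full
    # struct type must be restated, SQL sees this as a type change.
    return (f"ALTER TABLE {table_name} CHANGE COLUMN "
            f"{column} {column} {to_hive_type(dtype)}")

table = {"meta": {"id": "string"}, "count": "bigint"}
incoming = {"meta": {"id": "string", "dt": "string"}}
merged = merge_schemas(table, incoming)
print(change_column_ddl("events", "meta", merged["meta"]))
# prints: ALTER TABLE events CHANGE COLUMN meta meta struct<id:string,dt:string>
```

This is exactly the statement that Spark 2's DDL check rejects, since the restated struct type differs from the one currently registered in the metastore.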



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431099#comment-16431099
 ] 

Andrew Otto commented on SPARK-23890:
-

Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|http://example.com]https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)
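The workaround amounts to bypassing Spark SQL's DDL validation and handing the statement straight to Hive over a direct connection. A minimal DB-API-style sketch; the `cursor` stands in for whatever handle the JDBC/Hive connection provides, and the helper name is illustrative, not the gerrit patch itself:

```python
# Sketch of the workaround: issue the CHANGE COLUMN directly against Hive,
# since Spark 2 refuses to forward it. `cursor` is any DB-API style handle
# (e.g. from a Hive client library or a JDBC bridge); illustrative only.
def add_struct_field(cursor, table, column, new_hive_type):
    """Send the ALTER TABLE ... CHANGE COLUMN statement straight to Hive."""
    ddl = f"ALTER TABLE {table} CHANGE COLUMN {column} {column} {new_hive_type}"
    cursor.execute(ddl)
    return ddl
```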

 

 







[jira] [Comment Edited] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431099#comment-16431099
 ] 

Andrew Otto edited comment on SPARK-23890 at 4/9/18 7:31 PM:
-

Hah! As a temporary workaround, we are [[instantiating a JDBC connection to 
Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala]|http://example.com/]
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 


was (Author: ottomata):
Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|http://example.com/] to get around Spark 2's restriction...halp!  Don't 
make us do this!  :)

 

 







[jira] [Comment Edited] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431099#comment-16431099
 ] 

Andrew Otto edited comment on SPARK-23890 at 4/9/18 7:31 PM:
-

Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala]|http://example.com/]
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 


was (Author: ottomata):
Hah! As a temporary workaround, we are [[instantiating a JDBC connection to 
Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala]|http://example.com/]
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 







[jira] [Comment Edited] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431099#comment-16431099
 ] 

Andrew Otto edited comment on SPARK-23890 at 4/9/18 7:31 PM:
-

Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|http://example.com/] to get around Spark 2's restriction...halp!  Don't 
make us do this!  :)

 

 


was (Author: ottomata):
Hah! As a temporary workaround, we are [instantiating a JDBC connection to 
Hive|http://example.com]https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala
 to get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 







[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-09 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

External issue URL: https://github.com/apache/spark/pull/21012







[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-06 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
from JSON events by adding newly found columns to the Hive table schema, via a 
Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
come from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it just just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 



[jira] [Updated] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-06 Thread Andrew Otto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto updated SPARK-23890:

Description: 
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This restriction was loosened in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
come from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 

  was:
As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This was expanded in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
come from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing, it just just a new field being added to the struct, 
but to SQL, this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 



[jira] [Created] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2018-04-06 Thread Andrew Otto (JIRA)
Andrew Otto created SPARK-23890:
---

 Summary: Hive ALTER TABLE CHANGE COLUMN for struct type no longer 
works
 Key: SPARK-23890
 URL: https://issues.apache.org/jira/browse/SPARK-23890
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Otto


As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
CHANGE COLUMN commands to Hive.  This was expanded in 
[https://github.com/apache/spark/pull/12714] to allow for those commands if 
they only change the column comment.

Wikimedia has been evolving Parquet backed Hive tables with data originally 
come from JSON data by adding newly found columns to the Hive table schema, via 
a Spark job we call 'Refine'.  We do this by recursively merging an input 
DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
then issuing an ALTER TABLE statement to add the columns.  However, because we 
allow for nested data types in the incoming JSON data, we make extensive use of 
struct type fields.  In order to add newly detected fields in a nested data 
type, we must alter the struct column and append the nested struct field.  This 
requires a CHANGE COLUMN that alters the column type.  In reality, the 'type' of 
the column is not changing; it is just a new field being added to the struct, 
but to SQL this looks like a type change.

We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
can be sent to Hive will block us.  I believe this is fixable by adding an 
exception in 
[command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
 to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
destination type are both struct types, and the destination type only adds new 
fields.

 

 






[jira] [Commented] (SPARK-1693) Dependent on multiple versions of servlet-api jars lead to throw an SecurityException when Spark built for hadoop 2.3.0 , 2.4.0

2017-03-02 Thread Andrew Otto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893074#comment-15893074
 ] 

Andrew Otto commented on SPARK-1693:


We just upgraded to CDH 5.10, which has Spark 1.6.0, Hadoop 2.6.0, Hive 1.1.0, 
and Oozie 4.1.0.

We are having trouble running Spark jobs that use HiveContext from Oozie.  They 
run perfectly fine from the CLI with spark-submit, just not in Oozie.  We 
aren't certain that HiveContext is related, but we can reproduce regularly with 
a job that uses HiveContext.

Anyway, I'm posting this here because the error we are getting is the same one 
that started this issue:

{code}class "javax.servlet.FilterRegistration"'s signer information does not 
match signer information of other classes in the same package{code}

I've noticed that the Oozie sharelib includes 
javax.servlet-3.0.0.v201112011016.jar.  I also see that spark-assembly.jar 
includes a javax.servlet.FilterRegistration class, although it's hard for me to 
tell which version.  The jetty pom.xml files in spark-assembly.jar seem to say 
{{javax.servlet.*;version="2.6.0"}}, but I'm a little green on how all these 
dependencies get resolved.  I don't see any javax.servlet .jars in any of 
/usr/lib/hadoop* (where CDH installs hadoop jars).

Help!  :)  If this is not related to this issue, I'll open a new one.
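One way to track down which archives actually bundle the conflicting class is to scan each jar's entry list (a .jar is just a zip). A small sketch; the helper name and the example path are illustrative:

```python
# Scan a set of jars for a given class entry, e.g. the
# javax/servlet/FilterRegistration.class behind the signer clash.
import zipfile

def jars_containing(class_entry, jar_paths):
    """Return the jars whose entry list contains `class_entry`."""
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if class_entry in jar.namelist():
                hits.append(path)
    return hits

# Illustrative usage (requires `import glob`): which sharelib jars ship it?
# jars_containing("javax/servlet/FilterRegistration.class",
#                 glob.glob("/usr/lib/oozie/share/**/*.jar", recursive=True))
```

Running this over the Oozie sharelib and the Spark assembly should show every copy of the class on the classpath; any two copies with different signers can trigger the SecurityException above.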


> Dependent on multiple versions of servlet-api jars lead to throw an 
> SecurityException when Spark built for hadoop 2.3.0 , 2.4.0 
> 
>
> Key: SPARK-1693
> URL: https://issues.apache.org/jira/browse/SPARK-1693
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: log.txt
>
>
> {code}mvn test -Pyarn -Dhadoop.version=2.4.0 -Dyarn.version=2.4.0 > 
> log.txt{code}
> The log: 
> {code}
> UnpersistSuite:
> - unpersist RDD *** FAILED ***
>   java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s 
> signer information does not match signer information of other classes in the 
> same package
>   at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
>   at java.lang.ClassLoader.preDefineClass(ClassLoader.java:666)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:794)
>   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> {code}






[jira] [Comment Edited] (SPARK-1693) Dependent on multiple versions of servlet-api jars lead to throw an SecurityException when Spark built for hadoop 2.3.0 , 2.4.0

2017-03-02 Thread Andrew Otto (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893074#comment-15893074 ]

Andrew Otto edited comment on SPARK-1693 at 3/2/17 9:42 PM:


We just upgraded to CDH 5.10, which has Spark 1.6.0, Hadoop 2.6.0, Hive 1.1.0, 
and Oozie 4.1.0.

We are having trouble running Spark jobs that use HiveContext from Oozie.  They 
run perfectly fine from the CLI with spark-submit, just not in Oozie.  We 
aren't certain that HiveContext is related, but we can reproduce regularly with 
a job that uses HiveContext.

Anyway, I post this here, because the error we are getting is the same that 
started this issue:

{code}class "javax.servlet.FilterRegistration"'s signer information does not 
match signer information of other classes in the same package{code}

I've noticed that the Oozie sharelib includes 
javax.servlet-3.0.0.v201112011016.jar.  I also see that spark-assembly.jar 
includes a javax.servlet.FilterRegistration class, although it's hard for me to 
tell which version.  The jetty pom.xml files in spark-assembly.jar seem to say 
{{javax.servlet.*;version="2.6.0"}}, but I'm a little green on how all these 
dependencies get resolved.  I don't see any javax.servlet jars in any of 
/usr/lib/hadoop* (where CDH installs hadoop jars).

Help!  :)  If this is not related to this issue, I'll open a new one.
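
One way to narrow this down is to scan the zip indexes of every candidate jar for the conflicting class entry. A minimal sketch follows; the search paths and the helper name `jars_containing` are assumptions based on the CDH/Oozie layout described above, not a known layout, so adjust them for your install:

```python
# Sketch: list every jar that ships javax.servlet.FilterRegistration,
# to find out which copies are colliding on the classpath.
# NOTE: the glob patterns below are guesses for a CDH-style install.
import glob
import zipfile

def jars_containing(class_entry, patterns):
    """Return the jar paths whose zip index contains class_entry."""
    hits = []
    for pattern in patterns:
        for jar in glob.glob(pattern, recursive=True):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if class_entry in zf.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                continue  # skip unreadable or non-zip files
    return hits

if __name__ == "__main__":
    conflicts = jars_containing(
        "javax/servlet/FilterRegistration.class",
        [
            "/usr/lib/hadoop*/**/*.jar",     # assumed CDH hadoop dirs
            "/usr/lib/spark/lib/*.jar",      # assumed spark-assembly location
        ],
    )
    for jar in conflicts:
        print(jar)
```

If more than one path prints, those jars are the candidates for the mismatched signer information.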



was (Author: ottomata):
We just upgraded to CDH 5.10, which has Spark 1.6.0, Hadoop 2.6.0, Hive 1.1.0, 
and Oozie 4.1.0.

We are having trouble running Spark jobs that use HiveContext from Oozie.  They 
run perfectly fine from the CLI with spark-submit, just not in Oozie.  We 
aren't certain that HiveContext is related, but we can reproduce regularly with 
a job that uses HiveContext.

Anyway, I post this here, because the error we are getting is the same that 
started this issue:

{code}class "javax.servlet.FilterRegistration"'s signer information does not 
match signer information of other classes in the same package{code}

I've noticed that the Oozie sharelib includes 
javax.servlet-3.0.0.v201112011016.jar.  I also see that spark-assembly.jar 
includes a javax.servlet.FilterRegistration class, although it's hard for me to 
tell which version.  The jetty pom.xml files in spark-assembly.jar seem to say 
{{javax.servlet.*;version="2.6.0"}}, but I'm a little green on how all these 
dependencies get resolved.  I don't see any javax.servlet jars in any of 
/usr/lib/hadoop* (where CDH installs hadoop jars).

Help!  :)  If this is not related to this issue, I'll open a new one.


> Dependent on multiple versions of servlet-api jars lead to throw an 
> SecurityException when Spark built for hadoop 2.3.0 , 2.4.0 
> 
>
> Key: SPARK-1693
> URL: https://issues.apache.org/jira/browse/SPARK-1693
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: log.txt
>
>
> {code}mvn test -Pyarn -Dhadoop.version=2.4.0 -Dyarn.version=2.4.0 > 
> log.txt{code}
> The log: 
> {code}
> UnpersistSuite:
> - unpersist RDD *** FAILED ***
>   java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s 
> signer information does not match signer information of other classes in the 
> same package
>   at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
>   at java.lang.ClassLoader.preDefineClass(ClassLoader.java:666)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:794)
>   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org