tprelle opened a new pull request #31639: URL: https://github.com/apache/spark/pull/31639
After the work in https://github.com/apache/spark/pull/31368 to simplify Hive view resolution, I found a bug caused by the fact that Hive allows you to change the order of the fields inside a struct.

1) Create a table in Hive with a struct:
   CREATE TABLE test_struct (id INT, sub STRUCT<a:INT, b:STRING>);
2) Insert data into it:
   INSERT INTO TABLE test_struct SELECT 1, named_struct("a", 1, "b", "v1");
3) Create a view on top of it:
   CREATE VIEW test_view_struct AS SELECT id, sub FROM test_struct;
4) Change the table by reordering the fields of the struct:
   ALTER TABLE test_struct CHANGE COLUMN sub sub STRUCT<b:STRING, a:INT>;
5) Spark can no longer query the view, because Spark matches struct fields by position, not by field name. If the reordered field types happen to be castable to each other, the failure can even be silent.

I also had to change a test, because duplicate field names in a struct are not allowed in Hive.
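To make the failure mode in step 5 concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark code: the schemas, the sample row, and the helpers resolve_by_position and resolve_by_name are hypothetical names introduced only for illustration. It assumes the view still carries the original struct layout while the table returns values in the reordered layout.

```python
# Toy model, not Spark's implementation: a struct schema is an ordered list of
# (field name, type) pairs, and a struct value is a tuple in schema order.

view_schema = [("a", "int"), ("b", "string")]   # layout recorded when the view was created
table_schema = [("b", "string"), ("a", "int")]  # layout after the ALTER TABLE in step 4

row = ("v1", 1)  # value read from the table, in table_schema order


def resolve_by_position(value, target):
    """Pair the i-th value with the i-th target field, ignoring field names."""
    return {name: v for (name, _), v in zip(target, value)}


def resolve_by_name(value, source, target):
    """Look up each target field by its name in the source schema."""
    by_name = {name: v for (name, _), v in zip(source, value)}
    return {name: by_name[name] for name, _ in target}


positional = resolve_by_position(row, view_schema)
named = resolve_by_name(row, table_schema, view_schema)

print(positional)  # {'a': 'v1', 'b': 1}  -> fields swapped
print(named)       # {'a': 1, 'b': 'v1'}  -> fields matched correctly
```

In this model the positional result puts the string under a and the integer under b. If a cast between the two types succeeds, nothing signals the swap, which is the silent case mentioned above.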
