[GitHub] [spark] tprelle opened a new pull request #31639: [SPARK-34528][CORE] named explicitly field in struct of a view

GitBox Wed, 24 Feb 2021 13:20:54 -0800


tprelle opened a new pull request #31639:
URL: https://github.com/apache/spark/pull/31639



   After the work in https://github.com/apache/spark/pull/31368 to simplify Hive 
view resolution,
   I found a bug caused by Hive allowing you to change the order of the fields 
inside a struct.
   
   1) Create a table in Hive with a struct:
   CREATE TABLE test_struct (id INT, sub STRUCT<a:INT, b:STRING>);
   2) Insert data into it:
   INSERT INTO TABLE test_struct SELECT 1, named_struct("a",1,"b","v1");
   3) Create a view on top of it:
   CREATE VIEW test_view_struct AS SELECT id, sub FROM test_struct;
   4) Change the table's struct by reordering its fields:
   ALTER TABLE test_struct CHANGE COLUMN sub sub STRUCT<b:STRING, a:INT>;
   5) Spark can no longer query the view, because a struct in Spark is resolved 
by field position, not by field name.
   If the change happens to be castable, the failure can even be silent.
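
The mismatch in step 5 can be illustrated with a small standalone sketch. This is plain Python, not Spark code: the helpers `cast_value`, `cast_by_position` and `cast_by_name` are hypothetical, and the lenient cast that yields `None` on a failed conversion is an assumption chosen to model a failure that raises no error.

```python
from typing import Any, Dict, List, Sequence, Tuple

# A struct schema as an ordered list of (field name, type name) pairs.
Field = Tuple[str, str]


def cast_value(value: Any, type_name: str) -> Any:
    """Lenient cast (assumption): returns None when the conversion fails."""
    try:
        if type_name == "int":
            return int(value)
        if type_name == "string":
            return str(value)
    except (TypeError, ValueError):
        return None
    raise ValueError(f"unsupported type: {type_name}")


def cast_by_position(values: Sequence[Any], target: List[Field]) -> Dict[str, Any]:
    """Map the i-th stored value onto the i-th target field, ignoring names."""
    return {name: cast_value(v, t) for v, (name, t) in zip(values, target)}


def cast_by_name(
    values: Sequence[Any], source: List[Field], target: List[Field]
) -> Dict[str, Any]:
    """Look each target field up by name in the source struct before casting."""
    by_name = {name: v for (name, _), v in zip(source, values)}
    return {name: cast_value(by_name[name], t) for name, t in target}


# Table schema after the ALTER TABLE in step 4, and the schema the view
# recorded when it was created in step 3.
table_schema = [("b", "string"), ("a", "int")]
view_schema = [("a", "int"), ("b", "string")]

# One struct value laid out in the table's new field order.
stored = ("v1", 1)

positional = cast_by_position(stored, view_schema)
named = cast_by_name(stored, table_schema, view_schema)

print(positional)  # {'a': None, 'b': '1'}  -> wrong values, no error raised
print(named)       # {'a': 1, 'b': 'v1'}    -> fields matched by name
```

In this model the positional mapping produces wrong values without raising anything, while the name-based lookup recovers the intended fields.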
   
   I also had to change a test, because duplicate field names in a struct are 
not allowed in Hive.




