peter-toth opened a new pull request #29737:
URL: https://github.com/apache/spark/pull/29737


   ### What changes were proposed in this pull request?
   Add support for the `orc.force.positional.evolution` config, which forces ORC 
top-level columns to be matched by position rather than by name.
   
   This already works in Hive:
   ```
   > set orc.force.positional.evolution;
   +--------------------------------------+
   |                 set                  |
   +--------------------------------------+
   | orc.force.positional.evolution=true  |
   +--------------------------------------+
   > create table t (c1 string, c2 string) stored as orc;
   > insert into t values ('foo', 'bar');
   > alter table t change c1 c3 string;
   ```
   The ORC file in this case contains the original `c1` and `c2` columns, which 
no longer match the metadata in HMS. But due to the positional evolution setting, 
Hive is able to return all the data:
   ```
   > select * from t;
   +--------+--------+
   | t.c3   | t.c2   |
   +--------+--------+
   | foo    | bar    |
   +--------+--------+
   ```
   Without this PR, Spark returns `null`s for the renamed `c3` column.
   
   After this PR, Spark returns the data in the `c3` column.
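
The difference between the two matching modes can be illustrated with a small, self-contained Python sketch. This is not the Spark or ORC reader code; `resolve_columns` and `read_row` are hypothetical helpers introduced only for illustration. The sketch assumes a file written with columns `c1`, `c2` and a table schema that now says `c3`, `c2`, as in the Hive example above.

```python
from typing import List, Optional


def resolve_columns(
    file_columns: List[str],
    table_columns: List[str],
    force_positional: bool,
) -> List[Optional[int]]:
    """For each table column, return the index of the file column to read, or None."""
    resolved: List[Optional[int]] = []
    for position, name in enumerate(table_columns):
        if force_positional:
            # Match by position; a table column beyond the file's width has no data.
            resolved.append(position if position < len(file_columns) else None)
        else:
            # Match by name; a renamed column cannot be found in the file.
            resolved.append(file_columns.index(name) if name in file_columns else None)
    return resolved


def read_row(
    file_row: List[str],
    file_columns: List[str],
    table_columns: List[str],
    force_positional: bool,
) -> List[Optional[str]]:
    """Project one file row onto the table schema, using None for unmatched columns."""
    indices = resolve_columns(file_columns, table_columns, force_positional)
    return [None if i is None else file_row[i] for i in indices]


file_columns = ["c1", "c2"]   # column names stored in the ORC file
table_columns = ["c3", "c2"]  # column names in the metastore after the rename
row = ["foo", "bar"]

by_name = read_row(row, file_columns, table_columns, force_positional=False)
by_position = read_row(row, file_columns, table_columns, force_positional=True)
print(by_name)      # [None, 'bar']
print(by_position)  # ['foo', 'bar']
```

Name-based matching loses the renamed column, while positional matching recovers it. This mirrors the `null`s versus `foo` behavior described above.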
   
   ### Why are the changes needed?
   Hive/ORC already supports this config. 
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, Spark will support `orc.force.positional.evolution`.
   
   ### How was this patch tested?
   New unit test.

