prashantwason commented on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-806413911


   So to rephrase the description: this solves the case where the input data has a field of a compatible type (int) being written to a table whose schema declares that field as long. Can this issue not be solved at the input-record level by converting the "int" data into a "long" before writing into HUDI?
   
   hoodieTable.getTableSchema() always returns the "latest" schema, which is the schema used by the last HoodieWriteClient (saved into the commit instants). So once the "int"-based RDD is written, the table schema will no longer have a "long" field. When this table schema is then used to read an older file in the table (the merge during the update handle), the read should fail because a long (from parquet) cannot be converted to an int (from the schema). This is actually a backward-incompatible schema change and hence is not allowed by HUDI.
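   
   As a standalone illustration of why that read direction fails (not code from this PR), Avro's own compatibility check shows that an int reader schema cannot read data written as long, while the opposite promotion is legal:
   
```scala
import org.apache.avro.{Schema, SchemaBuilder, SchemaCompatibility}

// Writer schema: the field was written as a long (as in the older parquet file).
val longSchema: Schema = SchemaBuilder.record("rec").fields()
  .requiredLong("age")
  .endRecord()

// Reader schema: the "latest" table schema after the int-based write.
val intSchema: Schema = SchemaBuilder.record("rec").fields()
  .requiredInt("age")
  .endRecord()

// long -> int is NOT a legal promotion, so reading the older file with the int schema is incompatible.
val readOldWithInt = SchemaCompatibility.checkReaderWriterCompatibility(intSchema, longSchema)
println(readOldWithInt.getType) // INCOMPATIBLE

// int -> long is a legal promotion, so the opposite direction is fine.
val readNewWithLong = SchemaCompatibility.checkReaderWriterCompatibility(longSchema, intSchema)
println(readNewWithLong.getType) // COMPATIBLE
```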
   
   @pengzhiwei2018 Can you add a test to verify my hypothesis? In your existing test in TestCOWDataSource, can you write a long to the table in the next write? Also, can you read all the data back using the "int" schema, including the older records which contain a long? A rough sketch of the sequence I have in mind is below.
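   
   Just a sketch of the write/read sequence, assuming placeholder column names, options, and paths rather than the actual TestCOWDataSource fixtures:
   
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, LongType}

val spark = SparkSession.builder()
  .appName("int-long-schema-evolution-test")
  .master("local[2]")
  .getOrCreate()

val basePath = "/tmp/hudi/schema_evolution_test"   // placeholder path
val hudiOpts = Map(                                // placeholder write options
  "hoodie.table.name" -> "schema_evolution_test",
  "hoodie.datasource.write.recordkey.field" -> "id",
  "hoodie.datasource.write.precombine.field" -> "id"
)

// 1. First write: "age" as long (the original table schema).
spark.range(0, 10).withColumn("age", col("id").cast(LongType))
  .write.format("hudi").options(hudiOpts).mode("overwrite").save(basePath)

// 2. Second write: "age" arrives as int (the case this PR handles).
spark.range(10, 20).withColumn("age", col("id").cast(IntegerType))
  .write.format("hudi").options(hudiOpts).mode("append").save(basePath)

// 3. Third write: "age" as long again -- does the latest table schema still accept it?
spark.range(20, 30).withColumn("age", col("id").cast(LongType))
  .write.format("hudi").options(hudiOpts).mode("append").save(basePath)

// 4. Read everything back, including the older records whose "age" was written as long;
//    if the table schema degraded to int, the merge/read path should surface the failure here.
val allRows = spark.read.format("hudi").load(basePath)
assert(allRows.count() == 30)
```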
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

