[GitHub] [incubator-hudi] prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-15 Thread GitBox
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate 
Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-614253179
 
 
   I have reworked the schema compatibility check code so that it no longer 
copies the entire avro.SchemaCompatibility class. I extracted only the relevant 
portion, and it works. I think the checks are now simpler and clearly defined 
within TableSchemaResolver.isSchemaCompatible(...). 
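   As a rough illustration (a Python sketch, not Hudi's actual Java code), the kind of backward-compatibility rules such a check enforces over Avro-style record schemas might look like this; the schema representation and the two rules here are simplified assumptions, not the real TableSchemaResolver logic:

```python
# Hypothetical sketch: a simplified schema-compatibility check in the
# spirit of TableSchemaResolver.isSchemaCompatible(...). Schemas are
# plain dicts mimicking Avro record schemas; this is NOT Hudi code.

def is_schema_compatible(old_schema, new_schema):
    """Return True if data written with old_schema can be read with new_schema.

    Simplified rules:
      * every field of old_schema must still exist in new_schema with the
        same type (no field deletion or type change), and
      * any field newly added in new_schema must carry a default value.
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    new_fields = {f["name"]: f for f in new_schema["fields"]}

    # Existing fields may not be dropped or change type.
    for name, old_field in old_fields.items():
        new_field = new_fields.get(name)
        if new_field is None or new_field["type"] != old_field["type"]:
            return False

    # Newly added fields need defaults so old records can still be read.
    for name, new_field in new_fields.items():
        if name not in old_fields and "default" not in new_field:
            return False
    return True


old = {"type": "record", "name": "trip",
       "fields": [{"name": "id", "type": "string"}]}
evolved = {"type": "record", "name": "trip",
           "fields": [{"name": "id", "type": "string"},
                      {"name": "fare", "type": "double", "default": 0.0}]}
narrowed = {"type": "record", "name": "trip", "fields": []}

print(is_schema_compatible(old, evolved))   # adding a defaulted field: True
print(is_schema_compatible(old, narrowed))  # dropping a field: False
```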
   
   No NOTICE or LICENSE updates are needed, since we are no longer copying the 
avro.SchemaCompatibility class.
   
   All unit tests have been completed too, so this is now ready for a final 
review.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-10 Thread GitBox
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate 
Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-612262645
 
 
   > What will happen if there is an incompatible message in Kafka? Will the 
pipeline stall? What would be the way to fix it without purging the whole Kafka 
topic?
   
   The current state is that:
   1. COW tables: 
  - Update to an existing parquet file: will raise an exception during commit, 
as conversion of the record to the writerSchema will fail. 
  - Insert to a new parquet file: will be ok.
   2. MOR tables:
  - Updates and inserts will both succeed, but an exception will be raised 
during compaction.
   
   I am not very sure about the reader side; either an exception is thrown or 
the record may be missing the fields.
   
   So even today, the pipeline may stall (due to an exception). I don't think 
HUDI has a way out of it yet. You may drop the offending record (before calling 
HoodieWriteClient::insert()).
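   To sketch that suggestion (a hypothetical Python illustration, not a Hudi API: the field names and the "record supplies every schema field" rule are assumptions for the example), one could partition incoming records against the writer schema before handing them to the write client:

```python
# Hypothetical sketch: pre-filter records against the writer schema before
# the insert call, so incompatible records are dropped instead of stalling
# the pipeline. Nothing here is a real Hudi API.

def conforms(record, writer_schema_fields):
    """Illustrative rule: a record conforms if it supplies every schema field."""
    return all(name in record for name in writer_schema_fields)

def split_records(records, writer_schema_fields):
    """Partition records into (writable, offending) before writing."""
    writable, offending = [], []
    for record in records:
        target = writable if conforms(record, writer_schema_fields) else offending
        target.append(record)
    return writable, offending

schema_fields = ["id", "ts"]
records = [{"id": "a", "ts": 1}, {"id": "b"}]  # second record is missing "ts"
good, bad = split_records(records, schema_fields)
print(len(good), len(bad))  # 1 1
```

The offending records could then be logged or sent to a dead-letter topic rather than purging the whole Kafka topic.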
   
   This change only checks the schema. So if the writerSchema is the same, this 
code has no extra effect.
   




[GitHub] [incubator-hudi] prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-10 Thread GitBox
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate 
Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-612258453
 
 
   > Structure looks much better now. thanks @prashantwason ..
   > 
   > I raised an issue on the need to copy the avro compatibility code into the 
project.. Would like to understand why we cannot re-use as is.. I don't know if 
we can maintain this and keep in sync over time..
   > 
   > Nonetheless, this change also needs to update NOTICE/LICENSE appropriately, 
if we need to reuse that code
   
   Please see the details on 
[HUDI-741](https://issues.apache.org/jira/browse/HUDI-741?focusedCommentId=17081025=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17081025)
 for the limitations of the original code. 
   
   This is just one way to compare two schemas. If there is a better way for 
HUDI, I will be happy to integrate that instead. 

