[GitHub] [incubator-hudi] prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution. URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-614253179

I have reworked the schema compatibility check to remove the copy of the entire avro.SchemaCompatibility class. I took out the relevant portion and it works. I think the checks are now simpler and clearly defined within TableSchemaResolver.isSchemaCompatible(...). No NOTICE or LICENSE updates are needed since we are no longer copying the avro.SchemaCompatibility class. All unit tests pass as well, so this is now ready for a final review.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
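To make the compatibility check concrete, here is a minimal, illustrative sketch of the reader/writer rule that Avro's SchemaCompatibility applies and that a method like TableSchemaResolver.isSchemaCompatible(...) wraps. This is not Hudi's actual Java code: schemas are modeled as plain dicts in Avro JSON style, only flat records of primitive fields are handled, and type promotions are ignored.

```python
def is_schema_compatible(writer_schema, reader_schema):
    """Return True if data written with writer_schema can be read with reader_schema.

    Simplified model of Avro's rule: every reader field must either exist in the
    writer schema with a matching type, or carry a default value.
    """
    writer_fields = {f["name"]: f for f in writer_schema["fields"]}
    for reader_field in reader_schema["fields"]:
        writer_field = writer_fields.get(reader_field["name"])
        if writer_field is None:
            # Field added in the evolved (reader) schema: only safe with a default.
            if "default" not in reader_field:
                return False
        elif writer_field["type"] != reader_field["type"]:
            # Simplification: require identical types (real Avro allows promotions).
            return False
    return True

old = {"type": "record", "name": "r",
       "fields": [{"name": "id", "type": "long"}]}
evolved_ok = {"type": "record", "name": "r",
              "fields": [{"name": "id", "type": "long"},
                         {"name": "tag", "type": "string", "default": ""}]}
evolved_bad = {"type": "record", "name": "r",
               "fields": [{"name": "id", "type": "long"},
                          {"name": "tag", "type": "string"}]}  # no default

print(is_schema_compatible(old, evolved_ok))   # True
print(is_schema_compatible(old, evolved_bad))  # False
```

Adding a field with a default is backwards compatible; adding one without a default is not, because old files cannot supply a value for it.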
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution. URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-612262645

> What will happen if there is an incompatible message in Kafka? Will the pipeline stall? What will be the way to fix it without purging the whole Kafka topic?

The current state is:

1. COW tables:
   - Update to an existing parquet file: will raise an exception during commit, as conversion of the record to the writerSchema will fail.
   - Insert to a new parquet file: will be OK.
2. MOR tables:
   - Both updates and inserts will succeed, but an exception will be raised during compaction.

I am not very sure about the reader side: either an exception is raised, or the record may be missing the fields. So even today the pipeline may stall (due to the exception). I don't think HUDI has a way out of it yet. You may drop the offending record (before calling HoodieWriteClient::insert()).

This change only checks the schema, so if the writerSchema is the same, this code has no extra effect.
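The suggested workaround of dropping offending records before the insert could look roughly like the sketch below. It is a hedged illustration, not a Hudi API: the schema model and the conformance check are simplified stand-ins, and the filtering would happen in the ingestion code before handing the batch to HoodieWriteClient.insert().

```python
def conforms(record, schema):
    """A record conforms if every schema field without a default is present."""
    for field in schema["fields"]:
        if field["name"] not in record and "default" not in field:
            return False
    return True

# Hypothetical writer schema for illustration only.
writer_schema = {"type": "record", "name": "trip",
                 "fields": [{"name": "uuid", "type": "string"},
                            {"name": "ts", "type": "long"},
                            {"name": "rider", "type": "string", "default": "n/a"}]}

batch = [
    {"uuid": "a", "ts": 1},               # ok: "rider" falls back to its default
    {"uuid": "b"},                         # missing required "ts" -> dropped
    {"uuid": "c", "ts": 3, "rider": "x"},  # ok
]

# Drop the offending records instead of letting the commit/compaction fail later.
clean_batch = [r for r in batch if conforms(r, writer_schema)]
print(len(clean_batch))  # 2
```

In practice the dropped records would typically be routed to a dead-letter location for inspection rather than silently discarded.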
prashantwason commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution. URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-612258453

> Structure looks much better now. Thanks @prashantwason.
>
> I raised an issue on the need to copy the avro compatibility code into the project. I would like to understand why we cannot re-use it as is; I don't know if we can maintain this and keep it in sync over time.
>
> Nonetheless, this change also needs to update NOTICE/LICENSE appropriately, if we need to reuse that code.

Please see the details on [HUDI-741](https://issues.apache.org/jira/browse/HUDI-741?focusedCommentId=17081025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17081025) for the limitation with the original code. This is just one way to compare two schemas; if there is a better way for HUDI, I will be happy to integrate that instead.
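One reason comparing two schemas is subtler than a field-by-field equality check is that Avro permits certain type promotions between writer and reader (for example int to long, but not the reverse). The toy checker below illustrates that nuance only; it is not the code referenced in HUDI-741, and the promotion table is the one defined by the Avro specification.

```python
# Writer-to-reader type promotions allowed by the Avro spec (simplified view).
ALLOWED_PROMOTIONS = {
    ("int", "long"), ("int", "float"), ("int", "double"),
    ("long", "float"), ("long", "double"),
    ("float", "double"),
    ("string", "bytes"), ("bytes", "string"),
}

def types_compatible(writer_type, reader_type):
    """True if a reader of reader_type can consume data written as writer_type."""
    return (writer_type == reader_type
            or (writer_type, reader_type) in ALLOWED_PROMOTIONS)

print(types_compatible("int", "long"))  # True: widening is safe
print(types_compatible("long", "int"))  # False: narrowing loses information
```

Promotions are directional, which is why a compatibility check must always know which schema is the writer and which is the reader.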